[LibreQoS] Fwd: CGNAT growing pains
Dave Taht
dave.taht at gmail.com
Tue Oct 8 15:40:27 EDT 2024
---------- Forwarded message ---------
From: C. Jon Larsen <jlarsen at richweb.com>
Date: Tue, Oct 8, 2024 at 12:34 PM
Subject: Re: CGNAT growing pains
To: Jon Lewis <jlewis at lewis.org>
Cc: <nanog at nanog.org>
We have had very good success with A10 vThunder on rural broadband
co-op networks for residential subscribers. No problems with the NAT
aspect, literally 0. Operationally it just works. Games, streaming,
Xbox, Nintendo Switch, all just work.
We typically do 32:1, or about 2000 UDP/TCP ports allocated per
customer behind the A10. As you climb toward 48:1, 64:1, 128:1, etc.,
the rate of CDN blocking because "you are behind a VPN" starts to go up
noticeably.
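
As a rough check on the numbers above, here is a minimal sketch of the
ratio arithmetic. It assumes each public IP exposes ports 1024-65535 for
NAT (64512 ports); that range and the function name are illustrative
assumptions, not A10 defaults.

```python
# Assumption: ports 1024-65535 of each public IP are usable for NAT.
USABLE_PORTS = 65536 - 1024  # 64512


def ports_per_subscriber(ratio: int, usable_ports: int = USABLE_PORTS) -> int:
    """Ports each subscriber gets when `ratio` subscribers share one public IP."""
    return usable_ports // ratio


for ratio in (32, 48, 64, 128):
    print(f"{ratio}:1 -> {ports_per_subscriber(ratio)} ports per subscriber")
```

Under that assumption 32:1 works out to 2016 ports per subscriber, which
matches the "about 2000" figure.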
If you have your LIDs (what A10 calls the inside IPs that get mapped
to NAT pools) set up properly and your inside CGN 100.64/10 IP space
sanely laid out, it's pretty easy. You can carve out pools for each
market (say a couple of /21s or a /19), map each one to a pool of
public IPs accordingly, and then in your self-hosted geofeed lay out
that block with the correct data.
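
For illustration, a minimal sketch of generating geofeed lines per market
pool in the RFC 8805 CSV layout (prefix, country, region, city, postal
code). The prefixes are documentation ranges and the market-to-city
mapping is hypothetical, not our actual layout.

```python
import csv
import io
import ipaddress

# Hypothetical markets; prefixes are RFC 5737 documentation ranges,
# not real allocations.
MARKET_POOLS = [
    # (public NAT pool, country, region, city)
    ("192.0.2.0/24", "US", "US-FL", "Miami"),
    ("198.51.100.0/24", "US", "US-FL", "Jacksonville"),
]


def geofeed_lines(pools):
    """Return geofeed CSV text: prefix,country,region,city,postal_code."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for prefix, country, region, city in pools:
        net = ipaddress.ip_network(prefix)  # raises ValueError on a bad prefix
        writer.writerow([str(net), country, region, city, ""])  # postal code left empty
    return buf.getvalue()


print(geofeed_lines(MARKET_POOLS), end="")
```

Keeping one public pool per market is what lets each geofeed line point
at a single city.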
We try to give all business customers a /32 public IP, either from a
DHCP reservation or a static assignment on an EVPN subnet, so business
customers typically would not get CGN IPs. We also encourage them to
enable v6 and get that set up where possible.
> We started rolling out CGNAT about 6 months ago. It was smooth sailing for
> the first few months, but we eventually did run into a number of issues.
>
> Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP.
> Since connections are always-on, customer ONTs/routers get an IP assigned,
> and then when the lease is renewed, they request a new lease for the
> existing IP, and, in general, that request is granted. This gives customers
> the mistaken impression they have a static IP. So, my impression, from
> working with some customers who've needed to be moved from CGNAT back to
> public IP is that customers who are doing port-forwarding don't even bother
> with dynamic DNS. They just know they can connect to their IP as they've
> never seen it change. We do offer/sell static IP, but pre-CGNAT, it was
> strictly for business customers. i.e. A residential customer could only get
> static IP service by converting their account to a business account. That
> may change in the near future.
>
> One issue we didn't foresee has been IP Geo issues. i.e. We all knew that
> streaming services like Netflix use IP Geo to determine what content should
> be made available, but that's, AFAIK, limited by country or region. What we
> didn't anticipate is services like Hulu Live TV doing IP Geo down to the
> city level to determine which local channels are a subscriber's local
> channels. We're using Juniper MX gear and SPC3 cards for our CGNAT routers,
> each one having a single large external pool. Since we serve most of FL, one
> external pool can't IP Geo correctly for customers as far apart as Miami and
> Jacksonville hitting the same CGNAT router. We don't currently have an
> acceptable solution to this other than moving impacted customers off CGNAT.
>
> One of the great unknowns (at least for us) with CGNAT was what our PBA
> settings should be. i.e. How large each port-block should be, and how many
> port-blocks to allow per customer. We started with 256x4. It seemed to
> work. We eventually noticed that we were logging port-block exceeded errors.
> This is one aspect where Juniper's CGNAT support is lacking. There's a
> counter for these errors, and it's available via SNMP, but there's no way to
> attribute the errors to subscriber IPs. We're polling the MIB and graphing
> it, so we know it's a continuing issue and can see when it's incrementing
> faster/slower, but Junos provides no means for determining if "PBEs" are all
> being caused by a single customer, a handful of customers, etc. We have a
> JTAC case open on this. As a quick & hopeful fix, we increased both the
> port-block size and the block limit. That helped, but didn't stop the
> errors. It also cut our CGNAT ratio by more than half (64:1 -> 28:1). If we
> stay at this ratio, we'll need much larger external pools than originally
> anticipated. Tuning these settings is kind of painful as JTAC strongly
> recommends bouncing the CGNAT service anytime CGNAT related config changes
> are made. This means briefly breaking Internet access for all CGNAT'd
> customers. For the PBEs, JTAC's suggestions so far have been to shorten some
> of the timeouts in the config and to keep doing what we're doing, which is a
> cron job that essentially does a "show services nat source port-block",
> parses the output looking for subscriber IPs that have used up the ports in
> several of their port-blocks, then does a "show services sessions
> source-prefix ..." and logs all of this. This at least gives us snapshots of
> "who's a heavy user right now" and lets us look at how they were using all
> their ports. i.e. was it BitTorrent, are they compromised and scanning the
> internet for more systems to compromise, is it legit looking traffic - just
> lots of it, etc.?
>
> The latest CGNAT issue is a customer with a Palo Alto Networks firewall
> connected to our network and several of their employees are our FTTH
> customers. On their PANW firewall, they're doing IP Geo based filtering,
> limiting access to internal servers to "US IPs". Since we only CGNAT traffic
> to the external Internet, their on-net employees hit the firewall from their
> 100.64/10 IPs and get blocked. I suggested they whitelist 100.64/10, saying
> we block traffic from 100.64/10 from entering our network via peering and
> transit, so they can be assured anything from 100.64/10 came from inside our
> network / our customers. They say the firewall won't let them whitelist
> 100.64.0.0/10, giving an error that it's invalid IP space.
>
> I know we're not the first to implement CGNAT, so I'm curious if others have
> run into these sorts of issues, or others we haven't run into yet, and if
> so, how you solved them.
>
>
> ----------------------------------------------------------------------
> Jon Lewis, MCP :) | I route
> Blue Stream Fiber, Sr. Neteng | therefore you are
> _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
>
>
--
Dave Täht CSO, LibreQos