<div dir="ltr">A couple of things on CGNAT.  We never do fewer than 1000 ports per IP.  That seems to be the threshold below which general problems appear.  Dialing back TCP timeouts to 5-10 minutes also helps; any shorter than that and people report issues with some security cameras etc. because their keepalive intervals are longer.  Customers per IP is irrelevant because with 1000-2000 ports per IP you run out of ports before any other practical limit hits.  This was our primary issue with CGNAT for a while: connections hanging on for 24 hours and customers running out of ports.<br><br>We do a hairpin NAT config on the head end so customers can talk to each other on the public IP.<br><br>The primary issue we see is when a business that many subscribers have in common, say a local hospital that has work-from-home people, blocks the /24 because of failed login attempts; that hits everyone in the CGNAT pool and they can't RDP to the workplace.  Because RDP is so insecure, these places have tried to shore it up with whitelists, but that's their flaw that we have to deal with.<br><br>When we get new IPs they are often geocoded elsewhere, which causes some issues I have to chase down.  We keep CGNAT pools very localized: because we are multi-head-end and multi-homed, no one exits our network very far from where they really are.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 8, 2024 at 1:40 PM Dave Taht via LibreQoS &lt;<a href="mailto:libreqos@lists.bufferbloat.net">libreqos@lists.bufferbloat.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">C. 
Jon Larsen</strong> <span dir="auto">&lt;<a href="mailto:jlarsen@richweb.com" target="_blank">jlarsen@richweb.com</a>&gt;</span><br>Date: Tue, Oct 8, 2024 at 12:34 PM<br>Subject: Re: CGNAT growing pains<br>To: Jon Lewis &lt;<a href="mailto:jlewis@lewis.org" target="_blank">jlewis@lewis.org</a>&gt;<br>Cc:  &lt;<a href="mailto:nanog@nanog.org" target="_blank">nanog@nanog.org</a>&gt;<br></div><br><br><br>

We have had very good success with A10 vthunder on rural broadband<br>

co-op networks for Resi subscribers. No problems with the NAT aspect,<br>

literally 0. Operationally it just works. Games, streaming, xbox,<br>

nintendo switch, all just works.<br>

<br>

We typically do 32:1 or about 2000 udp/tcp ports allocated per<br>

customer behind the A10. As you climb toward 48:1, 64:1, 128:1, etc.,<br>

the rate of CDN blocking b/c "you are behind a VPN" starts to go up<br>

noticeably.<br>
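<br>
The ratio-to-ports arithmetic can be checked with a short sketch. It assumes ports 0-1023 are held back and the remainder of each public IP is split evenly, which may not match how any particular CGNAT platform allocates; the function name is made up for illustration.<br>

```python
# Ports available to each subscriber at a given CGNAT sharing ratio.
# Assumption: well-known ports 0-1023 are never handed out, and the
# remaining ports of one public IP are divided evenly.
USABLE_PORTS_PER_PUBLIC_IP = 65536 - 1024  # 64512


def ports_per_subscriber(ratio: int) -> int:
    """Return the per-subscriber port count for `ratio` subscribers per public IP."""
    if ratio <= 0:
        raise ValueError("ratio must be a positive integer")
    return USABLE_PORTS_PER_PUBLIC_IP // ratio


for ratio in (32, 48, 64, 128):
    print(f"{ratio}:1 -> {ports_per_subscriber(ratio)} ports per subscriber")
```

Under these assumptions 32:1 works out to 2016 ports, in line with the "about 2000" figure, and 128:1 leaves roughly 500.<br>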

<br>

If you have your LIDs (what A10 calls the inside ips that get mapped<br>

to nat pools) set up properly and your inside CGN 100.64/10 ip space<br>

sanely laid out it's pretty easy. You can carve out pools for each<br>

market (say a couple of /21s or a /19) and map that to a pool of<br>

public ips accordingly and then in your self hosted geofeed lay out<br>

that block with the correct data.<br>
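<br>
A rough sketch of the per-market layout, assuming the geofeed follows the RFC 8805 CSV layout (prefix, country, region, city, postal code). The market names, inside prefixes, and public prefixes below are placeholders (the public ones are documentation ranges), not anyone's real allocation.<br>

```python
import csv
import io
import ipaddress

SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")

# Hypothetical markets: inside pools carved from 100.64.0.0/10 and the
# public NAT pool each one maps to.
MARKETS = [
    {"city": "Exampleville", "region": "US-VA", "country": "US",
     "inside": ["100.64.0.0/19"], "public": "192.0.2.0/25"},
    {"city": "Sampletown", "region": "US-VA", "country": "US",
     "inside": ["100.64.32.0/21", "100.64.40.0/21"], "public": "192.0.2.128/25"},
]


def geofeed(markets) -> str:
    """Emit one geofeed line per public pool, after sanity-checking the inside pools."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for market in markets:
        for prefix in market["inside"]:
            if not ipaddress.ip_network(prefix).subnet_of(SHARED_SPACE):
                raise ValueError(f"{prefix} is not inside 100.64.0.0/10")
        public = ipaddress.ip_network(market["public"])
        # Postal code is left empty; only the public prefix is published.
        writer.writerow([str(public), market["country"], market["region"], market["city"], ""])
    return out.getvalue()


print(geofeed(MARKETS), end="")
```

Only the public pools appear in the feed; the inside prefixes are there so the market-to-pool mapping stays in one place.<br>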

<br>

We try to give all business customers a /32 public ip either from dhcp<br>

reservation or static assignment on an evpn subnet so business<br>

customers would not get CGN ips typically. Also encourage them to<br>

enable v6 and get that set up where possible.<br>

<br>

> We started rolling out CGNAT about 6 months ago.  It was smooth sailing for <br>

> the first few months, but we eventually did run into a number of issues.<br>

><br>

> Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP. <br>

> Since connections are always-on, customer ONTs/routers get an IP assigned, <br>

> and then when the lease is renewed, they request a new lease for the existing <br>

> IP, and, in general, that request is granted.  This gives customers the <br>

> mistaken impression they have a static IP.  So, my impression, from working <br>

> with some customers who've needed to be moved from CGNAT back to public IP is <br>

> that customers who are doing port-forwarding don't even bother with dynamic <br>

> DNS.  They just know they can connect to their IP as they've never seen it <br>

> change.  We do offer/sell static IP, but pre-CGNAT, it was strictly for <br>

> business customers.  i.e. A residential customer could only get static IP <br>

> service by converting their account to a business account. That may change in <br>

> the near future.<br>

><br>

> One issue we didn't foresee has been IP Geo issues.  i.e.  We all knew that <br>

> streaming services like Netflix use IP Geo to determine what content should <br>

> be made available, but that's, AFAIK, limited by country or region. What we <br>

> didn't anticipate is services like Hulu Live TV doing IP Geo down to the city <br>

> level to determine which local channels are a subscriber's local channels. <br>

> We're using Juniper MX gear and SPC3 cards for our CGNAT routers, each one <br>

> having a single large external pool.  Since we serve most of FL, one external <br>

> pool can't IP Geo correctly for customers as far apart as Miami and <br>

> Jacksonville hitting the same CGNAT router.  We don't currently have an <br>

> acceptable solution to this other than moving impacted customers off CGNAT.<br>

><br>

> One of the great unknowns (at least for us) with CGNAT was what our PBA <br>

> settings should be.  i.e.  How large each port-block should be, and how many <br>

> port-blocks to allow per customer.  We started with 256x4.  It seemed to <br>

> work.  We eventually noticed that we were logging port-block exceeded errors. <br>

> This is one aspect where Juniper's CGNAT support is lacking. There's a <br>

> counter for these errors, and it's available via SNMP, but there's no way to <br>

> attribute the errors to subscriber IPs.  We're polling the MIB and graphing <br>

> it, so we know it's a continuing issue and can see when it's incrementing <br>

> faster/slower, but Junos provides no means for determining if "PBEs" are all <br>

> being caused by a single customer, a handful of customers, etc.  We have a <br>

> JTAC case open on this.  As a quick & hopeful fix, we increased both the <br>

> port-block size and block limit.  That helped, but didn't stop the errors. <br>

> It also cut our CGNAT ratio by more than half (64:1 -> 28:1); if we stay at <br>

> this ratio, we'll need much larger external pools than originally <br>

> anticipated.  Tuning these settings is kind of painful as JTAC strongly <br>

> recommends bouncing the CGNAT service anytime CGNAT related config changes <br>

> are made.  This means briefly breaking Internet access for all CGNAT'd <br>

> customers.  For the PBEs, JTAC's suggestions so far have been to shorten some <br>

> of the timeouts in the config and to keep doing what we're doing, which is a <br>

> cron job that essentially does a "show services nat source port-block", <br>

> parses the output looking for subscriber IPs that have used up the ports in <br>

> several of their port-blocks, then does a "show services sessions <br>

> source-prefix ..." and logs all of this.  This at least gives us snapshots of <br>

> "who's a heavy user right now" and lets us look at how they were using all <br>

> their ports.  i.e. was it BitTorrent, are they compromised and scanning the <br>

> internet for more systems to compromise, is it legit looking traffic - just <br>

> lots of it, etc.?<br>
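<br>
The port-block arithmetic and the "who's a heavy user right now" snapshot can be sketched as below. This is not the cron job described above: it assumes the CLI output has already been parsed into (subscriber IP, ports used) records, one per allocated port block, and that record layout, the function names, and the sample data are all hypothetical. It also assumes ports 0-1023 are excluded from the pool.<br>

```python
from collections import defaultdict

USABLE_PORTS_PER_PUBLIC_IP = 65536 - 1024  # assumption: well-known ports excluded


def worst_case_ratio(block_size: int, max_blocks: int) -> int:
    """Subscribers per public IP if every subscriber holds all allowed blocks."""
    return USABLE_PORTS_PER_PUBLIC_IP // (block_size * max_blocks)


def heavy_users(blocks, block_size: int, min_full_blocks: int):
    """Return subscriber IPs with at least `min_full_blocks` fully used port blocks.

    `blocks` is an iterable of (subscriber_ip, ports_used) tuples, one tuple
    per allocated port block (hypothetical, pre-parsed input).
    """
    full_blocks = defaultdict(int)
    for subscriber_ip, ports_used in blocks:
        if ports_used >= block_size:
            full_blocks[subscriber_ip] += 1
    return sorted(ip for ip, count in full_blocks.items() if count >= min_full_blocks)


# Made-up snapshot for illustration only.
snapshot = [
    ("100.64.1.10", 256), ("100.64.1.10", 256),
    ("100.64.1.10", 256), ("100.64.1.10", 201),
    ("100.64.2.20", 37),
]

print(worst_case_ratio(256, 4))
print(heavy_users(snapshot, block_size=256, min_full_blocks=3))
```

With 256x4 the worst case comes out to 63 subscribers per public IP, close to the 64:1 quoted above; the flagged IPs are the ones worth a follow-up session dump.<br>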

><br>

> The latest CGNAT issue is a customer with a Palo Alto Networks firewall <br>

> connected to our network and several of their employees are our FTTH <br>

> customers.  On their PANW firewall, they're doing IP Geo based filtering, <br>

> limiting access to internal servers to "US IPs".  Since we only CGNAT traffic <br>

> to the external Internet, their on-net employees hit the firewall from their <br>

> 100.64/10 IPs and get blocked.  I suggested they whitelist 100.64/10, saying <br>

> we block traffic from 100.64/10 from entering our network via peering and <br>

> transit, so they can be assured anything from 100.64/10 came from inside our <br>

> network / our customers.  They say the firewall won't let them whitelist <br>

> <a href="http://100.64.0.0/10" rel="noreferrer" target="_blank">100.64.0.0/10</a>, giving an error that it's invalid IP space.<br>
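<br>
For what it's worth, 100.64.0.0/10 is a valid prefix: RFC 6598 reserves it as Shared Address Space. A minimal membership check using Python's standard ipaddress module is sketched below; the function name is made up, and this says nothing about how the PANW firewall itself validates input.<br>

```python
import ipaddress

# RFC 6598 Shared Address Space, used for the inside of the CGNAT.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_on_net_cgnat(source: str) -> bool:
    """True if `source` falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(source) in SHARED_ADDRESS_SPACE


for addr in ("100.64.0.1", "100.127.255.254", "100.128.0.1", "10.0.0.1"):
    print(addr, is_on_net_cgnat(addr))
```

The block runs from 100.64.0.0 through 100.127.255.255, so 100.128.0.1 falls just outside it.<br>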

><br>

> I know we're not the first to implement CGNAT, so I'm curious if others have <br>

> run into these sorts of issues, or others we haven't run into yet, and if so, <br>

> how you solved them.<br>

><br>

><br>

> ----------------------------------------------------------------------<br>

> Jon Lewis, MCP :)              |  I route<br>

> Blue Stream Fiber, Sr. Neteng  |  therefore you are<br>

> _________ <a href="http://www.lewis.org/~jlewis/pgp" rel="noreferrer" target="_blank">http://www.lewis.org/~jlewis/pgp</a> for PGP public key_________<br>

><br>

><br>

</div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Dave Täht CSO, LibreQos<br></div></div></div></div>


</blockquote></div>