[Bloat] Fwd: Inform: M-Labs, ~12% of network identifications incorrect

Dave Taht dave.taht at gmail.com
Fri Mar 3 09:23:46 EST 2023


---------- Forwarded message ---------
From: 'Livingood, Jason' via National Broadband Mapping Coalition
<BBCoalition at marconisociety.org>
Date: Fri, Mar 3, 2023 at 5:40 AM
Subject: Inform: M-Labs, ~12% of network identifications incorrect
To: National Broadband Mapping Coalition <bbcoalition at marconisociety.org>


FYI for those folks using data from M-Labs.



From: 'Stephen Soltesz' via discuss <discuss at measurementlab.net>
Reply-To: Stephen Soltesz <soltesz at google.com>
Date: Thursday, March 2, 2023 at 15:50
To: discuss <discuss at measurementlab.net>
Subject: [EXTERNAL] [M-Lab-Discuss] 8-12% missing or misattributed
network annotations between 2020-03-10 and 2023-02-09



This only affects “client.Network” (e.g. ASN and ASName) annotations
on M-Lab data collected between 2020-03-10 and 2023-02-09. The
“client.Geo” (e.g. Latitude, Longitude, Subdivision1ISOCode, City and
Country) annotations are not affected. We are working to correct these
annotations by early April or sooner.



Impact

The network annotations on all data collected between 2020-03-10 and
2023-02-09 may be incorrect. We estimate ~7-10% are missing and ~1-2%
are attributed to an incorrect (larger) network address block. These
incorrect annotations were not random: they depended on the client IP
being annotated. So, a client IP that was annotated incorrectly once
continued to receive an incorrect annotation.



We deployed a fix for new annotations on 2023-02-09. So, all data
collected since 2023-02-10 will be correct. We are working on a plan
to repair the historical network annotations between 2020-03-10 and
2023-02-09.



Unfortunately, until the historical data is reprocessed we will not
know precisely which historical annotations are incorrect. We cannot
identify present-but-incorrect annotations until we recreate the
annotation correctly. For aggregate analysis using the ASNs, you
should expect ~1-2% errors. For analysis targeting specific networks
and depending on the ASN annotations, the impact is harder to quantify
and could be much higher.
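
Until the repair lands, one cautious approach is to flag any measurement
whose test date falls inside the affected window and treat its ASN as
unverified. A minimal Python sketch, assuming each row carries a test
date; the function name and constants below are hypothetical and are not
part of any M-Lab tool:

```python
from datetime import date

# Affected window as stated in this notice (inclusive on both ends).
AFFECTED_START = date(2020, 3, 10)
AFFECTED_END = date(2023, 2, 9)


def asn_annotation_suspect(test_date: date) -> bool:
    """Return True if a client.Network annotation collected on test_date
    falls in the affected window and should be treated as unverified."""
    return AFFECTED_START <= test_date <= AFFECTED_END
```

This only marks rows as possibly wrong. As noted above, it cannot tell
which individual annotations are actually incorrect.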



Context

On 2020-03-10, M-Lab introduced a measurement annotation process
(uuid-annotator) that runs at measurement-time on nodes rather than
during post-processing by the data pipeline. This architectural change
decoupled the collection of annotations from the need to archive
client IP addresses.



However, we recently discovered that the percentage of missing
annotations was unexpectedly high, ~10%. After further investigation,
we discovered a fundamental bug in the uuid-annotator's network
annotations that resulted in both the missing annotations and the
potential for misattributed annotations. Based on a prototype
reprocessor, we estimate that 1-2% of annotations carry incorrect
ASNs because a shorter network prefix was chosen over a correct
longer prefix, e.g. 12.0.0.0/8 vs 12.a.b.0/24.
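
To illustrate the effect described above, here is a minimal Python
sketch of longest-prefix matching next to a lookup that wrongly prefers
the shortest covering prefix. It does not reproduce the actual
uuid-annotator bug. The table, the 12.34.56.0/24 prefix, the ASNs (taken
from the range reserved for documentation) and the function names are
all assumptions made up for this example:

```python
import ipaddress

# Hypothetical prefix-to-ASN table. The /24 is nested inside the /8.
TABLE = {
    ipaddress.ip_network("12.0.0.0/8"): 64496,
    ipaddress.ip_network("12.34.56.0/24"): 64497,
}


def longest_prefix_asn(ip, table):
    """Correct lookup: the most specific (longest) covering prefix wins."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no covering prefix: the annotation would be missing
    return table[max(matches, key=lambda net: net.prefixlen)]


def shortest_prefix_asn(ip, table):
    """Incorrect lookup: the least specific (shortest) covering prefix wins."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[min(matches, key=lambda net: net.prefixlen)]
```

For a client at 12.34.56.7, the correct lookup returns the ASN of the
/24, while the shortest-prefix lookup attributes the client to the
larger /8 block. Because the result depends only on the IP and the
table, the same client is misattributed every time.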



Repair

Because the annotation and hopannotation1 datatypes are collected at
measurement-time without the client IP, these annotations were
originally intended to be created once and loaded directly into
BigQuery by the data pipeline. Recreating the annotation was not part
of the original design. So, to repair these network annotations we
must build a new data processing utility to recreate the annotation
archives and reprocess them with the existing pipeline.



We estimate this work will take four to six weeks, ideally finishing
by early April.



More information and updates will be added here:

m-lab/data-annotations#34: 8-12% Missing or incorrect Network
annotations 2020-03-10 to 2023-02-09



Please let us know if you have any questions or concerns.




-- 
A pithy note on VOQs vs SQM: https://blog.cerowrt.org/post/juniper/
Dave Täht CEO, TekLibre, LLC

