IMC '23: Proceedings of the 2023 ACM on Internet Measurement Conference

SESSION: Replication

Replication: Towards a Publicly Available Internet Scale IP Geolocation Dataset

IP geolocation is one of the most widely used forms of metadata for IP addresses, and despite almost twenty years of effort from the research community, the reality is that there is no accurate, complete, up-to-date, and explainable publicly available dataset for IP geolocation. We argue that a central reason for this state of affairs is the impressive results from prior publications, both in terms of accuracy and coverage: up to street level accuracy and locating millions of IP addresses with a few hundred vantage points in months. We believe the community would substantially benefit from a public baseline dataset and code. To encourage future research in IP geolocation, we replicate two geolocation techniques and evaluate their accuracy and coverage. We show that we can neither use the first technique to obtain the previously claimed street level accuracy, nor the second to geolocate millions of IP addresses on today's Internet and with publicly available measurement infrastructure. In addition to this reappraisal, we re-evaluate the fundamental insights that led to these prior results, as well as provide new insights and recommendations to help the design of future geolocation techniques. All of our code and data are publicly available to support reproducibility.

Replication: 20 Years of Inferring Interdomain Routing Policies

In 2003, Wang and Gao [67] presented an algorithm to infer and characterize routing policies, as this knowledge could be valuable in predicting and debugging routing paths. They used their algorithm to measure the phenomenon of selectively announced prefixes, in which ASes announce their prefixes to specific providers to manipulate incoming traffic. Since 2003, the Internet has evolved from a hierarchical graph to a flat and dense structure. Despite 20 years of extensive research since that seminal work, the impact of these topological changes on routing policies is still unclear.

In this paper we conduct a replicability study of the Wang and Gao paper [67] to shed light on the evolution and the current state of selectively announced prefixes. We show that selective announcements are persistent, not only across time, but also across networks. Moreover, we observe that neighbors of different AS relationships may be assigned the same local preference values, and path selection is not as heavily dependent on AS relationships as it used to be. Our results highlight the need for BGP policy inference to be conducted as a high-periodicity process to account for the dynamic nature of AS connectivity and the derived policies.

Replication: "When to Use and When Not to Use BBR"

We replicate the paper "When to Use and When Not to Use BBR: An Empirical Analysis and Evaluation Study" by Cao et al., published in IMC 2019 [2], with a focus on the relative goodput of TCP BBR and TCP CUBIC for a range of bottleneck buffer sizes, bandwidths, and delays. We replicate the experiments performed by the original authors on two large-scale open-access testbeds to validate the conclusions of the paper. We further extend the experiments to BBRv2. We package the experiment artifacts and make them publicly available so that others can repeat and build on this work.

Replication: Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation

Over the last years we have witnessed a renewed interest in Traffic Classification (TC), fueled by the rise of Deep Learning (DL). Yet, the vast majority of TC literature lacks code artifacts, performance assessments across datasets, and reference comparisons against Machine Learning (ML) methods. Among those works, a recent study from IMC'22 [16] is worthy of attention since it adopts recent DL methodologies (namely, few-shot learning, self-supervision via contrastive learning, and data augmentation) that are appealing for networking as they enable learning from a few samples and transferring across datasets. The main result of [16] on the UCDAVIS, ISCXVPN and ISCXTOR datasets is that, with such DL methodologies, 100 input samples are enough to achieve very high accuracy using an input representation called "flowpic" (i.e., a per-flow 2D histogram of the evolution of packet sizes over time).

In this paper (i) we reproduce [16] on the same datasets and (ii) we replicate its most salient aspect (the importance of data augmentation) on three additional public datasets (MIRAGEA, MIRAGEB and UTMOBILENET). While we confirm most of the original results, we also find a ≈ 20% accuracy drop in some of the investigated scenarios due to a data shift in the original dataset that we uncovered. Additionally, our study validates that the data augmentation strategies studied in [16] perform well on other datasets too. In the spirit of reproducibility and replicability we make all artifacts (code and data) available to the research community at
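
The flowpic representation described in this abstract (a per-flow 2D histogram of packet sizes over time) can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the function name `build_flowpic`, the 15-second window, the 1500-byte size cap, and the default 32x32 resolution are assumptions chosen for the example.

```python
from typing import List, Tuple


def build_flowpic(
    packets: List[Tuple[float, int]],
    duration: float = 15.0,   # assumed time window in seconds
    max_size: int = 1500,     # assumed maximum packet size in bytes
    resolution: int = 32,     # assumed number of bins per axis
) -> List[List[int]]:
    """Bin (time offset, packet size) pairs into a resolution x resolution grid.

    Rows index packet-size bins, columns index time bins.
    Packets outside the time window are ignored.
    """
    grid = [[0] * resolution for _ in range(resolution)]
    for ts, size in packets:
        if ts < 0 or ts >= duration:
            continue
        col = int(ts / duration * resolution)
        # Clamp so that a packet of exactly max_size falls in the last bin.
        row = min(int(size / max_size * resolution), resolution - 1)
        grid[row][col] += 1
    return grid


# Hypothetical flow: (seconds since flow start, packet size in bytes).
example_flow = [(0.0, 60), (0.1, 1500), (7.5, 750), (14.9, 40), (20.0, 100)]
pic = build_flowpic(example_flow, resolution=4)
for row in pic:
    print(row)
```

The last packet in the example falls outside the assumed 15-second window and is dropped, so the grid holds four counts in total.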

SESSION: Routing

On the Importance of Being an AS: An Approach to Country-Level AS Rankings

Recent geopolitical events demonstrate that control of Internet infrastructure in a region is critical to economic activity and defense against armed conflict. This geopolitical importance necessitates novel empirical techniques to assess which countries remain susceptible to degraded or severed Internet connectivity because they rely heavily on networks based in other nation states. Currently, two preeminent BGP-based methods exist to identify influential or market-dominant networks on a global scale (network-level customer cone size and path hegemony), but these metrics fail to capture regional or national differences.

We adapt the two global metrics to capture country-specific differences by restricting the input data for a country-specific metric to destination prefixes in that country. Although conceptually simple, our study required tackling methodological challenges common to most Internet measurement research today, such as geolocation, incomplete data, vantage point access, and lack of ground truth. Restricting public routing data to individual countries requires substantial downsampling compared to global analysis, and we analyze the impact of downsampling on the robustness and stability of our country-specific metrics. As a measure of validation, we apply our country-specific metrics to case studies of Australia, Japan, Russia, Taiwan, and the United States, illuminating aspects of concentration and interdependence in telecommunications markets. To support reproducibility, we will share our code, inferences, and data sets with other researchers.
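
The idea of restricting input data to destination prefixes in one country can be illustrated with a deliberately simplified sketch. It does not compute customer cone size or path hegemony; it computes a much cruder proxy (the share of a country's observed paths that traverse each non-origin AS). The function name, the prefix-to-country map, and the sample ASNs and prefixes are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def country_transit_share(
    paths: Iterable[Tuple[str, List[int]]],
    prefix_country: Dict[str, str],
    country: str,
) -> Dict[int, float]:
    """Fraction of paths toward `country` prefixes that traverse each AS.

    `paths` holds (destination prefix, AS path) pairs, with the origin AS
    last in the path. Only prefixes geolocated to `country` are kept.
    """
    selected = [p for prefix, p in paths if prefix_country.get(prefix) == country]
    counts: Counter = Counter()
    for as_path in selected:
        origin = as_path[-1]
        # Count each AS once per path (ignores prepending); skip the origin.
        for asn in set(as_path) - {origin}:
            counts[asn] += 1
    total = len(selected)
    return {asn: n / total for asn, n in counts.items()} if total else {}


# Hypothetical observations using documentation prefixes and private-use ASNs.
observed = [
    ("203.0.113.0/24", [64500, 64501, 64510]),
    ("198.51.100.0/24", [64502, 64501, 64511]),
    ("192.0.2.0/24", [64500, 64503, 64520]),
]
geo = {"203.0.113.0/24": "AU", "198.51.100.0/24": "AU", "192.0.2.0/24": "JP"}
share = country_transit_share(observed, geo, "AU")
print(share)
```

Filtering to one country leaves far fewer paths than the global data set, which is the downsampling effect the abstract says must be analyzed for robustness.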

Coarse-grained Inference of BGP Community Intent

BGP communities allow operators to influence routing decisions made by other networks (action communities) and to annotate their network's routing information with metadata such as where each route was learned or the relationship the network has with their neighbor (information communities). BGP communities also help researchers understand complex Internet routing behaviors. However, there is no standard convention for how operators assign community values, and significant efforts to scalably infer community meanings have ignored this high-level classification. We discovered that doing so comes at significant cost in accuracy, of both inference and validation. To advance this narrow but powerful direction in Internet infrastructure research, we design and validate an algorithm to execute this first fundamental step: inferring whether a BGP community is action or information. We applied our method to 78,480 community values observed in public BGP data for May 2023. Validating our inferences (24,376 action and 54,104 informational communities) against available ground truth (6,259 communities) we find that our method classified 96.5% correctly. We found that the precision of a state-of-the-art location community inference method increased from 68.2% to 94.8% with our classifications. We publicly share our code, dictionaries, inferences, and datasets to enable the community to benefit from them.

RoVista: Measuring and Analyzing the Route Origin Validation (ROV) in RPKI

The Resource Public Key Infrastructure (RPKI) is a system to add security to Internet routing. In recent years, the publication of Route Origin Authorization (ROA) objects, which bind IP prefixes to their legitimate origin ASN, has been rapidly increasing. However, ROAs are effective only if routers use them to verify and filter invalid BGP announcements, a process called Route Origin Validation (ROV).

There are many proposed approaches to measure the status of ROV in the wild, but they are limited in scalability or accuracy. In this paper, we present RoVista, an ROV measurement framework that leverages the IP-ID side channel and in-the-wild RPKI-invalid prefixes. With over 20 months of longitudinal measurement, RoVista successfully covers more than 28K ASes, of which 63.8% have derived benefits from ROV, although the percentage of fully protected ASes remains relatively low at 12.3%. In order to validate our findings, we have also sought input from network operators.

We then evaluate the security impact of current ROV deployment and reveal misconfigurations that will weaken the protection of ROV. Lastly, we compare RoVista with other approaches and conclude with a discussion of our findings and limitations.
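
For readers unfamiliar with ROV, the check a router applies to a BGP announcement can be sketched as below, following the valid / invalid / not-found outcomes defined in RFC 6811. This illustrates the validation step only, not RoVista's measurement technique; the `ROA` tuple, the function name, and the sample prefixes and ASNs are assumptions for the example.

```python
import ipaddress
from typing import List, NamedTuple


class ROA(NamedTuple):
    prefix: str      # authorized prefix, e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the origin may announce
    asn: int         # authorized origin AS number


def validate_origin(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


# Hypothetical ROA set using a documentation prefix and private-use ASNs.
sample_roas = [ROA("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, sample_roas))     # valid
print(validate_origin("192.0.2.0/24", 64501, sample_roas))     # invalid (wrong origin)
print(validate_origin("192.0.2.0/25", 64500, sample_roas))     # invalid (too specific)
print(validate_origin("198.51.100.0/24", 64500, sample_roas))  # not-found
```

An AS that deploys ROV typically drops the "invalid" announcements, which is the behavior the measurement framework tries to observe from the outside.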

Illuminating Router Vendor Diversity Within Providers and Along Network Paths

The Internet architecture has facilitated a multi-party, distributed, and heterogeneous physical infrastructure where routers from different vendors connect and inter-operate via IP. Such vendor heterogeneity can have important security and policy implications. For example, a security vulnerability may be specific to a particular vendor and implementation, and thus will have a disproportionate impact on particular networks and paths if exploited. From a policy perspective, governments are now explicitly banning particular vendors, or have threatened to do so.

Despite these critical issues, the composition of router vendors across the Internet remains largely opaque. Remotely identifying router vendors is challenging due to their strict security posture, indistinguishability due to code sharing across vendors, and noise due to vendor mergers. We make progress in overcoming these challenges by developing LFP, a tool that improves the coverage, accuracy, and efficiency of router fingerprinting as compared to the current state-of-the-art. We leverage LFP to characterize the degree of router vendor homogeneity within networks and the regional distribution of vendors. We then take a path-centric view and apply LFP to better understand the potential for correlated failures and fate-sharing. Finally, we perform a case study on inter- and intra-United States data paths to explore the feasibility of making vendor-based routing policy decisions, i.e., whether it is possible to avoid a particular vendor given the current infrastructure.

IRRegularities in the Internet Routing Registry

The Internet Routing Registry (IRR) is a set of distributed databases used by networks to register routing policy information and to validate messages received in the Border Gateway Protocol (BGP). First deployed in the 1990s, the IRR remains the most widely used database for routing security purposes, despite the existence of more recent and more secure alternatives. Yet, the IRR lacks a strict validation standard and the limited coordination across different database providers can lead to inaccuracies. Moreover, it has been reported that attackers have begun to register false records in the IRR to bypass operators' defenses when launching attacks on the Internet routing system, such as BGP hijacks. In this paper, we provide a longitudinal analysis of the IRR over the span of 1.5 years. We develop a workflow to identify irregular IRR records that contain conflicting information compared to different routing data sources. We identify 34,199 irregular route objects out of 1,542,724 route objects from November 2021 to May 2023 in the largest IRR database and find 6,373 to be potentially suspicious.


Flocking to Mastodon: Tracking the Great Twitter Migration

The acquisition of Twitter by Elon Musk has spurred controversy and uncertainty among Twitter users. The move raised both praise and concerns, particularly regarding Musk's views on free speech. As a result, a large number of Twitter users have looked for alternatives to Twitter. Mastodon, a decentralized micro-blogging social network, has attracted the attention of many users and the general media. In this paper, we analyze the migration of 136,009 users from Twitter to Mastodon. We inspect the impact that this has on the wider Mastodon ecosystem, particularly in terms of user-driven pressure towards centralization. We further explore factors that influence users to migrate, highlighting the effect of users' social networks. Finally, we inspect the behavior of individual users, showing how they utilize both Twitter and Mastodon in parallel. We find a clear difference in the topics discussed on the two platforms. This leads us to build classifiers to explore if migration is predictable. Through feature analysis, we find that the content of tweets as well as the number of URLs, the number of likes, and the length of tweets are effective metrics for the prediction of user migration.

The Prevalence of Single Sign-On on the Web: Towards the Next Generation of Web Content Measurement

Much of the content and structure of the Web remains inaccessible to evaluate at scale because it is gated by user authentication. This limitation restricts researchers to examining only a superficial layer of a website: the landing page or public, search-indexable pages. Since it is infeasible to create individual accounts across thousands of webpages, we examine the prevalence of Single Sign-On (SSO) on the web to explore the feasibility of using a few accounts to authenticate to many sites. We find that 58% of the top 10K websites with logins are accessible with popular 3rd-party SSO providers, such as Google, Facebook, and Apple, indicating that leveraging SSO offers a scalable solution to access a large volume of user-gated content.

Reviving Dead Links on the Web with Fable

The web is littered with millions of links which previously worked but no longer do. When users encounter any such broken link, they resort to looking up an archived copy of the linked page. But, for a sizeable fraction of these broken links, no archived copies exist. Even if a copy exists, it often poorly approximates the original page, e.g., any functionality on the page which requires the client browser to communicate with the page's backend servers will not work, and even the latest copy will be missing updates made to the page's content after that copy was captured.

To address this situation, we observe that broken links are often merely a result of website reorganizations; the linked page still exists on the same site, albeit at a different URL. Therefore, given a broken link, our system FABLE attempts to find the linked page's new URL by learning and exploiting the pattern in how the old URLs for other pages on the same site have transformed to their new URLs. We show that our approach is significantly more accurate and efficient than prior approaches which rely on stability in page content over time. FABLE increases the fraction of dead links for which the corresponding new URLs can be found by 50%, while reducing the median delay incurred in identifying the new URL for a broken link from over 40 seconds to less than 10 seconds.

Demystifying Web-based Mobile Extended Reality Accelerated by WebAssembly

By combining various emerging technologies, mobile extended reality (XR) blends the real world with virtual content to create a spectrum of immersive experiences. Although Web-based XR can offer attractive features such as better accessibility, cross-platform compatibility, and instant updates, its performance may not be on par with its standalone counterpart. As a low-level bytecode, WebAssembly has the potential to drastically accelerate Web-based XR by enabling near-native execution speed. However, little is known about how well Web-based XR performs with WebAssembly acceleration. To bridge this crucial gap, we conduct a first-of-its-kind systematic and empirical study to analyze the performance of Web-based XR expedited by WebAssembly on four diverse platforms with five different browsers. Our measurement results reveal that although WebAssembly can accelerate different XR tasks in various contexts, there remains a substantial performance disparity between Web-based and standalone XR. We hope our findings can foster the realization of an immersive Web that is accessible to a wider audience with various emerging technologies.


Thou Shalt Not Reject: Analyzing Accept-Or-Pay Cookie Banners on the Web

Privacy regulations have led to many websites showing cookie banners to their users. Usually, cookie banners present the user with the option to "accept" or "reject" cookies. Recently, a new form of paywall-like cookie banner has taken hold on the Web, giving users the option to either accept cookies (and consequently user tracking) or buy a paid subscription for a tracking-free website experience.

In this paper, we perform the first completely automated analysis of cookiewalls, i.e., cookie banners acting as a paywall. We find cookiewalls on 0.6% of the 45k websites we queried. Moreover, cookiewalls are deployed to a large degree on European websites, e.g., for Germany we see cookiewalls on 8.5% of the top 1k websites. Additionally, websites using cookiewalls send 6.4 times more third-party cookies and 42 times more tracking cookies to visitors, compared to regular cookie banner websites. We also uncover two large Subscription Management Platforms used on hundreds of websites, which provide website operators with easy-to-setup cookiewall solutions. Finally, we publish tools, data, and code to foster reproducibility and further studies.

A Longitudinal Study of Vulnerable Client-side Resources and Web Developers' Updating Behaviors

Modern websites rely on various client-side web resources, such as JavaScript libraries, to provide end-users with rich and interactive web experiences. Unfortunately, anecdotal evidence shows that improperly managed client-side resources could open up attack surfaces that adversaries can exploit. However, there is still a lack of a comprehensive understanding of the updating practices among web developers and the potential impact of inaccuracies in Common Vulnerabilities and Exposures (CVE) information on the security of the web ecosystem. In this paper, we conduct a longitudinal (four-year) measurement study of the security practices and implications of client-side resources (e.g., JavaScript libraries and Adobe Flash) across the Web. Specifically, we first collect a large-scale dataset of 157.2M webpages of Alexa Top 1M websites over four years. Analyzing the dataset, we find that an average of 41.2% of websites (in each of the four years) carry at least one vulnerable client-side resource (e.g., JavaScript or Adobe Flash). We also reveal that vulnerable JavaScript library versions are frequently observed in the wild, suggesting a concerning lag in update practices. On average, we observe a window of vulnerability of 531.2 days, measured from the release of security patches, across 25,337 websites with unpatched client-side resources. Furthermore, we manually investigate the fidelity of CVE reports on client-side resources, leveraging PoC (Proof of Concept) code. We find that 13 CVE reports (out of 27) have incorrect vulnerable version information, which may impact security-related tasks such as security updates.

Not only E.T. Phones Home: Analysing the Native User Tracking of Mobile Browsers

Contemporary browsers constitute a critical component of our everyday interactions with the Web. Similar to a small but powerful operating system, a browser is responsible for fetching and running web apps locally, on the user's (mobile) device. Even though in the last few years there has been increased interest in tools and mechanisms to block potentially malicious behaviours of web domains against users' privacy (e.g., ad blockers, incognito browsing mode, etc.), it is still unclear if the user can browse the Web in private.

In this paper, we analyse the natively generated network traffic of 15 mobile browser apps under different configurations to investigate if users are capable of browsing the Web privately, without sharing their browsing history with remote servers. We develop a novel framework (Panoptes) to instrument and monitor separately the mobile browser traffic generated by (a) the web engine and (b) natively by the mobile app. By crawling a set of websites via Panoptes, and analyzing the native traffic of browsers, we find that there are (i) browsers that persistently track their users, and (ii) browsers that report to remote servers (geolocated outside the EU) the exact page and content the user is browsing at that moment. Finally, we see browsers communicating with third-party ad servers while leaking personal and device identifiers.

SESSION: Security 1

Wolf in Sheep's Clothing: Evaluating Security Risks of the Undelegated Record on DNS Hosting Services

Leveraging DNS for covert communications is appealing since most networks allow DNS traffic, especially the ones directed toward renowned DNS hosting services. Unfortunately, most DNS hosting services overlook domain ownership verification, enabling miscreants to host undelegated DNS records of a domain they do not own. Consequently, miscreants can conduct covert communication through such undelegated records for whitelisted domains on reputable hosting providers. In this paper, we shed light on the emerging threat posed by undelegated records and demonstrate their exploitation in the wild. To the best of our knowledge, this security risk has not been studied before.

We conducted a comprehensive measurement to reveal the prevalence of the risk. In total, we observed 1,580,925 unique undelegated records that are potentially abused. We further observed that a considerable portion of these records are associated with malicious behaviors. By utilizing threat intelligence and malicious traffic collected by a malware sandbox, we extracted malicious IP addresses from 25.41% of these records, spanning 1,369 Tranco top 2K domains and 248 DNS hosting providers, including Cloudflare and Amazon. Furthermore, we discovered that the majority of the identified malicious activities are Trojan-related. Moreover, we conducted case studies on two malware families (Dark.IOT and Specter) that exploit undelegated records to obtain C2 servers, in addition to the masquerading SPF records to conceal SMTP-based covert communication. We also provided mitigation options for different entities. As a result of our disclosure, several popular hosting providers have taken action to address this issue.

Dial "N" for NXDomain: The Scale, Origin, and Security Implications of DNS Queries to Non-Existent Domains

Non-Existent Domain (NXDomain) is a type of Domain Name System (DNS) error response, indicating that the queried domain name does not exist and cannot be resolved. Unfortunately, little research has focused on understanding why and how NXDomain responses are generated, utilized, and exploited. In this paper, we conduct the first comprehensive and systematic study on NXDomain by investigating its scale, origin, and security implications. Utilizing a large-scale passive DNS database, we identify 146,363,745,785 NXDomains queried by DNS users between 2014 and 2022. Within these 146 billion NXDomains, 91 million hold historic WHOIS records, of which 5.3 million are identified as malicious domains, including about 2.4 million blocklisted domains, 2.8 million DGA (Domain Generation Algorithm) based domains, and 90 thousand squatting domains targeting popular domains. To gain more insights into the usage patterns and security risks of NXDomains, we register 19 carefully selected NXDomains in the DNS database, each of which received more than ten thousand DNS queries per month. We then deploy a honeypot for our registered domains and collect 5,925,311 incoming queries over 6 months, from which we discover that 5,186,858 and 505,238 queries are generated from automated processes and web crawlers, respectively. Finally, we perform extensive traffic analysis on our collected data and reveal that NXDomains can be misused for various purposes, including botnet takeover, malicious file injection, and residue trust exploitation.

Extended DNS Errors: Unlocking the Full Potential of DNS Troubleshooting

The Domain Name System (DNS) relies on response codes to confirm successful transactions or indicate anomalies. Yet, the codes are not sufficiently fine-grained to pinpoint the root causes of resolution failures. RFC 8914 (Extended DNS Errors or EDE) addresses the problem by defining a new extensible registry of error codes to be served inside the OPT resource record. In this paper, we show that four major DNS resolver vendors and three large public DNS resolvers support this standard and correctly narrow down the cause of underlying problems. Yet, they do not agree in 94% of our test cases in terms of the returned EDE codes. We reveal that Cloudflare DNS is the most precise in indicating various DNS misconfigurations via the EDE mechanism, so we use it to perform a large-scale analysis of more than 303M registered domain names. We show that 17.7M of them trigger EDE codes. Lame delegations and DNSSEC validation failures are the most common problems encountered.
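
To make the mechanism concrete, the sketch below decodes EDE options from the RDATA of an OPT record. Per RFC 8914, an EDE option uses EDNS option code 15 and carries a 16-bit INFO-CODE followed by optional UTF-8 EXTRA-TEXT. The function name, the partial name table, and the sample bytes are assumptions for illustration; this is not the tooling used in the paper.

```python
import struct
from typing import List, Tuple

EDE_OPTION_CODE = 15  # EDNS option code assigned to Extended DNS Errors

# A small subset of the INFO-CODE registry from RFC 8914.
EDE_NAMES = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    22: "No Reachable Authority",
    23: "Network Error",
}


def parse_ede_options(opt_rdata: bytes) -> List[Tuple[int, str, str]]:
    """Return (info_code, name, extra_text) for each EDE option in OPT RDATA."""
    results = []
    offset = 0
    while offset + 4 <= len(opt_rdata):
        # Each EDNS option: 2-byte code, 2-byte length, then the data.
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        offset += 4
        data = opt_rdata[offset:offset + length]
        if len(data) < length:
            break  # truncated option; stop parsing
        offset += length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            (info_code,) = struct.unpack_from("!H", data, 0)
            extra_text = data[2:].decode("utf-8", errors="replace")
            name = EDE_NAMES.get(info_code, "not in this sketch's table")
            results.append((info_code, name, extra_text))
    return results


# Hypothetical OPT RDATA holding a single EDE option with INFO-CODE 6.
text = b"signature check failed"
sample_rdata = struct.pack("!HHH", EDE_OPTION_CODE, 2 + len(text), 6) + text
decoded = parse_ede_options(sample_rdata)
print(decoded)
```

Because EXTRA-TEXT is free-form, only the numeric INFO-CODE is directly comparable across resolvers.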

Stale TLS Certificates: Investigating Precarious Third-Party Access to Valid TLS Keys

Certificate authorities enable TLS server authentication by generating certificates that attest to the mapping between a domain name and a cryptographic keypair, for up to 398 days. This static, name-to-key caching mechanism belies a complex reality: a tangle of dynamic infrastructure involving domains, servers, cryptographic keys, etc. When any of these operations changes, the authentication information in a certificate becomes stale and no longer accurately reflects reality. In this work, we examine the broader phenomenon of certificate invalidation events and discover three classes of security-relevant events that enable a third party to impersonate a domain outside of their control. Longitudinal measurement of these precarious scenarios reveals that they affect over 15K new domains per day, on average. Unfortunately, modern certificate revocation provides little recourse, so we examine the potential impact of reducing certificate lifetimes (cache duration): shortening the current 398-day limit to 90 days yields a 75% decrease in precarious access to valid TLS keys.

The CVE Wayback Machine: Measuring Coordinated Disclosure from Exploits against Two Years of Zero-Days

Software security depends on coordinated vulnerability disclosure (CVD) from researchers, a process that the community has continually sought to measure and improve. Yet, CVD practices are only as effective as the data that informs them. In this paper, we use DScope, a cloud-based interactive Internet telescope, to build statistical models of vulnerability lifecycles, bridging the data gap in over 20 years of CVD research. By analyzing application-layer Internet scanning traffic over two years, we identify real-world exploitation timelines for 63 threats. We bring this data together with six additional datasets to build a complete birth-to-death model of these vulnerabilities, the most complete analysis of vulnerability lifecycles to date. Our analysis reaches three key recommendations: (1) CVD across diverse vendors shows lower effectiveness than previously thought, (2) intrusion detection systems are underutilized to provide protection for critical vulnerabilities, and (3) existing data sources of CVD can be augmented by novel approaches to Internet measurement. In this way, our vantage point offers new opportunities to improve the CVD process, achieving a safer software ecosystem in practice.

SESSION: Security 2

Re-measuring the Label Dynamics of Online Anti-Malware Engines from Millions of Samples

VirusTotal is the most widely used online scanning service in both academia and industry. However, it is known that the results returned by antivirus engines are often inconsistent and change over time. The intrinsic dynamics of VirusTotal labeling have prompted researchers to investigate the characteristics of label dynamics for more effective use. However, these studies are generally limited in terms of the size and diversity of the datasets used in the measurements. This poses threats to many of their conclusions. In this paper, we perform an extraordinarily large-scale study to re-measure the label dynamics of VirusTotal. Our dataset involves all the scan data in VirusTotal over a 14-month period, including over 571 million samples and 847 million reports in total. With this large dataset, we are able to revisit many issues related to the label dynamics of VirusTotal, including the prevalence of label dynamics/silence, the characteristics across file types, the impact of label dynamics on common label aggregation methods, the stabilization patterns of labels, etc. Our measurement reveals some observations that are unknown to the research community and even inconsistent with previous research. We believe that our findings could help researchers advance the understanding of the VirusTotal ecosystem.
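
How label dynamics interact with label aggregation can be illustrated with a small sketch. It assumes threshold-based voting (a sample is labeled malicious when at least `threshold` engines flag it), which is one commonly used aggregation scheme; the abstract does not say which methods the paper evaluates. The engine names and scan reports are hypothetical.

```python
from typing import Dict, List


def aggregate(report: Dict[str, bool], threshold: int) -> bool:
    """Label a sample malicious if at least `threshold` engines flag it."""
    return sum(report.values()) >= threshold


def label_flips(reports: List[Dict[str, bool]], threshold: int) -> int:
    """Count how often the aggregated label changes across successive scans."""
    labels = [aggregate(r, threshold) for r in reports]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# Hypothetical scan history of one sample: engines catch up over time.
history = [
    {"EngineA": True, "EngineB": False, "EngineC": False},
    {"EngineA": True, "EngineB": True, "EngineC": False},
    {"EngineA": True, "EngineB": True, "EngineC": True},
]
print(label_flips(history, threshold=1))  # label is malicious in every scan
print(label_flips(history, threshold=2))  # label changes once, at the second scan
```

The same scan history yields a stable label under one threshold and a changing label under another, so the choice of aggregation rule affects how much of the underlying dynamics a study observes.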

Phishing in the Free Waters: A Study of Phishing Attacks Created using Free Website Building Services

Free Website Building services (FWBs) provide individuals with a cost-effective and convenient way to create a website without requiring advanced technical knowledge or coding skills. However, malicious actors often abuse these services to host phishing websites. In this work, we propose FreePhish, a scalable framework to continuously identify phishing websites that are created using FWBs. Using FreePhish, we were able to detect and characterize more than 31.4K phishing URLs that were created using 17 unique free website builder services and shared on Twitter and Facebook over a period of six months. We find that FWBs provide attackers with several features that make it easier to create and maintain phishing websites at scale while simultaneously evading anti-phishing countermeasures. Our study indicates that anti-phishing blocklists and browser protection tools have significantly lower coverage and higher detection time against FWB phishing attacks when compared to regular (self-hosted) phishing websites. While our prompt disclosure of these attacks helped some FWBs to remove them, we found several others that were slow at removal or did not remove them at all, with the same also being true for Twitter and Facebook. Finally, we also provide FreePhish as a free Chromium web extension that can be utilized to prevent end-users from accessing potential FWB-based phishing attacks.

Fifteen Months in the Life of a Honeyfarm

Honeypots have been used for decades to detect, monitor, and understand attempts of unauthorized use of information systems. Previous studies focused on characterizing the spread of malware, e.g., Mirai and other attacks, or proposed stealthy and interactive architectures to improve honeypot efficiency.

In this paper, we present insights and benefits gained from collaborating with an operational honeyfarm, i.e., a set of honeypots distributed around the globe with centralized data collection. We analyze data of about 400 million sessions over a 15-month period, gathered from a globally distributed honeyfarm consisting of 221 honeypots deployed in 55 countries. Our analysis unveils stark differences among the activity seen by the honeypots: some are contacted millions of times while others only observe a few thousand sessions. We also analyze the behavior of scouters and intruders of these honeypots. Again, some honeypots report orders of magnitude more interactions with command execution than others. Still, diversity is needed since even if we focus on the honeypots with the highest visibility, they see only a small fraction of the intrusions, including only 5% of the files. Thus, although around 2% of intrusions are visible to most of the honeypots in our honeyfarm, the rest are only visible to a few. We conclude with a discussion of the findings of this work.

Evolving Bots: The New Generation of Comment Bots and their Underlying Scam Campaigns in YouTube

This paper presents a pioneering investigation into a novel scam advertising method on YouTube, termed "social scam bots" (SSBs). These bots have evolved to emulate benign user behavior by posting comments and engaging with other users, oftentimes appearing prominently among the top rated comments. We analyzed the YouTube video comments and proposed a method to identify SSBs and extract the underlying scam domains. Our study revealed 1,134 SSBs promoting 72 scam campaigns responsible for infecting 31.73% of crawled videos. Further investigation revealed that SSBs exhibit advances that surpass traditional bots. Notably, they targeted specific audiences by aligning scam campaigns with related video content, effectively leveraging the YouTube recommendation algorithm. We monitored these SSBs over a period of six months, enabling us to evaluate the effectiveness of YouTube's mitigation efforts. We also uncovered various strategies they use to evade mitigation attempts, including a novel strategy called "self-engagement," aimed at boosting their comment ranking. By shedding light on the phenomenon of SSBs and their evolving tactics, our study aims to raise awareness and contribute to the prevention of these malicious actors, ultimately fostering a safer online platform.

Cloud Watching: Understanding Attacks Against Cloud-Hosted Services

Cloud computing has dramatically changed service deployment patterns. In this work, we analyze how attackers identify and target cloud services in contrast to traditional enterprise networks and network telescopes. Using a diverse set of cloud honeypots in 5 providers and 23 countries as well as 2 educational networks and 1 network telescope, we analyze how IP address assignment, geography, network, and service-port selection, influence what services are targeted in the cloud. We find that scanners that target cloud compute are selective: they avoid scanning networks without legitimate services and they discriminate between geographic regions. Further, attackers mine Internet-service search engines to find exploitable services and, in some cases, they avoid targeting IANA-assigned protocols, causing researchers to misclassify at least 15% of traffic on select ports. Based on our results, we derive recommendations for researchers and operators.

SESSION: Security 3

How to Operate a Meta-Telescope in your Spare Time

Unsolicited traffic sent to advertised network space that does not host active services provides insights about misconfigurations as well as potentially malicious activities, including the spread of botnets, DDoS campaigns, and exploitation of vulnerabilities. Network telescopes have been used for many years to monitor such unsolicited traffic. Unfortunately, they are limited by the available address space for such tasks and, thus, limited to specific geographic and/or network regions.

In this paper, we introduce a novel concept to broadly capture unsolicited Internet traffic, which we call a "meta-telescope". A meta-telescope is based on the intuition that, with the availability of appropriate vantage points, one can (i) infer which address blocks on the Internet are unused and (ii) capture traffic towards them, both without having control of such address blocks. From this intuition, we develop and evaluate a methodology for identifying Internet address space that is unlikely to be used and build a meta-telescope that has very desirable properties, such as broad coverage of dark space both in terms of size and topological placement. Such a meta-telescope identifies and captures unsolicited traffic to more than 350k /24 blocks in more than 7k ASes. Through the analysis of background radiation towards these networks, we also highlight that unsolicited traffic differs by destination network/geographic region as well as by network type. Finally, we discuss our experience and challenges when operating a meta-telescope in the wild.

Lazy Gatekeepers: A Large-Scale Study on SPF Configuration in the Wild

The Sender Policy Framework (SPF) is a basic mechanism for authorizing the use of domains in email. In combination with other mechanisms, it serves as a cornerstone for protecting users from forged senders. In this paper, we investigate the configuration of SPF across the Internet. To this end, we analyze SPF records from 12 million domains in the wild. Our analysis shows a growing adoption, with 56.5 % of the domains providing SPF records. However, we also uncover notable security issues: First, 2.9 % of the SPF records have errors, undefined content or ineffective rules, undermining the intended protection. Second, we observe a large number of very lax configurations. For example, 34.7 % of the domains allow emails to be sent from over 100 000 IP addresses. We explore the reasons for these loose policies and demonstrate that they facilitate email forgery. As a remedy, we derive recommendations for an adequate configuration and notify all operators of domains with misconfigured SPF records.
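To make the "over 100 000 IP addresses" criterion above concrete, the following is a minimal sketch of how the IPv4 addresses authorized by one SPF record could be counted. It is an illustration under stated assumptions, not the authors' measurement pipeline: the function name `count_authorized_ipv4` and the `LAX_THRESHOLD` constant are hypothetical, and the sketch only evaluates literal `ip4` mechanisms and a pass-qualified `all`. Mechanisms that need DNS lookups (`include`, `a`, `mx`, `redirect`) are ignored, so a real record may authorize more addresses than this reports.

```python
import ipaddress

# Hypothetical threshold, taken from the "over 100 000 IP addresses" figure in the abstract.
LAX_THRESHOLD = 100_000


def count_authorized_ipv4(spf_record: str) -> int:
    """Count IPv4 addresses that the literal ip4 mechanisms of one SPF record authorize.

    Assumption: only 'ip4' and 'all' are evaluated; DNS-based mechanisms are skipped.
    """
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")

    networks = []
    for term in terms[1:]:
        # Only the pass qualifier ('+' or no qualifier) authorizes a sender.
        if term[0] in "-~?":
            continue
        mechanism = term.lstrip("+").lower()
        if mechanism == "all":
            return 2 ** 32  # a passing 'all' authorizes the whole IPv4 space
        if mechanism.startswith("ip4:"):
            networks.append(ipaddress.ip_network(mechanism[4:], strict=False))

    # Merge overlapping or nested prefixes so that no address is counted twice.
    merged = ipaddress.collapse_addresses(networks)
    return sum(net.num_addresses for net in merged)


record = "v=spf1 ip4:192.0.2.0/24 ip4:192.0.2.128/25 ip4:198.51.100.7 -all"
authorized = count_authorized_ipv4(record)
is_lax = authorized > LAX_THRESHOLD
```

For the example record, the nested /25 is merged into the /24, so the count covers one /24 plus one host address and stays far below the threshold.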

On the Similarity of Web Measurements Under Different Experimental Setups

Measurement studies are essential for research and industry alike to better understand the Web's inner workings and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. Designing and setting up an experiment is a complex task, and many factors might affect the results. However, while several works have independently observed differences in the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this investigation, we build 'dependency trees' for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.

Understanding the Privacy Risks of Popular Search Engine Advertising Systems

We present the first extensive measurement of the privacy properties of the advertising systems used by privacy-focused search engines. We propose an automated methodology to study the impact of clicking on search ads on three popular private search engines which have advertising-based business models: StartPage, Qwant, and DuckDuckGo, and we compare them to two dominant data-harvesting ones: Google and Bing. We investigate the possibility of third parties tracking users when clicking on ads by analyzing first-party storage, redirection domain paths, and requests sent before, when, and after the clicks.

Our results show that privacy-focused search engines fail to protect users' privacy when clicking ads. Users' requests are sent through redirectors on 4% of ad clicks on Bing, 86% of ad clicks on Qwant, and 100% of ad clicks on Google, DuckDuckGo, and StartPage. Even worse, advertising systems collude with advertisers across all search engines by passing unique IDs to advertisers in most ad clicks. These IDs allow redirectors to aggregate users' activity on ads' destination websites in addition to the activity they record when users are redirected through them. Overall, we observe that both privacy-focused and traditional search engines engage in privacy-harming behaviors allowing cross-site tracking, even in privacy-enhanced browsers.

A First Look at the Privacy Harms of the Public Suffix List

The public suffix list is a community-maintained list of rules that can be applied to domain names to determine how they should be grouped into logical organizations or companies. We present the first large-scale measurement study of how the public suffix list is used by open-source software on the Web and the privacy harm resulting from projects using outdated versions of the list. We measure how often developers include out-of-date versions of the public suffix list in their projects, how old included lists are, and estimate the real-world privacy harm with a model based on a large-scale crawl of the Web. We find that incorrect use of the public suffix list is common in open-source software, and that at least 43 open-source projects use hard-coded, outdated versions of the public suffix list. These include popular, security-focused projects, such as password managers and digital forensics tools. We also estimate that, because of these out-of-date lists, these projects make incorrect privacy decisions for 1313 effective top-level domains (eTLDs), affecting 50,750 domains, by extrapolating from data gathered by the HTTP Archive project.

SESSION: Distributed protocols

The Cloud Strikes Back: Investigating the Decentralization of IPFS

Interplanetary Filesystem (IPFS) is one of the largest peer-to-peer filesystems in operation. The network is the default storage layer for Web3 and is being presented as a solution to the centralization of the web. In this paper, we present a large-scale, multi-modal measurement study of the IPFS network. We analyze the topology, the traffic, the content providers and the entry points from the classical Internet. Our measurements show significant centralization in the IPFS network and a high share of nodes hosted in the cloud. We also shed light on the main stakeholders in the ecosystem. We discuss key challenges that might disrupt continuing efforts to decentralize the Web and highlight multiple properties that are creating pressures toward centralization.

Ethereum's Proposer-Builder Separation: Promises and Realities

With Ethereum's transition from Proof-of-Work to Proof-of-Stake in September 2022 came another paradigm shift, the Proposer-Builder Separation (PBS) scheme. PBS was introduced to decouple the roles of selecting and ordering transactions in a block (i.e., the builder), from those validating its contents and proposing the block to the network as the new head of the blockchain (i.e., the proposer). In this landscape, proposers are the validators in the Proof-of-Stake consensus protocol, while now relying on specialized block builders for creating blocks with the highest value for the proposer. Additionally, relays act as mediators between builders and proposers. We study PBS adoption and show that the current landscape exhibits significant centralization amongst the builders and relays. Further, we explore whether PBS effectively achieves its intended objectives of enabling hobbyist validators to maximize block profitability and preventing censorship. Our findings reveal that although PBS grants validators the opportunity to access optimized and competitive blocks, it tends to stimulate censorship rather than reduce it. Additionally, we demonstrate that relays do not consistently uphold their commitments and may prove unreliable. Specifically, proposers do not always receive the complete promised value, and the censorship or filtering capabilities pledged by relays exhibit significant gaps.


BehavIoT: Measuring Smart Home IoT Behavior Using Network-Inferred Behavior Models

Smart home IoT platforms are typically closed systems, meaning that there is poor visibility into device behavior. Understanding device behavior is important not only for determining whether devices are functioning as expected, but also can reveal implications for privacy (e.g., surreptitious audio/video recording), security (e.g., device compromise), and safety (e.g., denial of service on a baby monitor). While there has been some work on identifying devices and a handful of activities, an open question is what is the extent to which we can automatically model the entire behavior of an IoT deployment, and how it changes over time, without any privileged access to IoT devices or platform messages.

In this work, we demonstrate that the vast majority of IoT behavior can indeed be modeled, using a novel multi-dimensional approach that relies only on the (often encrypted) network traffic exchanged by IoT devices. Our key insight is that IoT behavior (including cross-device interactions) can often be captured using relatively simple models such as timers (for periodic behavior) and probabilistic state-machines (for user-initiated behavior and device interactions) during a limited observation phase. We then propose deviation metrics that can identify when the behavior of an IoT device or an IoT system changes over time. Our models and metrics successfully identify several notable changes in our IoT deployment, including a camera that changed locations, network outages that impact connectivity, and device malfunctions.
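As a rough illustration of the "timer" idea for periodic behavior, the sketch below fits a period to the timestamps of a recurring traffic event and then counts how many expected events have gone missing. It is a simplified stand-in, not the models or deviation metrics from the paper: the functions `fit_timer` and `missed_beats`, the median-based period estimate, and the `tolerance` and `min_regular_share` parameters are all assumptions made for this example.

```python
from statistics import median


def fit_timer(timestamps, tolerance=0.1, min_regular_share=0.9):
    """Return an estimated period if the events look periodic, else None.

    Assumption: an event series is treated as periodic when at least
    `min_regular_share` of the inter-arrival gaps lie within
    `tolerance * period` of the median gap.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    period = median(gaps)
    if period <= 0:
        return None
    regular = sum(abs(gap - period) <= tolerance * period for gap in gaps)
    return period if regular / len(gaps) >= min_regular_share else None


def missed_beats(period, last_seen, now):
    """Number of expected periodic events that have not arrived since `last_seen`."""
    return int((now - last_seen) // period)


# Heartbeat-like traffic observed roughly every 60 seconds.
heartbeat = [0, 60, 120.5, 180, 240.2, 300]
period = fit_timer(heartbeat)

# Bursty, user-driven traffic does not fit a timer model.
bursty = [0, 5, 70, 71, 300]
no_period = fit_timer(bursty)

# If nothing has been seen between t=300 and t=485, three beats are overdue.
overdue = missed_beats(60, last_seen=300, now=485)
```

A growing count of overdue beats is one simple signal of the kind of change the abstract mentions, such as an outage or a malfunctioning device.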

In the Room Where It Happens: Characterizing Local Communication and Threats in Smart Homes

The network communication between Internet of Things (IoT) devices on the same local network has significant implications for platform and device interoperability, security, privacy, and correctness. Yet, the analysis of local home Wi-Fi network traffic and its associated security and privacy threats have been largely ignored by prior literature, which typically focuses on studying the communication between IoT devices and cloud end-points, or detecting vulnerable IoT devices exposed to the Internet. In this paper, we present a comprehensive and empirical measurement study to shed light on the local communication within a smart home deployment and its threats. We use a unique combination of passive network traffic captures, protocol honeypots, dynamic mobile app analysis, and crowdsourced IoT data from participants to identify and analyze a wide range of device activities on the local network. We then analyze these datasets to characterize local network protocols, security and privacy threats associated with them. Our analysis reveals vulnerable devices, insecure use of network protocols, and sensitive data exposure by IoT devices. We provide evidence of how this information is exfiltrated to remote servers by mobile apps and third-party SDKs, potentially for household fingerprinting, surveillance and cross-device tracking. We make our datasets and analysis publicly available to support further research in this area.

Behind the Scenes: Uncovering TLS and Server Certificate Practice of IoT Device Vendors in the Wild

IoT devices are increasingly used in consumer homes. Despite recent works in characterizing IoT TLS usage for a limited number of in-lab devices, there exists a gap in quantitatively understanding TLS behaviors from devices in the wild and server-side certificate management.

To bridge this knowledge gap, we conduct a new measurement study by focusing on the practice of device vendors, through a crowdsourced dataset of network traffic from 2,014 real-world IoT devices across 721 global users. By quantifying the sharing of TLS fingerprints across vendors and across devices, we uncover the prevalent use of customized TLS libraries (i.e., not matched to any known TLS libraries) and potential security concerns resulting from co-located TLS stacks of different services. Furthermore, we present the first known study on server-side certificate management for servers contacted by IoT devices. Our study highlights potential concerns in the TLS/PKI practice by IoT device vendors. We aim to raise visibility for these issues and motivate vendors to improve security practice.

An LLM-based Framework for Fingerprinting Internet-connected Devices

In this paper we propose the use of large language models (LLMs) for characterizing, clustering, and fingerprinting raw text obtained from network measurements. To this end, we first train a transformer-based masked language model, namely RoBERTa, on a dataset containing hundreds of millions of banners obtained from Internet-wide scans. We further fine-tune this model using a contrastive loss function (driven by domain knowledge) to produce temporally stable numerical representations (embeddings) that can be used out-of-the-box for downstream learning tasks. Our embeddings are robust, resilient to small random changes in the content of a banner, and maintain proximity between embeddings of similar hardware/software products. We further cluster HTTP banners using a density-based approach (HDBSCAN), and examine the obtained clusters to generate text-based fingerprints for the purpose of labeling raw scan data. We compare our fingerprints to Recog, an existing database of manually curated fingerprints, and show that we can identify new IoT devices and server products that were not previously captured by Recog. Our proposed methodology poses an important direction for future research by utilizing state-of-the-art language models to automatically analyze, interpret, and label the large amounts of data generated by Internet scans.

SESSION: Transport

Estimating WebRTC Video QoE Metrics Without Using Application Headers

The increased use of video conferencing applications (VCAs) has made it critical to understand and support end-user quality of experience (QoE) by all stakeholders in the VCA ecosystem, especially network operators, who typically do not have direct access to client software. Existing VCA QoE estimation methods use passive measurements of application-level Real-time Transport Protocol (RTP) headers. However, a network operator does not always have access to RTP headers, particularly when VCAs use custom RTP protocols (e.g., Zoom) or due to system constraints (e.g., legacy measurement systems). Given this challenge, this paper considers the use of more standard features in the network traffic, namely, IP and UDP headers, to provide per-second estimates of key VCA QoE metrics such as frame rate and video resolution. We develop a method that uses machine learning with a combination of flow statistics (e.g., throughput) and features derived based on the mechanisms used by the VCAs to fragment video frames into packets. We evaluate our method for three prevalent VCAs running over WebRTC: Google Meet, Microsoft Teams, and Cisco Webex. Our evaluation consists of 54,696 seconds of VCA data collected from both (1) controlled in-lab network conditions and (2) real-world networks from 15 households. We show that the ML-based approach yields similar accuracy compared to the RTP-based methods, despite using only IP/UDP data. For instance, we can estimate FPS within 2 FPS for up to 83.05% of one-second intervals in the real-world data, which is only 1.76% lower than using the application-level RTP headers.

PTPerf: On the Performance Evaluation of Tor Pluggable Transports

Tor, one of the most popular censorship circumvention systems, faces regular blocking attempts by censors. Thus, to facilitate access, it relies on "pluggable transports" (PTs) that disguise Tor's traffic and make it hard for the adversary to block Tor. However, these are not yet well studied and compared for the performance they provide to the users. Thus, we conduct a first comparative performance evaluation of a total of 12 PTs: the ones currently supported by the Tor project and those that can be integrated in the future.

Our results reveal multiple facets of the PT ecosystem. (1) PTs' download time significantly varies even under similar network conditions. (2) Not all PTs are equally reliable. Thus, clients who regularly suffer censorship may falsely believe that the less reliable PTs are blocked. (3) PT performance depends on the underlying communication primitive. (4) PT performance significantly depends on the website access method (browser or command-line). Surprisingly, for some PTs, website access time was even less than vanilla Tor.

Based on our findings from more than 1.25M measurements, we provide recommendations about selecting PTs and believe that our study can facilitate access for users who face censorship.

Containing the Cambrian Explosion in QUIC Congestion Control

Since its introduction in 2015, QUIC has seen rapid adoption and is set to be the default transport stack for HTTP/3. Given that developers can now easily implement and deploy their own congestion control algorithms in the user space, there is an imminent risk of the proliferation of QUIC implementations of congestion control algorithms that no longer resemble their corresponding standard kernel implementations.

In this paper, we present the results of a comprehensive measurement study of the congestion control algorithm (CCA) implementations for 11 popular open-source QUIC stacks. We propose a new metric called Conformance-T that can help us identify the implementations with large deviations more accurately and also provide hints on how they can be modified to be more conformant to reference kernel implementations. Our results show that while most QUIC CCA implementations are conformant in shallow buffers, they become less conformant in deep buffers. In the process, we also identified five new QUIC implementations that had low conformance and demonstrated how low-conformance implementations can cause unfairness and subvert our expectations of how different CCAs interact. With the hints obtained from our new metric, we were able to identify implementation-level differences that led to the low conformance and derive the modifications required to improve conformance for three of them.

ECN with QUIC: Challenges in the Wild

TCP and QUIC can both leverage ECN to avoid congestion loss and its retransmission overhead. However, both protocols require support of their remote endpoints and it took two decades since the initial standardization of ECN for TCP to reach 80% ECN support and more in the wild. In contrast, the QUIC standard mandates ECN support, but there are notable ambiguities that make it unclear if and how ECN can actually be used with QUIC on the Internet. Hence, in this paper, we analyze ECN support with QUIC in the wild: We conduct repeated measurements on more than 180 M domains to identify HTTP/3 websites and analyze the underlying QUIC connections w.r.t. ECN support. We only find 20% of QUIC hosts, providing 6% of HTTP/3 websites, to mirror client ECN codepoints. Yet, mirroring ECN is only half of what is required for ECN with QUIC, as QUIC validates mirrored ECN codepoints to detect network impairments: We observe that less than 2% of QUIC hosts, providing less than 0.3% of HTTP/3 websites, pass this validation. We identify possible root causes in content providers not supporting ECN via QUIC and network impairments hindering ECN. We thus also characterize ECN with QUIC distributedly to traverse other paths and discuss our results w.r.t. QUIC and ECN innovations beyond QUIC.

Does It Spin? On the Adoption and Use of QUIC's Spin Bit

Encrypted QUIC traffic complicates network management as traditional transport layer semantics can no longer be used for RTT or packet loss measurements. Addressing this challenge, QUIC includes an optional, carefully designed mechanism: the spin bit. While its capabilities have already been studied in test settings, its real-world usefulness and adoption are unknown. In this paper, we thus investigate the spin bit's deployment and utility on the web.

Analyzing our long-term measurements of more than 200 M domains, we find that the spin bit is enabled on ~10% of those with QUIC support and for ~50% / 60% of the underlying IPv4 / IPv6 hosts. The support is mainly driven by medium-sized cloud providers while most hyperscalers do not implement it. Assessing the utility of spin bit RTT measurements, the theoretical issue of reordering does not significantly manifest in our study and the spin bit provides accurate estimates for around 30.5% of connections using the mechanism, but drastically overestimates the RTT for another 51.7%. Overall, we conclude that the spin bit, even though an optional feature, indeed sees use in the wild and is able to provide reasonable RTT estimates for a solid share of QUIC connections, but requires solutions for making its measurements more robust.
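For readers unfamiliar with the mechanism, the sketch below shows how a passive on-path observer could turn spin bit values into RTT samples: the bit flips once per round trip, so the time between two consecutive flips seen in one direction of a connection approximates one RTT. This is a minimal illustration and not the measurement tooling used in the study; the function name `spin_bit_rtts` and the input format (timestamp in milliseconds plus spin value, in arrival order) are assumptions. The sketch applies no filtering, so a reordered packet would introduce spurious flips.

```python
def spin_bit_rtts(packets):
    """Derive RTT samples from the spin bit of packets seen in ONE direction.

    `packets` is an iterable of (timestamp_ms, spin_value) pairs in arrival
    order (a hypothetical input format). Each change of the spin value is an
    "edge"; the time between two consecutive edges is one RTT sample.
    """
    rtts = []
    last_edge_time = None
    previous_spin = None
    for timestamp_ms, spin in packets:
        if previous_spin is not None and spin != previous_spin:
            # The spin value flipped: this packet marks a new edge.
            if last_edge_time is not None:
                rtts.append(timestamp_ms - last_edge_time)
            last_edge_time = timestamp_ms
        previous_spin = spin
    return rtts


# Edges occur at 50 ms, 100 ms and 150 ms, giving two samples of 50 ms each.
observed = [(0, 0), (10, 0), (50, 1), (60, 1), (100, 0), (110, 0), (150, 1)]
samples = spin_bit_rtts(observed)
```

Because only edges carry information, a connection that never flips the bit, or flips it only once, yields no RTT sample at all in this sketch.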

SESSION: Tagging

I Tag, You Tag, Everybody Tags!

Location tags are designed to track personal belongings. Nevertheless, there has been anecdotal evidence that location tags are also misused to stalk people. Tracking is achieved locally, e.g., via Bluetooth with a paired phone, and remotely, by piggybacking on location-reporting devices which come into proximity of a tag. This paper studies the performance of the two most popular location tags (Apple's AirTag and Samsung's SmartTag) through controlled experiments - with a known large distribution of location-reporting devices - as well as in-the-wild experiments - with no control on the number and kind of reporting devices encountered, thus emulating real-life use-cases. We find that both tags achieve similar performance, e.g., they are located 55% of the time in about 10 minutes within a 100 m radius. It follows that real-time stalking to a precise location via location tags is impractical, even when both tags are concurrently deployed, which achieves comparable accuracy in half the time. Nevertheless, half of a victim's exact movements can be backtracked accurately (10 m error) with just a one-hour delay, which is still perilous information in the possession of a stalker.

Tracking, Profiling, and Ad Targeting in the Alexa Echo Smart Speaker Ecosystem

Smart speakers collect voice commands, which can be used to infer sensitive information about users. Given the potential for privacy harms, there is a need for greater transparency and control over the data collected, used, and shared by smart speaker platforms as well as third party skills supported on them. To bridge this gap, we build a framework to measure data collection, usage, and sharing by the smart speaker platforms. We apply our framework to the Amazon smart speaker ecosystem. Our results show that Amazon and third parties, including advertising and tracking services that are unique to the smart speaker ecosystem, collect smart speaker interaction data. We also find that Amazon processes smart speaker interaction data to infer user interests and uses those inferences to serve targeted ads to users. Smart speaker interaction also leads to ad targeting and as much as 30X higher bids in ad auctions, from third party advertisers. Finally, we find that Amazon's and third party skills' data practices are often not clearly disclosed in their policy documents.

Pushing Alias Resolution to the Limit

In this paper, we show that utilizing multiple protocols offers a unique opportunity to improve IP alias resolution and dual-stack inference substantially. Our key observation is that prevalent protocols, e.g., SSH and BGP, reply to unsolicited requests with a set of values that can be combined to form a unique device identifier. More importantly, this is possible by just completing the TCP handshake. Our empirical study shows that utilizing readily available scans and our active measurements can double the discovered IPv4 alias sets and increase the dual-stack sets by more than 30× compared to the state-of-the-art techniques. We provide insights into our method's accuracy and performance compared to popular techniques.
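The grouping step implied by the key observation can be sketched as follows: once each responding IP address has been mapped to a device identifier, addresses sharing an identifier form an alias set, and a set containing both IPv4 and IPv6 addresses is a dual-stack candidate. This is only an illustration of the idea. The abstract does not say which protocol values are combined, so the `("ssh", "host-key-A")` tuples are placeholder identifiers, and the function names `alias_sets` and `is_dual_stack` are hypothetical.

```python
import ipaddress
from collections import defaultdict


def alias_sets(observations):
    """Group IP addresses that returned the same device identifier.

    `observations` is an iterable of (ip_string, identifier) pairs, where the
    identifier is any hashable value built from protocol responses
    (placeholder tuples in this example).
    """
    groups = defaultdict(set)
    for ip, identifier in observations:
        groups[identifier].add(ip)
    # A single address is not an alias set; keep only groups of two or more.
    return [ips for ips in groups.values() if len(ips) > 1]


def is_dual_stack(ips):
    """True if the set contains at least one IPv4 and one IPv6 address."""
    versions = {ipaddress.ip_address(ip).version for ip in ips}
    return versions == {4, 6}


# Addresses come from the documentation ranges; identifiers are made up.
scan = [
    ("192.0.2.1", ("ssh", "host-key-A")),
    ("198.51.100.1", ("ssh", "host-key-A")),
    ("2001:db8::1", ("ssh", "host-key-A")),
    ("203.0.113.9", ("ssh", "host-key-B")),
]
sets = alias_sets(scan)
dual_stack_sets = [ips for ips in sets if is_dual_stack(ips)]
```

The approach is only as good as the uniqueness of the identifier: if two distinct devices share the same values, for example because of a cloned configuration, this grouping would merge them incorrectly.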

SESSION: Latency

Localizing Traffic Differentiation

Network neutrality is important for users, content providers, policymakers, and regulators interested in understanding how network providers differentiate performance. When determining whether a network differentiates against certain traffic, it is important to have strong evidence, especially given that traffic differentiation is illegal in certain countries. In prior work, WeHe detects differentiation via end-to-end throughput measurements between a client and server but does not isolate the network responsible for it. Differentiation can occur anywhere on the network path between endpoints; thus, further evidence is needed to attribute differentiation to a specific network. We present a system, WeHeY, built atop WeHe, that can localize traffic differentiation, i.e., obtain concrete evidence that the differentiation happened within the client's ISP. Our system builds on ideas from network performance tomography; the challenge we solve is that TCP congestion control creates an adversarial environment for performance tomography (because it can significantly reduce the performance correlation on which tomography fundamentally relies). We evaluate our system via measurements "in the wild," as well as in emulated scenarios with a wide-area testbed; we further explore its limits via simulations and show that it accurately localizes traffic differentiation across a wide range of network conditions. WeHeY's source code is publicly available at

Using Gaming Footage as a Source of Internet Latency Information

Keeping track of Internet latency is a classic measurement problem. Open measurement platforms like RIPE Atlas are a great solution, but they also face challenges: preventing network overload that may result from uncontrolled active measurements, and maintaining the involved devices, which are typically contributed by volunteers and non-profit organizations, and tend to lag behind the state of the art in terms of features and performance. We explore gaming footage as a new source of real-time, publicly available, passive latency measurements, which have the potential to complement open measurement platforms. We show that it is feasible to mine this source of information by presenting Tero, a system that continuously downloads gaming footage from the Twitch streaming platform, extracts latency measurements from it, and converts them to latency distributions per geographical location. Our data-sets and source code are publicly available at

Inferring Changes in Daily Human Activity from Internet Response

Network traffic is often diurnal, with some networks peaking during the workday and many homes during evening streaming hours. Monitoring systems consider diurnal trends for capacity planning and anomaly detection. In this paper, we reverse this inference and use diurnal network trends and their absence to infer human activity. We draw on existing and new ICMP echo-request scans of more than 5.2M /24 IPv4 networks to identify diurnal trends in IP address responsiveness. Some of these networks are change-sensitive, with diurnal patterns correlating with human activity. We develop algorithms to clean this data, extract underlying trends from diurnal and weekly fluctuation, and detect changes in that activity. Although firewalls hide many networks, and Network Address Translation often hides human trends, we show about 168k to 330k (3.3-6.4% of the 5.2M) /24 IPv4 networks are change-sensitive. These blocks are spread globally, representing some of the most active 60% of 2 × 2° geographic gridcells, regions that include 98.5% of ping-responsive blocks. Finally, we detect interesting changes in human activity. Reusing existing data allows our new algorithm to identify changes, such as Work-from-Home due to the global reaction to the emergence of Covid-19 in 2020. We also see other changes in human activity, such as national holidays and government-mandated curfews. This ability to detect trends in human activity from the Internet data provides a new ability to understand our world, complementing other sources of public information such as news reports and wastewater virus observation.

SESSION: Cellular and mobile networks

Characterizing Mobile Service Demands at Indoor Cellular Networks

Indoor cellular networks (ICNs) are anticipated to become a principal component of 5G and beyond systems. ICNs aim at extending network coverage and enhancing users' quality of service and experience, consequently producing a substantial volume of traffic in the coming years. Despite the increasing importance that ICNs will have in cellular deployments, there is nowadays little understanding of the type of traffic demands that they serve. Our work contributes to closing that gap, by providing a first characterization of the usage of mobile services across more than 4,500 cellular antennas deployed at over 1,000 indoor locations in a whole country. Our analysis reveals that ICNs inherently manifest a limited set of mobile application utilization profiles, which are not present in conventional outdoor macro base stations (BSs). We interpret the indoor traffic profiles via explainable machine learning techniques, and show how they are correlated to the indoor environment. Our findings show how indoor cellular demands are strongly dependent on the nature of the deployment location, which allows anticipating the type of demands that indoor 5G networks will have to serve and paves the way for their efficient planning and dimensioning.

Modeling and Generating Control-Plane Traffic for Cellular Networks

With 5G deployment gaining momentum, the control-plane traffic volume of cellular networks is escalating. Such rapid traffic growth motivates the need to study the mobile core network (MCN) control-plane design and performance optimization. Doing so requires realistic, large control-plane traffic traces in order to profile and debug the mobile network performance under real workload. However, large-scale control-plane traffic traces are not made available to the public by mobile operators due to business and privacy concerns. As such, it is critically important to develop accurate, scalable, versatile, and open-to-innovation control traffic generators, which in turn critically rely on an accurate traffic model for the control plane. Developing an accurate model of control-plane traffic faces several challenges: (1) how to capture the dependence among the control events generated by each User Equipment (UE), (2) how to model the inter-arrival time and sojourn time of control events of individual UEs, and (3) how to capture the diversity of control-plane traffic across UEs. We present a novel two-level hierarchical state-machine-based control-plane traffic model. We further show how our model can be easily adjusted from LTE to NextG networks (e.g., 5G) to support modeling future control-plane traffic. We experimentally validate that the proposed model can generate large realistic control-plane traffic traces. We have open-sourced our traffic generator to the public to foster MCN research.
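To make the idea of a two-level, state-machine-based generator more concrete, here is a minimal sketch of a per-UE trace generator. Everything specific in it is an assumption for illustration and is not taken from the paper: the state names, the event names, the transition probabilities, the mean sojourn times, and the choice of exponentially distributed sojourn times.

```python
import random

# Hypothetical top-level UE states: (next_state, emitted_event, probability).
TRANSITIONS = {
    "DEREGISTERED": [("CONNECTED", "ATTACH", 1.0)],
    "CONNECTED": [("IDLE", "S1_RELEASE", 0.9), ("DEREGISTERED", "DETACH", 0.1)],
    "IDLE": [("CONNECTED", "SERVICE_REQUEST", 0.95), ("DEREGISTERED", "DETACH", 0.05)],
}
# Hypothetical second-level events that may occur while staying in a state.
SUB_EVENTS = {
    "CONNECTED": [("HANDOVER", 0.3)],
    "IDLE": [("TRACKING_AREA_UPDATE", 0.2)],
}
# Assumed mean sojourn time per state, in seconds.
MEAN_SOJOURN_S = {"DEREGISTERED": 600.0, "CONNECTED": 20.0, "IDLE": 120.0}


def generate_trace(ue_id, n_events, rng):
    """Return n_events tuples (time_s, ue_id, event) in non-decreasing time order."""
    state, now, trace = "DEREGISTERED", 0.0, []
    while len(trace) < n_events:
        sojourn = rng.expovariate(1.0 / MEAN_SOJOURN_S[state])
        # Second level: events emitted at a random instant inside the sojourn.
        for event, prob in SUB_EVENTS.get(state, []):
            if rng.random() < prob:
                trace.append((round(now + rng.uniform(0.0, sojourn), 3), ue_id, event))
        # First level: leave the state when the sojourn ends.
        now += sojourn
        options = TRANSITIONS[state]
        state, event, _ = rng.choices(options, weights=[p for _, _, p in options])[0]
        trace.append((round(now, 3), ue_id, event))
    return trace[:n_events]
```

Calling `generate_trace("ue-1", 50, random.Random(7))` yields a reproducible 50-event trace; capturing diversity across UEs would require per-UE parameters, which this sketch leaves out.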

Performance of Cellular Networks on the Wheels

After 4 years of rapid deployment in the US, 5G is expected to have significantly improved the performance and overall user experience of mobile networks. However, recent measurement studies have focused either on static performance or on a single aspect (e.g., handovers) of 5G under driving conditions, and do not provide a complete picture of cellular network performance today under driving conditions, a major use case of mobile networks. Through a cross-continental US driving trip (from LA to Boston, 5700km+), we conduct an in-depth measurement study of user-perceived experience (network coverage/performance and QoE of a set of major latency-critical 5G "killer" apps), while collecting low-level 5G statistics and signaling messages to understand the root cause of the observed network performance. Our study shows disappointingly low coverage of 5G networks today under driving and highly fragmented coverage by cellular technologies. More importantly, network and application performance are often poor under driving even in areas with full 5G coverage. We also examine the correlation of technology-wise coverage and performance with geo-location and the vehicle's speed and analyze the impact of a number of lower layer KPIs on network performance.

Characterizing and Modeling Session-Level Mobile Traffic Demands from Large-Scale Measurements

We analyze 4G and 5G transport-layer sessions generated by a wide range of mobile services at over 282,000 base stations (BSs) of an operational mobile network, and carry out a statistical characterization of their demand rates, associated traffic volume and temporal duration. Based on the gained insights, we model the arrival process of sessions at heterogeneously loaded BSs, the distribution of the session-level load and its relationship with the session duration, using simple yet effective mathematical approaches. Our models are fine-tuned to a variety of services, and complement existing tools that mimic packet-level statistics or aggregated spatiotemporal traffic demands at mobile network BSs. They thus offer an original angle to mobile traffic data generation, and support a more credible performance evaluation of solutions for network planning and management. We assess the utility of the models in practical application use cases, demonstrating how they enable a more trustworthy evaluation of solutions for the orchestration of sliced and virtualized networks.

SESSION: Poster Session

Poster: SmartX BGP BVT: A First Real-Time BGP Blackholing Visibility Tool

BGP Blackholing is an effective mitigation solution for networks to counter the frequent Distributed Denial of Service (DDoS) attacks. It enables dropping all network traffic that is directed towards a particular victim prefix under DDoS attack, ideally as close to the source as possible. Despite its huge importance in the Internet, there is no tool available for the real-time visualization of BGP Blackholing activity. Visualization is one of the most powerful techniques for network operators to monitor network activity. From discovering network topology to exposing anomalous behaviors in networks, easy-to-use visualizations are powerful weapons for capturing important patterns in Internet traffic [1, 5]. In this work, we propose a first real-time BGP Blackholing Visibility Tool (named SmartX BGP-BVT) to detect and visualize community-based BGP Blackholing on live BGP data. This tool will be helpful for network operators and researchers interested in BGP Blackholing service and DDoS mitigation in the Internet.

The analysis of BGP Blackholing on BGP datasets and its significant usage rate are highlighted in the literature [2, 3]. Community-based BGP Blackholing is enabled with the help of the BGP community attribute. So, to monitor BGP Blackholing activity, we first need to collect all BGP Blackhole communities from the Internet. Today, many networks including ISPs and IXPs offer BGP Blackholing service to their customers, and they publish their corresponding BGP Blackhole communities either on their official Web pages or in their Internet Routing Registry (IRR) records. Following the methodology of Giotsas et al. [3], we build a BGP Blackhole communities dictionary, which is described in detail in our previous work [2].

To the best of our knowledge, SmartX BGP BVT is the first real-time BGP Blackholing Visibility Tool that visualizes community-based BGP Blackholing with the help of the Blackhole Communities Dictionary. Real-time BGP Blackholing visualizations are helpful for measuring the current adoption rate of this mitigation solution and can also serve as a proxy for identifying DDoS attacks.
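The dictionary-based detection step described above can be sketched as a simple lookup over the communities attached to each announcement. This is a minimal illustration, not the tool's code. The only real value is the well-known BLACKHOLE community 65535:666 from RFC 7999; the second dictionary entry uses a documentation-range ASN as a placeholder, and the update format and function name are hypothetical.

```python
# Well-known BLACKHOLE community standardized in RFC 7999.
RFC7999_BLACKHOLE = "65535:666"

# Hypothetical dictionary mapping blackhole communities to who defines them.
# AS64500 is a documentation-range ASN, used here purely as a placeholder.
BLACKHOLE_COMMUNITIES = {
    RFC7999_BLACKHOLE: "RFC 7999 well-known BLACKHOLE",
    "64500:666": "example provider (placeholder entry)",
}


def blackholed_prefixes(updates, dictionary=BLACKHOLE_COMMUNITIES):
    """Return (prefix, matched_communities) for announcements tagged as blackholed.

    `updates` is an assumed, simplified representation of BGP announcements:
    dicts with a "prefix" string and a "communities" list of "ASN:value" strings.
    """
    hits = []
    for update in updates:
        matched = sorted(c for c in update.get("communities", []) if c in dictionary)
        if matched:
            hits.append((update["prefix"], matched))
    return hits


updates = [
    {"prefix": "192.0.2.1/32", "communities": ["64500:100", "65535:666"]},
    {"prefix": "198.51.100.0/24", "communities": ["64500:100"]},
    {"prefix": "203.0.113.7/32", "communities": ["64500:666"]},
]
print(blackholed_prefixes(updates))
```

In a live setting the `updates` list would be replaced by a stream of parsed BGP messages; how the tool obtains and parses that stream is not covered here.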

Poster: The Impact of the Client Environment on Residential IP Proxies Detection

Residential IP Proxies (RESIPs) enable proxying out requests from a vast network of residential devices without inserting any information revealing it. While RESIPs can be used for legitimate purposes, previous studies also associate them with malicious activities. In our last work, we proposed a server-side detection method for RESIP connections based on the difference in the Round Trip Time at the TCP and TLS layers. In this new work, thanks to real-world connections, we investigate if and how specific factors in the client environment influence the technique. We show that genuine users utilizing web browsers or connecting through hotspots do not result in false positives for our technique. Moreover, our early results suggest that false positives caused by Mobile TCP Terminating Proxies used by mobile Internet Service Providers have a Round Trip Time difference higher than the detection threshold but much smaller than the average RESIP one. This suggests that we can reduce these false positives by raising the detection threshold for mobile connections.
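The thresholding idea can be sketched as follows. This is only an illustration of the decision rule suggested by the abstract. The function names and both threshold values are hypothetical, and the sketch assumes the relevant quantity is the TLS-layer RTT minus the TCP-layer RTT, as measured by the server.

```python
def rtt_gap_ms(tcp_rtt_ms, tls_rtt_ms):
    """Assumed detection signal: TLS-layer RTT minus TCP-layer RTT, in ms."""
    return tls_rtt_ms - tcp_rtt_ms


def looks_like_resip(tcp_rtt_ms, tls_rtt_ms, is_mobile=False,
                     threshold_ms=50.0, mobile_threshold_ms=150.0):
    """Flag a connection when the RTT gap exceeds the applicable threshold.

    Both thresholds are placeholder values chosen for this example. The higher
    mobile threshold mirrors the suggestion that mobile TCP terminating proxies
    produce a gap above the default threshold but well below a typical RESIP gap.
    """
    limit = mobile_threshold_ms if is_mobile else threshold_ms
    return rtt_gap_ms(tcp_rtt_ms, tls_rtt_ms) > limit
```

With these placeholder values, a 100 ms gap is flagged on a fixed-line connection but not on a mobile one, which is the trade-off the abstract points to.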

Poster: Through the ccTLD Looking Glass: Mining CT Logs for Fun, Profit and Domain Names

Poster: E3PO - An Open Platform for 360° Video Streaming Simulation and Evaluation

The simulation and evaluation of diverse 360° video streaming systems present inherent challenges due to the varied design objectives, streaming strategies, and evaluation metrics. This poster introduces E3PO, a versatile and extensible simulation and evaluation platform for 360° video streaming systems. E3PO excels in simulating all proposed variations of 360° video streaming methods, while also generating the precise view that users experience. We have implemented E3PO and developed a variety of examples that simulate different streaming approaches. The promising results from our evaluation of these simulated scenarios demonstrate E3PO's substantial potential to assist researchers in testing new designs, fine-tuning parameters, and comparing performance with their counterparts.

Poster: Modified Dynamic Beta RED -- A New AQM Algorithm for Internet Congestion Control

In this work we present a performance study of modified Dynamic Beta RED (mDBetaRED), a new Active Queue Management (AQM) RED-type algorithm based on dynamical variants with the ability to adapt to the characteristics of mixed networks. We also rely on real campus traffic traces to build a traffic model on which we evaluate our proposal under the NS3 simulator. The parameters of our AQM algorithm are dynamically adjusted so that the queue length remains stable around a predetermined reference value and according to changing network traffic conditions. We present a performance comparison of mDBetaRED with other AQM algorithms across diverse Internet flows. Of all the AQM algorithms compared, the mDBetaRED algorithm offers the best throughput while effectively controlling delay and stability.
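For readers unfamiliar with RED-type AQM, the sketch below shows the two core steps of classic RED: an exponentially weighted average of the queue length and a drop probability that grows linearly between two thresholds. It is not mDBetaRED, whose dynamic parameter adjustment is not described in this abstract. The function names and default weight are illustrative, and the per-packet count correction of full RED is omitted.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def drop_probability(avg, min_th, max_th, max_p):
    """Classic RED drop probability as a function of the average queue length."""
    if avg < min_th:
        return 0.0   # below the lower threshold: never drop
    if avg >= max_th:
        return 1.0   # at or above the upper threshold: always drop
    # Linear ramp from 0 to max_p between the two thresholds.
    return max_p * (avg - min_th) / (max_th - min_th)
```

A RED-type variant that adapts its parameters would change values such as `max_p` or the thresholds over time; the fixed-parameter form above is only the common starting point.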

Poster: Novel Client-Side Watermarking Technique for Tor User De-Anonymization

Most traditional techniques for Tor user de-anonymization rely on the use of server-side originating traffic watermarks. In this poster, we outline the key ideas behind our novel client-side originating watermark scheme and describe some possible ways of how this scheme could be implemented in practice. We also demonstrate the superior real-world performance of this approach vs. those previously discussed in the literature.

Poster: COPA -- Parsing Outputs of CLI Commands for Failure Diagnosis of Network Devices

Network operators in telecom carriers perform failure diagnosis by executing various commands (e.g., show interfaces) on network devices through a command line interface (CLI). The outputs of these CLI commands, which we call command outputs, contain more detailed information than typical one-line system logs but have complex characteristics; thus, existing parsers for system logs are not applicable. In this study, we propose COPA, a parsing method for command outputs for automating time-consuming and labor-intensive failure diagnosis.

Poster: Empirically Testing the PacketLab Model

PacketLab is a recently proposed model for accessing remote vantage points. The core design is for the vantage points to export low-level network operations that measurement researchers could rely on to construct more complex measurements. Motivating the model is the assumption that such an approach can overcome persistent challenges such as the operational cost and security concerns of vantage point sharing that researchers face in launching distributed active Internet measurement experiments. However, the limitations imposed by the core design merit a deeper analysis of the applicability of such a model to real-world measurements of interest. We undertook this analysis based on a survey of recent Internet measurement studies, followed by an empirical comparison of PacketLab-based versus native implementations of common measurement methods. We showed that for several canonical measurement types common in past studies, PacketLab yielded similar results to native versions of the same measurements. Our results suggest that PacketLab could help reproduce or extend around 16.4% (28 out of 171) of all surveyed studies and accommodate a variety of measurements, from latency, throughput, and network path to non-timing data.

Poster: Analysis of User Uniqueness on LinkedIn Based on Publicly Available Non-PII

The literature has shown that combining a few non-Personally Identifiable Information (non-PII) attributes is enough to make a user unique in a dataset including millions of users. In this work, we demonstrate that the combination of the location and 6 rare (14 random) skills in a LinkedIn profile is enough to become unique in a user base of ~800M users with a probability of 75%. The novelty is that these attributes are publicly accessible to anyone registered on LinkedIn and could be activated through advertising campaigns.

Poster: Towards a Publicly Available Framework to Process Traceroutes with MetaTrace

The objective of this research is to contribute towards the development of an open-source framework for processing large-scale traceroute datasets. By providing such a framework, we aim to benefit the community by saving time in everyday traceroute analysis and enabling the design of new scalable reactive measurements [1], where prior traceroute measurements are leveraged to make informed decisions for future ones [8, 12].

It is important to clarify that our goal is not to surpass proprietary solutions like BigQuery, which are utilized by CDNs for processing billions of traceroutes [6, 10]. These proprietary solutions are not freely accessible to the public, whereas our focus is on creating an open and freely available framework for the wider community.

Our contributions include (1) sharing the ideas and thinking process behind building MetaTrace, which efficiently utilizes ClickHouse features for traceroute processing; and (2) providing an open-source implementation of MetaTrace.

We evaluated MetaTrace using two types of queries: predicate queries for filtering traceroutes based on conditions, and aggregate queries for computing metrics on traceroutes. Our results show that MetaTrace is significantly faster compared to alternative solutions. For predicate queries, it outperforms a multiprocessed Rust solution by a factor of 552 and is 3.4 times faster than ClickHouse without MetaTrace optimizations. For aggregate queries, MetaTrace processes 202 million traceroutes in 11 seconds, with its performance scaling linearly with traceroute volume. Notably, on a single server, MetaTrace can perform a predicate query on a 6-year dataset of 6 billion traceroutes in just 240 seconds.

Furthermore, MetaTrace is resource-efficient, making it accessible for research groups with limited resources to conduct Internet-scale traceroute studies.
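The two query types used in the evaluation can be illustrated on a toy in-memory dataset. This sketch says nothing about how MetaTrace or ClickHouse execute such queries; the record layout, the function names, and the addresses (taken from documentation ranges) are assumptions for the example.

```python
from collections import defaultdict

# Toy traceroutes; the dict layout is hypothetical.
TRACEROUTES = [
    {"dst": "198.51.100.7", "hops": ["192.0.2.254", "203.0.113.9", "198.51.100.7"]},
    {"dst": "198.51.100.7", "hops": ["192.0.2.254", "203.0.113.1", "203.0.113.9",
                                     "198.51.100.1", "198.51.100.7"]},
    {"dst": "198.51.100.9", "hops": ["192.0.2.254", "198.51.100.9"]},
]


def predicate_query(traceroutes, hop_ip):
    """Predicate query: keep only the traceroutes that traverse `hop_ip`."""
    return [t for t in traceroutes if hop_ip in t["hops"]]


def aggregate_query(traceroutes):
    """Aggregate query: mean number of hops per destination."""
    lengths = defaultdict(list)
    for t in traceroutes:
        lengths[t["dst"]].append(len(t["hops"]))
    return {dst: sum(v) / len(v) for dst, v in lengths.items()}


print(len(predicate_query(TRACEROUTES, "203.0.113.9")))
print(aggregate_query(TRACEROUTES))
```

A predicate query returns a subset of rows, while an aggregate query reduces many rows to a few summary values, which is why the two are evaluated separately.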

Poster: QUIC is not Quick Enough over Fast Internet

QUIC is a multiplexed transport-layer protocol over UDP and comes with enforced encryption. It is expected to be a game-changer in improving web application performance. Together with the network layer and layers below, UDP, QUIC, and HTTP/3 form a new protocol stack for future network communication, whose current counterpart is TCP, TLS, and HTTP/2. In this study, to understand QUIC's performance over high-speed networks and its potential to replace the TCP stack, we carry out a series of experiments to compare the UDP+QUIC+HTTP/3 (QUIC) stack and the TCP+TLS+HTTP/2 (HTTP/2) stack. Preliminary measurements on file download reveal that QUIC suffers from a data rate reduction compared to HTTP/2 across different hosts.