IMC '25: Proceedings of the 2025 ACM Internet Measurement Conference

SESSION: Full Research Papers

Unraveling the Complexities of MTA-STS Deployment and Management in Securing Email

Email has been a cornerstone of online communication for decades, but its lack of built-in confidentiality has left it vulnerable to various attacks. To address this issue, two key protocols have been deployed: MTA-STS (Mail Transfer Agent Strict Transport Security) and DANE (DNS-based Authentication of Named Entities). While DANE was introduced first, MTA-STS has been actively adopted by major email providers such as Google and Microsoft, as it does not require the complex DNSSEC chain that poses a significant challenge in deploying and managing DANE. However, despite its significance, there has been limited research on how MTA-STS is deployed and managed in practice. In this study, we present a thorough, longitudinal investigation of the MTA-STS ecosystem. We base our analysis on a dataset capturing over 87 million domains from DNS scans collected across four TLDs over 31 months, along with 10 months of additional component scanning (e.g., TLS certificates), offering a broad perspective on MTA-STS adoption and management. Our analysis uncovers a concerning trend of misconfigurations and inconsistencies in MTA-STS setups. In our most recent snapshot, out of ~68K domains with an MTA-STS record, 29.6% were incorrectly configured, and 3.2% of these would encounter email delivery failures from MTA-STS-supporting senders. To gain insights into the challenges faced by email administrators, we surveyed 117 operators. While awareness of MTA-STS was high (94.7%), many cited operational complexity (48.8%) and a preference for DANE (45.4%) as reasons for not deploying the protocol. Our study not only highlights the growing importance of MTA-STS but also reveals the significant challenges in its deployment and management.
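
To illustrate the mechanism under study: MTA-STS advertises a policy through a DNS TXT record and serves the policy itself over HTTPS (RFC 8461). Below is a minimal sketch of both lookups using the dnspython library; the paper's measurement pipeline is far more extensive.

    import urllib.request
    import dns.resolver  # pip install dnspython

    def check_mta_sts(domain: str) -> dict:
        result = {"txt": None, "policy": None}
        # Step 1: the TXT record at _mta-sts.<domain> must start with v=STSv1.
        for rr in dns.resolver.resolve(f"_mta-sts.{domain}", "TXT"):
            txt = rr.to_text().strip('"')
            if txt.startswith("v=STSv1"):
                result["txt"] = txt
        # Step 2: the policy file is served from a fixed well-known HTTPS path;
        # certificate validation (urlopen's default) is part of the protocol.
        url = f"https://mta-sts.{domain}/.well-known/mta-sts.txt"
        with urllib.request.urlopen(url, timeout=10) as resp:
            result["policy"] = resp.read().decode("utf-8", "replace")
        return result

    print(check_mta_sts("gmail.com"))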

5G Metamorphosis: A Longitudinal Study of 5G Performance from the Beginning

The cellular network has undergone rapid progress since its inception in the 1980s. While rapid iteration of newer generations of cellular technology plays a key role in this evolution, the incremental and eventually wide deployment of each new technology generation also plays a vital role in delivering the promised performance improvement. In this work, we conduct the first metamorphosis study of a cellular network generation, 5G, by measuring user-experienced 5G performance from the network's birth (initial deployment) to maturity (steady state). By analyzing a 4-year 5G performance trace of 2.65M+ Ookla® Speedtest Intelligence® measurements collected in 9 cities in the United States and Europe from January 2020 to December 2023, we unveil the detailed evolution of 5G coverage, throughput, and latency at quarterly granularity, compare the performance diversity across the 9 representative cities, and gain insights into compounding factors that affect user-experienced 5G performance, such as the adoption of 5G devices and the load on the 5G network. Our study uncovers the typical life-cycle of a new cellular technology generation as it undergoes its "growing pains" toward delivering its promised QoE improvement over the previous technology generation.
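
To illustrate the kind of longitudinal aggregation the study performs, here is a hedged pandas sketch computing per-city quarterly median throughput; the file and column names (city, timestamp, download_mbps) are illustrative placeholders, not the schema of the Speedtest Intelligence data.

    import pandas as pd

    df = pd.read_csv("speedtests.csv", parse_dates=["timestamp"])
    df["quarter"] = df["timestamp"].dt.to_period("Q")
    # Median download speed per city and quarter, one row per city.
    quarterly = (df.groupby(["city", "quarter"])["download_mbps"]
                   .median()
                   .unstack("quarter"))
    print(quarterly)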

Exploration of the Dynamics of Buy and Sale of Social Media Accounts

There has been a rise in online platforms facilitating the buying and selling of social media accounts. While the trade of social media profiles is not inherently illegal, the social media platforms themselves view such transactions as violations of their policies and often take action against accounts involved in misuse of their platforms for financial gain. In this work, we conduct a comprehensive analysis of marketplaces that enable the buying and selling of social media accounts.

We investigate the economic scale of account trading across five major platforms: X, Instagram, Facebook, TikTok, and YouTube. From February to June 2024, we identified 38,253 accounts advertised for sale across 11 online marketplaces, covering 211 distinct categories. The total value of marketed social media accounts exceeded $64 million, with a median price of $157 per account. Additionally, we analyzed the profiles of 11,457 visible advertised accounts, collecting their metadata and over 200,000 profile posts. By examining their engagement patterns and account creation methods, we evaluated the fraudulent activities commonly associated with these sold accounts. Our research reveals that these marketplaces foster fraudulent activities such as bot farming, harvesting accounts for future fraud, and fraudulent engagement. Such practices pose significant risks to social media users, who are often targeted by fraudulent accounts employing social engineering tactics. We highlight weaknesses in social media platforms' ability to detect and mitigate such fraudulent accounts, which endanger users. Alongside this, we conducted thorough disclosures with the respective platforms and proposed actionable recommendations, including indicators to identify and track these accounts. These measures aim to enhance proactive detection and safeguard users from potential threats.

Fantastic Joules and Where to Find Them: Modeling and Optimizing Router Energy Demand

Reducing our society's energy demand is critical to address the sustainability challenge. While the Internet currently accounts for 1-1.5% of global electricity consumption and continues to grow, the energy demands of one of its core components, routers, remain poorly understood. The available power data is limited and not fine-grained enough, offering little actionable insight into strategies for effectively reducing the Internet's energy consumption.

To address this, we assemble and present a unique dataset combining datasheet information, router-internal measurements, external power measurements, and router power models. This dataset paints a clearer picture of routers' energy demand and provides insights into how to reduce it. Our initial analysis suggests, for example, that (i) datasheets are poor predictors and sometimes even incorrect; (ii) internal router power measurements have limited accuracy; (iii) using more efficient and better-sized power supply units is a promising energy-saving vector; and (iv) turning links off is less efficient than anticipated in the literature. This work also highlights the limitations of today's power monitoring practices and provides suggestions for improvement.
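
As a toy example of the kind of power model such a dataset supports: idle chassis power plus a per-port term that scales with utilization. The coefficients below are made-up placeholders, not measured values from the paper.

    def router_power_watts(chassis_idle_w, ports, port_idle_w, port_dyn_w):
        """ports is a list of (enabled, utilization in [0, 1]) tuples."""
        total = chassis_idle_w
        for enabled, utilization in ports:
            if enabled:
                total += port_idle_w + port_dyn_w * utilization
        return total

    # Example: a 350 W idle chassis with 8 enabled ports at 30% load.
    print(router_power_watts(350.0, [(True, 0.3)] * 8, 5.0, 2.0))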

Dive into the Cloud: Unveiling the (Ab)Usage of Serverless Cloud Function in the Wild

Serverless cloud functions transfer server management responsibilities to service providers, offering scalability and cost-efficiency. This convenience not only facilitates legitimate activities but also raises abuse concerns. So far, public understanding of real-world cloud functions remains limited. To fill this gap, we conducted an in-depth measurement study to uncover their practical usage and abuse. Through empirical analysis of nine leading providers (e.g., AWS, Tencent), we identified 531,089 function domains from a passive DNS dataset spanning April 2022 to March 2024. We first investigated the usage status of serverless cloud functions, revealing differing practices across providers. Additionally, based on active requests to these functions, we identified privacy risks from unauthorized access and four abuse types: covert C2 communication, hosting malicious websites, promoting illicit services, and abusing egress nodes as IP proxies. Alarmingly, 4.89% of cloud functions are being abused, with over 614k invocations recorded. Only four abused functions were flagged by existing threat intelligence systems, indicating critical gaps in security monitoring for serverless environments. Our work offers insights into the serverless cloud ecosystem and provides recommendations for better management. Through responsible disclosure, we hope to raise awareness and improve protective measures against abuse among cloud function providers.
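
One way function domains can be extracted from a passive DNS feed is by matching provider-specific suffix patterns. The two patterns below (AWS Lambda function URLs and Cloudflare Workers) are real public suffixes but only illustrate the idea; they are not the paper's nine-provider ruleset.

    import re

    PATTERNS = {
        "aws_lambda_url": re.compile(r"\.lambda-url\.[a-z0-9-]+\.on\.aws$"),
        "cloudflare_workers": re.compile(r"\.workers\.dev$"),
    }

    def classify(domain):
        """Return the provider label for a function domain, or None."""
        for provider, pattern in PATTERNS.items():
            if pattern.search(domain.lower().rstrip(".")):
                return provider
        return None

    print(classify("abc123.lambda-url.us-east-1.on.aws"))  # aws_lambda_url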

Somesite I Used To Crawl: Awareness, Agency and Efficacy in Protecting Content Creators From AI Crawlers

The success of generative AI relies heavily on training on data scraped through extensive crawling of the Internet, a practice that has raised significant copyright, privacy, and ethical concerns. While few measures are designed to resist a resource-rich adversary determined to scrape a site, crawlers can be impacted by a range of existing tools such as robots.txt, NoAI meta tags, and active crawler blocking by reverse proxies. In this work, we seek to understand the ability and efficacy of today's networking tools to protect content creators against AI-related crawling. Do targeted populations like human artists have the technical knowledge and agency to use crawler-blocking tools such as robots.txt, and can such tools be effective? Using large-scale measurements and a targeted user study of 203 professional artists, we find strong demand for tools like robots.txt, but their use is significantly constrained by critical hurdles: limited technical awareness, limited agency in deploying them, and limited efficacy against unresponsive crawlers. We further test and evaluate network-level crawler blockers provided by reverse proxies. Despite relatively limited deployment today, they offer stronger protections against AI crawlers, but still come with their own set of limitations.
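
The robots.txt mechanism at the center of the study can be exercised directly with Python's standard library. The user-agent tokens below are published AI crawler names; example.com stands in for an artist's site.

    from urllib import robotparser

    AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]

    rp = robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()
    for agent in AI_CRAWLERS:
        allowed = rp.can_fetch(agent, "https://example.com/gallery/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

Note that robots.txt only expresses a preference; as the paper shows, unresponsive crawlers can simply ignore it.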

Sibling Prefixes: Identifying Similarities in IPv4 and IPv6 Prefixes

Since the standardization of IPv6 in 1998, both versions of the Internet Protocol have coexisted in the Internet. Clients usually run algorithms such as Happy Eyeballs to decide whether to connect to an IPv4 or IPv6 endpoint for dual-stack domains. To identify whether two addresses belong to the same device or service, researchers have proposed different forms of alias resolution techniques. Similarly, one can also form siblings of IPv4 and IPv6 addresses belonging to the same device. Traditionally, all of these approaches have focused on individual IP addresses. In this work, we propose the concept of "sibling prefixes", extending the definition of an IPv4-IPv6 sibling to two IP prefixes: one IPv4 prefix and its sibling IPv6 prefix. We present a technique based on large-scale DNS resolution data that identifies 76k IPv4-IPv6 sibling prefixes. We find sibling prefixes to be relatively stable over time. We present the SP-Tuner algorithm to tune the CIDR size of sibling prefixes, improving the share of perfect-match siblings from 52% to 82%. For more than half of the sibling prefixes, the organization names of their IPv4 and IPv6 origin ASes are identical, and 60% of all sibling prefixes have at least one prefix with a valid ROV status in RPKI. Furthermore, we identify sibling prefixes in 24 hypergiant and CDN networks. Finally, we plan to regularly publish a list of sibling prefixes for use by network operators and fellow researchers in dual-stack studies.
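
A hedged sketch of the core pairing idea: dual-stack resolutions of the same name yield (IPv4, IPv6) address siblings, which can then be grouped into candidate prefix pairs. The /24 and /48 starting sizes and the sample resolutions are illustrative; tuning the actual CIDR sizes is precisely SP-Tuner's job in the paper.

    import ipaddress
    from collections import Counter

    resolutions = [  # (name, A record, AAAA record) from DNS resolution data
        ("a.example", "192.0.2.10", "2001:db8:1::10"),
        ("b.example", "192.0.2.77", "2001:db8:1::77"),
    ]

    pairs = Counter()
    for _, v4, v6 in resolutions:
        p4 = ipaddress.ip_network(f"{v4}/24", strict=False)
        p6 = ipaddress.ip_network(f"{v6}/48", strict=False)
        pairs[(p4, p6)] += 1

    for (p4, p6), n in pairs.most_common():
        print(f"{p4} <-> {p6} ({n} sibling addresses)")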

Learning AS-to-Organization Mappings with Borges

We introduce Borges (Better ORGanizations Entities mappingS), a novel framework for improving AS-to-Organization mappings using Large Language Models (LLMs). Existing approaches, such as AS2Org and its extensions, rely on static WHOIS data and rule-based extraction from PeeringDB records, limiting their ability to capture complex, dynamic organizational structures. Borges overcomes these limitations by combining traditional sources with few-shot LLM prompting to extract sibling relationships from free-text fields in PeeringDB, and by introducing website-based inference using redirect chains, domain similarity, and favicon analysis. Our evaluation shows that Borges outperforms prior methods, achieving a 7% improvement in sibling ASN identification and an Organization Factor score of 0.3576. It also expands the recognized user base of large Internet conglomerates by 192 million users (≈ 5% of the global Internet population) and improves geographic footprint estimates across multiple regions.
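
To give a flavor of the few-shot extraction step, a hedged illustration of prompt construction; the example records and ASNs are invented, and the model call itself is omitted, as Borges' actual prompts are not reproduced here.

    FEW_SHOT = (
        'Record: "AKA ExampleNet; subsidiary of ExampleCorp (AS64500)"\n'
        "Siblings: AS64500\n\n"
        'Record: "Peering contact for ExampleISP. No affiliates."\n'
        "Siblings: none\n"
    )

    def build_prompt(free_text):
        """Wrap a PeeringDB free-text field in a few-shot extraction prompt."""
        return ("Extract sibling ASNs from the PeeringDB notes below.\n\n"
                + FEW_SHOT
                + f'\nRecord: "{free_text}"\nSiblings:')

    print(build_prompt("Operated by DemoTelecom, sister network of AS64501"))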

FP-Inconsistent: Measurement and Analysis of Fingerprint Inconsistencies in Evasive Bot Traffic

Browser fingerprinting is used for bot detection. In response, bots have started altering their fingerprints to evade detection. We conduct the first large-scale evaluation of whether and how altering fingerprints helps bots evade detection. To systematically investigate such evasive bots, we deploy a honey site that includes two anti-bot services (DataDome and BotD) and solicit bot traffic from 20 different bot services that purport to sell "realistic and undetectable traffic." Across half a million requests recorded on our honey site, we find an average evasion rate of 52.93% against DataDome and 44.56% against BotD. Our analysis of the fingerprint attributes of evasive bots shows that they indeed alter their fingerprints. Moreover, we find that the attributes of these altered fingerprints are often inconsistent with each other. We propose FP-Inconsistent, a data-driven approach to detect such inconsistencies across space (two attributes in a given browser fingerprint) and time (a single attribute at two different points in time). Our evaluation shows that our approach can reduce the evasion rate of evasive bots by 44.95%-48.11% while maintaining a true negative rate of 96.84% on traffic from real users.
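
A simplified example of a cross-attribute ("space") consistency rule of the kind FP-Inconsistent derives from data: the operating system claimed in the User-Agent string should match navigator.platform. The rule below is hand-written for illustration, not part of the paper's learned ruleset.

    def os_consistent(user_agent, platform):
        """Return False when the User-Agent OS contradicts navigator.platform."""
        ua, plat = user_agent.lower(), platform.lower()
        if "windows" in ua:
            return plat.startswith("win")
        if "mac os" in ua or "macintosh" in ua:
            return plat.startswith("mac")
        if "android" in ua:
            return "android" in plat or "linux" in plat
        if "linux" in ua:
            return "linux" in plat
        return True  # unknown OS: do not flag

    # A Windows User-Agent paired with a Linux platform is inconsistent.
    print(os_consistent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                        "Linux x86_64"))  # False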

An In-Depth Investigation of Data Collection in LLM App Ecosystems

LLM app (tool) ecosystems are rapidly evolving to support sophisticated use cases that often require extensive user data collection. Given that LLM apps are developed by third parties, and given anecdotal evidence of inconsistent policy enforcement by LLM platforms, sharing user data with these apps presents significant privacy risks. In this paper, we aim to bring transparency to the data practices of LLM app ecosystems. We examine OpenAI's GPT app ecosystem as a case study. We propose an LLM-based framework to analyze the natural-language specifications of GPT Actions (custom tools) and assess their data collection practices. Our analysis reveals that Actions collect excessive data across 24 categories and 145 data types, with third-party Actions collecting 6.03% more data on average. We find that several Actions violate OpenAI's policies by collecting sensitive information, such as passwords, which is explicitly prohibited. Lastly, we develop an LLM-based privacy policy analysis framework to automatically check the consistency of the data collected by Actions with the disclosures in their privacy policies. Our measurements indicate that disclosures for most collected data types are omitted, with only 5.8% of Actions clearly disclosing their data collection practices.
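
A minimal sketch of one step in such an analysis: flagging parameters in an Action's specification whose names suggest sensitive data. The keyword list and parameter names are illustrative, and the paper uses an LLM-based framework rather than keyword matching.

    SENSITIVE = {"password", "ssn", "passport", "credit_card", "api_key"}

    def flag_sensitive(param_names):
        """Return parameters whose names hint at sensitive data types."""
        return [p for p in param_names
                if any(s in p.lower() for s in SENSITIVE)]

    print(flag_sensitive(["query", "user_email", "password", "locale"]))
    # ['password']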

Chaos in the Chain: Evaluate Deployment and Construction Compliance of Web PKI Certificate Chain

Transport Layer Security (TLS) is a cornerstone of secure Internet communication and requires proper deployment and validation of certificate chains. During validation, clients must first construct the chain from server-provided certificates. However, existing research often folds chain construction into the broader validation process, lacking independent analysis of this crucial step. This paper presents the first systematic assessment of certificate chain construction, covering server-side deployment compliance and client-side capabilities. On the server side, we summarized structural requirements from RFC standards and evaluated real-world website compliance. We found that approximately 3% of Tranco Top 1M domains have deployed non-compliant chains, with common issues including reversed sequences and incomplete chains. Compliance is influenced by the checks and guidance that HTTP servers and Certificate Authorities provide during the configuration process. On the client side, we evaluated 9 types of chain-building capabilities across 8 mainstream TLS implementations, uncovering prevalent deficiencies such as inadequate backtracking and difficulties with long chains. These deficiencies can compromise TLS security, causing a fallback to insecure HTTP or making the service unavailable. Our findings highlight critical gaps in current certificate chain practices, and based on them we propose recommendations for improving the deployment and construction of certificate chains.
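
The ordering requirement at the heart of chain construction can be checked in a few lines with the cryptography library: each certificate in a server-supplied chain should be issued by the one that follows it. This sketch covers only ordering; real path building also verifies signatures and extensions and may backtrack. The file chain.pem is a placeholder.

    from cryptography import x509  # pip install cryptography

    def chain_in_order(pem_bytes):
        certs = x509.load_pem_x509_certificates(pem_bytes)
        # Leaf first, each certificate issued by its successor in the list.
        return all(certs[i].issuer == certs[i + 1].subject
                   for i in range(len(certs) - 1))

    with open("chain.pem", "rb") as f:
        print(chain_in_order(f.read()))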

SESSION: Short Papers

RemapRoute: Local Remapping of Internet Path Changes

Several systems rely on traceroute to track a large number of Internet paths as they change over time. Monitoring systems perform this task by remapping paths periodically or whenever a change is detected. This paper shows that such complete remapping is inefficient, because most path changes are localized to a few hops of a path. We develop RemapRoute, a tool to remap a path locally given the previously known path and a change point. RemapRoute sends targeted probes to locate and remap the often few hops that have changed. Our evaluation with trace-driven simulations and in a real deployment shows that local remapping reduces the average number of probes issued during remapping by 63% and 79%, respectively, when compared with complete remapping. At the same time, our results show that local remapping has little impact on the accuracy of inferred paths.
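
A hedged sketch of the local-remapping idea (not RemapRoute's actual algorithm): keep the old path up to the change point, then probe hop by hop until the new path rejoins the previously known one. Here probe_hop is a placeholder for a TTL-limited probe.

    def remap_locally(old_path, change_ttl, probe_hop, max_ttl=30):
        new_path = old_path[:change_ttl]  # prefix before the change is reused
        for ttl in range(change_ttl, max_ttl):
            hop = probe_hop(ttl)  # one targeted probe per TTL
            new_path.append(hop)
            if hop in old_path[ttl:]:  # rejoined the known path: stop probing
                rejoin = old_path.index(hop, ttl)
                return new_path + old_path[rejoin + 1:]
        return new_path  # never rejoined: the path was remapped end to end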

Do Spammers Dream of Electric Sheep? Characterizing the Prevalence of LLM-Generated Malicious Emails

The rapid adoption of large language models (LLMs) has fueled speculation that cybercriminals may utilize LLMs to improve and automate their attacks. However, so far, the security community has had only anecdotal evidence of attackers using LLMs, lacking large-scale data on the extent of real-world malicious LLM usage.

In this joint work between academic researchers and Barracuda Networks, we present the first large-scale study measuring AI-generated attacks in the wild. In particular, we focus on attackers' use of LLMs to craft the text of malicious emails, analyzing a corpus of hundreds of thousands of real-world malicious emails detected by Barracuda. The key challenge in this analysis is determining ground truth: we cannot know for certain whether an email is LLM- or human-generated. To overcome this challenge, we observe that, prior to the launch of ChatGPT, email text was almost certainly not LLM-generated. Armed with this insight, we run three state-of-the-art LLM detection methods on our corpus and calibrate them against pre-ChatGPT emails, as well as against a diverse set of LLM-generated emails we create ourselves.

Since the launch of ChatGPT, all three detection methods indicate that attackers have steadily increased their use of LLMs to generate emails, especially for spam. Using our most precise AI-detection method, we conservatively estimate that at least ~51% of spam emails and ~14% of business email compromise attacks in our dataset are generated using LLMs, as of April 2025. Finally, analyzing the text of LLM-generated emails, we find evidence that attackers use LLMs to "polish" their emails and to generate multiple versions of the same email message.
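
A hedged sketch of the calibration idea: choose a detector-score threshold that flags almost none of the pre-ChatGPT emails (assumed human-written), then apply it to later traffic. The scores below are invented, and the score-producing detectors are external to this sketch.

    def calibrate_threshold(pre_chatgpt_scores, max_false_rate=0.01):
        """Score exceeded by at most max_false_rate of known-human emails."""
        ranked = sorted(pre_chatgpt_scores)
        cut = int(len(ranked) * (1 - max_false_rate))
        return ranked[min(cut, len(ranked) - 1)]

    pre = [0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.40, 0.35, 0.20, 0.10]
    threshold = calibrate_threshold(pre)
    print(f"flag emails scoring above {threshold}")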

InternetSim: A Fast and Memory-Efficient Internet-Scale Inter-Domain Routing Simulator

Existing inter-domain routing simulators often suffer from low simulation accuracy, slow processing speeds, and high memory consumption, hindering their ability to perform large-scale Internet simulations. In this paper, we present InternetSim, a multi-threaded simulator for Internet-scale inter-domain routing. InternetSim enables incremental computation to handle policy changes by some ASes, significantly accelerating multi-iteration, context-continuous routing simulations. Memory efficiency is also improved through compact data structures for routing tables, and out-of-memory errors are prevented by offloading data to disk when necessary. Our simulator can complete an iteration of Internet-scale simulation within 13 hours (a 21× speedup over C-BGP) and an incremental-computation iteration within 15 seconds. Its memory requirement is only 45% of C-BGP's (without offloading), and it can run Internet-scale simulations on a server with only 64 GB of memory when offloading is always enabled.
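
For context, the route-preference logic at the core of any policy-aware inter-domain simulator follows the textbook Gao-Rexford model: prefer customer-learned routes over peer-learned over provider-learned, then shorter AS paths. The sketch below mirrors that standard model, not InternetSim's internal code.

    RELATION_PREF = {"customer": 0, "peer": 1, "provider": 2}

    def better(route_a, route_b):
        """Each route is (relation learned from, AS path); return the preferred one."""
        key = lambda r: (RELATION_PREF[r[0]], len(r[1]), r[1])
        return min(route_a, route_b, key=key)

    a = ("peer", [64500, 64496])
    b = ("customer", [64501, 64502, 64496])
    print(better(a, b))  # the customer route wins despite its longer AS path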

Lazy Eye Inspection: Capturing the State of Happy Eyeballs Implementations

While transitioning toward IPv6-only communication, many devices have settled on a dual-stack setup, in which both IPv4 and IPv6 are available for new connections. Happy Eyeballs (HE) describes a mechanism that prefers IPv6 for such hosts while ensuring a fast fallback to IPv4 when IPv6 fails. The IETF is currently working on the third version of HE. While the standards include recommendations for HE parameter choices, it is up to the client and OS to implement HE. In this paper, we investigate the state of HE in various clients, particularly web browsers and recursive resolvers. We introduce a framework to analyze and measure clients' HE implementations and parameter choices. According to our evaluation, only Safari supports all HE features. Safari is also the only client implementation in our study that uses a dynamic IPv4 connection attempt delay, a resolution delay, and interleaves addresses. We further show that, with Chrome and Firefox, problems with the DNS A record lookup can delay and even interrupt network connectivity despite a fully functional IPv6 setup. We operate a publicly available website (www.happy-eyeballs.net) that measures the browser's HE behavior, and we publish our testbed measurement framework.
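
A minimal Happy Eyeballs sketch using asyncio: try IPv6 first and fall back to IPv4 after a short connection attempt delay (250 ms is the RFC 8305 default). This simplification aborts the IPv6 attempt on timeout, whereas real implementations keep both attempts racing in parallel and also stagger DNS resolution and interleave multiple addresses per family.

    import asyncio
    import socket

    async def happy_eyeballs(host, port):
        loop = asyncio.get_running_loop()
        infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        v6 = [ai for ai in infos if ai[0] == socket.AF_INET6]
        v4 = [ai for ai in infos if ai[0] == socket.AF_INET]
        try:
            # Prefer IPv6, but cap the attempt at 250 ms before falling back.
            return await asyncio.wait_for(
                asyncio.open_connection(v6[0][4][0], port), timeout=0.25)
        except (IndexError, OSError, asyncio.TimeoutError):
            return await asyncio.open_connection(v4[0][4][0], port)

    reader, writer = asyncio.run(happy_eyeballs("www.happy-eyeballs.net", 443))
    print(writer.get_extra_info("peername"))
    writer.close()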