# SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems


## SESSION: Networking

### Curvature-based Analysis of Network Connectivity in Private Backbone Infrastructures

• Loqman Salamatian
• Scott Anderson
• Joshua Matthews
• Paul Barford
• Walter Willinger
• Mark Crovella

The main premise of this work is that since large cloud providers can and do manipulate probe packets that traverse their privately owned and operated backbones, standard traceroute-based measurement techniques are no longer a reliable means for assessing network connectivity in large cloud provider infrastructures. In response to these developments, we present a new empirical approach for elucidating private connectivity in today's Internet. Our approach relies on using only "light-weight" (i.e., simple, easily-interpretable, and readily available) measurements, but requires applying a "heavy-weight" or advanced mathematical analysis. In particular, we describe a new method for assessing the characteristics of network path connectivity that is based on concepts from Riemannian geometry (i.e., Ricci curvature) and also relies on an array of carefully crafted visualizations (e.g., a novel manifold view of a network's delay space). We demonstrate our method by utilizing latency measurements from RIPE Atlas anchors and virtual machines running in data centers of three large cloud providers to (i) study different aspects of connectivity in their private backbones and (ii) show how our manifold-based view enables us to expose and visualize critical aspects of this connectivity over different geographic scales.
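The paper applies Ricci curvature to a latency-derived delay space; as background only (not the authors' implementation), the discrete Ollivier-Ricci notion it builds on can be sketched on a toy unweighted graph, where curvature is positive when neighborhoods of adjacent nodes are easy to transport onto each other:

```python
from collections import deque
from itertools import permutations

def bfs_dist(adj, src):
    # Hop distances from src in an unweighted graph given as {node: [neighbors]}.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def w1_uniform(adj, xs, ys):
    # Wasserstein-1 distance between uniform distributions on xs and ys
    # (equal sizes assumed), by brute-force matching -- fine for toy graphs.
    d = {x: bfs_dist(adj, x) for x in xs}
    return min(sum(d[x][y] for x, y in zip(xs, p))
               for p in permutations(ys)) / len(xs)

def ollivier_ricci(adj, x, y):
    # kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y), with mu_v uniform on v's
    # neighbors; x and y are assumed adjacent here, so d(x, y) = 1.
    return 1.0 - w1_uniform(adj, sorted(adj[x]), sorted(adj[y]))

cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}    # 6-cycle: "flat"
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}  # K4: well-connected
print(ollivier_ricci(cycle, 0, 1))  # 0.0: no shortcut paths around the edge
print(ollivier_ricci(k4, 0, 1))     # ≈ 0.667: many short alternative paths
```

Intuitively, richly interconnected regions of a network (many short alternate paths) show positive curvature, while chain-like connectivity shows zero or negative curvature, which is what makes curvature a useful lens on backbone path diversity.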

### Automatic Inference of BGP Location Communities

• Brivaldo A. da Silva Jr.
• Paulo Mol
• Osvaldo Fonseca
• Ítalo Cunha
• Ronaldo A. Ferreira
• Ethan Katz-Bassett

We present a set of techniques to infer the semantics of BGP communities from public BGP data. Our techniques infer communities related to the entities or locations traversed by a route by correlating communities with AS paths. We also propose a set of heuristics to filter incorrect inferences introduced by misbehaving networks, sharing of BGP communities among sibling autonomous systems, and inconsistent BGP dumps. We apply our techniques to billions of routing records from public BGP collectors and make available a public database with more than 15 thousand location communities. Our comparison with manually-built databases shows our techniques provide high precision (up to 93%), better coverage (up to 81% recall), and dynamic updates, complementing operators' and researchers' abilities to reason about BGP community semantics.
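The community/AS-path correlation idea can be illustrated with a toy counter. This is a sketch under assumed data shapes, not the paper's algorithm: the `"asn:value"` community format, thresholds, and function name are all illustrative.

```python
from collections import defaultdict

def infer_location_communities(routes, min_support=10, min_precision=0.95):
    # routes: iterable of (as_path, communities) pairs; a community is an
    # "asn:value" string. Attribute a community to an AS when it co-occurs
    # with that AS on (almost) every route carrying it.
    seen = defaultdict(int)                      # community -> #routes
    co = defaultdict(lambda: defaultdict(int))   # community -> AS -> co-count
    for path, comms in routes:
        ases = set(path)
        for c in comms:
            seen[c] += 1
            for a in ases:
                co[c][a] += 1
    inferred = defaultdict(set)
    for c, n in seen.items():
        if n < min_support:
            continue  # too few observations to trust the correlation
        for a, k in co[c].items():
            if k / n >= min_precision:
                inferred[c].add(a)
    return dict(inferred)

# 12 routes carry 65000:100 on paths through AS 65000; one noisy route breaks
# the correlation with AS 3356 (12/13 < 0.95) but not with AS 65000 (13/13).
routes = [((3356, 2914, 65000), ["65000:100"])] * 12 + [((174, 65000), ["65000:100"])]
print(infer_location_communities(routes))  # {'65000:100': {65000}}
```

The support and precision thresholds play the role of the paper's filtering heuristics: rare communities and weak correlations (e.g., those introduced by misbehaving or sibling networks) are discarded.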

### Understanding I/O Direct Cache Access Performance for End Host Networking

• Minhu Wang
• Mingwei Xu
• Jianping Wu

Direct Cache Access (DCA) enables a network interface card (NIC) to load and store data directly on the processor cache, as conventional Direct Memory Access (DMA) is no longer suitable as the bridge between NIC and CPU in the era of 100 Gigabit Ethernet. As numerous I/O devices and cores compete for scarce cache resources, making the most of DCA for networking applications with varied objectives and constraints is a challenge, especially given the increasing complexity of modern cache hardware and I/O stacks. In this paper, we reverse engineer details of one commercial implementation of DCA, Intel's Data Direct I/O (DDIO), to explicate the importance of hardware-level investigation into DCA. Based on the learned knowledge of DCA and network I/O stacks, we (1) develop an analytical framework to predict the effectiveness of DCA (i.e., its hit rate) under certain hardware specifications, system configurations, and application properties; (2) measure penalties of the ineffective use of DCA (i.e., its miss penalty) to characterize its benefits; and (3) show that our reverse engineering, measurement, and model contribute to a deeper understanding of DCA, which in turn helps diagnose, optimize, and design end-host networking.
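As a back-of-the-envelope illustration of why DCA effectiveness hinges on in-flight buffer footprint versus cache allocation (a toy occupancy model of my own, not the paper's analytical framework; the two-way DDIO default and the way size are assumed figures used only for the example):

```python
def ddio_hit_rate_estimate(in_flight_bytes, ddio_ways=2, way_bytes=1536 * 1024):
    # Toy FIFO-occupancy model: packet data written by the NIC beyond the LLC
    # space DDIO may use is evicted to DRAM before the CPU reads it, so the
    # hit rate degrades once the in-flight footprint exceeds that capacity.
    capacity = ddio_ways * way_bytes
    return min(1.0, capacity / max(in_flight_bytes, 1))

rate_bytes_per_s = 100e9 / 8  # 100 GbE line rate in bytes/s
print(ddio_hit_rate_estimate(rate_bytes_per_s * 50e-6))  # 50 us CPU lag: all hits
print(ddio_hit_rate_estimate(rate_bytes_per_s * 2e-3))   # 2 ms lag: mostly misses
```

By Little's law, in-flight bytes grow with line rate times CPU processing lag, which is why 100 GbE makes the scarce DDIO allocation so easy to overflow.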

### Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

• Francesco Bronzino
• Paul Schmitt
• Sara Ayoubi
• Hyojoon Kim
• Renata Teixeira
• Nick Feamster

Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.

## SESSION: Streaming, Gaming, and the Decentralized Web

### Xatu: Richer Neural Network Based Prediction for Video Streaming

• Yun Seong Nam
• Jianfei Gao
• Chandan Bothra
• Ehab Ghabashneh
• Sanjay Rao
• Bruno Ribeiro
• Jibin Zhan
• Hui Zhang

The performance of Adaptive Bitrate (ABR) algorithms for video streaming depends on accurately predicting the download time of video chunks. Existing prediction approaches (i) assume chunk download times are dominated by network throughput; and (ii) cluster sessions a priori (e.g., based on ISP and CDN) and only learn from sessions in the same cluster. We make three contributions. First, through analysis of data from real-world video streaming sessions, we show (i) a priori clustering prevents learning from related clusters; and (ii) factors such as the Time to First Byte (TTFB) are key components of chunk download times but not easily incorporated into existing prediction approaches. Second, we propose Xatu, a new prediction approach that jointly learns a neural network sequence model with an interpretable automatic session clustering method. Xatu learns clustering rules across all sessions it deems relevant, and models sequences with multiple chunk-dependent features (e.g., TTFB) rather than just throughput. Third, evaluations using these datasets and emulation experiments show that Xatu significantly improves prediction accuracy, by 23.8% relative to CS2P (a state-of-the-art predictor). We show Xatu provides substantial performance benefits when integrated with multiple ABR algorithms, including MPC (a well-studied ABR algorithm) and FuguABR (a recent algorithm using stochastic control), relative to their default predictors (CS2P and a fully connected neural network, respectively). Further, Xatu combined with MPC outperforms Pensieve, an ABR algorithm based on deep reinforcement learning.
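The role of TTFB in the abstract can be seen with simple arithmetic (all numbers illustrative, not the paper's data): a throughput-only predictor systematically underestimates download times for small chunks, where latency dominates.

```python
def download_time_s(chunk_bytes, throughput_bps, ttfb_s):
    # Chunk download time = time to first byte + transfer time at the
    # achieved throughput.
    return ttfb_s + chunk_bytes * 8 / throughput_bps

chunk, tput = 500_000, 20e6          # 0.5 MB chunk over a 20 Mbit/s link
actual = download_time_s(chunk, tput, ttfb_s=0.3)
naive = download_time_s(chunk, tput, ttfb_s=0.0)  # throughput-only prediction
print(actual, naive)  # 0.5 s vs 0.2 s: TTFB is 60% of the real download time
```

For an ABR controller, a 2.5x error on per-chunk download time of this kind translates directly into misjudged bitrate choices, which is why modeling chunk-dependent features matters.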

### End-to-end Characterization of Game Streaming Applications on Mobile Platforms

• Sandeepa Bhuyan
• Shulin Zhao
• Ziyu Ying
• Mahmut T. Kandemir
• Chita R. Das

With the advent of 5G, hosting high-quality game streaming applications on mobile devices has become a reality. To our knowledge, no prior study systematically investigates the <QoS, Energy> tuple across the end-to-end game streaming pipeline, spanning the cloud, network, and edge devices, to understand the individual contributions of the different pipeline stages. In this paper, we present a comprehensive performance and power analysis of the entire game streaming pipeline through extensive measurements with a high-end workstation mimicking the cloud end, an open-source platform (Moonlight-GameStreaming) emulating the edge device/mobile platform, and two network settings (WiFi and 5G). Our study shows that the rendering and encoding stages at the cloud end are the bottlenecks for 4K streaming. While 5G is certainly more suitable for supporting enhanced video quality with 4K streaming, it consumes more power than WiFi. Further, the network interface and the decoder units in mobile devices need a more energy-efficient design to support high-quality games at a lower cost. These observations should help in designing more cost-effective future cloud gaming platforms.

### Dissecting Cloud Gaming Performance with DECAF

• Hassan Iqbal
• Ayesha Khalid

Cloud gaming platforms have witnessed tremendous growth over the past two years, with a number of large Internet companies, including Amazon, Facebook, Google, Microsoft, and Nvidia, publicly launching their own platforms. However, there is an absence of systematic performance measurement methodologies that can be generally applied. In this paper, we implement DECAF, a methodology to systematically analyze and dissect the performance of cloud gaming platforms across different game genres and game platforms. By applying DECAF, we measure the performance of Google Stadia, Amazon Luna, and Nvidia GeForce Now, and uncover a number of important findings: processing delays in the cloud comprise the majority of the total round-trip delay (≈73.54%), and the video streams delivered by these platforms are characterized by high variability in bitrate, frame rate, and resolution. Our work has important implications for cloud gaming platforms and opens the door for further research on measurement methodologies for cloud gaming.

### Toxicity in the Decentralized Web and the Potential for Model Sharing

• Haris Bin Zia
• Aravindh Raman
• Ignacio Castro
• Ishaku Hassan Anaobi
• Emiliano De Cristofaro
• Nishanth Sastry
• Gareth Tyson

The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data and the effort required for labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, achieving an average per-instance macro-F1 score of 0.89.
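For reference, the macro-F1 metric reported above averages per-class F1 scores, so a minority (toxic) class weighs equally with the majority class; a minimal sketch of the metric itself (not of ModPair):

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    # Macro-F1 = unweighted mean of per-class F1 scores.
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

print(macro_f1([0, 0, 0, 1], [0, 0, 1, 1]))  # ≈ 0.733
```

With a 3:1 class imbalance as in this toy example, plain accuracy (0.75) would hide the weaker minority-class performance that macro-F1 surfaces.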

## SESSION: Measurements and Security

### Understanding the Practices of Global Censorship through Accurate, End-to-End Measurements

• Lin Jin
• Shuai Hao
• Haining Wang
• Chase Cotton

It is challenging to conduct a large-scale Internet censorship measurement, as it involves triggering censors through artificial requests and identifying abnormalities from the corresponding responses. Due to the lack of ground truth on the expected responses from legitimate services, previous studies typically require heavy, unscalable manual inspection to identify false positives while still leaving false negatives undetected. In this paper, we propose Disguiser, a novel framework that enables end-to-end measurement to accurately detect censorship activities and reveal censor deployment without manual effort. The core of Disguiser is a control server that replies with a static payload to provide the ground truth of server responses. As such, we send requests from various types of vantage points across the world to our control server, and censorship activities can be recognized if a vantage point receives a different response. In particular, we design and conduct a cache test to pre-exclude the vantage points that could be interfered with by cache proxies along the network path. We then perform application traceroute towards our control server to explore censors' behaviors and their deployment. With Disguiser, we conduct 58 million measurements from vantage points in 177 countries. We observe 292 thousand censorship activities that block DNS, HTTP, or HTTPS requests inside 122 countries, achieving a 10⁻⁶ false positive rate and a zero false negative rate. Furthermore, Disguiser reveals the censor deployment in 13 countries.

### Monetizing Spare Bandwidth: The Case of Distributed VPNs

• Yunming Xiao
• Matteo Varvello
• Aleksandar Kuzmanovic

### MalRadar: Demystifying Android Malware in the New Era

• Liu Wang
• Haoyu Wang
• Ren He
• Ran Tao
• Guozhu Meng
• Xiapu Luo
• Xuanzhe Liu

A reliable and up-to-date malware dataset is critical for evaluating the effectiveness of malware detection approaches. Although there are several widely used malware benchmarks in our community (e.g., MalGenome, Drebin, Piggybacking, and AMD), these benchmarks face several limitations, including issues of timeliness, size, coverage, and reliability. In this paper, we first make an effort to create MalRadar, a growing and up-to-date Android malware dataset assembled in the most reliable way, i.e., by collecting malware based on the analysis reports of security experts. We crawled all the mobile-security-related reports released by ten leading security companies, and used an automated approach to extract and label the useful ones, i.e., those describing new Android malware and containing Indicators of Compromise (IoC) information. We have successfully compiled MalRadar, a dataset that contains 4,534 unique Android malware samples (including both apks and metadata) released from 2014 to April 2021 as of the writing of this paper, all of which were manually verified by security experts with detailed behavior analysis. We then characterize the MalRadar dataset in terms of malware distribution channels, app installation methods, malware activation, malicious behaviors, and anti-analysis techniques. We further investigate malware evolution over the last decade. Finally, we measure the effectiveness of commercial anti-virus engines and malware detection techniques in detecting malware in MalRadar. Our dataset can serve as a representative Android malware benchmark in the new era, and our observations can positively contribute to the community and inspire a series of studies on mobile security.
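The IoC-extraction step described above can be sketched with simple regular expressions. This is a toy of my own (real report parsing needs far more care), with illustrative patterns for SHA-256 hashes and Android package names:

```python
import re

SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")
# Dotted lowercase identifiers with at least three segments, e.g. com.foo.bar.
PKG = re.compile(r"\b(?:[a-z][a-z0-9_]*\.){2,}[a-z][a-z0-9_]*\b")

def extract_iocs(report_text):
    # Pull candidate Indicators of Compromise out of a security report.
    return {"sha256": set(SHA256.findall(report_text)),
            "packages": set(PKG.findall(report_text))}

report = ("The dropper com.example.fakeapp installs a payload with SHA-256 "
          + "ab" * 32 + ".")
print(extract_iocs(report))
```

In practice such candidates would then be resolved against app markets and VirusTotal-style services to fetch the actual samples, which is where the manual expert verification mentioned above comes in.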

### Trade or Trick?: Detecting and Characterizing Scam Tokens on Uniswap Decentralized Exchange

• Pengcheng Xia
• Haoyu Wang
• Bingyu Gao
• Weihang Su
• Zhou Yu
• Xiapu Luo
• Chao Zhang
• Xusheng Xiao
• Guoai Xu

### Cerberus: The Power of Choices in Datacenter Topology Design - A Throughput Perspective

• Chen Griner
• Johannes Zerwas
• Andreas Blenk
• Stefan Schmid
• Chen Avin

The bandwidth and latency requirements of modern datacenter applications have led researchers to propose various topology designs using static, dynamic demand-oblivious (rotor), and/or dynamic demand-aware switches. However, given the diverse nature of datacenter traffic, there is little consensus about how these designs would fare against each other. In this work, we analyze the throughput of existing topology designs under different traffic patterns and study their unique advantages and potential costs in terms of bandwidth and latency "tax". To overcome the identified inefficiencies, we propose Cerberus, a unified, two-layer leaf-spine optical datacenter design with three topology types. Cerberus systematically matches different traffic patterns with their most suitable topology type: e.g., latency-sensitive flows are transmitted via a static topology, all-to-all traffic via a rotor topology, and elephant flows via a demand-aware topology. We show analytically and in simulations that Cerberus can improve throughput significantly compared to alternative approaches and operate datacenters at higher loads while being throughput-proportional.

### Large-System Insensitivity of Zero-Waiting Load Balancing Algorithms

• Xin Liu
• Kang Gong
• Lei Ying

This paper studies the sensitivity (or insensitivity) of a class of load balancing algorithms that achieve asymptotic zero-waiting in the sub-Halfin-Whitt regime, named LB-zero. Most existing results on zero-waiting load balancing algorithms assume the service time distribution is exponential. This paper establishes the large-system insensitivity of LB-zero for jobs whose service time follows a Coxian distribution with a finite number of phases. This result suggests that LB-zero achieves asymptotic zero-waiting for a large class of service time distributions. To prove this result, this paper develops a new technique, called "Iterative State-Space Peeling" (ISSP for short). ISSP first identifies an iterative relation between the upper and lower bounds on the queue states and then proves that the system lives near the fixed point of the iterative bounds with high probability. Based on ISSP, the steady-state distribution of the system is further analyzed by applying Stein's method in the neighborhood of the fixed point. ISSP, like state-space collapse in heavy-traffic analysis, is a general approach that may be used to study other complex stochastic systems.
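For context on why zero-waiting is plausible at all in large systems, a classical mean-field result for power-of-d-choices load balancing (due to Vvedenskaya et al. 1996 and Mitzenmacher 1996, not the paper's LB-zero analysis) gives the stationary fraction of servers holding at least k jobs in closed form; sampling even d = 2 queues turns the geometric tail into a doubly exponential one:

```python
def tail(lam, d, k):
    # Mean-field stationary fraction of servers with >= k jobs under
    # power-of-d sampling at load lam, for d >= 2: lam^((d^k - 1)/(d - 1)).
    return lam ** ((d ** k - 1) / (d - 1))

lam = 0.9
print([round(lam ** k, 4) for k in range(1, 6)])         # d = 1: geometric decay
print([round(tail(lam, 2, k), 4) for k in range(1, 6)])  # d = 2: doubly exponential
```

At load 0.9 the d = 2 tail at k = 5 is already below 4%, versus about 59% for random assignment; LB-zero-style policies sharpen this further so that the waiting probability itself vanishes as the system grows.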

### Mean Field and Refined Mean Field Approximations for Heterogeneous Systems: It Works!

• Sebastian Allmeier
• Nicolas Gast

Mean field approximation is a powerful technique to study the performance of large stochastic systems represented as n interacting objects. Applications include load balancing models, epidemic spreading, cache replacement policies, and large-scale data centers. Mean field approximation is asymptotically exact for systems composed of n homogeneous objects under mild conditions. In this paper, we study what happens when objects are heterogeneous. This can represent servers with different speeds or contents with different popularities. We define an interaction model that allows obtaining asymptotic convergence results for stochastic systems with heterogeneous object behavior, and show that the error of the mean field approximation is of order O(1/n). More importantly, we show how to adapt the refined mean field approximation, developed by Gast et al. (2019), and show that the error of this approximation is reduced to O(1/n²). To illustrate the applicability of our result, we present two examples. The first addresses a list-based cache replacement model, RANDOM(m), which is an extension of the RANDOM policy. The second is a heterogeneous supermarket model. These examples show that the proposed approximations are computationally tractable and very accurate. For moderate system sizes (n ≈ 30) the refined mean field approximation tends to be more accurate than simulations for any reasonable simulation time.
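As a minimal homogeneous example of the technique the paper extends (a toy SIS-style interaction model of my own, not the paper's heterogeneous setting): the mean field approximation replaces the n-object stochastic system by a deterministic ODE on the fraction of objects in each state, which here can be integrated and checked against its known fixed point.

```python
def mean_field_fixed_point(beta, gamma, x0=0.1, dt=1e-3, steps=200_000):
    # Euler integration of dx/dt = beta*x*(1-x) - gamma*x, the mean field ODE
    # for the fraction x of objects in the "active" state of a toy pairwise
    # interaction model (rates beta, gamma are illustrative).
    x = x0
    for _ in range(steps):
        x += dt * (beta * x * (1 - x) - gamma * x)
    return x

# The stable fixed point is 1 - gamma/beta whenever beta > gamma.
print(mean_field_fixed_point(2.0, 1.0))  # ≈ 0.5
```

For finite n, the expected fraction in the stochastic system deviates from this ODE value by O(1/n); the refined approximation studied in the paper adds a computable correction term that shrinks the error to O(1/n²).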

## SESSION: Optimization II

### Asymptotic Convergence Rate of Dropout on Shallow Linear Neural Networks

• Albert Senen-Cerda
• Jaron Sanders

We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect when applying them to shallow linear neural networks (NNs), which can also be viewed as performing matrix factorization with a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid co-adaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
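The "dropout as regularization" view admits a small exact check. This is the standard identity for linear models (see e.g. Srivastava et al.'s dropout paper), not this paper's gradient-flow analysis: with i.i.d. Bernoulli(p) keep-masks, the expected squared loss equals the loss of the p-scaled predictor plus a data-dependent penalty.

```python
from itertools import product

def expected_dropout_loss(w, x, y, p):
    # Exact expectation of (y - <m*w, x>)^2 over i.i.d. Bernoulli(p) masks m,
    # by enumerating all 2^d masks (fine for tiny d).
    total = 0.0
    for mask in product([0, 1], repeat=len(w)):
        prob, pred = 1.0, 0.0
        for m, wi, xi in zip(mask, w, x):
            prob *= p if m else (1 - p)
            pred += m * wi * xi
        total += prob * (y - pred) ** 2
    return total

def closed_form(w, x, y, p):
    # (y - p<w, x>)^2 + p(1-p) * sum_i (w_i x_i)^2: scaled loss + regularizer.
    dot = sum(wi * xi for wi, xi in zip(w, x))
    reg = p * (1 - p) * sum((wi * xi) ** 2 for wi, xi in zip(w, x))
    return (y - p * dot) ** 2 + reg

w, x, y, p = [1.0, -2.0, 0.5], [0.3, 1.1, -0.7], 0.4, 0.8
print(expected_dropout_loss(w, x, y, p), closed_form(w, x, y, p))  # equal
```

The identity follows from E[(y - Z)²] = (y - E[Z])² + Var[Z] applied to the masked prediction Z; the variance term is the regularizer that discourages co-adapted weights.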

### Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions

• Tongxin Li
• Ruixiao Yang
• Guannan Qu
• Guanya Shi
• Chenkai Yu
• Steven Low

We study the problem of learning-augmented predictive linear quadratic control. Our goal is to design a controller that balances "consistency", which measures the competitive ratio when predictions are accurate, and "robustness", which bounds the competitive ratio when predictions are inaccurate. We propose a novel λ-confident controller and prove that it maintains a competitive ratio upper bound of 1 + min{O(λ²ε) + O((1-λ)²), O(1) + O(λ²)}, where λ ∈ [0,1] is a trust parameter set based on the confidence in the predictions, and ε is the prediction error. Further, motivated by online learning methods, we design a self-tuning policy that adaptively learns the trust parameter λ, with a competitive ratio that depends on ε and the variation of system perturbations and predictions. We show that its competitive ratio is bounded from above by 1 + O(ε)/(Θ(1) + Θ(ε)) + O(μ_Var), where μ_Var measures the variation of perturbations and predictions. This implies that, by automatically adjusting the trust parameter online, the self-tuning scheme ensures a competitive ratio that does not scale up with the prediction error ε.

### Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization

• Zaiwei Chen
• Shancong Mou
• Siva Theja Maguluri

Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are workhorses for modern machine learning. Their constant stepsize variants are preferred in practice due to their fast convergence behavior. However, constant stepsize SA algorithms do not converge to the optimal solution, but instead have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithms with a smooth and strongly convex objective, (2) linear SA algorithms involving a Hurwitz matrix, and (3) nonlinear SA algorithms involving a contractive operator. When the iterate is scaled by 1/√α, where α is the constant stepsize, we show that the limiting scaled stationary distribution is a solution of an implicit equation. Under a uniqueness assumption (which can be removed in certain settings) on this equation, we further characterize the limiting distribution as a Gaussian distribution whose covariance matrix is the unique solution of an appropriate Lyapunov equation. For SA algorithms beyond these cases, our numerical experiments suggest that, unlike central limit theorem type results: (1) the scaling factor need not be 1/√α, and (2) the limiting distribution need not be Gaussian. Based on the numerical study, we come up with a heuristic formula to determine the right scaling factor, and make a connection to the Euler-Maruyama discretization scheme for approximating stochastic differential equations.
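The 1/√α scaling can be checked in closed form for the simplest case (a scalar quadratic with additive gradient noise; my own illustration, not the paper's general setting): the variance recursion of the iterates has an explicit fixed point ασ²/(2-α), so the variance of the iterate scaled by 1/√α tends to the constant σ²/2 as α → 0.

```python
def stationary_variance(alpha, sigma2, tol=1e-15):
    # Fixed point of v = (1 - alpha)^2 * v + alpha^2 * sigma2, the variance
    # recursion of x_{t+1} = x_t - alpha * (x_t + noise) at stationarity,
    # where the zero-mean gradient noise has variance sigma2.
    v = 0.0
    while True:
        nv = (1 - alpha) ** 2 * v + alpha ** 2 * sigma2
        if abs(nv - v) < tol:
            return nv
        v = nv

sigma2 = 1.0
for alpha in [0.1, 0.01, 0.001]:
    print(alpha, stationary_variance(alpha, sigma2) / alpha)  # -> sigma2/2
```

This matches the Gaussian limit described above: in this scalar case the Lyapunov equation reduces to 2v = ασ² + O(α²), i.e. scaled variance σ²/2.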

### Learning To Maximize Welfare with a Reusable Resource

• Matthew Faw
• Constantine Caramanis
• Sanjay Shakkottai

## SESSION: Miscellaneous

### Free2Shard: Adversary-resistant Distributed Resource Allocation for Blockchains

• Ranvir Rana
• Sreeram Kannan
• David Tse
• Pramod Viswanath

In this paper, we formulate and study a new, but basic, distributed resource allocation problem arising in scaling blockchain performance. While distributed resource allocation is a well-studied problem in networking, the blockchain setting additionally requires the solution to be resilient to adversarial behavior from a fraction of nodes. Scaling blockchain performance is a basic research topic; a plethora of solutions (under the umbrella of sharding) have been proposed in recent years. Although the various sharding solutions share a common thread (they cryptographically stitch together multiple parallel chains), architectural differences lead to differing resource allocation problems. In this paper we make three main contributions: (a) we categorize the different sharding proposals under a common architectural framework, allowing for the emergence of a new, uniformly improved, uni-consensus sharding architecture. (b) We formulate and exactly solve a core resource allocation problem in the uni-consensus sharding architecture -- our solution, Free2Shard, is adversary-resistant and achieves optimal throughput. The key technical contribution is a mathematical connection to the classical work of Blackwell approachability in dynamic game theory. (c) We implement the sharding architecture atop a full-stack blockchain in 3000 lines of Rust code -- we achieve a throughput of more than 250,000 transactions per second with 6 shards, a vast improvement over the state of the art.

### Age-Dependent Differential Privacy

• Meng Zhang
• Ermin Wei
• Randall Berry
• Jianwei Huang

The proliferation of real-time applications has motivated extensive research on analyzing and optimizing data freshness in the context of age of information. However, classical frameworks of privacy (e.g., differential privacy (DP)) have overlooked the impact of data freshness on privacy guarantees, and hence may lead to unnecessary accuracy loss when trying to achieve meaningful privacy guarantees in time-varying databases. In this work, we introduce age-dependent DP, taking into account the underlying stochastic nature of a time-varying database. In this new framework, we establish a connection between classical DP and age-dependent DP, based on which we characterize the impact of data staleness and temporal correlation on privacy guarantees. Our characterization demonstrates that aging, i.e., using stale data inputs and/or postponing the release of outputs, can be a new strategy to protect data privacy in addition to noise injection in the traditional DP framework. Furthermore, to generalize our results to a multi-query scenario, we present a sequential composition result for age-dependent DP. We then characterize and achieve the optimal tradeoffs between privacy risk and utility. Finally, case studies show that, when achieving a target of an arbitrarily small privacy risk in a single-query case, the approach of combining aging and noise injection can achieve a bounded accuracy loss, whereas using noise injection only (as in the DP benchmark) will lead to an unbounded accuracy loss.

### Unleashing the Power of Paying Multiplexing Only Once in Stochastic Network Calculus

• Anne Bouillard
• Paul Nikolaus
• Jens Schmitt

The stochastic network calculus (SNC) holds promise as a versatile and uniform framework to calculate probabilistic performance bounds in networks of queues. A great challenge for accurate bounds and efficient calculations is the stochastic dependence between flows due to resource sharing inside the network. However, by carefully utilizing the basic SNC concepts in the network analysis, the necessity of taking these dependencies into account can be minimized. To that end, we unleash the power of the pay multiplexing only once principle (PMOO, known from the deterministic network calculus) in the SNC analysis. We choose an analytic combinatorics presentation of the results in order to ease complex calculations. In tree-reducible networks, a subclass of general feedforward networks, we obtain an effective analysis in terms of avoiding the need to take internal flow dependencies into account. In a comprehensive numerical evaluation, we demonstrate how this unleashed PMOO analysis can significantly reduce the known gap between simulations and SNC calculations, and how it favourably compares to state-of-the-art SNC calculations in terms of accuracy and computational effort. Motivated by these promising results, we also consider general feedforward networks, where some flow dependencies have to be taken into account. To that end, the unleashed PMOO analysis is extended to the partially dependent case and a case study of a canonical topology, known as the diamond network, is provided, again displaying favourable results over the state of the art.
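For background on the calculus underlying the paper (the deterministic, not the stochastic, setting; all numbers illustrative): a token-bucket constrained flow (burst b, sustained rate r) traversing a rate-latency server (rate R ≥ r, latency T) has a classical worst-case delay bound of T + b/R.

```python
def delay_bound_s(b_bits, r_bps, R_bps, T_s):
    # Deterministic network calculus: token-bucket arrival curve b + r*t
    # through a rate-latency service curve R*max(t - T, 0) gives a
    # worst-case delay of T + b/R (stability requires R >= r).
    assert R_bps >= r_bps, "service rate must cover the arrival rate"
    return T_s + b_bits / R_bps

# 15 kbit burst, 10 Mbit/s sustained rate, through a 100 Mbit/s server
# with 1 ms latency: worst-case delay = 1 ms + 0.15 ms.
print(delay_bound_s(15_000, 10e6, 100e6, 0.001))
```

SNC replaces these hard curves with probabilistic envelopes, and the PMOO principle discussed above concerns how cross-traffic multiplexing is charged only once when such per-hop bounds are composed along a path.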