The ACM Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented during the ACM SIGMETRICS/Performance 2022 conference.
The issue contains papers selected by the editorial board via a rigorous review process that follows a hybrid conference and journal model, with reviews conducted by the 93 members of our POMACS editorial board. Each paper was either conditionally accepted (and shepherded), allowed a "one-shot" revision (to be resubmitted to one of the subsequent two deadlines), or rejected (with resubmission allowed after a year). For this issue, which represents the summer deadline, POMACS publishes 17 papers out of 71 submissions. All submitted papers received at least 3 reviews and we held an online TPC meeting.
Based on the indicated primary track, roughly 37% of the submissions were in the Theory track, 30% were in the Measurement & Applied Modeling track, 20% were in the Systems track, and 14% were in the Learning track.
Many people contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their best work to SIGMETRICS/POMACS. Second, we would like to thank the TPC members who provided constructive feedback in their reviews to authors and participated in the online discussions and TPC meetings. We also thank the several external reviewers who provided their expert opinion on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Giuliano Casale, and to past TPC Chairs, Anshul Gandhi, Negar Kiyavash, and Jia Wang, who provided a wealth of information and guidance (including a template for writing this editorial note!). Finally, we are grateful to the Organization Committee and to the SIGMETRICS Board for their ongoing efforts and initiatives for creating an exciting program for ACM SIGMETRICS/Performance 2022.
In this paper, we study the online multidimensional knapsack problem (called OMdKP) in which there is a knapsack whose capacity is represented in m dimensions, each dimension could have a different capacity. Then, n items with different scalar profit values and m-dimensional weights arrive in an online manner and the goal is to admit or decline items upon their arrival such that the total profit obtained by admitted items is maximized and the capacity of knapsack across all dimensions is respected. This is a natural generalization of the classic single-dimension knapsack problem and finds several relevant applications such as in virtual machine allocation, job scheduling, and all-or-nothing flow maximization over a graph. We develop two algorithms for OMdKP that use linear and exponential reservation functions to make online admission decisions. Our competitive analysis shows that the linear and exponential algorithms achieve the competitive ratios of O(θα ) and O(łogł(θα)), respectively, where α is the ratio between the aggregate knapsack capacity and the minimum capacity over a single dimension and θ is the ratio between the maximum and minimum item unit values. We also characterize a lower bound for the competitive ratio of any online algorithm solving OMdKP and show that the competitive ratio of our algorithm with exponential reservation function matches the lower bound up to a constant factor.
Cloud gaming platforms have witnessed tremendous growth over the past two years with a number of large Internet companies including Amazon, Facebook, Google, Microsoft, and Nvidia publicly launching their own platforms. While cloud gaming platforms continue to grow, the visibility in their performance and relative comparison is lacking. This is largely due to absence of systematic measurement methodologies which can generally be applied. As such, in this paper, we implement DECAF, a methodology to systematically analyze and dissect the performance of cloud gaming platforms across different game genres and game platforms. DECAF is highly automated and requires minimum manual intervention. By applying DECAF, we measure the performance of three commercial cloud gaming platforms including Google Stadia, Amazon Luna, and Nvidia GeForceNow, and uncover a number of important findings. First, we find that processing delays in the cloud comprise majority of the total round trip delay experienced by users, accounting for as much as 73.54% of total user-perceived delay. Second, we find that video streams delivered by cloud gaming platforms are characterized by high variability of bitrate, frame rate, and resolution. Platforms struggle to consistently serve 1080p/60 frames per second streams across different game genres even when the available bandwidth is 8-20× that of platform's recommended settings. Finally, we show that game platforms exhibit performance cliffs by reacting poorly to packet losses, in some cases dramatically reducing the delivered bitrate by up to 6.6× when loss rates increase from 0.1% to 1%. Our work has important implications for cloud gaming platforms and opens the door for further research on comprehensive measurement methodologies for cloud gaming.
The Real Time Bidding (RTB) protocol is by now more than a decade old. During this time, a handful of measurement papers have looked at bidding strategies, personal information flow, and cost of display advertising through RTB. In this paper, we present YourAdvalue, a privacy-preserving tool for displaying to end-users in a simple and intuitive manner their advertising value as seen through RTB. Using YourAdvalue, we measure desktop RTB prices in the wild, and compare them with desktop and mobile RTB prices reported by past work. We present how it estimates ad prices that are encrypted, and how it preserves user privacy while reporting results back to a data-server for analysis. We deployed our system, disseminated its browser extension, and collected data from 200 users, including 12000 ad impressions over 11 months. By analyzing this dataset, we show that desktop RTB prices have grown 4.6x over desktop RTB prices measured in 2013, and 3.8x over mobile RTB prices measured in 2015. We also study how user demographics associate with the intensity of RTB ecosystem tracking, leading to higher ad prices. We find that exchanging data between advertisers and/or data brokers through cookie-synchronization increases the median value of display ads by 19%. We also find that female and younger users are more targeted, suffering more tracking (via cookie synchronization) than male or elder users. As a result of this targeting in our dataset, the advertising value (i) of women is 2.4x higher than that of men, (ii) of 25-34 year-olds is 2.5x higher than that of 35-44 year-olds, (iii) is most expensive on weekends and early mornings.
Flash-based solid state drives (SSDs) have gained a central role in the infrastructure of large-scale datacenters, as well as in commodity servers and personal devices. The main limitation of flash media is its inability to support update-in-place: after data has been written to a physical location, it has to be erased before new data can be written to it. Moreover, SSDs support read and write operations in granularity of pages, while erasures are performed on entire blocks, which often contain hundreds of pages. When erasing a block, any valid data it stores must be rewritten to a clean location. As an SSD eventually wears out with progressing number of erasures, the efficiency of the management algorithm has a significant impact on its endurance. In this paper we first formally define the SSD management problem. We then explore this problem from an algorithmic perspective, considering it in both offline and online settings. In the offline setting, we present a near-optimal algorithm that, given any input, performs a negligible number of rewrites (relative to the input length). We also discuss the hardness of the offline problem. In the online setting, we first consider algorithms that have no prior knowledge about the input. We prove that no deterministic algorithm outperforms the greedy algorithm in this setting, and discuss the possible benefit of randomization. We then augment our model, assuming that each request for a page arrives with a prediction of the next time the page is updated. We design an online algorithm that uses such predictions, and show that its performance improves as the prediction error decreases. We also show that the performance of our algorithm is never worse than that guaranteed by the greedy algorithm, even when the prediction error is large. We complement our theoretical findings with an empirical evaluation of our algorithms, comparing them with the state-of-the-art scheme. The results confirm that our algorithms exhibit an improved performance for a wide range of input traces.
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity --- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
We study a cache network under arbitrary adversarial request arrivals. We propose a distributed online policy based on the online tabular greedy algorithm. Our distributed policy achieves sublinear (1-1/e)-regret, also in the case when update costs cannot be neglected. Numerical evaluation over several topologies supports our theoretical results and demonstrates that our algorithm outperforms state-of-art online cache algorithms.
We consider a simple form of pricing for a crowdsourcing system, where pricing policy is published a priori, and workers then decide their task acceptance. Such a pricing form is widely adopted in practice for its simplicity, e.g., Amazon Mechanical Turk, although additional sophistication to pricing rule can enhance budget efficiency. With the goal of designing efficient and simple pricing rules, we study the impact of the following two design features in pricing policies: (i) personalization tailoring policy worker-by-worker and (ii) bonus payment to qualified task completion. In the Bayesian setting, where the only prior distribution of workers' profiles is available, we first study the Price of Agnosticism (PoA) that quantifies the utility gap between personalized and common pricing policies. We show that PoA is bounded within a constant factor under some mild conditions, and the impact of bonus is essential in common pricing. These analytic results imply that complex personalized pricing can be replaced by simple common pricing once it is equipped with a proper bonus payment. To provide insights on efficient common pricing, we then study the efficient mechanisms of bonus payment for several profile distribution regimes which may exist in practice. We provide primitive experiments on Amazon Mechanical Turk, which support our analytical findings.
We study the optimal bids and allocations in a real-time auction for heterogeneous items subject to the requirement that specified collections of items of given types be acquired within given time constraints. The problem is cast as a continuous time optimization problem that can, under certain weak assumptions, be reduced to a convex optimization problem. Focusing on the standard first and second price auctions, we first show, using convex duality, that the optimal (infinite dimensional) bidding policy can be represented by a single finite vector of so-called ''pseudo-bids''. Using this result we are able to show that the optimal solution in the second price case turns out to be a very simple piecewise constant function of time. This contrasts with the first price case that is more complicated. Despite the fact that the optimal solution for the first price auction is genuinely dynamic, we show that there remains a close connection between the two cases and that, empirically, there is almost no difference between optimal behavior in either setting. This suggests that it is adequate to bid in a first price auction as if it were in fact second price. Finally, we detail methods for implementing our bidding policies in practice with further numerical simulations illustrating the performance.
The bandwidth and latency requirements of modern datacenter applications have led researchers to propose various topology designs using static, dynamic demand-oblivious (rotor), and/or dynamic demand-aware switches. However, given the diverse nature of datacenter traffic, there is little consensus about how these designs would fare against each other. In this work, we analyze the throughput of existing topology designs under different traffic patterns and study their unique advantages and potential costs in terms of bandwidth and latency ''tax''. To overcome the identified inefficiencies, we propose Cerberus, a unified, two-layer leaf-spine optical datacenter design with three topology types. Cerberus systematically matches different traffic patterns with their most suitable topology type: e.g., latency-sensitive flows are transmitted via a static topology, all-to-all traffic via a rotor topology, and elephant flows via a demand-aware topology. We show analytically and in simulations that Cerberus can improve throughput significantly compared to alternative approaches and operate datacenters at higher loads while being throughput-proportional.
The prosperity of the cryptocurrency ecosystem drives the need for digital asset trading platforms. Beyond centralized exchanges (CEXs), decentralized exchanges (DEXs) are introduced to allow users to trade cryptocurrency without transferring the custody of their digital assets to the middlemen, thus eliminating the security and privacy issues of traditional CEX. Uniswap, as the most prominent cryptocurrency DEX, is continuing to attract scammers, with fraudulent cryptocurrencies flooding in the ecosystem. In this paper, we take the first step to detect and characterize scam tokens on Uniswap. We first collect all the transactions related to Uniswap V2 exchange and investigate the landscape of cryptocurrency trading on Uniswap from different perspectives. Then, we propose an accurate approach for flagging scam tokens on Uniswap based on a guilt-by-association heuristic and a machine-learning powered technique. We have identified over 10K scam tokens listed on Uniswap, which suggests that roughly 50% of the tokens listed on Uniswap are scam tokens. All the scam tokens and liquidity pools are created specialized for the "rug pull" scams, and some scam tokens have embedded tricks and backdoors in the smart contracts. We further observe that thousands of collusion addresses help carry out the scams in league with the scam token/pool creators. The scammers have gained a profit of at least $16 million from 39,762 potential victims. Our observations in this paper suggest the urgency to identify and stop scams in the decentralized finance ecosystem, and our approach can act as a whistleblower that identifies scam tokens at their early stages.
Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10~Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
Proof-of-Work (PoW) based blockchains typically allocate only a tiny fraction (e.g., less than 1% for Ethereum) of the average interarrival time (I) between blocks for validating smart contracts present in transactions. In such systems, block validation and PoW mining are typically performed sequentially, the former by CPUs and the latter by ASICs. A trivial increase in validation time (τ) introduces the popularly known Verifier's Dilemma, and as we demonstrate, causes more forking and hurts fairness. Large τ also reduces the tolerance for safety against a Byzantine adversary. Solutions that offload validation to a set of non-chain nodes (a.k.a. off-chain approaches) suffer from trust and performance issues that are non-trivial to resolve. In this paper, we present Tuxedo, the first on-chain protocol to theoretically scale τ/I ≈1 in PoW blockchains. The key innovation in Tuxedo is to perform CPU-based block processing in parallel to ASIC mining. We achieve this by allowing miners to delay validation of transactions in a block by up to ζ blocks, where ζ is a system parameter. We perform security analysis of Tuxedo considering all possible adversarial strategies in a synchronous network with maximum end-to-end delay Δ and demonstrate that Tuxedo achieves security equivalent to known results for longest chain PoW Nakamoto consensus. Our prototype implementation of Tuxedo atop Ethereum demonstrates that it can scale τ without suffering the harmful effects of naive scaling up of τ/I in existing blockchains
As a first step of designing O ptical-circuit-switched D ata C enters (ODC), physical topology design is critical as it determines the scalability and the performance limit of the entire ODC. However, prior works on ODC have not yet paid much attention to physical topology design, and the adopted physical topologies either scale poorly, or lack performance guarantee. We offer a mathematical foundation for the design and performance analysis of ODC physical topologies in this paper. We introduce a new performance metric β(G ) to evaluate the gap between a physical topology G and the ideal physical topology. We develop a coupling technique that bypasses a significant amount of computational complexity of calculating β(G). Using β(G ) and the coupling technique, we study four physical topologies that are representative of those in literature, analyze their scalabilities and prove their performance guarantees. Our analysis may provide new guidance for network operators to design better physical topologies for their ODCs.
It is challenging to conduct a large scale Internet censorship measurement, as it involves triggering censors through artificial requests and identifying abnormalities from corresponding responses. Due to the lack of ground truth on the expected responses from legitimate services, previous studies typically require a heavy, unscalable manual inspection to identify false positives while still leaving false negatives undetected. In this paper, we propose Disguiser, a novel framework that enables end-to-end measurement to accurately detect the censorship activities and reveal the censor deployment without manual efforts. The core of Disguiser is a control server that replies with a static payload to provide the ground truth of server responses. As such, we send requests from various types of vantage points across the world to our control server, and the censorship activities can be recognized if a vantage point receives a different response. In particular, we design and conduct a cache test to pre-exclude the vantage points that could be interfered by cache proxies along the network path. Then we perform application traceroute towards our control server to explore censors' behaviors and their deployment. With Disguiser, we conduct 58 million measurements from vantage points in 177 countries. We observe 292 thousand censorship activities that block DNS, HTTP, or HTTPS requests inside 122 countries, achieving a 10^-6 false positive rate and zero false negative rate. Furthermore, Disguiser reveals the censor deployment in 13 countries.
The performance of Adaptive Bitrate (ABR) algorithms for video streaming depends on accurately predicting the download time of video chunks. Existing prediction approaches (i) assume chunk download times are dominated by network throughput; and (ii) apriori cluster sessions (e.g., based on ISP and CDN) and only learn from sessions in the same cluster. We make three contributions. First, through analysis of data from real-world video streaming sessions, we show (i) apriori clustering prevents learning from related clusters; and (ii) factors such as the Time to First Byte (TTFB) are key components of chunk download times but not easily incorporated into existing prediction approaches. Second, we propose Xatu, a new prediction approach that jointly learns a neural network sequence model with an interpretable automatic session clustering method. Xatu learns clustering rules across all sessions it deems relevant, and models sequences with multiple chunk-dependent features (e.g., TTFB) rather than just throughput. Third, evaluations using the above datasets and emulation experiments show that Xatu significantly improves prediction accuracies by 23.8% relative to CS2P (a state-of-the-art predictor). We show Xatu provides substantial performance benefits when integrated with multiple ABR algorithms including MPC (a well studied ABR algorithm), and FuguABR (a recent algorithm using stochastic control) relative to their default predictors (CS2P and a fully connected neural network respectively). Further, Xatu combined with MPC outperforms Pensieve, an ABR based on deep reinforcement learning.