Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Vol. 9, No. 1, 2025


POMACS V9, N1, March 2025 Editorial

The Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented at the ACM SIGMETRICS 2025 conference on June 9-13, 2025, in Stony Brook, New York, USA.

The papers in this issue were selected during the fall submission round by the 94 members of the ACM SIGMETRICS 2025 program committee via a rigorous review process. Each paper was either conditionally accepted (and shepherded), allowed to be resubmitted to one of the two subsequent SIGMETRICS deadlines, or rejected (with resubmission allowed only after a year).

POMACS is publishing 14 papers out of 113 submissions in this issue, of which 2 had previously received a resubmit decision. One accepted paper opted out of publication in POMACS. Submissions typically receive at least three reviews. While most papers were decided in the online discussion phase, borderline cases were extensively discussed during the online program committee meeting. Based on the indicated track(s), roughly 25% of the submissions were in the Theory track, 25% were in the Measurement & Applied Modeling track, 31% were in the Systems track, and 19% were in the Learning track.

Many individuals contributed to the success of this issue of POMACS. We would like to thank the authors, who submitted their best work to SIGMETRICS/POMACS. We would also like to thank the program committee members, who provided constructive feedback to authors in their reviews and participated in the online discussions and the program committee meeting. Further, we thank the several external reviewers who provided their expert opinions on specific submissions that required additional input. Finally, we are also grateful to the ACM SIGMETRICS Chair, Mor Harchol-Balter, the SIGMETRICS Organization Committee, and the SIGMETRICS Executive Committee for their ongoing efforts and initiatives in creating an exciting program for ACM SIGMETRICS 2025.

We hope that you enjoy reading the papers in this POMACS issue, and we look forward to your continued submissions to the ACM SIGMETRICS conference and the POMACS journal.

Allocating Public Goods via Dynamic Max-Min Fairness: Long-Run Behavior and Competitive Equilibria

Dynamic max-min fair allocation (DMMF) is a simple and popular mechanism for the repeated allocation of a shared resource among competing agents: in each round, each agent chooses whether or not to request the resource, which is then allocated to the requesting agent who has received the fewest allocations so far. Recent work has shown that under DMMF, a simple threshold-based request policy enjoys surprisingly strong robustness properties, wherein each agent can realize a significant fraction of her optimal utility irrespective of how the other agents behave. While this goes some way towards mitigating a 'tragedy of the commons' outcome, the robust policies require that an agent defend against arbitrary (possibly adversarial) behavior by other agents. This may be overly pessimistic in real-world settings, where other agents are selfish optimizers rather than adversaries. Consequently, robust guarantees give no insight into how agents behave in an equilibrium, or whether outcomes improve under one.
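To make the mechanism concrete, here is a minimal sketch of one DMMF round together with a fixed threshold-style request policy; the variable names and the specific threshold rule are our own illustration, not the paper's construction.

```python
import random

def dmmf_round(counts, requests):
    """One DMMF round: among the requesting agents, allocate the resource
    to the agent with the fewest allocations so far (ties broken uniformly)."""
    requesters = [i for i, wants in enumerate(requests) if wants]
    if not requesters:
        return None  # nobody requested this round
    fewest = min(counts[i] for i in requesters)
    winner = random.choice([i for i in requesters if counts[i] == fewest])
    counts[winner] += 1
    return winner

# Illustrative fixed threshold-style policy: agent i requests whenever her
# share of allocations so far is below a personal target rate lam[i].
counts, lam, rounds = [0, 0, 0], [0.5, 0.3, 0.4], 1000
for t in range(1, rounds + 1):
    requests = [counts[i] < lam[i] * t for i in range(len(counts))]
    dmmf_round(counts, requests)
print(counts)  # per-agent allocation counts after 1000 rounds
```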

Our work aims to bridge this gap by studying the existence and properties of equilibria under DMMF. To this end, we first show that despite the strong robustness guarantees of the threshold-based strategies, no Nash equilibrium exists when agents participate in DMMF, each using some fixed threshold-based policy. On the positive side, however, we show that for the symmetric case, a simple data-driven request policy guarantees that no agent benefits from deviating to a different fixed threshold policy. In our proposed policy, agents aim to match the historical allocation rate, with a vanishing drift towards the rate that optimizes overall welfare for all users. Furthermore, the resulting equilibrium outcome can be significantly better than what follows from the robustness guarantees.

Our results are built on a complete characterization of the steady-state distribution under DMMF, as well as new techniques for analyzing strategic agent outcomes under dynamic allocation mechanisms; we hope these may prove of independent interest in related problems.

Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms

We study the cooperative asynchronous multi-agent multi-armed bandits problem, where each agent's active (arm-pulling) decision rounds are asynchronous: in each round, only a subset of agents is active to pull arms, and this subset is unknown and time-varying. We consider two models of multi-agent cooperation, fully distributed and leader-coordinated, and propose algorithms for both models that attain near-optimal regret and communication bounds, both almost as good as their synchronous counterparts. The fully distributed algorithm relies on a novel communication policy consisting of accuracy-adaptive and on-demand components, and on successive arm elimination for decision-making. In the leader-coordinated model, a single leader explores arms and recommends them to the other agents (followers) to exploit. As agents' active rounds are unknown, a competent leader must be chosen dynamically; we propose a low-switching variant of the Tsallis-INF algorithm to choose such a leader sequence. Lastly, we report numerical simulations comparing our new asynchronous algorithms with known baselines.
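As a point of reference for the decision-making component, below is a generic single-agent successive arm elimination sketch; the confidence radius, Gaussian rewards, and names are our assumptions, and the paper's asynchronous multi-agent version adds the communication policy on top.

```python
import math
import random

def successive_elimination(means, budget, delta=0.01):
    """Generic successive arm elimination: pull all surviving arms
    round-robin and drop any arm whose upper confidence bound falls
    below the best lower confidence bound among the survivors."""
    k = len(means)
    active = list(range(k))
    pulls, rewards, used = [0] * k, [0.0] * k, 0
    while used + len(active) <= budget and len(active) > 1:
        for a in active:
            pulls[a] += 1
            rewards[a] += random.gauss(means[a], 1.0)  # noisy reward draw
            used += 1
        rad = {a: math.sqrt(2 * math.log(1 / delta) / pulls[a]) for a in active}
        est = {a: rewards[a] / pulls[a] for a in active}
        best_lcb = max(est[a] - rad[a] for a in active)
        active = [a for a in active if est[a] + rad[a] >= best_lcb]
    return active  # arms still considered plausibly optimal

print(successive_elimination([0.2, 0.5, 0.9], budget=30_000))
```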

Blockchain Amplification Attack

Strategies related to the blockchain concept of Extractable Value (MEV/BEV), such as arbitrage, front-running, or back-running, create strong economic incentives for network nodes to reduce latency. Modified nodes that minimize transaction validation time and neglect to filter invalid transactions in the Ethereum peer-to-peer (P2P) network introduce a novel attack vector: a blockchain amplification attack. An attacker can exploit such modified nodes to amplify invalid transactions thousands of times, posing a security threat to the entire network. To illustrate the attack's feasibility and practicality in the current Ethereum network ("mainnet"), we 1) identify thousands of similar attacks in the wild, 2) mathematically model the propagation mechanism, 3) empirically measure model parameters from our monitoring nodes, and 4) compare the performance with other existing denial-of-service attacks through local simulation. We show that an attacker can amplify network traffic at modified nodes by a factor of 3,600 and cause economic damage of approximately 13,800 times the amount needed to carry out the attack. Despite these risks, aggressive latency reduction may still be profitable enough for various providers to justify the existence of modified nodes. To assess this trade-off, we 1) simulate the transaction validation process in a local network and 2) empirically measure the latency reduction by deploying our modified node in the Ethereum test network ("testnet"). We conclude with a cost-benefit analysis of skipping validation and provide mitigation strategies against the blockchain amplification attack.
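For intuition, here is a back-of-the-envelope amplification estimate under a deliberately simplified propagation model; all parameters below are illustrative placeholders, not the paper's fitted model or measured values.

```python
def amplification(tx_bytes, modified_nodes, peers_per_node, announce_bytes):
    """Crude estimate: the attacker uploads one invalid transaction once;
    every modified node that accepts it forwards or announces it to all of
    its peers without validating, multiplying the traffic the attacker caused."""
    sent = tx_bytes
    generated = modified_nodes * peers_per_node * (tx_bytes + announce_bytes)
    return generated / sent

# Placeholder numbers; the paper measures a factor of about 3,600 in the wild.
print(f"amplification ~{amplification(400, 30, 50, 80):.0f}x")
```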

CHash: A High Cost-Performance Hash Design for CXL-based Disaggregated Memory System

The exponential growth in demand for high-performance computing systems has created an urgent need for innovative memory technologies that provide higher bandwidth and better capacity scalability. In particular, Compute Express Link (CXL) has emerged as a promising solution for memory expansion and system acceleration. Hash-based index structures are widely recognized as fundamental components of in-memory database systems and are commonly used for indexing in-memory key-value stores due to their rapid lookup performance. How to design and maintain a hash index structure in a CXL-based disaggregated memory system, with comprehensive consolidation, remains a challenging research topic.

In this paper, we conduct extensive experiments on two real CXL memory devices based on the 4th-generation Intel® Xeon® Scalable Processor with CXL 1.0 support. Specifically, we run various microbenchmarks not only to evaluate the performance characteristics of CXL memory devices, but also to measure the performance impact of different memory allocation methods. We observe significant performance disparities and allocation inefficiencies, motivating us to rethink hash design for CXL-based disaggregated memory systems. We then propose CHash, a highly efficient hashing scheme that adopts a hybrid DRAM-CXL storage mechanism. CHash places its components across the different media by carefully designing the hash structure to fully exploit the features of CXL memory devices. Furthermore, CHash implements several acceleration techniques to speed up key-value (KV) record searches; optimizations including stage buckets, two-level filtering, and hash multiplexing further improve overall performance. Extensive experimental results validate the efficiency of CHash on different CXL memory devices, demonstrating that it achieves a 2.2× to 9.4× speedup for insertions and a 1.2× to 4.5× speedup for lookups while maintaining a high load factor (~90%), compared to state-of-the-art hashing schemes.
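To illustrate the general flavor of two-level filtering in a hybrid index, the sketch below keeps a small fingerprint table in fast DRAM to screen probes before any access touches bucket data held in slower CXL memory. This is our own simplification for illustration, not CHash's actual layout.

```python
class HybridHash:
    """Illustrative two-level lookup: an 8-bit fingerprint array (kept in
    fast DRAM) filters probes before touching the full buckets (which would
    reside in slower CXL-attached memory)."""
    def __init__(self, n_buckets=1024, slots=4):
        self.n, self.slots = n_buckets, slots
        self.fp = [[0] * slots for _ in range(n_buckets)]        # DRAM-resident
        self.buckets = [[None] * slots for _ in range(n_buckets)]  # CXL-resident

    def _fingerprint(self, key):
        return (hash(key) >> 20) % 255 + 1  # nonzero 8-bit tag

    def insert(self, key, value):
        b = hash(key) % self.n
        for s in range(self.slots):
            if self.fp[b][s] == 0:  # empty slot
                self.fp[b][s] = self._fingerprint(key)
                self.buckets[b][s] = (key, value)
                return True
        return False  # bucket full; a real design would relocate or chain

    def lookup(self, key):
        b, f = hash(key) % self.n, self._fingerprint(key)
        for s in range(self.slots):
            # a fingerprint mismatch filters out most slow-memory accesses
            if self.fp[b][s] == f and self.buckets[b][s][0] == key:
                return self.buckets[b][s][1]
        return None

h = HybridHash()
h.insert("k1", 42)
print(h.lookup("k1"))  # -> 42
```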

Exploring Function Granularity for Serverless Machine Learning Applications with GPU Sharing

Recent years have witnessed increasing interest in running machine learning (ML) inference on serverless computing due to its auto-scaling and cost-effective properties. However, one critical aspect, function granularity, has been largely overlooked, limiting the potential of serverless ML. This paper explores the impact of function granularity on serverless ML, revealing its important effects on the SLO hit rates and resource costs of serverless applications. It further proposes adaptive granularity as an approach to the phenomenon that no single granularity fits all applications and situations. It explores three predictive models and presents programming tools and runtime extensions to facilitate the integration of adaptive granularity into existing serverless platforms. Experiments show that adaptive granularity produces up to a 29.2% improvement in SLO hit rates and up to a 24.6% reduction in resource costs over state-of-the-art serverless ML systems that use fixed granularity.
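One way to read adaptive granularity as a runtime decision rule is sketched below, with hypothetical predictors (the paper's three predictive models are not reproduced here): given predicted latency and cost for each candidate granularity, pick the cheapest candidate whose prediction meets the SLO.

```python
def choose_granularity(candidates, slo_ms):
    """candidates: list of (granularity, predicted_latency_ms, predicted_cost).
    Pick the cheapest granularity predicted to meet the SLO; if none does,
    fall back to the fastest one (illustrative policy, not the paper's)."""
    feasible = [c for c in candidates if c[1] <= slo_ms]
    if feasible:
        return min(feasible, key=lambda c: c[2])[0]
    return min(candidates, key=lambda c: c[1])[0]

# e.g. one whole-pipeline function vs. per-stage functions sharing a GPU
candidates = [("monolithic", 180.0, 9.0), ("per-stage", 240.0, 5.5)]
print(choose_granularity(candidates, slo_ms=200.0))  # -> "monolithic"
```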

Game Theoretic Liquidity Provisioning in Concentrated Liquidity Market Makers

Automated market makers (AMMs) are decentralized exchanges that enable the automated trading of digital assets. Liquidity providers (LPs) deposit digital tokens that traders can use to execute trades, generating fees for the investing LPs. In AMMs, trade prices are determined algorithmically, unlike in classical limit order books. Concentrated liquidity market makers (CLMMs) are a major class of AMMs that offer liquidity providers the flexibility to decide not only how much liquidity to provide, but also in what price ranges they want that liquidity to be used. This flexibility can complicate strategic planning, since fee rewards are shared among LPs. We formulate and analyze a game-theoretic model to study the incentives of LPs in CLMMs. Our main results show that while our original formulation admits multiple Nash equilibria and has complexity quadratic in the number of price ticks in the contract, it can be reduced to a game with a unique Nash equilibrium whose complexity is only linear. We further show that the Nash equilibrium of this simplified game follows a waterfilling strategy, in which low-budget LPs use up their full budget while rich LPs do not. Finally, by fitting our game model to real-world CLMMs, we observe that in liquidity pools with risky assets, LPs adopt investment strategies far from the Nash equilibrium. Under price uncertainty, they generally invest in fewer and wider price ranges than our analysis suggests, with lower-frequency liquidity updates. In such risky pools, by updating their strategy to more closely match the Nash equilibrium of our game, LPs can improve their median daily returns by $116, which corresponds to an increase of 0.009% in median daily return on investment (ROI). At maximum, LPs can improve daily ROI by 0.855% when they reach the Nash equilibrium. In contrast, in stable pools (e.g., of only stablecoins), LPs already adopt strategies that closely resemble the Nash equilibrium of our game.
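The waterfilling structure can be illustrated generically: raise a common "water level" until a target total is absorbed, so small budgets are fully invested while rich LPs all cap out at the same level. The sketch below is our own rendition of this classic scheme, not the paper's equilibrium computation.

```python
def waterfill(budgets, total):
    """Find a water level L with sum_i min(budget_i, L) == total:
    LPs with budget <= L invest everything; richer LPs all invest exactly L."""
    lo, hi = 0.0, max(budgets)
    for _ in range(100):  # bisection on the water level
        mid = (lo + hi) / 2
        if sum(min(b, mid) for b in budgets) < total:
            lo = mid
        else:
            hi = mid
    level = (lo + hi) / 2
    return [min(b, level) for b in budgets]

print(waterfill([1.0, 2.0, 10.0, 50.0], total=20.0))
# -> [1.0, 2.0, 8.5, 8.5]: small budgets fully used, the two richest
#    LPs invest the same capped amount, echoing the equilibrium shape
```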

Learning-Augmented Competitive Algorithms for Spatiotemporal Online Allocation with Deadline Constraints

We introduce and study spatiotemporal online allocation with deadline constraints (SOAD), a new online problem motivated by emerging challenges in sustainability and energy. In SOAD, an online player completes a workload by allocating and scheduling it on the points of a metric space (X, d) while subject to a deadline T. At each time step, a service cost function is revealed that represents the cost of servicing the workload at each point, and the player must irrevocably decide the current allocation of work to points. Whenever the player moves this allocation, they incur a movement cost defined by the distance metric d(⋅, ⋅) that captures, e.g., an overhead cost. SOAD formalizes the open problem of combining general metrics and deadline constraints in the online algorithms literature, unifying problems such as metrical task systems and online search. We propose a competitive algorithm for SOAD along with a matching lower bound establishing its optimality. Our main algorithm, ST-CLIP, is a learning-augmented algorithm that takes advantage of predictions (e.g., forecasts of relevant costs) and achieves an optimal consistency-robustness trade-off. We evaluate our proposed algorithms in a simulated case study of carbon-aware spatiotemporal workload management, an application in sustainable computing that schedules a delay-tolerant batch compute job on a distributed network of data centers. In these experiments, we show that ST-CLIP substantially improves on heuristic baseline methods.
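In rough symbols, and under our own notational simplification (single-point allocations, a fixed starting point x_0, a unit movement charge; the paper's formulation may differ in details), the offline problem underlying SOAD reads:

```latex
\min_{x_1, \dots, x_T \in X} \;\; \sum_{t=1}^{T} f_t(x_t) \;+\; \sum_{t=1}^{T} d(x_{t-1}, x_t)
\quad \text{subject to} \quad \sum_{t=1}^{T} w_t(x_t) \;\ge\; W,
```

where f_t is the service cost revealed at time t, the second sum charges movement through the metric d, and the constraint requires the total workload W to be completed by the deadline T.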

NetJIT: Bridging the Gap from Traffic Prediction to Preknowledge for Distributed Machine Learning

Today's distributed machine learning (DML) imposes a heavy traffic load, making the interconnection network one of the primary bottlenecks. To mitigate this bottleneck, state-of-the-art network optimization methods, such as traffic and topology engineering, have been proposed to adapt to real-time traffic. However, current traffic measurement and prediction methods struggle to collect sufficiently fine-grained and accurate traffic patterns. This limitation impedes the ability of cutting-edge network optimization techniques to react agilely to the ever-changing traffic demands of DML jobs.

This paper proposes NetJIT, a novel program-behavior-aware toolkit for accurately foreseeing the traffic pattern of DML. To the best of our knowledge, this is the first work proposing the use of just-in-time (JIT) program analysis for real-time traffic measurement. In DML applications, communication behavior is primarily determined by the previously computed results. NetJIT leverages this characteristic to anticipate communication details by tracing and analyzing the data relations in the computation process. This capability enables the deployment of optimization strategies in advance.

We deploy NetJIT in real-world network optimization for traffic preknowledge. Evaluation on our self-built testbed prototype demonstrates that NetJIT achieves up to about 97% lower error in detecting communication events compared with other methods. Simulations with real-world DML workloads further illustrate that NetJIT enables more precise network optimization, leading to approximately 50% better network performance with respect to metrics including average iteration time, throughput, and average packet delay.

Steady-State Convergence of the Continuous-Time Routing System with General Distributions in Heavy Traffic

This paper examines a continuous-time routing system with general interarrival and service time distributions, operating under either the join-the-shortest-queue policy or the power-of-two-choices policy. Under a weaker set of assumptions than those commonly established in the literature, we prove that the scaled steady-state queue length at each station converges weakly to a common exponential random variable in heavy traffic. Specifically, our results hold under the assumption of the (2 + ε)th moment for the interarrival and service distributions with some ε > 0. The proof leverages the Palm version of the basic adjoint relationship (BAR) as a key technique.
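In symbols, writing Q_i(∞) for the steady-state queue length at station i and ρ for the traffic intensity, the main result can be paraphrased as follows (our notation, not the paper's exact statement):

```latex
(1 - \rho)\, Q_i(\infty) \;\Rightarrow\; E \qquad \text{as } \rho \uparrow 1, \quad i = 1, \dots, N,
```

where ⇒ denotes weak convergence and E is a single exponential random variable shared by all N stations, with only a (2 + ε)th moment required of the interarrival and service time distributions.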

The Last Survivor of PoS Pools: Staker's Dilemma

In blockchains using the Proof-of-Work (PoW) consensus mechanism, a mining pool is a group of miners who combine their computational resources and share the generated revenue. Similarly, when the Proof-of-Stake (PoS) consensus mechanism is adopted, the staking pool imitates the design of the mining pool by aggregating stakes. However, in PoW blockchains, the pooling approach has been criticized as vulnerable to the block withholding (BWH) attack: BWH attackers may steal dividends from victims by pretending to work while making invalid contributions to the victim pools. It is well known that BWH attackers against PoW face the miner's dilemma. To our knowledge, despite the popularity of PoS, we are the first to study the pool BWH attack against PoS. Interestingly, we find that, for a network consisting of only one attacker pool and one victim pool, the attacker eventually dominates the network while the victim gradually loses its stake share and vanishes. Moreover, in a more realistic scenario with multiple BWH attacker pools and one solo staker who does not join any pool, we show that only one lucky attacker and the solo staker survive, whereas all the other pools gradually vanish, revealing the staker's dilemma. These findings indicate that, compared to PoW, the BWH attack on PoS has a much more severe impact due to the attacker's resource aggregation advantage. Our analysis is supported by experiments on real blockchain systems at scale and by numerical simulations.
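A toy discrete-time rendition of the two-pool dynamic conveys the compounding effect; the reward model and all parameters below are our own illustration, not the paper's model.

```python
def simulate(a, v, infiltrate_frac, reward_rate=0.01, epochs=2000):
    """Toy BWH dynamic: attacker pool stake a, victim pool stake v.
    The attacker stakes a fraction of its holdings inside the victim pool
    but withholds blocks there, so the victim pool's revenue is produced
    only by honest stake yet shared pro rata with the infiltrator.
    In PoS, dividends compound into stake, so shares drift over time."""
    for _ in range(epochs):
        i = infiltrate_frac * a                  # infiltrating stake
        honest_reward = reward_rate * v          # victim pool's production
        attacker_gain = reward_rate * (a - i) + honest_reward * i / (v + i)
        victim_gain = honest_reward * v / (v + i)
        a, v = a + attacker_gain, v + victim_gain
    total = a + v
    return a / total, v / total

print(simulate(a=0.5, v=0.5, infiltrate_frac=0.3))
# the attacker's stake share creeps upward; the victim's share shrinks
```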

Tiered Cloud Routing: Methodology, Latency, and Improvement

Large cloud providers, including AWS, Azure, and Google Cloud, offer two tiers of network service to their customers: one class uses the provider's private wide area network (WAN-transit) to carry a customer's traffic as far as possible, and the other uses the public internet (inet-transit). Little is known about how each cloud provider configures its network to offer these transit services, how well they work, and whether their quality can be further improved. In this work, we conduct a large-scale study to answer these questions. Using RIPE Atlas probes as vantage points, we explore how traffic enters and leaves each cloud's WAN. In addition, we measure the access latency of the WAN-transit and inet-transit services of each cloud and compare it with that of an emulated performance-based routing strategy. Our study shows that despite the cloud providers' intention to carry customers' traffic on their WANs to the maximum extent possible, for about 12% (Azure) and 13% (Google) of our vantage points, traffic exits the cloud WAN early, at cloud edges more than 5,000 km away from the vantage points' nearest cloud edges. In contrast, more than 84% (AWS), 78% (Azure), and 81% (Google) of vantage points enter a cloud WAN within a 500 km radius of their respective locations. Moreover, we find that cloud providers employ different routing strategies to implement the inet-transit service, leading to transit policies that may deviate from their advertised service descriptions. Finally, we find that a performance-based routing strategy can significantly reduce latencies in all three clouds for 4% to 85% of vantage point and cloud region pairs.

Two Choice Behavioral Game Dynamics with Myopic-Rational and Herding Players

In classical game theory, players are assumed to be rational and intelligent, which often contradicts reality. We consider more realistic behavioral game dynamics where the players choose actions in a turn-by-turn manner and exhibit two prominent behavioral traits: an α-fraction of them are myopic, strategically choosing optimal actions against the empirical distribution of the previous plays, while the rest exhibit herding behavior by choosing the most popular action so far. The utilities are realized for all at the end of the game, and each player plays only once. Our analysis focuses on scenarios where players encounter two possible choices, as is common in applications like participation games (e.g., crowd-sourcing) or minority games.
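A minimal simulation of these turn-by-turn dynamics, with an illustrative minority-game-style utility, is sketched below; all names and the specific payoff are our assumptions.

```python
import random

def play(n_players, alpha, payoff):
    """Turn-by-turn two-choice dynamics: with probability alpha a player is
    myopic (best-responds to the empirical distribution of previous plays);
    otherwise she herds (copies the most popular action so far).
    Each player plays exactly once."""
    counts = [0, 0]  # how many players chose action 0 / action 1 so far
    for _ in range(n_players):
        total = max(1, counts[0] + counts[1])
        emp = (counts[0] / total, counts[1] / total)
        if random.random() < alpha:
            # myopic: pick the action with the higher expected utility
            choice = max((0, 1), key=lambda a: payoff(a, emp))
        else:
            # herding: copy the current majority action
            choice = 0 if counts[0] >= counts[1] else 1
        counts[choice] += 1
    return counts[1] / n_players  # fraction choosing action 1

# a minority-game flavour: an action pays more when fewer players chose it
payoff = lambda a, emp: 1.0 - emp[a]
print(play(10_000, alpha=0.7, payoff=payoff))
```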

To begin with, we derive the almost sure mean-field limits of such dynamics. The proof is constructive: it progressively narrows down the potential limit set and finally establishes the existence of a unique limit for almost all sample paths. We argue that the dynamics at the limit are captured by a differential inclusion (and not the usual ordinary differential equation) due to the discontinuities arising from the switching behavioral choices. Notably, our methodology can easily be modified to analyze avoid-the-crowd behavior in place of herding behavior.

We conclude with two illustrative examples, a participation game and a routing game, which encapsulate several real-life scenarios. Interestingly, for the first game, we observe that the game designer can induce a higher level of participation in an activity with a smaller reward by leveraging the presence of herding players.

VESTA: A Secure and Efficient FHE-based Three-Party Vectorized Evaluation System for Tree Aggregation Models

Machine Learning as a Service (MLaaS) platforms simplify the development of machine learning applications across multiple parties. However, the model owner, compute server, and client user may not trust each other, creating a need for privacy-preserving approaches that allow applications to run without revealing proprietary data. In this work, we focus on a widely used classical machine learning model: tree ensembles. While previous efforts have applied Fully Homomorphic Encryption (FHE) to this model, these solutions suffer from slow inference and excessive memory consumption. To address these issues, we propose VESTA, which includes a compiler and a runtime to reduce tree evaluation time and memory usage. VESTA rests on two key techniques: first, it precomputes a portion of the expensive FHE operations at compile time, improving inference speed; second, it uses a partitioning pass in its compiler to divide the ensemble model into sub-models, enabling task-level parallelism. A comprehensive evaluation shows that VESTA achieves a 2.1× speedup and reduces memory consumption by 59.4% compared to the state of the art.
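Setting the encryption aside, the partitioning idea reduces to splitting the ensemble into sub-models scored in parallel and summing their partial results. The plaintext sketch below illustrates only that scheduling idea, not VESTA's compiler or its encrypted evaluation.

```python
from concurrent.futures import ThreadPoolExecutor

def eval_tree(tree, x):
    """Walk one decision tree: tree is either a leaf value or a tuple
    (feature_index, threshold, left_subtree, right_subtree)."""
    while isinstance(tree, tuple):
        feat, thr, left, right = tree
        tree = left if x[feat] <= thr else right
    return tree

def eval_ensemble_partitioned(trees, x, n_parts=4):
    """Partition the ensemble into sub-models and score them in parallel;
    the final prediction is the sum of the sub-models' partial sums."""
    parts = [trees[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = pool.map(lambda p: sum(eval_tree(t, x) for t in p), parts)
    return sum(partial)

# Two tiny stumps on feature 0 (illustrative)
trees = [(0, 0.5, -1.0, 1.0), (0, 1.5, 0.5, 2.0)]
print(eval_ensemble_partitioned(trees, x=[1.0]))  # 1.0 + 0.5 = 1.5
```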