Proceedings of the ACM on Measurement and Analysis of Computing Systems: POMACS: Vol. 7, No. 3. 2023

Full Citation in the ACM Digital Library

POMACS V7, N3, December 2023 Editorial

Florin Ciucu
Giulia Fanti
Rhonda Righter

The Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) focuses on the measurement and performance evaluation of computer systems and operates in close collaboration with the ACM Special Interest Group SIGMETRICS. All papers in this issue of POMACS will be presented at the ACM SIGMETRICS/Performance 2024 conference on June 10-14, 2024, in Venice, Italy. These papers have been selected during the summer submission round by the 84 members of the ACM SIGMETRICS/Performance 2024 program committee via a rigorous review process. Each paper was conditionally accepted (and shepherded), allowed a "one-shot" revision (to be resubmitted to one of the subsequent two SIGMETRICS/Performance deadlines), or rejected (with re-submission allowed after a year). For this issue, which represents the summer deadline, POMACS is publishing 20 papers out of 91 submissions, of which 11 had previously received a one-shot revision decision. All submissions received at least 3 reviews and borderline cases were extensively discussed during the online program committee meeting. Based on the indicated track(s), roughly 27% of the submissions were in the Theory track, 46% were in the Measurement & Applied Modeling track, 41% were in the Systems track, and 14% were in the Learning track. Many individuals contributed to the success of this issue of POMACS. First, we would like to thank the authors, who submitted their best work to SIGMETRICS/Performance/POMACS. Second, we would like to thank the program committee members who provided constructive feedback in their reviews to authors and participated in the online discussions and program committee meeting. We also thank the several external reviewers who provided their expert opinions on specific submissions that required additional input. We are also grateful to the SIGMETRICS Board Chair, Mor Harchol-Balter, the IFIP Working Group 7.3 Chair, Mark S. Squillante, the previous SIGMETRICS Board Chair, Giuliano Casale, and the past program committee Chairs, Konstantin Avratchenkov, Phillipa Gill, and Bhuvan Urgaonkar, who provided a wealth of information and guidance. Finally, we are grateful to the organization committee and to the SIGMETRICS Board for their ongoing efforts and initiatives for creating an exciting program for ACM SIGMETRICS/Performance 2024.

Agents of Autonomy: A Systematic Study of Robotics on Modern Hardware

Mohammad Bakhshalipour
Phillip B. Gibbons

As robots increasingly permeate modern society, it is crucial for the system and hardware research community to bridge its long-standing gap with robotics. This divide has persisted due to the lack of (i) a systematic performance evaluation of robotics on different computing platforms and (ii) a comprehensive, open-source, cross-platform benchmark suite.

To address these gaps, we present a systematic performance study of robotics on modern hardware and introduce RoWild, an open-source benchmark suite for robotics that is comprehensive and cross-platform. Our workloads encompass a broad range of robots, including driverless vehicles, pilotless drones, and stationary robotic arms, and we evaluate their performance on a spectrum of modern computing platforms, from low-end embedded CPUs to high-end server-grade GPUs. The source code of the benchmark suite is available in https://cmu-roboarch.github.io/rowild/.

Our findings reveal that current architectures experience significant inefficiencies when executing robotic workloads, highlighting the need for architectural advancements that satisfy the primary requirements of robotic tasks. We discuss approaches for meeting these requirements, offering insights for improving the performance of robotics.

Lightweight Acquisition and Ranging of Flows in the Data Plane

Andrea Monterubbiano
Jonatan Langlet
Stefan Walzer
Gianni Antichi
Pedro Reviriego
Salvatore Pontarelli

As networks get more complex, the ability to track almost all the flows is becoming of paramount importance. This is because we can then detect transient events impacting only a subset of the traffic. Solutions for flow monitoring exist, but it is getting very difficult to produce accurate estimations for every <flowID,counter> tuple given the memory constraints of commodity programmable switches. Indeed, as networks grow in size, more flows have to be tracked, increasing the number of tuples to be recorded. At the same time, end-host virtualization requires more specific flowIDs, enlarging the memory cost for every single entry. Finally, the available memory resources have to be shared with other important functions as well (e.g., load balancing, forwarding, ACL).

To address those issues, we present FlowLiDAR (Flow Lightweight Detection and Ranging), a new solution that is capable of tracking almost all the flows in the network while requiring only a modest amount of data plane memory which is not dependent on the size of flowIDs. We implemented the scheme in P4, tested it using real traffic from ISPs and compared it against four state-of-the-art solutions: FlowRadar, NZE, PR-sketch, and Elastic Sketch. While those can only reconstruct up to 60% of the tuples, FlowLiDAR can track 98.7% of them with the same amount of memory.

The Online Pause and Resume Problem: Optimal Algorithms and An Application to Carbon-Aware Load Shifting

Adam Lechowicz
Nicolas Christianson
Jinhang Zuo
Noman Bashir
Mohammad Hajiesmaili
Adam Wierman
Prashant Shenoy

We introduce and study the online pause and resume problem. In this problem, a player attempts to find the k lowest (alternatively, highest) prices in a sequence of fixed length T, which is revealed sequentially. At each time step, the player is presented with a price and decides whether to accept or reject it. The player incurs aswitching cost whenever their decision changes in consecutive time steps, i.e., whenever they pause or resume purchasing. This online problem is motivated by the goal of carbon-aware load shifting, where a workload may be paused during periods of high carbon intensity and resumed during periods of low carbon intensity and incurs a cost when saving or restoring its state. It has strong connections to existing problems studied in the literature on online optimization, though it introduces unique technical challenges that prevent the direct application of existing algorithms. Extending prior work on threshold-based algorithms, we introducedouble-threshold algorithms for both the minimization and maximization variants of this problem. We further show that the competitive ratios achieved by these algorithms are the best achievable by any deterministic online algorithm. Finally, we empirically validate our proposed algorithm through case studies on the application of carbon-aware load shifting using real carbon trace data and existing baseline algorithms.

Nautilus: A Framework for Cross-Layer Cartography of Submarine Cables and IP Links

Alagappan Ramanathan
Sangeetha Abdu Jyothi

Submarine cables constitute the backbone of the Internet. However, these critical infrastructure components are vulnerable to several natural and man-made threats, and during failures, are difficult to repair in remote oceans. In spite of their crucial role, we have a limited understanding of the impact of submarine cable failures on global connectivity, particularly on the higher layers of the Internet.

In this paper, we present Nautilus, a framework for cross-layer cartography of submarine cables and IP links. Using a corpus of public datasets and Internet cartographic techniques, Nautilus identifies IP links that are likely traversing submarine cables and maps them to one or more potential cables. Nautilus also gives each IP to cable assignment a prediction score that reflects the confidence in the mapping. Nautilus generates a mapping for 3.05 million and 1.43 million IPv4 and IPv6 links, respectively, spanning 91% of all active cables. In the absence of ground truth data, we validate Nautilus mapping using three techniques: analyzing past cable failures, using targeted traceroute measurements, and comparing with public network maps of two operators.

A Hop Away from Everywhere: A View of the Intercontinental Long-haul Infrastructure

Esteban Carisimo
Caleb J. Wang
Mia Weaver
Fabián E. Bustamante
Paul Barford

We present a longitudinal study of intercontinental long-haul links (LHLs) - links with latencies significantly higher than that of all other links in a traceroute path. Our study is motivated by the recognition of these LHLs as a network-layer manifestation of critical transoceanic undersea cables. We present a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7 year period. We identify 85,620 layer-3 links (out of 2.7M links in a large traceroute dataset) that satisfy our definition for intercontinental long haul with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant component with a relative size that remains stable despite a significant growth of the long-haul infrastructure.

Near-Optimal Stochastic Bin-Packing in Large Service Systems with Time-Varying Item Sizes

Yige Hong
Qiaomin Xie
Weina Wang

In modern computing systems, jobs' resource requirements often vary over time. Accounting for this temporal variability during job scheduling is essential for meeting performance goals. However, theoretical understanding on how to schedule jobs with time-varying resource requirements is limited. Motivated by this gap, we propose a new setting of the stochastic bin-packing problem in service systems that allows for time-varying job resource requirements, also referred to as 'item sizes' in traditional bin-packing terms. In this setting, a job or 'item' must be dispatched to a server or 'bin' upon arrival. Its resource requirement may vary over time while in service, following a Markovian assumption. Once the job's service is complete, it departs from the system. Our goal is to minimize the expected number of active servers, or 'non-empty bins', in steady state.

Under our problem formulation, we develop a job dispatch policy, named Join-Reqesting-Server (JRS). Broadly, JRS lets each server independently evaluate its current job configuration and decide whether to accept additional jobs, balancing the competing objectives of maximizing throughput and minimizing the risk of resource capacity overruns. The JRS dispatcher then utilizes these individual evaluations to decide which server to dispatch each arriving job to. The theoretical performance guarantee of JRS is in the asymptotic regime where the job arrival rate scales large linearly with respect to a scaling factor r. We show that JRS achieves an additive optimality gap of O(√r) in the objective value, where the optimal objective value is Θ(r). When specialized to constant job resource requirements, our result improves upon the state-of-the-art o(r) optimality gap. Our technical approach highlights a novel policy conversion framework that reduces the policy design problem into a single-server problem.

Kernel vs. User-Level Networking: Don't Throw Out the Stack with the Interrupts

Peter Cai
Martin Karsten

This paper reviews the performance characteristics of network stack processing for communication-heavy server applications. Recent literature often describes kernel-bypass and user-level networking as a silver bullet to attain substantial performance improvements, but without providing a comprehensive understanding of how exactly these improvements come about. We identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead. While IRQs and their handling have a substantial impact on the effectiveness of the processor pipeline and thereby the overall processing efficiency, their overhead is difficult to measure directly when serving demanding workloads. This paper presents an indirect methodology to assess IRQ overhead by constructing preliminary approaches to reduce the impact of IRQs. While these approaches are not suitable for general deployment, their corresponding performance observations indirectly confirm the conjecture. Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput without compromising tail latency. In case of server applications, such as web servers or Memcached, the resulting performance is comparable to using kernel-bypass and user-level networking when using stacks with similar functionality and flexibility.

Near-Optimal Packet Scheduling in Multihop Networks with End-to-End Deadline Constraints

Christos Tsanikidis
Javad Ghaderi

Scheduling packets with end-to-end deadline constraints in multihop networks is an important problem that has been notoriously difficult to tackle. Recently, there has been progress on this problem in the worst-case traffic setting, with the objective of maximizing the number of packets delivered within their deadlines. Specifically, the proposed algorithms were shown to achieve Ω(1/log(L)) fraction of the optimal objective value if the minimum link capacity in the network is Cmin =Ω(log (L)), where L is the maximum length of a packet's route in the network (which is bounded by the packet's maximum deadline). However, such guarantees can be quite pessimistic due to the strict worst-case traffic assumption and may not accurately reflect real-world settings. In this work, we aim to address this limitation by exploring whether it is possible to design algorithms that achieve a constant fraction of the optimal value while relaxing the worst-case traffic assumption. We provide a positive answer by demonstrating that in stochastic traffic settings, such as i.i.d. packet arrivals, near-optimal, (1-ε)-approximation algorithms can be designed if Cmin = Ω(log (L/ε)/ε2). To the best of our knowledge, this is the first result that shows this problem can be solved near-optimally under nontrivial assumptions on traffic and link capacity. We further present extended simulations using real network traces with non-stationary traffic, which demonstrate that our algorithms outperform worst-case-based algorithms in practical settings.

Miracle or Mirage? A Measurement Study of NFT Rug Pulls

Jintao Huang
Ningyu He
Kai Ma
Jiang Xiao
Haoyu Wang

NFT rug pull is one of the most prominent type of NFT scam, whose definition is that the developers of an NFT project abandon it and run away with investors' funds. Although they have drawn attention from our community, to the best of our knowledge, the NFT rug pulls have not been systematically measured. To fill the void, this paper presents the first in-depth measurement study of NFT rug pulls. Specifically, we first compile a list of 253 known NFT rug pulls as our initial confirmed rug pulls (i.e., ground truth), based on which we perform a pilot study, highlighting the key symptoms of NFT rug pulls. Then, we design an effective rule-based detector to measure the prevalence of NFT rug pulls in the ecosystem. We have labelled 7,487 happened NFT rug pull projects which were not revealed by our community. To eliminate the potential damage brought by rug pull scams, we take a step further towards designing a real-time prediction model to proactively identify the potential rug pull projects in an early stage ahead of the scam happens. We have implemented a prototype system, and deployed it in the real-world setting for over 6 months. Our system has raised alarms for 5,557 new NFT projects by the time of this writing, which works as a whistle blower that pinpoints rug pull scams timely, thus mitigating the impacts.

Towards Understanding and Characterizing the Arbitrage Bot Scam In the Wild

Kai Li
Shixuan Guan
Darren Lee

This paper presents the first comprehensive analysis of an emerging cryptocurrency scam named "arbitrage bot" disseminated on online social networks. The scam revolves around Decentralized Exchanges (DEX) arbitrage and aims to lure victims into executing a so-called "bot contract" to steal funds from them. To entice victims and convince them of this scheme, we found that scammers have flocked to publish YouTube videos to demonstrate plausible profits and provide detailed instructions and links to the bot contract.

To collect the scam at a large scale, we developed a fully automated scam detection system namedCryptoScamHunter, which continuously collects YouTube videos and automatically detects scams. Meanwhile,CryptoScamHunter can download the source code of the bot contract from the provided links and extract the associated scam cryptocurrency address. Through deployingCryptoScamHunter from Jun. 2022 to Jun. 2023, we have detected 10,442 arbitrage bot scam videos published from thousands of YouTube accounts. Our analysis reveals that different strategies have been utilized in spreading the scam, including crafting popular accounts, registering spam accounts, and using obfuscation tricks to hide the real scam address in the bot contracts. Moreover, from the scam videos we have collected over 800 malicious bot contracts with source code and extracted 354 scam addresses. By further expanding the scam addresses with a similar contract matching technique, we have obtained a total of 1,697 scam addresses. Through tracing the transactions of all scam addresses on the Ethereum mainnet and Binance Smart Chain, we reveal that over 25,000 victims have fallen prey to this scam, resulting in a financial loss of up to 15 million USD.

Overall, our work sheds light on the dissemination tactics and censorship evasion strategies adopted in the arbitrage bot scam, as well as on the scale and impact of such a scam on online social networks and blockchain platforms, emphasizing the urgent need for effective detection and prevention mechanisms against such fraudulent activity.

Change Point Detection with Adaptive Measurement Schedules for Network Performance Verification

Simon Lindståhl
Alexandre Proutiere
Andreas Johnsson

When verifying that a communications network fulfills its specified performance, it is critical to note sudden shifts in network behavior as quickly as possible. Change point detection methods can be useful in this endeavor, but classical methods rely on measuring with a fixed measurement period, which can often be suboptimal in terms of measurement costs. In this paper, we extend the existing framework of change point detection with a notion of physical time. Instead of merely deciding when to stop, agents must now also decide at which future time to take the next measurement. Agents must now minimize the necessary number of measurements pre- and post-change, while maintaining a trade-off between post-change delay and false alarm rate. We establish, through this framework, the suboptimality of typical periodic measurements and propose a simple alternative, called crisis mode agents. We show analytically that crisis mode agents significantly outperform periodic measurements schemes. We further verify this in numerical evaluation, both on an array of synthetic change point detection problems as well as on the problem of detecting traffic load changes in a 5G test bed through end-to-end RTT measurements.

HEAL: Performance Troubleshooting Deep inside Data Center Hosts

Yicheng Pan
Yang Zhang
Tingzhu Bi
Linlin Han
Yu Zhang
Meng Ma
Xiangzhuang Shen
Xinrui Jiang
Feng Wang
Xian Liu
Ping Wang

This study demonstrates the salient facts and challenges of host failure operations in hyperscale data centers. A host incident can involve hundreds of distinct host-level metrics, covering broad aspects. The faulting mechanism inside the host connects these heterogeneous metrics through direct and indirect correlation, making it extremely difficult to sort out the propagation procedures and the root cause from these intertwined indicators. To deeply understand the failure mechanism inside the host, we develop HEAL -- a novel host metrics analysis toolkit. HEAL synergistically discovers dynamic causality in sparse heterogeneous host metrics by combining the strengths of both time series and random variable analysis. It can also proactively extract causal directional hints from causality's asymmetry and historical knowledge. Together, these breakthroughs help HEAL produce accurate results given undesirable inputs. Extensive experiments in our production environment verify that HEAL provides significantly better result accuracy and full-process interpretability than the SOTA baselines. With these advantages, HEAL successfully serves our data center and worldwide product operations and impressively contributes to many other workflows.

MetaVRadar: Measuring Metaverse Virtual Reality Network Activity

Minzhao Lyu
Rahul Dev Tripathi
Vijay Sivaraman

The "metaverse", wherein users can enter virtual worlds to work, study, play, shop, socialize, and entertain, is fast becoming a reality, attracting billions of dollars in investment from companies such as Meta, Microsoft, and Clipo Labs. Further, virtual reality (VR) headsets from entities like Oculus, HTC, and Microsoft are rapidly maturing to provide fully immersive experiences to metaverse users. However, little is known about the network dynamics of metaverse VR applications in terms of service domains, flow counts, traffic rates and volumes, content location and latency, etc., which are needed to make telecommunications network infrastructure "metaverse ready" to support superlative user experience in the coming future.

This paper is an empirical measurement study of metaverse VR network behavior aimed at helping telecommunications network operators better provision and manage the network to ensure good user experience. Using illustrative hour-long network traces of metaverse sessions on the Oculus VR headset, we first develop a categorization of user activity into distinct states ranging from login home to streetwalking and event attendance to asset trading, and undertake a detailed analysis of network traffic per state, identifying unique service domains, protocols, flow profiles, and volumetric patterns, thereby highlighting the vastly more complex nature of a metaverse session compared to streaming video or gaming. Armed with the network behavioral profiles, our second contribution develops a real-time methodMetaVRadar to detect metaverse session and classify the user activity state leveraging formalized flow signatures and volumetric attributes. Our third contribution practically implementsMetaVRadar, evaluates its accuracy in our lab environment, and demonstrates its usability in a large university network so operators can better monitor and plan resources to support requisite metaverse user experience.

GuaNary: Efficient Buffer Overflow Detection In Virtualized Clouds Using Intel EPT-based Sub-Page Write Protection Support

Stella Bitchebe
Yves Kone
Pierre Olivier
Jalil Boukhobza
Yérom-David Bromberg
Daniel Hagimont
Alain Tchana

Write buffer overflow is a widespread and prevalent memory safety violation in C/C++, reported as the top vulnerability in 2022 and 2023. Secure memory allocators are generally used to protect systems against attacks that may exploit buffer overflows. Existing allocators mainly rely on two types of countermeasures to prevent or detect write overflows: canaries and guard pages, each with pros and cons in terms of detection latency and memory footprint.

For virtualized cloud applications, this paper follows the Out of Hypervisor (OoH) trend and introduces GuaNary, a safety guard against write overflows, allowing synchronous detection at a low memory footprint cost. OoH is a new virtualization research axis introduced in 2022 advocating the exposure of hardware features for virtualization to the guest OS so that its processes can take advantage of them. Based on the OoH principle, GuaNary leverages Intel Sub-Page write Permission (SPP), a recent hardware virtualization feature that allows to write-protect guest memory at the granularity of 128B (namely, sub-page) instead of 4KB. We implement a software stack, LeanGuard, which promotes the utilization of SPP from inside virtual machines by new secure allocators that use GuaNary. Our evaluation shows that for the same number of protected buffers, LeanGuard consumes 8.3× less memory than SlimGuard, a recent state-of-art secure allocator. Further, for the same memory consumption, LeanGuard allows protecting 25× more buffers than SlimGuard.

CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency

Walid A. Hanafy
Qianlin Liang
Noman Bashir
David Irwin
Prashant Shenoy

Cloud platforms are increasing their emphasis on sustainability and reducing their operational carbon footprint. A common approach for reducing carbon emissions is to exploit the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion times, we present a new approach that exploits the elasticity of batch workloads in the cloud to optimize their carbon emissions. Our approach is based on the notion of "carbon scaling," similar to cloud autoscaling, where a job dynamically varies its server allocation based on fluctuations in the carbon cost of the grid's energy. We develop a greedy algorithm for minimizing a job's carbon emissions via carbon scaling that is based on the well-known problem of marginal resource allocation. We implement a CarbonScaler prototype in Kubernetes using its autoscaling capabilities and an analytic tool to guide the carbon-efficient deployment of batch applications in the cloud. We then evaluate CarbonScaler using real-world machine learning training and MPI jobs on a commercial cloud platform and show that it can yield i) 51% carbon savings over carbon-agnostic execution; ii) 37% over a state-of-the-art suspend-resume policy; and iii) 8 over the best static scaling policy.

Optimized Cross-Path Attacks via Adversarial Reconnaissance

Yudi Huang
Yilei Lin
Ting He

While softwarization and virtualization technologies make modern communication networks appear easier to manage, they also introduce highly complex interactions within the networks that can cause unexpected security threats. In this work, we study a particular security threat due to the sharing of links between high-security paths and low-security paths, which enables a new type of DoS attacks, called cross-path attacks, that indirectly attack a set of targeted high-security paths (target paths) by congesting the shared links through a set of attacker-controlled low-security paths (attack paths). While the feasibility of such attacks has been recently demonstrated in the context of SDN, their potential performance impact has not been characterized. To this end, we develop an approach for designing an optimized cross-path attack under a constrained total attack rate, consisting of (i) novel reconnaissance algorithms that can provide consistent estimates of the locations and parameters of the shared links via network tomography, and (ii) efficient optimization methods to design the optimal allocation of attack rate over the attack paths to maximally degrade the performance of the target paths. The proposed attack has achieved a significantly larger performance impact than its non-optimized counterparts in extensive evaluations based on multiple network settings, signaling the importance of addressing such intelligent attacks in network design.

A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs

Euijin Choo
Mohamed Nabeel
Doowon Kim
Ravindu De Silva
Ting Yu
Issa Khalil

VirusTotal (VT) is a widely used scanning service for researchers and practitioners to label malicious entities and predict new security threats. Unfortunately, it is little known to the end-users how VT URL scanners decide on the maliciousness of entities and the attack types they are involved in (e.g., phishing or malware-hosting websites). In this paper, we conduct a systematic comparative study on VT URL scanners' behavior for different attack types of malicious URLs, in terms of 1) detection specialties, 2) stability, 3) correlations between scanners, and 4) lead/lag behaviors. Our findings highlight that the VT scanners commonly disagree with each other on their detection and attack type classification, leading to challenges in ascertaining the maliciousness of a URL and taking prompt mitigation actions according to different attack types. This motivates us to present a new highly accurate classifier that helps correctly identify the attack types of malicious URLs at the early stage. This in turn assists practitioners in performing better threat aggregation and choosing proper mitigation actions for different attack types

Sampling for Remote Estimation of the Wiener Process over an Unreliable Channel

Jiayu Pan
Yin Sun
Ness B. Shroff

In this paper, we study a sampling problem where a source takes samples from a Wiener process and transmits them through a wireless channel to a remote estimator. Due to channel fading, interference, and potential collisions, the packet transmissions are unreliable and could take random time durations. Our objective is to devise an optimal causal sampling policy that minimizes the long-term average mean square estimation error. This optimal sampling problem is a recursive optimal stopping problem, which is generally quite difficult to solve. However, we prove that the optimal sampling strategy is, in fact, a simple threshold policy where a new sample is taken whenever the instantaneous estimation error exceeds a threshold. This threshold remains a constant value that does not vary over time. By exploring the structure properties of the recursive optimal stopping problem, a low-complexity iterative algorithm is developed to compute the optimal threshold. This work generalizes previous research by incorporating both transmission errors and random transmission times into remote estimation. Numerical simulations are provided to compare our optimal policy with the zero-wait and age-optimal policies.

Invertible Bloom Lookup Tables with Listing Guarantees

Avi Mizrahi
Daniella Bar-Lev
Eitan Yaakobi
Ori Rottenstreich

The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data structure for set representation that supports a listing operation as the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring as well as in error-correction codes. IBLT can list its elements with probability affected by the size of the allocated memory and the size of the represented set, such that it can fail with small probability even for relatively small sets. While previous works only studied the failure probability of IBLT, this work initiates the worst case analysis of IBLT that guarantees successful listing for all sets of a certain size. The worst case study is important since the failure of IBLT imposes high overhead. We describe a novel approach that guarantees successful listing when the set satisfies a tunable upper bound on its size. To allow that, we develop multiple constructions that are based on various coding techniques such as stopping sets and the stopping redundancy of error-correcting codes, and Steiner systems as well as new methodologies we develop. We analyze the sizes of IBLTs with listing guarantees obtained by the various methods as well as their mapping memory and runtime overheads. Lastly, we study lower bounds on the achievable sizes of IBLT with listing guarantees and verify the results in the paper by simulations.

Automated Backend Allocation for Multi-Model, On-Device AI Inference

Venkatraman Iyer
Sungho Lee
Semun Lee
Juitem Joonwoo Kim
Hyunjun Kim
Youngjae Shin

On-Device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high speed computation, the approach suffers from degraded throughput and completion times under multi-model scenarios, i.e. concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven, and guided by a knowledge of model-specific, multi-objective pareto fronts comprising inference latency and memory consumption. Primary contribution of this paper is a backend allocation algorithm that runs online per model, and achieves 25-100% improvement in throughput over static allocations as well as load-balancing scheduler solutions targeting multi-model scenarios. Other noteworthy contributions include a novel pareto front estimator that runs on-device, and also a software-based GPU profiler with a lightweight algorithm to detect changing GPU workloads. Specifically, the pareto front estimator outperforms state of the art algorithms NSGA-II and SPEA2 by 94% on pareto coverage, and by almost 2x on computational overhead.