SIGMETRICS '18: Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems

Full Citation in the ACM Digital Library

SESSION: Achievement Lecture

Steady-state Approximations: Achievement Lecture

Diffusion models and mean-field models have been used to approximate many stochastic dynamical systems. A functional strong law of large numbers or a functional central limit theorem justifies such an approximation. Such a result, however, does not ...

SESSION: Emerging Areas

Session details: Emerging Areas

State Dependent Control of Closed Queueing Networks

We study the design of state dependent control for a closed queueing network model, inspired by shared transportation systems such as ridesharing. In particular, we focus on the design of assignment policies, wherein the platform can choose which supply ...

Dandelion++: Lightweight Cryptocurrency Networking with Formal Anonymity Guarantees

Recent work has demonstrated significant anonymity vulnerabilities in Bitcoin's networking stack. In particular, the current mechanism for broadcasting Bitcoin transactions allows third-party observers to link transactions to the IP addresses that ...

Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity

Graph-based semi-supervised learning (SSL) algorithms predict labels for all nodes based on provided labels of a small set of seed nodes. Classic methods capture the graph structure through some underlying diffusion process that propagates through the ...

The Cost of Uncertainty in Curing Epidemics

Epidemic models are used across biological and social sciences, engineering, and computer science, and have had important impact in the study of the dynamics of human disease and computer viruses, but also trends rumors, viral videos, and most recently ...

The Price of Fragmentation in Mobility-on-Demand Services

Mobility-on-Demand platforms are a fast growing component of the urban transit ecosystem. Though a growing literature addresses the question of how to make individual MoD platforms more efficient, less is known about the cost of market fragmentation, ...

Censored Demand Estimation in Retail

In this paper, the question of interest is estimating true demand of a product at a given store location and time period in the retail environment based on a single noisy and potentially censored observation. To address this question, we introduce a non-...

SESSION: Keynote

New Metrics and Models for a Post-ISA Era: Managing Complexity and Scaling Performance in Heterogeneous Parallelism and Internet-of-Things

Pushed by both application and technology trends, today's computer systems employ unprecedented levels of heterogeneity, parallelism, and complexity as they seek to extend performance scaling and support new application domains. From datacenters to ...

SESSION: Resource Management I

Session details: Resource Management I

Delay Scaling in Many-Sources Wireless Networks without Queue State Information

We examine a canonical scenario where several wireless data sources generate sporadic delay-sensitive messages that need to be transmitted to a common access point. The access point operates in a time-slotted fashion, and can instruct the various ...

Practical Bounds on Optimal Caching with Variable Object Sizes

Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners of how much further miss ratios can be improved. In other words, should the systems community continue working on this problem? Currently, there is no ...

On Resource Pooling and Separation for LRU Caching

Caching systems using the Least Recently Used (LRU) principle have now become ubiquitous. A fundamental question for these systems is whether the cache space should be pooled together or divided to serve multiple flows of data item requests in order to ...

An Optimal Randomized Online Algorithm for QoS Buffer Management

The QoS buffer management problem, with significant and diverse computer applications, e.g., in online cloud resource allocation problems, is a classic online admission control problem in the presence of resource constraints. In its basic setting, ...

SESSION: Scheduling I

Session details: Scheduling I

Minimizing Queue Length Regret Under Adversarial Network Models

Stochastic models have been dominant in network optimization theory for over two decades, due to their analytical tractability. However, these models fail to capture non-stationary or even adversarial network dynamics which are of increasing importance ...

Dynamic Proportional Sharing: A Game-Theoretic Approach

Sharing computational resources amortizes cost and improves utilization and efficiency. When agents pool their resources, each becomes entitled to a portion of the shared pool. Static allocations in each round can guarantee entitlements and are strategy-...

SOAP: One Clean Analysis of All Age-Based Scheduling Policies

We consider an extremely broad class of M/G/1 scheduling policies called SOAP: Schedule Ordered by Age-based Priority. The SOAP policies include almost all scheduling policies in the literature as well as an infinite number of variants which have never ...

A Whittle's Index Based Approach for QoE Optimization in Wireless Networks

The design of schedulers to optimize heterogeneous users' Quality of Experience (QoE) remains a challenging and important problem for wireless systems. Our paper ([1]) explores three inter-related aspects of this problem: 1) non-linear relationships ...

SESSION: Keynote

Inherent Trade-Offs in Algorithmic Fairness

Recent discussion in both the academic literature and the public sphere about classification by algorithms has involved tension between competing notions of what it means for such a classification to be fair to different groups. We consider several of ...

SESSION: Learning I

Session details: Learning I

An Optimal Algorithm for Online Non-Convex Learning

In many online learning paradigms, convexity plays a central role in the derivation and analysis of online learning algorithms. The results, however, fail to be extended to the non-convex settings, which are necessitated by tons of recent applications. ...

Asymptotic Optimal Control of Markov-Modulated Restless Bandits

This paper studies optimal control subject to changing conditions. This is an area that recently received a lot of attention as it arises in numerous situations in practice. Some applications being cloud computing systems with fluctuating arrival rates, ...

Online Learning of Optimally Diverse Rankings

Search engines answer users' queries by listing relevant items (e.g. documents, songs, products, web pages, ...). These engines rely on algorithms that learn to rank items so as to present an ordered list maximizing the probability that it contains ...

Learning Proportionally Fair Allocations with Low Regret

We address the problem of learning Proportionally Fair (PF) allocations in parallel server systems with unknown service rates. We provide the first algorithms, to our knowledge, for learning such allocations with sub-linear regret.

Multi-armed Bandit with Additional Observations

We study multi-armed bandit (MAB) problems with additional observations, where in each round, the decision maker selects an arm to play and can also observe rewards of additional arms (within a given budget) by paying certain costs. We propose ...

Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision ...

SESSION: Cloud

Hound: Causal Learning for Datacenter-scale Straggler Diagnosis

Stragglers are exceptionally slow tasks within a job that delay its completion. Stragglers, which are uncommon within a single job, are pervasive in datacenters with many jobs. We present Hound, a statistical machine learning framework that infers the ...

Working Set Size Estimation Techniques in Virtualized Environments: One Size Does not Fit All

Energy consumption is a primary concern for datacenters' management. Numerous datacenters are relying on virtualization, as it provides flexible resource management means such as virtual machine (VM) checkpoint/restart, migration and consolidation. ...

PreFix: Switch Failure Prediction in Datacenter Networks

In modern datacenter networks (DCNs), failures of network devices are the norm rather than the exception, and many research efforts have focused on dealing with failures after they happen. In this paper, we take a different approach by predicting ...

On Non-Preemptive VM Scheduling in the Cloud

We study the problem of scheduling VMs (Virtual Machines) in a distributed server platform, motivated by cloud computing applications. The VMs arrive dynamically over time to the system, and require a certain amount of resources (e.g. memory, CPU, etc) ...

Why Some Like It Loud: Timing Power Attacks in Multi-tenant Data Centers Using an Acoustic Side Channel

The common practice of power infrastructure oversubscription in data centers exposes dangerous vulnerabilities to well-timed power attacks (i.e., maliciously timed power loads), possibly creating outages and resulting in multimillion-dollar losses. In ...

ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms

In recent years, high interest in using Virtual Machines (VMs) in data centers and cloud computing has significantly increased the demand for high-performance data storage systems. A straightforward approach to providing a high-performance storage ...

SESSION: Networking

Session details: Networking

Supporting Mobile VR in LTE Networks: How Close Are We?

In recent years, we have witnessed a boom in virtual reality (VR). 21 million wearable VR headsets are projected to be shipped in 2017, resulting in $4.9 billion revenue [3]. Among all the options, the mobile VR empowered by phones is most popular, ...

Tomographic Node Placement Strategies and the Impact of the Routing Model

Tomographic techniques can be used for the fast detection of link failures at low cost. Our paper studies the impact of the routing model on tomographic node placement costs. We present a taxonomy of path routing models and provide optimal and near-...

LTERadar: Towards LTE-Aware Wi-Fi Access Points

Major LTE hardware vendors (e.g. Qualcomm, Ericsson), mobile service providers (e.g. Verizon, T-Mobile), and standardization bodies (e.g. LTE-U forum, 3GPP) are extending LTE networks into unlicensed spectrum bands to boost the speeds and coverage of ...

Network Resilience and the Length-Bounded Multicut Problem: Reaching the Dynamic Billion-Scale with Guarantees

Motivated by networked systems in which the functionality of the network depends on vertices in the network being within a bounded distance T of each other, we study the length-bounded multicut problem: given a set of pairs, find a minimum-size set of ...

Predictive Impact Analysis for Designing a Resilient Cellular Backhaul Network

Backhaul transport network design and optimization for cellular service providers involve a unique challenge stemming from the fact that an end-user's equipment (UE) is within the radio reach of multiple cellular towers: It is hard to evaluate the ...

Synthesis of Fault-Tolerant Distributed Router Configurations

Operators of modern networks require support for diverse and complex end-to-end policies, such as, middlebox traversals, isolation, and traffic engineering. While Software-defined Networking (SDN) provides centralized custom routing functionality in ...

SESSION: Learning II

Session details: Learning II

Reinforcement with Fading Memories

We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions which generate discrete rewards at different rates. She is allowed to make new choices at ...

On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, due to the ...

Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized ...

Neural Network Meets DCN: Traffic-driven Topology Adaptation with Deep Learning

The emerging optical/wireless topology reconfiguration technologies have shown great potential in improving the performance of data center networks. However, it also poses a big challenge on how to find the best topology configurations to support the ...

SESSION: Systems

Session details: Systems

The CSI Framework for Compiler-Inserted Program Instrumentation

The CSI framework provides comprehensive static instrumentation that a compiler can insert into a program-under-test so that dynamic-analysis tools - memory checkers, race detectors, cache simulators, performance profilers, code-coverage analyzers, etc. ...

A Quantitative Evaluation of Contemporary GPU Simulation Methodology

Contemporary Graphics Processing Units (GPUs) are used to accelerate highly parallel compute workloads. For the last decade, researchers in academia and industry have used cycle-level GPU architecture simulators to evaluate future designs. This paper ...

Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation

Compared to planar NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive ...

A Fine-grained Event-based Modem Power Model for Enabling In-depth Modem Energy Drain Analysis

Cellular modems enable ubiquitous Internet connectivities to modern smartphones, but in doing so they become a major contributor to the smartphone energy drain. Understanding modem energy drain requires a detailed power model. The prior art, an RRC-...

What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study

Main memory (DRAM) consumes as much as half of the total system power in a computer today, due to the increasing demand for memory capacity and bandwidth. There is a growing need to understand and analyze DRAM power consumption, which can be used to ...

Intel MPX Explained: A Cross-layer Analysis of the Intel MPX System Stack

Memory-safety violations are the primary cause of security and reliability issues in software systems written in unsafe languages. Given the limited adoption of decades-long research in software-based memory safety approaches, as an alternative, Intel ...

SESSION: Load Balancing

Session details: Load Balancing

A Refined Mean Field Approximation

Stochastic models have been used to assess the performance of computer (and other) systems for many decades. As a direct analysis of large and complex stochastic models is often prohibitive, approximations methods to study their behavior have been ...

On the Power-of-d-choices with Least Loaded Server Selection

Motivated by distributed schedulers that combine the power-of-d-choices with late binding and systems that use replication with cancellation-on-start, we study the performance of the LL(d) policy which assigns a job to a server that currently has the ...

Degree of Queue Imbalance: Overcoming the Limitation of Heavy-traffic Delay Optimality in Load Balancing Systems

In this paper, we argue that heavy-traffic delay optimality is a coarse metric that does not necessarily imply good delay performance. Specifically, we show that any load balancing scheme is heavy-traffic delay optimal as long as it satisfies a fairly ...

Towards Optimality in Parallel Job Scheduling

To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip. To effectively leverage these multi-core chips, one must decide how many cores to assign to each job. Given that jobs receive sublinear speedups from ...

On a Class of Stochastic Multilayer Networks

In this paper, we introduce a new class of stochastic multilayer networks. A stochastic multilayer network is the aggregation of M networks (one per layer) where each is a subgraph of a foundational network G . Each layer network is the result of ...

Fork and Join Queueing Networks with Heavy Tails: Scaling Dimension and Throughput Limit

Parallel and distributed computing systems are foundational to the success of cloud computing and big data analytics. Fork-Join Queueing Networks with Blocking (FJQN/Bs) are natural models for such systems. While engineering solutions have long been ...

SESSION: Resource Management II

Session details: Resource Management II

Performance of Balanced Fairness in Resource Pools: A Recursive Approach

Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job can generally not be ...

Designing Low-Complexity Heavy-Traffic Delay-Optimal Load Balancing Schemes: Theory to Algorithms

We establish a unified analytical framework for designing load balancing algorithms that can simultaneously achieve low latency, low complexity, and low communication overhead. We first propose a general class ¶ of load balancing policies and prove that ...

Towards Fast-Convergence, Low-Delay and Low-Complexity Network Optimization

Distributed network optimization has been studied several years. However, we still do not have a good idea of how to design schemes that can simultaneously provide good performance across the dimensions of utility optimality, convergence speed, and ...

The PDE Method for the Analysis of Randomized Load Balancing Networks

We introduce a new framework for the analysis of large-scale load balancing networks with general service time distributions, motivated by applications in server farms, distributed memory machines, cloud computing and communication systems. For a ...

Safe Randomized Load-Balanced Switching by Diffusing Extra Loads

Load-balanced switch architectures are known to be scalable in both size and speed, which is of interest due to the continued exponential growth in Internet traffic. However, the main drawback of load-balanced switches is that packets can depart out of ...

Asymptotically Optimal Load Balancing Topologies

We consider a system of N ~servers inter-connected by some underlying graph topology~G N . Tasks with unit-mean exponential processing times arrive at the various servers as independent Poisson processes of rate lambda. Each incoming task is irrevocably ...

SESSION: Performance Evaluation Review

ACM Sigmetrics Performance Evaluation Review: A New Series on Diversity

This editorial announces a new series on diversity in the ACM Sigmetrics Performance Evaluation Review (PER). In several upcoming and future issues we will feature invited articles on diversity from authors in the performance evaluation community, but ...

Diversity in Faculty Recruiting: A WiSE Approach

In this article, we share approaches and practices that we have found to be successful in supporting women in STEM fields, based on our experience with the University of Southern California's (USC) Women in Science and Engineering Program (WiSE). ...