Deep learning has rapidly come to dominate AI and machine learning in the past decade. These successes have come despite deep learning largely being a "black box." A small subdiscipline has grown up trying to derive better understanding of the underlying mathematical properties. Via a tour d'horizon of recent theoretical analyses of deep learning in some concrete settings, we illustrate how the black box view can miss out on (or even be wrong about) special phenomena going on during training. These phenomena are also not captured by the training objective. We argue that understanding such phenomena via mathematical understanding will be crucial for enabling the full range of future applications.

We study Federated Bandit, a decentralized Multi-Armed Bandit (MAB) problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm GossipUCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that GossipUCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of O(max(poly(N,M) log T, poly(N,M) logλ2-1 N)) for all N agents, where λ2∈(0,1) is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose FedUCB, a differentially private version of GossipUCB, in which the agents preserve ε-differential privacy of their local data while achieving O(max poly(N,M)/ε log2.5 T, poly(N,M) (logλ2-1 N + log T)) regret.

We consider combinatorial semi-bandits over a set X ⊂ (0,1)d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O ( d (łn m)2 (łn T) øver Δmin) after T rounds, where m = maxx ∈ X 1Tx. However, ESCB has computational complexity O(|X|), which is typically exponential in d, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem with regret R(T) = O (d (łn m)2 (łn T)øver Δmin) and computational asymptotic complexity O( δT-1 poly (d)), where δT is a function which vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over X can be solved up to a given approximation ratio, AESCB is implementable in polynomial time O (δT-1 poly (d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and showing how to solve these maximization problems efficiently. Additional algorithms, proofs and numerical experiments are given in the complete version of this work.

We consider a two-controller online control problem where a central controller chooses an action from a feasible set that is determined by time-varying and coupling constraints, which depend on all past actions and states. The central controller's goal is to minimize the cumulative cost; however, the controller has access to neither the feasible set nor the dynamics directly, which are determined by a remote local controller. Instead, the central controller receives only an aggregate summary of the feasibility information from the local controller, which does not know the system costs. We show that it is possible for an online algorithm using feasibility information to nearly match the dynamic regret of an online algorithm using perfect information whenever the feasible sets satisfy a causal invariance criterion and there is a sufficiently large prediction window size. To do so, we use a form of feasibility aggregation based on entropic maximization in combination with a novel online algorithm, named Penalized Predictive Control (PPC).

The cloud computing industry has grown rapidly over the last decade, and with this growth there is a significant increase in demand for compute resources. Demand is manifested in the form of Virtual Machine (VM) requests, which need to be assigned to physical machines in a way that minimizes resource fragmentation and efficiently utilizes the available machines. This problem can be modeled as a dynamic version of the bin packing problem with the objective of minimizing the total usage time of the bins (physical machines). Motivated by advances in Machine Learning that provide good estimates of workload characteristics, this paper studies the effect of having extra information about future (total) demand. We show that the competitive factor can be dramatically improved with this additional information; in some cases, we achieve constant competitiveness, or even a competitive factor that approaches 1. Along the way, we design new offline algorithms with improved approximation ratios for the dynamic bin-packing problem.

The First-Come First-Served (FCFS) scheduling policy is the most popular scheduling algorithm used in practice. Furthermore, its usage is theoretically validated: for light-tailed job size distributions, FCFS has weakly optimal asymptotic tail of response time. But what if we don't just care about the asymptotic tail? What if we also care about the 99th percentile of response time, or the fraction of jobs that complete in under one second? Is FCFS still best? Outside of the asymptotic regime, only loose bounds on the tail of FCFS are known, and optimality is completely open.

In this paper, we introduce a new policy, Nudge, which is the first policy to provably stochastically improve upon FCFS. We prove that Nudge simultaneously improves upon FCFS at every point along the tail, for light-tailed job size distributions. As a result, Nudge outperforms FCFS for every moment and every percentile of response time. Moreover, Nudge provides a multiplicative improvement over FCFS in the asymptotic tail. This resolves a long-standing open problem by showing that, counter to previous conjecture, FCFS is not strongly asymptotically optimal.

This paper represents an abridged version of [2].

Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem.

In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.

The Gittins scheduling policy minimizes the mean response in the single-server M/G/1 queue in a wide variety of settings. Most famously, Gittins is optimal when preemption is allowed and service requirements are unknown but drawn from a known distribution. Gittins is also optimal much more generally, adapting to any amount of available information and any preemption restrictions. However, scheduling to minimize mean response time in a multiserver setting, specifically the central-queue M/G/k, is a much more difficult problem.

In this work we give the first general analysis of Gittins in the M/G/k. Specifically, we show that under extremely general conditions, Gittins's mean response time in the M/G/k is at most its mean response time in the M/G/1 plus an O(log(1/(1 - ρ))) additive term, where ρ is the system load. A consequence of this result is that Gittins is heavy-traffic optimal in the M/G/k if the service requirement distribution S satisfies E[S2(log S)+] < ∞. This is the most general result on minimizing mean response time in the M/G/k to date.

To prove our results, we combine properties of the Gittins policy and Palm calculus in a novel way. Notably, our technique overcomes the limitations of tagged job methods that were used in prior scheduling analyses.

This work presents the first-ever detailed and large-scale measurement analysis of storage consumption behavior of applications (apps) on smart mobile devices. We start by carrying out a five-year longitudinal static analysis of millions of Android apps to study the increase in their sizes over time and identify various sources of app storage consumption. Our study reveals that mobile apps have evolved as large monolithic packages that are packed with features to monetize/engage users and optimized for performance at the cost of redundant storage consumption.

We also carry out a mobile storage usage study with 140 Android participants. We built and deployed a lightweight context-aware storage tracing tool, called cosmos, on each participant's device. Leveraging the traces from our user study, we show that only a small fraction of apps/features are actively used and usage is correlated to user context. Our findings suggest a high degree of app feature bloat and unused functionality, which leads to inefficient use of storage. Furthermore, we found that apps are not constrained by storage quota limits, and developers freely abuse persistent storage by frequently caching data, creating debug logs, user analytics, and downloading advertisements as needed.

Finally, drawing upon our findings, we discuss the need for efficient mobile storage management, and propose an elastic storage design to reclaim storage space when unused. We further identify research challenges and quantify expected storage savings from such a design. We believe our findings will be valuable to the storage research community as well as mobile app developers.

A new mobile computing paradigm, dubbed mini-app, has been growing rapidly over the past few years since being introduced by WeChat in 2017. In this paradigm, a host app allows its end-users to install and run mini-apps inside itself, enabling the host app to build an ecosystem around (much like Google Play and Apple AppStore), enrich the host's functionalities, and offer mobile users elevated convenience without leaving the host app. It has been reported that there are over millions of mini-apps in WeChat. However, little information is known about these mini-apps at an aggregated level. In this paper, we present MiniCrawler, the first scalable and open-source WeChat mini-app crawler that has indexed over 1,333,308 mini-apps. It leverages a number of reverse engineering techniques to uncover the interfaces and APIs in WeChat for crawling the mini-apps. With the crawled mini-apps, we then measure the resource consumption, API usage, library usage, obfuscation rate, app categorization, and app ratings at an aggregated level in this work.

Accurate prediction of network paths between arbitrary hosts on the Internet is of vital importance for network operators, cloud providers, and academic researchers. We present PredictRoute, a system that predicts network paths between hosts on the Internet using historical knowledge of the data and control plane. In addition to feeding on freely available traceroutes and BGP routing tables, PredictRoute optimally explores network paths towards chosen BGP prefixes. PredictRoute's strategy for exploring network paths discovers 4X more autonomous system (AS) hops than other well-known strategies used in practice today. Using a corpus of traceroutes, PredictRoute trains probabilistic models of routing towards prefixes on the Internet to predict network paths and their likelihood. PredictRoute's AS-path predictions differ from the measured path by at most 1 hop, 75% of the time. We expose PredictRoute's path prediction capability via a REST API to facilitate its inclusion in other applications and studies. We additionally demonstrate the utility of PredictRoute in improving real-world applications for circumventing Internet censorship and preserving anonymity online.

Traffic classification is essential in network management for operations ranging from capacity planning, performance monitoring, volumetry, and resource provisioning, to anomaly detection and security. Recently, it has become increasingly challenging with the widespread adoption of encryption in the Internet, e.g., as a de-facto in HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using Deep Learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, limiting model robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated as any other raw input for DL, while crucial domain-specific considerations are commonly ignored. In this paper, we design a novel feature engineering approach that generalizes well for encrypted web protocols, and develop a neural network architecture based on Stacked Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNN). We evaluate our approach on a real-world web traffic dataset from a major Internet service provider and Mobile Network Operator. We achieve an accuracy of 95% in service classification with less raw traffic and smaller number of parameters, out-performing a state-of-the-art method by nearly 50% fewer false classifications. We show that our DL model generalizes for different classification objectives and encrypted web protocols. We also evaluate our approach on a public QUIC dataset with finer application-level granularity in labeling, achieving an overall accuracy of 99%.

Zero queueing delay is highly desirable in large-scale computing systems. Existing work has shown that it can be asymptotically achieved by using the celebrated Power-of-d-choices (Pod) policy with a probe overhead d = Ω(log N/1-λ), and it is impossible when d = O(1/1-λ), where N is the number of servers and λ is the load of the system. However, these results are based on the model where each job is an indivisible unit, which does not capture the parallel structure of jobs in today's predominant parallel computing paradigm.

This paper considers a model where each job consists of a batch of parallel tasks. Under this model, we propose a new notion of zero (asymptotic) queueing delay that requires the job delay under a policy to approach the job delay given by the max of its tasks' service times, i.e., the job delay assuming its tasks entered service right upon arrival. This notion quantifies the effect of queueing on a job level for jobs consisting of multiple tasks, and thus deviates from the conventional zero queueing delay for single-task jobs in the literature.

We show that zero queueing delay for parallel jobs can be achieved using the batch-filling policy (a variant of the celebrated Pod policy) with a probe overhead d = Ω(1/(1-λ)log k) in the sub-Halfin-Whitt heavy-traffic regime, where k is the number of tasks in each job and k properly scales with N (the number of servers). This result demonstrates that for parallel jobs, zero queueing delay can be achieved with a smaller probe overhead. We also establish an impossibility result: we show that zero queueing delay cannot be achieved if d = (o(log N/log k)). Simulation results are provided to demonstrate the consistency between numerical results and theoretical results under reasonable settings, and to investigate gaps in the theoretical analysis.

We investigate the achievable stability region for redundancy systems and a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand we establish for New-Better-than-Used (NBU) distributed speed variations that no replication gives a strictly larger stability region than replication. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In case of non-observable job types we show that for New-Worse-than-Used (NWU) distributed speed variations full replication gives a larger stability region than no replication.

The supermarket model is a popular load balancing model where each incoming job is assigned to a server with the least number of jobs among d randomly selected servers. Several authors have shown that the large scale limit in case of processor sharing servers has a unique insensitive fixed point, which naturally leads to the belief that the queue length distribution in such a system is insensitive to the job size distribution as the number of servers tends to infinity. Simulation results that support this belief have also been reported. However, global attraction of the unique fixed point of the large scale limit was not proven except for exponential job sizes, which is needed to formally prove asymptotic insensitivity. The difficulty lies in the fact that with processor sharing servers, the limiting system is in general not monotone.

In this paper we focus on the class of hyperexponential distributions of order $2$ and demonstrate that for this class of distributions global attraction of the unique fixed point can still be established using monotonicity by picking a suitable state space and partial order. This allows us to formally show that we have asymptotic insensitivity within this class of job size distributions. We further demonstrate that our result can be leveraged to prove asymptotic insensitivity within this class of distributions for other load balancing systems.

Mean-field models are an established method to analyze large stochastic systems with N interacting objects by means of simple deterministic equations that are asymptotically correct when N tends to infinity. For finite N, mean-field equations provide an approximation whose accuracy is model- and parameter-dependent. Recent research has focused on refining the approximation by computing suitable quantities associated with expansions of order $1/N$ and $1/N^2$ to the mean-field equation. In this paper we present a new method for refining mean-field approximations. It couples the master equation governing the evolution of the probability distribution of a truncation of the original state space with a mean-field approximation of a time-inhomogeneous population process that dynamically shifts the truncation across the whole state space. We provide a result of asymptotic correctness in the limit when the truncation covers the state space; for finite truncations, the equations give a correction of the mean-field approximation. We apply our method to examples from the literature to show that, even with modest truncations, it is effective in models that cannot be refined using existing techniques due to non-differentiable drifts, and that it can outperform the state of the art in challenging models that cause instability due orbit cycles in their mean-field equations.

With the growth of the cryptocurrency ecosystem, there is expanding evidence that counterfeit cryptocurrency has also appeared. In this paper, we empirically explore the presence of counterfeit cryptocurrencies on Ethereum and measure their impact. By analyzing over 190K ERC-20 tokens (or cryptocurrencies) on Ethereum, we have identified 2,117 counterfeit tokens that target 94 of the 100 most popular cryptocurrencies. We perform an end-to-end characterization of the counterfeit token ecosystem, including their popularity, creators and holders, fraudulent behaviors and advertising channels. Through this, we have identified two types of scams related to counterfeit tokens and devised techniques to identify such scams. We observe that over 7,104 victims were deceived in these scams, and the overall financial loss sums to a minimum of $17 million (74,271.7 ETH). Our findings demonstrate the urgency to identify counterfeit cryptocurrencies and mitigate this threat.

Ponzi schemes are financial scams that lure users under the promise of high profits. With the prosperity of Bitcoin and blockchain technologies, there has been growing anecdotal evidence that this classic fraud has emerged in the blockchain ecosystem. Existing studies have proposed machine-learning based approaches for detecting Ponzi schemes. However, these state-of-the-art approaches face several major limitations, including lacking interpretability, high false positive rates and the weak robustness to evasion techniques, These limitations mean that existing real-world methods for detecting Ponzi schemes are ineffective.

In this paper, we propose SADPonzi, a semantic-aware detection approach for identifying Ponzi schemes in Ethereum smart contracts. Specifically, we propose a heuristic-guided symbolic execution technique to identify investor-related transfer behaviors and the distribution strategies adopted. Experimental result on a well-labelled benchmark suggests that SADPonzi can achieve 100% precision and recall, outperforming all existing machine-learning based techniques. We further apply SADPonzi to all 3.4 million smart contracts deployed by EOAs in Ethereum and identify 835 Ponzi scheme contracts, with over 17 million US Dollars invested by victims. Our observations confirm the urgency of identifying and mitigating Ponzi schemes in the blockchain ecosystem.

Online advertising (essentially display ads on websites) has proliferated in the last decade to the extent where it is now an integral part of the web. In this paper, we apply an in-depth and first-of-a-kind performance evaluation of web ads. Unlike prior efforts that rely primarily on adblockers, we perform a fine-grained analysis of the web browser's page loading process to demystify the performance cost of web ads. We aim to characterize the cost by every component of an ad, so the publisher, ad syndicate, and advertiser can improve the ad's performance with detailed guidance. For this purpose, we develop a tool, adPerf, for the Chrome browser that classifies page loading workloads into ad-related and main-content at the granularity of browser activities. Our evaluations show that online advertising entails more than 15% of browser page loading workload and approximately 88% of that is spent on JavaScript. On smartphones, this additional cost of ads is 7% lower as we observe mobile pages include fewer and well-optimized ads. We also track the sources and delivery chain of web ads and analyze performance considering the origin of the ad contents. We observe that 2 of the well-known third-party ad domains contribute to 35% of the ads performance cost and surprisingly, top news websites implicitly include unknown third-party ads which in some cases build up to more than 37% of the ads performance cost.

In the past fifteen years, the most significant paradigm shift in the computing industry is the migration to cloud computing, which brings unprecedented opportunities of digital transformation to business, society, and human life. The implication of this is profound. It means that cloud computing platforms have become part of the basic infrastructure of the world. Therefore, the non-functional properties of cloud computing platforms, including availability, reliability, performance, efficiency, security, sustainability, etc., become immensely important. The distributed nature, massive scale, and high complexity of cloud computing platforms ranging from storage to networking, computing and beyond present huge challenges to achieve effective and efficient building and operation of such software systems. There is huge wealth of various types of data available throughout the entire development lifecycle of software systems. This is manifested even stronger with the paradigm shift to cloud computing as much more data are available on system runtime and workloads. Leveraging the amount of data, AI for System is to utilize AI/ML technologies to design and build high-quality cloud systems at scale. In this talk, I will first introduce the concept of AI for System and its research landscape. Then using a few projects at Microsoft as examples [1-10], I will talk about the work from Microsoft Research on AI for System and its impact. I will also discuss the research challenges and opportunities in AI for System moving forward.

Safety compliance is paramount to the safe deployment of autonomous vehicle (AV) technologies in real-world transportation systems. As AVs will share road infrastructures with human drivers and pedestrians, it is an important requirement for AVs to obey standard driving rules. Existing AV software testing methods, including simulation and road testing, only check fundamental safety rules such as collision avoidance and safety distance. Scenario-dependent driving rules, including crosswalk and intersection rules, are more complicated because the expected driving behavior heavily depends on the surrounding circumstances. However, a testing framework is missing for checking scenario-dependent driving rules on various AV software.

In this paper, we design and implement a systematic framework AVChecker for identifying violations of scenario-dependent driving rules in AV software using formal methods. AVChecker represents both the code logic of AV software and driving rules in proposed formal specifications and leverages satisfiability modulo theory (SMT) solvers to identify driving rule violations. To improve the automation of systematic rule-based checking, AVChecker provides a powerful user interface for writing driving rule specifications and applies static code analysis to extract rule-related code logic from the AV software codebase. Evaluations on two open-source AV software platforms, Baidu Apollo and Autoware, uncover 19 true violations out of 28 real-world driving rules covering crosswalks, traffic lights, stop signs, and intersections. Seven of the violations can lead to severe risks of a collision with pedestrians or blocking traffic.

As Graphics Processing Units (GPUs) are becoming a de facto solution for accelerating a wide range of applications, their reliable operation is becoming increasingly important. One of the major challenges in the domain of GPU reliability is to accurately measure GPGPU application error resilience. This challenge stems from the fact that a typical GPGPU application spawns a huge number of threads and then utilizes a large amount of potentially unreliable compute and memory resources available on the GPUs. As the number of possible fault locations can be in the billions, evaluating every fault and examining its effect on the application error resilience is impractical. Application resilience is evaluated via extensive fault injection campaigns based on sampling of an extensive fault site space. Typically, the larger the input of the GPGPU application, the longer the experimental campaign.

In this work, we devise a methodology, SUGAR (Speeding Up G PGPU Application Resilience Estimation with input sizing), that dramatically speeds up the evaluation of GPGPU application error resilience by judicious input sizing. We show how analyzing a small fraction of the input is sufficient to estimate the application resilience with high accuracy and dramatically reduce the duration of experimentation. Key to our estimation methodology is the discovery of repeating patterns as a function of the input size. Using the well-established fact that error resilience in GPGPU applications is mostly determined by the dynamic instruction (DI) count at the thread level, we discover the patterns that allow to accurately predict application error resilience for arbitrarily large inputs. For the cases that we examine in this paper, this new resilience estimation provides significant speedups (up to 1336 times) and 97.0 on the average, while keeping estimation errors to less than 1%, for details see the full version of this SIGMETRICS paper.

Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. In this paper, we target task-based programs, and propose a novel compiler-based approach that consists of four complementary steps. First, we partition the original tasks in the target application into sub-tasks and build a data reuse graph at a sub-task granularity. Second, based on the intensity of temporal and spatial data reuses among sub-tasks, we generate new tasks where each such (new) task includes a set of sub-tasks that exhibit high data reuse among them. Third, we assign the newly-generated tasks to cores in an architecture-aware fashion with the knowledge of data location. Finally, we re-schedule the execution order of sub-tasks within new tasks such that sub-tasks that belong to different tasks but share data among them are executed in close proximity in time. The experiments show that, when targeting a state of the art manycore system, our compiler-based approach improves the performance of 10 multithreaded programs by 23.4% on average.

Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Thus edges represent locality constraints, i.e., an arbitrary job can only be served at servers which contain certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies named Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue, where ties are broken in favor of the fastest servers. Under a "well-connected'' graph condition, we show that JFSQ and JFIQ are asymptotically optimal in the mean response time when the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.

Consider a system with N identical single-server queues and M(N) task types, where each server is able to process only a small subset of possible task types. Arriving tasks select d≥2 random compatible servers, and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph GN between the servers and the task types. When GN is complete bipartite, the meanfield approximation is accurate. However, such dense compatibility graphs are infeasible for large-scale implementation. We characterize a class of sparse compatibility graphs for which the meanfield approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server-degree almost by a factor N/ln(N) compared to the complete bipartite compatibility graph.

Mean field models are a popular tool used to analyse load balancing policies. In some exceptional cases the waiting time distribution of the mean field limit has an explicit form. In other cases it can be computed as the solution of a differential equation or a recursive relation. In our paper [6] we study the limit of the mean waiting time E[W&955;] as the arrival rate &955; approaches 1 for a number of load balancing policies when job sizes are exponential with mean 1 (i.e. when the system gets close to instability). As E[W_&955;] diverges to infinity, we scale with -log(1-&955;) and present a method to compute the limit lim&955;-> 1- -E[W&955;]/log(1-&955;). We show that this limit has a surprisingly simple form for the load balancing algorithms considered.

In addition, we propose an alternate scaling -log(p&955;) instead of -log(1-&955;), where p&955; is adapted to the policy at hand. This scaling allows us to obtain good approximations of the mean waiting time when &955; is close to one or zero.

We analyze the performance of redundancy in a multi-type job and multi-type server system where PS is implemented. We characterize the stability condition, which coincides with that of a system where each job type only dispatches copies into its least-loaded servers, and those copies need to be fully served. We then investigate the impact of redundancy in the stability condition by comparing that to the stability condition of a non-redundant system in which a job arrival is routed to only one randomly selected compatible server. We observe that if server loads are sufficiently heterogeneous redundancy can considerably improve the stability region of the system.

Driven by the rapid growth of content traffic and the demand for service quality, Internet content providers (CPs) have started to bypass transit providers and connect with access providers directly via private peering agreements. This peering relationship often raises disputes, e.g., Netflix vs. Comcast, and is not well understood. In this paper, we build a peering contract model and propose the concept of contractual equilibrium, based on which we study the formation and evolution of peering contracts. By using market data, we emulate the strategic peering behavior of providers and shed light on the understanding of private peering~agreements.

We reveal that the superiority and market dominance of providers primarily determine their peering strategies. We show that 1) superior providers tend to engage in peering more aggressively, and 2) non-dominant CPs' optimal peering strategies are negatively correlated due to market cannibalism, while the dominant CP often behaves oppositely. Our findings help explain phenomena such as why Netflix and Comcast signed the first peering contract, and reason whether private peering contracts will strengthen in future.

We consider the question of identifying which set of products are purchased and at what prices in a given transaction by observing only the total amount spent in the transaction, and nothing more. The ability to solve such an inverse problem can lead to refined information about consumer spending by simply observing anonymized credit card transactions data. Indeed, when considered in isolation, it is impossible to identify the products purchased and their prices from a given transaction based on just the transaction total. However, given a large number of transactions, there may be a hope.

As the main contribution of this work, we provide a robust estimation algorithm for decomposing transaction totals into the underlying, individual product(s) purchased by utilizing a large corpus of transactions. Our method recovers a (product prices) vector $p ∈ RN>0 of unknown dimension (number of products) N as well as matrix $A ∈ ZM x N≥0 simply from M observations (transaction totals) y ∈ RM>0 such that y = A p + η with η ∈ RM representing noise (taxes, discounts, etc.). We formally establish that our algorithm identifies N, A precisely and p approximately, as long as each product is purchased individually at least once, i.e. M ≥ N and A has rank N. Computationally, the algorithm runs in polynomial time (with respect to problem parameters), and thus we provide a computationally efficient and statistically robust method for solving such inverse problems. We apply the algorithm to a large corpus of anonymized consumer credit card transactions in the period 2016-2019, with data obtained from a commercial data vendor. The transactions are associated with spending at Apple, Chipotle, Netflix, and Spotify. From just transactions data, our algorithm identifies (i) key price points (without access to the listed prices), (ii) products purchased within a transaction, (iii) product launches, and (iv) evidence of a new 'secret' product from Netflix - rumored to be in limited release.

Motivated by applications in online marketplaces such as ridesharing, we study dynamic pricing and matching in two-sided queues with strategic servers. We consider a discrete-time process in which, heterogeneous customers and servers arrive. Each customer joins their type's queue, while servers might join a different type's queue depending on the prices posted by the system operator and an inconvenience cost. Then the system operator, constrained by a compatibility graph, decides the matching. The objective is to maximize the profit minus the expected waiting times. We develop a general framework that enables us to analyze a broad range of strategic behaviors. In particular, we encode servers' behavior in a properly defined cost function that can be tailored to various settings. Using this general cost function, we introduce a novel probabilistic fluid problem as an infinite dimensional optimization program. The probabilistic fluid model provides an upper bound on the achievable profit. We then study the system under a large market regime in which the arrival rates are scaled by η and present a probabilistic two-price policy and a max-weight matching policy which results in $O(η^1/3 )$ profit-loss. In addition, under a broad class of customer pricing policies, we show that any matching policy has profit-loss Ω(η1/3). Conditional on a given expected waiting time, we also establish scale-free lower and upper bounds for the profit. Our asymptotic analysis provides insight into near-optimal pricing and matching decisions, and our scale-free bounds provide insights into how different service levels impact the firm's profit.

Monotone submodular maximization with a knapsack constraint is NP-hard. Various approximation algorithms have been devised to address this optimization problem. In this paper, we revisit the widely known modified greedy algorithm. First, we show that this algorithm can achieve an approximation factor of 0.405, which significantly improves the known factors of 0.357 given by Wolsey and (1-1/e)/2-0.316 given by Khuller et al. More importantly, our analysis closes a gap in Khuller et al.'s proof for the extensively mentioned approximation factor of (1-1/√e)≈0.393 in the literature to clarify a long-standing misconception on this issue. Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum, which enables us to obtain a data-dependent ratio typically much higher than 0.405 between the solution value of the modified greedy algorithm and the optimum. It can also be used to significantly improve the efficiency of algorithms such as branch and bound.

Data summarization, a fundamental methodology aimed at selecting a representative subset of data elements from a large pool of ground data, has found numerous applications in big data processing, such as social network analysis [5, 7], crowdsourcing [6], clustering [4], network design [13], and document/corpus summarization [14]. Moreover, it is well acknowledged that the "representativeness" of a dataset in data summarization applications can often be modeled by submodularity - a mathematical concept abstracting the "diminishing returns" property in the real world. Therefore, a lot of studies have cast data summarization as a submodular function maximization problem (e.g., [2]).

We introduce and study a general version of the fractional online knapsack problem with multiple knapsacks, heterogeneous constraints on which items can be assigned to which knapsack, and rate-limiting constraints on the assignment of items to knapsacks. This problem generalizes variations of the knapsack problem and of the one-way trading problem that have previously been treated separately, and additionally finds application to the real-time control of electric vehicle (EV) charging. We introduce a new algorithm that achieves a competitive ratio within an additive factor of the best achievable competitive ratios for the general problem and matches or improves upon the best-known competitive ratio for special cases in the knapsack and one-way trading literatures. Moreover, our analysis provides a novel approach to online algorithm design based on an instance-dependent primal-dual analysis that connects the identification of worst-case instances to the design of algorithms. Finally, in the full version of this paper, we illustrate the proposed algorithm via trace-based experiments of EV charging.

The proliferation of novel mobile applications and the associated AI services necessitates a fresh view on the architecture, algorithms and services at the network edge in order to meet stringent performance requirements. Some recent work addressing these challenges is presented. In order to meet the requirement for low-latency, the execution of computing tasks moves form the cloud to the network edge, closer to the end-users. The joint optimization of service placement and request routing in dense mobile edge computing networks is considered. Multidimensional constraints are introduced to capture the storage requirements of the vast amounts of data needed. An algorithm that achieves close-to-optimal performance using a randomized rounding technique is presented. Recent advances in network virtualization and programmability enable realization of services as chains, where flows can be steered through a pre-defined sequence of functions deployed at different network locations. The optimal deployment of such service chains where storage is a stringent constraint in addition to computation and bandwidth is considered and an approximation algorithm with provable performance guarantees is proposed and evaluated. Finally the problem of traffic flow classification as it arises in firewalls and intrusion detection applications is presented. An approach for realizing such functions based on a novel two-stage deep learning method for attack detection is presented. Leveraging the high level of data plane programmability in modern network hardware, the realization of these mechanisms at the network edge is demonstrated.

Consider a distributed task where the communication network is fixed but the local inputs given to the nodes of the distributed system may change over time. In this work, we explore the following question: if some of the local inputs change, can an existing solution be updated efficiently, in a dynamic and distributed manner?

To address this question, we define the batch dynamic CONGEST model in which we are given a bandwidth-limited communication network and a dynamic edge labelling defines the problem input. The task is to maintain a solution to a graph problem on the labelled graph under batch changes. We investigate, when a batch of alpha edge label changes arrive, - how much time as a function of alpha we need to update an existing solution, and - how much information the nodes have to keep in local memory between batches in order to update the solution quickly.

Our work lays the foundations for the theory of input-dynamic distributed network algorithms. We give a general picture of the complexity landscape in this model, design both universal algorithms and algorithms for concrete problems, and present a general framework for lower bounds. The diverse time complexity of our model spans from constant time, through time polynomial in alpha, and to alpha time, which we show to be enough for any task.

The advent of ride-hailing platforms such as Lyft and Uber has revolutionized urban mobility in the past decade. Given their increasingly important role in today's society, recent years have seen growing interest in integrating these services with mass transit options, in order to leverage the on-demand and flexible nature of the former, with the sustainability and cost-effectiveness of the latter. Our work explores a set of operational questions that are critical to the success of any such integrated marketplace: Given a set of trip requests, and the ability to utilize ride-hailing services, which mass-transit routes should the transit agency operate? How frequently should it operate each route? And how can ride-hailing trips be used to both help connect passengers to these routes, and also cover trips which are not efficiently served by mass transit?

We study these under a model in which a Mobility-on-Demand provider (the platform ) has control of a vehicle fleet comprising both single-occupancy and high-capacity vehicles (e.g., cars and buses, respectively). The platform is faced with a number of trip requests to fill during a short time window, and can service these trip requests via different trip options : it can dispatch a car to transport the passenger from source to destination; it can use a car for the first and last legs of the trip, and have the passenger travel by bus in between; or it can use more complicated trips comprising of multiple car and bus legs. Given a set of rewards for matching passengers to trip options, and costs for operating each line (i.e., a bus route and associated frequency), the goal of the platform is to determine the optimal set of lines to operate (given a fixed budget for opening lines), as well as the assignment of passengers to trip options utilizing these lines, in order to maximize the total reward. We refer to this as the Real-Time Line Planning Problem (RLpp).

We first demonstrate the computational limits of RLpp by showing that no constant-factor approximation is possible if we relax either one of two assumptions: (i) access to a pre-specified set of feasible lines, and (ii) no bus-to-bus transfers. These assumptions are practically motivated and common in the literature, but our work is the first to formally demonstrate their necessity.

This paper considers online convex optimization (OCO) problems where decisions are constrained by available energy resources. A key scenario is optimal power control for an energy harvesting device with a finite capacity battery. The goal is to minimize a time-average loss function while keeping the used energy less than what is available. In this setup, the distribution of the randomly arriving harvestable energy (which is assumed to be i.i.d.) is unknown, the current loss function is unknown, and the controller is only informed by the history of past observations. A prior algorithm is known to achieve O(√T) regret by using a battery with an O(√T) capacity. This paper develops a new algorithm that maintains this asymptotic trade-off with the number of time steps T while improving dependency on the dimension of the decision vector from O(√n) to O(√log(n)). The proposed algorithm introduces a separation of the decision vector into amplitude and direction components. It uses two distinct types of Bregman divergence, together with energy queue information, to make decisions for each component.

This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $1$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined D-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their D-hop neighborhoods. This approach significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of graphs. Under the Chung-Lu random graph model with n vertices, max degree Θ(√n), and the power-law exponent 2<β<3, we show that as soon as D>4-β/3-β, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only Ω((log n)4-β) initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires n1/2+ε seeds (for any small constant ε^>0). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.

In networks with a minority and a majority community, it is well-studied that minorities are under-represented at the top of the social hierarchy. However, researchers are less clear about the representation of minorities from the lower levels of the hierarchy, where other disadvantages or vulnerabilities may exist. We offer a more complete picture of social disparities at each social level with empirical evidence that the minority representation exhibits two opposite phases: at the higher rungs of the social ladder, the representation of the minority community decreases; but, lower in the ladder, which is more populous, as you ascend, the representation of the minority community improves. We refer to this opposing phenomenon between the upper-level and lower-level as the chasm effect. Previous models of network growth with homophily fail to detect and explain the presence of this chasm effect. We analyze the interactions among a few well-observed network-growing mechanisms with a simple model to reveal the sufficient and necessary conditions for both phases in the chasm effect to occur. By generalizing the simple model naturally, we present a complete bi-affiliation bipartite network-growth model that could successfully capture disparities at all social levels and reproduce real social networks. Finally, we illustrate that addressing the chasm effect can create fairer systems with two applications in advertisement and fact-checks, thereby demonstrating the potential impact of the chasm effect on the future research of minority-majority disparities and fair algorithms.

High scalability and low running costs have made fuzz testing the de facto standard for discovering software bugs. Fuzzing techniques are constantly being improved in a race to build the ultimate bug-finding tool. However, while fuzzing excels at finding bugs in the wild, evaluating and comparing fuzzer performance is challenging due to the lack of metrics and benchmarks. For example, crash count---perhaps the most commonly-used performance metric---is inaccurate due to imperfections in deduplication techniques. Additionally, the lack of a unified set of targets results in ad hoc evaluations that hinder fair comparison.

We tackle these problems by developing Magma, a ground-truth fuzzing benchmark that enables uniform fuzzer evaluation and comparison. By introducing real bugs into real software, Magma allows for the realistic evaluation of fuzzers against a broad set of targets. By instrumenting these bugs, Magma also enables the collection of bug-centric performance metrics independent of the fuzzer. Magma is an open benchmark consisting of seven targets that perform a variety of input manipulations and complex computations, presenting a challenge to state-of-the-art fuzzers.

We evaluate seven widely-used mutation-based fuzzers (AFL, AFLFast, AFL++, FairFuzz, MOpt-AFL, honggfuzz, and SymCC-AFL) against Magma over 200,000 CPU-hours. Based on the number of bugs reached, triggered, and detected, we draw conclusions about the fuzzers' exploration and detection capabilities. This provides insight into fuzzer performance evaluation, highlighting the importance of ground truth in performing more accurate and meaningful evaluations.

The intrinsic hardware imperfection of WiFi chipsets manifests itself in the transmitted signal, leading to a unique radiometric (radio frequency) fingerprint. This fingerprint can be used as an additional means of authentication to enhance security. In this paper, we prove analytically and experimentally that these solutions are highly vulnerable to impersonation attacks. We also demonstrate that such a unique device-specific signature can be abused to track devices, thus violating privacy. We propose RF-Veil, a radiometric fingerprinting solution that is not only robust against impersonation attacks but also effective in protecting privacy by obfuscating the radiometric fingerprint of the transmitter for non-legitimate receivers. Specifically, we introduce a randomized pattern of phase errors to the transmitted signal such that only the intended receiver can extract the original fingerprint of the transmitter. In a series of experiments and analyses, we expose the vulnerability of adopting naive randomization to statistical attacks and introduce countermeasures. Finally, we show the efficacy of RF-Veil experimentally in protecting user privacy and enhancing security. More importantly, our proposed solution allows communicating with other devices, which do not employ RF-Veil.