Key-value lookup engines running in fast memory are crucial components of many networked and distributed systems such as packet forwarding, virtual network functions, content distribution networks, distributed storage, and cloud/edge computing. These lookup engines must be memory-efficient because fast memory is small and expensive. This work presents a new key-value lookup design, called Ludo Hashing, which costs the least space (3.76 + 1.05l bits per key-value item for l-bit values) among known compact lookup solutions including the recently proposed partial-key Cuckoo and Bloomier perfect hashing. In addition to its space efficiency, Ludo Hashing works well with most practical systems by supporting fast lookups, fast updates, and concurrent writing/reading. We implement Ludo Hashing and evaluate it with both micro-benchmark and two network systems deployed in CloudLab. The results show that in practice Ludo Hashing saves 40% to 80%+ memory cost compared to existing dynamic solutions. It costs only a few GB memory for 1 billion key-value items and achieves high lookup throughput: over 65 million queries per second on a single node with multiple threads.
We study a quantum switch that distributes maximally entangled multipartite states to sets of users. The entanglement switching process requires two steps: first, each user attempts to generate bipartite entanglement between itself and the switch; and second, the switch performs local operations and a measurement to create multipartite entanglement for a set of users. In this work, we study a simple variant of this system, wherein the switch has infinite memory and the links that connect the users to the switch are identical. Further, we assume that all quantum states, if generated successfully, have perfect fidelity and that decoherence is negligible. This problem formulation is of interest to several distributed quantum applications, while the technical aspects of this work result in new contributions within queueing theory. Via extensive use of Lyapunov functions, we derive necessary and sufficient conditions for the stability of the system and closed-form expressions for the switch capacity and the expected number of qubits in memory.
This paper concerns the mechanism design for online resource allocation in a strategic setting. In this setting, a single supplier allocates capacity-limited resources to requests that arrive in a sequential and arbitrary manner. Each request is associated with an agent who may act selfishly to misreport the requirement and valuation of her request. The supplier charges payment from agents whose requests are satisfied, but incurs a load-dependent supply cost. The goal is to design an incentive compatible online mechanism, which determines not only the resource allocation of each request, but also the payment of each agent, so as to (approximately) maximize the social welfare (i.e., aggregate valuations minus supply cost). We study this problem under the framework of competitive analysis. The major contribution of this paper is the development of a unified approach that achieves the best-possible competitive ratios for setups with different supply costs. Specifically, we show that when there is no supply cost or the supply cost function is linear, our model is essentially a standard 0-1 knapsack problem, for which our approach achieves logarithmic competitive ratios that match the state-of-the-art (which is optimal). For the more challenging setup when the supply cost is strictly-convex, we provide online mechanisms, for the first time, that lead to the optimal competitive ratios as well. To the best of our knowledge, this is the first approach that unifies the characterization of optimal competitive ratios in online resource allocation for different setups including zero, linear and strictly-convex supply costs.
Optimal caching of files in a content distribution network (CDN) is a problem of fundamental and growing commercial interest. Although many different caching algorithms are in use today, the fundamental performance limits of network caching algorithms from an online learning point-of-view remain poorly understood to date. In this paper, we resolve this question in the following two settings: (1) a single user connected to a single cache, and (2) a set of users and a set of caches interconnected through a bipartite network. Recently, an online gradient-based coded caching policy was shown to enjoy sub-linear regret. However, due to the lack of known regret lower bounds, the question of the optimality of the proposed policy was left open. In this paper, we settle this question by deriving tight non-asymptotic regret lower bounds in both of the above settings. In addition to that, we propose a new Follow-the-Perturbed-Leader-based uncoded caching policy with near-optimal regret. Technically, the lower-bounds are obtained by relating the online caching problem to the classic probabilistic paradigm of balls-into-bins. Our proofs make extensive use of a new result on the expected load in the most populated half of the bins, which might also be of independent interest. We evaluate the performance of the caching policies by experimenting with the popular MovieLens dataset and conclude the paper with design recommendations and a list of open problems.
Ad and tracking blocking extensions are popular tools for improving web performance, privacy and aesthetics. Content blocking extensions generally rely on filter lists to decide whether a web request is associated with tracking or advertising, and so should be blocked. Millions of web users rely on filter lists to protect their privacy and improve their browsing experience. Despite their importance, the growth and health of filter lists are poorly understood. Filter lists are maintained by a small number of contributors who use undocumented heuristics and intuitions to determine what rules should be included. Lists quickly accumulate rules, and rules are rarely removed. As a result, users' browsing experiences are degraded as the number of stale, dead or otherwise not useful rules increasingly dwarf the number of useful rules, with no attenuating benefit. An accumulation of "dead weight" rules also makes it difficult to apply filter lists on resource-limited mobile devices. This paper improves the understanding of crowdsourced filter lists by studying EasyList, the most popular filter list. We measure how EasyList affects web browsing by applying EasyList to a sam- ple of 10,000 websites. We find that 90.16% of the resource blocking rules in EasyList provide no benefit to users in common browsing scenarios. We use our measurements of rule application rates to taxonomies ways advertisers evade EasyList rules. Finally, we propose optimizations for popular ad-blocking tools that (i) allow EasyList to be applied on performance constrained mobile devices and (ii) improve desktop performance by 62.5%, while preserving over 99% of blocking coverage. We expect these optimizations to be most useful for users in non-English locals, who rely on supplemental filter lists for effective blocking and protections.
Flow reshaping is used in time-sensitive networks (as in the context of IEEE TSN and IETF Detnet) in order to reduce burstiness inside the network and to support the computation of guaranteed latency bounds. This is performed using per-flow regulators (such as the Token Bucket Filter) or interleaved regulators (as with IEEE TSN Asynchronous Traffic Shaping, ATS). The former use one FIFO queue per flow, whereas the latter use one FIFO queue per input port. Both types of regulators are beneficial as they cancel the increase of burstiness due to multiplexing inside the network. It was demonstrated, by using network calculus, that they do not increase the worst-case latency. However, the properties of regulators were established assuming that time is perfect in all network nodes. In reality, nodes use local, imperfect clocks. Time-sensitive networks exist in two flavours: (1) in non-synchronized networks, local clocks run independently at every node and their deviations are not controlled and (2) in synchronized networks, the deviations of local clocks are kept within very small bounds using for example a synchronization protocol (such as PTP) or a satellite based geo-positioning system (such as GPS). We revisit the properties of regulators in both cases. In non-synchronized networks, we show that ignoring the timing inaccuracies can lead to network instability due to unbounded delay in per-flow or interleaved regulators. We propose and analyze two methods (rate and burst cascade, and asynchronous dual arrival-curve method) for avoiding this problem. In synchronized networks, we show that there is no instability with per-flow regulators but, surprisingly, interleaved regulators can lead to instability. To establish these results, we develop a new framework that captures industrial requirements on clocks in both non-synchronized and synchronized networks, and we develop a toolbox that extends network calculus to account for clock imperfections.
Due to the high density storage demand coming from applications from different domains, 3D NAND flash is becoming a promising candidate to replace 2D NAND flash as the dominant non-volatile memory. However, denser 3D NAND presents various performance and reliability issues, which can be addressed by the 3D NAND specific full-sequence program (FSP) operation. The FSP programs multiple pages simultaneously to mitigate the performance degradation caused by the long latency 3D NAND baseline program operations. However, the FSP-enabled 3D NAND-based SSDs introduce lifetime degradation due to the larger write granularities accessed by the FSP. To address the lifetime issue, in this paper, we propose and experimentally evaluate Centaur, a heterogeneous 2D/3D NAND heterogeneous SSD, as a solution. Centaur has three main components: a lifetime-aware inter-NAND request dispatcher, a lifetime-aware inter-NAND work stealer, and a data migration strategy from 2D NAND to 3D NAND. We used twelve SSD workloads to compare Centaur against a state-of-the-art 3D NAND-based SSD with the same capacity. Our experimental results indicate that the SSD lifetime and performance are improved by 3.7x and 1.11x, respectively, when using our 2D/3D heterogeneous SSD.
Payment channel networks (PCNs) are viewed as one of the most promising scalability solutions for cryptocurrencies today. Roughly, PCNs are networks where each node represents a user and each directed, weighted edge represents funds escrowed on a blockchain; these funds can be transacted only between the endpoints of the edge. Users efficiently transmit funds from node A to B by relaying them over a path connecting A to B, as long as each edge in the path contains enough balance (escrowed funds) to support the transaction. Whenever a transaction succeeds, the edge weights are updated accordingly. In deployed PCNs, channel balances (i.e., edge weights) are not revealed to users for privacy reasons; users know only the initial weights at time 0. Hence, when routing transactions, users typically first guess a path, then check if it supports the transaction. This guess-and-check process dramatically reduces the success rate of transactions. At the other extreme, knowing full channel balances can give substantial improvements in transaction success rate at the expense of privacy. In this work, we ask whether a network can reveal noisy channel balances to trade off privacy for utility. We show fundamental limits on such a tradeoff, and propose noise mechanisms that achieve the fundamental limit for a general class of graph topologies. Our results suggest that in practice, PCNs should operate either in the low-privacy or low-utility regime; it is not possible to get large gains in utility by giving up a little privacy, or large gains in privacy by sacrificing a little utility.
We consider the tail behavior of the response time distribution in an M/G/1 queue with heavy-tailed job sizes, specifically those with intermediately regularly varying tails. In this setting, the response time tail of many individual policies has been characterized, and it is known that policies such as Shortest Remaining Processing Time (SRPT) and Foreground-Background (FB) have response time tails of the same order as the job size tail, and thus such policies are tail-optimal. Our goal in this work is to move beyond individual policies and characterize the set of policies that are tail-optimal. Toward that end, we use the recently introduced SOAP framework to derive sufficient conditions on the form of prioritization used by a scheduling policy that ensure the policy is tail-optimal. These conditions are general and lead to new results for important policies that have previously resisted analysis, including the Gittins policy, which minimizes mean response time among policies that do not have access to job size information. As a by-product of our analysis, we derive a general upper bound for fractional moments of M/G/1 busy periods, which is of independent interest.
Root cause analysis in a large-scale production environment is challenging due to the complexity of the services running across global data centers. Due to the distributed nature of a large-scale system, the various hardware, software, and tooling logs are often maintained separately, making it difficult to review the logs jointly for understanding production issues. Another challenge in reviewing the logs for identifying issues is the scale - there could easily be millions of entities, each described by hundreds of features. In this paper we present a fast dimensional analysis framework that automates the root cause analysis on structured logs with improved scalability. We first explore item-sets, i.e. combinations of feature values, that could identify groups of samples with sufficient support for the target failures using the Apriori algorithm and a subsequent improvement, FP-Growth. These algorithms were designed for frequent item-set mining and association rule learning over transactional databases. After applying them on structured logs, we select the item-sets that are most unique to the target failures based on lift. We propose pre-processing steps with the use of a large-scale real-time database and post-processing techniques and parallelism to further speed up the analysis and improve interpretability, and demonstrate that such optimization is necessary for handling large-scale production datasets. We have successfully rolled out this approach for root cause investigation purposes within Facebook's infrastructure. We also present the setup and results from multiple production use cases in this paper.
Load balancers choose among load-balanced paths to distribute traffic as if it makes no difference using one path or another. This work shows that the latency difference between load-balanced paths (called latency imbalance ), previously deemed insignificant, is now prevalent from the perspective of the cloud and affects various latency-sensitive applications. In this work, we present the first large-scale measurement study of latency imbalance from a cloud-centric view. Using public cloud around the globe, we measure latency imbalance both between data centers (DCs) in the cloud and from the cloud to the public Internet. Our key findings include that 1) Amazon's and Alibaba's clouds together have latency difference between load-balanced paths larger than 20ms to 21% of public IPv4 addresses; 2) Google's secret in having lower latency imbalance than other clouds is to use its own well-balanced private WANs to transit traffic close to the destinations and that 3) latency imbalance is also prevalent between DCs in the cloud, where 8 pairs of DCs are found to have load-balanced paths with latency difference larger than 40ms. We further evaluate the impact of latency imbalance on three applications (i.e., NTP, delay-based geolocation and VoIP) and propose potential solutions to improve application performance. Our experiments show that all three applications can benefit from considering latency imbalance, where the accuracy of delay-based geolocation can be greatly improved by simply changing how \textttping measures the minimum path latency.
The number of cores and the capacities of main memory in modern systems have been growing significantly. Specifically, memory scaling, although at a slower pace than computation scaling, provided opportunities for very large DRAMs with Terabytes (TBs) capacity. Consequently, addressing the performance and energy consumption bottlenecks of DRAMs is more important than ever. DRAM memory refresh operation is one of the main contributing factors to the memory overheads, especially for large capacity DRAMs used in modern servers and emerging large-scale data centers. This paper addresses the memory refresh problem by leveraging the fact that most cloud servers host virtualized systems that use similar kernels, libraries, etc. We propose and experimentally evaluate a novel approach that exploits this observation to address the DRAM refresh overhead in such systems. More specifically, in this work, we present DSM, a light-weight hardware extension in memory controller to detect the pages with same content in memory and refresh only one of them and redirect the requests to the others to this page. Our detailed experimental analysis shows that the proposed DSM design can reduce 99\textsuperscriptth percentile memory access latency by up to 2.01x, and it also reduces the overall memory energy consumption by up to 8.5%.
We analyze the problem of how to optimally bid for ad spaces in online ad auctions. For this we consider the general case of multiple ad campaigns with overlapping targeting criteria. In our analysis we first characterize the structure of an optimal bidding strategy. In particular, we show that an optimal bidding strategies decomposes the problem into disjoint sets of campaigns and targeting groups. In addition, we show that pure bidding strategies that use only a single bid value for each campaign are not optimal when the supply curves are not continuous. For this case, we derive a lower-bound on the optimal cost of any bidding strategy, as well as mixed bidding strategies that either achieve the lower-bound or can get arbitrarily close to it.
The blockchain paradigm provides a mechanism for content dissemination and distributed consensus on Peer-to-Peer (P2P) networks. While this paradigm has been widely adopted in industry, it has not been carefully analyzed in terms of its network scaling with respect to the number of peers. Applications for blockchain systems, such as cryptocurrencies and IoT, require this form of network scaling. In this paper, we propose a new stochastic network model for a blockchain system. We identify a structural property called one-endedness, which we show to be desirable in any blockchain system as it is directly related to distributed consensus among the peers. We show that the stochastic stability of the network is sufficient for the one-endedness of a blockchain. We further establish that our model belongs to a class of network models, called monotone separable models. This allows us to establish upper and lower bounds on the stability region. The bounds on stability depend on the connectivity of the P2P network through its conductance and allow us to analyze the scalability of blockchain systems on large P2P networks. We verify our theoretical insights using both synthetic data and real data from the Bitcoin network.
While location data is extremely valuable for various applications, disclosing it prompts serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates. We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-à-vis the utility loss they introduce for various mobility analytics tasks. Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that there does not exist a unique generic defense that can preserve the utility of the analytics for arbitrary applications, and provide useful insights regarding the disclosure of sanitized aggregate location time-series.
EOSIO has become one of the most popular blockchain platforms since its mainnet launch in June 2018. In contrast to the traditional PoW-based systems (e.g., Bitcoin and Ethereum), which are limited by low throughput, EOSIO is the first high throughput Delegated Proof of Stake system that has been widely adopted by many decentralized applications. Although EOSIO has millions of accounts and billions of transactions, little is known about its ecosystem, especially related to security and fraud. In this paper, we perform a large-scale measurement study of the EOSIO blockchain and its associated DApps. We gather a large-scale dataset of EOSIO and characterize activities including money transfers, account creation and contract invocation. Using our insights, we then develop techniques to automatically detect bots and fraudulent activity. We discover thousands of bot accounts (over 30% of the accounts in the platform) and a number of real-world attacks (301 attack accounts). By the time of our study, 80 attack accounts we identified have been confirmed by DApp teams, causing 828,824 EOS tokens losses (roughly \$2.6 million) in total.
A new generation of cyber-physical systems has emerged with a large number of devices that continuously generate and consume massive amounts of data in a distributed and mobile manner. Accurate and near real-time decisions based on such streaming data are in high demand in many areas of optimization for such systems. Edge data analytics bring processing power in the proximity of data sources, reduce the network delay for data transmission, allow large-scale distributed training, and consequently help meeting real-time requirements. Nevertheless, the multiplicity of data sources leads to multiple distributed machine learning models that may suffer from sub-optimal performance due to the inconsistency in their states. In this work, we tackle the insularity, concept drift, and connectivity issues in edge data analytics to minimize its accuracy handicap without losing its timeliness benefits. To this end, we propose an efficient model synchronization mechanism for distributed and stateful data analytics. Staleness Control for Edge Data Analytics (SCEDA) ensures the high adaptability of synchronization frequency in the face of an unpredictable environment by addressing the trade-off between the generality and timeliness of the model. Making use of online reinforcement learning, SCEDA has low computational overhead, automatically adapts to changes, and does not require additional data monitoring.
We consider online convex optimization with stochastic constraints where the objective functions are arbitrarily time-varying and the constraint functions are independent and identically distributed (i.i.d.) over time. Both the objective and constraint functions are revealed after the decision is made at each time slot. The best known expected regret for solving such a problem is $\mathcalO (\sqrtT )$, with a coefficient that is polynomial in the dimension of the decision variable and relies on theSlater condition (i.e. the existence of interior point assumption), which is restrictive and in particular precludes treating equality constraints. In this paper, we show that such Slater condition is in fact not needed. We propose a newprimal-dual mirror descent algorithm and show that one can attain $\mathcalO (\sqrtT )$ regret and constraint violation under a much weaker Lagrange multiplier assumption, allowing general equality constraints and significantly relaxing the previous Slater conditions. Along the way, for the case where decisions are contained in a probability simplex, we reduce the coefficient to have only a logarithmic dependence on the decision variable dimension. Such a dependence has long been known in the literature on mirror descent but seems new in this new constrained online learning scenario. Simulation experiments on a data center server provision problem with real electricity price traces further demonstrate the performance of our proposed algorithm.
Current methods to analyze the Internet's router-level topology with paths collected using traceroute assume that the source address for each router in the path is either an inbound or off-path address on each router. In this work, we show that outbound addresses are common in our Internet-wide traceroute dataset collected by CAIDA's Ark vantage points in January 2020, accounting for 1.7% - 5.8% of the addresses seen at some point before the end of a traceroute. This phenomenon can lead to mistakes in Internet topology analysis, such as inferring router ownership and identifying interdomain links. We hypothesize that the primary contributor to outbound addresses is Layer 3 Virtual Private Networks (L3VPNs), and propose vrfinder, a technique for identifying L3VPN outbound addresses in traceroute collections. We validate vrfinder against ground truth from two large research and education networks, demonstrating high precision (100.0%) and recall (82.1% - 95.3%). We also show the benefit of accounting for L3VPNs in traceroute analysis through extensions to bdrmapIT, increasing the accuracy of its router ownership inferences for L3VPN outbound addresses from 61.5% - 79.4% to 88.9% - 95.5%.