IWQoS '19- Proceedings of the International Symposium on Quality of Service

Full Citation in the ACM Digital Library

RetroFlow: maintaining control resiliency and flow programmability for software-defined WANs

Providing resilient network control is a critical concern for deploying Software-Defined Networking (SDN) into Wide-Area Networks (WANs). For performance reasons, a Software-Defined WAN is divided into multiple domains controlled by multiple controllers with a logically centralized view. Under controller failures, we need to remap the control of offline switches from failed controllers to other active controllers. Existing solutions could either overload active controllers to interrupt their normal operations or degrade network performance because of increasing the controller-switch communication overhead. In this paper, we propose RetroFlow to achieve low communication overhead without interrupting the normal processing of active controllers during controller failures. By intelligently configuring a set of selected offline switches working under the legacy routing mode, RetroFlow relieves the active controllers from controlling the selected offline switches while maintaining the flow programmability (e.g., the ability to change paths of flows) of SDN. RetroFlow also smartly transfers the control of offline switches with the SDN routing mode to active controllers to minimize the communication overhead from these offline switches to the active controllers. Simulation results show that compared with the baseline algorithm, RetroFlow can reduce the communication overhead up to 52.6% during a moderate controller failure by recovering 100% flows from offline switches and can reduce the communication overhead up to 61.2% during a serious controller failure by setting to recover 90% of flows from offline switches.

mmHandover: a pre-connection based handover protocol for 5G millimeter wave vehicular networks

With the increase of data driven vehicular applications, existing networks cannot satisfy the communication requirements. Therefore, 5G millimeter wave (mmWave) communications, which can offer multi-gigabit data rate, hold potential to be utilized in vehicular networks. On one hand, due to the densely deployed 5G base stations, frequent handover will largely decrease the quality of service, where recent handover is at hundred-millisecond level. On the other hand, mmWave links are easily broken by obstacles because of short wavelength. Yet existing handover protocols do not consider the blockage problem, which frequently occurs in mmWave based networks. To address these problems, we propose a real-time handover protocol called mmHandover for 5G mmWave vehicular networks leveraging mmWave antennae. In mmHandover, multiple antennae in one array are divided into two parts: pre-connected antennae and data transmission antennae. In parallel, pre-connected antennae build the connection with multiple candidate base stations before activation based on a designed pre-connection strategy, while data transmission antennae are responsible for data delivery with the currently connected base station. When handover is triggered or blockage happens, one of the pre-connected links will convert into data transmission link, thus realizing almost seamless handover. Finally, real data-driven simulations demonstrate the efficiency and effectiveness of mmHandover. Compared with standard 4G/WiFi handover protocols, mmHandover greatly reduces the delay from more than 500μs to about 1000μs. Besides, the delay gap will get widened coupled with increase in the number of vehicles.

NEIVA: environment identification based video bitrate adaption in cellular networks

With the popularization of advanced cellular networks, mobile video occupies nearly three quarters of cellular network traffic. While previous adaptive bitrate (ABR) algorithms perform well under broadband network, their performance degrades in cellular networks due to throughput fluctuation. Through real world 4G/LTE network measurement, we find that throughput in cellular networks exhibits high fluctuation. It follows Markov behaviors with different states and different transition probability among states. We further find that the transition probability is stable along time but varies significantly under different environments. This inspires us to design ABR algorithms by improving throughput prediction in cellular networks. We propose NEIVA, a network environment identification based video bitrate adaption method in cellular networks. NEIVA trains a network environment identifier based on throughput data and trains a hidden Markov model (HMM) based throughput predictor for different environments. In online video bitrate selection, NEIVA utilizes the environment identifier to select the model for corresponding environment. Then NEIVA predicts future network performance by combining offline model and online throughput data. We implement NEIVA with MPC and evaluate it in real environment. The evaluation results show that with manually identifying environment, NEIVA improves 20% -- 25% bandwidth prediction accuracy and 11% -- 20% QoE improvement over the baseline predictors. With online environment identification, online NEIVA achieves 3.8% and 11.1% average QoE improvement over MPC and HMM, respectively.

An empirical study of mobile network behavior and application performance in the wild

Monitoring mobile network performance is critical for optimizing the QoE of mobile apps. Until now, few studies have considered the actual network performance that mobile apps experience in a per-app or per-server granularity. In this paper, we analyze a two-year-long dataset collected by a crowdsourcing per-app measurement tool to gain new insights into mobile network behavior and application performance. We observe that only a small portion of WiFi networks can work in high-speed mode, and more than one-third of the observed ISPs still have not deployed 4G networks. For cellular networks, the DNS settings on smartphones can have a significant impact on mobile app network performance. Moreover, we notice that instant messaging (IM) and voice over IP (VoIP) services nowadays are not as performant as Web services, because the traffic using XMPP experiences longer latencies than HTTPS. We propose an automatic performance degradation detection and localization method for finding possible network problems in our huge, imbalanced and sparse dataset. Our evaluation and case studies show that our method is effective and the running time is acceptable.

P4Tester: efficient runtime rule fault detection for programmable data planes

P4 and programmable data planes bring significant flexibility to network operation but are inevitably prone to various faults. Some faults, like P4 program bugs, can be verified statically, while some faults, like runtime rule faults, only happen to running network devices, and they are hardly possible to handle before deployment. Existing network testing systems can troubleshoot runtime rule faults via injecting probes, but are insufficient for programmable data planes due to large overheads or limited fault coverage. In this paper, we propose P4Tester, a new network testing system for troubleshooting runtime rule faults on programmable data planes. First, P4Tester proposes a new intermediate representation based on Binary Decision Diagram, which enables efficient probe generation for various P4-defined data plane functions. Second, P4Tester offers a new probe model that uses source routing to forward probes. This probe model largely reduces rule fault detection overheads, i.e. requiring only one server to generate probes for large networks and minimizing the number of probes. Moreover, this probe model can test all table rules in a network, achieving full fault coverage. Evaluation based on real-world data sets indicates that P4Tester can efficiently check all rules in programmable data planes, generate 59% fewer probes than ATPG and Pronto, be faster than ATPG by two orders of magnitude, and troubleshoot multiple rule faults within one second on BMv2 and Tofino.

NB-cache: non-blocking in-network caching for high-speed content routers

Information-Centric Networking (ICN) provides scalable and efficient content distribution at the Internet scale due to its in-network caching and native multicast capabilities. To support these features, a content router needs high performance at its data plane, which consists of three forwarding steps: checking the Content Store (CS), then the Pending Interest Table (PIT), and finally the Forwarding Information Base (FIB). While prior works focus on performance optimization of a single step, we build an analytical model of content router's entire data plane and identify that CS is the actual bottleneck in the pipeline. Compared with PIT and FIB, CS is more challenging because it has more data to read/write, may have more entries in its table to store and lookup, and needs to organize content objects to sustain frequent cache replacement. Then, we propose a novel mechanism called "NB-Cache" to address CS's performance issue from a network-wide point of view rather than a single router's. In NB-Cache, when packets arrive at a router whose CS is fully loaded, instead of being blocked and waiting for the CS, these packets are forwarded to the next-hop router, whose CS may not be fully loaded. This approach essentially utilizes Content Stores of all the routers along the forwarding path in parallel rather than checking each CS sequentially. Our experiments show significant improvement of data plane performance: 70% reduction in round-trip time (RTT) and 130% increase in throughput.

ML defense: against prediction API threats in cloud-based machine learning service

Machine learning (ML) has shown its impressive performance in the modern world, and many corporations leverage the technique of machine learning to improve their service quality, e.g., Facebook's DeepFace. Machine learning models with a collection of private data being processed by a training algorithm are deemed to be increasingly confidential. Confidential models are typically trained in a centralized cloud server but publicly accessible. ML-as-a-service (MLaaS) system is one of running examples, where users are allowed to access trained models and are charged on a pay-per-query basis.

Unfortunately, recent researchers have shown the tension between public access and confidential models, where adversarial access to a model is abused to duplicate the functionality of the model or even learn sensitive information about individuals (known to be in the training dataset). We conclude these attacks as prediction API threats for simplicity.

In this work, we propose ML defense, a framework to defend against prediction API threats, which works as an add-on to existing MLaaS systems. To the best of our knowledge, this is the first work to propose a technical countermeasure to attacks trumped by excessive query accesses. Our methodology neither modifies any classifier nor degrades the model functionality (e.g., rounds results). The framework consists of one or more simulators and one auditor. The simulator learns the hidden knowledge of adversaries. The auditor then detects whether there exists a privacy breach. We discuss the intrinsic difficulties and empirically state the efficiency and feasibility of our mechanisms in different models and datasets.

Selecting optimal mobile users for long-term environmental monitoring by crowdsourcing

Urban environmental monitoring related to such issues as air pollution and noise helps people understand their living environments and promotes urban construction. It is more and more important nowadays. By crowdsourcing, we can get mobile users at a low cost to collect measurement at different locations. This paper studies how to select optimal mobile users to construct an accurate monitoring map under a limited budget. We extend the noise Gaussian Process model to construct the data utility model. Because the monitoring map is updated in each time slot, we try to maximize the time-averaged data utility under the time-averaged budget constraint. This problem is particularly challenging given the unknown future information and the difficulty of solving the one-slot problem: maximizing a non-monotone sub-modular objective under the budget constraint. To address these challenges, we first make use of Lyapunov optimization to decompose the long-term optimization problem into a series of real-time problems which do not require a priori knowledge about the future information. We then propose a time-efficient online algorithm to solve the NP-hard one-slot problem. As long as the algorithm for the one-slot problem has a competitive ratio e, the time-averaged data utility of our online algorithm has a small gap compared with e times the optimal one. Evaluations based on the real air pollution data in Beijing [2] and real human trajectory data [1] show the efficiency of our approach.

Mechanism design for cross-market task crowdsourcing

Crowdsourcing is a new paradigm which divides work between participants to achieve a cumulative result. To achieve good service quality for a crowdsourcing system, incentive mechanisms are necessary to attract more workers to participate. Most of existing mechanisms apply only for the crowdsourcing scenario where the crowdsourcer will employ the workers to perform certain tasks only in a single market. In this paper, we consider that the crowd-sourcer would like to allocate tasks to several different sub-markets with a shared budget. The crowdsourcer wants to maximize her utility through the crowdsourcing campaign with certain budget constraint. We design two optimization objectives for our problem, maximin and weighted average. We propose a cross-market profit extract mechanism based on random sampling. Furthermore, we extend our algorithm to the online condition, and give the performance lowerbound for both offline and online cases for the cross-market crowdsourcing system. We also conduct extensive numerical experiments to demonstrate the effectiveness of our approach.

Orchestrating service chain deployment with plutus in next generation cellular core

Today's cellular core relies on a few expensive and dedicated hardware racks to connect the radio access network and the egress point to the Internet, which are geographically placed at fixed locations and use the specific routing policies. This inelastic architecture fundamentally leads to increased capital and operating expenses, poor application performance and slow evolution. The emerging paradigm of Network Function Virtualization (NFV) and Software Defined Networking (SDN) bring new opportunities for cellular networks, which makes it possible to flexibly deploy service chains on commodity servers and fine-grained control the routing policies in a centralized way.

We present a two-stage optimization framework Plutus. The network-level optimization aims to minimize the service chain deployment cost, while the server-level optimization requires to determine which Virtualized Network Function (VNF) should be deployed onto which CPU core to balance the CPU processing capability. We formulate these two problems as two optimization programs and prove their hardness. Based on parallel multi-block ADMM, we propose an (O(1), O(1)) bicriteria approximation algorithm and a 2-approximation algorithm. Large-scale simulations and DPDK-based OpenNetVM platform show that Plutus can reduce the capital cost by 84% and increase the throughput by 36% on average.

Elastic, geo-distributed RAFT

Raft is a protocol to maintain strong consistency across data replicas in cloud. It is widely used, especially by workloads that span geographically distributed sites. As these workloads grow, Raft's costs should grow, as least proportionally. However, auto scaling approaches for Raft inflate costs by provisioning at all sites when one site exhausts its local resources. This paper presents Geo-Raft, a scale-out mechanism that enables precise auto scaling for Raft. Geo-Raft extends Raft with the following abstractions: (1) secretaries which takes log processing for the leader and (2) observers which process read requests for followers. These abstractions are stateless, allowing for elastic auto scaling, even on unreliable spot instances. Geo-Raft provably preserves strong consistency guarantees provided by Raft. We implemented and evaluated Geo-Raft with multiple auto scaling techniques on Amazon EC2. Geo-Raft scales in resource footprint increments 5-7X smaller than Multi-Raft, the state of the art. Using spot instances, Geo-Raft reduces costs by 84.5% compared to Multi-Raft. Geo-Raft improves goodput of 95th-percentile SLO by 9X.Geo-Raft operates key-value services for 6 months without losing data or crash.

Chunk-level request-grant-transfer mode for QoE-sensitive video delivery in CDN

Remote Direct Memory Access (RDMA) can be deployed in Content Delivery Networks (CDN) Points of Presence (PoPs) to avoid the high CPU overheads caused by traditional TCP/IP stacks. However, RDMA cannot surmount the drawbacks of the window-based conservative of TCP and is insensitive to Quality of Experience (QoE). Moreover, the requirement of lossless networks hinders the widespread application of RDMA. In this paper, we introduce the parallel multipoint-to-multipoint Request-Grant-Transfer (RGT) mode into RDMA to solve the aforementioned problems. Compared with traditional RGT mode, our scheme supports parallel Dynamic Adaptive Streaming over HTTP (DASH) chunk delivery, thereby improving throughput and reducing initial delays. We differentiate the importance of DASH chunks according to QoE-related properties. In this way, we reduce the response time of specific DASH chunks. We provide an efficient approach to select the optimal number of requests for partially traversing pending requests to reduce the overheads of Request stages. We perform comprehensive experiments to demonstrate that our scheme improves the throughput of CDN PoPs and enhances client QoE.

CoFlux: robustly correlating KPIs by fluctuations for service troubleshooting

Internet-based service companies monitor a large number of KPIs (Key Performance Indicators) to ensure their service quality and reliability. Correlating KPIs by fluctuations reveals interactions between KPIs under anomalous situations and can be extremely useful for service troubleshooting. However, such a KPI flux-correlation has been little studied so far in the domain of Internet service operations management. A major challenge is how to automatically and accurately separate fluctuations from normal variations in KPIs with different structural characteristics (such as seasonal, trend and stationary) for a large number of KPIs. In this paper, we propose CoFlux, an unsupervised approach, to automatically (without manual selection of algorithm fitting and parameter tuning) determine whether two KPIs are correlated by fluctuations, in what temporal order they fluctuate, and whether they fluctuate in the same direction. CoFlux's robust feature engineering and robust correlation score computation enable it to work well against the diverse KPI characteristics. Our extensive experiments have demonstrated that CoFlux achieves the best F1-Scores of 0.84 (0.90), 0.92 (0.95), 0.95 (0.99), in answering these three questions, in the two real datasets from a top global Internet company, respectively. Moreover, we showed that CoFlux is effective in assisting service troubleshooting through the applications of alert compression, recommending Top N causes, and constructing fluctuation propagation chains.

Evaluating performance and inefficient routing of an anycast CDN

Anycast has been increasingly deployed for content delivery networks to map clients to their nearby replicas, which relies on the underlying routing. However, the simplicity of operation comes at cost of less precise client-mapping control. Although many works have measured anycast DNS, anycast CDNs, with different service goals and engineering, are still not fully understood. In this paper, we design novel methods and combine large-scale traceroute and HTTP measurement to evaluate the overall client-proximity and inefficient routing of the largest anycast CDN, Cloudflare. We find that 90% paths traverse only 2-4 ASes, which highlights its direct networks providers. By further identifying and characterizing direct providers at finer granularity of facilities, we quantitatively shows that Cloudflare unevenly uses few large transit providers to delivery the majority of contents. Inspired by the observations, we propose an anycast routing pathology and diagnosis methodology. Investigation reveals that few huge providers have outsized impact in that they are not only related to many inter-domain inflations, but also have path inflation inside their own networks, thus deserving priority focus when troubleshooting.

Trigger relationship aware mobile traffic classification

Network traffic classification is important to network operators to ensure visibility of traffic. Network management, monitoring, and other services are built upon such classification results for improving quality of service. Compared with traffic classification in non-mobile setting, classification in mobile settings focuses on applications and has become increasingly important. Traditionally, a rule-based method is deployed in a deep packet inspector (DPI) engine for traffic classification. However, with the explosive growth in application usage, the complicated relationships including the use of content delivery networks (CDN) and sharing behaviors among applications make such methods less effective. The traffic may be identified wrongly when one application is connected to another application's server.

In this work, we present TRAC: Trigger Relationship Aware traffic Classification, a systematic framework for classifying mobile traffic accurately. In TRAC, we first propose Trigger Relationship Graph model to describe the relationships among applications. Then, we introduce a Trigger Relationship Analyzer to build the graph based on a modified frequent item set mining method. TRAC classifies the traffic based on the application labels identified by a DPI engine. An Application Label Corrector is designed to correct the application labels based on the graph. We evaluate TRAC with one-month data collected from an enterprise network. The evaluation shows that our method can achieve a 17.4% accuracy improvement, from 64.8% to 82.2%.

LEAP: learning-based smart edge with caching and prefetching for adaptive video streaming

Dynamic Adaptive Streaming over HTTP (DASH) has emerged as a popular approach for video transmission, which brings a potential benefit for the Quality of Experience (QoE) because of its segment-based flexibility. However, the Internet can only provide no guaranteed delivery. The high dynamic of the available bandwidth may cause bitrate switching or video rebuffering, thus inevitably damaging the QoE. Besides, the frequently requested popular videos are transmitted for multiple times and contribute to most of the bandwidth consumption, which causes massive transmission redundancy. Therefore, we propose a Learning-based Edge with cAching and Prefetching (LEAP) to improve the online user QoE of adaptive video streaming. LEAP introduces caching into the edge to reduce the redundant video transmission and employs prefetching to fight against network jitters. Taking the state information of users into account, LEAP intelligently makes the most beneficial decisions of caching and prefetching by a QoE-oriented deep neural network model. To demonstrate the performance of our scheme, we deploy the implemented prototype of LEAP in both the simulated scenario and the real Internet. Compared with all selected schemes, LEAP at least raises average bitrate by 34.4% and reduces video rebuffering by 42.7%, which leads to at least 15.9% improvement in the user QoE in the simulated scenario. The results in the real Internet scenario further confirm the superiority of LEAP.

Understanding I/O performance of IPFS storage: a client's perspective

IPFS has surged into popularity in recent years. It organizes user data as multiple objects where users can obtain the objects according to their Content IDentifiers (CIDs). As a storage system, it is of great importance to understand its data I/O performance. But existing work still lacks such a comprehensive study. In this work, we deploy an IPFS storage system with geographically-distributed storage nodes on Amazon EC2. We then conduct extensive experiments to evaluate the performance of data I/O operations from a client's perspective. We find that the access patterns of I/O operations (e.g., request size) severely affect the I/O performance, since IPFS typically uses multiple I/O strategies to perform different I/O requests. Moreover, for the read operations, IPFS requires to resolve remote nodes and downloading objects via the internet. Our experimental study reveals that both resolving and downloading operations can become bottlenecks. Our results can shed light to optimizing IPFS in avoiding high-latency I/O operations.

Encrypted traffic classification of decentralized applications on ethereum using feature fusion

With the prevalence of blockchain, more and more Decentralized Applications (DApps) are deployed on Ethereum to achieve the goal of communicating without supervision. Users habits may be leaked while these applications adopt SSL/TLS to encrypt their transmission data. Encrypted protocol and the same blockchain platform bring challenges to the traffic classification of DApps. Existing encrypted traffic classification methods suffer from low accuracy in the situation of DApps.

In this paper, we design an efficient method to fuse features of different dimensions for DApp fingerprinting. We firstly analyze the reason why existing methods do not perform well before proposing to merge features of different dimensions. Then we fuse these features by a kernel function and propose a fusion feature selection method to select appropriate features to fuse. Applying features that have been fused to the machine learning algorithm can construct a strong classifier. The experiment results show that the accuracy of our method can reach more than 90%, which performs better than state-of-the-art classification approaches.

WiCAR: wifi-based in-car activity recognition with multi-adversarial domain adaptation

In-car human activity recognition is playing a critical role in detecting distracted driving and improving human-car interaction. Among multiple sensing technologies, WiFi-based in-car activity recognition exhibits unique advantages since it does not rely on visible light, avoids privacy leaks and is cost-efficient with integrated WiFi signals in cars. Existing WiFi-based recognition systems mostly focus on the relatively stable indoor space, which only yield reasonably good performance in limited situations. Based on our field studies, the in-car activity recognition, however, is much more complicated suffering from more impact factors. First, the external moving objects and the surrounding WiFi signals can cause various disturbances to the in-car activity sensing. Second, considering the compact in-car space, different car models can also lead to different multipath distortions. Moreover, different people can also perform activities in different shapes. Such extraneous information related to specific driving conditions, car models and human subjects is implicitly contained for training and prediction, inevitably leading to poor recognition performance for new environment and people.

In this paper, we consider the impact of different domains including driving conditions, car models and human subjects on the in-car activity recognition with field measurements and experiments. We present WiCAR, a WiFi-based in-car activity recognition framework that is able to remove domain-specific information in the received signals while retaining the activity related information to the maximum extent. A deep learning architecture integrated with domain adversarial training is applied to domain independent activity recognition. Specifically, we leverage multi-adversarial domain adaptation to avoid the discriminative structures mixing up for different domains. We have implemented WiCAR with commercial-off-the-shelf WiFi devices. Our extensive evaluations show that WiCAR can achieve in-car recognition accuracy of around 95% in untrained domains, where it is only 53% for solutions without domain adversarial network and 83% for the state-of-the-art domain adversarial solution.

Dependent task placement and scheduling with function configuration in edge computing

In Mobile Edge Computing (MEC), each edge server can be configured with only a small number of functions due to the limited capacity of various resources. Meanwhile, mobile applications become more complicated, consisting of multiple dependent tasks which are typically modeled as a Directed Acyclic Graph (DAG). In edge computing, when an application arrives, we need to place and schedule its tasks onto edge servers and/or the remote cloud, where the functions to execute the tasks are configured. In this work, we jointly consider the problem of dependent task placement and scheduling with on-demand function configuration on servers. Our objective is to minimize the application completion time. Specifically, for the special case when the configuration on each edge server is fixed, we derive an algorithm to find the optimal task placement and scheduling efficiently. When the on-demand function configuration is allowed, we propose a novel approximation algorithm, named GenDoc, and analyze theoretically its additive error from the optimal solution. Our extensive experiments on the cluster trace from Alibaba (including 20365 unique applications with DAG information) show that GenDoc outperforms state-of-the-art baselines in processing 86.14% of these unique applications, and reduces their average completion time by at least 24% (and up to 54%). Moreover, GenDoc consistently performs well on various settings of key parameters.

NFVdeep: adaptive online service function chain deployment with deep reinforcement learning

With the evolution of network function virtualization (NFV), diverse network services can be flexibly offered as service function chains (SFCs) consisted of different virtual network functions (VNFs). However, network state and traffic typically exhibit unpredictable variations due to stochastically arriving requests with different quality of service (QoS) requirements. Thus, an adaptive online SFC deployment approach is needed to handle the real-time network variations and various service requests. In this paper, we firstly introduce a Markov decision process (MDP) model to capture the dynamic network state transitions. In order to jointly minimize the operation cost of NFV providers and maximize the total throughput of requests, we propose NFVdeep, an adaptive, online, deep reinforcement learning approach to automatically deploy SFCs for requests with different QoS requirements. Specifically, we use a serialization-and-backtracking method to effectively deal with large discrete action space. We also adopt a policy gradient based method to improve the training efficiency and convergence to optimality. Extensive experimental results demonstrate that NFVdeep converges fast in the training process and responds rapidly to arriving requests especially in large, frequently transferred network state space. Consequently, NFVdeep surpasses the state-of-the-art methods by 32.59% higher accepted throughput and 33.29% lower operation cost on average.

Characterizing and orchestrating NFV-ready servers for efficient edge data processing

The fast-growing Internet of Things (IoT) and Artificial intelligence (AI) applications mandate high-performance edge data analytics. This requirement cannot be fully fulfilled by prior works that focus on either small architectures (e.g., accelerators) or large infrastructure (e.g., cloud data centers). Sitting in between the edge and cloud, there have been many server-level designs for augmenting edge data processing. However, they often require specialized hardware resources and lack scalability as well as agility.

Other than reinventing the wheel, we explore tapping into underutilized network infrastructure in the incoming 5G era for augmenting edge data analytics. Specifically, we focus on efficiently deploying edge data processing applications on Network Function Virtualization (NFV) enabled commodity servers. In such a way, we can benefit from the service flexibility of NFV while greatly reducing the cost of many servers deployed in the edge network. We perform extensive experiments to investigate the characteristics of packet processing in a DPDK-based NFV platform and discover the resource under-utilization issue when using the DPDK polling-mode. Then, we propose a framework named EdgeMiner, which can harvest the potentially idle cycles of the cores for data processing purpose. Meanwhile, it can also guarantee the Quality of Service (QoS) of both the Virtualized Network Functions (VNFs) and Edge Data Processing (EDP) applications when they are co-running on the same server.

Variety matters: a new model for the wireless data market under sponsored data plans

In this paper, we develop a new model to study the competition among Content Providers (CPs) under Sponsored Data Plans (SDPs). SDP is an emerging pricing model for the wireless data market where Internet Service Providers (ISPs) allow a CP to compensate the traffic volume of users when users access the contents of this CP. Studies have shown that SDPs create a triple-win situation, where users consume more contents and the revenue of both CPs and ISPs increases. Currently, a main concern of SDPs is on whether SDPs may bring about unfair competition among CPs. Studies have shown that big CPs have an advantage over small CPs. We observe that such conclusions are derived because in all previous models, traffic price is the only factor that affects user decisions. We argue that it is not precise. Nowadays, people conduct a large variety of activities online, and users have an intrinsic demand for a variety of contents. To reflect this, we for the first time characterize the variety demand as an intrinsic parameter of users, and integrate such variety into a new model to help us drive some novel insights into SDPs, especially the competition among CPs. Our model shows that variety matters for understanding SDPs more thoroughly and comprehensively. For example, under SDPs, the advantage of CPs with higher revenue will be significantly reduced if users have a greater love for variety. Overall, our new model leads to a set of completely new results and rectifies some past conclusions.

Proactive inter-datacenter multicast with realtime and bulk transfers

In content distribution networks, a key objective is the efficient utilization of the network that interconnects geographically distributed datacenters. This is a challenging problem due to vastly different characteristics and requirements of bulk and realtime transfers that share the interconnection network. Bulk transfers aim at delivering a copy of a usually large file to multiple datacenters before a deadline, while realtime transfers are absolutely delay-intolerant with unsteady and dynamic demands. In this paper, we consider the problem of multicasting deadline-critical bulk transfers in an inter-datacenter network in the presence of unknown and fluctuating demand by realtime transfers. Specifically, we develop a joint admission control and routing algorithm called PMDx, which anticipates future realtime demands and proactively reserves just the right amount of network resources in order to serve future realtime transfers without adversely affecting network utilization or bulk transfer deadlines. We show that the PMDx algorithm is a 2/δ-approximation with probability 1 - ϵ, and runs in polynomial time proportional to ln(1/ϵ)/(1 - δ)2, for 0 < δ,ϵ < 1. We also provide extensive model-driven simulation results to study the behaviour of our algorithms in real world network topologies. Our results confirm that PMDx is very close to the optimal, and improves the utilization of the network by 14% compared to a recently proposed algorithm.

Towards privacy-preserving data trading for web browsing history

The trading of social media data has attracted wide research interests over years. Especially the trading for web browsing histories probably produces tremendous economic value for data consumers when being applied to targeted advertising. However, the disclosure of entire browsing histories, even in form of anonymous datasets poses a huge threat to user privacy. Although some existing solutions have investigated privacy-preserving outsourcing of social media data, unfortunately, they neglected the impact on the data consumer's utility. In this paper, we propose PEATSE, a new Privacy-prEserving dAta Trading framework for web browSing historiEs. It takes users' diverse privacy preferences and the utility of their web browsing histories into consideration. PEATSE perturbs users' detailed browsing times on released browsing records to protect user privacy, while balancing the privacy-utility tradeoff. Through real-data based experiments, our analysis and evaluation results demonstrate PEATSE indeed achieves user privacy protection, the data consumer's accuracy requirement, and truthfulness, individual rationality as well as budget balance.

Balancing of the quality-of-service, energy and revenue of base stations in wireless networks via tullock contests

In order to provide high-quality services, wireless network providers deploy a large number of base stations per unit area and maintain these base stations operating over a long period of time invariably. This situation resulted in tremendous energy waste and economic cost to service providers. Owing to the conflict between energy consumption and economic benefits, simply reducing energy consumption may cut the profits of network providers. In order to effectively balance the Quality-of-Service (QoS), energy consumption, and profits of wireless networks, we propose a new sleeping scheme for base stations by using Tullock Contest. In the proposed game-theoretical framework, each player is a base station which competes for service revenue by providing services while consuming energy. The stations are classified into two categories, running mode and sleeping mode. The eventual sleeping strategy can be obtained by applying Nash equilibrium. The proposed scheme adjusts the number of sleeping stations to balance the energy consumption and profits of wireless network providers with the premise of ensuring user QoS. The performance of the proposed scheme is evaluated and compared under different system configurations and user traffic patterns.

pRide: private ride request for online ride hailing service with secure hardware enclave

Promising unprecedented convenience, Online Ride Hailing (ORH) service such as Uber and Didi has gained increasing popularity. Different from traditional taxi service, this new on-demand transportation service allows users to request rides from the online service providers at the touch of their fingers. Despite such great convenience, existing ORH systems require the users to expose their locations when requesting rides - a severe privacy issue in the face of untrusted or compromised service providers. In this paper, we propose a private yet efficient ride request scheme, allowing the user to enjoy public ORH service without sacrificing privacy. Unlike previous works, we consider a more practical setting where the information about the drivers and road networks is public. This poses an open challenge to achieve strong security and high efficiency for the secure ORH service. Our main leverage in addressing this problem is hardware-enforced Trusted Execution Environment, in particular Intel SGX enclave. However, the use of secure enclave does not lead to an immediate solution due to the hardware's inherent resource constraint and security limitation. To tackle the limited enclave space, we first design an efficient ride-matching algorithm utilizing hub-based labeling technique, which avoids loading massive road network data into enclave during online processing. To defend against side-channel attacks, we take the next step to make the ride-matching algorithm data-oblivious, by augmenting it with oblivious label access and oblivious distance computation. The proposed solution provides high efficiency of real-time response and strong security guarantee of data-obliviousness. We implement a prototype system of the proposed scheme and thoroughly evaluate it from both theoretical and experimental aspects. The results show that the proposed scheme permits accurate and real-time ride-matching with provable security.

DaRTree: deadline-aware multicast transfers in reconfigurable wide-area networks

The increasing amount of data replication across datacenters introduces a need for efficient bulk data transfer protocols which meet QoS guarantees, notably timely completion. We present DaRTree which leverages emerging optical reconfiguration technologies, to jointly optimize topology and multicast transfers, and thereby maximize throughput and acceptance ratio of transfer requests subject to deadlines. DaRTree is based on a novel integer linear program relaxation and deterministic rounding scheme. To this end, DaRTree uses multicast Steiner trees and adaptive routing based on the current network load. DaRTree provides its guarantees without need for rescheduling or preemption. Our evaluations show that DaRTree increases the network throughput and the number of accepted requests by up to 70%, especially for larger Wide-Area Networks (WANs). In fact, we also find that DaRTree even outperforms state-of-the-art solutions when the network scheduler is only capable of routing unicast transfers or when the WAN topology is bound to be non-reconfigurable.

MixGeo: efficient secure range queries on encrypted dense spatial data in the cloud

As the location-based applications are flourishing, we will witness soon a prodigious amount of spatial data will be stored in the public cloud with the geometric range query as one of the most fundamental search functions. The rising demand of outsourced data is moving larger-scale datasets and wider-scope query size. To protect the confidentiality of the geographic information of individuals, the outsourced data stored at the cloud server should be preserved especially when they are queried. While the problem of secure range query on outsourced encrypted data has been extensively studied, the current schemes are far from the practice in terms of efficiency and scalability. In this paper, we propose a novel solution based on Geohash and predicate symmetric searchable encryption for secure range queries named as MixGeo. We present a multi-level indexes structure tailored for efficient and large-scale spatial data lookup in the cloud server while preserving data privacy.

Our experiment on a real-world spatial dataset in the cloud environment shows that it only takes less than 0.1s for once update operation over 65,128 encrypted dense data and is over 100 times faster than the existing solutions.

Chic: experience-driven scheduling in machine learning clusters

Large-scale machine learning (ML) models are routinely trained in a distributed fashion, due to their increasing complexity and data sizes. In a shared cluster handling multiple distributed learning workloads with a parameter server framework, it is important to determine the adequate number of concurrent workers and parameter servers for each ML workload over time, in order to minimize the average completion time and increase resource utilization. Existing schedulers for machine learning workloads involve meticulously designed heuristics. However, as the execution environment is highly complex and dynamic, it is challenging to construct an accurate model to make online decisions. In this paper, we design an experience-driven approach that learns to manage the cluster directly from experience rather than using a mathematical model. We propose Chic, a scheduler that is tailored for scheduling machine learning workloads in a cluster by leveraging deep reinforcement learning techniques. With our design of the state space, action space, and reward function, Chic trains a deep neural network with a modified version of the cross-entropy method to approximate the policy for assigning workers and parameter servers for future workloads based on the experience of the agent. Furthermore, a simplified version named Chic-Pair with a shorter training time for the policy is purposed by assigning workers and parameter servers in a pair. We compare Chic and Pair with state-of-the-art heuristics, and our results show that Chic and Chic-Pair are able to reduce the average training time significantly for machine learning workloads under a wide variety of conditions.

Litedge: towards light-weight edge computing for efficient wireless surveillance system

Wireless surveillance systems are rapidly gaining popularity due to their easier deployability and improved performance. However, cameras inside are generating a large amount of data, which brings challenges to the transmission through resource-constrained wireless networks. Observing that most collected consecutive frames are redundant with few objects of interest (OoIs), the filtering of these frames can dramatically relieve the transmission pressure. Additionally, real-world environment may bring shielding or blind areas in videos, which notoriously affects the accuracy of frame analysis. The collaboration between cameras facing at different angles can compensate for such accuracy loss.

In this work, we present Litedge, a light-weight edge computing strategy to improve the QoS (i.e., the latency and accuracy) of wireless surveillance systems. Two main modules are designed on edge cameras: (i) the light-weight video compression module for frame filtering, mainly realized by model compression and convolutional acceleration; and (ii) the collaborative validation module for error compensation between the master-slave camera pair. We also implement an enhanced surveillance system prototype from real-time monitoring and pre-processing on edge cameras to the backend data analysis on a server. Experiments based on real-world collected videos show the efficiency of Litedge. It achieves 82% transmission latency reduction with a maximal 0.119s additional processing delay, compared with the full video transmission. Remarkably, 91.28% of redundant frames are successfully filtered out, greatly reducing the transmission burden. Litedge outperforms state-of-the-art light-weight AI models and video compression methods by balancing the QoS balance ratio between accuracy and latency.

FAST: enabling fast software/hardware prototype for network experimentation

The evolution of new technologies in network community is getting ever faster. Yet it remains the case that prototyping those novel mechanisms on a real-world system (i.e. CPU-FPGA platforms) is both time and labor consuming, which has a serious impact on the research timeliness. In order to bring researchers out of trivial process in prototype development, this paper proposed FAST, a software hardware co-design framework for fast network prototyping. With the programming abstraction of FAST, researchers are able to prototype (using C, verilog or both) a wide spectrum of network boxes rapidly based on all kinds of CPU-FPGA platforms. FAST framework takes care of managing DMA, PCIe and Linux Kernel while providing a unified API for researchers so they can focus only on the packet processing functions. We demonstrate FAST framework's easy to use features with a number of prototypes and show we can get over 10x gains in performance or 1000x better accuracy in clock synchronization compared with their software versions.

Secure indoor positioning against signal strength attacks via optimized multi-voting

Indoor positioning systems (IPSes) can enable many location-based services in large indoor venues where GPS signals are unavailable or unreliable. Among the most viable types of IPSes, RSS-IPSes rely on ubiquitous smartphones and indoor WiFi infrastructures and explore distinguishable received signal strength (RSS) measurements at different indoor locations as their location fingerprints. RSS-IPSes are unfortunately vulnerable to physical-layer RSS attacks that cannot be thwarted by conventional cryptographic techniques. Existing defenses against RSS attacks are all subject to an inherent tradeoff between indoor positioning accuracy and attack resilience. This paper presents the design and evaluation of MV-IPS, a novel RSS-IPS based on weighted multi-voting, which does not suffer from this tradeoff. In MV-IPS, every WiFi access point (AP) that receives a user's RSS measurement gives a weighted vote for every reference location, and the reference location that receives the highest accumulative votes from all APs is output as the user's most likely position. Trace-driven simulation studies based on real RSS measurements demonstrate that MV-IPS can achieve much higher positioning accuracy than prior solutions no matter whether RSS attacks are present.

LOSC: efficient out-of-core graph processing with locality-optimized subgraph construction

Big data applications increasingly rely on the analysis of large graphs. In recent years, a number of out-of-core graph processing systems have been proposed to process graphs with billions of edges on just one commodity computer, by efficiently using the secondary storage (e.g., hard disk, SSD). On the other hand, the vertex-centric computing model is extensively used in graph processing thanks to its good applicability and expressiveness. Unfortunately, when implementing vertex-centric model for out-of-core graph processing, the large number of random memory accesses required to construct subgraphs lead to a serious performance bottleneck that substantially weakens cache access locality and thus leads to very long waiting time experienced by users for the computing results. In this paper, we propose an efficient out-of-core graph processing system, LOSC, to substantially reduce the overhead of subgraph construction without sacrificing the underlying vertex-centric computing model. LOSC proposes a locality-optimized subgraph construction scheme that significantly improves the in-memory data access locality of the subgraph construction phase. Furthermore, LOSC adopts a compact edge storage format and a lightweight replication of vertices to reduce I/O traffic and improve computation efficiency. Extensive evaluation results show that LOSC is respectively 6.9x and 3.5x faster than GraphChi and GridGraph, two state-of-the-art out-of-core systems.

Predictive switch-controller association and control devolution for SDN systems

In software-defined networking (SDN) systems, the scalability and reliability of the control plane still remain as major concerns. Existing solutions adopt either multi-controller designs or control devolution back to the data plane. The former requires a flexible yet efficient switch-controller association mechanism to adapt to workload changes and potential failures, while the latter demands timely decision making with low overheads. The integrate design for both is even more challenging. Meanwhile, the dramatic advancement in machine learning techniques has boosted the practice of predictive scheduling to improve the responsiveness in various systems. Nonetheless, so far little work has been conducted for SDN systems. In this paper, we study the joint problem of dynamic switch-controller association and control devolution, while investigating the benefits of predictive scheduling in SDN systems. We propose POSCAD, an efficient, online, and distributed scheme that exploits predictive future information to minimize the total system cost and the average request response time with queueing stability guarantee. Theoretical analysis and trace-driven simulation results show that POSCAD requires only mild-value of future information to achieve a near-optimal system cost and near-zero average request response time. Further, POSCAD is robust against mis-prediction to reduce the average request response time.

Branch scheduling: dag-aware scheduling for speeding up data-parallel jobs

A data-parallel job is characterized as a directed acyclic graph (DAG) which usually consists of multiple computation stages and across-stage data transfers. However, classical DAG scheduling strategies like the critical path method ignore other stages off the main path and do not give specific consideration of data locality, transfer costs, etc. In practice, complicated DAGs include multiple paths which overlap with each other. The intersection of different paths in a DAG job forms an important synchronization. The synchronization significantly impacts the job completion time and however, no known scheduling methods are designed to speed up the completion time of the synchronization especially when complicated DAGs involve nested and hierarchical synchronizations. To the end, we propose a new abstraction, named branch, which is referred to a disjoint path in DAGs, and design a branch scheduling method to decrease the average job completion time of multiple data-parallel jobs. The branch scheduling method leverages the urgency of branches to speed up the synchronization of multiple parallel branches. We have implemented the BS method on Apache Spark and conducted prototype-based experiments. Compared with Spark FIFO and the shortest-job-first with the critical path methods, results show that the branch scheduling method achieves around 10-15% reduction in the average job completion time.

Multidimensional behavioral profiling of internet-of-things in edge networks

The last decade has witnessed research advances and wide deployment of Internet-of-things (IoT) in smart homes and connected industry. However, the recent spate of cyber attacks exploiting the vulnerabilities and insufficient security management of IoT devices have created serious challenges for securing IoT devices and applications. As a first step towards understanding and mitigating diverse security threats of IoT devices, this paper develops a measurement framework to automatically collect network traffic of IoT devices in edge networks, and build multidimensional behavioral profiles of these devices which characterize who, when, what, and why on the behavioral patterns of IoT devices based on continuously collected traffic data. To the best of our knowledge, this paper is the first effort to shed light on the IP-spatial, temporal, and cloud service patterns of IoT devices in edge networks, and to explore these multidimensional behavioral fingerprints for IoT device classification, anomaly traffic detection, and network security monitoring for millions of vulnerable and resource-constrained IoT devices on the Internet.

ATMPA: attacking machine learning-based malware visualization detection methods via adversarial examples

Since the threat of malicious software (malware) has become increasingly serious, automatic malware detection techniques have received increasing attention, where machine learning (ML)-based visualization detection methods become more and more popular. In this paper, we demonstrate that the state-of-the-art ML-based visualization detection methods are vulnerable to Adversarial Example (AE) attacks. We develop a novel Adversarial Texture Malware Perturbation Attack (ATMPA) method based on the gradient descent and L-norm optimization method, where attackers can introduce some tiny perturbations on the transformed dataset such that ML-based malware detection methods will completely fail. The experimental results on the MS BIG malware dataset show that a small interference can reduce the accuracy rate down to 0% for several ML-based detection methods, and the rate of transferability is 74.1% on average.

Who limits the resource efficiency of my datacenter: an analysis of Alibaba datacenter traces

Cloud platform provides great flexibility and cost-efficiency for end-users and cloud operators. However, low resource utilization in modern datacenters brings huge wastes of hardware resources and infrastructure investment. To improve resource utilization, a straightforward way is co-locating different workloads on the same hardware. To figure out the resource efficiency and understand the key characteristics of workloads in co-located cluster, we analyze an 8-day trace from Alibaba's production trace. We reveal three key findings as follows. First, memory becomes the new bottleneck and limits the resource efficiency in Alibaba's datacenter. Second, in order to protect latency-critical applications, batch-processing applications are treated as second-class citizens and restricted to utilize limited resources. Third, more than 90% of latency-critical applications are written in Java applications. Massive self-contained JVMs further complicate resource management and limit the resource efficiency in datacenters.

SocialDistance: how far are you from verified users in online social media?

Verified users on online social media (OSM) largely determine the quality of OSM services and applications, but most OSM users are unverified due to the significant effort involved in becoming a verified user. This paper presents SocialDistance, a novel technique to identify unverified users that can be considered as trustworthy as verified users. SocialDistance is motivated by the observation that online interactions initiated from verified users towards unverified users can translate into some sort of trustworthiness. It treats all verified users equally and assigns a trust score between 0 and 1 to each unverified user. The higher the trust score, the closer an unverified user to verified users. We propose various metrics to model the interactions from verified to unverified users and then derive corresponding trust scores. SocialDistance is thoroughly evaluated with large Twitter datasets containing 276,143 verified users and 19,047,202 unverified users. Our results demonstrate that SocialDistance can produce a non-trivial number of unverified users that can be regarded as verified users for OSM applications. We also show the high efficacy of SocialDistance in sybil detection, a fundamental operation performed by virtually every OSM operator.

A game theoretic analysis on block withholding attacks using the zero-determinant strategy

In Bitcoin's incentive system that supports open mining pools, block withholding attacks incur huge security threats. In this paper, we investigate the mutual attacks among pools as this determines the macroscopic utility of the whole distributed system. Existing studies on pools' interactive attacks usually employ the conventional game theory, where the strategies of the players are considered pure and equal, neglecting the existence of powerful strategies and the corresponding favorable game results. In this study, we take advantage of the Zero-Determinant (ZD) strategy to analyze the block withholding attack between any two pools, where the ZD adopter has the unilateral control on the expected payoffs of its opponent and itself. In this case, we are faced with the following questions: who can adopt the ZD strategy? individually or simultaneously? what can the ZD player achieve? In order to answer these questions, we derive the conditions under which two pools can individually or simultaneously employ the ZD strategy and demonstrate the effectiveness. To the best of our knowledge, we are the first to use the ZD strategy to analyze the block withholding attack among pools.

No-jump-into-latency in China's internet!: toward last-mile hop count based IP geo-localization

Last-mile geo-localization plays an essential role in many location-based services, such as fraud detection and targeted advertising. In this study, we point out that round trip time (RTT) latency shows an extremely weak correlation with physical distance estimation in China's Internet, since a path between a vantage point and a destination can often be circuitous and inflated by queuing and processing delays. To sidestep the latency measurement, we perform a three-tier hop count based IP geo-localization mapping for China's Internet, on the assumption that each provincial router only serves a limited area. The mapping approach begins at the first tier using a single vantage point to fetch large-scale traceroute paths from the server to landmarks and target IPs. At the second tier, we try to find the last common routers along the traceroute paths of targets and landmarks and aggregate their hop count distances. At the third tier, we estimate the physical distances from hop count distances and provincial router radii, and geo-localize the targets to the nearest landmarks. Through large-scale experiments, we show that our approach is both cost-efficient and reliable, and can achieve last-ten-kilometer IP geo-localization for approximately 65% of the total 48874 pingable target IP addresses with a single ping server, and our hop count based approach completely outperforms the RTT based method.