SIGMETRICS 2013: Tutorials

Carnegie Mellon University, Pittsburgh, PA
June 17 - 21, 2013

Tutorials

We are pleased to announce that we will have three tutorials this year. All three tutorials will be on Monday, the 17th of June.

Monday, June 17th, 9:00am - 10:30am
Geo-Replication in Data Center Applications

Speaker: Marcos K. Aguilera, Microsoft Research Silicon Valley

Abstract: Data center applications increasingly require a storage system that is geo-replicated, that is, replicated across many geographic locations. Geo-replication can reduce access latency, improve availability, and provide disaster tolerance. It turns out there are many techniques for geo-replication with different trade-offs. In this talk, we give an overview of these techniques, organized according to two orthogonal dimensions: level of synchrony (synchronous and asynchronous) and type of storage service (read-write, state machine, transaction). We explain the basic idea of these techniques, together with their applicability and trade-offs.

Bio: Marcos received a Ph.D. in Computer Science from Cornell University in 2000. He has worked as a researcher at Compaq's Systems Research Center and HP Labs. He is now a senior researcher at Microsoft Research Silicon Valley. His interests include distributed systems, distributed algorithms, fault tolerance, and storage systems.

Monday, June 17th, 11:00am - 12:30pm
The Fundamentals of Heavy-tails: Properties, Emergence, and Identification

Speakers: Adam Wierman, Caltech; Jayakrishnan Nair, Caltech; Bert Zwart, CWI

Abstract: Heavy-tails are a continual source of excitement and confusion across disciplines as they are repeatedly "discovered" in new contexts. This is especially true within computer systems, where heavy-tails seemingly pop up everywhere -- from degree distributions in the internet and social networks to file sizes and interarrival times of workloads. However, despite nearly a decade of work on heavy-tails they are still treated as mysterious, surprising, and even controversial.
The goal of this tutorial is to show that heavy-tailed distributions need not be mysterious and should not be surprising or controversial. In particular, we will demystify heavy-tailed distributions by showing how to reason formally about their counter-intuitive properties; we will highlight that their emergence should be expected (not surprising) by showing that a wide variety of general processes lead to heavy-tailed distributions; and we will highlight that most of the controversy surrounding heavy-tails is the result of bad statistics, and can be avoided by using the proper tools.

Bios:
Adam Wierman is a Professor in the Department of Computing and Mathematical Sciences at the California Institute of Technology, where he is a member of the Rigorous Systems Research Group (RSRG). He received his Ph.D., M.Sc. and B.Sc. in Computer Science from Carnegie Mellon University in 2007, 2004, and 2001, respectively. His research interests center around resource allocation and scheduling decisions in computer systems and services. More specifically, his work focuses both on developing analytic techniques in stochastic modeling, queueing theory, scheduling theory, and game theory, and applying these techniques to application domains such as energy-efficient computing, data centers, social networks, and electricity markets. He received the 2011 ACM SIGMETRICS Rising Star award, and has been co-recipient of best paper awards at ACM SIGMETRICS, IEEE INFOCOM, IFIP Performance, IEEE Green Computing Conference, and ACM GREENMETRICS. He was named a Siebel Scholar, received an Okawa Foundation grant, and received an NSF CAREER grant. Additionally, his dissertation received the CMU School of Computer Science Distinguished Dissertation Award and was given an honorable mention for the INFORMS Doctoral Dissertation Award for Operations Research in Telecommunications. He has also received multiple teaching awards, including the Associated Students of the California Institute of Technology (ASCIT) Teaching Award. Dr. Wierman has more than 60 refereed publications and serves as an Associate Editor for the Operations Research journal and on the editorial board of the Performance Evaluation journal and the IEEE Transactions on Cloud Computing.

Bert Zwart is currently a senior researcher at CWI, where he leads the Probability and Stochastic Networks group. He also holds a full professor position at VU University Amsterdam, is a senior fellow at Eurandom, and holds an adjunct professor position at the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Institute of Technology, where he held a Coca-Cola Chair until 2008. Bert Zwart is the 2008 recipient of the Erlang prize for outstanding contributions to applied probability by a researcher no older than 35, and of an IBM faculty award. His research is concerned with the application of analytic and probabilistic asymptotic methods to applied probability models in computer systems, communication networks, customer contact centers, and manufacturing systems. Dr. Zwart has published more than 70 refereed publications and is a council member of the Applied Probability Society of INFORMS. Dr. Zwart was area editor of Stochastic Models for Operations Research, the flagship journal of his profession, from 2009 to 2011. In addition, Dr. Zwart is editor-in-chief (with J.K. Lenstra and M. Trick) of the journal Surveys in Operations Research and Management Science, and serves on the editorial board of Mathematics of Operations Research, Mathematical Methods of Operations Research, Operations Research, Queueing Systems and Stochastic Systems. He is a recipient of Veni and Vidi research grants from NWO.

Jayakrishnan Nair received his PhD from California Institute of Technology (Caltech) in 2012. His PhD thesis focused on scheduling for heavy-tailed and light-tailed workloads in queueing systems. He is currently a post-doctoral scholar at Caltech and will join CWI as a post-doctoral scholar in May 2013. His research interests include modeling, performance evaluation, and design issues in queueing systems and communication networks. Jayakrishnan was a recipient of the best paper award at IFIP Performance, 2010.

Monday, June 17th, 11:00am - 12:30pm
Profiling and Analyzing the I/O Performance of NoSQL DBs

Speaker: Jiri Schindler, NetApp

Abstract: The advent of the so-called NoSQL databases has brought about a new model of using storage systems. While traditional relational database systems took advantage of features offered by centrally-managed, enterprise-class storage arrays, the new generation of database systems with weaker data consistency models is content with using and managing locally attached individual storage devices and providing data reliability and availability through high-level software features and protocols.

This tutorial aims to review the architecture of selected NoSQL DBs to lay the foundations for understanding how these new DB systems behave. In particular, it focuses on how (in)efficiently these new systems use I/O and other resources to accomplish their work. The tutorial examines the behavior of several NoSQL DBs with an emphasis on Cassandra - a popular NoSQL DB system. It uses I/O traces and resource utilization profiles captured in private cloud deployments that use both dedicated directly attached storage and shared networked storage.

The material is geared specifically towards SIGMETRICS attendees who are familiar with system profiling and analysis, both theoretically and through hands-on experience as systems administrators. It does not assume any prior experience with NoSQL or relational DB systems, nor does it require a deep understanding of storage system architecture. The necessary concepts are reviewed to establish a common ground and to relate the concepts of NoSQL DBs. Participants will learn that NoSQL DB systems are not much different in their fundamentals from other systems for storing (semi)structured data, even though their architecture (scale-out, clustered, shared-nothing model) and use cases (with eventual-consistency data models) are quite different.

Bio: Jiri Schindler is a Member of Technical Staff at the NetApp Advanced Technology Group where he works on storage architectures integrating flash memory and disk drives in support of applications for management of (semi)structured data. Recently, he has been investigating the I/O profiles of columnar databases and designed a system for efficient de-staging of small updates to disk drives with the help of flash memory. Jiri has over a decade of systems experience ranging from device-level request scheduling, through file systems and data layouts, to whole-system performance analysis. Previously, Jiri worked at EMC on Centera - the shared-nothing clustered content-addressable storage system. While getting his PhD at Carnegie Mellon University, he and his colleagues designed and built the Fates (Clotho, Atropos, and Lachesis) system for efficient execution of mixed database workloads with different I/O profiles. Jiri has been an adjunct professor at Northeastern University, where he taught storage systems classes.
