
ACM SIGMETRICS 2025

Stony Brook, New York, USA
June 9-13, 2025

Tutorials

Tentative Program: Monday, June 9, 2025

Eight tutorials are presented in four parallel tracks. The program starts at 9 a.m. and ends at 3 p.m.

Time                | Track 1: Systems | Track 2: Algorithm Analysis | Track 3: Machine Learning Performance | Track 4: Reinforcement Learning
9.00 am - 10.30 am  | T1.A             | T2.A                        | T3.A *                                | T4.A
10.30 am - 11.00 am | Break
11.00 am - 12.30 pm | T1.A             | T2.A                        | T3.A *                                | T4.A
12.30 pm - 1.30 pm  | Lunch
1.30 pm - 3.00 pm   | T1.B *           | T2.B                        | T3.B                                  | T4.B


* Tutorial includes a hands-on component.

9.00 am - 12.30 pm

Track 1

T1.A "Quantum Communications and Networking"

by Saikat Guha (University of Maryland)

Track 2

T2.A "Algorithms with Predictions in Queueing: Challenges and Open Problems (Especially for LLMs)"

by Michael Mitzenmacher (Harvard) and Rana Shahout (Harvard)

Track 3

T3.A "Maximizing LLM Throughput in PyTorch: Optimized Pipelines for Modern Deep Learning Workloads"

by Davis Wertheimer (IBM, USA)

Track 4

T4.A "Recent Advances of Reinforcement Learning in Dynamic Games"

by Zaiwei Chen (Purdue University, USA) and Kaiqing Zhang (University of Maryland, USA)

1.30 pm - 3.00 pm

Track 1

T1.B "PerfVec: Generalizable Performance Modeling using Learned Program and Architecture Representations"

by Lingda Li (Brookhaven National Laboratory), Sairam Sri Vatsavai (Brookhaven National Laboratory), Kuan-Chieh Hsu (Brookhaven National Laboratory)

Track 2

T2.B "Distributional Analysis of Stochastic Algorithms"

by Qiaomin Xie (University of Wisconsin-Madison), Yudong Chen (University of Wisconsin-Madison)

Track 3

T3.B "Utilizing Underlying Data Statistics in Mitigating Heterogeneity and Client Faults in Federated and Collaborative Learning"

by Lili Su (Northeastern University)

Track 4

T4.B "Recent Theoretical Advances in Private Reinforcement Learning"

by Xingyu Zhou (Wayne State University, USA)

Track 1: Systems

Tutorial T1.A: Quantum Communications and Networking

Speaker: Saikat Guha (University of Maryland)

Duration: 3 hours

Abstract: TBA

Speakers biographies:

TBA

Tutorial T1.B: PerfVec: Generalizable Performance Modeling using Learned Program and Architecture Representations

Speakers: Lingda Li (Brookhaven National Laboratory), Sairam Sri Vatsavai (Brookhaven National Laboratory), Kuan-Chieh Hsu (Brookhaven National Laboratory)

Duration: 3 hours

Abstract: Performance modeling is essential for computer architecture research and engineering. Existing performance modeling approaches have limitations, such as the high computational cost of discrete event simulators, the narrow flexibility of hardware emulators, and the restricted accuracy and generality of analytical and data-driven models. To address these limitations, we propose and implement PerfVec, a novel deep learning-based performance modeling framework that learns high-dimensional and independent program and microarchitecture representations. Once learned, a program representation can be used to predict its performance on any microarchitecture, and likewise, a microarchitecture representation can be applied to the performance prediction of any program. Additionally, PerfVec yields foundation models that capture the performance essence of instructions, which developers can use directly in numerous performance modeling tasks without incurring the training cost. Our evaluation demonstrates that PerfVec is more general and efficient than previous approaches.

This tutorial includes three sessions. First, we will present an overview of PerfVec's methodology. Then, we will introduce, step by step and with hands-on exercises, how to use PerfVec for performance prediction. Finally, we will introduce several applications of PerfVec, including design space exploration and program performance analysis.
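The factored-representation idea can be sketched in a few lines. This is a hypothetical toy, not PerfVec itself: the actual framework learns these representations with deep models trained on instruction traces, and every name and number below is made up. The point is only that once program and microarchitecture representations are learned independently, any pairing of the two yields a performance prediction.

```python
# Toy sketch of a factored performance model (all vectors hypothetical).
# A program representation and a microarchitecture representation are
# learned independently; pairing any two yields a performance estimate.

def predict_cycles(prog_rep, uarch_rep):
    """Estimate a cycle count as an interaction of the two representations."""
    return sum(p * u for p, u in zip(prog_rep, uarch_rep))

# Hypothetical learned representations.
prog_reps = {"matmul": [120.0, 40.0, 5.0], "sort": [30.0, 80.0, 2.0]}
uarch_reps = {"big_core": [1.0, 0.5, 2.0], "little_core": [2.5, 1.2, 3.0]}

# Any program can be paired with any microarchitecture.
for prog, p in prog_reps.items():
    for uarch, u in uarch_reps.items():
        print(prog, uarch, predict_cycles(p, u))
```

Because the representations are independent, adding one new program makes it predictable on every modeled microarchitecture at no extra cost, and vice versa.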

Speakers biographies:

Lingda Li is a computer scientist at Brookhaven National Laboratory (BNL). He is broadly interested in computer architecture and programming model research, with recent focuses on performance simulation/modeling, memory systems, and machine learning. Before joining BNL, he was a postdoc in the Department of Computer Science at Rutgers University, where he carried out GPGPU research between 2014 and 2016. He obtained his PhD in computer architecture from the Microprocessor Research and Development Center, Peking University, in 2014. He has taught this tutorial internally and to people from other research labs in private settings. He has also taught graduate courses as a guest instructor as well as summer classes at Stony Brook University.

Sairam Sri Vatsavai is a Postdoctoral Research Associate at Brookhaven National Laboratory, where he is part of the Systems, Architectures, and Emerging Technologies group. His research focuses on leveraging AI to develop performance modeling and simulation frameworks for hardware accelerators in High-Performance Computing (HPC) systems. Sairam earned his PhD in Electrical Engineering from the University of Kentucky, USA, in 2024. His PhD thesis focused on the co-design of photonic integrated circuit (PIC)-based AI accelerator systems, addressing challenges in scalability, energy efficiency, and reconfigurability. He completed his bachelor's degree at the Jawaharlal Institute of Technology, India. His research interests span a broad spectrum of advanced computing technologies, including performance modeling and simulation, emerging technology-based hardware accelerators, in-memory computing, in-materio computing, and computer architecture. He is also interested in applying AI/ML to the modeling, system design, and optimization of next-generation HPC systems. In addition to his research, Sairam has conducted labs and delivered lectures in Embedded Systems Lab, Computer Architecture, and Introduction to VLSI courses for sophomore and junior undergraduate students in the Department of Electrical and Computer Engineering.

Kuan-Chieh Hsu is a postdoctoral research associate at Brookhaven National Laboratory in the Department of Computational Science Initiative, where he focuses on performance modeling. He earned his Ph.D. from the University of California, Riverside, with expertise in heterogeneous computing, parallel computing, and high-performance computing (HPC). His doctoral research emphasized democratizing AI-oriented tensor accelerators for general-purpose computing, leveraging holistic system design to achieve broad usability and efficient computation. Beyond his current role, he has served as a teaching assistant for courses such as Computer Organization and Advanced Computer Architecture and has lectured on the Design and Architecture of Computer Systems. Previously, he also gained diverse experience developing and deploying AI models for natural sciences and teaching STEM education.



Track 2: Algorithm Analysis

Tutorial T2.A: Algorithms with Predictions in Queueing: Challenges and Open Problems (Especially for LLMs)

Speakers: Michael Mitzenmacher (Harvard) and Rana Shahout (Harvard)

Duration: 3 hours

Abstract: The area of algorithms with predictions, or learning-augmented algorithms, considers the setting where an algorithm is given advice in the form of predictions, such as from a machine-learning model. For example, when scheduling jobs, one could obtain a prediction of each job's service time. Queueing systems present many opportunities for applying learning-augmented algorithms, raising numerous open questions about how predictions can be effectively leveraged to improve scheduling decisions. Several recent studies have started the exploration of queues with predicted service times instead of exact ones, typically aiming to minimize the average time a job spends in the system. We start by reviewing this recent work, highlighting the potential effectiveness of predictions and providing a collection of open questions regarding the performance of queueing systems using predictions.

We then move to consider an important practical example, Large Language Model (LLM) systems, which present novel scheduling challenges and highlight the need for learning-augmented algorithms to optimize performance. Performance metrics in LLM systems have expanded beyond traditional measures to include multi-dimensional objectives such as cost (e.g., energy consumption), runtime, and response quality. At inference time, requests (jobs) in LLM systems are inherently complex; they have variable inference times, dynamic memory footprints due to key-value (KV) memory limitations, and multiple preemption approaches that affect performance differently. We provide background on the important aspects of scheduling in LLM systems, and introduce new models and open problems that arise from them. We argue that there are significant opportunities for applying insights and analysis from queueing theory to scheduling in LLM systems.
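As a minimal illustration of the first setting above (a toy example, not material from the tutorial): even noisy predicted service times can improve scheduling. With all jobs present at time 0, serving the shortest *predicted* job first beats FCFS on mean completion time. The sizes and predictions below are made up.

```python
# Toy illustration: scheduling with (noisy) predicted service times.
# Jobs all arrive at time 0; sizes and predictions are hypothetical.

def mean_completion(sizes):
    """Mean completion time when jobs run to completion in the given order."""
    t, total = 0.0, 0.0
    for s in sizes:
        t += s
        total += t
    return total / len(sizes)

true_sizes = [5.0, 1.0, 3.0]
predicted = [4.8, 1.2, 2.9]          # imperfect predictions

# FCFS ignores predictions; SPJF serves the shortest predicted job first.
fcfs = mean_completion(true_sizes)
order = sorted(range(len(true_sizes)), key=lambda i: predicted[i])
spjf = mean_completion([true_sizes[i] for i in order])

print(fcfs, spjf)   # the predictions need not be exact to help
```

Here SPJF recovers the optimal order despite the prediction errors; much of the open-problem space concerns quantifying how performance degrades as predictions get worse.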

Speakers biographies:

Michael Mitzenmacher is a Professor of Computer Science in the School of Engineering and Applied Sciences at Harvard University. His research interests include the design and analysis of algorithms, particularly for networks, databases, and other systems. He has authored or coauthored over 250 conference and journal publications on a variety of topics, including coding theory, queueing theory, compression, sketch data structures, and algorithms with predictions. He is the coauthor of Probability and Computing, a textbook on randomized algorithms and probabilistic techniques in computer science. He is a Fellow of the ACM and IEEE, and received the ACM Paris Kanellakis Theory and Practice Award for his contributions to "The Power of Two Choices".

Rana Shahout is a Postdoctoral Fellow at Harvard University. Her research focuses on developing theoretical foundations and tailored solutions to critical challenges in data systems and applications. She earned her Ph.D. in Computer Science from the Technion, during which she interned at Yahoo! and Cornell Tech. Before her doctoral studies, she worked at Mellanox.

Tutorial T2.B: Distributional Analysis of Stochastic Algorithms

Speakers: Qiaomin Xie (University of Wisconsin-Madison), Yudong Chen (University of Wisconsin-Madison)

Duration: 1.5 hours

Abstract: Stochastic algorithms power modern machine learning and optimization. They instantiate as stochastic gradient descent for loss minimization, stochastic gradient descent-ascent for min-max optimization, TD/Q-learning for reinforcement learning, stochastic approximation for fixed-point problems, and methods for stochastic variational inequalities. Together with their many variants, these algorithms are increasingly vital in today's large-scale problems with finite noisy data. There is thus pressing interest in obtaining fine-grained characterizations of stochastic algorithms and enhancing their sample and computational efficiency.

Traditionally, stochastic algorithms are treated as the noisy versions of their deterministic counterparts, viewing their stochasticity as a nuisance. Prior work thus focuses on controlling the stochastic fluctuation of the iterates, both in terms of algorithm design (e.g., use of diminishing stepsizes) and analysis (e.g., bounding mean squared errors). A recent line of work deviates from the above paradigm and embraces the probabilistic behavior of stochastic algorithms. By viewing the iterate sequence as a stochastic process, its distribution can be studied using modern tools from Markov chain and stochastic analysis, which provides fine-grained characterization of the behaviors of the algorithms.

In this tutorial, we will present an overview of these new techniques and results for the distributional analysis of stochastic algorithms. Three main ingredients of this approach will be covered: (1) establishing finite-time distributional convergence of the iterates and relating their ergodicity properties to the characteristics of the problem instance, algorithm, and data; (2) characterizing the steady-state distribution of the iterates using the techniques of coupling and the basic adjoint relationship; and (3) leveraging this precise probabilistic characterization for stepsize scheduling, variance reduction, bias refinement, and efficient statistical inference. Our focus will be on the constant-stepsize paradigm popular among practitioners, and we will emphasize disentangling the deterministic and stochastic behaviors of the algorithms, as well as their transient and long-run behaviors. We will cover background material, fundamental state-of-the-art results, and applications in concrete optimization and RL problems. Open issues and potential future directions will also be discussed.
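The distributional viewpoint can be seen in a one-dimensional toy (a sketch under illustrative assumptions, not the tutorial's material): constant-stepsize SGD on f(x) = x^2/2 with additive Gaussian noise does not converge to the minimizer; its iterates converge in distribution, and for this linear recursion the stationary variance is alpha/(2 - alpha) for stepsize alpha.

```python
# Constant-stepsize SGD on f(x) = x^2 / 2 with additive N(0,1) noise.
# The iterates hover around the minimizer in steady state; the spread
# is governed by the stepsize, not by the starting point.
import random

random.seed(0)
alpha = 0.1                  # constant stepsize
x, samples = 5.0, []
for k in range(100_000):
    grad = x + random.gauss(0.0, 1.0)   # noisy gradient of x^2 / 2
    x -= alpha * grad
    if k >= 1000:                        # discard the transient phase
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# For x_{k+1} = (1 - alpha) x_k - alpha * noise, the stationary variance
# is alpha / (2 - alpha) ~= 0.0526 at alpha = 0.1.
print(mean, var)
```

Halving the stepsize roughly halves the stationary variance, which is the kind of fine-grained stepsize/accuracy trade-off the distributional analysis makes precise.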

Speakers biographies:

Qiaomin Xie is an assistant professor in the Department of Industrial and Systems Engineering at the University of Wisconsin-Madison. Her research interests lie in the fields of reinforcement learning, applied probability, game theory, and queueing theory, with applications to computer and communication networks. She was previously a visiting assistant professor in the School of Operations Research and Information Engineering at Cornell University. Prior to that, she was a postdoctoral researcher with LIDS at MIT. Qiaomin received her Ph.D. in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign in 2016. She received her B.S. in Electronic Engineering from Tsinghua University. She is a recipient of the NSF CAREER Award, the JPMorgan Faculty Research Award, the Google Systems Research Award, and the UIUC CSL Ph.D. Thesis Award.

Yudong Chen is currently an Associate Professor in the Department of Computer Sciences, University of Wisconsin-Madison. Previously he was an Associate Professor in the School of Operations Research and Information Engineering, Cornell University. He received the B.S. and M.S. degrees in control science and engineering from Tsinghua University and the Ph.D. degree in electrical and computer engineering from The University of Texas at Austin. His research lies in machine learning, reinforcement learning, high-dimensional statistics, and optimization. His research has been recognized by the National Science Foundation CAREER Award, the Best Student Paper Award from ACM SIGMETRICS, the Best Student Paper Prize from the INFORMS Applied Probability Society, and second place in the INFORMS George Nicholson Student Paper Competition.

Track 3: Machine Learning Performance

Tutorial T3.A: Maximizing LLM Throughput in PyTorch: Optimized Pipelines for Modern Deep Learning Workloads

Speaker: Davis Wertheimer (IBM, USA)

Duration: 3 hours

Abstract: As modern deep learning models grow larger and more data-hungry, efficient and scalable deployment becomes an ever more fundamental necessity. This tutorial explores state-of-the-art techniques for maximizing model throughput in PyTorch, with a focus on data loading optimization, enhanced model computation, distributed training strategies, and speculative decoding for efficient inference.

Attendees will gain practical insights into integrating scalable distributed dataloaders, leveraging Fully Sharded Data Parallel (FSDP) for large-scale training, achieving performance gains through torch.compile, and implementing speculative decoding to accelerate autoregressive generation tasks. This tutorial provides a mix of theoretical concepts and hands-on demonstrations, enabling participants to optimize their training and inference pipelines effectively.
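To give a flavor of speculative decoding ahead of the hands-on session, here is a toy sequential sketch, not the tutorial's PyTorch code: a cheap draft model proposes a few tokens and the expensive target model verifies them, committing the agreed prefix. Both "models" below are made-up functions; in a real system the k verification calls run as one batched target forward pass, which is where the speedup comes from.

```python
# Toy sketch of greedy speculative decoding (hypothetical stand-in models).
# The target model is authoritative but expensive; the draft model is
# cheap and, by construction here, usually agrees with the target.

def target_next(ctx):
    return (sum(ctx) * 31 + 7) % 50

def draft_next(ctx):
    return target_next(ctx) if sum(ctx) % 4 else 0

def speculative_decode(ctx, n_tokens, k=4):
    """Generate n_tokens: the draft proposes k tokens, the target verifies."""
    out, target_calls = list(ctx), 0
    while len(out) - len(ctx) < n_tokens:
        proposal = []
        for _ in range(k):                    # cheap drafting pass
            proposal.append(draft_next(out + proposal))
        for tok in proposal:                  # verification pass; in a real
            target_calls += 1                 # system these k target calls
            t = target_next(out)              # run as ONE batched forward pass
            out.append(t)                     # always commit the target's token
            if t != tok or len(out) - len(ctx) >= n_tokens:
                break                         # a disagreement ends the round
    return out[len(ctx):], target_calls

tokens, calls = speculative_decode([1, 2, 3], 16)
print(len(tokens), calls)
```

The output always matches greedy decoding with the target alone; the draft only changes how much of the target's work can be batched per round.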

Speakers biography:

Davis Wertheimer is an AI researcher at IBM. He earned his Ph.D. in Computer Science at Cornell University in 2022, conducting research under Bharath Hariharan on few-shot learning and machine learning under constraints. He now researches and develops AI models at scale for IBM, training and accelerating large language models (machine learning under a very different set of constraints!). He is also a fractal artist and jewelry designer.

Tutorial T3.B: Utilizing Underlying Data Statistics in Mitigating Heterogeneity and Client Faults in Federated and Collaborative Learning

Speaker: Lili Su (Northeastern University)

Duration: 1.5 hours

Abstract: Federated and collaborative learning offers a high-precision learning solution for challenging situations where isolated clients or tasks lack sufficient data to train effective machine learning (ML) models. It has gained tremendous popularity in various application domains such as autonomous vehicles and assistive multi-agent autonomy. In contrast to traditional distributed learning, which operates in well-controlled deployment environments, federated and collaborative learning often involves edge or end devices that operate in highly uncertain open environments. As a result, they are more susceptible to unpredictable uncertainties, including connection instability, availability issues, unstructured faults, and even adversarial attacks. Despite the growing research interest in this field, most theoretical advancements in federated and collaborative learning adopt an optimization-focused perspective, often neglecting the underlying statistical structure of the data. This can result in a disconnect between theoretical pessimism and the empirical success observed in practice. This tutorial will explore recent advances and open challenges in developing formally assured resilience against complex, dynamic faults and adversarial attacks, including those with intricate time and space correlations.
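The abstract's emphasis on underlying data statistics is visible even in the simplest aggregation rule: FedAvg-style aggregation weights client updates by local sample counts rather than treating every client equally. A minimal sketch (all models and dataset sizes below are hypothetical, and this is an illustration, not the tutorial's algorithms):

```python
# Minimal FedAvg-style aggregation: client parameter vectors are
# averaged with weights proportional to local dataset sizes, so the
# aggregate reflects the underlying data statistics.

def fedavg(client_models, client_sizes):
    """Weighted average of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_models[0])
    agg = [0.0] * dim
    for model, n in zip(client_models, client_sizes):
        for j in range(dim):
            agg[j] += (n / total) * model[j]
    return agg

# Hypothetical per-client parameters and dataset sizes.
models = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
sizes = [100, 300, 600]

print(fedavg(models, sizes))   # ~ [1.3, 1.5]
```

A faulty or adversarial client with a large claimed sample count can dominate this average, which is one concrete way client faults interact with the data statistics the tutorial examines.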

Speakers biography:

Lili Su is an Assistant Professor in the Electrical and Computer Engineering department with a courtesy appointment in the Khoury College of Computer Sciences at Northeastern University. She is a director of the Center for Signal Processing, Imaging, Reasoning, and Learning (SPIRAL) at Northeastern University, which is home to 9 tenure-track faculty members. She received her M.Sc. (2014) and Ph.D. (2017) in ECE from UIUC. Prior to joining Northeastern, she was a postdoc at MIT CSAIL. Her research intersects distributed systems resilience and efficiency, distributed learning, federated learning, and multi-agent systems. She received the 2023 NSF CAREER Award and the 2022 Sony Faculty Innovation Award. She was recognized as a Rising Star in EECS in 2018. Dr. Su is a guest editor of the ACM Transactions on Modeling and Performance Evaluation of Computing Systems special issue on Federated Learning. In addition, she was the runner-up for the Best Student Paper Award at DISC 2016 and received the Best Student Paper Award at SSS 2015.

Track 4: Reinforcement Learning

Tutorial T4.A: Recent Advances of Reinforcement Learning in Dynamic Games

Speakers: Zaiwei Chen (Purdue University, USA) and Kaiqing Zhang (University of Maryland, USA)

Duration: 3 hours

Abstract: Multi-agent reinforcement learning (MARL) addresses the critical challenge of decision-making in environments where multiple agents interact strategically, a scenario common in domains such as autonomous systems, economics, and large-scale resource management. This tutorial explores recent theoretical advancements in reinforcement learning (RL) for dynamic games. It begins by introducing the underlying models, solution concepts like Nash equilibria, and classical planning and learning algorithms. The discussion then transitions to recent breakthroughs in sample-efficient learning dynamics and computational complexity. Finally, the tutorial emphasizes the importance of independent learning, using zero-sum games as a benchmark to showcase provably efficient algorithms with last-iterate finite-sample guarantees. By providing a comprehensive overview of MARL theory, this tutorial aims to inspire further research and exploration in RL for dynamic systems.
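To make the zero-sum benchmark concrete, here is a classical toy illustration, not one of the tutorial's algorithms: deterministic fictitious play in matching pennies. Each player independently best-responds to the opponent's empirical action frequencies, and although the stage-game play cycles, the time-averaged play approaches the mixed Nash equilibrium (1/2, 1/2).

```python
# Fictitious play in matching pennies (actions: 0 = heads, 1 = tails).
# The row player wants to match; the column player wants to mismatch.
# Each player tracks the opponent's empirical action counts and plays a
# deterministic best response to those frequencies.

counts_row = [1, 0]   # column player's model of the row player
counts_col = [1, 0]   # row player's model of the column player

row_heads = 0
T = 20_000
for t in range(T):
    # Row best-responds to the column's frequencies: matching pays +1,
    # so play the column's more frequent action.
    a_row = 0 if counts_col[0] >= counts_col[1] else 1
    # Column wants to mismatch, so play the row's less frequent action.
    a_col = 1 if counts_row[0] >= counts_row[1] else 0
    counts_row[a_row] += 1
    counts_col[a_col] += 1
    row_heads += (a_row == 0)

print(row_heads / T)   # the empirical frequency approaches 0.5
```

This is the classical sense in which independent, uncoupled learning "solves" zero-sum games in time average; the last-iterate finite-sample guarantees highlighted in the tutorial are a much stronger (and more recent) kind of result.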

Speakers biographies:

Zaiwei Chen is an assistant professor in the Edwardson School of Industrial Engineering at Purdue University. Previously, he was a postdoctoral researcher in the Computing + Mathematical Sciences Department at Caltech. He received a Ph.D. in Machine Learning, an M.S. in Mathematics, and an M.S. in Operations Research, all from Georgia Tech, and a B.S. in Electrical Engineering from Chu Kochen Honors College, Zhejiang University, China. His research interests include optimization, applied probability, control, reinforcement learning, and learning in games. His Ph.D. thesis received the Georgia Tech Sigma Xi Best Ph.D. Thesis Award and was selected as a runner-up for the 2022 SIGMETRICS Doctoral Dissertation Award. Additionally, he has been honored with the Simoudis Discovery Prize and holds the title of PIMCO Postdoctoral Fellow in Data Science. Zaiwei has jointly delivered tutorials broadly on the Lyapunov-drift method for studying stochastic approximation and reinforcement learning at the Workshop on the Structure of Constraints in Sequential Decision-Making, held by the Simons Institute for the Theory of Computing in October 2022, and at ACM SIGMETRICS in June 2021.

Kaiqing Zhang is currently an assistant professor in the Department of Electrical and Computer Engineering (ECE) at the University of Maryland, College Park, with affiliations in the Department of Computer Science, the Center for Machine Learning, and the Applied Mathematics & Statistics, and Scientific Computation program. He was a postdoctoral scholar affiliated with LIDS and CSAIL at MIT, and a Research Fellow at the Simons Institute for the Theory of Computing at Berkeley. He finished his Ph.D. in the Department of ECE and CSL at the University of Illinois at Urbana-Champaign (UIUC), an M.S. in both ECE and Applied Math from UIUC, and a B.E. from Tsinghua University. His research interests lie broadly in control and decision theory, game theory, robotics, reinforcement/machine learning, computation, and their intersections. His work has been acknowledged by the Coordinated Science Laboratory Thesis Award, the Simons-Berkeley Research Fellowship, and an ICML Outstanding Paper Award, along with several other fellowships and awards. Kaiqing has jointly delivered invited tutorials on Reinforcement Learning for Control at the Learning for Dynamics & Control (L4DC) Conference, the European Control Conference (ECC), and as a workshop at the American Control Conference (ACC). He has also delivered an invited tutorial specifically on multi-agent RL at an ETH/EPFL Summer School in 2024.

Tutorial T4.B: Recent Theoretical Advances in Private Reinforcement Learning

Speaker: Xingyu Zhou (Wayne State University, USA)

Duration: 1.5 hours

Abstract: This tutorial provides an in-depth overview of recent theoretical advances in differentially private reinforcement learning (RL), a critical area as RL systems increasingly integrate into privacy-sensitive domains such as healthcare, education, and personalized digital services. Unlike supervised learning, private RL presents unique challenges due to its interactive, sequential structure and the need to balance privacy with effective exploration-exploitation strategies. We will proceed through private multi-armed bandits (MABs), contextual bandits, and general RL under various differential privacy models (e.g., central, local, and distributed models), highlighting the fundamental privacy costs in sample complexity (e.g., information-theoretic regret lower bounds) and three key algorithmic techniques that often enable us to achieve nearly matching upper bounds on regret. Along the way, we will also present several separation results that are unique in private RL settings. As a real-world application, we will explore recent work on reinforcement learning with human feedback (RLHF) to align large language models (LLMs) while satisfying formal privacy guarantees. Finally, we will conclude with an overview of fundamental open problems and exciting directions for future research in private RL, aiming to inspire further advancements in this timely and impactful area.
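As a minimal, self-contained example of the kind of privacy primitive such algorithms build on (an illustration, not material from the tutorial), here is an epsilon-differentially-private release of an empirical mean reward via the Laplace mechanism; all names and numbers below are made up.

```python
# Central-DP building block: release the mean of n rewards in [0, 1]
# with Laplace noise calibrated to the mean's sensitivity, 1/n.
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) via the inverse-CDF method."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(rewards, epsilon, rng):
    """epsilon-DP estimate of the mean of rewards in [0, 1]."""
    n = len(rewards)
    return sum(rewards) / n + laplace(1.0 / (n * epsilon), rng)

rng = random.Random(0)
rewards = [0.7] * 500 + [0.3] * 500   # hypothetical arm rewards
m = private_mean(rewards, epsilon=1.0, rng=rng)
print(m)   # close to the true mean of 0.5
```

In interactive RL the challenge the tutorial addresses is sharper than this one-shot release: noise must be injected across adaptively collected data without destroying the exploration-exploitation balance, which is where the regret lower bounds and matching algorithmic techniques come in.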

Speakers biography:

Xingyu Zhou is an Assistant Professor in the ECE department at Wayne State University, Detroit, USA. He received his Ph.D. from Ohio State University (advised by Ness Shroff), and his master's and bachelor's degrees from Tsinghua University and BUPT, respectively (all with the highest honors). His research interests include machine learning (e.g., bandits, reinforcement learning), stochastic systems, and applied probability (e.g., load balancing). Currently, he is particularly interested in online decision-making with formal privacy and/or robustness guarantees. His recent works on private and robust RL have been published in top-tier conferences including NeurIPS, ICML, ICLR, AAAI, SIGMETRICS, and AISTATS. His research has led to several invited talks at Caltech, CMU, and UCLA, and has won a Best Student Paper Award and several runner-up awards. He is also the recipient of various other awards, including the NSF CRII Award, the Presidential Fellowship at OSU, the Outstanding Graduate Award of Beijing City, the National Scholarship of China, and the Academic Rising Star Award at Tsinghua University. He has been a TPC member for conferences and workshops including SIGMETRICS, MobiHoc, INFOCOM, and TPDP, and an Area Chair for NeurIPS.