Topics in Distributed Systems – Massively parallel/distributed computing platforms


Instructor: Matei Ripeanu 

TA: Samer Al-Kiswani

Schedule: Monday 5:00-8:00

Location: KAIS4018 (might change)



[01/09] Subscribe to the mailing list by emailing with “subscribe” in the body of the message or by visiting

[01/16] Register with the H2O system and join project EECE571R-09. To submit your paper review, go to the appropriate “Rotisserie Discussion”. Reviews for Monday's class are due by midnight on Sunday.

Course description

Efficiently harnessing massively parallel computing platforms and providing predictable levels of service is an outstanding challenge for distributed systems research. This graduate-level course uses an inclusive definition of massively parallel platforms that covers: silicon-level platforms (e.g., massively multi-core chips such as IBM’s Cell processors or nVidia’s graphics processing units), massively multiprocessor architectures (e.g., BlueGene supercomputers), and wide-area distributed systems. What all these platforms have in common is that they aggregate a large number of processing elements to offer enormous computing potential. Yet the issues that make it difficult to fully realize this potential are numerous: minimizing the computational overheads of parallel and distributed applications, providing predictable performance at multiple levels of the computing stack, energy efficiency, usability (e.g., programming-language support for data-parallel applications), and maintainability (e.g., the ability to efficiently identify and repair problems).

The course will cover the fundamentals of massively multi-core processor architecture; operating system and programming-language support for parallel hardware and parallel applications; system support for debugging parallel applications; and the impact of emerging hardware trends on the design of large-scale data-processing systems. Advances in all these directions are key ingredients of recent efforts to build cyber-infrastructure. Students will be exposed to a range of technologies, from multi-core processors (nVidia GPUs), to operating system support (support for coprocessors), to distributed systems (support for data-parallel applications), and their integration into massive computing systems.

Course format

The course is structured to provide (i) an in-depth understanding of current topics in large-scale distributed systems research; (ii) experience with reviewing and presenting advanced technical material; and (iii) practice in writing and critically reviewing research papers. The class workload has two components: participation and a final project.

·        Participation. In each class we discuss two or more research papers. Read the papers before class (be an efficient reader!) and write a review for each paper that includes the following:

1. State the main contribution of the paper

2. Critique the main contribution. 

a.        Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). More importantly: Explain your rating in a sentence or two.

b.        Rate how convincing the methodology is. You may consider some of the following questions (use what is relevant): Do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)

c.         What are the most important limitations of the approach?

3. What are the two strongest and/or most interesting ideas in the paper?

4. What are the two most striking weaknesses in the paper?

5. Name two questions that you would like to ask the authors.

6. Detail an interesting extension to the work not mentioned in the future work section.

7. Optional comments on the paper that you’d like to see discussed in class.

Reviews must be submitted by midnight the day before class to the relevant Rotisserie Discussion on H2O. Papers are discussed in class. Discussions will be led by one or more students and may include a brief (10-minute) presentation of the paper. Discussion leaders do not need to submit reviews, but they need to: (a) prepare a discussion plan, and (b) post a brief discussion summary on H2O based on the in-class discussion (due before the following class).

·        Project: The final project is an opportunity for hands-on research in distributed systems. It involves a literature survey, programming, running experiments or analytical modeling, analyzing results, and writing a 10-page report. A list of project ideas is posted, but students are highly encouraged to propose topics of their own interest. Teams of two students are recommended; please see me if you want to form a larger team.


Schedule (tentative):

Previous years’ course schedules can be found here: 2008 (topic: quality of service); 2007 (topic: data-intensive computing systems)



Topic / Project steps

Papers / Other links



Introduction. Overview of current research problems. Amdahl’s law. [slides]

§      Amdahl's Law in the Multicore Era, Mark D. Hill and Michael R. Marty, IEEE Computer, July 2008. [pdf]

§      Mitigating Amdahl’s Law through EPI Throttling, Murali Annavaram, Ed Grochowski, John Shen, 298-309, ISCA'05 [pdf]
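The Hill–Marty paper above builds directly on Amdahl's classic formula. As a quick illustration (a minimal sketch for intuition, not part of the course material; function names are my own), the classic speedup bound and the paper's symmetric-multicore variant can be computed as:

```python
import math

def amdahl_speedup(f, n):
    """Classic Amdahl's law: f is the parallelizable fraction of the
    work, n the number of processors. The serial fraction (1 - f)
    bounds the achievable speedup at 1 / (1 - f)."""
    return 1.0 / ((1.0 - f) + f / n)

def symmetric_speedup(f, n, r):
    """Hill & Marty's symmetric-multicore model: a chip with n
    base-core-equivalent (BCE) resources is split into n/r cores of
    r BCEs each, where a single core built from r BCEs performs as
    sqrt(r). Bigger cores speed up the serial phase but leave fewer
    cores for the parallel phase."""
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))

# With 95% of the work parallelizable, 100 processors yield only
# ~16.8x, and no processor count can push past 1 / 0.05 = 20x.
print(amdahl_speedup(0.95, 100))
```

The symmetric model reproduces one of the paper's observations: even for a highly parallel workload (f = 0.975 on a 256-BCE chip), spending area on fewer, larger cores can beat a sea of minimal cores, because the serial fraction dominates otherwise.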



Silicon platforms: GPUs, cell processors, Intel’s massively multicore architecture, SSEs (Abdullah)


[Project: 5-min presentations, discussion of project themes.]


§      Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Shane Ryoo, Christopher Rodrigues, Sara Baghsorkhi, Sam Stone, David Kirk and Wen-mei Hwu, PPoPP’08 [pdf]

§      Dynamic Multigrain Parallelization on the Cell Broadband Engine,  F. Blagojevic, D. Nikolopoulos,  A. Stamatakis, C. Antonopoulos, PPoPP’07 [pdf]


§      Programming the Intel 80-core Network-on-a-chip Terascale Processor, SC’08 [pdf] [Larrabee architecture]

§      Exploring the cache design space for large scale CMPs, L. Hsu, R. Iyer, S. Makineni, S. Reinhardt, D. Newell, pp 24-33, ACM SIGARCH Computer Architecture News, 33(4), Nov.’05, [pdf]

§      CMP Design Space Exploration Subject to Physical Constraints, Yingmin Li, Benjamin Lee, David Brooks, Zhigang Hu, Kevin Skadron, HPCA’06 [pdf]




Multiprocessors (cluster-based computing, clouds) (Elizeu)



§      Evaluating synchronization on shared address space multiprocessors: methodology and performance,  Sanjeev Kumar, Dongming Jiang, Rohit Chandra, Jaswinder Pal Singh, SIGMETRICS’99 [pdf]

§      Entering the Petaflop Era: The Architecture and Performance of Roadrunner, SC08 [pdf]


§      Early Evaluation of BlueGene/P, SC08 [pdf]

§      Designing a highly-scalable operating system: The Blue Gene/L story [pdf]

§      Are non-blocking networks really needed for high-end-computing workloads? N. Desai, P. Balaji, P. Sadayappan, M. Islam, Cluster Computing’08 [pdf]

§      Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters, SC08 [pdf][slides]




Massively-distributed (Internet-scale) platforms (Debojit)


[Project: submit a two-page proposal by Sunday 02/01]

§      Experiences Building PlanetLab, Larry Peterson, Andy Bavier, Marc E. Fiuczynski, and Steve Muir, OSDI’06

§      Network Coordinates in the Wild Jonathan Ledlie, Paul Gardner, Margo Seltzer, NSDI’07







Issues (1): OS support (Sriram)


§      Corey: An Operating System for Many Cores, Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev; Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang; Zheng Zhang, OSDI’08 [pdf]

§      Tapping into the Fountain of CPUs - On Operating Systems Support for Programmable Devices, Pete Wyckoff, Muli Ben-Yehuda, Yaron Weinsberg, Danny Dolev, Tal Anker, ASPLOS’08 [pdf]


§      SMARTMAP: Operating System Support for Efficient Data Sharing among Processes on a Multi-core Processor, SC08 [pdf]



[Project: midterm presentations; submit a midterm report of up to five pages, including related work. Deadline: Sunday 03/08]




Issues (2): Optimizations / Extracting maximum performance (Maliha)



§      Benchmarking GPUs to tune dense linear algebra, SC08 [pdf]

§      A Tuning Framework for Software-Managed Memory Hierarchies, Manman Ren, Ji Young Park, Mike Houston, Alex Aiken and William Dally, PACT 2008 [pdf]


§      A comparison of programming models for multiprocessors with explicitly managed memory hierarchies, Schneider et al., PPoPP '09. [pdf]

§      A Case Study in SIMD Text Processing with Parallel Bit Streams, Robert D. Cameron, PPoPP’08 [slides]

§      Massive Parallel LDPC Decoding on GPU, Gabriel Falcão, Leonel Sousa and Vitor Silva, PPoPP’08 [slides]

§      Toward Terabyte Data Mining: An Architecture Conscious Solution, Gregory Buehrer, Srinivasan Parthasarathy, Tahsin Kurc, Joel Saltz, and Shirish Tatikonda, PPoPP’07




Issues (3): Support for efficient applications. Data-parallel processing (Lauro)



§      DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda and Jon Currey (OSDI’2008), PDF

§      Streamware: Programming General-Purpose Multicore Processors Using Streams Jayanth Gummaraju, Joel Coburn, Yoshio Turner, Mendel Rosenblum [pdf]


§      Merge: A Programming Model for Heterogeneous Multi-core Systems, Michael D. Linderman, Jamison D. Collins, Hong Wang, Teresa H. Meng [pdf]

§      Automatic Optimization of Parallel Dataflow Programs, Christopher Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava [html]

§      Improving MapReduce Performance in Heterogeneous Environments, Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, OSDI’08 [pdf]

§      Executing Irregular Scientific Applications on Stream Architectures, Mattan Erez, Jung Ho Ahn, Jayanth Gummaraju, Mendel Rosenblum, and William J. Dally, [pdf]

Projects: MapReduce; Dryad; DryadLINQ; Hadoop; Swift



Issues (4): Support for debugging (Mohammad)


§      DMP: Deterministic Shared Memory Multiprocessing, Joseph Devietti, Brandon Lucia, Luis Ceze and Mark Oskin, ASPLOS’09.

§      D3S: Debugging Deployed Distributed Systems, Xuezheng Liu and Zhenyu Guo, Xi Wang, Feibo Chen, Xiaochen Lian, Jian Tang, Ming Wu, M. Frans Kaashoek, Zheng Zhang, NSDI’08


§      DieCast: Testing Distributed Systems with an Accurate Scale Model, Diwaker Gupta, Kashi V. Vishwanath, and Amin Vahdat

§      Lessons Learned at 208K: Towards Debugging Millions of Cores, SC2008 [pdf]

§      DMTracker: Finding Bugs in Large-scale Parallel Programs by Detecting Anomaly in Data Movements, SC07, [pdf]

§      NetComplex: A Complexity Metric for Networked System Designs, Byung-Gon Chun, Sylvia Ratnasamy, Eddie Kohler, NSDI’08 [pdf]



Issues (5): Power (Diana)


§      Reducing Network Energy Consumption via Sleeping and Rate-Adaptation, Sergiu Nedevschi, Lucian Popa, Gianluca Iannaccone, Sylvia Ratnasamy, David Wetherall, NSDI’08

§      Energy-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services, Gong Chen, Wenbo He, Jie Liu and Suman Nath, Leonidas Rigas, Lin Xiao and Feng Zhao, NSDI’08.


§      Power Provisioning for a Warehouse-sized Computer, Xiaobo Fan, ISCA’08 [pdf]

§      No "Power" Struggles: Coordinated Multi-level Power Management for the Data CenterRamya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, Xiaoyun Zhu, ASPLOS’08 [pdf]

§      PICSEL: Measuring User-Perceived Performance to Control Dynamic Frequency Scaling, Arindam Mallik, Jack Cosgrove, Gokhan Memik, Robert P. Dick, Peter Dinda, ASPLOS’08 [pdf]

§      Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs, M. Aater Suleman, Moinuddin K. Qureshi, Yale N. Patt, ASPLOS’08 [pdf]

§      Managing Energy-Performance Tradeoffs for Multithreaded Applications on Multiprocessor Architectures, Soyeon Park, Weihang Jiang, Sarita Adve, Yuanyuan Zhou, SIGMETRICS’07 [pdf]

§      VPM Tokens: Virtual Machine-Aware Power Budgeting in Datacenters, Ripal Nathuji, Karsten Schwan, HPDC 08.  [pdf]

§      Software-Directed Combined CPU/Link Voltage Scaling for NoC-Based CMPs, M. Kandemir, O. Ozturk, SIGMETRICS’08

§      Evaluating memory energy efficiency in parallel I/O workloads, Jianhui Yue, Yifeng Zhu, Zhao Cai, Cluster Computing Conference, 2007 [pdf]



UBC closed




Miscellaneous: Social networks (Elizeu)


1.     Yes, There is a Correlation -- From Social Networks to Personal Behavior on the Web, Singla, P. & Richardson, M., WWW’08 [pdf]

2.     Statistical Properties of Community Structure in Large Social and Information Networks, Leskovec et al., WWW’08, [pdf]

3.     Efficient Network-aware Search in Collaborative Tagging Sites, Amer-Yahia et al. [pdf]

4.     Social VPNs: Integrating Overlay and Social Networks for Seamless P2P Networking, Figueiredo et al., COPS’08 [pdf]

5.     Quasar: A Probabilistic Publish-Subscribe System for Social Networks, B. Wong and S. Guha [pdf]

6.     Social networks that matter: Twitter under the microscope, Huberman et al. [pdf]



[Project: presentations and wrap-up]



Other links:

1.    Cluster Interconnect Overview, Brett M. Bode, Jason J. Hill, and Troy R. Benjegerdes [pdf]




2.    Consensus Routing: The Internet as a Distributed System, John P. John, Ethan Katz-Bassett, Arvind Krishnamurthy, Thomas Anderson, Arun Venkataramani

3.    Structured and Unstructured Overlays under the Microscope: A Measurement-based View of Two P2P Systems That People Use, Yi Qiao and Fabián E. Bustamante

4.    The Chubby lock service for loosely-coupled distributed systems, Mike Burrows, OSDI’06

5.    Loose Synchronization for Large-Scale Networked Systems, Jeannie Albrecht, Christopher Tuttle, Alex C. Snoeren, and Amin Vahdat

6.    Server-Storage Virtualization: Integration and Load Balancing in Data Centers, SC’08 [pdf]