Topics in Distributed Systems – Massively parallel/distributed
computing platforms
Instructor: Matei Ripeanu
TA: Samer Al-Kiswani
Schedule: Monday 5:00-8:00
Location: KAIS4018 (might change)
Announcements:
[01/09] Subscribe to eece571r@ece.ubc.ca mailing list by
emailing sympa@ece.ubc.ca with
“subscribe eece571r@ece.ubc.ca”
in the body of the message or by visiting https://oldlists.ece.ubc.ca/
[01/16] Register with the H2O system and join project EECE571R-09.
To submit your paper review, please go to the appropriate “Rotisserie
Discussion”. Reviews for the Monday class are due by midnight on Sunday.
Efficiently harnessing massively
parallel computing platforms and providing predictable levels of service is an
outstanding challenge for distributed systems research. This graduate-level
course uses an inclusive definition of massively parallel platforms that covers
silicon-level platforms (e.g., massively multi‑core chips like IBM’s Cell
processors or nVidia’s graphics processing units), massively multiprocessor
architectures (e.g., BlueGene supercomputers), and wide-area distributed
systems. All these platforms have in
common that they aggregate a large number of processing elements to offer a
huge computing potential. Yet the issues that make it difficult to fully
materialize this potential are numerous: they relate to minimizing the
computational overheads of parallel and distributed applications, providing
predictable performance at multiple levels of the computing stack, energy
efficiency, usability (e.g., programming language support for data-parallel
applications), or maintainability (e.g., ability to efficiently identify and
repair problems).
The course will cover fundamentals
of massively multi-core processor architecture, operating system and
programming language support for parallel hardware and parallel applications,
system support for debugging parallel applications, and the impact of emerging hardware
trends on large-scale data processing system design. Advances in all these
directions are key ingredients for recent efforts to build cyber‑infrastructure.
Students will be exposed to a range of technologies from multi-core processors
(nVidia GPUs), to operating system support (support for coprocessors), and
distributed systems (support for data parallel applications) and their
integration with massive computing systems.
The course is structured to provide (i) an in-depth
understanding of current topics in large-scale, distributed system research;
(ii) experience with reviewing and presenting advanced technical material;
(iii) practice in writing and critically reviewing research papers. The class
workload has a participation component and a final project.
1. State the main contribution of the paper.
2. Critique the main contribution.
   a. Rate the significance of the paper on a scale of 5 (breakthrough), 4
      (significant contribution), 3 (modest contribution), 2 (incremental
      contribution), 1 (no contribution or negative contribution). More
      importantly: explain your rating in a sentence or two.
   b. Rate how convincing the methodology is. You may consider some of the
      following questions (use what is relevant): Do the claims and conclusions
      follow from the experiments? Are the assumptions realistic? Are the
      experiments well designed? Are there different experiments that would be
      more convincing? Are there other alternatives the authors should have
      considered? (And, of course, is the paper free of methodological errors?)
   c. What are the most important limitations of the approach?
3. What are the two strongest and/or most interesting ideas in the paper?
4. What are the two most striking weaknesses in the paper?
5. Name two questions that you would like to ask the authors.
6. Detail an interesting extension to the work not mentioned in the future work section.
7. Optional comments on the paper that you’d like to see discussed in class.
Reviews must be submitted by midnight the day before the class to the
relevant Rotisserie Discussion on H2O.
Papers are discussed in class. Discussions will be led by one or more students
and may include a brief (10-minute) presentation of the paper. Discussion
leaders do not need to submit reviews, but they need to (a) prepare a
discussion plan and (b) post a brief discussion summary on H2O based on the
in-class discussion (due before the following class).
Schedule (tentative):
Previous years’ course schedules can be found here: 2008 (topic: quality of
service); 2007 (topic: data-intensive computing systems)
Week |
Date |
Topic / Project steps |
Papers / Other links |
W1 |
01/12 |
Introduction. Overview of current
research problems. Amdahl’s law. [slides] |
§ Amdahl's Law in the Multicore Era, Mark D. Hill and Michael R. Marty, IEEE Computer, July 2008. [pdf]
§ Mitigating Amdahl’s Law through EPI Throttling, Murali Annavaram, Ed Grochowski, John Shen, 298-309, ISCA'05 [pdf] |
W2 |
01/19 |
Silicon
platforms: GPUs, cell processors, Intel’s massively multicore architecture,
SSEs (Abdullah) [Project: 5-min presentations, discussion of project themes.] |
Required
§ Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Shane Ryoo, Christopher Rodrigues, Sara Baghsorkhi, Sam Stone, David Kirk and Wen-mei Hwu, PPoPP’08 [pdf]
§ Dynamic Multigrain Parallelization on the Cell Broadband Engine, F. Blagojevic, D. Nikolopoulos, A. Stamatakis, C. Antonopoulos, PPoPP’07 [pdf]
Optional
§ Programming the Intel 80-core Network-on-a-chip Terascale Processor, SC’08 [pdf] [Larrabee architecture]
§ Exploring the cache design space for large scale CMPs, L. Hsu, R. Iyer,
§ CMP Design Space Exploration Subject to Physical Constraints, Yingmin Li, Benjamin Lee, David Brooks, Zhigang Hu, Kevin Skadron, HPCA’06 [pdf] |
W3 |
01/26 |
Multiprocessors (cluster-based
computing, clouds) (Elizeu) |
Required
§ Evaluating synchronization on shared address space multiprocessors: methodology and performance, Sanjeev Kumar, Dongming Jiang, Rohit Chandra, Jaswinder Pal Singh, SIGMETRICS’99 [pdf]
§ Entering the Petaflop Era: The Architecture and Performance of Roadrunner, SC08 [pdf]
Optional
§ Early Evaluation of BlueGene/P, SC08 [pdf]
§ Designing a highly-scalable operating system: The Blue Gene/L story [pdf]
§ Are non-blocking networks really needed for high-end-computing workloads? N. Desai, P. Balaji, P. Sadayappan, M. Islam, Cluster Computing’08, [pdf]
§ Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters, SC08 [pdf][slides] |
W4 |
02/02 |
Massively-distributed
(Internet-scale) platforms (Debojit) [Project: submit a two-page proposal by Sunday 02/01] |
§ Experiences Building PlanetLab, Larry Peterson, Andy Bavier, Marc E. Fiuczynski, and Steve Muir, OSDI’06
§ Network Coordinates in the Wild, Jonathan Ledlie, Paul Gardner, Margo Seltzer, NSDI’07 |
W5 |
02/09 |
|
|
W6 |
03/02 |
Issues (1): OS support (Sriram) |
Required
§ Corey: An Operating System for Many Cores, Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev; Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang; Zheng Zhang, OSDI’08 [pdf]
§ Tapping into the Fountain of CPUs - On Operating Systems Support for Programmable Devices, Pete Wyckoff, Muli Ben-Yehuda, Yaron Weinsberg, Danny Dolev, Tal Anker, ASPLOS’08 [pdf]
Optional
§ SMARTMAP: Operating System Support for Efficient Data Sharing among Processes on a Multi-core Processor, SC08 [pdf] |
W7 |
03/09 |
[Project: midterm presentations; submit a midterm report of up to five pages,
including related work. Deadline: Sunday 03/08] |
|
W8 |
03/16 |
Issues (2): Optimizations /
Extracting maximum performance (Maliha) |
Required
§ Benchmarking GPUs to tune dense linear algebra, SC08 [pdf]
§ A Tuning Framework for Software-Managed Memory Hierarchies, Manman Ren, Ji Young Park, Mike Houston, Alex Aiken and William Dally, PACT 2008 [pdf]
Optional
§ A comparison of programming models for multiprocessors with explicitly managed memory hierarchies, Schneider et al., PPoPP’09 [pdf]
§ A Case Study in SIMD Text Processing with Parallel Bit Streams, Robert D. Cameron, PPoPP’08 [slides]
§ Massive Parallel LDPC Decoding on GPU, Gabriel Falcão, Leonel Sousa and Vitor Silva, PPoPP’08 [slides]
§ Toward Terabyte Data Mining: An Architecture Conscious Solution, Gregory Buehrer, Srinivasan Parthasarathy, Tahsin Kurc, Joel Saltz, and Shirish Tatikonda, PPoPP’07 |
W9 |
03/23 |
Issues (3): Support for efficient
applications. Data-parallel processing (Lauro) |
Required
§ DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda and Jon Currey, OSDI’08 [pdf]
§ Streamware: Programming General-Purpose Multicore Processors Using Streams, Jayanth Gummaraju, Joel Coburn, Yoshio Turner, Mendel Rosenblum [pdf]
Optional
§ Merge: A Programming Model for Heterogeneous Multi-core Systems, Michael D. Linderman, Jamison D. Collins, Hong Wang, Teresa H. Meng [pdf]
§ Automatic Optimization of Parallel Dataflow Programs, Christopher Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava [html]
§ Improving MapReduce Performance in Heterogeneous Environments, Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, OSDI’08 [pdf]
§ Executing Irregular Scientific Applications on Stream Architectures, Mattan Erez, Jung Ho Ahn, Jayanth Gummaraju, Mendel Rosenblum, and William J. Dally [pdf] |
W10 |
03/30 |
Issues (4): Support for debugging
(Mohammad) |
Required
§ DMP: Deterministic Shared Memory Multiprocessing, Joseph Devietti, Brandon Lucia, Luis Ceze and Mark Oskin, ASPLOS’09.
§ D3S: Debugging Deployed Distributed Systems, Xuezheng Liu, Zhenyu Guo, Xi Wang, Feibo Chen, Xiaochen Lian, Jian Tang, Ming Wu, M. Frans Kaashoek, Zheng Zhang, NSDI’08
Optional
§ DieCast: Testing Distributed Systems with an Accurate Scale Model, Diwaker Gupta, Kashi V. Vishwanath, and Amin Vahdat,
§ Lessons Learned at 208K: Towards Debugging Millions of Cores, SC08 [pdf]
§ DMTracker: Finding Bugs in Large-scale Parallel Programs by Detecting Anomaly in Data Movements, SC07, [pdf]
§ NetComplex: A Complexity Metric for Networked System Designs, Byung-Gon Chun, Sylvia Ratnasamy, Eddie Kohler, NSDI’08 [pdf] |
W11 |
04/06 |
Issues (5): Power (Diana) |
Required
§ Reducing Network Energy Consumption via Sleeping and Rate-Adaptation, Sergiu Nedevschi, Lucian Popa, Gianluca Iannaccone, Sylvia Ratnasamy, David Wetherall, NSDI’08
§ Energy-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services, Gong Chen, Wenbo He, Jie Liu, Suman Nath, Leonidas Rigas, Lin Xiao and Feng Zhao, NSDI’08.
Optional
§ Power Provisioning for a Warehouse-sized Computer, Xiaobo Fan, ISCA’08 [pdf]
§ No "Power" Struggles: Coordinated Multi-level Power Management for the Data Center, Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, Xiaoyun Zhu, ASPLOS’08 [pdf]
§ PICSEL: Measuring User-Perceived Performance to Control Dynamic Frequency Scaling, Arindam Mallik, Jack Cosgrove, Gokhan Memik, Robert P. Dick, Peter Dinda, ASPLOS’08 [pdf]
§ Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs, M. Aater Suleman, Moinuddin K. Qureshi, Yale N. Patt, ASPLOS’08 [pdf]
§ Managing Energy-Performance Tradeoffs for Multithreaded Applications on Multiprocessor Architectures, Soyeon Park, Weihang Jiang, Sarita Adve, Yuanyuan Zhou, SIGMETRICS’07 [pdf]
§ VPM Tokens: Virtual Machine-Aware Power Budgeting in Datacenters, Ripal Nathuji, Karsten Schwan, HPDC 08. [pdf]
§ Software-Directed Combined CPU/Link Voltage Scaling for NoC-Based CMPs, M. Kandemir, O. Ozturk, SIGMETRICS 08
§ Evaluating memory energy efficiency in parallel I/O workloads, Jianhui Yue, Yifeng Zhu, Zhao Cai, Cluster Computing Conference, 2007 [pdf] |
|
04/13 |
UBC closed |
|
W12 |
04/20 |
Miscellaneous: Social networks (Elizeu) |
1. Yes, There is a correlation -- From Social Networks to Personal Behavior on the Web, Singla, P. & Richardson, M., WWW’08 [pdf]
2. Statistical Properties of Community Structure in Large Social and Information Networks, Leskovec et al., WWW’08, [pdf]
3. Efficient Network-aware Search in Collaborative Tagging Sites, Amer-Yahia et al., [pdf]
4. Social VPNs: Integrating Overlay and Social Networks for Seamless P2P Networking, Figueiredo et al., COPS’08 [pdf]
5. A Probabilistic Publish-Subscribe System for Social Networks, B. Wong and
6. Social networks that matter: Twitter under the microscope, Huberman et al. [pdf] |
W13 |
04/27 |
[Project: presentations and wrap-up] |
|
Other links:
1. Cluster Interconnect Overview, Brett M. Bode, Jason J. Hill, and Troy R. Benjegerdes [pdf]
2. http://usenix.org/events/usenix07/tech/slides/treese.pdf
3. http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf
4. http://www.allthingsdistributed.com/2008/12/eventually_consistent.html#more
§ Consensus Routing: The Internet as a Distributed System, John P. John, Ethan Katz-Bassett, Arvind Krishnamurthy, Thomas Anderson, and Arun Venkataramani,
§ Structured and Unstructured Overlays under the Microscope: A Measurement-based View of Two P2P Systems That People Use, Yi Qiao and Fabián E. Bustamante
§ The Chubby lock service for loosely-coupled distributed systems, Mike Burrows, OSDI’06.
§ Loose Synchronization for Large-Scale Networked Systems, Jeannie Albrecht, Christopher Tuttle, Alex C. Snoeren, and Amin Vahdat,
5. Server-Storage Virtualization: Integration and Load Balancing in Data Centers. SC08 [pdf]