Topics in Data-Intensive Computing Systems
Instructor: Matei Ripeanu
Schedule: Tue/Thu, 4:30-6:00
Location:
Mailing list:
Course description
Modern science is often data-intensive: large-scale simulations, new scientific instruments, and deployed sensor networks all generate impressive volumes of data that often need to be analyzed by large, geographically dispersed user communities in fields as diverse as genomics and high-energy physics.
This graduate course will cover fundamentals of data management in distributed systems, large‑scale data storage systems, and their interaction with data‑intensive computing systems. Advances in all these directions are the foundation of recent efforts to build cyber-infrastructure.
The course will explore solutions that provide fast access to data and improve data availability and durability under various consistency, scale, and component-failure constraints. Students will be exposed to a range of distributed data storage techniques, from traditional (distributed) file systems, to cooperative Internet proxy caches, to peer-to-peer file sharing, to virtual data concepts and their integration with massive computing systems.
Course structure
Three hours of classes per week, with time divided roughly equally between traditional lectures and student presentations/group discussions of recent research results.
Course outline (tentative weekly topics)
Team project
Each team (2-3 members) examines a particular distributed-systems topic, focusing on data-related issues. While a set of projects will be proposed, students are encouraged to define a project of their own: characterize an existing system, propose and evaluate techniques to improve existing systems, or prototype a new system. It is critical that students explain why a particular approach was chosen and what it contributes, with reasoning grounded in scientific or engineering knowledge and supported by a literature search. Projects are evaluated on both a written report in standard IEEE publication format and an oral presentation.
References
Books (recommended):
Journals:
Conferences:
Grading
Research paper reviews, class participation: 50%
Project report and presentation: 50%