Computer Science Department, University of Crete
HY-590.45. Modern Topics in Scalable Storage Systems


Course Staff

Role | Name | Email | Office Hours
Instructor | Kostas Magoutis | hy590-45@csd | By appt./H-311
Teaching Assistant | Antonis Papaioannou | hy590-45@csd | By appt.

General Information

The course meets on Tuesdays 2-3pm and Thursdays 2-4pm in room H-208. There will be occasional makeup classes, whose dates will be announced in advance. See the syllabus for exact dates.

Announcements

2.4.2020 18:00: There will be no class on Tuesday 11/2.

1.9.2019 10:00: You are welcome to contact the instructor to discuss course-related issues.

Course Description

The explosive growth of information processing services in recent years has created an unprecedented need for storage capacity. Scalable access to storage resources requires a class of distributed systems designed for fast, reliable, and uninterrupted access to storage media (e.g., magnetic disks and tapes) over high-speed networks. This course introduces scalable storage systems and examines established design techniques, as well as current research problems in the design and implementation of such systems and their possible solutions.

Advantages of the scalable storage model over direct-attached storage include expandable capacity and performance, as well as better utilization and sharing of distributed storage resources. The scalable storage systems architect, however, faces a number of challenges. First, the distributed nature of a scalable storage system makes it more complex than direct-attached storage: administration, capacity planning, configuration, backup, and disaster recovery all become harder at large scale. Second, transferring data over a network requires stronger security and safety guarantees than transferring it over the system I/O bus, and it sometimes requires new, storage-specific network transport protocols. These and other challenges make scalable storage an exciting research area that has seen significant advances in recent years.

The core of the course is the study of scalable storage systems, with special emphasis on architectures; design principles for scalable performance, reliability, and availability; the management of data throughout its lifecycle; application-specific design concepts; ways to reduce implementation cost; storage system capacity planning; and storage outsourcing services.

This course is intended for graduate students and advanced undergraduates and requires students to undertake a research project. Project topics will be chosen with the help and guidance of the course staff.

Coursework

Prerequisites

Grading

The final grade depends on class participation, an in-class quiz, and a research project.

Readings

The readings consist of research papers that are available online. You are expected to read the assigned papers before the corresponding class.

There is no required textbook for this class. The following textbooks, however, are recommended readings:

Syllabus

Date | Topic | Readings
Tue 4/2 | Course overview | -
Thu 6/2 | Background | See recommended readings
Tue 11/2 | Instructor out of town, no class | -
Thu 13/2 | NFS | Sandberg: Design and Implementation of the Sun Network Filesystem
Tue 18/2 | NFS (contd.) | Macklem: Not Quite NFS, Soft Cache Consistency for NFS
Thu 20/2 | Paxos | Lamport: Paxos Made Simple
Tue 25/2, Thu 27/2 | Petal [turnin summary1@hy590-45 your-file before class] | Lee: Petal: Distributed Virtual Disks
Tue 3/3, Thu 5/3 | Frangipani [turnin summary2@hy590-45 your-file before class] | Thekkath: Frangipani: A Scalable Distributed File System
Tue 10/3, Thu 26/3 | GFS [turnin summary3@hy590-45 your-file before class] | Ghemawat: The Google File System
Thu 12/3 - Tue 24/3 | University recess (in response to COVID-19) | -
Thu 2/4, Tue 7/4 | GFS | Ghemawat: The Google File System
Thu 9/4 | Related work presentations | paper 1 (Manos), paper 2 (Iakovos)
Tue 14/4 | Porcupine [turnin summary4@hy590-45 your-file before class] | Saito: Manageability, Availability and Performance in Porcupine: A Highly Scalable, Cluster-based Mail Service
16/4 - 21/4 | Easter break | -
Thu 23/4 | Porcupine | Saito: Manageability, Availability and Performance in Porcupine: A Highly Scalable, Cluster-based Mail Service
Tue 28/4, Thu 30/4 | EuroSys'20 | -
Tue 5/5, Thu 7/5 | Related work presentations | paper 3 (Giorgos Z.), paper 4 (Giorgos X.), paper 5 (Apostolos)
Tue 12/5 | Related work presentations | paper 6 (Stratos)
Thu 14/5 | Spanner [turnin summary5@hy590-45 your-file before class] | Corbett: Spanner: Google's Globally-Distributed Database
Tue 19/5 | Bigtable | Chang: Bigtable: A Distributed Storage System for Structured Data
Thu 21/5 | Review | -

Projects HOWTO

Please note the following project guidelines:

Other Resources / Useful links