Computer Science Department, University of Crete
HY-590.45. Modern Topics in Scalable Storage Systems


Course Staff

Name | Email | Office Hours
Instructor: Kostas Magoutis | hy590-45@csd | By appt./Γ-111
Teaching Assistant: Dimokritos Stamatakis | hy590-45@csd | By appt.

General Information

The course meets on Mondays and Wednesdays from 5pm to 7pm in PA201. Backup lectures will take place on Fridays from 7pm to 9pm in PA203; see the schedule for exact dates.

Announcements

07.10.2011 10:00: The first assignment is now available. It is due on Friday 14/10.

24.09.2011 10:00: You should sign up for the course mailing list by sending an email to majordomo@csd with the content "subscribe hy590-45-list".

07.09.2011 10:00: You are welcome to get in touch with the instructor to discuss course-related issues.

Course Description

The explosive growth of information processing services in recent years has created an unprecedented need for storage capacity. Scalable access to storage resources requires a class of distributed systems designed for fast, reliable, and uninterrupted access to storage media (e.g., magnetic disks and tapes) over high-speed networks. This course offers an introduction to scalable storage systems and examines existing design techniques as well as current research problems in the design and implementation of such systems, along with possible solutions.

The advantages of the scalable storage model over direct-attached storage include expandable capacity and performance, as well as improved utilization and sharing of distributed storage resources. The scalable storage systems architect, however, faces a number of challenges. First, the distributed nature of a scalable storage system makes it more complex than direct-attached storage: administration, capacity planning, configuration, backup, and disaster recovery all become harder at large scale. Second, transferring data over a network requires stronger security and safety guarantees than transferring it over the system I/O bus, and sometimes requires new, storage-specific network transport protocols. These and other challenges make scalable storage an exciting research area that has seen significant advances in recent years.

The core part of the course focuses on the study of scalable storage systems with special emphasis on architectures, design principles for scalable performance, reliability, and availability, the management of data during their lifecycle, application-specific design concepts, ways to reduce implementation cost, storage system capacity planning, and storage outsourcing services.

This course is aimed at graduate students and advanced undergraduates and requires a research project carried out in groups of two. The topics of the research projects will be chosen with the help and guidance of the course staff. Other requirements include a small number of homework assignments, a midterm exam, and a final exam.

Coursework

Prerequisites

Grading

The final grade depends on class participation, a midterm examination, a final examination, and a research project. Students choose their research projects either independently or with help and guidance from the course staff.

Readings

A number of assigned papers are available online. You are expected to read the assigned papers before the beginning of each class.

There is no required textbook for this class. The following textbooks, however, are recommended readings:

Syllabus

Date | Notes | Readings
Mon 03/10 | Overview: Storage and file system architectures | Lecture slide
Wed 05/10 | Background I: Storage and file system architectures | File systems handout (11.2.3, 11.2.4, 11.4, 11.4.1, 11.7, 11.7.1-11.7.4)
Fri 07/10 | Background II: Errors and failures in storage systems | Pinheiro: Failure Trends in a Large Disk Drive Population
Mon 10/10 | Instructor out of town, no class | -
Wed 12/10 | Instructor out of town, no class | -
Mon 17/10 | Background III: Storage and file system architectures | RAID handout
Wed 19/10 | Instructor out of town, no class | -
Mon 24/10 | Background IV: Log-structured file systems | Rosenblum: Design and Implementation of a Log Structured File System
Wed 26/10 | Intro I: Parallel and distributed file systems | NFS handout (9.1, 9.2)
Mon 31/10 | Intro II: Parallel and distributed file systems | NFS handout (9.3)
Wed 02/11 | Instructor out of town, no class | -
Mon 07/11 | High availability | Bhide: A Highly-Available Network File Server (HA-NFS)
Wed 09/11 | Putting it all together | Hitz: File System Design for an NFS File Server Appliance (WAFL)
Mon 14/11 | Project status updates | -
Wed 16/11 | Instructor out of town, no class | -
Mon 21/11 | Distributed virtual disk model for shared storage | Lee: Petal: Distributed Virtual Disks
Wed 23/11 | Instructor out of town, no class | -
Fri 25/11 | Parallel file system architectures | Thekkath: Frangipani: A scalable distributed file system
Mon 28/11 | Parallel file system architectures | Ghemawat: The Google File System (GFS)
Wed 30/11 | Parallel file system architectures | GFS pt. 2
Fri 02/12 | Project status updates | -
Mon 05/12 | Scalable systems primitives | Burrows: The Chubby lock service for loosely-coupled distributed systems (Chubby)
Wed 07/12 | Scalable systems primitives | Chubby pt. 2; plus an optional reading
Mon 12/12 | Scalable replicated storage | McCormick: Niobe: A practical replication protocol (Niobe), sections 1 - 4
Wed 14/12 | Scalable replicated storage | Niobe pt. 2
Fri 16/12 | Project presentations | schedule

Projects HOWTO

Please note the following project guidelines:

Other Resources