Computer Science Department, University of Crete
HY-590.45. Modern Topics in Scalable Storage Systems


Course Staff

Name | Email | Office Hours
Instructor: Kostas Magoutis | hy590-45@csd | By appt./Γ-111
Teaching Assistant: Dimokritos Stamatakis | hy590-45@csd | By appt.

General Information

The course meets on Mondays and Wednesdays from 5pm to 7pm in PA201. Backup lectures will take place on Fridays from 5pm to 7pm in PA201; see the schedule for exact dates.

Announcements

10.10.2012 10:00: Starting today, we will be meeting in ITE; see the syllabus for location details.

20.09.2012 10:00: The first class meeting will be on Monday 8/10.

07.09.2012 10:00: You are welcome to get in touch with the instructor to discuss course-related issues.

Course Description

The explosive growth of information processing services in recent years has created an unprecedented need for storage capacity. Scalable access to storage resources requires a class of distributed systems designed for fast, reliable, and uninterrupted access to storage media (e.g., magnetic disks and tapes) over high-speed networks. This course offers an introduction to scalable storage systems and examines existing design techniques as well as current research problems in the design and implementation of such systems, along with possible solutions.

Some of the advantages of the scalable storage model over direct-attached storage include expandable capacity and performance, as well as improved utilization and sharing of distributed storage resources. The scalable storage systems architect, however, faces a number of challenges. First, the distributed nature of a scalable storage system makes it more complex than direct-attached storage: administration, capacity planning, configuration, backup, and disaster recovery are all complicated at large scale. Second, transferring data over the network requires stronger security and safety guarantees than transferring it over the system I/O bus, and sometimes requires new, storage-specific network transport protocols. These and other challenges make scalable storage an exciting research area that has seen significant advances in recent years.

The core part of the course focuses on the study of scalable storage systems with special emphasis on architectures, design principles for scalable performance, reliability, and availability, the management of data during their lifecycle, application-specific design concepts, ways to reduce implementation cost, storage system capacity planning, and storage outsourcing services.

This course is intended for graduate students and advanced undergraduates and requires a research project carried out in groups of two. The topics of the research projects will be chosen with the help and guidance of the course staff. Other requirements include a small number of homework assignments, a midterm exam, and a final exam.

Coursework

Prerequisites

Grading

The final grade depends on class participation, a midterm examination, a final examination, and a research project. Students choose their research projects either independently or with help and guidance from the course staff.

Readings

A number of papers are assigned as readings and are available online. You are expected to read the assigned papers before the beginning of each class.

There is no required textbook for this class. The following textbooks, however, are recommended readings:

Syllabus

Date | Notes | Readings
Mon 08/10 | Background I: Storage and file system concepts | File systems handout (11.2.3, 11.2.4, 11.4, 11.4.1, 11.7, 11.7.1-11.7.4)
Wed 10/10 | Background II: Storage and file system concepts (AMΣ 5-7pm) | RAID handout
Fri 12/10 | Background III: Log-structured file systems (Step-C 5-7pm) | Rosenblum: Design and Implementation of a Log Structured File System
Fri 19/10 | Intro I: Distributed file system concepts (AMΣ 5-7pm) | -
Mon 22/10 | Intro II: High availability (AMΣ 5-7pm) | -
Wed 24/10 | Distributed systems primitives, intro (AMΣ 5-7pm) | Paxos
Mon 29/10 | Distributed replicated storage (AMΣ 5-7pm) | DeCandia: Dynamo: Amazon's Highly-Available Key-value Store (Dynamo)
Wed 31/10 | Distributed replicated storage (AMΣ 5-7pm) | Dynamo pt. 2
Fri 02/11 | Distributed virtual disk model for shared storage (AMΣ 5-7pm) | Petal
Wed 07/11 | Distributed virtual disk model for shared storage (AMΣ 5-7pm) | Petal pt. 2
Mon 12/11 | Distributed file systems (AMΣ 5-7pm) | Frangipani
Mon 26/11 | File system replication (AMΣ 5-7pm) | Niobe
Wed 28/11 | Distributed file systems (AMΣ 5-7pm) | Google File System
Mon 10/12 | Distributed NoSQL data stores (AMΣ 5-7pm) | BigTable
Wed 12/12 | Distributed systems infrastructure | Chubby
Mon 17/12 | Project design session | -
Wed 19/12 | Project design session | -
TBD | Project presentations | schedule

Projects HOWTO

Please note the following project guidelines:

Other Resources