Class meets Monday and Wednesday, 13.00–15.00,
in room Beta 211 (white buildings, Knossos campus). Friday classes
are scheduled as needed, in the event of the instructor's absence during regular class times. |
| The course uses reading material based on papers and
lectures on parallel programming patterns, models, and languages for
multi-core computer architectures. This material will be posted on the
course web page as needed. Though no textbook is formally adopted
for the course, students interested in focused training in parallel
programming for emerging hardware may use the following textbooks:
Lin and Snyder, Principles of Parallel Programming, Addison-Wesley, 2009, ISBN-10: 0321487907, ISBN-13: 9780321487902.
Mattson, Sanders and Massingill, Patterns for Parallel Programming, Addison-Wesley, 2005, ISBN-10: 0321228111, ISBN-13: 9780321228116.
Kirk and Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 2010, ISBN-13: 978-0-12-381472-2. |
| Class | Topic, lecture notes | Reading list | Assignments/Milestones |
| 14.02.11 | Introduction: Multi-core processor technology, concurrent computational patterns (slides: 1up, 2up, 4up) | A View of the Parallel Computing Landscape, Communications of the ACM, 52(10):56–67, October 2009. | Form project groups |
| 16.02.11 | Roofline model (slides: 1up, 2up, 4up) | Roofline: An Insightful Visual Performance Model for Multicore Architectures | Fill out and submit account forms. Read paper for discussion on Monday 21.02: Stencil Computation Optimization and Autotuning on State-of-the-Art Multicore Architectures |
| 21.02.11 | Paper discussion | Stencil Computation Optimization and Autotuning on State-of-the-Art Multicore Architectures | First class project |
| 23.02.11 | POSIX threads (slides: 1up, 2up, 4up) | Pthreads Programming: A POSIX Standard for Better Multiprocessing, Bradford Nichols, Dick Buttlar, Jacqueline Proulx Farrell, O'Reilly & Associates | Familiarize yourself with the project programming models (OpenMP tutorial, OpenMP specs, Cilk, SMPSs overview, SMPSs programming model). Read the specs and simple examples. Read paper for discussion on Monday 28.02: Threads Cannot be Implemented as a Library |
| 28.02.11 | Paper discussion | Threads Cannot be Implemented as a Library | |
| 02.03.11 | Cilk (slides: 1up, 2up, 4up) | Cilk 5.4.6 reference manual | First project should be running on Cilk by the end of the week. Paper assignment for next week: Lazy Binary-Splitting: A Run-Time Adaptive Work-Stealing Scheduler |
| 11.03.11 | Cilk | Lecture notes of 02.03.11 | First project should be running on Cilk and OpenMP by now |
| 14.03.11 | Paper discussion | Lazy Binary-Splitting: A Run-Time Adaptive Work-Stealing Scheduler | |
| 16.03.11 | OpenMP (slides: 1up, 2up, 4up) | OpenMP ARB Specifications | First project should be running on all three programming models. Paper assignment for next week: The Design of OpenMP Tasks |
| 18.03.11 | OpenMP | Lecture notes of 16.03 | |
| 23.03.11 | Race Detection (slides) | LockSmith: A tool for finding races in C programs | Second class project |
| 28.03.11 | Paper discussion | The Design of OpenMP Tasks | Study MapReduce ReverseIndex benchmark. Run benchmark on an SMP |
| 30.03.11 | Task-Based Dataflow Programming Models (slides) | SMP SuperScalar | Unoptimized ReverseIndex should be running in OpenMP, Cilk, SMPSs |
| 11.04.11 | Programming Multiprocessors with Explicitly Managed Memory Hierarchies – Cell (slides: 1up, 2up, 4up) | Cell Broadband Engine Processor: Design and Implementation | Paper assignment for next week: Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors |
| 13.04.11 | Paper discussion | Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors | |
| 04.05.11 | Parallelizing with compiler assistance (slides) | | |
| 06.05.11 | Third class project | | |
| 11.05.11 | General-Purpose Graphics Processing Units, CUDA (slides: 1up, 2up, 4up) | | Design of parallel web server should be ready |
| 13.05.11 | General-Purpose Graphics Processing Units, CUDA | slides of previous lecture | Paper assignment for 20.05: An asymmetric distributed shared memory model for heterogeneous parallel systems |
| 16.05.11 | Optimizing CUDA code (slides: 1up, 2up, 4up) | | |
| 20.05.11 | Paper discussion | An asymmetric distributed shared memory model for heterogeneous parallel systems | |
| © copyright Dimitrios S. Nikolopoulos. Last modification: , by dsn. | |