Parallel Programming

The goal of this programming problem is to implement a matrix multiplication program using both the shared memory and the message passing programming models. You can implement it on any parallel system; the suggested platform is the research cluster at FORTH-ICS. You will have access to 8 nodes of the cluster.


The platform

You can use either of two platforms:


Platform 1: Direct access 


Each node in the sub-cluster is a dual-processor AMD Opteron system, and the nodes you will use are interconnected by a Gigabit Ethernet network. Although this system can serve both as a shared memory platform (over a software shared memory abstraction) and as a message passing platform, in this assignment you will use it only as a message passing system through a standard library, the Message Passing Interface (MPI). For the shared memory part of the assignment you will use a single node in the cluster that has two quad-core CPUs, for a total of eight (8) CPUs.

You can access the cluster by logging in via ssh to shark at port 4096 (ssh -p4096 139.91.92.100). You can access shark only from IP addresses that belong to ICS (including the ICS VPN) or to the CSD VPN. From shark, you can then reach via ssh the 4-node sub-cluster for MPI (mpifteki, mpikini, mpironi, mpistoli) and an eight-core node for SAS (mpiskoto). User accounts for the cluster will be distributed in class.


Platform 2: Access via Kubernetes (setup courtesy of A. Chazapis and I. Malliotakis)


This platform provides access via pods running under Kubernetes. The servers are more powerful and the access mechanism aims to remove the hassle of using the platform directly. For detailed instructions, please check here.


The assignment 


(a) SAS programming


To write a shared memory parallel program for the eight-core system, you can use the ANL (Argonne National Laboratory) m4 macros, which allow you to create processes (threads), allocate global memory, and use synchronization primitives. The file ~cs527/sas/macros/c.linux.m4 contains this set of macros. The file ~cs527/sas/macros/c.null.m4 may be handy for running the sequential versions of the SPLASH-2 programs.


Tasks:

  1. README_FIRST.SAS. Get, compile, and run FFT from ~cs527/sas/applications. This version of the application is similar to the original SPLASH-2 version (available in various repositories, e.g. splash2_benchmark), with minor modifications (mainly to support data placement and 64-bit addresses).
  2. Write a shared memory program that reads two NxN matrices from a file and multiplies them on a system with P processors. You don't have to worry too much about corner cases (for instance, you can assume that N is a power of P). The input file contains one array element per line, with elements linearized in row-major order. Output the result to standard output in the same format. The program should also report to standard output the time it took to compute the result (not including initialization, reading files, or outputting results).
  3. Run your SAS program on 1, 2, 3, 4, 5, 6, 7, and 8 cores and create a speedup curve.


(b) MPI programming


Tasks:

  1. README_FIRST.MPI. Install MPICH locally in your account.
  2. Copy, compile, and run the int_pi2 program from ~cs527/mpi (runmpi.txt). This application computes the value of pi. Read the instructions in ~cs527/mpi/Readme for compiling an MPI program. Experiment with the number of approximation intervals: try 100, 1000, and 1000000. Why is the error lower with 1000 approximation intervals than with 100? Why does the error increase for very large numbers of intervals?
  3. Write an MPI program that reads two NxN matrices and multiplies them in the same way as in task 2 of part (a).
  4. Run your program on 1, 2, 3, 4 processors (one processor per node) and on 2, 4, 6, 8 processors (two processors per node) and create two speedup curves.


(c) Speedups


Tasks:

  1. Put the three speedup curves (one from SAS and two from MPI) on a single graph with appropriate legends indicating the application, programming abstraction, and input size.
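The assignment does not fix how speedup is computed; one common convention, assumed here, measures it against the same program's time on one processor, using the compute-only time that task 2 of part (a) asks the program to report:

```latex
S(P) = \frac{T(1)}{T(P)}
```

Here T(P) is the measured compute time on P processors (cores for the SAS runs, MPI processes for the MPI runs). Ideal linear speedup corresponds to S(P) = P.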


Submission 

Turn in (by email to b i l a s @ c s d . u o c . g r) a tar file that contains your solutions and a README file stating any assumptions or special features.