## First Programming Assignment

### Due: March 13, 2017

In this homework you will develop three parallel implementations of the n-Queens problem, measure their performance and explain it.

The n-Queens problem is defined as follows:
Find the total number of distinct solutions to place a given number of chess queens on a quadratic chess board with a specific number of squares on one side of a board (n) in a way that no two queens would be able to attack each other. Two queens can attack each other when they are placed on the same horizontal row, vertical column or in one of (2 × (2 × n - 1)) possible diagonals.

For example, on an 8×8 board, there are 92 distinct solutions. The difference between a unique and distinct solution is that distinct solutions allow for symmetrical operations, like rotations and reflections of the board, to be counted as one; so an 8×8 board has 12 unique solutions. The n‐Queens problem has exponential complexity.

The world record at the moment is held for the solution on a 26×26 board implemented on a custom supercomputer using FPGAs at TU Dresden. More information is available here.

You are asked to parallelize an efficient serial implementation of the algorithm, which finds distinct solutions using bit vectors. The code will be distributed to you via the class mailing list. Your team should parallelize the code using three programming models: OpenMP, Cilk, and TBB.

• OpenMP is a de facto standard for parallelizing code on shared‐memory multiprocessors. OpenMP uses compiler directives (pragmas) to parallelize code. It assumes a global address space and a relaxed memory model. For parallelization, OpenMP uses worksharing constructs --which specify distributions of loops between processors-- and/or asynchronous tasks, which are specified explicitly by programmers. In addition, programmers have control over data privatization by providing data annotations in directives.
• Cilk is a multithreaded programming language, which is a faithful extension of the C language. Cilk programs implement a strict multithreading model and a scheduler following the work-first principle, which we will discuss further in class. Cilk is ideal for parallelizing recursive algorithms that use the divide‐and-conquer strategy.
• TBB is a library of parallel patterns and algorithms in C++, developed by Intel. It includes a large library of patterns, parallel data structures and algorithms that the programmer can use to describe parallelism at a high level of abstraction.

You are free to devise the parallelization strategy yourselves using the aforementioned programming models and you are free to pick the compilation flags of your code. You are not allowed to modify the algorithm but you are free to use any parallel control structures that you see fit (including loops, tasks, vectors, etc.) You will measure your algorithm on chessboards of size 10×10, 12×12 and 15×15. You should try your best to achieve high performance and good scalability of your parallel algorithm! You are welcome to discuss your ideas with the instructor.

You will carry out the assignment on an 8-core x86 SMP (shared-memory multiprocessor) system at FORTH-ICS. Instructions on how to access the machine will be posted soon. Carry out the project in groups of two students. You need to write a report of your project, where you describe:

1. the parallelization strategy and the parallel control constructs that you used (loops, tasks, threads, etc.) with an explanation of your choices;
2. any optimizations that you performed to improve parallelism, locality, communication, or synchronization;
3. a detailed analysis of your experimental results.

While there is no prescribed length of these reports, they should not exceed 8 pages of text, including figures and references.