Module also offered as part of other study programmes:
General information:
Name:
Introduction to CUDA and OpenCL
Course of study:
2017/2018
Code:
JIS-1-037-s
Faculty:
Physics and Applied Computer Science
Level of study:
First-cycle studies
Specialisation:
-
Field of study:
Applied Computer Science
Semester:
0
Education profile:
General academic (A)
Language of instruction:
English
Module coordinator:
dr hab. inż., prof. AGH Szumlak Tomasz (szumlak@agh.edu.pl)
Teaching staff:
dr hab. inż., prof. AGH Szumlak Tomasz (szumlak@agh.edu.pl)
Short module description

This lecture aims at providing basic knowledge and skills in parallel programming using GPUs. We introduce two frameworks: the NVIDIA CUDA SDK and the AMD APP SDK.

Description of learning outcomes for the module
Outcome code A student who has passed the module knows/can do Links to field-of-study outcomes Method of verifying learning outcomes (form of assessment)
Knowledge
M_W001 A student gains knowledge of advanced topics related to massively parallel programming using GPUs. IS1A_W08, IS1A_W05, IS1A_W03, IS1A_W04 Participation in discussion,
Report,
Class participation
Skills
M_U001 A student can write complete programs using the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use an IDE, the NVIDIA compiler and the debugger. IS1A_U07, IS1A_U01, IS1A_U02, IS1A_U11 Project,
Test
M_U002 A student can work as part of a team and can interact properly with his/her team-mates. IS1A_U10, IS1A_U11 Project
Social competences
M_K001 A student can present his/her results and discuss them. IS1A_K01, IS1A_K04 Class participation
Matrix of learning outcomes in relation to forms of classes
Outcome code A student who has passed the module knows/can do Form of classes
Lecture
Auditorium classes
Laboratory classes
Project classes
Conversation seminar
Seminar classes
Practical classes
Fieldwork classes
Workshop classes
Other
E-learning
Knowledge
M_W001 A student gains knowledge of advanced topics related to massively parallel programming using GPUs. + - + + - - - - - - -
Skills
M_U001 A student can write complete programs using the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use an IDE, the NVIDIA compiler and the debugger. - - - + - - - - - - -
M_U002 A student can work as part of a team and can interact properly with his/her team-mates. - - - + - - - - - - -
Social competences
M_K001 A student can present his/her results and discuss them. - - - + - - - - - - -
Module content (programme of lectures and other classes)
Lectures:
  1. Introduction (2 h)

    A generic introduction to multicore/many-core hardware and programming techniques. Students will be given a short overview of current trends both in the hardware available on the market (CPUs and GPUs) and in the software solutions provided for developers. Basic ideas of heterogeneous parallel computing are also covered.

  2. Introduction to CUDA and OpenCL – how to approach them (3 h)

    During this lecture we introduce, at a high level for the time being, the developer tools needed to write and run programs using CUDA and OpenCL. We focus on the NVIDIA CUDA SDK and AMD APP SDK frameworks. The first, as the name suggests, is dedicated to NVIDIA GPUs and has been introduced and is maintained by NVIDIA. In the OpenCL world the situation is a bit more complex and there is more than one way of using OpenCL (Khronos does not provide any ready-to-use software, just the specification). I decided to introduce the AMD IDE, but besides that there is a nice programming environment provided by Intel: the Intel® SDK for OpenCL™. You can also choose that one if you prefer.

  3. Programming and execution models – CUDA (4 h)

    In general, one can say that CUDA is a parallel computing platform and a programming model. It also provides an extension to the standard C language which allows one to write and run massively parallel algorithms using the CPU/GPU. Thanks to CUDA it is very easy to code your programs – it is almost like using the C language. With CUDA extensions you can provide software for a diverse set of systems: tablets/phones, desktops, workstations, HPC clusters and, last but not least, embedded devices. The programming model can be viewed as an abstraction of computer architectures – a bridge between your application and its concrete implementation on the available hardware. By learning the programming model we will be able to execute parallel code (so-called kernels) on the GPU, organise grids of threads and manage our devices.
    On the other hand, in order to understand how to optimise your software you need to learn how the respective instructions are actually executed on the available devices. This is the domain of the execution model, which exposes a high-level (abstract) view of the GPU's massively parallel architecture and thread concurrency.
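    The kernel/grid idea described above can be sketched in a few lines of CUDA C. This is only an illustrative skeleton (kernel name, sizes and the omitted data initialisation and error checking are assumptions, not part of the course material):

    ```cuda
    // Minimal CUDA sketch: each thread processes one vector element.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMalloc(&a, bytes); cudaMalloc(&b, bytes); cudaMalloc(&c, bytes);
        // Launch a grid of enough 256-thread blocks to cover all n elements.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();                        // wait for the kernel
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }
    ```

    The triple-angle-bracket launch syntax is exactly the C-language extension mentioned above: it specifies the grid of threads on which the kernel runs.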

  4. GPU Memory – its hierarchy and management with CUDA (4 h)

    Optimisation of kernel performance is not only related to thread management; it also has to do with device memory. During these lectures you will learn how to manage that memory. We use the so-called global memory to create better applications by exploring its access patterns and maximising its throughput. We introduce the notion of unified memory, which is a great help for the developer. The shared memory will also be introduced and discussed in detail. We explain how to create and use caches to make your code even better.
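    As a small taste of why unified memory helps the developer, here is a hedged sketch: a single pointer allocated with cudaMallocManaged is usable from both host and device, with no explicit copies (the kernel and values are illustrative):

    ```cuda
    #include <cstdio>

    // Illustrative kernel: scale every element of x by s.
    __global__ void scale(float *x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1024;
        float *x;
        cudaMallocManaged(&x, n * sizeof(float));  // one unified-memory pointer
        for (int i = 0; i < n; ++i) x[i] = 1.0f;   // initialised on the host
        scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
        cudaDeviceSynchronize();                   // sync before reading on host
        printf("x[0] = %f\n", x[0]);               // 2.0 on a working setup
        cudaFree(x);
        return 0;
    }
    ```

    Without unified memory the same program would need separate host and device buffers plus explicit cudaMemcpy calls in both directions.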

  5. GPU accelerated libraries (2 h)

    At the end we discuss selected parallel libraries that can be used to speed up your applications. They can easily be employed as ‘atoms’ to build ever more complex programs. By analysing their behaviour we can learn about new levels of parallelism. Here we mainly focus on very generic features and look at linear algebra and random number generation. These libraries should be treated as state-of-the-art products, highly optimised by a large number of experts working with massively parallel devices.
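    The lecture above does not name specific libraries, but NVIDIA's cuBLAS is one example of the GPU-accelerated linear-algebra kind discussed; a minimal SAXPY (y = alpha*x + y) call might look like this sketch (error checking omitted):

    ```cuda
    #include <cublas_v2.h>
    #include <cstdio>

    int main() {
        const int n = 4;
        float hx[n] = {1, 2, 3, 4}, hy[n] = {10, 20, 30, 40};
        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t h;
        cublasCreate(&h);                          // library context
        const float alpha = 2.0f;
        cublasSaxpy(h, n, &alpha, dx, 1, dy, 1);   // dy = 2*dx + dy, on the GPU

        cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("hy[0] = %f\n", hy[0]);             // 12.0 expected (2*1 + 10)
        cublasDestroy(h);
        cudaFree(dx); cudaFree(dy);
        return 0;
    }
    ```

    Note that no kernel is written by hand here – the ‘atom’ idea from the lecture: the library call is the building block.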

Laboratory classes:
  1. Introduction to NVIDIA CUDA SDK (4 h)

    Using our dedicated server we introduce the NVIDIA CUDA SDK – from starting it, through writing code, to compiling and executing it. We also use profiling tools to analyse execution time and performance.

  2. Introduction to AMD SDK – OpenCL (4 h)

    Similarly, we will have a look at an IDE for developing parallel software using the OpenCL programming model.

  3. Code samples – a good way forward in self-teaching (4 h)

    Both NVIDIA and AMD provide a large number of code samples. These are written by professionals and are an excellent source for improving your programming skills. We are going to have a deep look into them.

  4. Creating grids and optimising their performance (4 h)

    The first step towards creating efficient and robust programs is understanding your algorithm. By exposing the type of parallelism it contains, one can create the thread layout that best fits a given problem. We try to understand this and check how different thread grids can impact the performance of your programs.
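    The kind of experiment described above can be as simple as varying the block shape for a 2D problem; the dimensions below are illustrative, not prescribed by the course:

    ```cuda
    // Illustrative 2D layout for an n x m matrix: one thread per element.
    __global__ void addOne(float *mat, int n, int m) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < m) mat[row * m + col] += 1.0f;
    }

    int main() {
        const int n = 1000, m = 700;
        float *mat;
        cudaMalloc(&mat, n * m * sizeof(float));
        dim3 block(32, 8);  // 256 threads per block; try 16x16, 32x4, ... and time it
        dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        addOne<<<grid, block>>>(mat, n, m);
        cudaDeviceSynchronize();
        cudaFree(mat);
        return 0;
    }
    ```

    Different block shapes give the same result but can differ markedly in memory-access behaviour, which is exactly what the profiling exercises measure.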

  5. Memory management – how to gain in execution time (4 h)

    Threads and their layouts are not the whole story! Memory is also important. Here we try to understand the CUDA memory model and how to manage device memory.

  6. Using shared memory (4 h)

    Communication between threads can be realised using shared memory – one of the most important components of the GPU. We learn how it is related to the streaming multiprocessors and how it can be exploited in your programs.
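    A classic use of shared memory for intra-block communication is a partial-sum reduction; this sketch assumes a power-of-two block size of at most 256 threads and is illustrative only:

    ```cuda
    // Threads of one block cooperate through shared memory to produce
    // a single partial sum per block (a production reduction would add
    // further optimisations such as warp-level primitives).
    __global__ void blockSum(const float *in, float *out, int n) {
        __shared__ float buf[256];                 // visible to the whole block
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                           // all loads done before reading
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
            __syncthreads();                       // each halving step must finish
        }
        if (threadIdx.x == 0) out[blockIdx.x] = buf[0];  // one sum per block
    }
    ```

    The __syncthreads() barriers are the communication mechanism: they guarantee every thread sees its neighbours' writes to the shared buffer before reading them.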

Project classes:
  1. To be defined!

    The concrete topics very much depend on the students! If you are interested in GPUs you can get a more ambitious project. The class will be split into groups (2 to 4 students). Each group will be asked to select a team leader and work together to finish the project. The topics will be handed out after we finish the lectures – you will have plenty of time to complete them comfortably.

Student workload (ECTS credit balance)
Form of student activity Student workload
Total student workload 129 h
ECTS credits for the module 5 ECTS
Participation in laboratory classes 22 h
Participation in project classes 8 h
Completion of the project 35 h
Preparation for classes 20 h
Self-study of course topics 25 h
Examination or final test 2 h
Additional contact hours with the teacher 2 h
Participation in lectures 15 h
Other information
Method of calculating the final grade:

The final grade will depend on your performance during the computer laboratories and on the final project:
final_grade = 0.5 * lab_grade + 0.5 * project_grade. NOTE: you need to pass both the labs and the project to get an overall passing grade!

Prerequisites and additional requirements:

Since this is an introductory lecture, the requirements are minimal: basic skills in using the Linux OS and in programming in the C language.

Recommended literature and teaching aids:

The Departmental Library (Faculty of Physics and Applied Computer Science) is quite well stocked with books on parallel programming. The following titles can be used to study the subject:
1. “CUDA for Engineers: An Introduction to High-Performance Parallel Computing”
Duane Storti, Mete Yurtoglu
Addison Wesley, ISBN-13: 978-0134177410

2. “CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (Applications of GPU Computing Series)”
Shane Cook
Morgan Kaufmann, ISBN-13: 978-0124159334

3. “Programming Massively Parallel Processors: A Hands-on Approach”
David B. Kirk, Wen-mei W. Hwu
Morgan Kaufmann; 2nd edition (14 Dec. 2012), ISBN-13: 978-0124159921

Scientific publications of the teaching staff related to the module topics:

1. The LHCb Collaboration, “Measurement of the track reconstruction efficiency at LHCb”, JINST 10 (2015) P02007
2. The LHCb VELO Group, “Performance of the LHCb Vertex Locator”, JINST 9 (2014) P09007
3. The LHCb VELO Group, “Radiation damage in the LHCb Vertex Locator”, JINST 8 (2013) P08002

Additional information:

The labs will be quite aggressively paced – please attend them! These classes are compulsory and you are allowed to skip only one lab during the whole semester. It will also be very hard to catch up if you miss a class.