Module also offered within study programmes:
General information:
Deep Learning with CUDA
Course of study:
Physics and Applied Computer Science
Level of studies:
First-cycle studies
Applied Computer Science
Education profile:
General academic (A)
Language of instruction:
Person responsible:
dr hab. inż., prof. AGH Szumlak Tomasz
Instructors:
dr hab. inż., prof. AGH Szumlak Tomasz
Short description of the module

This lecture course aims to provide advanced knowledge of parallel programming with NVIDIA CUDA technology, with particular emphasis on deep learning techniques.

Description of learning outcomes for the module
Module outcome code (EKM) | A student who has completed the module knows/is able to | Links to programme outcomes (EKK) | Method of verifying learning outcomes (form of assessment)
M_W001 | A student gains knowledge of advanced topics in massively parallel programming on GPUs. | IS1A_W03, IS1A_W02 | Class participation, participation in discussion
M_U001 | A student can write complete programs in the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use the IDE, the NVIDIA compiler and the debugger. | IS1A_U05, IS1A_U01, IS1A_U04 | Test,
M_U002 | A student can work as part of a team and interact properly with teammates. | IS1A_U06, IS1A_U05 | Project
Social competences
M_K001 | A student can present and discuss his/her results. | IS1A_K01 | Class participation
Matrix of learning outcomes in relation to forms of classes
Module outcome code (EKM) | A student who has completed the module knows/is able to | Form of classes
Auditorium classes
Laboratory classes
Project classes
Seminar classes
Practical classes
Field classes
Workshop classes
M_W001 | A student gains knowledge of advanced topics in massively parallel programming on GPUs. | + - + + - + - - - - -
M_U001 | A student can write complete programs in the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use the IDE, the NVIDIA compiler and the debugger. | - - - + - - - - - - -
M_U002 | A student can work as part of a team and interact properly with teammates. | - - - + - - - - - - -
Social competences
M_K001 | A student can present and discuss his/her results. | - - - + - - - - - - -
Module content (programme of lectures and other classes)
  1. Introduction to massively parallel computing with GPGPUs (4 h)

    A general introduction to multicore/many-core hardware and programming techniques. Students will get a short overview of current trends in the hardware available on the market (CPUs and GPUs) and in the software solutions provided for developers. We also introduce the NVIDIA CUDA SDK, which provides a very convenient IDE.

  2. Programming, execution and memory models in CUDA (2 h)

    CUDA is both a parallel computing platform and a programming model. It extends the standard C language so that massively parallel algorithms can be written and run on a CPU/GPU system. Programming with CUDA feels almost like programming in plain C, which makes it easy to get started. With the CUDA extensions you can target a diverse set of systems: tablets and phones, desktops, workstations, HPC clusters and embedded devices. A programming model can be viewed as an abstraction of computer architectures: it is a bridge between your application and its concrete implementation on the available hardware. By learning the programming model you will be able to execute parallel code (so-called kernels) on the GPU, organise grids of threads and manage your devices.
    To understand how to optimise your software, however, you need to know how instructions are actually executed on the available devices. This is the domain of the execution model, which exposes a high-level (abstract) view of the massively parallel GPU architecture and of thread concurrency.
    Optimising kernel performance is not only a matter of thread management; it also depends on device memory. During these lectures you will learn how to manage memory. We use so-called global memory to build better applications by exploring its access patterns and maximising its throughput. We introduce the notion of unified memory, which is a great help for the developer. Shared memory will also be introduced and discussed in detail. We explain how to create and use a cache to make your code even better.

  3. Deep learning – what is it? (2 h)

    The basic building block of a neural network is the perceptron (artificial neuron). The idea has not changed much since the 1960s. This lecture gives a general overview of deep learning. You will become familiar with terms such as decision rule, loss function, gradient of the loss and update rule. The lecture is accompanied by two lab classes devoted exclusively to neural networks.

  4. Convolutional networks – ConvNets (2 h)

    Convolutional networks are, without a doubt, a revolution in image analysis and processing. ConvNets were directly inspired by the physiology of the visual cortex, in particular by the patterns of interconnection between its cells. To facilitate the training process, it is assumed that the input data form an image. We then build the network from convolution, ReLU and pooling layers. If you want to know what this is all about… come to the lecture!

  5. Optimisation (2 h)

    There are no miracles! Training techniques must be very well understood, and even then we will occasionally face a difficult problem. There is a list of “tricks” that can make the training process faster and more robust. We discuss selected ideas such as weight initialisation, data multiplication, input normalisation (with batch and stochastic approaches) and learning rate equalisation.
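
The thread organisation mentioned in item 2 can be previewed without a GPU. The sketch below is plain Python, not CUDA code: it emulates sequentially how a one-dimensional grid of thread blocks covers an array. The names `launch_config` and `vector_add_emulated` are made up for illustration. Only the index formula `block_idx * block_dim + thread_idx` mirrors the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
import math


def launch_config(n, block_dim=256):
    """Return (grid_dim, block_dim) with enough threads to cover n elements."""
    grid_dim = math.ceil(n / block_dim)
    return grid_dim, block_dim


def vector_add_emulated(a, b, block_dim=4):
    """Emulate a 1D kernel launch: one logical thread per output element.

    On a GPU the loop bodies would run concurrently; here they run in order.
    """
    n = len(a)
    out = [0.0] * n
    grid_dim, block_dim = launch_config(n, block_dim)
    for block_idx in range(grid_dim):        # plays the role of blockIdx.x
        for thread_idx in range(block_dim):  # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < n:  # bounds guard: the last block may be only partly used
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(launch_config(10, 4))
    print(vector_add_emulated([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

The bounds guard matters because the number of launched threads is rounded up to a whole number of blocks, so some threads in the last block have no element to process.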

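The terms listed in item 3 (decision rule, loss function, gradient of the loss, update rule) can be seen together in a single artificial neuron. The following is a minimal sketch under stated assumptions, not the course's reference implementation: it uses a sigmoid neuron with cross-entropy loss, trained by stochastic gradient descent on the logical AND function. The function names, the learning rate and the number of epochs are illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(w, b, x):
    """Weighted sum of the inputs passed through the sigmoid activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def predict(w, b, x):
    """Decision rule: choose class 1 when the neuron output exceeds 0.5."""
    return 1 if neuron_output(w, b, x) > 0.5 else 0


def train(samples, lr=0.5, epochs=2000):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = neuron_output(w, b, x)
            # Loss function: L = -(y*log(p) + (1-y)*log(1-p)).
            # Gradient of the loss w.r.t. the pre-activation z is (p - y).
            grad_z = p - y
            # Update rule: step against the gradient, scaled by lr.
            w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
            b -= lr * grad_z
    return w, b


# Truth table of logical AND as (inputs, label) pairs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)

if __name__ == "__main__":
    for x, y in AND:
        print(x, "label:", y, "prediction:", predict(w, b, x))
```

AND is linearly separable, which is why a single neuron is enough here; a problem such as XOR would need more than one layer.
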
Seminar classes:
Selected topics regarding GPGPU computing

Each student will be required to pick a topic (or will be assigned one) on GPGPU programming that reflects current trends and developments in the field. Each presentation must be accompanied by slides and given in front of the other students.

Laboratory classes:
  1. Introduction to NVIDIA CUDA SDK (4 h)

    Using our dedicated server, we introduce the NVIDIA CUDA SDK: starting it up, writing code, compiling it and executing it. We also use profiling tools to analyse execution time and performance. We are going to look at the very neat code samples provided with the SDK and learn how to use them. There is plenty to look at and learn. If you are not yet familiar with massively parallel coding, this is the time to catch up. The whole lab will be very aggressively paced, so you need to keep up!

  2. Neural Nets (4 h)

    Deep learning is based on neural networks. These practical classes are dedicated to studying what a neural network (NN) is and how to implement one by hand. This is going to be a very rewarding and enlightening experience!

  3. Code samples – a good way forward in self-teaching (4 h)

    Both NVIDIA and AMD provide a large number of code samples. These are written by professionals and are an excellent resource for improving your programming skills. We are going to take a deep look at them.

  4. Handwritten digit recognition using the MNIST database (4 h)

    This lab is designed to make you an expert in the backpropagation training technique. You will also learn about multinomial logistic regression (with N classes) and what the softmax expression is.

  5. More backpropagation and beyond (4 h)

    In the course of this lab, you will learn that backpropagation is the central and most crucial technique for obtaining the weights of the artificial neurons. This part pays special attention to batch normalisation, non-linear activation functions and optimisation of the training, fighting overfitting in particular.
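
The softmax expression from lab 4 and its role in backpropagation can be sketched in a few lines. This is an illustrative example in pure Python with made-up function names, assuming the standard pairing of softmax with cross-entropy loss for multinomial logistic regression; it is not taken from the lab materials.

```python
import math


def softmax(logits):
    """Turn N raw scores into N probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss for one sample whose true class index is `target`."""
    return -math.log(probs[target])


def grad_logits(probs, target):
    """Gradient of the cross-entropy loss with respect to the logits.

    For softmax combined with cross-entropy it reduces to probs - one_hot(target).
    """
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class problem
    probs = softmax(scores)
    print("probabilities:", probs)
    print("loss for class 0:", cross_entropy(probs, 0))
    print("gradient:", grad_logits(probs, 0))
```

The simple form of the gradient is what makes this pairing convenient as the starting point of the backward pass.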

Project classes:
To be defined!

The concrete topics depend very much on the students! If you are really interested in GPGPUs, you can get a more ambitious project. The class will be split into teams of 2 to 4 students. Each team will be asked to select a leader and work together to finish the project. The topics will be handed out after we finish the lectures, so you will have plenty of time to complete the project comfortably.

Student workload (ECTS credit balance)
Form of student activity Student workload
Total student workload 155 h
ECTS credits for the module 6 ECTS
Participation in laboratory classes 20 h
Participation in project classes 5 h
Completion of the project 35 h
Preparation for classes 20 h
Self-study of the class topics 30 h
Examination or final test 2 h
Additional contact hours with the teacher 8 h
Participation in lectures 12 h
Participation in seminar classes 8 h
Preparation of a report, written work, presentation, etc. 15 h
Other information
Method of calculating the final grade:

The final grade depends on your performance in the computer laboratories, the final project and, of course, the exam: final_grade = 0.35 * lab_grade + 0.35 * project_grade + 0.3 * final_exam_grade. NOTE: you must pass both the labs and the project to get an overall passing grade!
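
As a worked example, the grading rule can be written as a small function. The weights come from the formula above. The pass mark of 3.0 and the returned `passed` flag are assumptions made for illustration only, since the syllabus does not state the grading scale.

```python
def final_grade(lab_grade, project_grade, final_exam_grade, pass_mark=3.0):
    """Return (weighted_grade, passed) using the weights from the syllabus.

    pass_mark is an assumed threshold, not a value given in the syllabus.
    """
    grade = 0.35 * lab_grade + 0.35 * project_grade + 0.3 * final_exam_grade
    # Both the labs and the project must be passed individually.
    components_passed = lab_grade >= pass_mark and project_grade >= pass_mark
    # Assumption: the weighted grade itself must also reach the pass mark.
    passed = components_passed and grade >= pass_mark
    return grade, passed


if __name__ == "__main__":
    print(final_grade(4.0, 4.5, 3.5))
    print(final_grade(2.0, 5.0, 5.0))  # failed labs: no overall pass
```

The second call shows why the note matters: a high weighted average does not compensate for a failed lab component.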

Prerequisites and additional requirements:

This course is not for beginners: we assume that you are at least familiar with the basics of parallel programming. However, I intend to give a brisk crash course at the very beginning, so you just need to be willing to follow my lead.

Recommended literature and teaching aids:

The Departmental Library (Physics and Applied Computer Science) is quite well stocked with books on parallel programming. The following titles can be used to study the subject:
1. "CUDA for Engineers: An Introduction to High-Performance Parallel Computing"
Duane Storti, Mete Yurtoglu
Addison Wesley, ISBN-13: 978-0134177410

2. "CUDA Programming: A Developer's Guide to Parallel Computing with GPUs (Applications of GPU Computing Series)"
Shane Cook
Morgan Kaufmann, ISBN-13: 978-0124159334

3. "Programming Massively Parallel Processors: A Hands-on Approach"
David B. Kirk, Wen-mei W. Hwu
Morgan Kaufmann; 2nd edition (14 Dec. 2012), ISBN-13: 978-0124159921

Scientific publications of the course instructors related to the module topic:

LHCb collaboration, Performance of the LHCb Vertex Locator, JINST 9 (2014) P09007,
LHCb collaboration, Evidence for the decay B0s→K∗0μ+μ−, LHCb-PAPER-2018-004

Additional information:

The labs will be quite aggressively paced, so please attend them! These classes are compulsory and you may skip only one lab during the whole semester. It will also be very hard to catch up if you miss a class.