General information:
Name: Deep Learning with CUDA
Course of study: 2018/2019
Code: JIS-1-027-s
Faculty of: Physics and Applied Computer Science
Study level: First-cycle studies
Specialty: -
Field of study: Applied Computer Science
Semester: 0
Profile of education: Academic (A)
Lecture language: English
Responsible teacher: dr hab. inż., prof. AGH Szumlak Tomasz (szumlak@agh.edu.pl)
Academic teachers: dr hab. inż., prof. AGH Szumlak Tomasz (szumlak@agh.edu.pl)
Module summary

This course aims to provide advanced knowledge of parallel programming using NVIDIA CUDA technology, with a particular emphasis on deep learning techniques.

Description of learning outcomes for module
MLO code | Student after module completion has the knowledge / knows how to / is able to | Connections with FLO | Method of learning outcomes verification (form of completion)

Social competence
M_K001 | A student can present his/her results and discuss them. | IS1A_K01 | Activity during classes

Skills
M_U001 | A student can write complete programs using the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use the IDE, the NVIDIA compiler and the debugger. | IS1A_U05, IS1A_U01, IS1A_U04 | Test, Project
M_U002 | A student can work as part of a team and interact properly with his/her team-mates. | IS1A_U06, IS1A_U05 | Project

Knowledge
M_W001 | A student gains knowledge of advanced topics related to massively parallel programming using GPUs. | IS1A_W03, IS1A_W02 | Activity during classes, Report, Participation in a discussion
FLO matrix in relation to forms of classes
MLO code | Student after module completion has the knowledge / knows how to / is able to | Lecture | Audit. classes | Lab. classes | Project classes | Conv. seminar | Seminar classes | Pract. classes | Fieldwork classes | Workshop classes | Others | E-learning

Social competence
M_K001 | A student can present his/her results and discuss them. | - | - | - | + | - | - | - | - | - | - | -

Skills
M_U001 | A student can write complete programs using the extended C/C++ language, can use the professional framework provided by NVIDIA, and can proficiently use the IDE, the NVIDIA compiler and the debugger. | - | - | - | + | - | - | - | - | - | - | -
M_U002 | A student can work as part of a team and interact properly with his/her team-mates. | - | - | - | + | - | - | - | - | - | - | -

Knowledge
M_W001 | A student gains knowledge of advanced topics related to massively parallel programming using GPUs. | + | - | + | + | - | + | - | - | - | - | -
Module content
Lectures:
  1. Introduction to massively parallel computing with GPGPUs (4 h)

    A generic introduction to multicore/many-core hardware and programming techniques. Students will be given a short overview of current trends both in the hardware available on the market (CPUs and GPUs) and in the software solutions provided for developers. We also introduce the NVIDIA CUDA SDK, which provides a very convenient development environment.

  2. Programming, execution and memory models in CUDA (2 h)

    In general, one can say that CUDA is a parallel computing platform and a programming model. It also provides an extension to the standard C language which, in turn, allows massively parallel algorithms to be written and run on the CPU/GPU. Thanks to CUDA it is very easy to code your programs – it is almost like using the C language. With the CUDA extensions you can provide software for a diverse set of systems: tablets and phones, desktops, workstations, HPC clusters and, last but not least, embedded devices. The programming model can be viewed as an abstraction of computer architectures – a bridge between your application and its concrete implementation on the available hardware. By learning the programming model we will be able to execute parallel code (so-called kernels) on the GPU, organise grids of threads and manage devices.
    On the other hand, in order to understand how to optimise your software, you need to learn how the respective instructions are actually executed on the available devices. This is the domain of the execution model, which exposes a high-level (abstract) view of the GPU's massively parallel architecture and thread concurrency.
    Optimising kernel performance is not only a matter of thread management; it also has to do with device memory. During these lectures you will learn how to manage memory. We use so-called global memory to create better applications by exploring its access patterns and maximising its throughput. We introduce the notion of unified memory, which is a great help for the developer. The shared memory will also be introduced and discussed in detail, and we explain how to use it as a cache to make your code even better. (A minimal kernel sketch illustrating the programming and memory models is given after this lecture list.)

  3. Deep learning – what is it? (2 h)

    The basic building block of a neural network is the perceptron (artificial neuron). The idea has not changed much since the 1960s. This lecture gives a general overview of deep learning. You are going to become familiar with terms such as decision rule, loss function, gradient of the loss and update rule. The lecture is accompanied by two lab classes devoted exclusively to neural networks.

  4. Convolutional networks – ConvNets (2 h)

    The convolutional network deep learning technique can, without a doubt, be called a revolution in image analysis and processing. ConvNets were directly inspired by the physiology of the visual cortex, especially by the interconnection patterns of its cells. In order to facilitate the training process, it is assumed that the input data form a picture. We then build the network using convolution, ReLU and pooling layers (a naive convolution sketch is given after this list). If you want to know what this is all about… come to the lecture!

  5. Optimisation (2 h)

    There are no miracles! Training techniques must be very well understood, and even then we will face difficult problems from time to time. There is a list of “tricks” that can be applied to make the training process more robust and faster. We discuss selected ideas such as weight initialisation, data augmentation, input normalisation (in batch and stochastic variants) and learning rate equalisation.
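
To make the programming and memory models from lecture 2 concrete, here is a minimal CUDA sketch (not part of the official course materials; all names and sizes are illustrative assumptions). It launches a grid of threads to add two vectors, using unified memory so that the host and the device share a single pointer:

    // Minimal CUDA sketch: vector addition with unified memory.
    // Illustrative only – names and sizes are arbitrary assumptions.
    // Typical build: nvcc -o vec_add vec_add.cu
    #include <cstdio>
    #include <cuda_runtime.h>

    // Kernel: each thread computes one element of c = a + b.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // Unified memory: one allocation visible to both CPU and GPU.
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // Organise the grid: enough 256-thread blocks to cover all n elements.
        int block = 256;
        int grid = (n + block - 1) / block;
        vecAdd<<<grid, block>>>(a, b, c, n);
        cudaDeviceSynchronize();  // wait for the kernel to finish

        printf("c[0] = %f (expected 3.0)\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }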
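
Similarly, for the ConvNet lecture, a naive (deliberately unoptimised) sketch of a single convolution-plus-ReLU layer; the single channel, 3x3 filter and zero padding are assumptions made purely for illustration:

    // Naive CUDA sketch of one convolution + ReLU layer (single channel).
    // Assumptions for illustration: row-major image, 3x3 filter, zero padding.
    // Launch e.g. with dim3 block(16, 16) and a grid covering the w x h image.
    __global__ void conv3x3Relu(const float *in, const float *filt,
                                float *out, int w, int h) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
        if (x >= w || y >= h) return;
        float acc = 0.0f;
        for (int fy = -1; fy <= 1; ++fy)                // slide the 3x3 filter
            for (int fx = -1; fx <= 1; ++fx) {
                int ix = x + fx, iy = y + fy;
                if (ix >= 0 && ix < w && iy >= 0 && iy < h)  // zero padding
                    acc += in[iy * w + ix] * filt[(fy + 1) * 3 + (fx + 1)];
            }
        out[y * w + x] = fmaxf(acc, 0.0f);              // ReLU activation
    }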

Seminar classes:
Selected topics regarding GPGPU computing

Each student will be required to pick a topic (or will be assigned one) in GPGPU programming pertaining to current trends and developments in the field. Each presentation must be accompanied by slides and given in front of the other classmates.

Laboratory classes:
  1. Introduction to NVIDIA CUDA SDK (4 h)

    Using our dedicated server, we introduce the NVIDIA CUDA SDK – from starting it, through writing code, to compiling and executing it. We also use profiling tools to analyse execution time and performance. We are going to look at the very neat code samples provided along with the SDK and learn how to use them. There is plenty to look at and learn, especially if you are not yet familiar with massively parallel coding – this is the time to catch up. The whole lab will be very aggressively paced – you need to keep up!

  2. Neural Nets (4 h)

    Deep learning is based on neural networks – these practical classes are dedicated to studying what a NN is and how to implement one by hand. This is going to be a very rewarding and enlightening experience!

  3. Code samples – a good way forward in self-teaching (4 h)

    Both NVIDIA and AMD provide a large number of code samples. These are written by professionals and are an excellent source for improving your programming skills. We are going to have a deep look into them.

  4. Handwritten digit recognition using the MNIST database (4 h)

    This lab is designed in such a way that you are going to become an expert on the backpropagation training technique. You will also learn about multinomial logistic regression (with N classes) and what the softmax expression is (a softmax kernel sketch is given after this list).

  5. More backpropagation and beyond (4 h)

    In the course of these classes, you are going to learn that backpropagation is the central and most crucial technique for obtaining the weights of the respective artificial neurons. This part pays special attention to batch normalisation, non-linear activation functions and optimisation of the training – fighting overfitting in particular (see the update-rule sketch after this list).
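
For the MNIST lab, a hedged sketch of a numerically stable softmax, one thread per sample over N classes (the memory layout and names are assumptions, not the course's reference implementation):

    // Illustrative CUDA sketch: numerically stable softmax.
    // Assumed layout: row-major logits of shape samples x classes.
    __global__ void softmax(const float *logits, float *probs,
                            int samples, int classes) {
        int s = blockIdx.x * blockDim.x + threadIdx.x;  // sample index
        if (s >= samples) return;
        const float *z = logits + s * classes;
        float *p = probs + s * classes;
        float m = z[0];                         // subtract the max for stability
        for (int k = 1; k < classes; ++k) m = fmaxf(m, z[k]);
        float sum = 0.0f;
        for (int k = 0; k < classes; ++k) { p[k] = expf(z[k] - m); sum += p[k]; }
        for (int k = 0; k < classes; ++k) p[k] /= sum;  // normalise to probabilities
    }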
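
And for the backpropagation labs, the core update rule w ← w − η·∇L(w), applied element-wise to a flattened weight vector (again only a sketch under assumed names):

    // Illustrative gradient-descent step used inside backpropagation:
    // w <- w - eta * grad, where eta is the learning rate.
    __global__ void sgdStep(float *w, const float *grad, float eta, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // weight index
        if (i < n) w[i] -= eta * grad[i];
    }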

Project classes:
To be defined!

Concrete topics very much depend on the students! If you are really interested in GPGPUs, you can get a more ambitious project. The class will be split into groups (2 to 4 students). Each group will be asked to select a team leader and to work together to finish the project. The topics will be handed out after the lectures finish – you are going to have plenty of time to complete the project comfortably.

Student workload (ECTS credits balance)
Student activity form | Student workload
Participation in lectures | 12 h
Participation in laboratory classes | 20 h
Participation in seminar classes | 8 h
Participation in project classes | 5 h
Completion of a project | 35 h
Preparation for classes | 20 h
Realization of independently performed tasks | 30 h
Preparation of a report, presentation, written work, etc. | 15 h
Contact hours | 8 h
Examination or final test | 2 h
Summary student workload | 155 h
Module ECTS credits | 6 ECTS
Additional information
Method of calculating the final grade:

The final grade will depend on your performance during the computer laboratories, on the final project and, of course, on the exam: final_grade = 0.35 * lab_grade + 0.35 * project_grade + 0.3 * final_exam_grade. NOTE: you need to pass both the labs and the project to receive an overall passing grade!
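
As a purely illustrative worked example (the grades below are made up, not taken from the syllabus): with lab_grade = 4.5, project_grade = 4.0 and final_exam_grade = 5.0, the formula gives final_grade = 0.35 * 4.5 + 0.35 * 4.0 + 0.3 * 5.0 = 1.575 + 1.400 + 1.500 = 4.475.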

Prerequisites and additional requirements:

The course is not for complete beginners – we assume that you are at least familiar with the basics of parallel programming. However, I intend to give a brisk crash course at the very beginning, so you just need to be willing to follow my lead.

Recommended literature and teaching resources:

The Departmental Library (Physics and Applied Computer Science) is quite well stocked with books on parallel programming. The following can be used to study the subject:
1. "CUDA for Engineers: An Introduction to High-Performance Parallel Computing"
Duane Storti, Mete Yurtoglu
Addison-Wesley, ISBN-13: 978-0134177410

2. "CUDA Programming: A Developer's Guide to Parallel Computing with GPUs (Applications of GPU Computing Series)"
Shane Cook
Morgan Kaufmann, ISBN-13: 978-0124159334

3. "Programming Massively Parallel Processors: A Hands-on Approach"
David B. Kirk, Wen-mei W. Hwu
Morgan Kaufmann, 2nd edition (2012), ISBN-13: 978-0124159921

Scientific publications of module course instructors related to the topic of the module:

LHCb collaboration, Performance of the LHCb Vertex Locator, JINST 9 (2014) P09007.
LHCb collaboration, Evidence for the decay B0s → K*0 μ+μ−, LHCb-PAPER-2018-004.

Additional information:

The labs will be quite aggressively paced – please attend them! These classes are compulsory, and you are allowed to skip only one lab during the whole semester. It will also be very hard to catch up if you miss a class.