Course Essentials

Language: English
Prerequisites: C/C++, Linux shell usage
Audience: Computational scientists, data scientists, university students
Duration: 2 days, 7 hours per day (14 hours total)

– Introduction to Parallel Computing
– Introduction to GPUs and their Architecture
– CUDA Programming Model
– Using Multiple GPUs and Multiple Streams
– Debugging and Profiling Performance
– Performance Optimization and Efficiency
– Some Libraries and Remaining Issues
– CUDA Samples

Learning Outcomes

After this course, participants will
– have in-depth knowledge of parallel programming on the GPU
– be able to design and develop GPU algorithms with CUDA
– be able to distribute computation across multiple GPUs
– be able to assess and improve parallel performance on a GPU


Özcan Dülger is a visiting Assistant Professor at the Department of Computer Engineering, Middle East Technical University, Ankara, Turkey. He received his Ph.D. from the same department. He was also a visiting scholar at the Center for Automotive Research at the Ohio State University, Columbus, OH, USA. His research areas are High-Performance Computing, CUDA Programming, and Target Tracking. He has been working on the parallelization of particle filter-based tracking algorithms on the GPU.

Kamer Kaya is an Associate Professor at the Faculty of Engineering and Natural Sciences at Sabancı University. His research interests include Parallel Algorithms, Graph Algorithms, High-Performance Computing, and Cryptography. He is actively working on sparse computations on matrices, graphs, and tensors. His team focuses on implementing efficient algorithms on sparse data structures for cutting-edge HPC hardware such as GPUs and IPUs, especially for ML applications.