On the first day of this workshop we will introduce the Intel software development tools, the upcoming cross-platform development concept oneAPI, and the code optimization process. In hands-on exercises with a many-body kernel, attendees will learn how to enable vectorization, first with simple pragmas and then with more effective techniques such as changing the data layout and memory alignment. The work is guided by hints from the Intel® Compiler reports and by the Intel® Advisor. Participants will also get insights into Intel® VTune™ Amplifier and Intel Application Performance Snapshot as tools for investigating and improving the performance of HPC applications.
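To give a rough idea of what a many-body kernel and a data-layout change look like, here is a minimal sketch in plain Python. It is not the workshop's exercise code: the function name `accelerations`, the `softening` parameter, and the unit choice (gravitational constant set to 1) are all assumptions made for illustration. Coordinates are stored as separate contiguous arrays (structure of arrays), which is the kind of layout that lets a compiled inner loop read memory with unit stride. Pure Python does not vectorize, so only the layout and the loop structure are shown here.

```python
from array import array


def accelerations(x, y, z, m, softening=1e-9):
    """Direct-sum gravitational accelerations (G = 1, an assumption).

    x, y, z, m are separate contiguous arrays (structure of arrays),
    so the inner loop over j walks each array with unit stride.
    """
    n = len(x)
    ax = array("d", [0.0]) * n
    ay = array("d", [0.0]) * n
    az = array("d", [0.0]) * n
    for i in range(n):
        xi, yi, zi = x[i], y[i], z[i]
        sx = sy = sz = 0.0
        for j in range(n):
            dx = x[j] - xi
            dy = y[j] - yi
            dz = z[j] - zi
            # The softening term keeps the j == i case finite;
            # its contribution is zero because dx = dy = dz = 0.
            r2 = dx * dx + dy * dy + dz * dz + softening
            inv_r3 = m[j] / (r2 * r2 ** 0.5)
            sx += dx * inv_r3
            sy += dy * inv_r3
            sz += dz * inv_r3
        ax[i], ay[i], az[i] = sx, sy, sz
    return ax, ay, az


# Two unit masses one length unit apart on the x axis.
x = array("d", [0.0, 1.0])
y = array("d", [0.0, 0.0])
z = array("d", [0.0, 0.0])
m = array("d", [1.0, 1.0])
ax, ay, az = accelerations(x, y, z, m)
print(list(ax), list(ay), list(az))
```

In a compiled language the same inner loop over `j` is the natural candidate for a vectorization pragma, and the separate coordinate arrays are what make aligned, contiguous loads possible.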
On the second day we will describe the latest microprocessor hardware architectures and how developers can use modern HPC hardware efficiently, in particular Intel® Deep Learning Boost with AVX-512 VNNI instructions.
We will then focus on data analytics techniques, such as Machine Learning and Deep Learning, which are becoming key to gaining insight into the enormous amount of data generated by scientific investigations (simulations and observations). An overview of the best-known machine learning algorithms for supervised and unsupervised learning will follow. With small example codes we show how to implement such algorithms using the Intel® Distribution for Python*, and what performance benefit can be obtained with minimal effort from the developer's perspective. We will implement simple algorithms, like K-Means and PCA, using the Intel® Data Analytics Acceleration Library (Intel® DAAL) and show how to scale the workload on HPC systems using the Intel® MPI library. We also cover how to accelerate the training of deep neural networks with TensorFlow*, thanks to the highly optimized Intel® Math Kernel Library (Intel® MKL). Finally, we demonstrate how to distribute deep neural network training across multiple nodes of x86 HPC systems without requiring GPUs.
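As a point of reference for the K-Means algorithm mentioned above, here is a minimal sketch of Lloyd's iteration using only the Python standard library. It does not use Intel® DAAL or any Intel API: the function names `nearest` and `kmeans`, the random initialization, and the toy data are assumptions chosen for illustration. An optimized library would replace these loops with tuned kernels.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: assign points, recompute means, repeat."""
    rng = random.Random(seed)
    # Initialize with k distinct input points (a simple choice, not k-means++).
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                # Component-wise mean of the cluster members.
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster became empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the means
        centroids = new_centroids
    return centroids


# Toy data: two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents = kmeans(pts, 2)
labels = [nearest(p, cents) for p in pts]
print(cents, labels)
```

The assignment step and the mean update are independent across points, which is why this algorithm lends itself both to vectorized library implementations and to distribution across MPI ranks.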
|Time||Day 1||Day 2|
|09:00-09:30||Introduction to Intel oneAPI project for heterogeneous software infrastructure||Intel’s Hardware and Software directions for Artificial Intelligence (AI)|
|09:30-10:00||Latest update of the Intel Parallel Studio features|
|10:00-10:30||Introduction to HPC code optimization process||Hardware Accelerated Deep Learning instructions and implementations|
|10:30-11:00||Coffee break||Coffee break|
|11:00-12:30||Hands-on: compiler optimization process using the N-body code||Hands-on labs using the Intel Distribution for Python, focusing on classical Machine Learning algorithms|
|14:00-15:00||Identify Vectorization inefficiencies - Roofline Model and Intel Advisor with hands-on||Hands-on using Intel Data Analytics Acceleration Library for distributed Machine Learning algorithms|
|15:00-15:30||Introduction to Deep Learning using Intel Optimized TensorFlow*|
|15:30-16:00||Coffee break||Coffee break|
|16:00-17:00||Identify Performance inefficiencies - Intel VTune and the Application Performance Snapshot||Distributed Deep Learning Solution for HPC systems|
|17:00-17:30||Tips and tricks - Enhance the performance using the Intel Math Kernel Library|