
Languages and Abstractions for High-Performance Scientific Computing (CS 598APK) Spring 2025

What Where
Time/place Tue/Thu 11:00-12:15 106B3 Engineering Hall / Catalog
Class URL https://bit.ly/hpcabstr-s25
Class recordings Illinois Mediaspace
Discussion forum Discuss »
Administrative Help Help Desk (click "Message" on the top right)
Calendar View »

Assignments

Schedule of Talks

Date Slot Who Title Link
February 20th, 2025 1 Jacob Xianyu Sorensen 2016: Portable inter-workgroup barrier synchronisation for GPUs Link
February 20th, 2025 2 Ananth Madan Baghdadi 2018: Tiramisu: A Code Optimization Framework for High Performance Systems Link
February 27th, 2025 1 Dhruv Baronia Kothari 2024: Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures Link
February 27th, 2025 2 Jiatong Li Jääskeläinen 2015: pocl: A performance-portable OpenCL implementation Link
March 6th, 2025 1 Krut Patel Pharr 2012: ispc: A SPMD compiler for high-performance CPU programming Link
March 6th, 2025 2 Addison Jordan Alvey-Blanco Ragan-Kelley 2012: Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines Link
March 13th, 2025 1 Relena Zhao Li Tiwari 2009: A scalable auto-tuning framework for compiler optimization Link
March 13th, 2025 2
March 27th, 2025 1 Wei-Lun Wu Vasilache 2018: Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions Link
March 27th, 2025 2 Ryan Cheng Thammakhoune Heinecke 2016: LIBXSMM: accelerating small matrix multiplications by runtime code generation Link
April 3rd, 2025 1 Sharvil Garg Osama 2023: Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU Link
April 3rd, 2025 2 Zhan Jin Pai 2016: A Compiler for Throughput Optimization of Graph Algorithms on GPUs Link
April 10th, 2025 1 Chia-Lun Tsai Bezanson 2017: Julia: A fresh approach to numerical computing Link
April 10th, 2025 2 Nick Koskelo Leißa 2015: Shallow Embedding of DSLs via Online Partial Evaluation Link
April 17th, 2025 1 Jonathan Wang Bikshandi 2006: Programming for Parallelism and Locality with Hierarchically Tiled Arrays Link
April 17th, 2025 2 Songyuan Cui Blelloch 1990: Prefix sums and their applications Link
April 24th, 2025 1 Haocheng Xia Bauer 2012: Legion: Expressing locality and independence with logical regions Link
April 24th, 2025 2 Shawn Lin Catanzaro 2014: A Decomposition for In-place Matrix Transposition Link
April 29th, 2025 1 Aditya Bhosale Edwards 2014: Kokkos: Enabling manycore performance portability through polymorphic memory access patterns Link
April 29th, 2025 2 Hunter Joseph Demeyer Ben-Nun 2019: Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs Link
May 1st, 2025 1 Po-Hao Huang Higuchi 2019: ClPy: A NumPy-Compatible Library Accelerated with OpenCL Link
May 1st, 2025 2 Ahan Gupta Spampinato 2016: A Basic Linear Algebra Compiler for Structured Matrices Link
May 6th, 2025 1 Rishabh Patrick Menezes Luporini 2015: Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly Link
May 6th, 2025 2 Dohyun Park Abadi 2016: Tensorflow: a system for large-scale machine learning. Link

Why You Should Take this Class

[Figures: visualization of an array access pattern; GPU kernel source code]

Software for large-scale problems is stretched between three key requirements: high-performance (and typically parallel) implementation, asymptotically optimal algorithms, and often highly technical application domains. This tension contributes considerably to making HPC software difficult to write and hard to maintain. If you are faced with this problem, this class can help you find and develop possible solution approaches.

Abstractions, tools, and languages can help restore separation of concerns and ease the creation and maintenance of such software. Proven approaches to this problem include domain-specific mini-languages ('DSLs'), code generation, and 'active' libraries.
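To make the code-generation idea concrete, here is a minimal, hypothetical sketch of runtime specialization in the spirit of an 'active' library (the kernel and all names here are illustrative assumptions, not class material): source for a small kernel is generated with a parameter baked in, then compiled on the fly.

```python
# Illustrative sketch only: an 'active library' can specialize code at runtime.
# We generate Python source for an axpy-like kernel with the scalar alpha baked
# in as a literal, so the compiler sees a fully concrete function.

def make_axpy(alpha):
    src = (
        f"def axpy(x, y):\n"
        f"    return [{alpha} * xi + yi for xi, yi in zip(x, y)]\n"
    )
    namespace = {}
    exec(src, namespace)  # compile and execute the generated source
    return namespace["axpy"]

axpy2 = make_axpy(2.0)
print(axpy2([1.0, 2.0], [10.0, 20.0]))  # -> [12.0, 24.0]
```

The same pattern, with C or OpenCL source in place of Python and a real compiler in place of `exec`, underlies several of the systems discussed in this class.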

This class begins with a quick but thorough examination of the problem setting: What machines are we realistically confronted with, now and in the foreseeable future? What are the determinants of performance? How can we measure and understand performance?
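As one small illustration of measuring a performance determinant (a rough sketch, not the class's prescribed methodology), an effective-bandwidth estimate for a large in-memory copy can be obtained with a wall-clock timer:

```python
# Rough sketch: estimate effective memory bandwidth of a large copy.
# Real measurements would repeat the experiment, discard warm-up runs,
# and account for caches; this only shows the basic shape.
import time

n = 10**7
a = bytearray(n)

t0 = time.perf_counter()
b = bytes(a)              # reads n bytes and writes n bytes
t1 = time.perf_counter()

gbytes_moved = 2 * n / 1e9
print(f"~{gbytes_moved / (t1 - t0):.1f} GB/s effective")
```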

From the hardware level, we will then move towards a view of abstractions: concepts and simplifications of complex hardware realities that are sufficiently thin to allow the user or developer to reason about expected performance while achieving a substantial simplification of the programming task.

We will discuss, design, and evaluate a number of such systems, with the goal of putting you in a position to

  • know the available landscape of tools
  • make informed choices among them for a computational task at hand
  • design your own abstractions and place them in context with respect to the state of the art.

As we progress, we will examine a number of increasingly high-level program representations, ranging from instruction sets to compiler IRs, through CUDA/OpenCL/'SIMT' models, to polyhedral and other higher-level representations. Along the way, you will design toy program representations and transformations for limited-scale tasks and examine your achievable and practically achieved performance.
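As a sketch of what a toy program representation and transformation might look like (illustrative only; the encoding and names here are assumptions, not class material), consider expressions encoded as nested tuples with a constant-folding pass:

```python
# Toy program representation: expressions as nested tuples, e.g.
# ('+', ('*', 2, 3), 'x') for 2*3 + x. One transformation: constant folding.

def fold(expr):
    """Recursively fold constant subexpressions, e.g. ('*', 2, 3) -> 6."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a constant or a variable name
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return {"+": lhs + rhs, "*": lhs * rhs}[op]
    return (op, lhs, rhs)

print(fold(("+", ("*", 2, 3), "x")))  # -> ('+', 6, 'x')
```

Production compiler IRs are far richer, but the essential structure, a tree or graph walked by semantics-preserving rewrite passes, is the same.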

We will pay careful attention to semantics and correctness in the context of program representation and ultimate code generation, but we will prefer the definition of specialized or simplified semantics over extensive compiler analyses that might help prove the validity of transformations.

To complement the many excellent distributed-memory offerings in the department (e.g. CS 484, CS 554), this class focuses more on 'on-node'/shared-memory performance.

Prerequisites / What You Should Already Know

  • C
  • Python
  • Familiarity with parallel programming: You should have measured and wondered about the performance of a code you wrote
  • Having taken a class on compiler construction (such as CS 426) will be helpful, but is not required.

While this class is offered by a CS department, it is deliberately open to graduate students in the engineering disciplines who do extensive computational work and face this range of problems every day.

Suggested Papers for Student Presentations

See this list for an idea of the focus of this class. These papers can also serve as the basis for mid-semester paper presentations.

Instructor

Andreas Kloeckner


Email: andreask@illinois.edu

Office: 4318 Siebel

Course Outline

I will insert links to class material, books, and papers into this tree as time goes on.

Note: the section headings in this tree are clickable to reveal more detail.

CAUTION!

These scribbled PDFs are an unedited reflection of what we wrote during class. They need to be viewed in the context of the class discussion that led to them. See the lecture videos for that.

If you would like actual, self-contained class notes, look in the outline above.

These scribbles are made available to provide a record of our class discussion, to be used in perhaps the following ways:

  • as a way to cross-check your own notes
  • to look up a formula that you know was shown in a certain class
  • to remind yourself of what exactly was covered on a given day

By continuing to read them, you acknowledge that these files are provided as supplementary material on an as-is basis.

Computing

If you have submitted your SSH key, an account has been created for you on a family of machines managed by members of the scientific computing area. See the linked page for more information on access and usage.

Grading Policies

View grading policies »