Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip

Břichňáč, Vít; Šístek, Jakub

Programs and Algorithms of Numerical Mathematics, pages 29-40

Abstract

This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied by a custom implementation that utilizes a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is done on the recent NVIDIA Grace CPU Superchip.
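
To illustrate the general idea of a tree-based QR factorization of a tall-and-skinny matrix, here is a minimal sketch in plain Python. It is not the authors' implementation and not the PLASMA code. The function names (`householder_r`, `tsqr_r`, `gram`), the binary shape of the reduction tree, and the `block_rows` parameter are assumptions made for this example only. The sketch computes just the R factor: each row block is factored independently with Householder reflectors, and the resulting R factors are then stacked pairwise and factored again until one remains.

```python
import math

def householder_r(a):
    """Return the R factor (upper-trapezoidal rows) of matrix a via Householder reflectors."""
    r = [row[:] for row in a]                     # work on a copy
    m, n = len(r), len(r[0])
    for k in range(min(m, n)):
        # Norm of column k from the diagonal downwards.
        norm = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue                              # column already zero, nothing to eliminate
        alpha = -math.copysign(norm, r[k][k])     # sign chosen to avoid cancellation
        v = [r[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                r[i][j] -= s * v[i - k]
    # Keep the leading rows and set the entries below the diagonal to exact zeros.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(min(m, n))]

def tsqr_r(a, block_rows):
    """R factor of a tall-and-skinny matrix using a binary reduction tree (illustrative)."""
    # Leaves: factor each row block on its own; these factorizations are independent.
    level = [householder_r(a[i:i + block_rows]) for i in range(0, len(a), block_rows)]
    # Inner nodes: stack two R factors and factor the stacked matrix again.
    while len(level) > 1:
        nxt = [householder_r(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])                 # an unpaired node moves up unchanged
        level = nxt
    return level[0]

def gram(a):
    """Return a^T a, used here only to check the factorization."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]

# A 16 x 3 Vandermonde-type test matrix (full column rank).
a = [[((i + 1) / 16.0) ** j for j in range(3)] for i in range(16)]
r_tree = tsqr_r(a, block_rows=4)   # tree-based elimination
r_flat = householder_r(a)          # single flat factorization for comparison
```

Since R satisfies R^T R = A^T A, comparing `gram(r_tree)` with `gram(a)` is a simple correctness check. In this sketch the leaf factorizations and the nodes within one tree level do not depend on each other, which is the kind of independence a task-based runtime can exploit. The tree shape and the block size are examples of parameters one could vary.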

How to cite

MLA
BibTeX
RIS

Břichňáč, Vít, and Šístek, Jakub. "Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip." Programs and Algorithms of Numerical Mathematics. 2025. 29-40. <http://eudml.org/doc/299959>.

@inProceedings{Břichňáč2025,
abstract = {This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied by a custom implementation that utilizes a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is done on the recent NVIDIA Grace CPU Superchip.},
author = {Břichňáč, Vít and Šístek, Jakub},
booktitle = {Programs and Algorithms of Numerical Mathematics},
keywords = {QR factorization; task-based programming; NVIDIA Grace CPU},
pages = {29--40},
title = {Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip},
url = {http://eudml.org/doc/299959},
year = {2025},
}

TY - CLSWK
AU - Břichňáč, Vít
AU - Šístek, Jakub
TI - Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip
T2 - Programs and Algorithms of Numerical Mathematics
PY - 2025
SP - 29
EP - 40
AB - This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied by a custom implementation that utilizes a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is done on the recent NVIDIA Grace CPU Superchip.
KW - QR factorization; task-based programming; NVIDIA Grace CPU
UR - http://eudml.org/doc/299959
ER -
