Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip
- Programs and Algorithms of Numerical Mathematics, pages 29-40
How to cite
Břichňáč, Vít, and Šístek, Jakub. "Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip." Programs and Algorithms of Numerical Mathematics. 2025. 29-40. <http://eudml.org/doc/299959>.
@inProceedings{Břichňáč2025,
abstract = {This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied with a custom implementation that utilizes a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is done on the recent NVIDIA Grace CPU Superchip.},
author = {Břichňáč, Vít and Šístek, Jakub},
booktitle = {Programs and Algorithms of Numerical Mathematics},
keywords = {QR factorization; task-based programming; NVIDIA Grace CPU},
pages = {29--40},
title = {Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip},
url = {http://eudml.org/doc/299959},
year = {2025},
}
TY - CLSWK
AU - Břichňáč, Vít
AU - Šístek, Jakub
TI - Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip
T2 - Programs and Algorithms of Numerical Mathematics
PY - 2025
SP - 29
EP - 40
AB - This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied with a custom implementation that utilizes a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is done on the recent NVIDIA Grace CPU Superchip.
KW - QR factorization
KW - task-based programming
KW - NVIDIA Grace CPU
UR - http://eudml.org/doc/299959
ER -