The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BICG and QMR

Stefan Feuerriegel; H. Martin Bücker

International Journal of Applied Mathematics and Computer Science (2015)

  • Volume: 25, Issue: 4, page 769-785
  • ISSN: 1641-876X

Abstract

top
The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronizationreducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing timeconsuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step versions with previous algorithms.

How to cite

top

Stefan Feuerriegel, and H. Martin Bücker. "The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BICG and QMR." International Journal of Applied Mathematics and Computer Science 25.4 (2015): 769-785. <http://eudml.org/doc/275985>.

@article{StefanFeuerriegel2015,
abstract = {The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronizationreducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing timeconsuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step versions with previous algorithms.},
author = {Stefan Feuerriegel, H. Martin Bücker},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {synchronization-reducing; s-step Lanczos; s-step BiCG; s-step QMR; efficient recurrences},
language = {eng},
number = {4},
pages = {769-785},
title = {The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BICG and QMR},
url = {http://eudml.org/doc/275985},
volume = {25},
year = {2015},
}

TY - JOUR
AU - Stefan Feuerriegel
AU - H. Martin Bücker
TI - The non-symmetric s-step Lanczos algorithm: Derivation of efficient recurrences and synchronization-reducing variants of BICG and QMR
JO - International Journal of Applied Mathematics and Computer Science
PY - 2015
VL - 25
IS - 4
SP - 769
EP - 785
AB - The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronizationreducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing timeconsuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step versions with previous algorithms.
LA - eng
KW - synchronization-reducing; s-step Lanczos; s-step BiCG; s-step QMR; efficient recurrences
UR - http://eudml.org/doc/275985
ER -

References

top
  1. Balay, S., Gropp, W.D., McInnes, L.C. and Smith, B.F. (1997). Efficient management of parallelism in object oriented numerical software libraries, in E. Arge et al. (Eds.), Modern Software Tools in Scientific Computing, Birkhäuser Press, Boston, MA, pp. 163-202. Zbl0882.65154
  2. Bücker, H.M. (2002). Iteratively solving large sparse linear systems on parallel computers, in J. Grotendorst, D. Marx and A. Muramatsu (Eds.), Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, NIC Series, Vol. 10, John Von Neumann Institute for Computing, Jülich, pp. 521-548. 
  3. Bücker, H.M. and Sauren, M. (1996). A parallel version of the quasi-minimal residual method based on coupled two-term recurrences, in J. Waśniewski et al. (Eds.), Applied Parallel Computing, Lecture Notes in Computer Science, Vol. 1184, Springer, Berlin, pp. 157-165. Zbl0888.65036
  4. Bücker, H.M. and Sauren, M. (1997). A variant of the biconjugate gradient method suitable for massively parallel computing, in G. Bilardi et al. (Eds.), Solving Irregularly Structured Problems in Parallel, Lecture Notes in Computer Science, Vol. 1253, Springer, Berlin, pp. 72-79. 
  5. Bücker, H.M. and Sauren, M. (1999). Reducing global synchronization in the biconjugate gradient method, in T. Yang (Ed.), Parallel Numerical Computations with Applications, Kluwer Academic Publishers, Norwell, MA, pp. 63-76. Zbl0945.65029
  6. Cappello, F., Geist, A., Gropp, B., Kale, L., Kramer, B. and Snir, M. (2009). Toward exascale resilience, International Journal of High Performance Computing Applications 23(4): 374-388. 
  7. Carson, E. and Demmel, J. (2014). A residual replacement strategy for improving the maximum attainable accuracy of s-step Krylov subspace methods, SIAM Journal on Matrix Analysis and Applications 35(1): 22-43. Zbl1302.65075
  8. Carson, E., Knight, N. and Demmel, J. (2014). Avoiding communication in nonsymmetric Lanczos-based Krylov subspace methods, SIAM Journal on Scientific Computing 35(5): S42-S61. Zbl1281.65057
  9. Chronopoulos, A.T. (1986). A class of parallel iterative methods implemented on multiprocessors, Technical report UIUCDCS-R-86-1267, Department of Computer Science, University of Illinois, Urbana, IL. 
  10. Chronopoulos, A.T. and Gear, C.W. (1989). s-Step iterative methods for symmetric linear systems, Journal of Computational and Applied Mathematics 25(2): 153-168. Zbl0669.65021
  11. Chronopoulos, A.T. and Swanson, C.D. (1996). Parallel iterative s-step methods for unsymmetric linear systems, Parallel Computing 22(5): 623-641. Zbl0873.65019
  12. Curfman McInnes, L., Smith, B., Zhang, H. and Mills, R.T. (2014). Hierarchical Krylov and nested Krylov methods for extreme-scale computing, Parallel Computing 40(1): 17-31. 
  13. Davis, N.E., Robey, R.W., Ferenbaugh, C.R., Nicholaeff, D. and Trujillo, D.P. (2012). Paradigmatic shifts for exascale supercomputing, Journal of Supercomputing 62(2): 1023-1044. 
  14. Demmel, J., Heath, M. and van der Vorst, H. (1993). Parallel numerical linear algebra, Acta Numerica 2(1): 111-197. Zbl0793.65011
  15. Duff, I.S. (2012). European exascale software initiative: Numerical libraries, solvers and algorithms, in D. Hutchison et al. (Eds.), Euro-Par 2011: Parallel Processing Workshops, Lecture Notes in Computer Science, Vol. 7155, Springer, Berlin, pp. 295-304. 
  16. Duff, I.S. and van der Vorst, H.A. (1999). Developments and trends in the parallel solution of linear systems, Parallel Computing 25(13-14): 1931-1970. 
  17. Feuerriegel, S. and Bücker, H.M. (2013a). A normalization scheme for the non-symmetric s-step Lanczos algorithm, in J. Kolodziej et al. (Eds.), ICA3PP, Part II, Lecture Notes in Computer Science, Vol. 8286, Springer, Berlin, pp. 30-39. 
  18. Feuerriegel, S. and Bücker, H.M. (2013b). Synchronizationreducing variants of the biconjugate gradient and the quasi-minimal residual methods, in J. Kolodziej et al. (Eds.), ICA3PP, Part I, Lecture Notes in Computer Science, Vol. 8285, Springer, Berlin, pp. 226-235. 
  19. Fischer, B. and Freund, R. (1994). An inner product-free conjugate gradient-like algorithm for Hermitian positive definite systems, in J. Brown et al. (Eds.), Cornelius Lanczos International Centenary Conference, SIAM, Philadelphia, PA, pp. 288-290. 
  20. Fletcher, R. (1976). Conjugate gradient methods for indefinite systems, in G. Watson (Ed.), Numerical Analysis, Lecture Notes in Computer Science, Vol. 506, Springer, Berlin, pp. 73-89. Zbl0326.65033
  21. Freund, R. and Nachtigal, N. (1991). QMR: A quasi-minimal residual method for non-Hermitian linear systems, Numerische Mathematik 60(1): 315-339. Zbl0754.65034
  22. Freund, R.W. and Hochbruck, M. (1991). A biconjugate gradient type algorithm on massively parallel architectures, in R. Vichnevetsky and J.J.H. Miller (Eds.), IMACS'91, Criterion Press, Dublin, pp. 720-721. 
  23. Freund, R.W. and Hochbruck, M. (1992). A biconjugate gradient-type algorithm for the iterative solution of non-Hermitian linear systems on massively parallel architectures, in C. Brezinski and U. Kulisch (Eds.), IMACS'91, North Holland, Amsterdam, pp. 169-178. Zbl0789.65020
  24. Freund, R.W. and Nachtigal, N.M. (1994). An implementation of the QMR method based on coupled two-term recurrences, SIAM Journal on Scientific Computing 15(2): 313. 
  25. Ghysels, P., Ashby, T.J., Meerbergen, K. and Vanroose, W. (2013). Hiding global communication latency in the GMRES algorithm on massively parallel machines, SIAM Journal on Scientific Computing 35(1): C48-C71. Zbl1273.65050
  26. Ghysels, P. and Vanroose, W. (2014). Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Computing 40(7): 224-238. 
  27. Gustafsson, M., Demmel, J. and Holmgren, S. (2012a). Numerical evaluation of the communication-avoiding Lanczos algorithm, Technical Report 2012-001, Department of Information Technology, Uppsala University, Uppsala. 
  28. Gustafsson, M., Kormann, K. and Holmgren, S. (2012b). Communication-efficient algorithms for numerical quantum dynamics, in K. Jónasson (Ed.), Applied Parallel and Scientific Computing, Lecture Notes in Computer Science, Vol. 7134, Springer, Berlin, pp. 368-378. 
  29. Gutknecht, M.H. (1997). Lanczos-type solvers for nonsymmetric linear systems of equations, Acta Numerica 6(1): 271-397. Zbl0888.65030
  30. Hernandez, V., Roman, J.E. and Vidal, V. (2005). SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Transactions on Mathematical Software 31(3): 351-362. Zbl1136.65315
  31. Hoemmen, M.F. (2010). Communication-avoiding Krylov Subspace Methods, Ph.D. thesis, EECS Department, University of California, Berkeley, CA. 
  32. Kandalla, K., Yang, U., Keasler, J., Kolev, T., Moody, A., Subramoni, H., Tomko, K., Vienne, J., De Supinski, B. and Panda, D. (2012). Designing non-blocking allreduce with collective offload on InfiniBand clusters: A case study with conjugate gradient solvers, Proceedings of the 2012 IEEE 26th International Parallel Distributed Processing Symposium (IPDPS), Shanghai, China, pp. 1156-1167. 
  33. Kim, S.K. (2010). Efficient biorthogonal Lanczos algorithm on message passing parallel computer, in C.H. Hsu and V. Malyshkin (Eds.), MTPP 2010, Lecture Notes in Computer Science, Vol. 6083, Springer, Berlin, pp. 293-299. 
  34. Kim, S.K. and Chronopoulos, A. (1991). A class of Lanczos-like algorithms implemented on parallel computers, Parallel Computing 17(6-7): 763-778. Zbl0735.65016
  35. Kim, S.K. and Chronopoulos, A.T. (1992). An efficient nonsymmetric Lanczos method on parallel vector computers, Journal of Computational and Applied Mathematics 42(3): 357-374. Zbl0756.65057
  36. Kim, S.K. and Kim, T.H. (2005). A study on the efficient parallel block Lanczos method, in J. Zhang, J.-H. He and Y. Fu (Eds.), Computational and Information Science, Lecture Notes in Computer Science, Vol. 3314, Springer, Berlin/Heidelberg, pp. 231-237. Zbl1117.65371
  37. Lanczos, C. (1950). An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards 45(4): 255-282. 
  38. Meurant, G. (1986). The conjugate gradient method on supercomputers, Supercomputer 13: 9-17. 
  39. Mohiyuddin, M., Hoemmen, M., Demmel, J. and Yelick, K. (2009). Minimizing communication in sparse matrix solvers, Conference on High Performance Computing Networking, Storage and Analysis, Portland, OR, USA, pp. 36:1-36:12. 
  40. Saad, Y. (1989). Krylov subspace methods on supercomputers, SIAM Journal on Scientific and Statistical Computing 10(6): 1200-1232. Zbl0693.65028
  41. Sauren, M. and Bücker, H.M. (1998). On deriving the quasi-minimal residual method, SIAM Review 40(4): 922-926. Zbl0913.65029
  42. Shalf, J., Dosanjh, S. and Morrison, J. (2011). Exascale computing technology challenges, in D. Hutchison et al. (Eds.), High Performance Computing for Computational Science, VECPAR 2010, Lecture Notes in Computer Science, Vol. 6449, Springer, Berlin, pp. 1-25. 
  43. van der Vorst, H. (1990). Iterative methods for the solution of large systems of equations on supercomputers, Advances in Water Resources 13(3): 137-146. 
  44. van der Vorst, H.A. and Ye, Q. (2000). Residual replacement strategies for Krylov subspace iterative methods for the convergence of true residuals, SIAM Journal on Scientific Computing 22(3): 835-852. Zbl0983.65039
  45. Van Rosendale, J. (1983). Minimizing inner product data dependencies in conjugate gradient iteration, NASA Contractor Report NASA-CR-172178, NASA Langley Research Center, Hampton, VA. 
  46. Zhu, S.-X., Gu, T.-X. and Liu, X.-P. (2014). Minimizing synchronizations in sparse iterative solvers for distributed supercomputers, Computers & Mathematics with Applications 67(1): 199-209. 
  47. Zuo, X., Gu, T.-X. and Mo, Z. (2010). An improved GPBi-CG algorithm suitable for distributed parallel computing, Applied Mathematics and Computation 215(12): 4101-4109. Zbl1186.65041

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.