An Empirical Study of Conjugate Gradient
Preconditioners for Solving Symmetric Positive Definite
Systems of Linear Equations
Marc A. Tunnell
David F. Gleich
Despite hundreds of papers on preconditioned linear systems of equations, there re-
mains a significant lack of comprehensive performance benchmarks comparing various
preconditioners for solving symmetric positive definite (SPD) systems. In this paper,
we present a preliminary comparative study of 79 matrices using a broad range of
preconditioners. Specifically, we evaluate 10 widely used preconditoners across 92 con-
figurations to assess their relative performance against using no preconditioner. Our
focus is on preconditioners that are commonly used in practice, are available in major
software packages, and can be utilized as black-box tools without requiring significant
a priori knowledge. In addition, we compare these against a selection of classical meth-
ods. We do not consider the time and effort needed to compute the preconditioner.
Our results show that symmetric positive definite systems are most likely to benefit from incomplete symmetric factorizations, such as incomplete Cholesky. Multigrid
methods occasionally do exceptionally well. Simple classical techniques, symmetric
Gauss-Seidel and symmetric SOR, are not productive.
TODO: Finish abstract with the final results. This work is part of a long-term
study, and we plan to continue development to expand the scope of the benchmark
and refine our analysis in ongoing work.
Marc A. Tunnell, Purdue University, mtunnell@purdue.edu
David F. Gleich, Purdue University, dgleich@purdue.edu

Contents
1 Introduction
2 Results
2.1 Black-Box Preconditioner Performance
2.2 Preconditioner Performance Profiles
2.3 Impact of Ordering on Preconditioner Performance
3 Key Findings
3.1 Ongoing Work
3.2 Limitations
4 Background and Related Work
4.1 Krylov Subspace Methods
4.2 Preconditioned Conjugate Gradient
4.3 Preconditioned Minimum Residual Method
5 Systems of Linear Equations
5.1 Matrix Ordering
5.2 Symmetric Positive Definite Matrices
5.3 Graph-Laplacian Matrices
5.4 Right-Hand Side Vector
6 Preconditioners
6.1 Classical Methods
6.2 Sparse Approximate Inverse
6.3 Incomplete Factorizations
6.4 Algebraic Multigrid
6.5 Graph-Laplacian Preconditioners
1 Introduction
We consider using preconditioned iterative methods to solve Ax = b where A is a large,
sparse, symmetric, and positive definite matrix. We assume the matrix data is available in
a standard sparse data structure such as compressed sparse row or similar. Hence, we are
not treating matrix-free or operator methods. It is well known that the convergence rate
of iterative solvers heavily depends on the condition number of the matrix (Greenbaum,
1997). To improve the condition number, preconditioners are used to transform the original
system into one with more favorable spectral properties, thereby accelerating the rate of
convergence as a function of the number of iterations (Greenbaum, 1997; Saad, 2003; Golub
and Van Loan, 2013; Benzi, 2002).
Over the past decades, the world of preconditioners has grown. It now includes rela-
tively simple techniques such as symmetric scaling (Saad, 2003), iterative methods such as
symmetric successive over-relaxation (SSOR) (Golub and Van Loan, 2013; Young, 1954),
and incomplete factorizations (Benzi, 2002; Meijerink and van der Vorst, 1977; Saad, 1994;
Li, Saad, and Chow, 2003). On the other end of the spectrum, multigrid methods (Briggs,
2000), algebraic multigrid (AMG) (Brandt and Livne, 2011), and domain decomposition
based methods (Smith, Bjorstad, and Gropp, 2004) represent more complicated techniques
that promise even faster convergence.
Despite an abundance of research on novel preconditioners for symmetric, positive definite systems, including reviews of their theoretical performance (Benzi, 2002; Stüben,
2001; Pearson and Pestana, 2020), there is no comprehensive performance benchmark that
provides an empirical comparison of the performance of these preconditioners.
Those comparisons that exist are often limited to only a few matrices as in both Benzi
(2002) and Stüben (2001). This is also the case in most papers proposing new precondition-
ers. Several studies have attempted a comprehensive empirical evaluation of preconditioners
for SPD systems, but these have notable limitations or otherwise fill a different gap than
addressed in this study. For instance, George, Gupta, and Sarin (2012) conducted an em-
pirical analysis of preconditioners for SPD systems. However, their study was limited to
30 matrices and focused on a few well-known software libraries. Furthermore, the types of
preconditioners tested were not unique across these libraries; for instance, they evaluated
ILU preconditioners from each library, resulting in significant overlap in the preconditioner
classes examined. Similarly, Inca, Davis, and Dubas (2021) conducted a benchmarking study
on a narrow set of preconditioners and limited their testing set to PDEs.
In Benzi, Haws, and Tuma (2000), the authors provide an empirical evaluation of pre-
conditioners for non-symmetric matrices, while (Ghai, Lu, and Jiao, 2016) provides a more
up-to-date empirical evaluation for large-scale non-symmetric systems. In Benzi and Tůma
(1998) and Benzi and Tůma (1999), sparse approximate inverse preconditioners are empirically analyzed and validated against the standard preconditioning techniques of the time.
However, these analyses are focused on non-symmetric matrices and, as such, lack the cov-
erage of SPD systems that we seek.
These limitations highlight the need for a comprehensive performance benchmark of SPD
systems across a wide range of preconditioners and matrices. Such a benchmark would help
establish clearer guidelines on the applicability and efficiency of different preconditioners
across a wide range of real-world SPD systems.
In this study, we aim to address this gap by providing a thorough performance evaluation
of a wide range of preconditioner classes on 79 SPD matrices from the SuiteSparse Matrix
Collection (Davis and Hu, 2011). Our focus is on preconditioners that are widely used in
practice, that are available in large software packages, and may be utilized as a black-box
without requiring significant a priori knowledge of the underlying properties of the system.
We choose these types of preconditioners to ensure the relevance of this study to the broader
scientific community.
Specifically, we test preconditioners from the classes outlined in Table 1 and compare
their performance against the unpreconditioned, symmetrically diagonally scaled case.
Preconditioner Class: Preconditioners
Classical Methods: Symmetric Gauss-Seidel, SSOR, Truncated Neumann, Steepest Descent
Sparse Approximate Inverse: SPAI
Incomplete Factorizations: Incomplete Cholesky (IC), Modified IC, Incomplete LU
Algebraic Multigrid: Ruge-Stuben, Smoothed Aggregation
Graph-Laplacian Preconditioners: Combinatorial Multigrid, Approximate Cholesky
Table 1: Classes of preconditioners evaluated in this study.
In section 6, we provide more details on the exact preconditioners used, their properties
and workload, and the software packages in which they are implemented. We utilize the
preconditioned CG (PCG) method as our solver, specifically the Hestenes-Stiefel formulation (Hestenes and Stiefel, 1952; Golub and Van Loan, 2013). In a future iteration of this
manuscript, we will include the results of the Preconditioned Minimum Residual (PMR)
method.
We aim to eliminate as many extraneous variables as possible, focusing solely on compar-
ing the effectiveness of the preconditioners in reducing the overall computational workload.
Thus, we study and compare preconditioners in terms of floating point operations needed
to converge to a given tolerance. This approach excludes the time needed to construct the
preconditioner, which can be significant, but crucially, it removes variance from differences
in implementation efficiency. While floating point operations are only a proxy for the more
desirable measure of computation time, the idea here is to study and assess preconditioner
effectiveness independent of hardware and software details and optimizations.
2 Results
We split this section into three parts: the first presents results for black-box preconditioners on a selection of test matrices, the second presents performance profiles, and the
third analyzes the impact of reordering schemes on optimal performance.
Throughout this section, preconditioner groups, which represent all tested configurations
for a specific preconditioner, are sometimes combined where appropriate (e.g., Symmetric
Gauss-Seidel and SSOR). The work of a preconditioner (or of the solver without precondi-
tioning) is measured as the number of iterations multiplied by the matrix’s non-zero elements,
plus the preconditioner’s contribution per iteration. For more information on how this work
is calculated or estimated for each preconditioner, see the relevant subsection in section 6.
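As a sketch of this accounting, the work metric described above can be computed as follows; the function name and all numbers here are made up for illustration and are not results from the study:

```python
def solver_work(iterations, nnz, precond_cost_per_iter=0):
    """Work proxy: iterations times (matrix non-zeros plus the
    preconditioner's per-iteration contribution), counting only
    these dominant flop terms."""
    return iterations * (nnz + precond_cost_per_iter)

# Hypothetical example: 500 unpreconditioned iterations versus 60
# preconditioned iterations whose extra solves cost 2.4e6 flops each.
control = solver_work(500, 1_000_000)
ic = solver_work(60, 1_000_000, 2_400_000)
speedup = control / ic  # the kind of factor reported in the profiles below
```

The point of the proxy is visible in the example: even though each preconditioned iteration is more than three times as expensive, the large drop in iteration count still yields a net reduction in work.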
In this version of the manuscript, we elect not to show results for the Combinatorial
Multigrid (CMG) preconditioner as it performed well on exactly zero matrices in the test
set. We note that the CMG preconditioner is designed for graph-Laplacian matrices and is
theoretically expected to perform similarly on diagonally dominant matrices. However, our
current test set omits the graph-Laplacian matrices and contains only a limited number of
diagonally dominant matrices. We also do not show results for Steepest Descent as it
performed poorly on all matrices in the test set. Additionally, we have tested SOR and
Gauss-Seidel and found that they do not perform well on symmetric problems.
2.1 Black-Box Preconditioner Performance
This subsection presents figures illustrating the convergence of the PCG method with
various preconditioners applied to individual matrices. Each figure displays a grid of sub-
plots, one for each preconditioner group, where the relative residual is plotted against the
work expended. The control case (no preconditioner) is shown in blue for reference, the
preconditioner of focus is highlighted in red, and all other preconditioners are shown in gray.
We start by showing a selection of results for crankseg_1, a matrix from structural
analysis that models a crankshaft (Mayer, 2004b). This matrix has dimension 52,804 and
has 10,614,210 non-zero elements. We compute the condition number of the symmetrically
scaled matrix to a tolerance of 10^-8 and find it to be not yet computed. This matrix is
not diagonally dominant and has an average degree of 201.01, which is relatively high. The
results for this matrix can be seen in Figure 1.
From the figure, we see that the IC and Modified IC preconditioners are the most effective,
[Figure 1 shows a grid of convergence plots (relative residual, from 10^0 down to 10^-10, versus work) with panels: SGS & SSOR; Truncated Neumann; Symmetric SPAI; AMG Ruge-Stuben; AMG Smoothed Aggregation; Laplacians Cholesky; SuperLU (LLt); Incomplete Cholesky (IC); Modified IC; Control.]
Figure 1: Convergence of the PCG method with various preconditioners applied to the
crankseg_1 matrix. The graphs in this figure are plotted with a log-log scale.
reducing the computational cost in the best case by one order of magnitude. The AMG and
Symmetric SPAI preconditioners yield minor reductions in total work, while the classical
methods offer no improvement over the control case. Because the matrix is not diagonally
dominant, the graph-Laplacian methods are not effective, and the SuperLU preconditioner
is similarly ineffective.
We next show a selection of results for ecology2, a matrix from circuit theory that
is used to model animal movement. This matrix has dimension 999,999 and has 4,995,991
non-zero elements. We compute the condition number of the symmetrically scaled matrix
to a tolerance of 10^-8 and find it to be 6.318306710606449 · 10^7. This matrix is diagonally
dominant and has an average degree of 5.00, which is relatively low. The results for this
matrix can be seen in Figure 2.
[Figure 2 shows the same grid of convergence plots (relative residual versus work, one panel per preconditioner group) for the ecology2 matrix.]
Figure 2: Convergence of the PCG method with various preconditioners applied to the
ecology2 matrix. The graphs in this figure are plotted with a log-log scale.
From the figure, we see that the IC and Modified IC preconditioners are the most ef-
fective, reducing the computational cost in the best case by over one order of magnitude.
The methods based on Algebraic Multigrid are also effective, reducing the computational
cost similarly to the IC preconditioners. Additionally, the graph-Laplacian-based Cholesky
preconditioner is effective on this matrix, reducing the computational cost by a similar
margin. The classical preconditioners are largely ineffective here, although the SGS
and SSOR preconditioners show minor improvements in the best case. The SuperLU and
Symmetric SPAI preconditioners are largely ineffective on this matrix.
The performance of the graph-Laplacian preconditioner on ecology2 is typical of the
performance of this preconditioner on approximately half of the diagonally dominant matrices
in the test set. The Algebraic Multigrid preconditioners have similar performance to the
shown results in Figure 2 on the majority of the diagonally dominant matrices in the test
set. As we will see in the performance profiles, the IC preconditioners are effective on a wide
range of matrices, while the classical preconditioners are largely ineffective.
2.2 Preconditioner Performance Profiles
This subsection presents performance profiles summarizing the effectiveness of the tested
preconditioners across all matrices (or graphs) in the test set. As in the previous section, the
work expended is measured as the number of iterations required to lower the relative residual
to a given tolerance, multiplied by the matrix's non-zero elements plus the preconditioner's
contribution per iteration. The performance profile indicates the fraction of problems for
which a preconditioner achieves a certain factor of reduction in computational effort to reduce
the relative residual to 10^-10.
In each figure, there are multiple subplots, displaying one graph per preconditioner group.
The graph is lightly shaded in the region where the preconditioner is more costly than the
control, starting at the horizontal axis value of 1x. The preconditioner group in focus is
shown in red, while all other preconditioners are shown in gray. For each of these plots,
the horizontal axis represents the factor by which the computational cost is decreased (or
increased) relative to the computational cost of convergence to 10
10
with no preconditioner.
The vertical axes show the fraction of problems where the preconditioner achieves that
performance.
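The profile curves just described can be computed in a few lines. This sketch uses hypothetical work ratios (control work divided by preconditioned work), not data from the study:

```python
def performance_profile(work_ratios, taus):
    """For each speedup factor tau, report the fraction of problems on
    which the preconditioner reduces work by at least that factor."""
    n = len(work_ratios)
    return {tau: sum(r >= tau for r in work_ratios) / n for tau in taus}

# Four hypothetical problems: speedups of 8x, 4x, 2x, and a 0.5x slowdown.
profile = performance_profile([8.0, 4.0, 2.0, 0.5], [0.5, 1, 2, 4, 8])
```

Reading the result as a curve, the fraction is monotonically non-increasing in τ, which is why each panel in the figures below slopes downward from left to right.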
We show the performance profile over all matrices in the test set in Figure 3. This
figure shows that the IC and Modified IC preconditioners are the most
effective, reducing the computational cost by one order of magnitude on a significant fraction
of the test set. The AMG preconditioners are also at least slightly effective on a significant
fraction of the test set. We found that the classical methods, Symmetric SPAI, and SuperLU
preconditioners were largely ineffective overall, though they do reduce the computational cost
on a small fraction of the test set.
The following figure shows similar information as Figure 3, but selects the best
run for each matrix (across all parameter settings) within a given preconditioner group. This
approach simulates tuning the preconditioner's parameters for each matrix while isolating the
effect of the preconditioner. This is shown next in Figure 4.
This figure shows that, with some tuning, the IC and MIC preconditioners are still the
most effective. The IC preconditioner performed better on average than MIC, and led
to a speedup of 128 times or greater on 21.79% of the test set, while the MIC preconditioner
led to a speedup of 128 times or greater on 19.23% of the test set. The IC preconditioner
reports a speedup of 4 times or greater on 74.36% of the tested problems, and either could
not be generated or led to a degradation in performance compared to no preconditioning
on only 6.41% of the test set.
The AMG preconditioners are also effective on a significant fraction of the test set, with
Smoothed Aggregation (SA) being significantly better than Ruge-Stuben (RS) on average.
We found that SA led to a speedup of 64 times or greater on 2.56% of the test set, while RS
did not improve the convergence of any problem by a factor this large. Both SA and RS
led to a speedup of 4 times or greater on 11.54% of the test set, while SA and RS led to
a speedup of 2 times or greater on 34.62% and 23.08% of the test set, respectively. The SA
[Figure 3 shows a grid of performance-profile plots (proportion of problems versus relative convergence speed τ, from 128x down to 0.25x), one panel per preconditioner group: SGS & SSOR; Truncated Neumann; Symmetric SPAI; AMG Ruge-Stuben; AMG Smoothed Aggregation; Laplacians Cholesky; SuperLU (LLt); Incomplete Cholesky (IC); Modified IC.]
Figure 3: Performance profiles of the PCG method with various preconditioners applied to
all matrices in the test set. The graphs in this figure are plotted on a log-linear scale.
[Figure 4 shows the same grid of performance-profile panels, restricted to the best run per matrix within each preconditioner group.]
Figure 4: Preconditioner performance profile for PCG over all matrices in the test set,
selecting the best run for each matrix within a given preconditioner group. This figure is
plotted with a log-linear scale.
variant could not be generated or led to a degradation in performance on only 14.10% of
tested problems, while RS was ineffective on 23.08% of the test set.
We found that the classical methods, Symmetric SPAI, and SuperLU preconditioners
were largely ineffective overall. SSOR and SGS combined led to a speedup of 2 times or
greater on only 3.85% of the test set. The Symmetric SPAI preconditioner similarly led
to a speedup of 2 times or greater on only 1.28% of the test set. These two preconditioner
groups led to a degradation in performance compared to no preconditioner on 41.03% and
25.36% of the test set, respectively.
2.3 Impact of Ordering on Preconditioner Performance
In this subsection, we evaluate preconditioner performance by selecting the best run for
each matrix (across all parameter settings) within a given ordering and then compare the
resulting performance profiles. This approach simulates tuning the preconditioner's parameters
for each matrix while isolating the effect of reordering.
Similarly to the previous section on performance profiles, the section of the graph in
which preconditioner performance is worse than the control is lightly shaded. We show the
performance profile for the different orderings in Figure 5.
We found that there was only marginal difference in performance based on ordering for
the classical preconditioners, which is expected as they are invariant to ordering. The slight
difference in performance in runs is likely due to the potential for numerical issues, which
can be exacerbated by the ordering on ill-conditioned matrices. Similarly, there was little
difference in performance based on ordering for the AMG-based methods and the graph-
Laplacian Cholesky preconditioner as they perform their own ordering internally. Again, the
difference in performance can likely be attributed to numerical issues.
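For reference, the RCM ordering discussed in this section can be sketched as a breadth-first traversal that visits low-degree vertices first and then reverses the visit order. This is a minimal toy version on a hypothetical adjacency-set input; production implementations additionally select pseudo-peripheral starting vertices:

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Toy RCM sketch. adj: dict mapping vertex -> set of neighbors,
    representing a symmetric sparsity pattern (hypothetical format)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited, order = set(), []
    for start in sorted(adj, key=lambda v: degree[v]):  # lowest-degree seeds
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # enqueue unvisited neighbors in order of increasing degree
            for u in sorted(adj[v] - visited, key=lambda u: degree[u]):
                visited.add(u)
                queue.append(u)
    return order[::-1]  # reverse the Cuthill-McKee order
```

On a path graph the traversal simply walks the path, and the reversed order keeps the bandwidth at one; on less regular patterns the level-by-level traversal tends to cluster non-zeros near the diagonal, which is what benefits the incomplete factorizations studied here.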
We found that the RCM ordering was the most effective for the Symmetric SPAI preconditioner and that there was nearly no difference in performance for this preconditioner
between AMD and the Natural ordering.¹ For the IC and MIC preconditioners, the AMD
ordering was by far the most effective, although the RCM ordering still provided some performance uplift over the Natural ordering. The SuperLU preconditioner was largely invariant
to the ordering, although the AMD and RCM orderings performed marginally better than
the Natural ordering.

[Figure 5 shows a grid of performance-profile plots (proportion of problems versus relative convergence speed τ, from 128x down to 0.25x), one panel per preconditioner group, with separate curves for the Natural, AMD, and RCM orderings.]
Figure 5: Performance profiles of the PCG method with various preconditioners applied to
all matrices in the test set, grouped by ordering. The graphs in this figure are plotted on a
log-linear scale.

¹The curves for the natural and RCM orderings are almost entirely overlapping for this preconditioner.
3 Key Findings
Our preliminary experimental results indicate that the IC preconditioner—and its modi-
fied variant—consistently delivers superior performance across a wide range of SPD matrices.
When optimally tuned, IC achieves speedups of 128 times or more on nearly one quarter
of the test set, while yielding at least a 4 times improvement on nearly three quarters of the
test set. When it was possible to form the preconditioner, there was rarely any degradation
in performance compared to no preconditioning, and it led to complete failure on almost
no matrices. This shows that the IC and MIC preconditioners are robust and effective for a
wide range of SPD matrices and should be considered as a first choice if the reader's problem
appears to be similar to the SPD matrices found in the SuiteSparse Matrix Collection.
Algebraic Multigrid methods, particularly the SA variant, also demonstrate significant
benefits on a large portion of the test set, especially on those matrices with strong diagonal
dominance. In contrast, classical preconditioners such as SGS and SSOR, and Symmetric
SPAI, generally offer minimal improvement over the unpreconditioned solver and may even
lead to a degradation in performance on a significant fraction of the test set.
Our study further affirms that the choice of ordering has a notable impact on precondi-
tioner effectiveness for certain classes of preconditioners. In particular, applying the AMD
ordering to the IC and MIC preconditioners significantly enhances their ability to improve
convergence compared to natural or RCM orderings. Conversely, the RCM ordering is most
effective for the symmetric SPAI preconditioner and we found little difference in performance
between the AMD and natural orderings. Overall, these findings demonstrate that careful
consideration of both the preconditioning strategy and the associated ordering can substantially improve the efficiency of the conjugate gradient method for solving SPD systems.
3.1 Ongoing Work
We are actively expanding our test suite by more than a factor of three by incorporating a
large number of graph-based problems that naturally yield graph-Laplacian matrices. Many
of these graphs are significantly larger than our current examples, allowing us to study
preconditioner performance across a wide range of problem sizes—from small- and medium-
scale to very-large-scale. This expanded dataset will help us determine whether certain
preconditioners excel only on smaller problems or maintain their advantages as problem size
increases. We additionally seek to add a greater number of diagonally dominant matrices,
which will allow us to more thoroughly evaluate preconditioner performance in settings where
strong diagonal dominance plays a role.
In addition to expanding our test set, we are integrating additional reordering strategies
into our framework. Beyond our current use of AMD and RCM orderings, we plan to
incorporate nested dissection and METIS orderings. These additional ordering schemes will
allow us to further explore how structural reordering impacts preconditioner efficiency on
large-scale graphs and SPD systems, and to provide clearer guidance on best practices for
different problem classes. Finally, we are also actively testing the preconditioned MINRES
method to compare its performance with that of PCG, thereby extending our analysis across
different Krylov subspace solvers.
3.2 Limitations
Our study has several limitations that we are actively addressing in ongoing work. First,
the current test suite does not yet include a sufficiently diverse set of large-scale problems;
most matrices are what we would consider small- to medium-scale for modern scientific com-
puting. This limitation restricts our ability to fully assess how preconditioner performance
interacts with problem size, particularly in the context of very large graphs and systems.
Additionally, the dataset includes relatively few diagonally dominant matrices, which leads
to an unfair testing environment for preconditioners that are specifically designed for such
systems (e.g., graph-Laplacian preconditioners).
Another limitation arises from our current focus on floating point operations as a proxy
for computational cost. While this measure provides a machine-independent benchmark, it
does not capture all aspects of actual runtime performance, such as memory access patterns
or the impact of hardware-specific optimizations. Moreover, the preconditioner construction
time is not incorporated into our cost model, even though we found that this time can be
significant in practice for some preconditioners. Finally, although our work currently emphasizes the PCG method, further analysis of other Krylov subspace methods (e.g., MINRES) is
needed to understand how preconditioners interact with different solvers. We are addressing these limitations by expanding our dataset, integrating additional preconditioners, and
incorporating more comprehensive performance metrics in future iterations of this work.
Finally, in this study we focus on the relative performance of preconditioners across a
diverse set of matrices. In performing so many runs, we are limited in the number of right-
hand side vectors we can generate and test on. We recognize that the choice of the right-hand
side vector can cause variations in the reported performance of the preconditioners, or even
the convergence of the solver. This is a fundamental limitation to our study, but we have
found that using our method of generating the right-hand side vector provides similar results
to those provided by the SuiteSparse Matrix Collection. In Figure 6, we show a scatter plot
comparing the number of unpreconditioned CG iterations to converge with our right-hand
side vector against those with the SuiteSparse Matrix Collection right-hand side vector.
This shows that there is some variance in the number of iterations required to converge,
but that the variance is always within a reasonable range. Note that there are relatively few
matrices in the test set which come with provided right-hand side vectors, but this provides
evidence that our method of generating the right-hand side vector is reasonable.
4 Background and Related Work
We first discuss the notation and terminology used in this paper. We utilize boldface
lowercase letters to denote vectors (e.g., x), boldface uppercase letters to denote matrices
(e.g., A), and lowercase letters to denote scalars (e.g., γ). The i-th element of a vector x is
denoted by x_i, and the (i, j)-th element of a matrix A is denoted by a_{i,j}. We use notation
like x^(k) to denote the k-th iterate of some vector x, whereas A^T, A^{-1}, and A^k refer to the
transpose, the inverse, and the k-th power of some matrix A, respectively.
We refer to some arbitrary preconditioner as M, where its application to a vector r is
denoted as z = M^{-1}r. We make frequent reference to linear operators in this paper and
often denote some arbitrary linear operator as L. The application of a linear operator L is
equivalent to applying the inverse of M and is denoted as z = L(r) = M^{-1}r.
We use ⌊γ⌉ to denote the nearest integer to some scalar γ. We use the
notation 0 ≺ A ⪯ I to denote that the matrix A is positive definite and has eigenvalues
bounded by 1 (Bhatia, 1996).

[Figure 6 is a scatter plot of unpreconditioned CG iterations to converge (10^2 to 10^4, log scale) for the matrices 2cubes_sphere, af_0_k101 through af_5_k101, af_shell3, af_shell7, BenElechi1, bone010, boneS01, boneS10, bundle_adj, nasasrb, offshore, parabolic_fem, Pres_Poisson, smt, thermal1, and thermal2, with one "Seeded" and one "Provided" point per matrix.]
Figure 6: Comparison of the number of unpreconditioned CG iterations to converge with
our right-hand side vector compared to the SuiteSparse Matrix Collection right-hand side
vector. In this figure, "Seeded" refers to the right-hand side vector generated by our method,
and "Provided" refers to the right-hand side vector provided by the SuiteSparse Matrix
Collection. In the case where multiple right-hand side vectors were provided, we used the
first one. This graph is plotted with semi-log scale and a categorical horizontal axis.
4.1 Krylov Subspace Methods
In this preliminary study, we focus on Krylov subspace methods for solving large, sparse
linear systems of equations. Specifically, we utilize the Preconditioned Conjugate Gradient
(PCG) method for SPD systems. We recognize that the performance of preconditioners may
be different when applied with different solvers; therefore, in future work, we will expand
our analysis to include the Preconditioned Minimum Residual (PMR) method.
4.2 Preconditioned Conjugate Gradient
The PCG method is an efficient iterative solver for SPD systems and is known for its
scalability and efficiency (Hestenes and Stiefel, 1952; Golub and Van Loan, 2013). The
convergence rate of the PCG method, like all Krylov methods, is heavily dependent on
the condition number of the matrix, and preconditioners are used to improve the condition
number of the system and accelerate the rate of convergence (Greenbaum, 1997; Golub and
Van Loan, 2013; Saad, 2003).
The PCG method can be viewed as a process that minimizes the quadratic function

ϕ(x) = (1/2) x^T A x − x^T b.   (1)
This function is strictly convex when A is SPD, ensuring that there is a unique global
minimum that corresponds exactly to the solution of Ax = b. The gradient of the function
given in Equation 1 is

∇ϕ(x) = Ax − b.

By finding some x^(∗) such that ∇ϕ(x^(∗)) = 0, we have found the solution to the original system. Therefore, CG iteratively minimizes ϕ(x) by selecting a search direction p^(k) that is A-conjugate to the previous search directions, ensuring that the method converges in at most n iterations in exact arithmetic (Golub and Van Loan, 2013). In floating point arithmetic, the method may stall out before reaching the solution or lose numerical orthogonality between the search directions, potentially leading to instability.
We implemented the Hestenes-Stiefel formulation of the PCG method (Hestenes and
Stiefel, 1952; Golub and Van Loan, 2013), which is described next.
4.2.1 Algorithm Description
The algorithm is outlined in Algorithm 1. In each iteration, the residual r^(k), the preconditioned residual z^(k), and the search direction p^(k) are updated. The method utilizes scalar quantities α^(k) and µ^(k) to compute the optimal step sizes. The preconditioned variant of the method enhances CG by incorporating a preconditioner M, which approximates A^(−1), and improves the conditioning of the quadratic function ϕ(x). The algorithm seeks to minimize ϕ(x) in the linearly transformed space defined by the preconditioner.
Algorithm 1 Preconditioned Conjugate Gradients Method
Input: Sparse matrix A, right-hand side b, initial guess x^(0), preconditioner M with linear operator L(r) = M^(−1)r, and tolerance τ.
Output: Solution x to Ax = b.
r^(0) ← b − Ax^(0)
z^(0) ← L(r^(0))
k ← 0
while ∥r^(k)∥ > τ · ∥b∥ do
  k ← k + 1
  α^(k) ← (r^(k−1))ᵀ z^(k−1)
  if k > 1 then
    p^(k) ← z^(k−1) + (α^(k) / α^(k−1)) · p^(k−1)
  else
    p^(k) ← z^(k−1)
  µ ← α^(k) / ((p^(k))ᵀ Ap^(k))
  x^(k) ← x^(k−1) + µ · p^(k)
  r^(k) ← r^(k−1) − µ · Ap^(k)
  z^(k) ← L(r^(k))
return x^(k)
Implementation Details This is a standard implementation of the preconditioned con-
jugate gradient method, which we implemented in Julia (Bezanson et al., 2017). We use
one matrix-vector product per iteration and one preconditioner application per iteration.
We abstract the application of the preconditioner M as the linear operator L(r) = M^(−1)r, allowing us to wrap external preconditioners in a common interface that conforms to the implementation. Our implementation checks for convergence based on both the relative residual and the normwise relative backward error. We conservatively define the method as stalling out if the norm of the update to x, that is ∥µ · p∥, is less than 10^(−20).
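The loop above can be sketched compactly. The following is a minimal dense NumPy version of the Hestenes-Stiefel iteration (our study's implementation is in Julia; the names pcg and apply_prec are illustrative, with apply_prec playing the role of the linear operator L):

```python
import numpy as np

def pcg(A, b, apply_prec=lambda r: r, tol=1e-10, maxiter=1000):
    """Hestenes-Stiefel PCG sketch: solve Ax = b for SPD A.

    apply_prec(r) plays the role of the operator L(r) = M^{-1} r.
    """
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    z = apply_prec(r)                  # preconditioned residual
    p = z.copy()                       # search direction
    alpha_old = r @ z
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        mu = alpha_old / (p @ Ap)      # optimal step size
        x += mu * p
        r -= mu * Ap
        z = apply_prec(r)
        alpha_new = r @ z
        p = z + (alpha_new / alpha_old) * p
        alpha_old = alpha_new
    return x

# Small SPD system with Jacobi (diagonal) preconditioning.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, apply_prec=lambda r: r / np.diag(A))
```

One matrix-vector product and one preconditioner application are performed per iteration, matching the cost model described next.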
Computational Cost The dominant computational costs per iteration are the matrix-vector product Ap^(k) and the preconditioner application z^(k) = L(r^(k)). The cost
of the matrix-vector product is proportional to the number of non-zero values in the ma-
trix, and the cost of the preconditioner application is dependent on the type of precondi-
tioner—ranging from the number of non-zero values in the preconditioner to the complexity
of the cycle/application in the case of algebraic multigrid. We provide more detail on the
computational cost of each preconditioner in section 6.
4.3 Preconditioned Minimum Residual Method
A future extension of this study will include the Preconditioned Minimum Residual
(PMR) method, which seeks to minimize the residual of the system rather than the quadratic
function over a Krylov subspace.
5 Systems of Linear Equations
In this section, we describe the selection of symmetric positive definite matrices used in our study. Our goal is to evaluate the performance of various preconditioners
on large-scale SPD systems that are representative of real-world problems.
5.1 Matrix Ordering
The ordering of a sparse matrix is well known to be crucial for the performance of a large
number of preconditioners (Benzi, 2002; Saad, 2003). In the symmetric case, the rows and
columns of the matrix must be permuted in the same way to preserve the symmetry of the
matrix, so we only consider symmetric reorderings in this study. In the context of direct
solvers such as Gaussian elimination or those preconditioners that compute an incomplete
factorization, the ordering of the matrix can have a significant impact on the fill-in and the
quality of the factorization (Benzi, 2002; Davis, 2006a).
We denote the permutation matrix of a given ordering as P, where P AP^⊤ is the permuted matrix. In the natural ordering case, P is the identity matrix.
5.1.1 Natural
The natural ordering of a matrix is the order in which it is obtained from the application
that generated it. This serves as a baseline for comparison with other ordering strategies.
5.1.2 Reverse Cuthill-McKee
The Reverse Cuthill-McKee (RCM) algorithm was originally proposed for reducing the bandwidth of a matrix (Cuthill and McKee, 1969). By minimizing the bandwidth of the matrix, RCM aims to cluster the non-zero elements closer to the diagonal and has been shown in practice to reduce the fill-in of the matrix during factorization (Davis, 2006b).
The RCM algorithm is a graph-based algorithm that traverses the graph of the matrix in a breadth-first search manner, starting from a vertex of minimal degree. Each node is numbered in the order in which it is visited, and the numbering is reversed at the end of the algorithm to give the final ordering.
We utilize the SymRCM package in Julia to compute the RCM ordering of the matrices.
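As an illustration of the bandwidth-reducing behavior, the following sketch uses SciPy's reverse_cuthill_mckee (standing in for the Julia SymRCM package used in the study) to recover a small bandwidth from a randomly permuted tridiagonal matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the non-zero entries of a sparse matrix."""
    coo = A.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

rng = np.random.default_rng(0)
n = 50
# Tridiagonal matrix (bandwidth 1), randomly permuted to hide the structure.
T = np.diag(np.full(n, 4.0)) \
    + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
p = rng.permutation(n)
A = csr_matrix(T[np.ix_(p, p)])

perm = reverse_cuthill_mckee(A, symmetric_mode=True)  # RCM permutation
B = A[perm][:, perm]                                  # symmetric reordering P A P^T
```

Because the underlying graph is a path, the BFS numbering clusters the entries back toward the diagonal.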
5.1.3 Symmetric Approximate Minimum Degree
The Symmetric Approximate Minimum Degree (SYMAMD) ordering is an approximate
version of the minimum degree algorithm that is designed to preserve the symmetry of the
matrix (Amestoy, Davis, and Duff, 1996). SYMAMD is a greedy heuristic-based algorithm
that seeks to minimize the fill-in of a Cholesky factorization of a sparse matrix (Davis, 2006b).
We utilize the AMD package in Julia (Dominique et al., 2023).
5.2 Symmetric Positive Definite Matrices
We utilize matrices from the SuiteSparse Matrix Collection (Davis and Hu, 2011), a
widely recognized repository of sparse matrix benchmarks that are representative of real-
world problems. To ensure our study focuses on large-scale and practically relevant problems,
we filter matrices based on the following criteria:
Matrix Dimension: Only matrices of dimension n ≥ 10,000 are considered. This
ensures that the matrices are sufficiently large to reflect computational challenges en-
countered in real-world applications.
Symmetric Positive Definite: We consider only those matrices that are real-valued
and symmetric positive definite in this study.
Convergence without Preconditioning: The unpreconditioned Conjugate Gradi-
ent method should converge within 10 · n iterations, where n is the matrix dimension.
This allows us to compare the relative performance of preconditioners in aggregate
against the unpreconditioned case.
By limiting our study to a test set of matrices that fit these criteria, we ensure that
we are evaluating preconditioners on those matrices that present a computational challenge
that is typical in large-scale scientific and engineering problems. Additionally, requiring that
the unpreconditioned CG method converges within a reasonable number of iterations enables
us to compare the relative performance of the preconditioners both in aggregate and on a
per-matrix basis.
5.2.1 Matrix Characteristics
The selected matrices come from a diverse set of applications, including structural anal-
ysis, fluid dynamics, and optimization problems. These matrices comprise a wide range of
sizes and structures, reflecting the diversity of problems encountered in practice. This diver-
sity allows us to evaluate the robustness and effectiveness of preconditioners across a broad
spectrum of real-world problems.
The matrices used in this study are categorized in Table 2. Below, we provide relevant
sources where possible for more information on a selection of the matrices:
(Rudnyi and Van Rietbergen, 2005) performs the original model reduction of the prob-
lems given in (Rudnyi, Korvink, and van Rietbergen, 2006a,b,c).
(Dabrowski, Krotkiewski, and Schmid, 2008) provides details on (Schmid, 2006a,b).
(Janna and Ferronato, 2011a; Ferronato, Janna, and Pini, 2011) discuss (Janna and
Ferronato, 2011i).
Matrix Category Citations Count
Circuit Simulation (Okuyucu, 2006a,b; McRae, 2008) 3
Combinatorial (Trefethen, 2008a,b) 2
Computational Fluid Dynamics (Shakib, 2004; Janna and Ferronato, 2011i;
Um, 2010; Wissgott, 2007; Leppkes and Nau-
mann, 2009a,b)
6
Computer Vision (Lourakis, 2006; Mazaheri, 2015) 2
Electromagnetics (Um, 2008; Isaak, 2008) 2
Materials (Grimes, 1995a,b) 2
Model Reduction (Rudnyi, Korvink, and van Rietbergen,
2006a,b,c; Billger, 2004; Rudnyi, 2004)
6
Optimization (Gould, 1995; Berger et al., 1997; Moré and
Toraldo, 1991; Dolan and Moré, 2000; Dembo
and Tulowitzki, 1983)
5
Miscellaneous PDE (Elechi, 2007; Janna and Ferronato, 2014a;
Dubcova, Cerveny, and Solin, 2007a,b,c; Janna
and Ferronato, 2011c, 2014b,c; consph, 2008;
Cunningham, 2002; Wathen, 2004a,b)
12
Structural (Janna and Ferronato, 2011d,e,f,g,h; Schenk,
2006a,b,c,d,e,f, 2003a,b; Mayer, 2004a; Will,
1985a,b; NASA, 2003a,b,c; Kouhia, 2008a;
Mayer, 2004b,c; Weiher, Jochen Weiher;
Mayer, 2004d; Weiher, 2004a, 2005; NASA,
1995; Weiher, 2004b; Damhaug, 1999a,b,c,d,e;
Kouhia, 2008b)
34
Thermal (Bindel, 2006; Schmid, 2006a,b; Botonakis,
2009a,b,c)
5
Table 2: Matrix categories
(Ferronato et al., 2007; Ferronato, Janna, and Gambolati, 2008; Janna, Ferronato, and
Gambolati, 2009, 2010) discuss (Janna and Ferronato, 2011d).
(Janna, Ferronato, and Gambolati, 2013, 2015) provide information on (Janna and
Ferronato, 2014a), while (Janna, Ferronato, and Gambolati, 2015) also covers (Janna
and Ferronato, 2014c,b).
(Ferronato et al., 2010) details (Janna and Ferronato, 2011c,h), with additional dis-
cussion on (Janna and Ferronato, 2011c) in (Janna and Ferronato, 2011b).
(Janna, Ferronato, and Gambolati, 2010) discusses (Janna and Ferronato, 2011e,f), and
(Janna and Ferronato, 2011e) is further discussed in (Janna, Comerlati, and Gambolati,
2009).
(Berger et al., 1995) provides information on the financial optimization problem (Berger
et al., 1997).
(Duff, Grimes, and Lewis, 1989) covers items in the Harwell-Boeing collection, namely
(Will, 1985a,b).
Matrix Scaling Before running any benchmark on an SPD matrix, we perform symmetric Jacobi scaling on the matrices to improve the conditioning of the system (Golub and Van Loan, 2013; Saad, 2003). Explicitly,

A′ ← D^(−1/2) (P AP^⊤) D^(−1/2),

where D is the diagonal matrix with the diagonal elements of P AP^⊤. Because we deal with SPD matrices, the diagonal elements of A are positive, ensuring that D^(−1/2) is well-defined. Due to the potential for floating point inaccuracies in the scaling, we explicitly set the diagonal of the scaled matrix to the identity and symmetrize the matrix. That is, we set

A′′ ← (1/2) (A′ + (A′)^⊤),

and then explicitly set the diagonal of A′′ to the identity,

a′′_{i,i} ← 1 for all i ∈ {1, . . . , n}.
Because this is a positive scaling of the rows and columns of A, the positive-definiteness of the matrix is preserved. Additionally, we scale the right-hand side vector b by D^(−1/2) to ensure that the scaling is consistent across the system. We discuss our choice of right-hand side vectors in subsection 5.4.
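For concreteness, the scaling above can be sketched as follows (a dense NumPy sketch; jacobi_scale is an illustrative name, and the permutation is assumed to have been applied already):

```python
import numpy as np

def jacobi_scale(A, b):
    """Symmetric Jacobi scaling: A'' = D^{-1/2} A D^{-1/2} with an exact
    unit diagonal, and b' = D^{-1/2} b. Assumes A is SPD and already
    permuted, so D = diag(A) has positive entries."""
    d = np.sqrt(np.diag(A))
    As = A / np.outer(d, d)            # D^{-1/2} A D^{-1/2}
    As = 0.5 * (As + As.T)             # symmetrize against round-off
    np.fill_diagonal(As, 1.0)          # force the diagonal to the identity
    return As, b / d

A = np.array([[4.0, 2.0], [2.0, 9.0]])
b = np.array([2.0, 3.0])
As, bs = jacobi_scale(A, b)
```

The off-diagonal entry a_{1,2} = 2 becomes 2 / (2 · 3) = 1/3 under this scaling, and the diagonal is exactly the identity.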
5.3 Graph-Laplacian Matrices
This section has been removed while we finish the graph portion of the study. In a future
iteration of this study, we will include graph-Laplacian matrices of undirected graphs.
5.4 Right-Hand Side Vector
We generate a random solution vector x
()
R
n
using the StableRNGs package in Ju-
lia, which reproduces a random number generator (RNG) suitable for scientific computing.
Specifically, we use the seed 123456789 to initialize the random number generator, allowing
us to reproduce the same sequence of random number across different runs and systems.
The StableRNGs package implements the Lehmer pseudo-random number generator, a
prime modulus multiplicative linear congruential generator (Park and Miller, 1988; Payne,
Rabung, and Bogyo, 1969). This generator is known for its simplicity, efficiency, and reason-
ably good statistical properties, making it suitable for scientific computing applications (Park
and Miller, 1988).
Out of an abundance of caution for the statistical properties of the generated random numbers, we “warm up” the RNG by generating and discarding log_2(n) random vectors of length n, where n is the dimension of the matrix A. This practice reduces the likelihood of any biases or correlations in the sequence that may be present in the initial random numbers generated by the RNG (Gentle, 2003).
Using the last generated random vector as the solution vector x^(∗), we construct the right-hand side vector b in the following manner: we locate the log_2(n) + 1 largest-in-magnitude elements of x^(∗). For each of these elements, we set its value to either 1 or −1, chosen at random with equal probability. We set the remaining elements of x^(∗) to zero, resulting in a sparse vector.
Finally, we compute b = Ax^(∗), which ensures that the right-hand side vector is consistent with the solution vector and the matrix A. Performing this procedure allows us to evaluate the performance of preconditioners in terms of the error ∥x^(k) − x^(∗)∥ during iterations of the solver.
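The construction described above can be sketched as follows (using NumPy's default generator in place of the Lehmer generator from StableRNGs; make_rhs is an illustrative name):

```python
import numpy as np

def make_rhs(A, seed=123456789):
    """Build a sparse solution x* with +/-1 entries and b = A x*.

    Sketch of the procedure described above, with NumPy's RNG standing
    in for the StableRNGs Lehmer generator used in the study.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    for _ in range(int(np.log2(n))):      # "warm up" the generator
        rng.random(n)
    x = rng.random(n)
    k = int(np.log2(n)) + 1               # keep the k largest-magnitude entries
    keep = np.argsort(np.abs(x))[-k:]
    x_star = np.zeros(n)
    x_star[keep] = rng.choice([-1.0, 1.0], size=k)
    return x_star, A @ x_star

A = np.eye(8)
x_star, b = make_rhs(A)
```

For n = 8, exactly log_2(8) + 1 = 4 entries of x* are nonzero, each equal to +1 or −1.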
5.4.1 Scaling and Reordering of the Right-Hand Side and Solution Vectors
The effect of symmetric Jacobi scaling is independent from the ordering applied to the
matrix. Similarly, it is known that CG and MINRES are unaffected by matrix ordering in
exact arithmetic, so the control case should remain unchanged. However, numerical errors
may introduce slight performance differences. For this reason, we run the control for each
ordering to establish the baseline of comparison against each preconditioner.
After generating the right-hand side vector in the manner described above, we reorder the
matrix, right-hand side, and solution vectors if a reordering is to be applied to the matrix.
Afterward, we scale the right-hand side vector b and the solution vector x^(∗) to ensure that the scaling is consistent across the system. For the scaled matrix A′ = D^(−1/2) P AP^⊤ D^(−1/2), we scale the right-hand side vector as b′ = D^(−1/2) P b and the solution vector as x′^(∗) = D^(1/2) P x^(∗), so that A′x′^(∗) = b′.
6 Preconditioners
In this section, we describe the preconditioners that are evaluated in this preliminary
study. In the following subsections, we provide an overview of each preconditioner category,
discussing the software packages used and the expected computational costs associated with
their use.
6.1 Classical Methods
Classical iterative methods serve as the foundation of numerical linear algebra. Although
they are not as efficient as more modern solvers, they are simple and still find use in prac-
tice (Saad, 2003; Golub and Van Loan, 2013).
In the following, we describe the classical iterative methods, highlighting their connection
to matrix splitting techniques where relevant and their role as preconditioners.
6.1.1 Truncated Neumann Series
The Truncated Neumann Series (TNS) preconditioner is based on approximating the
inverse of a matrix using a finite number of terms in the Neumann series expansion (Dubois,
Greenbaum, and Rodrigue, 1979). Given a matrix A expressed in the form

A = D − R = D(I − D^(−1)R),

where D is a nonsingular diagonal matrix and R is the remainder matrix, we may express its inverse as

A^(−1) = (I − D^(−1)R)^(−1) D^(−1),

thus giving us the Neumann series expansion of A^(−1):

A^(−1) = [ Σ_{k=0}^{∞} (D^(−1)R)^k ] D^(−1).

By truncating this series after a finite number of terms m, we obtain an approximate inverse:

M^(−1) = [ Σ_{k=0}^{m} (D^(−1)R)^k ] D^(−1).

Because we scale our matrix to have unit diagonal, for 0 ≺ A ⪯ 1 the preconditioner simplifies to

M^(−1) = Σ_{k=0}^{m} R^k.

However, when A ⪯ 1 does not hold, a scaling term must be included in order to guarantee the convergence of the Neumann series:

M^(−1) = α Σ_{k=0}^{m} (I − αA)^k,

where α is chosen such that 0 ≺ αA ⪯ 1. Because of its simplicity in implementation, we do not give a detailed algorithm beyond the summation notation given above.
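Nevertheless, the application can be sketched with a Horner-style recurrence that costs m extra matrix-vector products (tns_apply is an illustrative name; alpha = 1 recovers the unit-diagonal case):

```python
import numpy as np

def tns_apply(A, r, m=2, alpha=1.0):
    """Apply the truncated Neumann series preconditioner
    z = alpha * sum_{k=0}^{m} (I - alpha A)^k r
    without forming matrix powers: z <- r + (I - alpha A) z, m times."""
    z = r.copy()
    for _ in range(m):
        z = r + (z - alpha * (A @ z))
    return alpha * z

# With alpha A well inside the unit interval, many terms approach A^{-1} r.
A = np.diag([2.0, 4.0])
z = tns_apply(A, np.ones(2), m=60, alpha=0.25)
```

For this diagonal example the series converges to A^{-1}r = (0.5, 0.25), illustrating that the truncation is an explicit polynomial approximation of the inverse.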
Theoretical Considerations Consider the linear system Ax = b where A has unit diagonal and R is the remainder matrix. Without loss of generality, assume 0 ≺ A ⪯ 1; then the TNS preconditioner for this system is given by

M^(−1) = Σ_{k=0}^{m} R^k.

Suppose we let m = 1; then we have

M^(−1) = I + R.

Because R is the remainder matrix, R = I − A, and so

M^(−1) = 2I − A,

a first-degree polynomial in A. The preconditioned Krylov subspace is given by

K^(k+1) = span{z^(0), Az^(0), A^2 z^(0), . . . , A^k z^(0)},

where

z^(0) = M^(−1) r^(0).

Thus the TNS preconditioner with m = 1 gives

z^(0) = (2I − A) r^(0),

and the Krylov subspace satisfies

K^(k+1) ⊆ span{r^(0), A r^(0), A^2 r^(0), . . . , A^(k+1) r^(0)}.

In this scenario, the preconditioned residual z^(0) is a first-degree polynomial in A applied to the original residual r^(0), so the Krylov subspace is effectively built from powers of A applied to r^(0). As a function of the work performed, the TNS preconditioner builds this Krylov subspace at approximately the same cost as the unpreconditioned system. Therefore, for matrices that are scaled to have unit diagonal, there should be no significant improvement in the convergence of the system with the TNS preconditioner as a function of the work performed.
Computational Cost The computational cost of the TNS preconditioner is equal to the
number of non-zero values in the matrix A multiplied by the number of terms in the Neumann
series expansion.
Preconditioner Settings Used The only settings to tune for the TNS preconditioner are the number of iterations to apply and the scaling factor. We run this preconditioner for all combinations of iterations in {1, 2, 3, 4} and 1/α ∈ {∥A∥_F, ∥A∥_∞, ∥A∥_1, ∥A∥_2, 1}.
6.1.2 Symmetric Gauss-Seidel and Successive Over-Relaxation
Gauss-Seidel (GS) and Successive Over-Relaxation (SOR) are classical iterative methods
that update the solution vector component-wise (Golub and Van Loan, 2013; Saad, 2003).
These methods improve over the Jacobi method by using the most recent values available to
update each component of the solution vector. The SOR method accelerates the convergence
of GS by introducing a relaxation parameter ω that over-relaxes the update in the direction of
the residual, which can improve the convergence rate of the solver if ω is chosen appropriately.
Consequently, the GS method can be viewed as a special case of SOR with ω = 1.
Given a linear system Ax = b, where A ∈ ℝ^{n×n} is a nonsingular matrix, x ∈ ℝ^n is the solution vector, and b ∈ ℝ^n is the right-hand side vector, the SOR method partitions A into its diagonal, strictly lower triangular, and strictly upper triangular components:

A = D + L + U,

where D contains the diagonal elements of A, L contains the entries strictly below the diagonal, and U contains the entries strictly above the diagonal (Golub and Van Loan, 2013).
The iterative update formula for SOR is then given by

x^(k+1) = (D + ωL)^(−1) [ωb + ((1 − ω)D − ωU) x^(k)], (2)

where ω is the relaxation parameter, and x^(k) is the approximation of the solution at iteration k (Golub and Van Loan, 2013). Because GS is SOR with ω = 1, we do not provide a separate update formula here. These methods are not suitable for use in solvers that require a symmetric preconditioner, such as PCG and PMR, as the resulting preconditioner is not symmetric.
Symmetric GS and SOR Symmetric Successive Over-Relaxation (SSOR) and Symmetric Gauss-Seidel (SGS) are variants of the SOR and GS methods that update the solution vector in a symmetric fashion, alternating between forward and backward sweeps (Golub and Van Loan, 2013; Saad, 2003; Young, 1950, 1954). Performing the update symmetrically smooths the error in the solution vector and can improve the convergence rate of the solver. Just as before, the SGS method can be viewed as a special case of SSOR with ω = 1.
Given the same linear system Ax = b and splitting defined for the SOR method, the SSOR method performs a forward sweep (as in Equation 2) followed by a backward sweep to update the solution vector. Formally, the forward sweep is given by

x^(k+1/2) = (D + ωL)^(−1) [ωb + ((1 − ω)D − ωU) x^(k)],

and the backward sweep is given by

x^(k+1) = (D + ωU)^(−1) [ωb + ((1 − ω)D − ωL) x^(k+1/2)].
Algorithm Description These methods update each component of the solution vector
sequentially, using the most recent values available. Algorithm 2 provides the procedural
steps for implementing SSOR.
Algorithm 2 Symmetric Successive Over-Relaxation (SSOR) Method
1: Input: Matrix A, right-hand side b, initial guess x^(0), relaxation parameter ω, and tolerance τ.
2: Output: Approximate solution x to Ax = b.
3: k ← 0
4: while ∥x^(k+1) − x^(k)∥ > τ do
5:   for i = 1 to n do ▷ Forward Sweep
6:     x_i^(k+1/2) ← (1 − ω) x_i^(k) + (ω / a_{i,i}) (b_i − Σ_{j=1}^{i−1} a_{i,j} x_j^(k+1/2) − Σ_{j=i+1}^{n} a_{i,j} x_j^(k))
7:   for i = n down to 1 do ▷ Backward Sweep
8:     x_i^(k+1) ← (1 − ω) x_i^(k+1/2) + (ω / a_{i,i}) (b_i − Σ_{j=i+1}^{n} a_{i,j} x_j^(k+1) − Σ_{j=1}^{i−1} a_{i,j} x_j^(k+1/2))
9:   k ← k + 1
10: return x^(k)
Because each implementation is similar to one another, we do not provide separate al-
gorithms for GS, SOR, and SGS. The non-symmetric variants can be obtained by simply
applying only the forward or backward sweep of the SSOR method. The GS and SGS meth-
ods may be implemented with a constant factor less work than the SOR and SSOR methods,
or simply by setting ω = 1.
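A dense sketch of one SSOR application (forward then backward sweep; ssor_sweep is an illustrative name):

```python
import numpy as np

def ssor_sweep(A, b, x, omega=1.2):
    """One SSOR iteration: a forward sweep followed by a backward sweep,
    each using the most recent values of x as they are computed."""
    n = len(b)
    x = x.copy()
    for i in range(n):                 # forward sweep
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + (omega / A[i, i]) * (b[i] - sigma)
    for i in reversed(range(n)):       # backward sweep
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + (omega / A[i, i]) * (b[i] - sigma)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
for _ in range(200):                   # iterate SSOR as a stand-alone solver
    x = ssor_sweep(A, b, x)
```

Iterating the sweep on this small SPD system drives the residual to zero, illustrating the smoothing behavior described above.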
Computational Cost The computational cost per sweep of GS, SGS, SOR and SSOR
is equal to the number of non-zero values in the matrix A. There are an additional 2n
operations per sweep to compute the proportional update to the solution vector x for SOR
and SSOR. The cost of the symmetric variants are exactly twice the cost of the non-symmetric
variants, as they perform both a forward and backward sweep. These costs are given for a
single iteration of the method, and the total cost of the preconditioner is proportional to the
number of iterations applied.
Preconditioner Settings Used We run these preconditioners for all combinations of iterations in {1, 2} and relaxation parameters in {1.0, 1.2, 1.5, 1.8}. When running with a relaxation parameter of 1.0, we use a standard SGS implementation that does not perform the additional relaxation computations.
Additionally, it was shown in (Young, 1954) that, for certain matrices, there is a known optimal value of ω. Those matrices that possess Young's “Property-A” (Young, 1954) have an optimal ω parameter described by

ω = 1 + ( ρ_J / (1 + √(1 − ρ_J^2)) )^2,

where ρ_J is the spectral radius of the Jacobi iteration matrix of A. This is well-defined for matrices with ρ_J < 1, but may or may not correspond with the optimal value of ω for matrices not possessing Property-A. We compute this value for all matrices for which it is well-defined and run for 1 and 2 iterations.
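This computation can be sketched directly from the eigenvalues of the Jacobi iteration matrix (a dense sketch; optimal_omega is an illustrative name):

```python
import numpy as np

def optimal_omega(A):
    """Young's optimal SOR parameter,
    omega = 1 + (rho_J / (1 + sqrt(1 - rho_J^2)))^2,
    where rho_J is the spectral radius of the Jacobi iteration matrix."""
    J = np.eye(len(A)) - A / np.diag(A)[:, None]   # I - D^{-1} A
    rho = max(abs(np.linalg.eigvals(J)))
    if rho >= 1:
        raise ValueError("rho_J >= 1: the formula is not well-defined")
    return 1 + (rho / (1 + np.sqrt(1 - rho ** 2))) ** 2

# 1-D Poisson matrix: rho_J = cos(pi / (n + 1)), so the classical value
# 2 / (1 + sin(pi / (n + 1))) should be recovered.
n = 10
A = np.diag(np.full(n, 2.0)) \
    + np.diag(np.full(n - 1, -1.0), 1) + np.diag(np.full(n - 1, -1.0), -1)
w = optimal_omega(A)
```

The formula above is algebraically equivalent to the more familiar form ω = 2 / (1 + √(1 − ρ_J^2)).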
6.1.3 Steepest Descent
The Steepest Descent (SD) method is an iterative algorithm for solving linear systems that minimizes the residual along the direction of the gradient of the quadratic form

ϕ(x) = (1/2) xᵀAx − bᵀx.

The method is guaranteed to converge for symmetric positive definite matrices, but its convergence rate can be slow for ill-conditioned systems (Golub and Van Loan, 2013; Saad, 2003).
Given a symmetric positive definite (SPD) matrix A ∈ ℝ^{n×n}, a right-hand side vector b ∈ ℝ^n, and an initial guess x^(0), the goal is to solve the linear system Ax = b.
Algorithm Description At each iteration, SD updates the solution estimate x
(k)
by mov-
ing in the direction of the residual with optimal step size chosen to minimize the quadratic
function ϕ(x) in that direction.
This iterative process is described in Algorithm 3.
Algorithm 3 Steepest Descent Method
1: Input: SPD matrix A, right-hand side b, initial guess x^(0), and tolerance τ.
2: Output: Approximate solution x to Ax = b.
3: r^(0) ← b − Ax^(0)
4: k ← 0
5: while ∥r^(k)∥ > τ · ∥b∥ do
6:   k ← k + 1
7:   α^(k) ← ((r^(k−1))ᵀ r^(k−1)) / ((r^(k−1))ᵀ A r^(k−1))
8:   x^(k) ← x^(k−1) + α^(k) r^(k−1)
9:   r^(k) ← r^(k−1) − α^(k) A r^(k−1)
10: return x^(k)
Computational Cost Each iteration requires one matrix-vector product, two inner prod-
ucts, and two vector updates that are multiplied by the step size α. On the last update of
SD, we do not need to update the residual. Therefore the total computational cost for m
iterations is equal to

(4 · n + nnz(A)) · m − n,

where nnz(A) is the number of non-zero values in the matrix A.
Preconditioner Settings Used The only setting to tune for this preconditioner is the
number of iterations to run. We run this preconditioner for 1 and 2 iterations.
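When used as a preconditioner, a fixed number of SD iterations is applied; a minimal sketch of Algorithm 3 (steepest_descent is an illustrative name):

```python
import numpy as np

def steepest_descent(A, b, x0, iters=2):
    """Run a fixed number of Steepest Descent iterations on Ax = b,
    stepping along the residual with the optimal step size."""
    x = x0.copy()
    r = b - A @ x
    for _ in range(iters):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # optimal step along the residual
        x = x + alpha * r
        r = r - alpha * Ar
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b, np.zeros(2), iters=50)
```

Each pass costs one matrix-vector product plus a handful of vector operations, matching the cost expression above.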
6.2 Sparse Approximate Inverse
Sparse Approximate Inverse (SAI) preconditioners are based on constructing an explicit
approximation of the inverse of the matrix in a sparse format, enabling efficient application
during iterative methods (Benzi and Tuma, 1998; Grote and Huckle, 1997; Regev and Saun-
ders, 2020). Unlike classical methods or those based on Gaussian elimination (discussed in
subsection 6.3), SAI preconditioners provide an explicit matrix that approximates the inverse
and can be applied directly to the residual in each iteration (Benzi and Tuma, 1998; Grote
and Huckle, 1997). In this sense, while there is a concept of M^(−1) for the application of the preconditioner, there is less of a concept of M = (M^(−1))^(−1) itself.
The core idea behind SAI is to approximate the inverse of the matrix along some sparsity pattern such that the resulting matrix is significantly more sparse than the explicit inverse of A. Preserving sparsity in this manner while approximating the inverse, M^(−1) ≈ A^(−1), is attractive for large-scale sparse systems where the cost of applying the preconditioner is a significant factor in the overall computational cost (Grote and Huckle, 1997).
6.2.1 Symmetric Sparse Approximate Inverse
The Symmetric Sparse Approximate Inverse (SSAI) preconditioner (Regev and Saunders,
2020; Regev, 2022) builds on earlier work on SAI by introducing a symmetrizing step to the
algorithm. The SSAI algorithm seeks to minimize the difference between the identity and
the product of the preconditioner and the matrix. Formally, this is achieved by solving the
optimization problem

arg min_{M^(−1)} ∥I − M^(−1)A∥_F^2,

where ∥·∥_F denotes the Frobenius norm. After finding some such M^(−1), the SSAI preconditioner is then constructed by symmetrizing the matrix:

(M^(−1))′ = (1/2) (M^(−1) + M^(−⊤)).
This implementation can be found at https://stanford.edu/group/SOL/software/
minres/minres20.zip. This preconditioner has a fill-in multiple parameter that can be
adjusted to control the sparsity of the resulting matrix.
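The row-wise least-squares structure of this minimization can be illustrated with a small dense sketch (using the sparsity pattern of A itself as the pattern; ssai is an illustrative name and this is not the Regev-Saunders implementation):

```python
import numpy as np

def ssai(A):
    """Sketch of a sparse approximate inverse with symmetrization:
    each row m_i of M^{-1} minimizes ||e_i^T - m_i^T A||_2 over the
    sparsity pattern of row i of A, then M^{-1} <- (M^{-1} + M^{-T}) / 2."""
    n = len(A)
    Minv = np.zeros((n, n))
    for i in range(n):
        cols = np.nonzero(A[i])[0]               # allowed pattern for row i
        e = np.zeros(n)
        e[i] = 1.0
        # minimize || A[cols, :]^T m - e_i ||_2 over the free entries m
        m, *_ = np.linalg.lstsq(A[cols, :].T, e, rcond=None)
        Minv[i, cols] = m
    return 0.5 * (Minv + Minv.T)

# For a diagonal matrix the pattern is exact, so M^{-1} = A^{-1} exactly.
Minv = ssai(np.diag([2.0, 4.0, 5.0]))
```

Because the Frobenius objective decouples row by row, each row is an independent small least-squares problem, which is what makes SAI construction parallelizable.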
Computational Cost The computational cost of applying the SSAI preconditioner is equal to the number of non-zero values in the matrix M^(−1).
Preconditioner Settings Used The authors suggest using the average number of ele-
ments in a column as the level of fill allowed in the preconditioner. We run the preconditioner
with fill levels set to the average number of elements per column multiplied by the values in
{0.5, 1.0, 2.0, 3.0}.
6.3 Incomplete Factorizations
Incomplete factorizations are a class of preconditioners that approximate the factoriza-
tion of a matrix A in a sparse format, enabling efficient application during iterative meth-
ods (Benzi, 2002; Saad, 2003). The sparsity of the incomplete factors is preserved by
dropping some entries from the factorization, which can lead to a significant reduction in
the computational cost of applying the preconditioner. These are well-known to be effective
preconditioners (Benzi, 2002), are widely used in practice, and are available in many major
software packages.
Next we describe the Incomplete LU Factorization (ILU) and Incomplete Cholesky Factorization (IC) preconditioners that we consider in our study.
6.3.1 Incomplete Cholesky and Modified Incomplete Cholesky
The IC factorization approximates the exact Cholesky factorization of an SPD matrix
A ∈ ℝ^{n×n} by computing a sparse lower triangular matrix F such that

A ≈ F F^⊤.

The IC factorization retains only selected non-zero entries of the Cholesky factor along some sparsity pattern. This sparsity pattern may be defined by the structure of the matrix itself, by some power of the matrix, or by some other criteria. The primary goal with any of these criteria is to produce a factorization that is computationally less expensive to apply while capturing the essential features of the original matrix (Meijerink and van der Vorst, 1977; Saad, 2003).
The IC factorization supplied by MATLAB allows for either an IC(0) variant or one that is based solely on thresholding. In the case of the IC(0) variant, the factorization is computed without fill-in, using only the sparsity pattern of A. In the case of the threshold variant, any amount of fill-in is allowed provided that the entries are greater in magnitude than the threshold.
The IC preconditioner is applied by solving triangular systems involving F and its transpose. Given a residual vector r, the preconditioned residual is computed as

z = M^(−1) r = F^(−⊤) F^(−1) r.
The Modified Incomplete Cholesky (MIC) factorization modifies the standard IC by adjusting the diagonal entries of F to compensate for the dropped fill-in elements (Robert, 1982). This modification aims to improve the numerical stability and convergence properties of the preconditioner (Gustafsson, 1978).
The MIC factorization computes F such that

A ≈ F F^⊤ and ∥Ae − F F^⊤ e∥ ≤ ε,

where e is the vector of all ones and ε is machine precision. The diagonal modification of the MIC factorization seeks to reincorporate the magnitude of the dropped fill-in elements into the diagonal of F.
The MIC preconditioner is applied in the same manner as the IC preconditioner.
Preconditioner Settings Used We run this preconditioner with the IC(0) variant as well as with the drop threshold set to each value in {10^(−4), 10^(−5), 10^(−6), 10^(−7), 10^(−8)}.
6.3.2 Supernodal Incomplete LU Factorization
ILU approximates the exact LU factorization by retaining only selected non-zero entries
in the lower (L) and upper (U) triangular matrices:

A ≈ LU.
Note that for symmetric matrices and given a symmetric ordering, the ILU factorization
should be equivalent to the IC factorization.
We utilize the SuperLU library (Li, Saad, and Chow, 2003; Li and Shao, 2010; Demmel
et al., 1999; Li et al., 1999; Li, 2005), which provides efficient implementations of sparse
LU factorization with partial pivoting. SuperLU allows for control over fill-in levels and
drop tolerances, which can be adjusted to control the sparsity of the resulting factors. In
addition to complete LU factorizations, SuperLU provides incomplete factorizations with
varying levels of fill-in—we utilize their ILU factorization in this study. SuperLU uses a
supernodal approach to factorize the matrix, which is a modification of the standard LU
factorization that groups multiple columns together to improve cache efficiency and reduce
the number of operations required to factorize the matrix.
The SuperLU package returns a preconditioner that is not necessarily symmetric, even
with symmetric orderings applied. To obtain a symmetric preconditioner, we use only the lower triangular portion of the factorization, scaling L by the square root of the diagonal entries of U:

L′ = L D^(1/2),

where D is the diagonal matrix formed from the diagonal of U. Now we have

M = L′ (L′)^⊤,

where M is a symmetric preconditioner. Note that the lower triangular factor returned by SuperLU is scaled to have unit diagonal entries. Thus, we are adding half of the contribution of the scaling from U, such that when applying the preconditioner, we are applying the full scaling.
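The symmetrization can be illustrated with SciPy's spilu, which wraps the same SuperLU ILU (a sketch; with no dropping, natural ordering, and no pivoting on a tridiagonal SPD matrix, the factorization is exact and M recovers A):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spilu

n = 6
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()

# Natural ordering and no diagonal pivoting keep the factors aligned with A;
# drop_tol=0 makes this the exact LU for a tridiagonal matrix.
ilu = spilu(A, drop_tol=0.0, fill_factor=10.0,
            permc_spec='NATURAL', diag_pivot_thresh=0.0)

d = ilu.U.diagonal()                    # pivots of U (positive for SPD A)
Lp = ilu.L @ diags(np.sqrt(d))          # L' = L D^{1/2}
M = (Lp @ Lp.T).toarray()               # symmetric preconditioner M = L' L'^T
```

Since SuperLU's L has unit diagonal and U = D Lᵀ for a symmetric matrix without pivoting, L D^{1/2} (L D^{1/2})ᵀ = L U, which equals A here.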
Preconditioner Settings Used We run this preconditioner with all combinations of a drop tolerance in {10^(−4), 10^(−5), 10^(−6)} and fill-in levels in {1, 2, 3}.
Algorithm 4 Sparse Triangular Solve in CSR Format
Require: Lower triangular matrix L, upper triangular matrix U, right-hand side vector b
Ensure: Solution vector x satisfying LUx = b
1: x ← b
2: n ← length of x
3: for i = 1 to n do ▷ Forward Solve
4:   for j = colptr_L[i] to colptr_L[i + 1] − 2 do
5:     x_i ← x_i − nzval_L[j] · x_{rowval_L[j]}
6:   x_i ← x_i / nzval_L[colptr_L[i + 1] − 1]
7: for i = n down to 1 do ▷ Backward Solve
8:   for j = colptr_U[i] + 1 to colptr_U[i + 1] − 1 do
9:     x_i ← x_i − nzval_U[j] · x_{rowval_U[j]}
10:  x_i ← x_i / nzval_U[colptr_U[i]]
11: return x
6.3.3 Algorithm: Applying Incomplete Factorizations
Algorithm 4 outlines the procedure for solving sparse triangular systems using the com-
pressed sparse row (CSR) format.
In Algorithm 4, the CSR data structures for L and U are colptr_L, rowval_L, nzval_L
for L and colptr_U, rowval_U, nzval_U for U.
The forward solve updates the solution vector x by performing a forward substitution
with the lower triangular matrix L. Afterward, the backward solve updates x by performing
a backward substitution with the upper triangular matrix U. This implementation efficiently
applies the inverse of the incomplete factorization to the residual vector. For the incomplete
Cholesky factorization, the algorithm needs only a slight modification to account for the fact
that the second factor is the transpose of the first: a CSC variant of the backward solve is
performed on the same CSR factor that was used in the forward solve.
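The procedure transcribes directly into Python over SciPy CSR arrays, where indptr, indices, and data play the roles of colptr, rowval, and nzval (an illustrative sketch, not our benchmark implementation; it assumes sorted column indices, with the diagonal last in each row of L and first in each row of U):

```python
import numpy as np
import scipy.sparse as sp

def csr_lu_apply(L, U, b):
    """Solve L U x = b by forward then backward substitution on CSR factors."""
    x = np.asarray(b, dtype=float).copy()
    n = x.size
    Lp, Li, Lv = L.indptr, L.indices, L.data
    for i in range(n):                        # forward solve with L
        for j in range(Lp[i], Lp[i + 1] - 1):
            x[i] -= Lv[j] * x[Li[j]]
        x[i] /= Lv[Lp[i + 1] - 1]             # divide by diagonal of L (last in row)
    Up, Ui, Uv = U.indptr, U.indices, U.data
    for i in range(n - 1, -1, -1):            # backward solve with U
        for j in range(Up[i] + 1, Up[i + 1]):
            x[i] -= Uv[j] * x[Ui[j]]
        x[i] /= Uv[Up[i]]                     # divide by diagonal of U (first in row)
    return x

L = sp.csr_matrix(np.array([[2.0, 0, 0], [1, 3, 0], [0, 1, 4]]))
U = sp.csr_matrix(np.array([[1.0, 2, 0], [0, 1, 1], [0, 0, 1]]))
b = np.array([2.0, 5.0, 9.0])
x = csr_lu_apply(L, U, b)
```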
Computational Cost The computational cost of applying the incomplete factorization is
equal to the number of non-zero values in each factor that is inverted. In the IC and MIC
cases, this is two times the number of non-zero values in the lower triangular factor. In the
ILU case, this is equal to the number of non-zero values in the lower and upper triangular
factors.
6.4 Algebraic Multigrid
Algebraic Multigrid (AMG) methods are an advanced class of iterative methods that
are designed to solve large and sparse linear systems efficiently. These methods work by
constructing a hierarchy of coarser and coarser grids that approximate the original problem,
allowing for the solution to be computed at different levels of resolution (Brandt, 1986).
Unlike geometric multigrid methods, which we do not consider in this study, AMG constructs
the hierarchy based on the matrix itself, making it more general and applicable to a wider
range of problems (Brandt, 1986; Saad, 2003; Stüben, 2001; Brandt and Livne, 2011).
AMG is particularly effective for elliptic PDEs, where the matrix is symmetric positive
definite and the problem is discretized on a structured or unstructured grid. For these types
of problems, AMG methods are known to achieve optimal or nearly optimal convergence
rates (Brandt, 1986; Brandt and Livne, 2011).
6.4.1 Ruge-Stuben and Smoothed Aggregation
In this study, we focus on the Ruge-Stuben (Ruge and Stüben, 1987) AMG method and
the Smoothed Aggregation (Vaněk, Brezina, and Mandel, 2001) AMG method, which are
two popular AMG methods that are widely used in practice (Bell et al., 2023). We utilize
the PyAMG library (Bell et al., 2023) to construct the AMG preconditioners.
Computational Cost The PyAMG library provides a function to compute the complexity
of a cycle of the AMG method used. We use this as an approximation of the computational
cost of the AMG preconditioner.
Preconditioner Settings Used The Ruge-Stuben method is provided with V, W, F, and
AMLI cycles. The Smoothed Aggregation method is provided with V, W, and F cycles. We
test both methods and all cycles in our study, running for one and two cycles. We otherwise
use the default settings for the AMG preconditioner.
6.5 Graph-Laplacian Preconditioners
This subsection explores preconditioners designed specifically for graph-Laplacian matri-
ces. Although we understand that these preconditioners are not designed for general SPD
matrices, we still run them on general SPD matrices and report their results.
For general SPD matrices that are not SDD, we do not expect these preconditioners to
perform well. For matrices that are SDD when unscaled but not SDD when scaled, we run
the preconditioner on the unscaled matrix. For matrices that are SDD when scaled, we
run the preconditioner on the scaled matrix. For the remaining case in which the matrices
are not SDD when scaled or unscaled, we create the preconditioner on a matrix that was
modified to be SDD by increasing the diagonal entries of the matrix to be at least equal to
the row sum of the absolute values of the off-diagonal entries.
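The diagonal modification for this last case can be sketched as follows (make_sdd is a hypothetical helper name for illustration, not from our codebase):

```python
import numpy as np
import scipy.sparse as sp

def make_sdd(A):
    """Return a copy of A whose diagonal is raised so that each row
    satisfies a_ii >= sum_{j != i} |a_ij| (diagonal dominance)."""
    A = sp.csr_matrix(A, copy=True)
    d = A.diagonal()
    # Row sums of absolute off-diagonal entries.
    off = np.asarray(abs(A).sum(axis=1)).ravel() - np.abs(d)
    A.setdiag(np.maximum(d, off))
    return A

# Row 0 violates dominance (1 < 2 + 3); its diagonal is raised to 5.
B = sp.csr_matrix(np.array([[1.0, -2, 3], [-2, 10, 1], [3, 1, 2]]))
S = make_sdd(B)
```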
For an SDD matrix of dimension n, we can construct a graph-Laplacian matrix of dimen-
sion 2n through an augmentation described in (Kelner et al., 2013), which is an improvement
over Gremban’s reduction that would produce a matrix of dimension 2n+1 (Gremban, 1996).
For an SDD matrix A, this reduction is given by the following equation:

L = [ D + N + (1/2)S     −(P + (1/2)S)   ]
    [ −(P + (1/2)S)      D + N + (1/2)S  ],

where P contains the positive off-diagonal entries of A, N contains the negative off-diagonal
entries of A,

D = diag(Pe − Ne),

and S is the diagonal matrix with entries s_ii = a_ii − d_ii.
28
The diagonal matrix S represents the excess diagonal entries of A over the row sum of
the absolute values of the off-diagonal entries. We then distribute half of this slack to the
off-diagonal entries of the matrix and half to the diagonal entries.
Given a RHS vector b for the original system, the augmented RHS vector is given by

b′ = [ b ; −b ].

We can then solve the augmented system L x′ = b′ to obtain the solution to the original
system. The solution to the original system is then recovered by

2x = (x′)_{1:n} − (x′)_{n+1:2n}.
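A sketch of this augmentation and recovery in SciPy (our own illustration; the signs of the off-diagonal blocks are chosen so the result has zero row sums and nonpositive off-diagonals, as a graph Laplacian requires):

```python
import numpy as np
import scipy.sparse as sp

def sdd_to_laplacian(A):
    """Augment an SDD matrix A into a graph Laplacian of dimension 2n."""
    A = sp.csr_matrix(A)
    off = A - sp.diags(A.diagonal())
    P = off.maximum(0)                          # positive off-diagonal entries
    N = off.minimum(0)                          # negative off-diagonal entries
    e = np.ones(A.shape[0])
    D = sp.diags(P @ e - N @ e)                 # row sums of |off-diagonals|
    S = sp.diags(A.diagonal() - D.diagonal())   # excess diagonal (slack)
    B11 = D + N + 0.5 * S
    B12 = -(P + 0.5 * S)
    return sp.bmat([[B11, B12], [B12, B11]], format="csr")

A = sp.csr_matrix(np.array([[3.0, -1, 1], [-1, 3, 0], [1, 0, 3]]))
Lg = sdd_to_laplacian(A)
n = A.shape[0]
b = np.array([1.0, 2.0, 3.0])
bp = np.concatenate([b, -b])
# Lg is singular, but bp sums to zero, so the system is consistent; any
# solution works because the null-space component cancels in the difference.
xp = np.linalg.lstsq(Lg.toarray(), bp, rcond=None)[0]
x = 0.5 * (xp[:n] - xp[n:])
```

Since the difference of the two blocks satisfies B11 − B12 = A, subtracting the two halves of the augmented system recovers A(x′_{1:n} − x′_{n+1:2n}) = 2b, which is the recovery formula above.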
We utilize this reduction when applying the graph-Laplacian preconditioners to the gen-
eral SPD matrices. When the system matrix is itself a graph Laplacian, we use it directly.
Next, we briefly introduce the graph-Laplacian preconditioners.
6.5.1 Combinatorial Multigrid
We utilize the CombinatorialMultigrid package in Julia to construct the precondi-
tioner (Koutis, Miller, and Tolliver, 2011).²
Computational Cost To obtain the work expended by the preconditioner, we used a
modified version of the preconditioner that outputs the number of multiplications used in
each step of the cycle instead of applying the preconditioner to the matrix. A CSV file
containing the number of multiplications for each graph and matrix in the test suite will be
available on the GitHub page associated with this project when it is released.
Preconditioner Settings Used Because there are no options to tune for this precondi-
tioner, we simply run the preconditioner with the function returned by the package.
6.5.2 Laplacians: Approximate Cholesky
We utilize the Laplacians package in Julia.³ We use the approxchol_lap function
to construct the preconditioner, which is a randomized variant that constructs an LDL⊤
decomposition of a subtree of the graph (Gao, Kyng, and Spielman, 2023).
As of the writing of this study, the Laplacians package does not expose the precondi-
tioner for use in outside solvers. We worked around this by extracting the relevant portions
of the package and creating a function that applies the preconditioner to a matrix.⁴
Computational Cost We compute the work expended by the preconditioner as the num-
ber of multiplications used in inverting the LDL⊤ factorization. Note that for matrices that
have multiple connected components in the augmented graph, one LDL⊤ factorization is
computed per connected component, and the total work is the sum of the work expended
for each component.
Preconditioner Settings Used There are two parameters that can be tuned, split and
merge, which control the number of multi-edges used. We run all combinations of split and
merge values in {0, 1, 2, 3, 4, 5}.
² We use the release with SHA hash 077101c78e35095b8fc33b4625f515a2962c4cf6.
³ We use the version with SHA hash d89118f61fe4346d2401e7595c197ef3ef73d32c.
⁴ We thank the authors of the Laplacians package for their assistance in this matter.
References
P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering
algorithm. SIAM Journal on Matrix Analysis and Applications, 17 (4), p. 886–905, 1996.
doi:10.1137/s0895479894278952.
N. Bell, L. N. Olson, J. Schroder, and B. Southworth. PyAMG: Algebraic multigrid
solvers in python. Journal of Open Source Software, 8 (87), p. 5495, 2023. doi:10.21105/
joss.05495.
M. Benzi. Preconditioning techniques for large linear systems: A survey. Journal of Com-
putational Physics, 182 (2), p. 418–477, 2002. doi:10.1006/jcph.2002.7176.
M. Benzi, J. C. Haws, and M. Tuma. Preconditioning highly indefinite and nonsymmetric
matrices. SIAM Journal on Scientific Computing, 22 (4), p. 1333–1353, 2000. doi:
10.1137/s1064827599361308.
M. Benzi and M. Tuma. A sparse approximate inverse preconditioner for nonsymmetric
linear systems. SIAM Journal on Scientific Computing, 19 (3), p. 968–994, 1998. doi:
10.1137/s1064827595294691.
M. Benzi and M. Tůma. A comparative study of sparse approximate inverse precondi-
tioners. Applied Numerical Mathematics, 30 (2–3), p. 305–340, 1999. doi:10.1016/
s0168-9274(98)00118-4.
M. Benzi and M. Tůma. Numerical experiments with two approximate inverse precondition-
ers. BIT Numerical Mathematics, 38 (2), p. 234–241, 1998. doi:10.1007/bf02512364.
A. Berger, J. Mulvey, E. Rothberg, and R. Vanderbei. Solving multistage stochas-
tic programs using tree dissection. Statistics and Operations Research Research Report,
Princeton University, p. 55, 1995.
———. finan512. https://sparse.tamu.edu/Mulvey/finan512, 1997. Economic problem,
portfolio optimization. Princeton University. Retrieved from SuiteSparse (Mulvey group).
J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to
numerical computing. SIAM Review, 59 (1), pp. 65–98, 2017. doi:10.1137/141000671.
R. Bhatia. Matrix Analysis, Springer, New York, NY, 1997 edition, 1996.
D. Billger. gyro m. https://sparse.tamu.edu/Oberwolfach/gyro_m, 2004. Model re-
duction problem, butterfly gyro mass matrix. The Imego Institute. Retrieved from SuiteS-
parse (Oberwolfach group).
D. Bindel. ted b. https://sparse.tamu.edu/Bindel/ted_B, 2006. Thermal problem,
finite element mesh, coupled linear thermoelasticity equations on a bar. UC Berkeley.
Retrieved from SuiteSparse (Bindel group).
I. Botonakis. thermomech dm. https://sparse.tamu.edu/Botonakis/thermomech_dM,
2009a. Thermal problem, deformation of a steel cylinder. Retrieved from SuiteSparse
(Botonakis group).
———. thermomech tc. https://sparse.tamu.edu/Botonakis/thermomech_TC, 2009b.
Thermal problem, temperature of a steel cylinder. Retrieved from SuiteSparse (Botonakis
group).
A. Brandt. Algebraic multigrid theory: The symmetric case. Applied Mathematics and
Computation, 19 (1–4), p. 23–56, 1986. doi:10.1016/0096-3003(86)90095-0.
A. Brandt and O. E. Livne. Multigrid techniques, Society for Industrial and Applied
Mathematics, 2011.
W. L. Briggs. A multigrid tutorial, Society for Industrial and Applied Mathematics,
Philadelphia, Pa., 2 edition, 2000.
consph. consph. https://sparse.tamu.edu/Williams/consph, 2008. 2D/3D Problem.
Retrieved from SuiteSparse (Williams group).
A. Cunningham. qa8fm. https://sparse.tamu.edu/Cunningham/qa8fm, 2002. Acoustics
problem, 3D acoustic finite element mass matrix. Vibro-Acoustic Sciences, Inc. Retrieved
from SuiteSparse (Cunningham group).
E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In Pro-
ceedings of the 1969 24th national conference. 1969. doi:10.1145/800195.805928.
M. Dabrowski, M. Krotkiewski, and D. W. Schmid. Milamin: Matlab-based finite
element method solver for large problems. Geochemistry, Geophysics, Geosystems, 9 (4),
2008. doi:10.1029/2007gc001719.
C. Damhaug. ship 001. https://sparse.tamu.edu/DNVS/ship_001, 1999a. Structural
problem, ship structure. DNV Software. Retrieved from SuiteSparse (DNVS group).
———. ship 003. https://sparse.tamu.edu/DNVS/ship_003, 1999b. Structural problem,
ship structure from a production run. DNV Software. Retrieved from SuiteSparse (DNVS
group).
———. shipsec1. https://sparse.tamu.edu/DNVS/shipsec1, 1999c. Structural problem,
ship section/detail from a production run. DNV Software. Retrieved from SuiteSparse
(DNVS group).
———. shipsec5. https://sparse.tamu.edu/DNVS/shipsec5, 1999d. PARASOL Ship
section. DNV Software. Retrieved from SuiteSparse (DNVS group).
———. shipsec8. https://sparse.tamu.edu/DNVS/shipsec8, 1999e. Structural problem,
ship section/detail from a production run. DNV Software. Retrieved from SuiteSparse
(DNVS group).
T. A. Davis. Direct Methods for Sparse Linear Systems, Society for Industrial and Applied
Mathematics, 2006a. doi:10.1137/1.9780898718881.
———. Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Math-
ematics, 2006b. doi:10.1137/1.9780898718881.ch7.
T. A. Davis and Y. Hu. The university of florida sparse matrix collection. ACM Transac-
tions on Mathematical Software, 38 (1), p. 1–25, 2011. doi:10.1145/2049662.2049663.
R. Dembo and U. Tulowitzki. obstclae. https://sparse.tamu.edu/GHS_psdef/
obstclae, 1983. Optimization problem, quadratic obstacle problem. CUTEr. Retrieved
from SuiteSparse (GHS psdef group).
J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu. A supernodal
approach to sparse partial pivoting. SIAM J. Matrix Analysis and Applications, 20 (3),
pp. 720–755, 1999.
E. D. Dolan and J. J. Moré. minsurfo. https://sparse.tamu.edu/GHS_psdef/minsurfo,
2000. Optimization problem, minimum surface problem. CUTEr. Retrieved from SuiteS-
parse (GHS psdef group).
Dominique, A. Montoison, A. S. Siqueira, JSOBot, M. Toukal, tmigot, E. Saba,
J. TagBot, T. Kelman, and Y. Masuda. Juliasmoothoptimizers/amd.jl: v0.5.3 . https:
//zenodo.org/doi/10.5281/zenodo.3381898, 2023. doi:10.5281/ZENODO.3381898.
L. Dubcova, J. Cerveny, and P. Solin. Dubcova1. https://sparse.tamu.edu/UTEP/
Dubcova1, 2007a. 2D/3D Problem from PDE solver. University of Texas at El Paso.
Retrieved from SuiteSparse (UTEP group).
———. Dubcova2. https://sparse.tamu.edu/UTEP/Dubcova2, 2007b. 2D/3D Problem
from PDE solver. University of Texas at El Paso. Retrieved from SuiteSparse (UTEP
group).
———. Dubcova3. https://sparse.tamu.edu/UTEP/Dubcova3, 2007c. 2D/3D Problem
from PDE solver. University of Texas at El Paso. Retrieved from SuiteSparse (UTEP
group).
P. F. Dubois, A. Greenbaum, and G. H. Rodrigue. Approximating the inverse of a matrix
for use in iterative algorithms on vector processors. Computing, 22 (3), p. 257–268, 1979.
doi:10.1007/bf02243566.
I. S. Duff, R. G. Grimes, and J. G. Lewis. Sparse matrix test problems. ACM Transactions
on Mathematical Software, 15 (1), p. 1–14, 1989. doi:10.1145/62038.62043.
S. B. Elechi. Benelechi1. https://sparse.tamu.edu/BenElechi/BenElechi1, 2007.
2D/3D Problem. Arcelor, Inc. Retrieved from SuiteSparse (BenElechi group).
M. Ferronato, G. Gambolati, C. Janna, and P. Teatini. Numerical modelling of
regional faults in land subsidence prediction above gas/oil reservoirs. International Journal
for Numerical and Analytical Methods in Geomechanics, 32 (6), p. 633–657, 2007. doi:
10.1002/nag.640.
———. Geomechanical issues of anthropogenic co2 sequestration in exploited gas fields.
Energy Conversion and Management, 51 (10), p. 1918–1928, 2010. doi:10.1016/j.
enconman.2010.02.024.
M. Ferronato, C. Janna, and G. Gambolati. Mixed constraint preconditioning in com-
putational contact mechanics. Computer Methods in Applied Mechanics and Engineering,
197 (45–48), p. 3922–3931, 2008. doi:10.1016/j.cma.2008.03.008.
M. Ferronato, C. Janna, and G. Pini. Shifted fsai preconditioners for the efficient par-
allel solution of non-linear groundwater flow models. International Journal for Numerical
Methods in Engineering, 89 (13), p. 1707–1719, 2011. doi:10.1002/nme.3309.
Y. Gao, R. Kyng, and D. A. Spielman. Robust and practical solution of laplacian equations
by approximate elimination. 2023. doi:10.48550/ARXIV.2303.00709.
J. E. Gentle. Random number generation and Monte Carlo methods, Springer, New York,
NY, 2 edition, 2003.
T. George, A. Gupta, and V. Sarin. An empirical analysis of the performance of precon-
ditioners for spd systems. ACM Transactions on Mathematical Software, 38 (4), p. 1–30,
2012. doi:10.1145/2331130.2331132.
A. Ghai, C. Lu, and X. Jiao. A comparison of preconditioned krylov subspace methods for
large-scale nonsymmetric linear systems. 2016. doi:10.48550/ARXIV.1607.00351.
G. H. Golub and C. F. Van Loan. Matrix Computations, 4th Edition, Johns Hopkins
University Press, Baltimore, MD, 2013. doi:10.1137/1.9781421407944.
N. Gould. cvxbqp1. https://sparse.tamu.edu/GHS_psdef/cvxbqp1, 1995. Optimization
problem, barrier Hessian from convex QP. CUTEr. Retrieved from SuiteSparse (GHS psdef
group).
A. Greenbaum. Iterative Methods for Solving Linear Systems, Society for Industrial and
Applied Mathematics, 1997. doi:10.1137/1.9781611970937.
K. D. Gremban. Combinatorial Preconditioners for Sparse, Symmetric Diagonally Domi-
nant Linear Systems. Ph.D. thesis, Carnegie Mellon University, 1996.
R. Grimes. crystm02. https://sparse.tamu.edu/Boeing/crystm02, 1995a. Materials
problem, finite element mesh free vibration mass matrix. Boeing. Retrieved from SuiteS-
parse (Boeing group).
———. crystm03. https://sparse.tamu.edu/Boeing/crystm03, 1995b. Materials prob-
lem, finite element mesh free vibration mass matrix. Boeing. Retrieved from SuiteSparse
(Boeing group).
M. J. Grote and T. Huckle. Parallel preconditioning with sparse approximate in-
verses. SIAM Journal on Scientific Computing, 18 (3), p. 838–853, 1997. doi:10.1137/
s1064827594276552.
I. Gustafsson. A class of first order factorization methods. BIT, 18 (2), p. 142–156, 1978.
doi:10.1007/bf01931691.
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems.
Journal of Research of the National Bureau of Standards, 49 (6), p. 409, 1952. doi:
10.6028/jres.049.044.
J. Inca, A. Davis, and A. Dubas. Benchmarking preconditioners. Technical Report
UKAEA-STEP-PR(23)02, United Kingdom Atomic Energy Authority, 2021.
D. Isaak. tmt sym. https://sparse.tamu.edu/CEMW/tmt_sym, 2008. Electromagnetics
problem. Retrieved from SuiteSparse (CEMW group).
C. Janna, A. Comerlati, and G. Gambolati. A comparison of projective and direct
solvers for finite elements in elastostatics. Advances in Engineering Software, 40 (8), p.
675–685, 2009. doi:10.1016/j.advengsoft.2008.11.010.
C. Janna and M. Ferronato. Adaptive pattern research for block fsai precondition-
ing. SIAM Journal on Scientific Computing, 33 (6), p. 3357–3380, 2011a. doi:
10.1137/100810368.
———. Emilia 923. https://sparse.tamu.edu/Janna/Emilia_923, 2011c. Structural
problem, geomechanical model for CO2 sequestration. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Fault 639. https://sparse.tamu.edu/Janna/Fault_639, 2011d. Structural prob-
lem, contact mechanics for model of a faulted gas reservoir. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Flan 1565. https://sparse.tamu.edu/Janna/Flan_1565, 2011e. Structural prob-
lem, 3D model of a steel flange, hexahedral finite elements. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Geo 1438. https://sparse.tamu.edu/Janna/Geo_1438, 2011f. Structural prob-
lem, geomechanical model of earth crust with underground deformation. University of
Padova. Retrieved from SuiteSparse (Janna group).
———. Hook 1498. https://sparse.tamu.edu/Janna/Hook_1498, 2011g. Structural prob-
lem, 3D model of a steel hook with tetrahedral finite elements. University of Padova.
Retrieved from SuiteSparse (Janna group).
———. Serena. https://sparse.tamu.edu/Janna/Serena, 2011h. Structural problem, gas
reservoir simulation for CO2 sequestration. University of Padova. Retrieved from SuiteS-
parse (Janna group).
———. Stocf-1465. https://sparse.tamu.edu/Janna/StocF-1465, 2011i. Computational
fluid dynamics, flow in porous medium with stochastic permeabilities. University of Padova.
Retrieved from SuiteSparse (Janna group).
———. Bump 2911. https://sparse.tamu.edu/Janna/Bump_2911, 2014a. 2D/3D Prob-
lem, 3D geomechanical reservoir simulation. University of Padova. Retrieved from SuiteS-
parse (Janna group).
———. Pflow 742. https://sparse.tamu.edu/Janna/PFlow_742, 2014b. 2D/3D Prob-
lem, 3D pressure-temperature evolution in porous media. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Queen 4147. https://sparse.tamu.edu/Janna/Queen_4147, 2014c. 2D/3D Prob-
lem, 3D structural problem. University of Padova. Retrieved from SuiteSparse (Janna
group).
C. Janna, M. Ferronato, and G. Gambolati. Multi-level incomplete factorizations
for the iterative solution of non-linear finite element problems. International Journal for
Numerical Methods in Engineering, 80 (5), p. 651–670, 2009. doi:10.1002/nme.2664.
———. A block fsai-ilu parallel preconditioner for symmetric positive definite linear sys-
tems. SIAM Journal on Scientific Computing, 32 (5), p. 2468–2484, 2010. doi:
10.1137/090779760.
———. Enhanced block fsai preconditioning using domain decomposition techniques. SIAM
Journal on Scientific Computing, 35 (5), p. S229–S249, 2013. doi:10.1137/120880860.
———. The use of supernodes in factored sparse approximate inverse preconditioning. SIAM
Journal on Scientific Computing, 37 (1), p. C72–C94, 2015. doi:10.1137/140956026.
J. A. Kelner, L. Orecchia, A. Sidford, and Z. A. Zhu. A simple, combinatorial algo-
rithm for solving sdd systems in nearly-linear time. 2013. doi:10.48550/ARXIV.1301.
6628.
R. Kouhia. cbuckle. https://sparse.tamu.edu/TKK/cbuckle, 2008a. Structural problem,
compressed cylindrical shell buckling. Helsinki University of Technology. Retrieved from
SuiteSparse (TKK group).
———. smt. https://sparse.tamu.edu/TKK/smt, 2008b. Structural problem, 3D model,
thermal stress analysis of surface mounted transistor. Helsinki University of Technology.
Retrieved from SuiteSparse (TKK group).
I. Koutis, G. L. Miller, and D. Tolliver. Combinatorial preconditioners and multilevel
solvers for problems in computer vision and image processing. Computer Vision and Image
Understanding, 115 (12), p. 1638–1646, 2011. doi:10.1016/j.cviu.2011.05.013.
K. Leppkes and U. Naumann. shallow water1. https://sparse.tamu.edu/MaxPlanck/
shallow_water1, 2009a. Computational fluid dynamics problem, shallow water modelling.
Max-Planck Institute of Meteorology. Retrieved from SuiteSparse (MaxPlanck group).
———. shallow water2. https://sparse.tamu.edu/MaxPlanck/shallow_water2, 2009b.
Computational fluid dynamics problem, shallow water modelling. Max-Planck Institute of
Meteorology. Retrieved from SuiteSparse (MaxPlanck group).
N. Li, Y. Saad, and E. Chow. Crout versions of ilu for general sparse matrices. SIAM Jour-
nal on Scientific Computing, 25 (2), p. 716–728, 2003. doi:10.1137/s1064827502405094.
X. Li, J. Demmel, J. Gilbert, L. Grigori, M. Shao, and I. Yamazaki. SuperLU
Users’ Guide. Technical Report LBNL-44289, Lawrence Berkeley National Labora-
tory, 1999. https://portal.nersc.gov/project/sparse/superlu/ug.pdf Last update:
June 2018.
X. S. Li. An overview of SuperLU: Algorithms, implementation, and user interface. ACM
Transactions on Mathematical Software, 31 (3), pp. 302–325, 2005.
X. S. Li and M. Shao. A supernodal approach to incomplete LU factorization with partial
pivoting. ACM Trans. Mathematical Software, 37 (4), 2010.
M. Lourakis. bundle1. https://sparse.tamu.edu/Lourakis/bundle1, 2006. Computer
graphics/vision problem, sparse bundle adjustment, 3D vision. Foundation for Research
and Technology Hellas Institute of Computer Science. Retrieved from SuiteSparse
(Lourakis group).
S. Mayer. audikw 1. https://sparse.tamu.edu/GHS_psdef/audikw, 2004a. Structural
problem. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. crankseg 1. https://sparse.tamu.edu/GHS_psdef/crankseg_1, 2004b. Structural
problem. DNV Software. Retrieved from SuiteSparse (GHS psdef group).
———. crankseg 2. https://sparse.tamu.edu/GHS_psdef/crankseg_2, 2004c. Structural
problem. DNV Software. Retrieved from SuiteSparse (GHS psdef group).
———. inline 1. https://sparse.tamu.edu/GHS_psdef/inline_1, 2004d. Structural
problem. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
M. Mazaheri. bundle adj. https://sparse.tamu.edu/Mazaheri/bundle_adj, 2015. Com-
puter vision problem, sparse bundle adjustment. University of Calgary. Retrieved from
SuiteSparse (Mazaheri group).
B. McRae. ecology2. https://sparse.tamu.edu/McRae/ecology2, 2008. Landscape ecol-
ogy problem. National Center for Ecological Analysis and Synthesis. Retrieved from
SuiteSparse (McRae group).
J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems
of which the coefficient matrix is a symmetric m-matrix . Mathematics of Computation,
31 (137), p. 148–162, 1977. doi:10.1090/s0025-5718-1977-0438681-4.
J. Moré and G. Toraldo. jnlbrng1. https://sparse.tamu.edu/GHS_psdef/jnlbrng1,
1991. Optimization problem, quadratic journal bearing problem. CUTEr. Retrieved from
SuiteSparse (GHS psdef group).
NASA. nasasrb. https://sparse.tamu.edu/Pothen/nasasrb, 1995. Structural problem, shut-
tle rocket booster. NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy4. https://sparse.tamu.edu/Pothen/bodyy4, 2003a. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy5. https://sparse.tamu.edu/Pothen/bodyy5, 2003b. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy6. https://sparse.tamu.edu/Pothen/bodyy6, 2003c. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
U. Okuyucu. G2 circuit. https://sparse.tamu.edu/AMD/G2_circuit, 2006a. Circuit
simulation matrix. AMD, Inc. Retrieved from SuiteSparse (AMD group).
———. G3 circuit. https://sparse.tamu.edu/AMD/G3_circuit, 2006b. Circuit simulation
matrix. AMD, Inc. Retrieved from SuiteSparse (AMD group).
S. K. Park and K. W. Miller. Random number generators: good ones are hard to find.
Communications of the ACM, 31 (10), p. 1192–1201, 1988. doi:10.1145/63039.63042.
W. H. Payne, J. R. Rabung, and T. P. Bogyo. Coding the lehmer pseudo-random number
generator. Communications of the ACM, 12 (2), p. 85–86, 1969. doi:10.1145/362848.
362860.
J. W. Pearson and J. Pestana. Preconditioners for krylov subspace methods: An overview.
GAMM-Mitteilungen, 43 (4), 2020. doi:10.1002/gamm.202000015.
S. Regev. Preconditioning Techniques for Sparse Linear Systems, Stanford University, 2022.
S. Regev and M. A. Saunders. SSAI: A Symmetric Sparse Approximate Inverse Precondi-
tioner for the Conjugate Gradient Methods PCG and PCGLS . Technical report, Stanford
University, 2020.
Y. Robert. Regular incomplete factorizations of real positive definite matrices. Linear Al-
gebra and its Applications, 48, p. 105–117, 1982. doi:10.1016/0024-3795(82)90101-x.
E. Rudnyi. t2dah e. https://sparse.tamu.edu/Oberwolfach/t2dah_e, 2004. Model re-
duction problem, micropyros thruster. University of Freiburg. Retrieved from SuiteSparse
(Oberwolfach group).
E. Rudnyi, J. Korvink, and B. van Rietbergen. bone010. https://sparse.tamu.
edu/Oberwolfach/bone010, 2006a. Model reduction problem. Retrieved from SuiteSparse
(Oberwolfach group).
———. bones01. https://sparse.tamu.edu/Oberwolfach/boneS01, 2006b. Model reduc-
tion problem. Retrieved from SuiteSparse (Oberwolfach group).
———. bones10. https://sparse.tamu.edu/Oberwolfach/boneS10, 2006c. Model reduc-
tion problem. Retrieved from SuiteSparse (Oberwolfach group).
E. Rudnyi and B. Van Rietbergen. Efficient harmonic simulation of a trabecular bone
finite element model by means of model reduction. In Proceedings of the 12th The Finite
Element Method in Biomedical Engineering, Biomechanics and Related Fields workshop
(FEM) 20-21 july 2005, Ulm, Germany, pp. 61–68. 2005.
J. W. Ruge and K. Stüben. 4. Algebraic Multigrid, p. 73–130. Society for Industrial and
Applied Mathematics, 1987. doi:10.1137/1.9781611971057.ch4.
Y. Saad. Ilut: A dual threshold incomplete lu factorization. Numerical Linear Algebra with
Applications, 1 (4), p. 387–402, 1994. doi:10.1002/nla.1680010405.
———. Iterative methods for sparse linear systems, SIAM, Philadelphia, PA, 2 edition,
2003.
O. Schenk. af shell3. https://sparse.tamu.edu/Schenk_AFE/af_shell3, 2003a. Struc-
tural problem, sheet metal forming structural problem series. University of Basel. Retrieved
from SuiteSparse (Schenk AFE group).
———. af shell7. https://sparse.tamu.edu/Schenk_AFE/af_shell7, 2003b. Structural
problem, sheet metal forming structural problem series. University of Basel. Retrieved
from SuiteSparse (Schenk AFE group).
———. af 0 k101. https://sparse.tamu.edu/Schenk_AFE/af_0_k101, 2006a. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 1 k101. https://sparse.tamu.edu/Schenk_AFE/af_1_k101, 2006b. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 2 k101. https://sparse.tamu.edu/Schenk_AFE/af_2_k101, 2006c. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 3 k101. https://sparse.tamu.edu/Schenk_AFE/af_3_k101, 2006d. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 4 k101. https://sparse.tamu.edu/Schenk_AFE/af_4_k101, 2006e. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 5 k101. https://sparse.tamu.edu/Schenk_AFE/af_5_k101, 2006f. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
D. Schmid. thermal1. https://sparse.tamu.edu/Schmid/thermal1, 2006a. Thermal
problem, unstructured finite element mesh, steady state problem. University of Oslo. Re-
trieved from SuiteSparse (Schmid group).
———. thermal2. https://sparse.tamu.edu/Schmid/thermal2, 2006b. Thermal problem,
unstructured finite element mesh, steady state problem. University of Oslo. Retrieved from
SuiteSparse (Schmid group).
F. Shakib. Pres poisson. https://sparse.tamu.edu/ACUSIM/Pres_Poisson, 2004. Com-
putational fluid dynamics problem. ACUSIM, Inc. Retrieved from SuiteSparse (ACUSIM
group).
B. Smith, P. E. Bjorstad, and W. Gropp. Domain decomposition, Cambridge University
Press, Cambridge, England, 2004.
K. Stüben. A review of algebraic multigrid, p. 331–359. Elsevier, 2001. doi:10.1016/
b978-0-444-50617-7.50015-x.
N. Trefethen. Trefethen 20000. https://sparse.tamu.edu/JGD_Trefethen/
Trefethen_20000, 2008a. Combinatorial Problem, diagonal matrices with primes. Ox-
ford University. Retrieved from SuiteSparse (JGD Trefethen group).
———. Trefethen 20000b. https://sparse.tamu.edu/JGD_Trefethen/Trefethen_
20000b, 2008b. Combinatorial Problem, diagonal matrices with primes. Oxford University.
Retrieved from SuiteSparse (JGD Trefethen group).
E. Um. 2cubes sphere. https://sparse.tamu.edu/Um/2cubes_sphere, 2008. Electromag-
netics problem, finite element mesh of 2 cubes in a sphere. Stanford. Retrieved from
SuiteSparse (Um group).
———. offshore. https://sparse.tamu.edu/Um/offshore, 2010. Computational fluid
dynamics, 3D Finite element mesh. Stanford. Retrieved from SuiteSparse (Um group).
P. Vaněk, M. Brezina, and J. Mandel. Convergence of algebraic multigrid based on
smoothed aggregation. Numerische Mathematik, 88 (3), p. 559–579, 2001. doi:10.1007/
s211-001-8015-y.
A. Wathen. wathen100. https://sparse.tamu.edu/GHS_psdef/wathen100, 2004a. Ran-
dom 2D/3D Problem. Oxford University. Retrieved from SuiteSparse (GHS psdef group).
———. wathen120. https://sparse.tamu.edu/GHS_psdef/wathen120, 2004b. Random
2D/3D Problem. Oxford University. Retrieved from SuiteSparse (GHS psdef group).
J. Weiher. ldoor. https://sparse.tamu.edu/GHS_psdef/ldoor, 2004a. Structural prob-
lem, large size door. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. oilpan. https://sparse.tamu.edu/GHS_psdef/oilpan, 2004b. Structural problem.
project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. msdoor. https://sparse.tamu.edu/INPRO/msdoor, 2005. Structural problem,
medium size door. project PARASOL. Retrieved from SuiteSparse (INPRO group).
———. hood. https://sparse.tamu.edu/GHS_psdef/hood. Structural Problem. project
PARASOL. Retrieved from SuiteSparse (GHS psdef group).
M. Will. bcsstk17. https://sparse.tamu.edu/HB/bcsstk17, 1985a. Structural problem,
stiffness matrix, elevated pressure vessel. Georgia Institute of Technology. Retrieved from
SuiteSparse (HB group).
———. bcsstk18. https://sparse.tamu.edu/HB/bcsstk18, 1985b. Structural problem,
stiffness matrix, nuclear power station. Georgia Institute of Technology. Retrieved from
SuiteSparse (HB group).
P. Wissgott. parabolic fem. https://sparse.tamu.edu/Wissgott/parabolic_fem, 2007.
Computational fluid dynamics problem. Vienna University of Technology. Retrieved from
SuiteSparse (Wissgott group).
D. M. Young. The rate of convergence of an improved iterative method for solving the
finite difference analogue of the dirichlet problem. In BULLETIN OF THE AMERICAN
MATHEMATICAL SOCIETY, pp. 345–345. 1950.
———. Iterative methods for solving partial difference equations of elliptic type. Transactions
of the American Mathematical Society, 76, pp. 92–111, 1954.