An Empirical Study of Conjugate Gradient
Preconditioners for Solving Symmetric Positive Definite
Systems of Linear Equations
Marc A. Tunnell
David F. Gleich
Despite hundreds of papers on preconditioned linear systems of equations, there re-
mains a significant lack of comprehensive performance benchmarks comparing various
preconditioners for solving symmetric positive definite (SPD) systems. In this paper,
we present a preliminary comparative study of 79 matrices using a broad range of
preconditioners. Specifically, we evaluate 10 widely used preconditoners across 92 con-
figurations to assess their relative performance against using no preconditioner. Our
focus is on preconditioners that are commonly used in practice, are available in major
software packages, and can be utilized as black-box tools without requiring significant
a priori knowledge. In addition, we compare these against a selection of classical meth-
ods. We do not consider the time and effort needed to compute the preconditioner.
Our results show that symmetric positive definite systems are most likely to benefit from incomplete symmetric factorizations, such as incomplete Cholesky. Multigrid
methods occasionally do exceptionally well. Simple classical techniques, symmetric
Gauss-Seidel and symmetric SOR, are not productive.
TODO: Finish abstract with the final results. This work is part of a long-term
study, and we plan to continue development to expand the scope of the benchmark
and refine our analysis in ongoing work.
Marc A. Tunnell, Purdue University, mtunnell@purdue.edu
David F. Gleich, Purdue University, dgleich@purdue.edu

Contents
1 Introduction
2 Results
2.1 Black-Box Preconditioner Performance
2.2 Preconditioner Performance Profiles
2.3 Impact of Ordering on Preconditioner Performance
3 Key Findings
3.1 Ongoing Work
3.2 Limitations
4 Background and Related Work
4.1 Krylov Subspace Methods
4.2 Preconditioned Conjugate Gradient
4.3 Preconditioned Minimum Residual Method
5 Systems of Linear Equations
5.1 Matrix Ordering
5.2 Symmetric Positive Definite Matrices
5.3 Graph-Laplacian Matrices
5.4 Right-Hand Side Vector
6 Preconditioners
6.1 Classical Methods
6.2 Sparse Approximate Inverse
6.3 Incomplete Factorizations
6.4 Algebraic Multigrid
6.5 Graph-Laplacian Preconditioners
1 Introduction
We consider using preconditioned iterative methods to solve Ax = b where A is a large,
sparse, symmetric, and positive definite matrix. We assume the matrix data is available in
a standard sparse data structure such as compressed sparse row or similar. Hence, we are
not treating matrix-free or operator methods. It is well known that the convergence rate
of iterative solvers heavily depends on the condition number of the matrix (Greenbaum,
1997). To improve the condition number, preconditioners are used to transform the original
system into one with more favorable spectral properties, thereby accelerating the rate of
convergence as a function of the number of iterations (Greenbaum, 1997; Saad, 2003; Golub
and Van Loan, 2013; Benzi, 2002).
Over the past decades, the world of preconditioners has grown. It now includes rela-
tively simple techniques such as symmetric scaling (Saad, 2003), iterative methods such as
symmetric successive over-relaxation (SSOR) (Golub and Van Loan, 2013; Young, 1954),
and incomplete factorizations (Benzi, 2002; Meijerink and van der Vorst, 1977; Saad, 1994;
Li, Saad, and Chow, 2003). On the other end of the spectrum, multigrid methods (Briggs,
2000), algebraic multigrid (AMG) (Brandt and Livne, 2011), and domain decomposition
based methods (Smith, Bjorstad, and Gropp, 2004) represent more complicated techniques
that promise even faster convergence.
Despite an abundance of research on novel preconditioners for symmetric, positive definite systems, including reviews of their theoretical performance (Benzi, 2002; Stüben,
2001; Pearson and Pestana, 2020), there is no comprehensive performance benchmark that
provides an empirical comparison of the performance of these preconditioners.
Those comparisons that exist are often limited to only a few matrices as in both Benzi
(2002) and Stüben (2001). This is also the case in most papers proposing new precondition-
ers. Several studies have attempted a comprehensive empirical evaluation of preconditioners
for SPD systems, but these have notable limitations or otherwise fill a different gap than
addressed in this study. For instance, George, Gupta, and Sarin (2012) conducted an em-
pirical analysis of preconditioners for SPD systems. However, their study was limited to
30 matrices and focused on a few well-known software libraries. Furthermore, the types of
preconditioners tested were not unique across these libraries; for instance, they evaluated
ILU preconditioners from each library, resulting in significant overlap in the preconditioner
classes examined. Similarly, Inca, Davis, and Dubas (2021) conducted a benchmarking study
on a narrow set of preconditioners and limited their testing set to PDEs.
In Benzi, Haws, and Tuma (2000), the authors provide an empirical evaluation of pre-
conditioners for non-symmetric matrices, while (Ghai, Lu, and Jiao, 2016) provides a more
up-to-date empirical evaluation for large-scale non-symmetric systems. In Benzi and Tůma
(1998) and Benzi and Tůma (1999), sparse approximate inverse preconditioners are empirically analyzed and validated against the standard preconditioning techniques of the time.
However, these analyses are focused on non-symmetric matrices and, as such, lack the cov-
erage of SPD systems that we seek.
These limitations highlight the need for a comprehensive performance benchmark of SPD
systems across a wide range of preconditioners and matrices. Such a benchmark would help
establish clearer guidelines on the applicability and efficiency of different preconditioners
across a wide range of real-world SPD systems.
In this study, we aim to address this gap by providing a thorough performance evaluation
of a wide range of preconditioner classes on 79 SPD matrices from the SuiteSparse Matrix
Collection (Davis and Hu, 2011). Our focus is on preconditioners that are widely used in
practice, that are available in large software packages, and may be utilized as a black-box
without requiring significant a priori knowledge of the underlying properties of the system.
We choose these types of preconditioners to ensure the relevance of this study to the broader
scientific community.
Specifically, we test preconditioners from the classes outlined in Table 1 and compare
their performance against the unpreconditioned, symmetrically diagonally scaled case.
Preconditioner Class: Preconditioners
Classical Methods: Symmetric Gauss-Seidel, SSOR, Truncated Neumann, Steepest Descent
Sparse Approximate Inverse: SPAI
Incomplete Factorizations: Incomplete Cholesky (IC), Modified IC, Incomplete LU
Algebraic Multigrid: Ruge-Stuben, Smoothed Aggregation
Graph-Laplacian Preconditioners: Combinatorial Multigrid, Approximate Cholesky
Table 1: Classes of preconditioners evaluated in this study.
In section 6, we provide more details on the exact preconditioners used, their properties
and workload, and the software packages in which they are implemented. We utilize the
preconditioned CG (PCG) method as our solver, specifically the Hestenes-Stiefel formulation (Hestenes and Stiefel, 1952; Golub and Van Loan, 2013). In a future iteration of this
manuscript, we will include the results of the Preconditioned Minimum Residual (PMR)
method.
We aim to eliminate as many extraneous variables as possible, focusing solely on compar-
ing the effectiveness of the preconditioners in reducing the overall computational workload.
Thus, we study and compare preconditioners in terms of floating point operations needed
to converge to a given tolerance. This approach excludes the time needed to construct the
preconditioner, which can be significant, but crucially, it removes variance from differences
in implementation efficiency. While floating point operations are only a proxy for the more
desirable measure of computation time, the idea here is to study and assess preconditioner
effectiveness independent of hardware and software details and optimizations.
2 Results
We split this section into three parts: the first presents results for black-box preconditioners on a selection of test matrices, the second presents performance profiles, and the
third analyzes the impact of reordering schemes on optimal performance.
Throughout this section, preconditioner groups, which represent all tested configurations
for a specific preconditioner, are sometimes combined where appropriate (e.g., Symmetric
Gauss-Seidel and SSOR). The work of a preconditioner (or of the solver without precondi-
tioning) is measured as the number of iterations multiplied by the matrix’s non-zero elements,
plus the preconditioner’s contribution per iteration. For more information on how this work
is calculated or estimated for each preconditioner, see the relevant subsection in section 6.
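As a sketch of this accounting, the work metric described above can be computed as follows; the function name and all numbers here are made up for illustration and are not results from the study:

```python
def solver_work(iterations, nnz, precond_cost_per_iter=0):
    """Work proxy: iterations times (matrix non-zeros plus the
    preconditioner's per-iteration contribution), counting only
    these dominant flop terms."""
    return iterations * (nnz + precond_cost_per_iter)

# Hypothetical example: 500 unpreconditioned iterations versus 60
# preconditioned iterations whose extra solves cost 2.4e6 flops each.
control = solver_work(500, 1_000_000)
ic = solver_work(60, 1_000_000, 2_400_000)
speedup = control / ic  # the kind of factor reported in the profiles below
```

The point of the proxy is visible in the example: even though each preconditioned iteration is more than three times as expensive, the large drop in iteration count still yields a net reduction in work.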
In this version of the manuscript, we elect not to show results for the Combinatorial
Multigrid (CMG) preconditioner as it performed well on exactly zero matrices in the test
set. We note that the CMG preconditioner is designed for graph-Laplacian matrices and is
theoretically expected to perform similarly on diagonally dominant matrices. However, our
current test set omits the graph-Laplacian matrices and contains only a limited number of
diagonally dominant matrices. We also do not show results for Steepest Descent as it
performed poorly on all matrices in the test set. Additionally, we have tested SOR and
Gauss-Seidel and found that they do not perform well on symmetric problems.
2.1 Black-Box Preconditioner Performance
This subsection presents figures illustrating the convergence of the PCG method with
various preconditioners applied to individual matrices. Each figure displays a grid of sub-
plots, one for each preconditioner group, where the relative residual is plotted against the
work expended. The control case (no preconditioner) is shown in blue for reference, the
preconditioner of focus is highlighted in red, and all other preconditioners are shown in gray.
We start by showing a selection of results for crankseg_1, a matrix from structural
analysis that models a crankshaft (Mayer, 2004b). This matrix has dimension 52,804 and
has 10,614,210 non-zero elements. We compute the condition number of the symmetrically
scaled matrix to a tolerance of 10^-8 and find it to be not yet computed. This matrix is
not diagonally dominant and has an average degree of 201.01, which is relatively high. The
results for this matrix can be seen in Figure 1.
From the figure, we see that the IC and Modified IC preconditioners are the most effective,
[Figure 1 shows a grid of convergence plots (relative residual, from 10^0 down to 10^-10, versus work) with panels: SGS & SSOR; Truncated Neumann; Symmetric SPAI; AMG Ruge-Stuben; AMG Smoothed Aggregation; Laplacians Cholesky; SuperLU (LLt); Incomplete Cholesky (IC); Modified IC; Control.]
Figure 1: Convergence of the PCG method with various preconditioners applied to the
crankseg_1 matrix. The graphs in this figure are plotted with a log-log scale.
reducing the computational cost in the best case by one order of magnitude. The AMG and
Symmetric SPAI preconditioners yield minor reductions in total work, while the classical
methods offer no improvement over the control case. Because the matrix is not diagonally
dominant, the graph-Laplacian methods are not effective, and the SuperLU preconditioner
is similarly ineffective.
We next show a selection of results for ecology2, a matrix from circuit theory that
is used to model animal movement. This matrix has dimension 999,999 and has 4,995,991
non-zero elements. We compute the condition number of the symmetrically scaled matrix
to a tolerance of 10^-8 and find it to be 6.318306710606449 · 10^7. This matrix is diagonally
dominant and has an average degree of 5.00, which is relatively low. The results for this
matrix can be seen in Figure 2.
[Figure 2 shows the same grid of convergence plots (relative residual versus work, one panel per preconditioner group) for the ecology2 matrix.]
Figure 2: Convergence of the PCG method with various preconditioners applied to the
ecology2 matrix. The graphs in this figure are plotted with a log-log scale.
From the figure, we see that the IC and Modified IC preconditioners are the most ef-
fective, reducing the computational cost in the best case by over one order of magnitude.
The methods based on Algebraic Multigrid are also effective, reducing the computational
cost similarly to the IC preconditioners. Additionally, the graph-Laplacian-based Cholesky
preconditioner is effective on this matrix, reducing the computational cost by a similar
margin. The classical preconditioners are largely ineffective here, although the SGS
and SSOR preconditioners show minor improvements in the best case. The SuperLU and
Symmetric SPAI preconditioners are largely ineffective on this matrix.
The performance of the graph-Laplacian preconditioner on ecology2 is typical of the
performance of this preconditioner on approximately half of the diagonally dominant matrices
in the test set. The Algebraic Multigrid preconditioners have similar performance to the
shown results in Figure 2 on the majority of the diagonally dominant matrices in the test
set. As we will see in the performance profiles, the IC preconditioners are effective on a wide
range of matrices, while the classical preconditioners are largely ineffective.
2.2 Preconditioner Performance Profiles
This subsection presents performance profiles summarizing the effectiveness of the tested
preconditioners across all matrices (or graphs) in the test set. As in the previous section, the
work expended is measured as the number of iterations required to lower the relative residual
to a given tolerance, multiplied by the matrix's non-zero elements plus the preconditioner's
contribution per iteration. The performance profile indicates the fraction of problems for
which a preconditioner achieves a certain factor of reduction in computational effort to reduce
the relative residual to 10^-10.
In each figure, there are multiple subplots, displaying one graph per preconditioner group.
The graph is lightly shaded in the region where the preconditioner is more costly than the
control, starting at the horizontal axis value of 1x. The preconditioner group in focus is
shown in red, while all other preconditioners are shown in gray. For each of these plots,
the horizontal axis represents the factor by which the computational cost is decreased (or
increased) relative to the computational cost of convergence to 10
10
with no preconditioner.
The vertical axes show the fraction of problems where the preconditioner achieves that
performance.
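The profile curves just described can be computed in a few lines. This sketch uses hypothetical work ratios (control work divided by preconditioned work), not data from the study:

```python
def performance_profile(work_ratios, taus):
    """For each speedup factor tau, report the fraction of problems on
    which the preconditioner reduces work by at least that factor."""
    n = len(work_ratios)
    return {tau: sum(r >= tau for r in work_ratios) / n for tau in taus}

# Four hypothetical problems: speedups of 8x, 4x, 2x, and a 0.5x slowdown.
profile = performance_profile([8.0, 4.0, 2.0, 0.5], [0.5, 1, 2, 4, 8])
```

Reading the result as a curve, the fraction is monotonically non-increasing in τ, which is why each panel in the figures below slopes downward from left to right.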
We show the performance profile over all matrices in the test set in Figure 3. This
figure shows that the IC and Modified IC preconditioners are the most
effective, reducing the computational cost by one order of magnitude on a significant fraction
of the test set. The AMG preconditioners are also at least slightly effective on a significant
fraction of the test set. We found that the classical methods, Symmetric SPAI, and SuperLU
preconditioners were largely ineffective overall, though they do reduce the computational cost
on a small fraction of the test set.
The following figure shows similar information as Figure 3, but selects the best
run for each matrix (across all parameter settings) within a given preconditioner group. This
approach simulates tuning the preconditioner's parameters for each matrix while isolating the
effect of the preconditioner. This is shown next in Figure 4.
This figure shows that, with some tuning, the IC and MIC preconditioners are still the
most effective. The IC preconditioner performed better on average than MIC, and led
to a speedup of 128 times or greater on 21.79% of the test set, while the MIC preconditioner
led to a speedup of 128 times or greater on 19.23% of the test set. The IC preconditioner
reports a speedup of 4 times or greater on 74.36% of the tested problems, and either could
not be generated or led to a degradation in performance compared to no preconditioning
on only 6.41% of the test set.
The AMG preconditioners are also effective on a significant fraction of the test set, with
Smoothed Aggregation (SA) being significantly better than Ruge-Stuben (RS) on average.
We found that SA led to a speedup of 64 times or greater on 2.56% of the test set, while RS
did not improve the convergence of any problem by a factor this large. Both SA and RS
led to a speedup of 4 times or greater on 11.54% of the test set, while SA and RS led to
a speedup of 2 times or greater on 34.62% and 23.08% of the test set, respectively. The SA
[Figure 3 shows a grid of performance-profile plots (proportion of problems versus relative convergence speed τ, from 128x down to 0.25x), one panel per preconditioner group: SGS & SSOR; Truncated Neumann; Symmetric SPAI; AMG Ruge-Stuben; AMG Smoothed Aggregation; Laplacians Cholesky; SuperLU (LLt); Incomplete Cholesky (IC); Modified IC.]
Figure 3: Performance profiles of the PCG method with various preconditioners applied to
all matrices in the test set. The graphs in this figure are plotted on a log-linear scale.
[Figure 4 shows the same grid of performance-profile panels, restricted to the best run per matrix within each preconditioner group.]
Figure 4: Preconditioner performance profile for PCG over all matrices in the test set,
selecting the best run for each matrix within a given preconditioner group. This figure is
plotted with a log-linear scale.
variant could not be generated or led to a degradation in performance on only 14.10% of
tested problems, while RS was ineffective on 23.08% of the test set.
We found that the classical methods, Symmetric SPAI, and SuperLU preconditioners
were largely ineffective overall. SSOR and SGS combined led to a speedup of 2 times or
greater on only 3.85% of the test set. The Symmetric SPAI preconditioner similarly led
to a speedup of 2 times or greater on only 1.28% of the test set. These two preconditioner
groups led to a degradation in performance compared to no preconditioner on 41.03% and
25.36% of the test set, respectively.
2.3 Impact of Ordering on Preconditioner Performance
In this subsection, we evaluate preconditioner performance by selecting the best run for
each matrix (across all parameter settings) within a given ordering and then compare the
resulting performance profiles. This approach simulates tuning the preconditioner's parameters
for each matrix while isolating the effect of reordering.
Similarly to the previous section on performance profiles, the section of the graph in
which preconditioner performance is worse than the control is lightly shaded. We show the
performance profile for the different orderings in Figure 5.
We found that there was only marginal difference in performance based on ordering for
the classical preconditioners, which is expected as they are invariant to ordering. The slight
difference in performance in runs is likely due to the potential for numerical issues, which
can be exacerbated by the ordering on ill-conditioned matrices. Similarly, there was little
difference in performance based on ordering for the AMG-based methods and the graph-
Laplacian Cholesky preconditioner as they perform their own ordering internally. Again, the
difference in performance can likely be attributed to numerical issues.
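For reference, the RCM ordering discussed in this section can be sketched as a breadth-first traversal that visits low-degree vertices first and then reverses the visit order. This is a minimal toy version on a hypothetical adjacency-set input; production implementations additionally select pseudo-peripheral starting vertices:

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Toy RCM sketch. adj: dict mapping vertex -> set of neighbors,
    representing a symmetric sparsity pattern (hypothetical format)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited, order = set(), []
    for start in sorted(adj, key=lambda v: degree[v]):  # lowest-degree seeds
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # enqueue unvisited neighbors in order of increasing degree
            for u in sorted(adj[v] - visited, key=lambda u: degree[u]):
                visited.add(u)
                queue.append(u)
    return order[::-1]  # reverse the Cuthill-McKee order
```

On a path graph the traversal simply walks the path, and the reversed order keeps the bandwidth at one; on less regular patterns the level-by-level traversal tends to cluster non-zeros near the diagonal, which is what benefits the incomplete factorizations studied here.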
We found that the RCM ordering was the most effective for the Symmetric SPAI preconditioner and that there was nearly no difference in performance for this preconditioner
between AMD and the Natural ordering.¹ For the IC and MIC preconditioners, the AMD
ordering was by far the most effective, although the RCM ordering still provided some performance uplift over the Natural ordering. The SuperLU preconditioner was largely invariant
to the ordering, although the AMD and RCM orderings performed marginally better than
the Natural ordering.

[Figure 5 shows a grid of performance-profile plots (proportion of problems versus relative convergence speed τ, from 128x down to 0.25x), one panel per preconditioner group, with separate curves for the Natural, AMD, and RCM orderings.]
Figure 5: Performance profiles of the PCG method with various preconditioners applied to
all matrices in the test set, grouped by ordering. The graphs in this figure are plotted on a
log-linear scale.

¹The curves for the natural and RCM orderings are almost entirely overlapping for this preconditioner.
3 Key Findings
Our preliminary experimental results indicate that the IC preconditioner—and its modi-
fied variant—consistently delivers superior performance across a wide range of SPD matrices.
When optimally tuned, IC achieves speedups of 128 times or more on nearly one quarter
of the test set, while yielding at least a 4 times improvement on nearly three quarters of the
test set. When it was possible to form the preconditioner, there was rarely any degradation
in performance compared to no preconditioning, and it led to complete failure on almost
no matrices. This shows that the IC and MIC preconditioners are robust and effective for a
wide range of SPD matrices and should be considered as a first choice if the reader's problem
appears to be similar to the SPD matrices found in the SuiteSparse Matrix Collection.
Algebraic Multigrid methods, particularly the SA variant, also demonstrate significant
benefits on a large portion of the test set, especially on those matrices with strong diagonal
dominance. In contrast, classical preconditioners such as SGS and SSOR, and Symmetric
SPAI, generally offer minimal improvement over the unpreconditioned solver and may even
lead to a degradation in performance on a significant fraction of the test set.
Our study further affirms that the choice of ordering has a notable impact on precondi-
tioner effectiveness for certain classes of preconditioners. In particular, applying the AMD
ordering to the IC and MIC preconditioners significantly enhances their ability to improve
convergence compared to natural or RCM orderings. Conversely, the RCM ordering is most
effective for the symmetric SPAI preconditioner and we found little difference in performance
between the AMD and natural orderings. Overall, these findings demonstrate that careful
consideration of both the preconditioning strategy and the associated ordering can substantially improve the efficiency of the conjugate gradient method for solving SPD systems.
3.1 Ongoing Work
We are actively expanding our test suite by more than a factor of three by incorporating a
large number of graph-based problems that naturally yield graph-Laplacian matrices. Many
of these graphs are significantly larger than our current examples, allowing us to study
preconditioner performance across a wide range of problem sizes—from small- and medium-
scale to very-large-scale. This expanded dataset will help us determine whether certain
preconditioners excel only on smaller problems or maintain their advantages as problem size
increases. We additionally seek to add a greater number of diagonally dominant matrices,
which will allow us to more thoroughly evaluate preconditioner performance in settings where
strong diagonal dominance plays a role.
In addition to expanding our test set, we are integrating additional reordering strategies
into our framework. Beyond our current use of AMD and RCM orderings, we plan to
incorporate nested dissection and METIS orderings. These additional ordering schemes will
allow us to further explore how structural reordering impacts preconditioner efficiency on
large-scale graphs and SPD systems, and to provide clearer guidance on best practices for
different problem classes. Finally, we are also actively testing the preconditioned MINRES
method to compare its performance with that of PCG, thereby extending our analysis across
different Krylov subspace solvers.
3.2 Limitations
Our study has several limitations that we are actively addressing in ongoing work. First,
the current test suite does not yet include a sufficiently diverse set of large-scale problems;
most matrices are what we would consider small- to medium-scale for modern scientific com-
puting. This limitation restricts our ability to fully assess how preconditioner performance
interacts with problem size, particularly in the context of very large graphs and systems.
Additionally, the dataset includes relatively few diagonally dominant matrices, which leads
to an unfair testing environment for preconditioners that are specifically designed for such
systems (e.g., graph-Laplacian preconditioners).
Another limitation arises from our current focus on floating point operations as a proxy
for computational cost. While this measure provides a machine-independent benchmark, it
does not capture all aspects of actual runtime performance, such as memory access patterns
or the impact of hardware-specific optimizations. Moreover, the preconditioner construction
time is not incorporated into our cost model, even though we found that this time can be
significant in practice for some preconditioners. Finally, although our work currently emphasizes the PCG method, further analysis of other Krylov subspace methods (e.g., MINRES) is
needed to understand how preconditioners interact with different solvers. We are addressing these limitations by expanding our dataset, integrating additional preconditioners, and
incorporating more comprehensive performance metrics in future iterations of this work.
Finally, in this study we focus on the relative performance of preconditioners across a
diverse set of matrices. In performing so many runs, we are limited in the number of right-
hand side vectors we can generate and test on. We recognize that the choice of the right-hand
side vector can cause variations in the reported performance of the preconditioners, or even
the convergence of the solver. This is a fundamental limitation to our study, but we have
found that using our method of generating the right-hand side vector provides similar results
to those provided by the SuiteSparse Matrix Collection. In Figure 6, we show a scatter plot
comparing the number of unpreconditioned CG iterations to converge with our right-hand
side vector against those with the SuiteSparse Matrix Collection right-hand side vector.
This shows that there is some variance in the number of iterations required to converge,
but that the variance is always within a reasonable range. Note that there are relatively few
matrices in the test set which come with provided right-hand side vectors, but this provides
evidence that our method of generating the right-hand side vector is reasonable.
4 Background and Related Work
We first discuss the notation and terminology used in this paper. We utilize boldface
lowercase letters to denote vectors (e.g., x), boldface uppercase letters to denote matrices
(e.g., A), and lowercase letters to denote scalars (e.g., γ). The i-th element of a vector x is
denoted by x_i, and the (i, j)-th element of a matrix A is denoted by a_{i,j}. We use notation
like x^(k) to denote the k-th iterate of some vector x, whereas A^T, A^{-1}, and A^k refer to the
transpose, the inverse, and the k-th power of some matrix A, respectively.
We refer to some arbitrary preconditioner as M, where its application to a vector r is
denoted as z = M^{-1}r. We make frequent reference to linear operators in this paper and
often denote some arbitrary linear operator as L. The application of a linear operator L is
equivalent to applying the inverse of M and is denoted as z = L(r) = M^{-1}r.
We use ⌊γ⌉ to denote the nearest integer to some scalar γ. We use the
notation 0 ≺ A ⪯ I to denote that the matrix A is positive definite and has eigenvalues
bounded by 1 (Bhatia, 1996).

[Figure 6 is a scatter plot of unpreconditioned CG iterations to converge (10^2 to 10^4, log scale) for the matrices 2cubes_sphere, af_0_k101 through af_5_k101, af_shell3, af_shell7, BenElechi1, bone010, boneS01, boneS10, bundle_adj, nasasrb, offshore, parabolic_fem, Pres_Poisson, smt, thermal1, and thermal2, with one "Seeded" and one "Provided" point per matrix.]
Figure 6: Comparison of the number of unpreconditioned CG iterations to converge with
our right-hand side vector compared to the SuiteSparse Matrix Collection right-hand side
vector. In this figure, "Seeded" refers to the right-hand side vector generated by our method,
and "Provided" refers to the right-hand side vector provided by the SuiteSparse Matrix
Collection. In the case where multiple right-hand side vectors were provided, we used the
first one. This graph is plotted with semi-log scale and a categorical horizontal axis.
4.1 Krylov Subspace Methods
In this preliminary study, we focus on Krylov subspace methods for solving large, sparse
linear systems of equations. Specifically, we utilize the Preconditioned Conjugate Gradient
(PCG) method for SPD systems. We recognize that the performance of preconditioners may
be different when applied with different solvers; therefore, in future work, we will expand
our analysis to include the Preconditioned Minimum Residual (PMR) method.
4.2 Preconditioned Conjugate Gradient
The PCG method is an efficient iterative solver for SPD systems and is known for its
scalability and efficiency (Hestenes and Stiefel, 1952; Golub and Van Loan, 2013). The
convergence rate of the PCG method, like all Krylov methods, is heavily dependent on
the condition number of the matrix, and preconditioners are used to improve the condition
number of the system and accelerate the rate of convergence (Greenbaum, 1997; Golub and
Van Loan, 2013; Saad, 2003).
The PCG method can be viewed as a process that minimizes the quadratic function

ϕ(x) = (1/2) x^T A x − x^T b.   (1)
This function is strictly convex when A is SPD, ensuring that there is a unique global
minimum that corresponds exactly to the solution of Ax = b. The gradient of the function
given in Equation 1 is

∇ϕ(x) = Ax − b.

By finding some x^(∗) such that ∇ϕ(x^(∗)) = 0, we have found the solution to the original system. Therefore, CG iteratively minimizes ϕ(x) by selecting a search direction p^(k) that is A-conjugate to the previous search directions, ensuring that the method converges in at most n iterations in exact arithmetic (Golub and Van Loan, 2013). In floating point arithmetic, the method may stall out before reaching the solution or lose numerical orthogonality between the search directions, potentially leading to instability.
We implemented the Hestenes-Stiefel formulation of the PCG method (Hestenes and
Stiefel, 1952; Golub and Van Loan, 2013), which is described next.
4.2.1 Algorithm Description
The algorithm is outlined in Algorithm 1. In each iteration, the residual r^(k), the preconditioned residual z^(k), and the search direction p^(k) are updated. The method utilizes scalar quantities α^(k) and µ^(k) to compute the optimal step sizes. The preconditioned variant of the method enhances CG by incorporating a preconditioner M, which approximates A^(−1), and improves the conditioning of the quadratic function ϕ(x). The algorithm seeks to minimize ϕ(x) in the linearly transformed space defined by the preconditioner.
Algorithm 1 Preconditioned Conjugate Gradients Method
Input: Sparse matrix A, right-hand side b, initial guess x^(0), preconditioner M with linear operator L(r) = M^(−1)r, and tolerance τ.
Output: Solution x to Ax = b.
r^(0) ← b − Ax^(0)
z^(0) ← L(r^(0))
k ← 0
while ∥r^(k)∥ > τ · ∥b∥ do
  k ← k + 1
  α^(k) ← (r^(k−1))ᵀ z^(k−1)
  if k > 1 then
    p^(k) ← z^(k−1) + (α^(k) / α^(k−1)) · p^(k−1)
  else
    p^(k) ← z^(k−1)
  µ ← α^(k) / ((p^(k))ᵀ Ap^(k))
  x^(k) ← x^(k−1) + µ · p^(k)
  r^(k) ← r^(k−1) − µ · Ap^(k)
  z^(k) ← L(r^(k))
return x^(k)
Implementation Details This is a standard implementation of the preconditioned con-
jugate gradient method, which we implemented in Julia (Bezanson et al., 2017). We use
one matrix-vector product per iteration and one preconditioner application per iteration.
We abstract the application of the preconditioner M as the linear operator L(r) = M^(−1)r, allowing us to wrap external preconditioners in a common interface that conforms to the implementation. Our implementation checks for convergence based on both the relative residual and the normwise relative backward error. We conservatively define the method as stalling out if the norm of the update to x, that is ∥µ · p∥, is less than 10^(−20).
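The loop above can be sketched compactly. The following is a minimal dense NumPy version of the Hestenes-Stiefel iteration (our study's implementation is in Julia; the names pcg and apply_prec are illustrative, with apply_prec playing the role of the linear operator L):

```python
import numpy as np

def pcg(A, b, apply_prec=lambda r: r, tol=1e-10, maxiter=1000):
    """Hestenes-Stiefel PCG sketch: solve Ax = b for SPD A.

    apply_prec(r) plays the role of the operator L(r) = M^{-1} r.
    """
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    z = apply_prec(r)                  # preconditioned residual
    p = z.copy()                       # search direction
    alpha_old = r @ z
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        mu = alpha_old / (p @ Ap)      # optimal step size
        x += mu * p
        r -= mu * Ap
        z = apply_prec(r)
        alpha_new = r @ z
        p = z + (alpha_new / alpha_old) * p
        alpha_old = alpha_new
    return x

# Small SPD system with Jacobi (diagonal) preconditioning.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, apply_prec=lambda r: r / np.diag(A))
```

One matrix-vector product and one preconditioner application are performed per iteration, matching the cost model described next.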
Computational Cost The dominant computational costs per iteration are the matrix-vector product Ap^(k) and the preconditioner application z^(k) = L(r^(k)). The cost
of the matrix-vector product is proportional to the number of non-zero values in the ma-
trix, and the cost of the preconditioner application is dependent on the type of precondi-
tioner—ranging from the number of non-zero values in the preconditioner to the complexity
of the cycle/application in the case of algebraic multigrid. We provide more detail on the
computational cost of each preconditioner in section 6.
4.3 Preconditioned Minimum Residual Method
A future extension of this study will include the Preconditioned Minimum Residual
(PMR) method, which seeks to minimize the residual of the system rather than the quadratic
function over a Krylov subspace.
5 Systems of Linear Equations
In this section, we describe the selection of symmetric positive definite matrices used in our study. Our goal is to evaluate the performance of various preconditioners
on large-scale SPD systems that are representative of real-world problems.
5.1 Matrix Ordering
The ordering of a sparse matrix is well known to be crucial for the performance of a large
number of preconditioners (Benzi, 2002; Saad, 2003). In the symmetric case, the rows and
columns of the matrix must be permuted in the same way to preserve the symmetry of the
matrix, so we only consider symmetric reorderings in this study. In the context of direct
solvers such as Gaussian elimination or those preconditioners that compute an incomplete
factorization, the ordering of the matrix can have a significant impact on the fill-in and the
quality of the factorization (Benzi, 2002; Davis, 2006a).
We denote the permutation matrix of a given ordering as P, where P AP^⊤ is the permuted matrix. In the natural ordering case, P is the identity matrix.
5.1.1 Natural
The natural ordering of a matrix is the order in which it is obtained from the application
that generated it. This serves as a baseline for comparison with other ordering strategies.
5.1.2 Reverse Cuthill-McKee
The Reverse Cuthill-McKee (RCM) algorithm was originally proposed for reducing the bandwidth of a matrix (Cuthill and McKee, 1969). By minimizing the bandwidth of the matrix, RCM aims to cluster the non-zero elements closer to the diagonal and has been shown in practice to reduce the fill-in of the matrix during factorization (Davis, 2006b).
The RCM algorithm is a graph-based algorithm that traverses the graph of the matrix in a breadth-first search manner, starting from a vertex of minimal degree. Each node is numbered in the order in which it is visited, and the numbering is reversed at the end of the algorithm to give the final ordering.
We utilize the SymRCM package in Julia to compute the RCM ordering of the matrices.
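As an illustration of the bandwidth-reducing behavior, the following sketch uses SciPy's reverse_cuthill_mckee (standing in for the Julia SymRCM package used in the study) to recover a small bandwidth from a randomly permuted tridiagonal matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum |i - j| over the non-zero entries of a sparse matrix."""
    coo = A.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

rng = np.random.default_rng(0)
n = 50
# Tridiagonal matrix (bandwidth 1), randomly permuted to hide the structure.
T = np.diag(np.full(n, 4.0)) \
    + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
p = rng.permutation(n)
A = csr_matrix(T[np.ix_(p, p)])

perm = reverse_cuthill_mckee(A, symmetric_mode=True)  # RCM permutation
B = A[perm][:, perm]                                  # symmetric reordering P A P^T
```

Because the underlying graph is a path, the BFS numbering clusters the entries back toward the diagonal.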
5.1.3 Symmetric Approximate Minimum Degree
The Symmetric Approximate Minimum Degree (SYMAMD) ordering is an approximate
version of the minimum degree algorithm that is designed to preserve the symmetry of the
matrix (Amestoy, Davis, and Duff, 1996). SYMAMD is a greedy heuristic-based algorithm
that seeks to minimize the fill-in of a Cholesky factorization of a sparse matrix (Davis, 2006b).
We utilize the AMD package in Julia (Dominique et al., 2023).
5.2 Symmetric Positive Definite Matrices
We utilize matrices from the SuiteSparse Matrix Collection (Davis and Hu, 2011), a
widely recognized repository of sparse matrix benchmarks that are representative of real-
world problems. To ensure our study focuses on large-scale and practically relevant problems,
we filter matrices based on the following criteria:
Matrix Dimension: Only matrices of dimension n ≥ 10,000 are considered. This
ensures that the matrices are sufficiently large to reflect computational challenges en-
countered in real-world applications.
Symmetric Positive Definite: We consider only those matrices that are real-valued
and symmetric positive definite in this study.
Convergence without Preconditioning: The unpreconditioned Conjugate Gradi-
ent method should converge within 10 · n iterations, where n is the matrix dimension.
This allows us to compare the relative performance of preconditioners in aggregate
against the unpreconditioned case.
By limiting our study to a test set of matrices that fit these criteria, we ensure that
we are evaluating preconditioners on those matrices that present a computational challenge
that is typical in large-scale scientific and engineering problems. Additionally, requiring that
the unpreconditioned CG method converges within a reasonable number of iterations enables
us to compare the relative performance of the preconditioners both in aggregate and on a
per-matrix basis.
5.2.1 Matrix Characteristics
The selected matrices come from a diverse set of applications, including structural anal-
ysis, fluid dynamics, and optimization problems. These matrices comprise a wide range of
sizes and structures, reflecting the diversity of problems encountered in practice. This diver-
sity allows us to evaluate the robustness and effectiveness of preconditioners across a broad
spectrum of real-world problems.
The matrices used in this study are categorized in Table 2. Below, we provide relevant
sources where possible for more information on a selection of the matrices:
(Rudnyi and Van Rietbergen, 2005) performs the original model reduction of the prob-
lems given in (Rudnyi, Korvink, and van Rietbergen, 2006a,b,c).
(Dabrowski, Krotkiewski, and Schmid, 2008) provides details on (Schmid, 2006a,b).
(Janna and Ferronato, 2011a; Ferronato, Janna, and Pini, 2011) discuss (Janna and
Ferronato, 2011i).
Matrix Category Citations Count
Circuit Simulation (Okuyucu, 2006a,b; McRae, 2008) 3
Combinatorial (Trefethen, 2008a,b) 2
Computational Fluid Dynamics (Shakib, 2004; Janna and Ferronato, 2011i;
Um, 2010; Wissgott, 2007; Leppkes and Nau-
mann, 2009a,b)
6
Computer Vision (Lourakis, 2006; Mazaheri, 2015) 2
Electromagnetics (Um, 2008; Isaak, 2008) 2
Materials (Grimes, 1995a,b) 2
Model Reduction (Rudnyi, Korvink, and van Rietbergen,
2006a,b,c; Billger, 2004; Rudnyi, 2004)
6
Optimization (Gould, 1995; Berger et al., 1997; Moré and
Toraldo, 1991; Dolan and Moré, 2000; Dembo
and Tulowitzki, 1983)
5
Miscellaneous PDE (Elechi, 2007; Janna and Ferronato, 2014a;
Dubcova, Cerveny, and Solin, 2007a,b,c; Janna
and Ferronato, 2011c, 2014b,c; consph, 2008;
Cunningham, 2002; Wathen, 2004a,b)
12
Structural (Janna and Ferronato, 2011d,e,f,g,h; Schenk,
2006a,b,c,d,e,f, 2003a,b; Mayer, 2004a; Will,
1985a,b; NASA, 2003a,b,c; Kouhia, 2008a;
Mayer, 2004b,c; Weiher, Jochen Weiher;
Mayer, 2004d; Weiher, 2004a, 2005; NASA,
1995; Weiher, 2004b; Damhaug, 1999a,b,c,d,e;
Kouhia, 2008b)
34
Thermal (Bindel, 2006; Schmid, 2006a,b; Botonakis,
2009a,b,c)
5
Table 2: Matrix categories
(Ferronato et al., 2007; Ferronato, Janna, and Gambolati, 2008; Janna, Ferronato, and
Gambolati, 2009, 2010) discuss (Janna and Ferronato, 2011d).
(Janna, Ferronato, and Gambolati, 2013, 2015) provide information on (Janna and
Ferronato, 2014a), while (Janna, Ferronato, and Gambolati, 2015) also covers (Janna
and Ferronato, 2014c,b).
(Ferronato et al., 2010) details (Janna and Ferronato, 2011c,h), with additional dis-
cussion on (Janna and Ferronato, 2011c) in (Janna and Ferronato, 2011b).
(Janna, Ferronato, and Gambolati, 2010) discusses (Janna and Ferronato, 2011e,f), and
(Janna and Ferronato, 2011e) is further discussed in (Janna, Comerlati, and Gambolati,
2009).
(Berger et al., 1995) provides information on the financial optimization problem (Berger
et al., 1997).
(Duff, Grimes, and Lewis, 1989) covers items in the Harwell-Boeing collection, namely
(Will, 1985a,b).
Matrix Scaling Before running any benchmark on an SPD matrix, we perform symmetric Jacobi scaling on the matrices to improve the conditioning of the system (Golub and Van Loan, 2013; Saad, 2003). Explicitly,

A′ ← D^(−1/2) (P AP^⊤) D^(−1/2),

where D is the diagonal matrix with the diagonal elements of P AP^⊤. Because we deal with SPD matrices, the diagonal elements of A are positive, ensuring that D^(−1/2) is well-defined. Due to the potential for floating point inaccuracies in the scaling, we explicitly set the diagonal of the scaled matrix to the identity and symmetrize the matrix. That is, we set

A′′ ← (1/2) (A′ + (A′)^⊤),

and then explicitly set the diagonal of A′′ to the identity,

a′′_{i,i} ← 1 for all i ∈ {1, . . . , n}.
Because this is a positive scaling of the rows and columns of A, the positive-definiteness of the matrix is preserved. Additionally, we scale the right-hand side vector b by D^(−1/2) to ensure that the scaling is consistent across the system. We discuss our choice of right-hand side vectors in subsection 5.4.
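For concreteness, the scaling above can be sketched as follows (a dense NumPy sketch; jacobi_scale is an illustrative name, and the permutation is assumed to have been applied already):

```python
import numpy as np

def jacobi_scale(A, b):
    """Symmetric Jacobi scaling: A'' = D^{-1/2} A D^{-1/2} with an exact
    unit diagonal, and b' = D^{-1/2} b. Assumes A is SPD and already
    permuted, so D = diag(A) has positive entries."""
    d = np.sqrt(np.diag(A))
    As = A / np.outer(d, d)            # D^{-1/2} A D^{-1/2}
    As = 0.5 * (As + As.T)             # symmetrize against round-off
    np.fill_diagonal(As, 1.0)          # force the diagonal to the identity
    return As, b / d

A = np.array([[4.0, 2.0], [2.0, 9.0]])
b = np.array([2.0, 3.0])
As, bs = jacobi_scale(A, b)
```

The off-diagonal entry a_{1,2} = 2 becomes 2 / (2 · 3) = 1/3 under this scaling, and the diagonal is exactly the identity.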
5.3 Graph-Laplacian Matrices
This section has been removed while we finish the graph portion of the study. In a future
iteration of this study, we will include graph-Laplacian matrices of undirected graphs.
5.4 Right-Hand Side Vector
We generate a random solution vector x
()
R
n
using the StableRNGs package in Ju-
lia, which reproduces a random number generator (RNG) suitable for scientific computing.
Specifically, we use the seed 123456789 to initialize the random number generator, allowing
us to reproduce the same sequence of random number across different runs and systems.
The StableRNGs package implements the Lehmer pseudo-random number generator, a
prime modulus multiplicative linear congruential generator (Park and Miller, 1988; Payne,
Rabung, and Bogyo, 1969). This generator is known for its simplicity, efficiency, and reason-
ably good statistical properties, making it suitable for scientific computing applications (Park
and Miller, 1988).
Out of an abundance of caution for the statistical properties of the generated random numbers, we “warm up” the RNG by generating and discarding log_2(n) random vectors of length n, where n is the dimension of the matrix A. This practice reduces the likelihood of any biases or correlations in the sequence that may be present in the initial random numbers generated by the RNG (Gentle, 2003).
Using the last generated random vector as the solution vector x^(∗), we construct the right-hand side vector b in the following manner: we locate the log_2(n) + 1 largest-in-magnitude elements of x^(∗). For each of these elements, we set its value to either 1 or −1, chosen at random with equal probability. We set the remaining elements of x^(∗) to zero, resulting in a sparse vector.
Finally, we compute b = Ax^(∗), which ensures that the right-hand side vector is consistent with the solution vector and the matrix A. Performing this procedure allows us to evaluate the performance of preconditioners in terms of the error ∥x^(k) − x^(∗)∥ during iterations of the solver.
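The construction described above can be sketched as follows (using NumPy's default generator in place of the Lehmer generator from StableRNGs; make_rhs is an illustrative name):

```python
import numpy as np

def make_rhs(A, seed=123456789):
    """Build a sparse solution x* with +/-1 entries and b = A x*.

    Sketch of the procedure described above, with NumPy's RNG standing
    in for the StableRNGs Lehmer generator used in the study.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    for _ in range(int(np.log2(n))):      # "warm up" the generator
        rng.random(n)
    x = rng.random(n)
    k = int(np.log2(n)) + 1               # keep the k largest-magnitude entries
    keep = np.argsort(np.abs(x))[-k:]
    x_star = np.zeros(n)
    x_star[keep] = rng.choice([-1.0, 1.0], size=k)
    return x_star, A @ x_star

A = np.eye(8)
x_star, b = make_rhs(A)
```

For n = 8, exactly log_2(8) + 1 = 4 entries of x* are nonzero, each equal to +1 or −1.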
5.4.1 Scaling and Reordering of the Right-Hand Side and Solution Vectors
The effect of symmetric Jacobi scaling is independent from the ordering applied to the
matrix. Similarly, it is known that CG and MINRES are unaffected by matrix ordering in
exact arithmetic, so the control case should remain unchanged. However, numerical errors
may introduce slight performance differences. For this reason, we run the control for each
ordering to establish the baseline of comparison against each preconditioner.
After generating the right-hand side vector in the manner described above, we reorder the
matrix, right-hand side, and solution vectors if a reordering is to be applied to the matrix.
Afterward, we scale the right-hand side vector b and the solution vector x^(∗) to ensure that the scaling is consistent across the system. For the scaled matrix A′ = D^(−1/2) P AP^⊤ D^(−1/2), we scale the right-hand side vector as b′ = D^(−1/2) P b and the solution vector as x′^(∗) = D^(1/2) P x^(∗), so that A′x′^(∗) = b′.
6 Preconditioners
In this section, we describe the preconditioners that are evaluated in this preliminary
study. In the following subsections, we provide an overview of each preconditioner category,
discussing the software packages used and the expected computational costs associated with
their use.
6.1 Classical Methods
Classical iterative methods serve as the foundation of numerical linear algebra. Although
they are not as efficient as more modern solvers, they are simple and still find use in prac-
tice (Saad, 2003; Golub and Van Loan, 2013).
In the following, we describe the classical iterative methods, highlighting their connection
to matrix splitting techniques where relevant and their role as preconditioners.
6.1.1 Truncated Neumann Series
The Truncated Neumann Series (TNS) preconditioner is based on approximating the
inverse of a matrix using a finite number of terms in the Neumann series expansion (Dubois,
Greenbaum, and Rodrigue, 1979). Given a matrix A expressed in the form

A = D − R = D(I − D^(−1)R),

where D is a nonsingular diagonal matrix and R is the remainder matrix, we may express its inverse as

A^(−1) = (I − D^(−1)R)^(−1) D^(−1),

thus giving us the Neumann series expansion of A^(−1):

A^(−1) = [ Σ_{k=0}^{∞} (D^(−1)R)^k ] D^(−1).

By truncating this series after a finite number of terms m, we obtain an approximate inverse:

M^(−1) = [ Σ_{k=0}^{m} (D^(−1)R)^k ] D^(−1).

Because we scale our matrix to have unit diagonal, for 0 ≺ A ⪯ 1 the preconditioner simplifies to

M^(−1) = Σ_{k=0}^{m} R^k.

However, when A ⪯ 1 does not hold, a scaling term must be included in order to guarantee the convergence of the Neumann series:

M^(−1) = α Σ_{k=0}^{m} (I − αA)^k,

where α is chosen such that 0 ≺ αA ⪯ 1. Because of its simplicity in implementation, we do not give a detailed algorithm beyond the summation notation given above.
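Nevertheless, the application can be sketched with a Horner-style recurrence that costs m extra matrix-vector products (tns_apply is an illustrative name; alpha = 1 recovers the unit-diagonal case):

```python
import numpy as np

def tns_apply(A, r, m=2, alpha=1.0):
    """Apply the truncated Neumann series preconditioner
    z = alpha * sum_{k=0}^{m} (I - alpha A)^k r
    without forming matrix powers: z <- r + (I - alpha A) z, m times."""
    z = r.copy()
    for _ in range(m):
        z = r + (z - alpha * (A @ z))
    return alpha * z

# With alpha A well inside the unit interval, many terms approach A^{-1} r.
A = np.diag([2.0, 4.0])
z = tns_apply(A, np.ones(2), m=60, alpha=0.25)
```

For this diagonal example the series converges to A^{-1}r = (0.5, 0.25), illustrating that the truncation is an explicit polynomial approximation of the inverse.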
Theoretical Considerations Consider the linear system Ax = b where A has unit diagonal and R is the remainder matrix. Without loss of generality, assume 0 ≺ A ⪯ 1; then the TNS preconditioner for this system is given by

M^(−1) = Σ_{k=0}^{m} R^k.

Suppose we let m = 1; then we have

M^(−1) = I + R.

Because R is the remainder matrix, R = I − A, and so

M^(−1) = 2I − A,

a first-degree polynomial in A. The preconditioned Krylov subspace is given by

K^(k+1) = span{z^(0), Az^(0), A^2 z^(0), . . . , A^k z^(0)},

where

z^(0) = M^(−1) r^(0).

Thus the TNS preconditioner with m = 1 gives

z^(0) = (2I − A) r^(0),

and the Krylov subspace satisfies

K^(k+1) ⊆ span{r^(0), A r^(0), A^2 r^(0), . . . , A^(k+1) r^(0)}.

In this scenario, the preconditioned residual z^(0) is a first-degree polynomial in A applied to the original residual r^(0), so the Krylov subspace is effectively built from powers of A applied to r^(0). As a function of the work performed, the TNS preconditioner builds this Krylov subspace at approximately the same cost as the unpreconditioned system. Therefore, for matrices that are scaled to have unit diagonal, there should be no significant improvement in the convergence of the system with the TNS preconditioner as a function of the work performed.
Computational Cost The computational cost of the TNS preconditioner is equal to the
number of non-zero values in the matrix A multiplied by the number of terms in the Neumann
series expansion.
Preconditioner Settings Used The only settings to tune for the TNS preconditioner are the number of iterations to apply and the scaling factor. We run this preconditioner for all combinations of iterations in {1, 2, 3, 4} and 1/α ∈ {∥A∥_F, ∥A∥_∞, ∥A∥_1, ∥A∥_2, 1}.
6.1.2 Symmetric Gauss-Seidel and Successive Over-Relaxation
Gauss-Seidel (GS) and Successive Over-Relaxation (SOR) are classical iterative methods
that update the solution vector component-wise (Golub and Van Loan, 2013; Saad, 2003).
These methods improve over the Jacobi method by using the most recent values available to
update each component of the solution vector. The SOR method accelerates the convergence
of GS by introducing a relaxation parameter ω that over-relaxes the update in the direction of
the residual, which can improve the convergence rate of the solver if ω is chosen appropriately.
Consequently, the GS method can be viewed as a special case of SOR with ω = 1.
Given a linear system Ax = b, where A ∈ ℝ^{n×n} is a nonsingular matrix, x ∈ ℝ^n is the solution vector, and b ∈ ℝ^n is the right-hand side vector, the SOR method partitions A into its diagonal, strictly lower triangular, and strictly upper triangular components:

A = D + L + U,

where D contains the diagonal elements of A, L contains the entries strictly below the diagonal, and U contains the entries strictly above the diagonal (Golub and Van Loan, 2013).
The iterative update formula for SOR is then given by

x^(k+1) = (D + ωL)^(−1) [ωb + ((1 − ω)D − ωU) x^(k)], (2)

where ω is the relaxation parameter, and x^(k) is the approximation of the solution at iteration k (Golub and Van Loan, 2013). Because GS is SOR with ω = 1, we do not provide a separate update formula here. These methods are not suitable for use in solvers that require a symmetric preconditioner, such as PCG and PMR, as the resulting preconditioner is not symmetric.
Symmetric GS and SOR Symmetric Successive Over-Relaxation (SSOR) and Symmetric Gauss-Seidel (SGS) are variants of the SOR and GS methods that update the solution vector in a symmetric fashion, alternating between forward and backward sweeps (Golub and Van Loan, 2013; Saad, 2003; Young, 1950, 1954). Performing the update symmetrically smooths the error in the solution vector and can improve the convergence rate of the solver. Just as before, the SGS method can be viewed as a special case of SSOR with ω = 1.
Given the same linear system Ax = b and splitting defined for the SOR method, the SSOR method performs a forward sweep (as in Equation 2) followed by a backward sweep to update the solution vector. Formally, the forward sweep is given by

x^(k+1/2) = (D + ωL)^(−1) [ωb + ((1 − ω)D − ωU) x^(k)],

and the backward sweep is given by

x^(k+1) = (D + ωU)^(−1) [ωb + ((1 − ω)D − ωL) x^(k+1/2)].
Algorithm Description These methods update each component of the solution vector
sequentially, using the most recent values available. Algorithm 2 provides the procedural
steps for implementing SSOR.
Algorithm 2 Symmetric Successive Over-Relaxation (SSOR) Method
1: Input: Matrix A, right-hand side b, initial guess x^(0), relaxation parameter ω, and tolerance τ.
2: Output: Approximate solution x to Ax = b.
3: k ← 0
4: while ∥x^(k+1) − x^(k)∥ > τ do
5:   for i = 1 to n do ▷ Forward Sweep
6:     x_i^(k+1/2) ← (1 − ω) x_i^(k) + (ω / a_{i,i}) (b_i − Σ_{j=1}^{i−1} a_{i,j} x_j^(k+1/2) − Σ_{j=i+1}^{n} a_{i,j} x_j^(k))
7:   for i = n down to 1 do ▷ Backward Sweep
8:     x_i^(k+1) ← (1 − ω) x_i^(k+1/2) + (ω / a_{i,i}) (b_i − Σ_{j=i+1}^{n} a_{i,j} x_j^(k+1) − Σ_{j=1}^{i−1} a_{i,j} x_j^(k+1/2))
9:   k ← k + 1
10: return x^(k)
Because each implementation is similar to one another, we do not provide separate al-
gorithms for GS, SOR, and SGS. The non-symmetric variants can be obtained by simply
applying only the forward or backward sweep of the SSOR method. The GS and SGS meth-
ods may be implemented with a constant factor less work than the SOR and SSOR methods,
or simply by setting ω = 1.
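A dense sketch of one SSOR application (forward then backward sweep; ssor_sweep is an illustrative name):

```python
import numpy as np

def ssor_sweep(A, b, x, omega=1.2):
    """One SSOR iteration: a forward sweep followed by a backward sweep,
    each using the most recent values of x as they are computed."""
    n = len(b)
    x = x.copy()
    for i in range(n):                 # forward sweep
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + (omega / A[i, i]) * (b[i] - sigma)
    for i in reversed(range(n)):       # backward sweep
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + (omega / A[i, i]) * (b[i] - sigma)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
for _ in range(200):                   # iterate SSOR as a stand-alone solver
    x = ssor_sweep(A, b, x)
```

Iterating the sweep on this small SPD system drives the residual to zero, illustrating the smoothing behavior described above.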
Computational Cost The computational cost per sweep of GS, SGS, SOR and SSOR
is equal to the number of non-zero values in the matrix A. There are an additional 2n
operations per sweep to compute the proportional update to the solution vector x for SOR
and SSOR. The cost of the symmetric variants are exactly twice the cost of the non-symmetric
variants, as they perform both a forward and backward sweep. These costs are given for a
single iteration of the method, and the total cost of the preconditioner is proportional to the
number of iterations applied.
Preconditioner Settings Used We run these preconditioners for all combinations of iterations in {1, 2} and relaxation parameters in {1.0, 1.2, 1.5, 1.8}. When running with a relaxation parameter of 1.0, we use a standard SGS implementation that does not perform the additional relaxation computations.
Additionally, it was shown in (Young, 1954) that, for certain matrices, there is a known optimal value of ω. Those matrices that possess Young's “Property-A” (Young, 1954) have an optimal ω parameter described by

ω = 1 + ( ρ_J / (1 + √(1 − ρ_J^2)) )^2,

where ρ_J is the spectral radius of the Jacobi iteration matrix of A. This is well-defined for matrices with ρ_J < 1, but may or may not correspond with the optimal value of ω for matrices not possessing Property-A. We compute this value for all matrices for which it is well-defined and run for 1 and 2 iterations.
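This computation can be sketched directly from the eigenvalues of the Jacobi iteration matrix (a dense sketch; optimal_omega is an illustrative name):

```python
import numpy as np

def optimal_omega(A):
    """Young's optimal SOR parameter,
    omega = 1 + (rho_J / (1 + sqrt(1 - rho_J^2)))^2,
    where rho_J is the spectral radius of the Jacobi iteration matrix."""
    J = np.eye(len(A)) - A / np.diag(A)[:, None]   # I - D^{-1} A
    rho = max(abs(np.linalg.eigvals(J)))
    if rho >= 1:
        raise ValueError("rho_J >= 1: the formula is not well-defined")
    return 1 + (rho / (1 + np.sqrt(1 - rho ** 2))) ** 2

# 1-D Poisson matrix: rho_J = cos(pi / (n + 1)), so the classical value
# 2 / (1 + sin(pi / (n + 1))) should be recovered.
n = 10
A = np.diag(np.full(n, 2.0)) \
    + np.diag(np.full(n - 1, -1.0), 1) + np.diag(np.full(n - 1, -1.0), -1)
w = optimal_omega(A)
```

The formula above is algebraically equivalent to the more familiar form ω = 2 / (1 + √(1 − ρ_J^2)).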
6.1.3 Steepest Descent
The Steepest Descent (SD) method is an iterative algorithm for solving linear systems that minimizes the residual along the direction of the gradient of the quadratic form

ϕ(x) = (1/2) xᵀAx − bᵀx.

The method is guaranteed to converge for symmetric positive definite matrices, but its convergence rate can be slow for ill-conditioned systems (Golub and Van Loan, 2013; Saad, 2003).
Given a symmetric positive definite (SPD) matrix A ∈ ℝ^{n×n}, a right-hand side vector b ∈ ℝ^n, and an initial guess x^(0), the goal is to solve the linear system Ax = b.
Algorithm Description At each iteration, SD updates the solution estimate x
(k)
by mov-
ing in the direction of the residual with optimal step size chosen to minimize the quadratic
function ϕ(x) in that direction.
This iterative process is described in Algorithm 3.
Algorithm 3 Steepest Descent Method
1: Input: SPD matrix A, right-hand side b, initial guess x^(0), and tolerance τ.
2: Output: Approximate solution x to Ax = b.
3: r^(0) ← b − Ax^(0)
4: k ← 0
5: while ∥r^(k)∥ > τ · ∥b∥ do
6:   k ← k + 1
7:   α^(k) ← ((r^(k−1))ᵀ r^(k−1)) / ((r^(k−1))ᵀ A r^(k−1))
8:   x^(k) ← x^(k−1) + α^(k) r^(k−1)
9:   r^(k) ← r^(k−1) − α^(k) A r^(k−1)
10: return x^(k)
Computational Cost Each iteration requires one matrix-vector product, two inner prod-
ucts, and two vector updates that are multiplied by the step size α. On the last update of
SD, we do not need to update the residual. Therefore the total computational cost for m
iterations is equal to

(4 · n + nnz(A)) · m − n,

where nnz(A) is the number of non-zero values in the matrix A.
Preconditioner Settings Used The only setting to tune for this preconditioner is the
number of iterations to run. We run this preconditioner for 1 and 2 iterations.
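When used as a preconditioner, a fixed number of SD iterations is applied; a minimal sketch of Algorithm 3 (steepest_descent is an illustrative name):

```python
import numpy as np

def steepest_descent(A, b, x0, iters=2):
    """Run a fixed number of Steepest Descent iterations on Ax = b,
    stepping along the residual with the optimal step size."""
    x = x0.copy()
    r = b - A @ x
    for _ in range(iters):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # optimal step along the residual
        x = x + alpha * r
        r = r - alpha * Ar
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b, np.zeros(2), iters=50)
```

Each pass costs one matrix-vector product plus a handful of vector operations, matching the cost expression above.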
6.2 Sparse Approximate Inverse
Sparse Approximate Inverse (SAI) preconditioners are based on constructing an explicit
approximation of the inverse of the matrix in a sparse format, enabling efficient application
during iterative methods (Benzi and Tuma, 1998; Grote and Huckle, 1997; Regev and Saun-
ders, 2020). Unlike classical methods or those based on Gaussian elimination (discussed in
subsection 6.3), SAI preconditioners provide an explicit matrix that approximates the inverse
and can be applied directly to the residual in each iteration (Benzi and Tuma, 1998; Grote
and Huckle, 1997). In this sense, while there is a concept of M^(−1) for the application of the preconditioner, there is less of a concept of M = (M^(−1))^(−1) itself.
The core idea behind SAI is to approximate the inverse of the matrix along some sparsity pattern such that the resulting matrix is significantly more sparse than the explicit inverse of A. Preserving sparsity in this manner while approximating the inverse, M^(−1) ≈ A^(−1), is attractive for large-scale sparse systems where the cost of applying the preconditioner is a significant factor in the overall computational cost (Grote and Huckle, 1997).
6.2.1 Symmetric Sparse Approximate Inverse
The Symmetric Sparse Approximate Inverse (SSAI) preconditioner (Regev and Saunders,
2020; Regev, 2022) builds on earlier work on SAI by introducing a symmetrizing step to the
algorithm. The SSAI algorithm seeks to minimize the difference between the identity and
the product of the preconditioner and the matrix. Formally, this is achieved by solving the
optimization problem

arg min_{M^(−1)} ∥I − M^(−1)A∥_F^2,

where ∥·∥_F denotes the Frobenius norm. After finding some such M^(−1), the SSAI preconditioner is then constructed by symmetrizing the matrix:

(M^(−1))′ = (1/2) (M^(−1) + M^(−⊤)).
This implementation can be found at https://stanford.edu/group/SOL/software/
minres/minres20.zip. This preconditioner has a fill-in multiple parameter that can be
adjusted to control the sparsity of the resulting matrix.
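The row-wise least-squares structure of this minimization can be illustrated with a small dense sketch (using the sparsity pattern of A itself as the pattern; ssai is an illustrative name and this is not the Regev-Saunders implementation):

```python
import numpy as np

def ssai(A):
    """Sketch of a sparse approximate inverse with symmetrization:
    each row m_i of M^{-1} minimizes ||e_i^T - m_i^T A||_2 over the
    sparsity pattern of row i of A, then M^{-1} <- (M^{-1} + M^{-T}) / 2."""
    n = len(A)
    Minv = np.zeros((n, n))
    for i in range(n):
        cols = np.nonzero(A[i])[0]               # allowed pattern for row i
        e = np.zeros(n)
        e[i] = 1.0
        # minimize || A[cols, :]^T m - e_i ||_2 over the free entries m
        m, *_ = np.linalg.lstsq(A[cols, :].T, e, rcond=None)
        Minv[i, cols] = m
    return 0.5 * (Minv + Minv.T)

# For a diagonal matrix the pattern is exact, so M^{-1} = A^{-1} exactly.
Minv = ssai(np.diag([2.0, 4.0, 5.0]))
```

Because the Frobenius objective decouples row by row, each row is an independent small least-squares problem, which is what makes SAI construction parallelizable.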
Computational Cost The computational cost of applying the SSAI preconditioner is equal to the number of non-zero values in the matrix M^(−1).
Preconditioner Settings Used The authors suggest using the average number of ele-
ments in a column as the level of fill allowed in the preconditioner. We run the preconditioner
with fill levels set to the average number of elements per column multiplied by the values in
{0.5, 1.0, 2.0, 3.0}.
6.3 Incomplete Factorizations
Incomplete factorizations are a class of preconditioners that approximate the factoriza-
tion of a matrix A in a sparse format, enabling efficient application during iterative meth-
ods (Benzi, 2002; Saad, 2003). The sparsity of the incomplete factors is preserved by
dropping some entries from the factorization, which can lead to a significant reduction in
the computational cost of applying the preconditioner. These are well-known to be effective
preconditioners (Benzi, 2002), are widely used in practice, and are available in many major
software packages.
Next we describe the Incomplete LU Factorization (ILU) and Incomplete Cholesky Factorization (IC) preconditioners that we consider in our study.
6.3.1 Incomplete Cholesky and Modified Incomplete Cholesky
The IC factorization approximates the exact Cholesky factorization of an SPD matrix
A ∈ ℝ^{n×n} by computing a sparse lower triangular matrix F such that

A ≈ F F^⊤.

The IC factorization retains only selected non-zero entries of the Cholesky factor along some sparsity pattern. This sparsity pattern may be defined by the structure of the matrix itself, by some power of the matrix, or by some other criteria. The primary goal with any of these criteria is to produce a factorization that is computationally less expensive to apply while capturing the essential features of the original matrix (Meijerink and van der Vorst, 1977; Saad, 2003).
The IC factorization supplied by MATLAB allows for either an IC(0) variant or one that is based solely on thresholding. In the case of the IC(0) variant, the factorization is computed without fill-in, using only the sparsity pattern of A. In the case of the threshold variant, any amount of fill-in is allowed provided that the entries are greater in magnitude than the threshold.
The IC preconditioner is applied by solving triangular systems involving F and its transpose. Given a residual vector r, the preconditioned residual is computed as

z = M^(−1) r = F^(−⊤) F^(−1) r.
The Modified Incomplete Cholesky (MIC) factorization modifies the standard IC by adjusting the diagonal entries of F to compensate for the dropped fill-in elements (Robert, 1982). This modification aims to improve the numerical stability and convergence properties of the preconditioner (Gustafsson, 1978).
The MIC factorization computes F such that

A ≈ F F^⊤ and ∥Ae − F F^⊤ e∥ ≤ ε,

where e is the vector of all ones and ε is machine precision. The diagonal modification of the MIC factorization seeks to reincorporate the magnitude of the dropped fill-in elements into the diagonal of F.
The MIC preconditioner is applied in the same manner as the IC preconditioner.
Preconditioner Settings Used We run this preconditioner with the IC(0) variant as well as with the drop threshold set to each value in {10^(−4), 10^(−5), 10^(−6), 10^(−7), 10^(−8)}.
6.3.2 Supernodal Incomplete LU Factorization
ILU approximates the exact LU factorization by retaining only selected non-zero entries
in the lower (L) and upper (U) triangular matrices:

A ≈ LU.
Note that for symmetric matrices and given a symmetric ordering, the ILU factorization
should be equivalent to the IC factorization.
We utilize the SuperLU library (Li, Saad, and Chow, 2003; Li and Shao, 2010; Demmel
et al., 1999; Li et al., 1999; Li, 2005), which provides efficient implementations of sparse
LU factorization with partial pivoting. SuperLU allows for control over fill-in levels and
drop tolerances, which can be adjusted to control the sparsity of the resulting factors. In
addition to complete LU factorizations, SuperLU provides incomplete factorizations with
varying levels of fill-in—we utilize their ILU factorization in this study. SuperLU uses a
supernodal approach to factorize the matrix, which is a modification of the standard LU
factorization that groups multiple columns together to improve cache efficiency and reduce
the number of operations required to factorize the matrix.
The SuperLU package returns a preconditioner that is not necessarily symmetric, even
with symmetric orderings applied. To obtain a symmetric preconditioner, we use only the lower triangular portion of the factorization, scaling L by the square root of the diagonal entries of U:

L′ = L D^(1/2),

where D is the diagonal matrix formed from the diagonal of U. Now we have

M = L′ (L′)^⊤,

where M is a symmetric preconditioner. Note that the lower triangular factor returned by SuperLU is scaled to have unit diagonal entries. Thus, we are adding half of the contribution of the scaling from U, such that when applying the preconditioner, we are applying the full scaling.
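The symmetrization can be illustrated with SciPy's spilu, which wraps the same SuperLU ILU (a sketch; with no dropping, natural ordering, and no pivoting on a tridiagonal SPD matrix, the factorization is exact and M recovers A):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spilu

n = 6
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()

# Natural ordering and no diagonal pivoting keep the factors aligned with A;
# drop_tol=0 makes this the exact LU for a tridiagonal matrix.
ilu = spilu(A, drop_tol=0.0, fill_factor=10.0,
            permc_spec='NATURAL', diag_pivot_thresh=0.0)

d = ilu.U.diagonal()                    # pivots of U (positive for SPD A)
Lp = ilu.L @ diags(np.sqrt(d))          # L' = L D^{1/2}
M = (Lp @ Lp.T).toarray()               # symmetric preconditioner M = L' L'^T
```

Since SuperLU's L has unit diagonal and U = D Lᵀ for a symmetric matrix without pivoting, L D^{1/2} (L D^{1/2})ᵀ = L U, which equals A here.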
Preconditioner Settings Used We run this preconditioner with all combinations of a drop tolerance in {10^(−4), 10^(−5), 10^(−6)} and fill-in levels in {1, 2, 3}.
Algorithm 4 Sparse Triangular Solve in CSR Format
Require: Lower triangular matrix L, upper triangular matrix U, right-hand side vector b
Ensure: Solution vector x satisfying LUx = b
1: x ← b
2: n ← length of x
3: for i = 1 to n do ▷ Forward Solve
4:   for j = colptr_L[i] to colptr_L[i + 1] − 2 do
5:     x_i ← x_i − nzval_L[j] · x_{rowval_L[j]}
6:   x_i ← x_i / nzval_L[colptr_L[i + 1] − 1]
7: for i = n down to 1 do ▷ Backward Solve
8:   for j = colptr_U[i] + 1 to colptr_U[i + 1] − 1 do
9:     x_i ← x_i − nzval_U[j] · x_{rowval_U[j]}
10:  x_i ← x_i / nzval_U[colptr_U[i]]
11: return x
6.3.3 Algorithm: Applying Incomplete Factorizations
Algorithm 4 outlines the procedure for solving sparse triangular systems using the com-
pressed sparse row (CSR) format.
In Algorithm 4, the CSR data structures for L and U are colptr_L, rowval_L, nzval_L
for L and colptr_U, rowval_U, nzval_U for U.
The forward solve updates the solution vector x by performing a forward substitution
with the lower triangular matrix L. Afterward, the backward solve updates x by performing
a backward substitution with the upper triangular matrix U. This implementation efficiently
applies the inverse of the incomplete factorization to the residual vector. For the incomplete
Cholesky factorization, the algorithm needs only a slight modification to account for the fact
that the second factor is the transpose of the first: a CSC variant of the backward solve is
performed on the same CSR factor that was used in the forward solve.
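The procedure transcribes directly into Python over SciPy CSR arrays, where indptr, indices, and data play the roles of colptr, rowval, and nzval (an illustrative sketch, not our benchmark implementation; it assumes sorted column indices, with the diagonal last in each row of L and first in each row of U):

```python
import numpy as np
import scipy.sparse as sp

def csr_lu_apply(L, U, b):
    """Solve L U x = b by forward then backward substitution on CSR factors."""
    x = np.asarray(b, dtype=float).copy()
    n = x.size
    Lp, Li, Lv = L.indptr, L.indices, L.data
    for i in range(n):                        # forward solve with L
        for j in range(Lp[i], Lp[i + 1] - 1):
            x[i] -= Lv[j] * x[Li[j]]
        x[i] /= Lv[Lp[i + 1] - 1]             # divide by diagonal of L (last in row)
    Up, Ui, Uv = U.indptr, U.indices, U.data
    for i in range(n - 1, -1, -1):            # backward solve with U
        for j in range(Up[i] + 1, Up[i + 1]):
            x[i] -= Uv[j] * x[Ui[j]]
        x[i] /= Uv[Up[i]]                     # divide by diagonal of U (first in row)
    return x

L = sp.csr_matrix(np.array([[2.0, 0, 0], [1, 3, 0], [0, 1, 4]]))
U = sp.csr_matrix(np.array([[1.0, 2, 0], [0, 1, 1], [0, 0, 1]]))
b = np.array([2.0, 5.0, 9.0])
x = csr_lu_apply(L, U, b)
```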
Computational Cost The computational cost of applying the incomplete factorization is
equal to the number of non-zero values in each factor that is inverted. In the IC and MIC
cases, this is two times the number of non-zero values in the lower triangular factor. In the
ILU case, this is equal to the number of non-zero values in the lower and upper triangular
factors.
6.4 Algebraic Multigrid
Algebraic Multigrid (AMG) methods are an advanced class of iterative methods that
are designed to solve large and sparse linear systems efficiently. These methods work by
constructing a hierarchy of coarser and coarser grids that approximate the original problem,
allowing for the solution to be computed at different levels of resolution (Brandt, 1986).
Unlike geometric multigrid methods, which we do not consider in this study, AMG constructs
the hierarchy based on the matrix itself, making it more general and applicable to a wider
range of problems (Brandt, 1986; Saad, 2003; Stüben, 2001; Brandt and Livne, 2011).
AMG is particularly effective for elliptic PDEs, where the matrix is symmetric positive
definite and the problem is discretized on a structured or unstructured grid. For these types
of problems, AMG methods are known to achieve optimal or nearly optimal convergence
rates (Brandt, 1986; Brandt and Livne, 2011).
6.4.1 Ruge-Stuben and Smoothed Aggregation
In this study, we focus on the Ruge-Stuben (Ruge and Stüben, 1987) AMG method and
the Smoothed Aggregation (Vaněk, Brezina, and Mandel, 2001) AMG method, which are
two popular AMG methods that are widely used in practice (Bell et al., 2023). We utilize
the PyAMG library (Bell et al., 2023) to construct the AMG preconditioners.
Computational Cost The PyAMG library provides a function to compute the complexity
of a cycle of the AMG method used. We use this as an approximation of the computational
cost of the AMG preconditioner.
Preconditioner Settings Used The Ruge-Stuben method is provided with V, W, F, and
AMLI cycles. The Smoothed Aggregation method is provided with V, W, and F cycles. We
test both methods and all cycles in our study, running for one and two cycles. We otherwise
use the default settings for the AMG preconditioner.
6.5 Graph-Laplacian Preconditioners
This subsection explores preconditioners designed specifically for graph-Laplacian matri-
ces. Although we understand that these preconditioners are not designed for general SPD
matrices, we still run them on general SPD matrices and report their results.
For general SPD matrices that are not SDD, we do not expect these preconditioners to
perform well. For matrices that are SDD when unscaled but not SDD when scaled, we run
the preconditioner on the unscaled matrix. For matrices that are SDD when scaled, we
run the preconditioner on the scaled matrix. For the remaining case in which the matrices
are not SDD when scaled or unscaled, we create the preconditioner on a matrix that was
modified to be SDD by increasing the diagonal entries of the matrix to be at least equal to
the row sum of the absolute values of the off-diagonal entries.
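The diagonal modification for this last case can be sketched as follows (make_sdd is a hypothetical helper name for illustration, not from our codebase):

```python
import numpy as np
import scipy.sparse as sp

def make_sdd(A):
    """Return a copy of A whose diagonal is raised so that each row
    satisfies a_ii >= sum_{j != i} |a_ij| (diagonal dominance)."""
    A = sp.csr_matrix(A, copy=True)
    d = A.diagonal()
    # Row sums of absolute off-diagonal entries.
    off = np.asarray(abs(A).sum(axis=1)).ravel() - np.abs(d)
    A.setdiag(np.maximum(d, off))
    return A

# Row 0 violates dominance (1 < 2 + 3); its diagonal is raised to 5.
B = sp.csr_matrix(np.array([[1.0, -2, 3], [-2, 10, 1], [3, 1, 2]]))
S = make_sdd(B)
```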
For an SDD matrix of dimension n, we can construct a graph-Laplacian matrix of dimen-
sion 2n through an augmentation described in (Kelner et al., 2013), which is an improvement
over Gremban’s reduction that would produce a matrix of dimension 2n+1 (Gremban, 1996).
For an SDD matrix A, this reduction is given by the following equation:

L = [ D + N + (1/2)S     −(P + (1/2)S)   ]
    [ −(P + (1/2)S)      D + N + (1/2)S  ],

where P contains the positive off-diagonal entries of A, N contains the negative off-diagonal
entries of A,

D = diag(Pe − Ne),

and S is the diagonal matrix with entries s_ii = a_ii − d_ii.
28
The diagonal matrix S represents the excess diagonal entries of A over the row sum of
the absolute values of the off-diagonal entries. We then distribute half of this slack to the
off-diagonal entries of the matrix and half to the diagonal entries.
Given a RHS vector b for the original system, the augmented RHS vector is given by

b′ = [ b ; −b ].

We can then solve the augmented system L x′ = b′ to obtain the solution to the original
system. The solution to the original system is then recovered by

2x = (x′)_{1:n} − (x′)_{n+1:2n}.
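A sketch of this augmentation and recovery in SciPy (our own illustration; the signs of the off-diagonal blocks are chosen so the result has zero row sums and nonpositive off-diagonals, as a graph Laplacian requires):

```python
import numpy as np
import scipy.sparse as sp

def sdd_to_laplacian(A):
    """Augment an SDD matrix A into a graph Laplacian of dimension 2n."""
    A = sp.csr_matrix(A)
    off = A - sp.diags(A.diagonal())
    P = off.maximum(0)                          # positive off-diagonal entries
    N = off.minimum(0)                          # negative off-diagonal entries
    e = np.ones(A.shape[0])
    D = sp.diags(P @ e - N @ e)                 # row sums of |off-diagonals|
    S = sp.diags(A.diagonal() - D.diagonal())   # excess diagonal (slack)
    B11 = D + N + 0.5 * S
    B12 = -(P + 0.5 * S)
    return sp.bmat([[B11, B12], [B12, B11]], format="csr")

A = sp.csr_matrix(np.array([[3.0, -1, 1], [-1, 3, 0], [1, 0, 3]]))
Lg = sdd_to_laplacian(A)
n = A.shape[0]
b = np.array([1.0, 2.0, 3.0])
bp = np.concatenate([b, -b])
# Lg is singular, but bp sums to zero, so the system is consistent; any
# solution works because the null-space component cancels in the difference.
xp = np.linalg.lstsq(Lg.toarray(), bp, rcond=None)[0]
x = 0.5 * (xp[:n] - xp[n:])
```

Since the difference of the two blocks satisfies B11 − B12 = A, subtracting the two halves of the augmented system recovers A(x′_{1:n} − x′_{n+1:2n}) = 2b, which is the recovery formula above.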
We utilize this reduction when applying the graph-Laplacian preconditioners to the gen-
eral SPD matrices. When the system matrix is itself a graph Laplacian, we use it directly.
Next, we briefly introduce the graph-Laplacian preconditioners.
6.5.1 Combinatorial Multigrid
We utilize the CombinatorialMultigrid package in Julia to construct the precondi-
tioner (Koutis, Miller, and Tolliver, 2011).²
Computational Cost To obtain the work expended by the preconditioner, we used a
modified version of the preconditioner that outputs the number of multiplications used in
each step of the cycle instead of applying the preconditioner to the matrix. A CSV file
containing the number of multiplications for each graph and matrix in the test suite will be
available on the GitHub page associated with this project when it is released.
Preconditioner Settings Used Because there are no options to tune for this precondi-
tioner, we simply run the preconditioner with the function returned by the package.
6.5.2 Laplacians: Approximate Cholesky
We utilize the Laplacians package in Julia.³ We use the approxchol_lap function
to construct the preconditioner, which is a randomized variant that constructs an LDL⊤
decomposition of a subtree of the graph (Gao, Kyng, and Spielman, 2023).
As of the writing of this study, the Laplacians package does not expose the precondi-
tioner for use in outside solvers. We worked around this by extracting the relevant portions
of the package and creating a function that applies the preconditioner to a matrix.⁴
Computational Cost We compute the work expended by the preconditioner as the num-
ber of multiplications used in inverting the LDL⊤ factorization. Note that for matrices that
have multiple connected components in the augmented graph, one LDL⊤ factorization is
computed per connected component, and the total work is the sum of the work expended
for each component.
Preconditioner Settings Used There are two parameters that can be tuned, split and
merge, which control the number of multi-edges used. We run all combinations of split and
merge values in {0, 1, 2, 3, 4, 5}.
² We use the release with SHA hash 077101c78e35095b8fc33b4625f515a2962c4cf6.
³ We use the version with SHA hash d89118f61fe4346d2401e7595c197ef3ef73d32c.
⁴ We thank the authors of the Laplacians package for their assistance in this matter.
References
P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering
algorithm. SIAM Journal on Matrix Analysis and Applications, 17 (4), p. 886–905, 1996.
doi:10.1137/s0895479894278952.
N. Bell, L. N. Olson, J. Schroder, and B. Southworth. PyAMG: Algebraic multigrid
solvers in python. Journal of Open Source Software, 8 (87), p. 5495, 2023. doi:10.21105/
joss.05495.
M. Benzi. Preconditioning techniques for large linear systems: A survey. Journal of Com-
putational Physics, 182 (2), p. 418–477, 2002. doi:10.1006/jcph.2002.7176.
M. Benzi, J. C. Haws, and M. Tuma. Preconditioning highly indefinite and nonsymmetric
matrices. SIAM Journal on Scientific Computing, 22 (4), p. 1333–1353, 2000. doi:
10.1137/s1064827599361308.
M. Benzi and M. Tuma. A sparse approximate inverse preconditioner for nonsymmetric
linear systems. SIAM Journal on Scientific Computing, 19 (3), p. 968–994, 1998. doi:
10.1137/s1064827595294691.
M. Benzi and M. Tůma. A comparative study of sparse approximate inverse precondi-
tioners. Applied Numerical Mathematics, 30 (2–3), p. 305–340, 1999. doi:10.1016/
s0168-9274(98)00118-4.
M. Benzi and M. Tůma. Numerical experiments with two approximate inverse precondition-
ers. BIT Numerical Mathematics, 38 (2), p. 234–241, 1998. doi:10.1007/bf02512364.
A. Berger, J. Mulvey, E. Rothberg, and R. Vanderbei. Solving multistage stochas-
tic programs using tree dissection. Statistics and Operations Research Research Report,
Princeton University, p. 55, 1995.
———. finan512. https://sparse.tamu.edu/Mulvey/finan512, 1997. Economic problem,
portfolio optimization. Princeton University. Retrieved from SuiteSparse (Mulvey group).
J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to
numerical computing. SIAM Review, 59 (1), pp. 65–98, 2017. doi:10.1137/141000671.
R. Bhatia. Matrix Analysis, Springer, New York, NY, 1997 edition, 1996.
D. Billger. gyro m. https://sparse.tamu.edu/Oberwolfach/gyro_m, 2004. Model re-
duction problem, butterfly gyro mass matrix. The Imego Institute. Retrieved from SuiteS-
parse (Oberwolfach group).
D. Bindel. ted b. https://sparse.tamu.edu/Bindel/ted_B, 2006. Thermal problem,
finite element mesh, coupled linear thermoelasticity equations on a bar. UC Berkeley.
Retrieved from SuiteSparse (Bindel group).
I. Botonakis. thermomech dm. https://sparse.tamu.edu/Botonakis/thermomech_dM,
2009a. Thermal problem, deformation of a steel cylinder. Retrieved from SuiteSparse
(Botonakis group).
———. thermomech tc. https://sparse.tamu.edu/Botonakis/thermomech_TC, 2009b.
Thermal problem, temperature of a steel cylinder. Retrieved from SuiteSparse (Botonakis
group).
A. Brandt. Algebraic multigrid theory: The symmetric case. Applied Mathematics and
Computation, 19 (1–4), p. 23–56, 1986. doi:10.1016/0096-3003(86)90095-0.
A. Brandt and O. E. Livne. Multigrid techniques, Society for Industrial and Applied
Mathematics, 2011.
W. L. Briggs. A multigrid tutorial, Society for Industrial and Applied Mathematics,
Philadelphia, Pa., 2 edition, 2000.
consph. consph. https://sparse.tamu.edu/Williams/consph, 2008. 2D/3D Problem.
Retrieved from SuiteSparse (Williams group).
A. Cunningham. qa8fm. https://sparse.tamu.edu/Cunningham/qa8fm, 2002. Acoustics
problem, 3D acoustic finite element mass matrix. Vibro-Acoustic Sciences, Inc. Retrieved
from SuiteSparse (Cunningham group).
E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In Pro-
ceedings of the 1969 24th national conference. 1969. doi:10.1145/800195.805928.
M. Dabrowski, M. Krotkiewski, and D. W. Schmid. Milamin: Matlab-based finite
element method solver for large problems. Geochemistry, Geophysics, Geosystems, 9 (4),
2008. doi:10.1029/2007gc001719.
C. Damhaug. ship 001. https://sparse.tamu.edu/DNVS/ship_001, 1999a. Structural
problem, ship structure. DNV Software. Retrieved from SuiteSparse (DNVS group).
———. ship 003. https://sparse.tamu.edu/DNVS/ship_003, 1999b. Structural problem,
ship structure from a production run. DNV Software. Retrieved from SuiteSparse (DNVS
group).
———. shipsec1. https://sparse.tamu.edu/DNVS/shipsec1, 1999c. Structural problem,
ship section/detail from a production run. DNV Software. Retrieved from SuiteSparse
(DNVS group).
———. shipsec5. https://sparse.tamu.edu/DNVS/shipsec5, 1999d. PARASOL Ship
section. DNV Software. Retrieved from SuiteSparse (DNVS group).
———. shipsec8. https://sparse.tamu.edu/DNVS/shipsec8, 1999e. Structural problem,
ship section/detail from a production run. DNV Software. Retrieved from SuiteSparse
(DNVS group).
T. A. Davis. Direct Methods for Sparse Linear Systems, Society for Industrial and Applied
Mathematics, 2006a. doi:10.1137/1.9780898718881.
———. Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Math-
ematics, 2006b. doi:10.1137/1.9780898718881.ch7.
T. A. Davis and Y. Hu. The university of florida sparse matrix collection. ACM Transac-
tions on Mathematical Software, 38 (1), p. 1–25, 2011. doi:10.1145/2049662.2049663.
R. Dembo and U. Tulowitzki. obstclae. https://sparse.tamu.edu/GHS_psdef/
obstclae, 1983. Optimization problem, quadratic obstacle problem. CUTEr. Retrieved
from SuiteSparse (GHS psdef group).
J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu. A supernodal
approach to sparse partial pivoting. SIAM J. Matrix Analysis and Applications, 20 (3),
pp. 720–755, 1999.
E. D. Dolan and J. J. Moré. minsurfo. https://sparse.tamu.edu/GHS_psdef/minsurfo,
2000. Optimization problem, minimum surface problem. CUTEr. Retrieved from SuiteS-
parse (GHS psdef group).
Dominique, A. Montoison, A. S. Siqueira, JSOBot, M. Toukal, tmigot, E. Saba,
J. TagBot, T. Kelman, and Y. Masuda. Juliasmoothoptimizers/amd.jl: v0.5.3 . https:
//zenodo.org/doi/10.5281/zenodo.3381898, 2023. doi:10.5281/ZENODO.3381898.
L. Dubcova, J. Cerveny, and P. Solin. Dubcova1. https://sparse.tamu.edu/UTEP/
Dubcova1, 2007a. 2D/3D Problem from PDE solver. University of Texas at El Paso.
Retrieved from SuiteSparse (UTEP group).
———. Dubcova2. https://sparse.tamu.edu/UTEP/Dubcova2, 2007b. 2D/3D Problem
from PDE solver. University of Texas at El Paso. Retrieved from SuiteSparse (UTEP
group).
———. Dubcova3. https://sparse.tamu.edu/UTEP/Dubcova3, 2007c. 2D/3D Problem
from PDE solver. University of Texas at El Paso. Retrieved from SuiteSparse (UTEP
group).
P. F. Dubois, A. Greenbaum, and G. H. Rodrigue. Approximating the inverse of a matrix
for use in iterative algorithms on vector processors. Computing, 22 (3), p. 257–268, 1979.
doi:10.1007/bf02243566.
I. S. Duff, R. G. Grimes, and J. G. Lewis. Sparse matrix test problems. ACM Transactions
on Mathematical Software, 15 (1), p. 1–14, 1989. doi:10.1145/62038.62043.
S. B. Elechi. Benelechi1. https://sparse.tamu.edu/BenElechi/BenElechi1, 2007.
2D/3D Problem. Arcelor, Inc. Retrieved from SuiteSparse (BenElechi group).
M. Ferronato, G. Gambolati, C. Janna, and P. Teatini. Numerical modelling of
regional faults in land subsidence prediction above gas/oil reservoirs. International Journal
for Numerical and Analytical Methods in Geomechanics, 32 (6), p. 633–657, 2007. doi:
10.1002/nag.640.
———. Geomechanical issues of anthropogenic co2 sequestration in exploited gas fields.
Energy Conversion and Management, 51 (10), p. 1918–1928, 2010. doi:10.1016/j.
enconman.2010.02.024.
M. Ferronato, C. Janna, and G. Gambolati. Mixed constraint preconditioning in com-
putational contact mechanics. Computer Methods in Applied Mechanics and Engineering,
197 (45–48), p. 3922–3931, 2008. doi:10.1016/j.cma.2008.03.008.
M. Ferronato, C. Janna, and G. Pini. Shifted fsai preconditioners for the efficient par-
allel solution of non-linear groundwater flow models. International Journal for Numerical
Methods in Engineering, 89 (13), p. 1707–1719, 2011. doi:10.1002/nme.3309.
Y. Gao, R. Kyng, and D. A. Spielman. Robust and practical solution of laplacian equations
by approximate elimination. 2023. doi:10.48550/ARXIV.2303.00709.
J. E. Gentle. Random number generation and Monte Carlo methods, Springer, New York,
NY, 2 edition, 2003.
T. George, A. Gupta, and V. Sarin. An empirical analysis of the performance of precon-
ditioners for spd systems. ACM Transactions on Mathematical Software, 38 (4), p. 1–30,
2012. doi:10.1145/2331130.2331132.
A. Ghai, C. Lu, and X. Jiao. A comparison of preconditioned krylov subspace methods for
large-scale nonsymmetric linear systems. 2016. doi:10.48550/ARXIV.1607.00351.
G. H. Golub and C. F. Van Loan. Matrix Computations, 4th Edition, Johns Hopkins
University Press, Baltimore, MD, 2013. doi:10.1137/1.9781421407944.
N. Gould. cvxbqp1. https://sparse.tamu.edu/GHS_psdef/cvxbqp1, 1995. Optimization
problem, barrier Hessian from convex QP. CUTEr. Retrieved from SuiteSparse (GHS psdef
group).
A. Greenbaum. Iterative Methods for Solving Linear Systems, Society for Industrial and
Applied Mathematics, 1997. doi:10.1137/1.9781611970937.
K. D. Gremban. Combinatorial Preconditioners for Sparse, Symmetric Diagonally Domi-
nant Linear Systems. Ph.D. thesis, Carnegie Mellon University, 1996.
R. Grimes. crystm02. https://sparse.tamu.edu/Boeing/crystm02, 1995a. Materials
problem, finite element mesh free vibration mass matrix. Boeing. Retrieved from SuiteS-
parse (Boeing group).
———. crystm03. https://sparse.tamu.edu/Boeing/crystm03, 1995b. Materials prob-
lem, finite element mesh free vibration mass matrix. Boeing. Retrieved from SuiteSparse
(Boeing group).
M. J. Grote and T. Huckle. Parallel preconditioning with sparse approximate in-
verses. SIAM Journal on Scientific Computing, 18 (3), p. 838–853, 1997. doi:10.1137/
s1064827594276552.
I. Gustafsson. A class of first order factorization methods. BIT, 18 (2), p. 142–156, 1978.
doi:10.1007/bf01931691.
M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems.
Journal of Research of the National Bureau of Standards, 49 (6), p. 409, 1952. doi:
10.6028/jres.049.044.
J. Inca, A. Davis, and A. Dubas. Benchmarking preconditioners. Technical Report
UKAEA-STEP-PR(23)02, United Kingdom Atomic Energy Authority, 2021.
D. Isaak. tmt sym. https://sparse.tamu.edu/CEMW/tmt_sym, 2008. Electromagnetics
problem. Retrieved from SuiteSparse (CEMW group).
C. Janna, A. Comerlati, and G. Gambolati. A comparison of projective and direct
solvers for finite elements in elastostatics. Advances in Engineering Software, 40 (8), p.
675–685, 2009. doi:10.1016/j.advengsoft.2008.11.010.
C. Janna and M. Ferronato. Adaptive pattern research for block fsai precondition-
ing. SIAM Journal on Scientific Computing, 33 (6), p. 3357–3380, 2011a. doi:
10.1137/100810368.
———. Emilia 923. https://sparse.tamu.edu/Janna/Emilia_923, 2011c. Structural
problem, geomechanical model for CO2 sequestration. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Fault 639. https://sparse.tamu.edu/Janna/Fault_639, 2011d. Structural prob-
lem, contact mechanics for model of a faulted gas reservoir. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Flan 1565. https://sparse.tamu.edu/Janna/Flan_1565, 2011e. Structural prob-
lem, 3D model of a steel flange, hexahedral finite elements. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Geo 1438. https://sparse.tamu.edu/Janna/Geo_1438, 2011f. Structural prob-
lem, geomechanical model of earth crust with underground deformation. University of
Padova. Retrieved from SuiteSparse (Janna group).
———. Hook 1498. https://sparse.tamu.edu/Janna/Hook_1498, 2011g. Structural prob-
lem, 3D model of a steel hook with tetrahedral finite elements. University of Padova.
Retrieved from SuiteSparse (Janna group).
———. Serena. https://sparse.tamu.edu/Janna/Serena, 2011h. Structural problem, gas
reservoir simulation for CO2 sequestration. University of Padova. Retrieved from SuiteS-
parse (Janna group).
———. Stocf-1465. https://sparse.tamu.edu/Janna/StocF-1465, 2011i. Computational
fluid dynamics, flow in porous medium with stochastic permeabilities. University of Padova.
Retrieved from SuiteSparse (Janna group).
———. Bump 2911. https://sparse.tamu.edu/Janna/Bump_2911, 2014a. 2D/3D Prob-
lem, 3D geomechanical reservoir simulation. University of Padova. Retrieved from SuiteS-
parse (Janna group).
———. Pflow 742. https://sparse.tamu.edu/Janna/PFlow_742, 2014b. 2D/3D Prob-
lem, 3D pressure-temperature evolution in porous media. University of Padova. Retrieved
from SuiteSparse (Janna group).
———. Queen 4147. https://sparse.tamu.edu/Janna/Queen_4147, 2014c. 2D/3D Prob-
lem, 3D structural problem. University of Padova. Retrieved from SuiteSparse (Janna
group).
C. Janna, M. Ferronato, and G. Gambolati. Multi-level incomplete factorizations
for the iterative solution of non-linear finite element problems. International Journal for
Numerical Methods in Engineering, 80 (5), p. 651–670, 2009. doi:10.1002/nme.2664.
———. A block fsai-ilu parallel preconditioner for symmetric positive definite linear sys-
tems. SIAM Journal on Scientific Computing, 32 (5), p. 2468–2484, 2010. doi:
10.1137/090779760.
———. Enhanced block fsai preconditioning using domain decomposition techniques. SIAM
Journal on Scientific Computing, 35 (5), p. S229–S249, 2013. doi:10.1137/120880860.
———. The use of supernodes in factored sparse approximate inverse preconditioning. SIAM
Journal on Scientific Computing, 37 (1), p. C72–C94, 2015. doi:10.1137/140956026.
J. A. Kelner, L. Orecchia, A. Sidford, and Z. A. Zhu. A simple, combinatorial algo-
rithm for solving sdd systems in nearly-linear time. 2013. doi:10.48550/ARXIV.1301.
6628.
R. Kouhia. cbuckle. https://sparse.tamu.edu/TKK/cbuckle, 2008a. Structural problem,
compressed cylindrical shell buckling. Helsinki University of Technology. Retrieved from
SuiteSparse (TKK group).
———. smt. https://sparse.tamu.edu/TKK/smt, 2008b. Structural problem, 3D model,
thermal stress analysis of surface mounted transistor. Helsinki University of Technology.
Retrieved from SuiteSparse (TKK group).
I. Koutis, G. L. Miller, and D. Tolliver. Combinatorial preconditioners and multilevel
solvers for problems in computer vision and image processing. Computer Vision and Image
Understanding, 115 (12), p. 1638–1646, 2011. doi:10.1016/j.cviu.2011.05.013.
K. Leppkes and U. Naumann. shallow water1. https://sparse.tamu.edu/MaxPlanck/
shallow_water1, 2009a. Computational fluid dynamics problem, shallow water modelling.
Max-Planck Institute of Meteorology. Retrieved from SuiteSparse (MaxPlanck group).
———. shallow water2. https://sparse.tamu.edu/MaxPlanck/shallow_water2, 2009b.
Computational fluid dynamics problem, shallow water modelling. Max-Planck Institute of
Meteorology. Retrieved from SuiteSparse (MaxPlanck group).
N. Li, Y. Saad, and E. Chow. Crout versions of ilu for general sparse matrices. SIAM Jour-
nal on Scientific Computing, 25 (2), p. 716–728, 2003. doi:10.1137/s1064827502405094.
X. Li, J. Demmel, J. Gilbert, L. Grigori, M. Shao, and I. Yamazaki. SuperLU
Users’ Guide. Technical Report LBNL-44289, Lawrence Berkeley National Labora-
tory, 1999. https://portal.nersc.gov/project/sparse/superlu/ug.pdf Last update:
June 2018.
X. S. Li. An overview of SuperLU: Algorithms, implementation, and user interface. ACM
Transactions on Mathematical Software, 31 (3), pp. 302–325, 2005.
X. S. Li and M. Shao. A supernodal approach to incomplete LU factorization with partial
pivoting. ACM Trans. Mathematical Software, 37 (4), 2010.
M. Lourakis. bundle1. https://sparse.tamu.edu/Lourakis/bundle1, 2006. Computer
graphics/vision problem, sparse bundle adjustment, 3D vision. Foundation for Research
and Technology Hellas Institute of Computer Science. Retrieved from SuiteSparse
(Lourakis group).
S. Mayer. audikw 1. https://sparse.tamu.edu/GHS_psdef/audikw, 2004a. Structural
problem. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. crankseg 1. https://sparse.tamu.edu/GHS_psdef/crankseg_1, 2004b. Structural
problem. DNV Software. Retrieved from SuiteSparse (GHS psdef group).
———. crankseg 2. https://sparse.tamu.edu/GHS_psdef/crankseg_2, 2004c. Structural
problem. DNV Software. Retrieved from SuiteSparse (GHS psdef group).
———. inline 1. https://sparse.tamu.edu/GHS_psdef/inline_1, 2004d. Structural
problem. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
M. Mazaheri. bundle adj. https://sparse.tamu.edu/Mazaheri/bundle_adj, 2015. Com-
puter vision problem, sparse bundle adjustment. University of Calgary. Retrieved from
SuiteSparse (Mazaheri group).
B. McRae. ecology2. https://sparse.tamu.edu/McRae/ecology2, 2008. Landscape ecol-
ogy problem. National Center for Ecological Analysis and Synthesis. Retrieved from
SuiteSparse (McRae group).
J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems
of which the coefficient matrix is a symmetric m-matrix . Mathematics of Computation,
31 (137), p. 148–162, 1977. doi:10.1090/s0025-5718-1977-0438681-4.
J. Moré and G. Toraldo. jnlbrng1. https://sparse.tamu.edu/GHS_psdef/jnlbrng1,
1991. Optimization problem, quadratic journal bearing problem. CUTEr. Retrieved from
SuiteSparse (GHS psdef group).
NASA. nasasrb. https://sparse.tamu.edu/Pothen/nasasrb, 1995. Structural problem, shut-
tle rocket booster. NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy4. https://sparse.tamu.edu/Pothen/bodyy4, 2003a. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy5. https://sparse.tamu.edu/Pothen/bodyy5, 2003b. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
———. bodyy6. https://sparse.tamu.edu/Pothen/bodyy6, 2003c. Structural problem.
NASA. Retrieved from SuiteSparse (Pothen group).
U. Okuyucu. G2 circuit. https://sparse.tamu.edu/AMD/G2_circuit, 2006a. Circuit
simulation matrix. AMD, Inc. Retrieved from SuiteSparse (AMD group).
———. G3 circuit. https://sparse.tamu.edu/AMD/G3_circuit, 2006b. Circuit simulation
matrix. AMD, Inc. Retrieved from SuiteSparse (AMD group).
S. K. Park and K. W. Miller. Random number generators: good ones are hard to find.
Communications of the ACM, 31 (10), p. 1192–1201, 1988. doi:10.1145/63039.63042.
W. H. Payne, J. R. Rabung, and T. P. Bogyo. Coding the lehmer pseudo-random number
generator. Communications of the ACM, 12 (2), p. 85–86, 1969. doi:10.1145/362848.
362860.
J. W. Pearson and J. Pestana. Preconditioners for krylov subspace methods: An overview.
GAMM-Mitteilungen, 43 (4), 2020. doi:10.1002/gamm.202000015.
S. Regev. Preconditioning Techniques for Sparse Linear Systems, Stanford University, 2022.
S. Regev and M. A. Saunders. SSAI: A Symmetric Sparse Approximate Inverse Precondi-
tioner for the Conjugate Gradient Methods PCG and PCGLS . Technical report, Stanford
University, 2020.
Y. Robert. Regular incomplete factorizations of real positive definite matrices. Linear Al-
gebra and its Applications, 48, p. 105–117, 1982. doi:10.1016/0024-3795(82)90101-x.
E. Rudnyi. t2dah e. https://sparse.tamu.edu/Oberwolfach/t2dah_e, 2004. Model re-
duction problem, micropyros thruster. University of Freiburg. Retrieved from SuiteSparse
(Oberwolfach group).
E. Rudnyi, J. Korvink, and B. van Rietbergen. bone010. https://sparse.tamu.
edu/Oberwolfach/bone010, 2006a. Model reduction problem. Retrieved from SuiteSparse
(Oberwolfach group).
———. bones01. https://sparse.tamu.edu/Oberwolfach/boneS01, 2006b. Model reduc-
tion problem. Retrieved from SuiteSparse (Oberwolfach group).
———. bones10. https://sparse.tamu.edu/Oberwolfach/boneS10, 2006c. Model reduc-
tion problem. Retrieved from SuiteSparse (Oberwolfach group).
E. Rudnyi and B. Van Rietbergen. Efficient harmonic simulation of a trabecular bone
finite element model by means of model reduction. In Proceedings of the 12th The Finite
Element Method in Biomedical Engineering, Biomechanics and Related Fields workshop
(FEM) 20-21 july 2005, Ulm, Germany, pp. 61–68. 2005.
J. W. Ruge and K. Stüben. 4. Algebraic Multigrid, p. 73–130. Society for Industrial and
Applied Mathematics, 1987. doi:10.1137/1.9781611971057.ch4.
Y. Saad. Ilut: A dual threshold incomplete lu factorization. Numerical Linear Algebra with
Applications, 1 (4), p. 387–402, 1994. doi:10.1002/nla.1680010405.
———. Iterative methods for sparse linear systems, SIAM, Philadelphia, PA, 2 edition,
2003.
O. Schenk. af shell3. https://sparse.tamu.edu/Schenk_AFE/af_shell3, 2003a. Struc-
tural problem, sheet metal forming structural problem series. University of Basel. Retrieved
from SuiteSparse (Schenk AFE group).
———. af shell7. https://sparse.tamu.edu/Schenk_AFE/af_shell7, 2003b. Structural
problem, sheet metal forming structural problem series. University of Basel. Retrieved
from SuiteSparse (Schenk AFE group).
———. af 0 k101. https://sparse.tamu.edu/Schenk_AFE/af_0_k101, 2006a. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 1 k101. https://sparse.tamu.edu/Schenk_AFE/af_1_k101, 2006b. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 2 k101. https://sparse.tamu.edu/Schenk_AFE/af_2_k101, 2006c. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 3 k101. https://sparse.tamu.edu/Schenk_AFE/af_3_k101, 2006d. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 4 k101. https://sparse.tamu.edu/Schenk_AFE/af_4_k101, 2006e. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
———. af 5 k101. https://sparse.tamu.edu/Schenk_AFE/af_5_k101, 2006f. Structural
problem, sheet metal forming structural problem. University of Basel. Retrieved from
SuiteSparse (Schenk AFE group).
D. Schmid. thermal1. https://sparse.tamu.edu/Schmid/thermal1, 2006a. Thermal
problem, unstructured finite element mesh, steady state problem. University of Oslo. Re-
trieved from SuiteSparse (Schmid group).
———. thermal2. https://sparse.tamu.edu/Schmid/thermal2, 2006b. Thermal problem,
unstructured finite element mesh, steady state problem. University of Oslo. Retrieved from
SuiteSparse (Schmid group).
F. Shakib. Pres poisson. https://sparse.tamu.edu/ACUSIM/Pres_Poisson, 2004. Com-
putational fluid dynamics problem. ACUSIM, Inc. Retrieved from SuiteSparse (ACUSIM
group).
B. Smith, P. E. Bjorstad, and W. Gropp. Domain decomposition, Cambridge University
Press, Cambridge, England, 2004.
K. Stüben. A review of algebraic multigrid, p. 331–359. Elsevier, 2001. doi:10.1016/
b978-0-444-50617-7.50015-x.
N. Trefethen. Trefethen 20000. https://sparse.tamu.edu/JGD_Trefethen/
Trefethen_20000, 2008a. Combinatorial Problem, diagonal matrices with primes. Ox-
ford University. Retrieved from SuiteSparse (JGD Trefethen group).
———. Trefethen 20000b. https://sparse.tamu.edu/JGD_Trefethen/Trefethen_
20000b, 2008b. Combinatorial Problem, diagonal matrices with primes. Oxford University.
Retrieved from SuiteSparse (JGD Trefethen group).
E. Um. 2cubes sphere. https://sparse.tamu.edu/Um/2cubes_sphere, 2008. Electromag-
netics problem, finite element mesh of 2 cubes in a sphere. Stanford. Retrieved from
SuiteSparse (Um group).
———. offshore. https://sparse.tamu.edu/Um/offshore, 2010. Computational fluid
dynamics, 3D Finite element mesh. Stanford. Retrieved from SuiteSparse (Um group).
P. Vaněk, M. Brezina, and J. Mandel. Convergence of algebraic multigrid based on
smoothed aggregation. Numerische Mathematik, 88 (3), p. 559–579, 2001. doi:10.1007/
s211-001-8015-y.
A. Wathen. wathen100. https://sparse.tamu.edu/GHS_psdef/wathen100, 2004a. Ran-
dom 2D/3D Problem. Oxford University. Retrieved from SuiteSparse (GHS psdef group).
———. wathen120. https://sparse.tamu.edu/GHS_psdef/wathen120, 2004b. Random
2D/3D Problem. Oxford University. Retrieved from SuiteSparse (GHS psdef group).
J. Weiher. ldoor. https://sparse.tamu.edu/GHS_psdef/ldoor, 2004a. Structural prob-
lem, large size door. project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. oilpan. https://sparse.tamu.edu/GHS_psdef/oilpan, 2004b. Structural problem.
project PARASOL. Retrieved from SuiteSparse (GHS psdef group).
———. msdoor. https://sparse.tamu.edu/INPRO/msdoor, 2005. Structural problem,
medium size door. project PARASOL. Retrieved from SuiteSparse (INPRO group).
———. hood. https://sparse.tamu.edu/GHS_psdef/hood. Structural Problem. project
PARASOL. Retrieved from SuiteSparse (GHS psdef group).
M. Will. bcsstk17. https://sparse.tamu.edu/HB/bcsstk17, 1985a. Structural problem,
stiffness matrix, elevated pressure vessel. Georgia Institute of Technology. Retrieved from
SuiteSparse (HB group).
———. bcsstk18. https://sparse.tamu.edu/HB/bcsstk18, 1985b. Structural problem,
stiffness matrix, nuclear power station. Georgia Institute of Technology. Retrieved from
SuiteSparse (HB group).
P. Wissgott. parabolic fem. https://sparse.tamu.edu/Wissgott/parabolic_fem, 2007.
Computational fluid dynamics problem. Vienna University of Technology. Retrieved from
SuiteSparse (Wissgott group).
D. M. Young. The rate of convergence of an improved iterative method for solving the
finite difference analogue of the dirichlet problem. In BULLETIN OF THE AMERICAN
MATHEMATICAL SOCIETY, pp. 345–345. 1950.
———. Iterative methods for solving partial difference equations of elliptic type. Transactions
of the American Mathematical Society, 76, pp. 92–111, 1954.