## Description

The purpose of this assignment is to gain insight into the theoretical and numerical properties of the Conjugate Gradient method. Here we use this method in an image processing application with the goal of deblurring an image, given the exact (noise-free) blurred image and the original transformation matrix. Note that the "noise-free" simplification is essential for us to solve this problem in the scope of this assignment. For the interested reader, searching for image deblurring or inverse problem image processing may be enlightening.

## 1. Simple Image Deblurring – Problem Definition

Suppose we have a blurred image $B$ which we want to refine (or deblur), and we know the transformation that led to this blurred image. How can we efficiently reconstruct the original image? Let $X \in \mathbb{R}^{n \times n}$ be the original square, grayscale image matrix, where each matrix entry corresponds to one pixel value; hence $X$ corresponds to an image with $n$ pixels per side. We can convert the original image matrix $X$ into a column vector $x \in \mathbb{R}^{n^2}$, where the entries of $X$ get stacked into a vector, either row- or column-wise. This process is referred to as vectorization¹. If we now define as $b$ the vectorized form of the blurred image $B$, we can write the following system of equations:

$$Ax = b. \tag{1}$$

Here, $A \in \mathbb{R}^{n^2 \times n^2}$ denotes the transformation matrix coming from the repeated application of what is referred to as the "image kernel", which in our case produces a blurring effect. An excellent visual explanation of the kernel operation can be found at [4]. Our objective will be to solve the linear system in Eq. (1) for the original vectorized image $x$, where the transformation matrix $A$ and the blurred image $B$ (and hence $b$) are known. By solving the system, we can recover the original image, getting rid of the blurring effect applied on top of it.

### 1. Visualization and Vectorization

The image in consideration is black and white, and the corresponding image matrix can be visualized with the command plot (see the Plots.jl documentation). We can vectorize the blurred image matrix with the command vec(B). The vectorization ordering can be arbitrary (by column, by row, or whatever fits best with the data access pattern); however, it must correspond with $A$. In our case, row-by-row vectorization is required. A blurred image pixel is the weighted average of the surrounding pixels. These weights are defined by the kernel matrix $K \in \mathbb{R}^{d \times d}$. The much bigger transformation matrix $A$ is constructed from the kernel matrix, such that the non-zero elements of each row of $A$ correspond to the values of $K$. The detailed construction of the transformation matrix is omitted for brevity; see [4]. The following attributes should be noted:

**Borders:** The transformation matrix ignores the elements outside the borders of the image matrix (indices with $\max(i, j) > n$).

¹ For more details on vectorization see, e.g., https://en.wikipedia.org/wiki/Vectorization_(mathematics).

Numerical Computing, Fall Semester 2022
Lecturer: Prof. O. Schenk
Assistants: E. Vecchi, L. Gaedke-Merzhäuser

**Banded Matrix:** $A$ is a $d^2$-banded matrix, where $d \ll n$. This means that $A$ is sparse, and thus one should use sparse data structures. Recall that $A$ is of dimension $n^2 \times n^2$; defining a dense matrix of that size will not fit on your system. For example, if $n = 250$ we would need about 31 GB (i.e., $8 n^2 \cdot n^2$ bytes) of memory.
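The ordering convention above matters in practice: Julia stores matrices column-major, so `vec` stacks columns, and the required row-by-row vectorization needs a transpose. A minimal sketch on a purely illustrative 3×3 matrix:

```julia
# Hypothetical 3x3 image matrix, used only to illustrate vectorization.
B = [1 2 3;
     4 5 6;
     7 8 9]

# vec() stacks columns (column-major); row-by-row vectorization, as
# required by the provided transformation matrix, needs a transpose:
b_col = vec(B)               # column-wise: [1, 4, 7, 2, 5, 8, 3, 6, 9]
b_row = vec(permutedims(B))  # row-wise:    [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The inverse operation restores the image matrix from its vector form:
B_back = permutedims(reshape(b_row, 3, 3))
```

The same `reshape`/`permutedims` pair is what you will need to turn the deblurred solution vector back into an image.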

For this assignment you will use the provided transformation matrix A.mat. It is generated from the normalized blur image kernel

$$
K = \frac{1}{605}
\begin{bmatrix}
100 & 9 & 9 & 9 & 9 & 9 & 100 \\
9 & 1 & 1 & 1 & 1 & 1 & 9 \\
9 & 1 & 1 & 1 & 1 & 1 & 9 \\
9 & 1 & 1 & 1 & 1 & 1 & 9 \\
9 & 1 & 1 & 1 & 1 & 1 & 9 \\
9 & 1 & 1 & 1 & 1 & 1 & 9 \\
100 & 9 & 9 & 9 & 9 & 9 & 100
\end{bmatrix}, \tag{2}
$$

and operates on a "row-vectorized" image matrix.
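The kernel of Eq. (2) can be reconstructed and sanity-checked in a few lines. This is only a sketch to verify that the weights are normalized; it is not part of the provided data:

```julia
# Rebuild the 7x7 kernel of Eq. (2) and check that it is normalized.
K = ones(7, 7)
K[2:6, [1, 7]] .= 9        # edge entries of the interior rows
K[[1, 7], 2:6] .= 9        # edge entries of the top and bottom rows
K[[1, 7], [1, 7]] .= 100   # corner entries
K ./= 605                  # normalization factor from Eq. (2)

@assert sum(K) ≈ 1.0       # blur-kernel weights must sum to one
```

The normalization ensures that blurring preserves the overall brightness of the image: each blurred pixel is a convex combination of its neighbors.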

## 2. Conjugate Gradient

The conjugate gradient method (CG) [1, 3] is widely used for solving systems of linear equations $Ax = b$, where $A$ is both symmetric and positive-definite. The conjugate gradient method can theoretically be viewed as a direct method: in the absence of round-off error it produces the exact solution after a finite number of iterations, which is not larger than the size of the matrix. However, the conjugate gradient method is unstable with respect to even small perturbations; e.g., most directions are not in practice conjugate, and the exact solution is never obtained. Fortunately, the conjugate gradient method can be used as an iterative method, as it provides monotonically improving approximations $x_k$ to the exact solution, which may reach the required tolerance after a relatively small (compared to the problem size) number of iterations. The improvement is typically linear, and its speed is determined by the condition number $\kappa(A)$ of the system matrix $A$: the larger $\kappa(A)$ is, the slower the improvement. The CG algorithm starts from an initial guess $x_0$ and applies a series of matrix-vector and vector-vector operations until the desired tolerance is reached:

**Algorithm 1: Conjugate Gradient Algorithm**

1:  $r \leftarrow b - A x_0$
2:  $d \leftarrow r$
3:  $\rho_{\mathrm{old}} \leftarrow \langle r, r \rangle$
4:  **for** $i = 0, 1, \ldots$ **do**
5:  $\quad s \leftarrow A d$
6:  $\quad \alpha \leftarrow \rho_{\mathrm{old}} / \langle d, s \rangle$
7:  $\quad x \leftarrow x + \alpha d$
8:  $\quad r \leftarrow r - \alpha s$
9:  $\quad \rho_{\mathrm{new}} \leftarrow \langle r, r \rangle$
10: $\quad \beta \leftarrow \rho_{\mathrm{new}} / \rho_{\mathrm{old}}$
11: $\quad d \leftarrow r + \beta d$
12: $\quad \rho_{\mathrm{old}} \leftarrow \rho_{\mathrm{new}}$
13: **end for**
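Algorithm 1 translates almost line-for-line into Julia. The sketch below follows the `myCG` interface used later in the assignment; stopping when the residual norm drops below `tol` (or after `max_itr` iterations) is our assumption about how the two parameters interact:

```julia
using LinearAlgebra

# A sketch of Algorithm 1. The residual-norm stopping test is our
# interpretation of how max_itr and tol should be combined.
function myCG(A, b, x0, max_itr, tol)
    x = copy(x0)
    r = b - A * x
    d = copy(r)
    rho_old = dot(r, r)            # rho_old <- <r, r>
    rvec = [sqrt(rho_old)]         # residual norm at every iteration
    for i in 1:max_itr
        s = A * d                  # the only matrix-vector product per step
        alpha = rho_old / dot(d, s)
        x += alpha * d
        r -= alpha * s
        rho_new = dot(r, r)
        push!(rvec, sqrt(rho_new))
        sqrt(rho_new) < tol && break
        d = r + (rho_new / rho_old) * d   # beta = rho_new / rho_old
        rho_old = rho_new
    end
    return x, rvec
end
```

For a small SPD test matrix, `myCG(A, b, zeros(length(b)), 200, 1e-8)` should reproduce `A \ b` up to the tolerance; note that only one product with `A` is computed per iteration.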

### 1. Symmetry and Positivity

In our case $A$ is symmetric and of full rank, but not positive-definite. We can bypass this issue by solving the augmented system in Eq. (3): pre-multiplication with $A^T$ results in the positive-definite augmented transformation matrix $\tilde{A}$.

$$A^T A x = A^T b \tag{3}$$
$$\tilde{A} x = \tilde{b}.$$
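Forming the augmented system stays cheap because $A$ is sparse, so $A^T A$ is sparse as well. A sketch on a tiny full-rank stand-in matrix (in the assignment, $A$ and $b$ come from A.mat and the vectorized B.mat instead):

```julia
using LinearAlgebra, SparseArrays

# Tiny full-rank stand-in for the real transformation matrix.
A = sparse([2.0 -1.0  0.0;
            0.0  2.0 -1.0;
            0.0  0.0  2.0])
b = [1.0, 1.0, 1.0]

A_tilde = A' * A     # symmetric positive-definite, since A has full rank
b_tilde = A' * b

# A solution of the augmented system also solves the original system:
x = A_tilde \ b_tilde
```

The key observation is that $\tilde{A}$ inherits sparsity from $A$, so the augmented system can still be handled with sparse data structures.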


### 2. Condition Number

An important attribute of a linear system $Ax = b$ is its condition number $\kappa(A)$. It measures the sensitivity of the solution $x$ to changes in $b$: if small changes in $b$ result in large changes in $x$, the system is called ill-conditioned and the condition number of the system is large. The convergence rate of CG is hindered when the system has a high condition number. Notice that the number of iterations needed to find the exact solution (in exact arithmetic) is not related to $\kappa(A)$, but is rather at most the number of unique eigenvalues. The condition number of a square matrix $A$ is

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}, \tag{4}$$

with $\sigma$ indicating the singular values of $A$. For a real symmetric matrix, the singular values are equal to the absolute values of the eigenvalues: $\sigma = |\lambda|$.
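This identity is easy to check numerically. A sketch on a small symmetric example matrix (hypothetical, not A.mat): for a real symmetric matrix, $\max|\lambda| / \min|\lambda|$ agrees with the singular-value ratio of Eq. (4):

```julia
using LinearAlgebra

# Small symmetric example; eigenvalues of [2 1; 1 2] are 1 and 3.
A = [2.0 1.0; 1.0 2.0]
lambda = eigvals(Symmetric(A))
kappa = maximum(abs.(lambda)) / minimum(abs.(lambda))

# Agrees with the singular-value definition of the condition number:
@assert kappa ≈ cond(A)
```

For the test matrices of this assignment, plotting `eigvals` directly gives you both $\kappa$ and a picture of how clustered the spectrum is, which governs CG convergence.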

In our case, matrix $A$ is symmetric and real, and this implies that the condition number is given by the ratio of the magnitudes of the maximum and minimum eigenvalues. Furthermore, the transformation matrix is ill-conditioned and, to make matters worse, we will be solving the augmented system in Eq. (3), whose condition number is $\kappa(\tilde{A}) = \kappa(A)^2$. These factors will unfortunately deteriorate the convergence rate of our deblurring program, but this problem can be partially mitigated through preconditioning, as explained in the next section.

### 3. Preconditioning

A symmetric positive-definite preconditioner $P$ is selected such that $P^{-1} \tilde{A} \approx I$, that is, $P^{-1}$ approximates $\tilde{A}^{-1}$. We can decompose the preconditioner as $P = L L^T$, where $L$ is the Cholesky factor (see [2]). Our intention is to solve the preconditioned augmented system

$$P^{-1} \tilde{A} x = P^{-1} \tilde{b} \tag{5}$$

$$\underbrace{(L^{-1} \tilde{A} L^{-T})}_{\bar{\tilde{A}}} \, \underbrace{(L^{T} x)}_{\bar{x}} = \underbrace{L^{-1} \tilde{b}}_{\bar{\tilde{b}}}. \tag{6}$$

This operation is done both to decrease the condition number, $\kappa(\bar{\tilde{A}}) < \kappa(\tilde{A})$, and to cluster the eigenvalues in a smaller range. The underlying caveat here is that the preconditioner should be computationally inexpensive to find. To meet the desired properties of the preconditioner, a common approach is to use an incomplete Cholesky (IC) factorization (see [1]). With IC, we only compute the Cholesky factorization on the non-zero elements of $\tilde{A}$. This cheap factorization of $\tilde{A}$ gives us the preconditioner $P = F^T F$, where $F$ is the sparse IC factor. This routine can fail, since the existence of $F$ is not guaranteed (note that $F$ is forced to be sparse). To amend this issue, a heuristic approach is to apply a diagonal shift to $P$, enforcing positive-definiteness and making $F$ computable. The detailed explanation of this routine is outside the scope of this assignment. We can use the Julia function CholeskyPreconditioner() for the IC factorization (please refer to the Julia documentation for more information, and also note the memory option).
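As a sketch of how the pieces fit together, the snippet below builds the augmented system on a tiny stand-in matrix and solves it with preconditioned CG. The keyword names (`Pl`, `maxiter`, `abstol`) and the memory argument of `CholeskyPreconditioner` follow the IterativeSolvers.jl and Preconditioners.jl documentation and should be checked against your installed versions:

```julia
using LinearAlgebra, SparseArrays
using IterativeSolvers, Preconditioners   # external packages used in this assignment

# Tiny stand-in system; in the assignment, A comes from A.mat and b
# from the vectorized B.mat.
A = sparse([2.0 -1.0 0.0; -1.0 2.0 -1.0; 0.0 -1.0 2.0])
b = [1.0, 0.0, 1.0]
A_tilde = A' * A
b_tilde = A' * b

# Incomplete Cholesky preconditioner with memory fill 1; the 0.01
# diagonal shift is the heuristic mentioned above that guarantees the
# existence of the factor.
P = CholeskyPreconditioner(A_tilde + 0.01 * I, 1)

# Preconditioned CG on the augmented system.
x = cg(A_tilde, b_tilde; Pl = P, maxiter = 200, abstol = 1e-4)
```

For the real problem, only the construction of `A_tilde`, `b_tilde`, `P`, and the `cg` call carry over; the stand-in matrix is purely illustrative.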

## 3. The Assignment

Read this document and Section 7.4 of the textbook [1].

### 1. General Questions [10 points]

1. What is the size of the matrix A?


2. How many diagonal bands does A have?

3. What is the length of the vectorized blurred image b?

### 2. Properties of A [10 points]

1. If $A$ is not symmetric, how would this affect $\tilde{A}$?

2. Explain why solving $Ax = b$ for $x$ is equivalent to minimizing $\frac{1}{2} x^T A x - b^T x$ over $x$, assuming that $A$ is symmetric positive-definite.
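A quick numerical illustration of the second question (this is not a proof, and the small SPD matrix is hypothetical): the gradient of $\phi(x) = \frac{1}{2} x^T A x - b^T x$ is $Ax - b$, so the stationary point of $\phi$ is exactly the solution of the linear system.

```julia
using LinearAlgebra

# phi is the quadratic form from the question; its gradient is A*x - b.
A = [4.0 1.0; 1.0 3.0]           # small SPD example
b = [1.0, 2.0]
phi(x) = 0.5 * dot(x, A * x) - dot(b, x)

xstar = A \ b                    # solves A x = b
grad = A * xstar - b             # ≈ 0: xstar is a stationary point of phi

# For SPD A, phi(xstar + p) - phi(xstar) = 0.5 * p' * A * p > 0 for
# p ≠ 0, so the stationary point is the unique minimizer:
@assert phi(xstar) < phi(xstar + [0.1, -0.2])
```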

### 3. Conjugate Gradient [30 points]

1. Write a function for the conjugate gradient solver, x, rvec = myCG(A, b, x0, max_itr, tol), where x and rvec are, respectively, the solution value and a vector containing the norm of the residual at every iteration.
   Hint: You can use Julia's cg() function for comparison.

2. In order to validate your implementation, solve the system defined by A_test.mat and b_test.mat. Plot the convergence (residual vs. iteration).

3. Plot the eigenvalues of A_test.mat and comment on the condition number and convergence rate.

4. Does the residual decrease monotonically? Why or why not?

### 4. Deblurring problem [35 points]

1. Solve the deblurring problem for the blurred image matrix B.mat and transformation matrix A.mat, using your routine myCG and Julia's preconditioned conjugate gradient cg with the Pl option (from the IterativeSolvers package). As a preconditioner, use CholeskyPreconditioner() from the Preconditioners package to get the incomplete Cholesky factors. In order to ensure existence of the IC factor, shift the diagonal by 0.01 and set the memory fill equal to 1. Solve the system with both solvers using max_iter = 200, abstol = $10^{-4}$. Plot the convergence (residual vs. iteration) of each solver and display the original and final deblurred images. Carefully explain and comment on the results that you observe.
   Hint: You can use ?function() to retrieve some information on function().

2. When would the preconditioned CG be worth the added computational cost? What about if you are deblurring lots of images with the same blur operator?

### Quality of the code and of the report [15 points]

The highest possible score of each project is 85 points, and up to 15 additional points can be awarded based on the quality of your report and code (maximum possible grade: 100 points). Your report should be a coherent document, structured according to the template provided on iCorsi. If there are theoretical questions, provide a complete and detailed answer. All figures must have a caption and must be of sufficient quality (include them either as .eps or .pdf). If you made a particular choice in your implementation that might be out of the ordinary, clearly explain it in the report. The code you submit must be self-contained and executable, and must include the set-up for all the results that you obtained and listed in your report. It has to be readable and, if particularly complicated, well-commented.


## Additional notes and submission details

Summarize your results and experiments for all exercises by writing an extended LaTeX report, using the template provided on iCorsi (https://www.icorsi.ch/course/view.php?id=14666). Submit your gzipped archive file (tar, zip, etc.), named project_number_lastname_firstname.zip/tgz, on iCorsi (see [NC 2022] Project 5 – Conjugate Gradient) before the deadline. Submission by email or through any other channel will not be considered. Late submissions will not be graded and will result in a score of 0 points for that project. You are allowed to discuss all questions with anyone you like, but: (i) your submission must list anyone you discussed problems with, and (ii) you must write up your submission independently. Please remember that plagiarism will result in a harsh penalization for all involved parties.

## In-class assistance

If you experience difficulties in solving the problems above, please join us in class either on Tuesdays 16:30-18:15 or on Wednesdays 14:30-16:15 in room C1.03.

## References

[1] C. Greif, U. Ascher, *A First Course on Numerical Methods*, SIAM. Chapter 7: Linear Systems: Iterative Methods.

[2] C. Greif, U. Ascher, *A First Course on Numerical Methods*, SIAM. Chapter 5, Section 5.5: The Cholesky decomposition.

[3] J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, http://www.cs.cmu.edu/˜./quake-papers/painless-conjugate-gradient.pdf

[4] Image Kernels, http://setosa.io/ev/image-kernels/