The Kaczmarz method for solving linear systems of equations

As you probably know, linear system solvers come in two broad families: direct methods, which factor the matrix and finish in a fixed number of steps, and iterative methods, which refine an approximate solution until it is good enough.

Today I want to talk about one such method, the one invented by Kaczmarz (and independently by a few other people; you can check Wikipedia to learn more about them). This method falls in the “iterative methods” category. Another name for it is ART, or Algebraic Reconstruction Technique.

Let’s set up the problem a bit. We have a matrix $A\in\mathbb{R}^{n\times n}$ and a vector $b\in\mathbb{R}^{n}$, and we want to find an $x$ such that $Ax=b$. Everything is straightforward so far.

The point of an iterative method is to find a function/operator $g$ such that if $g$ has a fixed point $x^{*}$, then that fixed point is also the solution of the problem we are trying to solve. Thus, two questions come to mind immediately: out of all possible $g$, which ones have this fixed point, and under which conditions does the sequence $x_{k+1}=g(x_{k})$ actually converge?

Although the second question is very important, the first is no easier, and usually we have to turn the problem around until we find a formulation from which we can build such a $g$. An easy example is the search for a root of a smooth function $f(x)$: linearizing $f$ and taking the root of the linearization as the next approximation leads to Newton’s method. In our case, the theme is to find the solution of an equivalent least squares problem.
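To make the fixed-point idea concrete, here is a quick sketch in Python (my choice of language and example function, not from the post): Newton’s method written as iterating $g(x)=x-f(x)/f'(x)$, whose fixed point is a root of $f$.

```python
import math

# Newton's method as a fixed-point iteration: g(x) = x - f(x)/f'(x).
# Illustrative example: f(x) = x^2 - 2, whose positive root is sqrt(2).
def g(x):
    f = x**2 - 2
    fprime = 2 * x
    return x - f / fprime

x = 1.0  # starting guess
for _ in range(10):
    x = g(x)

print(abs(x - math.sqrt(2)) < 1e-12)  # True: the fixed point of g is the root
```

A fixed point of $g$ satisfies $x = x - f(x)/f'(x)$, i.e. $f(x)=0$, which is exactly the “turn the problem around” step described above.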

Inspiration from optimization

Let’s say that we want to solve the following problem:

Find an x such that $\inf_{y}\|Ay-b\|_{2}^{2}=\|Ax-b\|_{2}^{2}\geq 0$

We can say a few things about this problem if we know the properties of $A$. Is $b\in R(A)$ ($R(A)$ being the column space of $A$)? Then there is an $x$ such that $Ax=b$, i.e. the infimum is attained and equals zero. If, in addition, the matrix is invertible, the minimizer is unique, and searching for the minimum is equivalent to solving $Ax=b$.
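A quick numerical sanity check of that equivalence, in Python/NumPy (the matrix and right-hand side are the same small example used later in the post): for an invertible $A$, the least squares minimizer coincides with the solution of $Ax=b$ and the minimum value is zero.

```python
import numpy as np

# For an invertible A, the least-squares minimizer of ||Ay - b||^2
# is exactly the solution of Ax = b, and the residual at the minimum is 0.
A = np.array([[1.0, 1.0],
              [2.0, 5.0]])
b = np.array([1.0, 1.0])

x_solve = np.linalg.solve(A, b)                 # direct solve of Ax = b
x_lsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares minimizer

print(np.allclose(x_solve, x_lsq))                       # same point
print(np.linalg.norm(A @ x_lsq - b) < 1e-10)             # minimum value is 0
```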

It may seem strange that we are interested in this formulation in the case $b\in R(A)$. After all, the least squares problem is usually introduced in the opposite case, when $b$ is not in the column space of $A$, and we settle for the next best thing. It turns out, though, that this formulation gives us a very powerful algorithm. Before we talk about it, we need to consider another, related problem.

Finding the common ground

Imagine that we have two convex & closed sets, D and C, which share a common point. Our goal is to find that point!

Here’s a very quick review of convex sets. To skip it, Ctrl+F for “Review end”. Keep that in your search field to quickly skip similar reviews.

Review start {

A set $B$ is convex if and only if for any two elements $b_1, b_2\in B$ and any $t\in (0,1)$, the point $tb_1+(1-t)b_2\in B$. In other words, the line segment that connects any two points of the set lies completely inside the set.
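You can sanity-check this definition numerically. A small Python/NumPy sketch (the unit disk is my own illustrative choice of convex set):

```python
import numpy as np

# Check the convexity definition on a simple convex set, the unit disk:
# any point t*b1 + (1-t)*b2 between two points of the disk stays in the disk.
rng = np.random.default_rng(0)

def random_point_in_disk():
    # a random direction scaled to a random length <= 1
    v = rng.normal(size=2)
    return v / np.linalg.norm(v) * rng.uniform(0, 1)

ok = True
for _ in range(100):
    b1, b2 = random_point_in_disk(), random_point_in_disk()
    t = rng.uniform(0, 1)
    ok = ok and np.linalg.norm(t * b1 + (1 - t) * b2) <= 1.0 + 1e-12

print(ok)  # True: the segment between any two points stays inside
```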

A set that is convex:

And another one, not convex this time:

Convexity of a set can make a big difference both in application and in theory.

A set is closed if its complement is open, and a set is open if around every point of the set we can find a small ball that lies inside the set.

Review end}

So, how can we find that point? One idea is to use a method called alternating projection. We pick a starting point, let’s say $x_{0}$, and project it onto one of the two sets, say D. Then we project the resulting point onto C, and we start all over: project the new point onto D, then onto C, and so on.

In mathematical language, that would be stated as follows. Let $P_C, P_D$ be the projection operators onto C and D, and pick a starting point $x_0$. Then iterate the sequence $x_{k+1}=P_C(P_D(x_k))=Tx_k$ until a stopping criterion is fulfilled. Does this fixed point iteration have the correct fixed point, i.e. a common point of C and D?

Let’s say that x is that common point. Then, since x belongs to D, $P_D(x)=x$. In addition, since x belongs to C, $P_C(x)=x=P_C(P_D(x))$. Thus, indeed, if x is a common point, it’s also a fixed point of T. Going backwards, the converse holds as well: as long as the two sets actually intersect, a fixed point of T has to be a common point of C and D.
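Here is the iteration in action, as a small Python/NumPy sketch. The two sets, a disk and a line, are my own illustrative choice; they intersect, so the iterates should settle on a common point.

```python
import numpy as np

# Alternating projections between two closed convex sets in R^2:
# the disk ||x|| <= 2 and the line x1 + x2 = 1, which intersect.
def P_disk(x, r=2.0):
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

def P_line(x, a=np.array([1.0, 1.0]), b=1.0):
    # projection onto the hyperplane a . x = b
    return x + (b - a @ x) * a / (a @ a)

x = np.array([5.0, -3.0])  # arbitrary starting point
for _ in range(100):
    x = P_disk(P_line(x))  # one step of T = P_C o P_D

print(abs(x @ np.array([1.0, 1.0]) - 1.0) < 1e-8)  # True: x is on the line
print(np.linalg.norm(x) <= 2.0 + 1e-8)             # True: x is in the disk
```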

Among the simplest closed and convex sets are lines and, more generally, hyperplanes. Now, with our new knowledge, let’s return to our problem.

A “trivial” problem

Up to this point, we’ve written the problem Ax = b as an optimization problem and we’ve talked about a method to find the common point of two closed and convex sets (essentially, through optimization again). It’s time to connect those worlds now to get to the Kaczmarz algorithm.

When trying to solve $Ax = b$, we are trying to find the $x$ such that $x=A^{-1}b$. Or perhaps we should say that we try to find the common point $x$ of the hyperplanes defined by $a_{i}x=b_{i}$, where $a_{i}$ is the $i$-th row of $A$ and $b_i$ is the $i$-th component of $b$. This hyperplane can be written as $B_{i}=\{x:a_{i}x-b_i=0\}$, but I like to present it in a “simpler” way, like this:

$B_{i} = \{x: a_{i}\left(x-\frac{b_ia_{i}^T}{\|a_{i}\|^2}\right)=0\},\ i=1,2,\ldots,n$

Stating it in this way exposes the role of $a_i^{T}$ as a normal vector of the hyperplane passing through the point $\frac{b_ia_{i}^T}{\|a_{i}\|^2}$. It also lets us write down the projection of a vector $y$ onto that hyperplane:

$P_{i}(y)=y+\left(b_{i}-a_{i}y\right)\frac{a_{i}^{T}}{\|a_{i}\|^2}$
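We can verify the two defining properties of this projection numerically. A short Python/NumPy check (the row, right-hand side, and test point are illustrative values of my choosing; the row happens to be the second row of the example matrix used below):

```python
import numpy as np

# Check the projection formula onto the hyperplane a_i x = b_i.
a_i = np.array([2.0, 5.0])   # a row of A (illustrative)
b_i = 1.0                    # the matching entry of b
y = np.array([3.0, -4.0])    # an arbitrary point

P_y = y + (b_i - a_i @ y) * a_i / np.linalg.norm(a_i)**2

d = P_y - y
print(np.isclose(a_i @ P_y, b_i))              # True: P(y) lies on the hyperplane
print(abs(d[0]*a_i[1] - d[1]*a_i[0]) < 1e-9)   # True: the step is parallel to a_i
```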

By now, you have probably guessed where we are heading. We are going to start from a point $x_0$ and project it onto the different hyperplanes, one after the other, until we reach their common point, which is a solution vector of the system $Ax = b$! Earlier we said that the alternating projection method needs convex and closed sets, and we are in luck, since our hyperplanes are closed and affine. The definition of an affine set is the same as that of a convex set, with a twist: $t$ can now be any real number. A trivial affine set is a line. Check also that an affine set in $\mathbb{R}^{n}$, other than the empty set or a single point, has to be unbounded in at least one dimension.

Visualizing the algorithm

The easiest thing we can do is pick a simple problem for which we can visualize what’s happening. So let’s fix $A=\begin{pmatrix}1& 1\\2&5\end{pmatrix}$ and $b=\begin{pmatrix}1\\1\end{pmatrix}$, and let’s write our algorithm for finding $x$.

1. Pick $x_{0}$
2. While (criterion == False)
3.           $x_{k+1}\leftarrow x_{k}+\left(b_{i}-a_{i}x_{k}\right)\frac{a_{i}^{T}}{\|a_{i}\|^2},\ i=(k\bmod m) + 1$, where $m$ is the number of rows of $A$
4.           $k\leftarrow k+1$
5. End while

At the end, $x_{N}\simeq x=A^{-1}b$. Let’s see what happens for the $A$ and $b$ we picked before.

The red and black lines are the hyperplanes defined by the rows of A. The blue line represents the steps of the algorithm and the green and red stars are the exact and approximate solutions of $Ax=b$. Seeing this graph, one gets the feeling that the concept is actually really simple & impressive at the same time. 😀

Here is a very simple Matlab code that produces the previous graph.


function [x, sol] = Kaczmarz()
% Kaczmarz iteration for a small 2x2 system, plotting each projection step.

A = [1,1;2,5];
b = [1;1];
x0 = rand(2,1)*6;              % random starting point
sol = A\b;                     % reference solution for comparison
fprintf('Solution of system is : %1.3f , %1.3f\n', sol(1), sol(2));

x = x0;
m = size(A,1);                 % number of rows = number of hyperplanes

close;
hold on;

% The two hyperplanes (lines) a_i x = b_i, solved for the second coordinate
f1 = @(x) b(1)/A(1,2) - A(1,1)/A(1,2)*x;
f2 = @(x) b(2)/A(2,2) - A(2,1)/A(2,2)*x;
fplot(f1, [-6,6], 'k');
fplot(f2, [-6,6], 'r');
plot(sol(1), sol(2), 'g*');    % exact solution

maxIter = 50;
Iter = 0;
while (norm(A*x - b) > 1e-3 && Iter < maxIter)
    for i = 1 : m
        xprev = x;
        % project the current iterate onto the i-th hyperplane
        x = x + (b(i) - A(i,:)*x)*A(i,:).'/norm(A(i,:))^2;
        plot([xprev(1),x(1)], [xprev(2),x(2)], 'b--.');
        Iter = Iter + 1;
        pause(1);
    end
end
plot(x(1), x(2), 'r*');        % approximate solution

hold off;

end



Under the rug

There are a few things that we did not talk about.

1. What’s the dependence on $x_0$ for convergence?
2. How fast is it converging?
3. We picked $i$ as $(k\bmod m) + 1$ in the algorithm. Is there another way to pick $i$, better perhaps in terms of rate of convergence?
4. What happens if the system is overdetermined?

I will hold onto those questions for another post.