Welcome to the fourth installment of the *Knight in n* series.
In part 3 we talked about the direct product of rings, and how it helped us solve the knight moves problem.
This time yet another type of product is going to help, by decomposing the algorithm so that faster parts can be substituted in.

In part three I introduced the direct product on rings, which is nothing more than a pair of numbers.
Confusingly this operation is also called direct *sum*.
To illustrate this name, take the direct sum/product of `Array i a` with `Array j b`.
For every index `i` (within the bounds of the first array) there is a value of type `a`, and for every index `j` there is a value of type `b`.
Instead of a pair of arrays, this could also be implemented as a single array
with the type `Array (Either i j) (Either a b)`. "Either" is just Haskell's way of saying "disjoint union" or "sum type", hence "direct *sum*".

There is another product operation that we can perform on two rings: the tensor product.
Dually to the direct sum, the tensor product of `Array i a` and `Array j b` has type `Array (i,j) (a,b)`.
The definition is very simple: the array contains all pairs where the first part comes from the first array, and the second part comes from the second array.

Slightly more generally, we can use any combining function. The general tensor product of two arrays can be implemented as:

    tensorWith :: (Ix i, Ix j) => (a -> b -> c) -> Array i a -> Array j b -> Array (i,j) c
    tensorWith f a b = array ((alo,blo),(ahi,bhi))
        [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
      where (alo,ahi) = bounds a
            (blo,bhi) = bounds b

Usually elements are multiplied:

    (><) :: (Ix i, Ix j, Num a) => Array i a -> Array j a -> Array (i,j) a
    (><) = tensorWith (*)

The mathematical notation for this `(><)` operator is ⊗.
Now an example:
here we take two 4-element vectors; their tensor product has 4*4=16 elements.
The two vectors are "one dimensional"* objects; their tensor product is a "two dimensional" matrix.
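
To make this concrete, here is a small standalone sketch (the element values are made up for illustration, and `tensorWith` is repeated so the snippet is self-contained) that builds the 16-element tensor product of two 4-element vectors:

```haskell
import Data.Array

-- tensorWith as described in this post, repeated for a self-contained snippet
tensorWith :: (Ix i, Ix j) => (a -> b -> c)
           -> Array i a -> Array j b -> Array (i,j) c
tensorWith f a b = array ((alo,blo),(ahi,bhi))
    [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

main :: IO ()
main = do
    let v = listArray (0,3) [1,2,3,4]     :: Array Int Int
        w = listArray (0,3) [10,20,30,40] :: Array Int Int
        m = tensorWith (*) v w
    print (rangeSize (bounds m))  -- 16 elements
    print (m ! (2,3))             -- v!2 * w!3 = 3*40 = 120
```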

A special case we will use often is the tensor product of an array with itself:

    square x = x >< x

For example (using simple reflection of expressions, which is now on Hackage as `Debug.SimpleReflect`):

    Knight4> square (listArray (0,2) [u,v,w])
    listArray ((0,0),(2,2)) [u*u, u*v, u*w
                            ,v*u, v*v, v*w
                            ,w*u, w*v, w*w]

The tensor product and convolution operations satisfy the very useful *interchange law*:

    (a >< b) * (c >< d) == (a * c) >< (b * d)

And since exponentiation is repeated convolution, also

    (a >< b)^n == a^n >< b^n

For a proof sketch of this equation,
compare the definitions of `(><)` and `mulArray`. Ignoring array bounds stuff, we have:

    convolution:    [ (i+j,  x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]
    tensor product: [ ((i,j), x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]

The only difference is in what happens to the indices: with convolution the indices are added, with the tensor product a pair is formed. Now consider the interchange law. Informally, the indices on the left hand side are of the form `(i_a,i_b)+(i_c,i_d)`, and on the right hand side they are of the form `(i_a+i_c, i_b+i_d)`. These are equal, because addition of pairs is pointwise.
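
To see the law in action, here is a standalone sketch that checks it on small vectors. The `mulArray` and `conv2` below are plausible stand-ins for one and two dimensional convolution as used in part 3, not the series' actual definitions:

```haskell
import Data.Array

tensorWith :: (Ix i, Ix j) => (a -> b -> c)
           -> Array i a -> Array j b -> Array (i,j) c
tensorWith f a b = array ((alo,blo),(ahi,bhi))
    [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

(><) :: (Ix i, Ix j, Num a) => Array i a -> Array j a -> Array (i,j) a
(><) = tensorWith (*)

-- one dimensional convolution: indices are added
mulArray :: Num a => Array Int a -> Array Int a -> Array Int a
mulArray a b = accumArray (+) 0 (alo+blo, ahi+bhi)
    [ (i+j, x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

-- two dimensional convolution: index pairs are added pointwise
conv2 :: Num a => Array (Int,Int) a -> Array (Int,Int) a -> Array (Int,Int) a
conv2 a b = accumArray (+) 0 ((al1+bl1, al2+bl2), (ah1+bh1, ah2+bh2))
    [ ((i1+j1, i2+j2), x*y) | ((i1,i2),x) <- assocs a, ((j1,j2),y) <- assocs b ]
  where ((al1,al2),(ah1,ah2)) = bounds a
        ((bl1,bl2),(bh1,bh2)) = bounds b

main :: IO ()
main = do
    let a = listArray (0,1) [1,2] :: Array Int Integer
        b = listArray (0,1) [3,4]
        c = listArray (0,1) [5,6]
        d = listArray (0,1) [7,8]
    print (conv2 (a >< b) (c >< d) == mulArray a c >< mulArray b d)  -- True
```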

The interchange law is often exploited to perform faster convolutions.
For example, consider blurring an image by taking its convolution with a two dimensional Gaussian blur kernel.

Performing this convolution directly requires O(n^4) operations for an n by n image.

The two dimensional Gaussian blur kernel can be written as the tensor product of two one dimensional kernels; with a bit of algebra, the interchange law then lets us split the convolution in two.

So to blur an image we can perform two convolutions: first with the horizontal kernel, and then with the vertical one.

This procedure needs only O(n^3) operations.

Blurring images is not what we are trying to do.
Instead of convolution with the Gaussian blur kernel, we are interested in convolution with `moveMatrix`.
We could try the same trick, finding an `a` such that `moveMatrix == a >< a`.
Unfortunately, this is impossible.

But we can still get close: there *is* a way to write `moveMatrix` as a combination of two squares, well, almost.
Actually, what we have is:

                        0  2  0  2  0        1  1  0  1  1        1 -1  0 -1  1
                        2  0  0  0  2        1  1  0  1  1       -1  1  0  1 -1
    2 * moveMatrix  ==  0  0  0  0  0   ==   0  0  0  0  0   -    0  0  0  0  0   ==  square a - square b
                        2  0  0  0  2        1  1  0  1  1       -1  1  0  1 -1
                        0  2  0  2  0        1  1  0  1  1        1 -1  0 -1  1

where

    a, b :: Array Int Integer
    a = listArray (-2,2) [1, 1, 0, 1, 1]
    b = listArray (-2,2) [1,-1, 0,-1, 1]
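
We can check this identity mechanically. The `moveMatrix` below is a reconstruction based on the 5 by 5 array shown above (a 1 at every knight offset), and `tensorWith`/`square` are repeated to keep the snippet self-contained:

```haskell
import Data.Array

tensorWith :: (Ix i, Ix j) => (a -> b -> c)
           -> Array i a -> Array j b -> Array (i,j) c
tensorWith f a b = array ((alo,blo),(ahi,bhi))
    [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

square :: (Ix i, Num a) => Array i a -> Array (i,i) a
square x = tensorWith (*) x x

-- a 1 for every knight move offset, i.e. |i|*|j| == 2
moveMatrix :: Array (Int,Int) Integer
moveMatrix = accumArray (+) 0 ((-2,-2),(2,2))
    [ ((i,j), 1) | i <- [-2..2], j <- [-2..2], abs i * abs j == 2 ]

a, b :: Array Int Integer
a = listArray (-2,2) [1, 1, 0, 1, 1]
b = listArray (-2,2) [1,-1, 0,-1, 1]

-- both arrays have bounds ((-2,-2),(2,2)), so their elems line up
main :: IO ()
main = print $ map (2*) (elems moveMatrix)
            == zipWith (-) (elems (square a)) (elems (square b))  -- True
```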

Now we can start with `paths_conv` from last time:

    paths_conv n ij = (moveMatrix ^ n) `safeAt` ij

where `safeAt` is a safe array indexing operator that returns `0` for out-of-bounds indices:

    safeAt ar i
        | inRange (bounds ar) i = ar ! i
        | otherwise             = 0

Now let's do some algebraic manipulation:

    paths_conv n ij
      = {- by definition of paths_conv -}
        (moveMatrix ^ n) `safeAt` ij
      = {- by definition of a and b -}
        ((square a - square b) `div` 2)^n `safeAt` ij   -- division by 2 is pseudocode
      = {- division does not depend on the index -}
        (square a - square b)^n `safeAt` ij `div` 2^n

We still cannot apply the interchange law, because the exponentiation `(^n)` is applied to the difference of two tensor products, not to a single one.
We can, however, expand this exponentiation by the formula:

    (a + b)^n = sum [ multinomial [n_a,n_b] * a^n_a * b^n_b | (n_a,n_b) <- split n ]
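
`multinomial` and `split` come from earlier in the series; the definitions below are plausible stand-ins (hypothetical, not the originals), together with a quick numeric check of the expansion:

```haskell
-- split n: all ways to write n as an ordered sum of two non-negative parts (hypothetical)
split :: Int -> [(Int,Int)]
split n = [ (na, n - na) | na <- [0..n] ]

-- multinomial [n_a,n_b] = (n_a+n_b)! / (n_a! * n_b!) (hypothetical)
multinomial :: [Int] -> Integer
multinomial ns = factorial (sum ns) `div` product (map factorial ns)
  where factorial k = product [1 .. toInteger k]

-- sanity check of the expansion on plain numbers: (2 + 3)^4 == 625
main :: IO ()
main = print $ sum [ multinomial [na,nb] * 2^na * 3^nb | (na,nb) <- split 4 ]
            == (2 + 3)^4  -- True
```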

This is just the usual binomial expansion, as in `(a + b)^2 == a*a + 2*a*b + b*b`.

Applying binomial expansion to our work-in-progress gives:

    (square a - square b)^n `safeAt` ij `div` 2^n
      = {- binomial expansion -}
        sum [ multinomial [n_a,n_b] * square a^n_a * (-square b)^n_b
            | (n_a,n_b) <- split n ] `safeAt` ij `div` 2^n
      = {- (-square b)^n_b == (-1)^n_b * square b^n_b -}
        sum [ multinomial [n_a,n_b] * (-1)^n_b * square a^n_a * square b^n_b
            | (n_a,n_b) <- split n ] `safeAt` ij `div` 2^n
      = {- interchange law -}
        sum [ multinomial [n_a,n_b] * (-1)^n_b * square (a^n_a * b^n_b)
            | (n_a,n_b) <- split n ] `safeAt` ij `div` 2^n
      = {- move `safeAt` inwards, since addition is pointwise -}
        sum [ multinomial [n_a,n_b] * (-1)^n_b * square (a^n_a * b^n_b) `safeAt` ij
            | (n_a,n_b) <- split n ] `div` 2^n

Since `square something` already has O(n^2) elements, just building it would cost more than we want to pay.

The only reason for calculating `square (a^n_a * b^n_b)` is that we need the element at index `ij`; we can compute that single element directly:

    -- square x `safeAt` ij  ==  x `squareAt` ij
    x `squareAt` (i,j) = x `safeAt` i * x `safeAt` j

So the inner part of the algorithm becomes:

    square (a^n_a * b^n_b) `safeAt` ij
      = {- property of squareAt -}
        (a^n_a * b^n_b) `squareAt` ij

We are still not there yet. Both `a^n_a` and `b^n_b` are whole arrays, while we only need a single element of their product, so we compute just that element:

    -- a * b `safeAt` i  ==  mulArrayAt a b i
    mulArrayAt a b n = sum [ x * b `safeAt` (n-i) | (i,x) <- assocs a ]
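
A quick sanity check (repeating `safeAt` so the snippet stands alone): the full convolution of `[1,2,3]` and `[4,5]` is `[4,13,22,15]` on indices 0 through 3, so asking for index 2 should give 22:

```haskell
import Data.Array

safeAt :: (Ix i, Num a) => Array i a -> i -> a
safeAt ar i
    | inRange (bounds ar) i = ar ! i
    | otherwise             = 0

-- a * b `safeAt` n == mulArrayAt a b n, without building the whole product
mulArrayAt :: Num a => Array Int a -> Array Int a -> Int -> a
mulArrayAt a b n = sum [ x * b `safeAt` (n-i) | (i,x) <- assocs a ]

main :: IO ()
main = print (mulArrayAt (listArray (0,2) [1,2,3])
                         (listArray (0,1) [4,5] :: Array Int Integer) 2)  -- 22
```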

And update `squareAt` accordingly:

    mulSquareAt a b (i,j) = mulArrayAt a b i * mulArrayAt a b j

Finally we need a more efficient way to calculate all the powers of `a` and `b`.
The `iterate` function can help us with that:

    Knight4> iterate (*u) 1
    [1, 1*u, 1*u*u, 1*u*u*u, 1*u*u*u*u, ...

Putting the pieces together gives an O(n^2) algorithm for the knight moves problem:

    paths_tensor n ij
        = sum [ multinomial [n_a,n_b] * (-1)^n_b
                * mulSquareAt (powers_of_a !! n_a) (powers_of_b !! n_b) ij
              | (n_a,n_b) <- split n ] `div` 2^n
      where powers_of_a = iterate (*a) 1
            powers_of_b = iterate (*b) 1

Note that the savings we have made do not come directly from decomposing the `moveMatrix`. It is just that this decomposition allows us to see that we were computing all elements of an expensive product where a single one would do.

This post brings another order of improvement.
Do you think you can do better than O(n^2) time and O(n^2) space complexity? If so, I would like to hear about it.

*: The number of elements is often called the dimension of a vector. Here we use the term dimension to refer to the number of indices used, also known as the (tensor) order. So a 100*100 pixel image has dimension 10000 according to the first interpretation (the number of elements), but dimension two in the second interpretation (the number of indices).

## Comments

Thanks for this beautiful insight. By translating this into the programming language J (http://www.jsoftware.com/index.html), I decided to replace the array-based polynomial approach with a list-based one using Henning Thielemann's polynomial functions. This step gives a nice time improvement.
