Four ways to fold an array

As most Haskell programmers know, there are two ways to fold a list: from the right with foldr and from the left with foldl. foldr is corecursive (productive), which is great when the output can be produced lazily. foldl (or better, its strict cousin foldl') is tail recursive, preventing stack overflows.

We can define analogous operations for other data structures like 1-dimensional arrays. Libraries like 'Data.ByteString' and 'Data.Vector' provide these. But as I will show in this post there are more fold operations than the common two.

The data type I will use in this post is simply

type Array1D a = Array Int a
-- and two utility functions for getting the lower and upper bounds
lo,hi :: Array1D a -> Int
lo = fst . bounds
hi = snd . bounds

The right fold applies a function f to the current value and the folded result of the rest of the array:

foldra :: (a -> b -> b) -> b -> Array1D a -> b
foldra f z0 ar = go (lo ar)
  where go i
         | i > hi ar = z0
         | otherwise = f (ar ! i) (go (i + 1))

The (strict) left fold uses an accumulator parameter:

-- IGNORE, this function is the same as foldl' which is more interesting anyway
foldla :: (b -> a -> b) -> b -> Array1D a -> b
foldla f z0 ar = go z0 (lo ar)
  where go z i
         | i > hi ar = z
         | otherwise = go (f z (ar ! i)) (i + 1)
foldl'a :: (b -> a -> b) -> b -> Array1D a -> b
foldl'a f z0 ar = go z0 (lo ar)
  where go !z i
         | i > hi ar = z
         | otherwise = go (f z (ar ! i)) (i + 1)

In each case, the recursive go function is very similar in structure to the list version; only instead of recursing for the tail of the list we recurse for index i+1. The time and space behavior is also similar. For example, if you have a large array

testArray :: Array1D Integer
testArray = listArray (1,10^6) [1..]

Then for computing something like the sum of all elements, you should use a strict left fold:

*Main> foldl'a (+) 0 testArray
50000005000000
*Main> foldra (+) 0 testArray
*** Exception: stack overflow

On the other hand, a right fold is the way to go when you are only interested in a part of a lazily produced result. For example when converting an array to a list:

*Main> take 10 . foldra (:) [] $ testArray
[1,2,3,4,5,6,7,8,9,10]
(0.02 secs, 520824 bytes)
*Main> take 10 . foldl'a (flip (:)) [] $ testArray
[1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991]
(5.89 secs, 263122464 bytes)

All of this is exactly the same as with lists.


But, if you look at foldra and foldl'a, you will see that they both contain a loop doing (i + 1). So in a sense, both of these functions work from left to right!

Because arrays allow for random access, it is possible to make true right to left folds, just start at the end and do (i - 1) in each iteration.

foldlb :: (b -> a -> b) -> b -> Array1D a -> b
foldlb f z ar = go (hi ar)
  where go i
         | i < lo ar = z
         | otherwise = f (go (i - 1)) (ar ! i)
foldr'b :: (a -> b -> b) -> b -> Array1D a -> b
foldr'b f z0 ar = go z0 (hi ar)
  where go !z i
         | i < lo ar = z
         | otherwise = go (f (ar ! i) z) (i - 1)

Just look at the pretty duality there! We now have a lazy left fold and a strict right fold.

The behavior is exactly the opposite of that of the folda functions above:

*Main> foldlb (+) 0 testArray
*** Exception: stack overflow
*Main> foldr'b (+) 0 testArray
50000005000000
*Main> take 10 . foldr'b (:) [] $ testArray
[1,2,3,4,5,6,7,8,9,10]
(6.19 secs, 263055372 bytes)
*Main> take 10 . foldlb (flip (:)) [] $ testArray
[1000000,999999,999998,999997,999996,999995,999994,999993,999992,999991]
(0.00 secs, 524836 bytes)

To summarize, four ways to fold an array are:

lo to hi, i+1hi to lo, i-1
corecursion, productive, lazyfoldrafoldlb
accumulator, tail recursive, strictfoldl'afoldr'b

Exercise: can you think of other ways to fold an array?

Comments

Quentin Moserx

Here's a binary tree fold over arrays:

-- store bounds along for convenience
data BoundedArray1D a = A (Array1D a) Int Int
  deriving Show
fromArray1D a = A a (lo a) (hi a)
-- treatment of an array as a binary node. The node's
-- tag is at the center.
asNode :: BoundedArray1D a -> Maybe (BoundedArray1D a, a, BoundedArray1D a)
asNode (A a l h) | l > h = Nothing
                 | otherwise = let center = (l + h) `div` 2
                                   lower = A a l (center-1)
                                   upper = A a (center+1) h
                               in Just (lower, a ! center, upper)
-- fold upwards in the tree
bfold :: (b -> a -> b -> b) -> b -> BoundedArray1D a -> b
bfold f b arr =  case asNode arr of
                   Nothing -> b
                   Just (l, a, h) -> f (bfold f b l) a (bfold f b h)
-- example
bsum arr = bfold addThree 0 arr
  where addThree a b c = a+b+c

This should allow you to fold over an array in logarithmic time. Well, I think...

Marcx

googles mapReduce is a fold, too. first fmap, to a Monoid, then mappend the results.

mapReduce :: (a->b) -> (b->b->b) -> Array1D a -> b

the reduce is an associative fold.

i'd prefer sth like a "square-fold":

squareFoldl' :: b -> (b->a->b) -> (b->b->b) -> Array1D a -> b

the first is a sequential foldl' over a part of the array, running in an own thread, the second combines the results of the threads (in an associative way).

It's worth noting that you can use both foldr and foldl' to sum up a list because + happens to be associative. If you have a non-associative binary operator you want to fold to the right in constant space, or if you want to fold to the left while lazily producing results, it's a good chance to use foldr'_b or foldl_b.

Can you explain the definition of corecursion/productivity? Is corecursion another word for coinduction, which means that a function is defined to be a greatest fixed-point?

Reply

(optional)
(optional, will not be revealed)
Name of the lazy functional programming language I write about:
Use > code for code blocks, @code@ for inline code. Some html is also allowed.