This post is based on a part of my masters thesis. The topic of my thesis was OCR of historical documents. A problem that came up there was the following:

Given a binary image, find the largest axis aligned rectangle that consists only of foreground pixels.

These largest rectangles can be used, for instance, to find columns in a page of text. Although in that case one would use large rectangles of *background* pixels.

Here is an example image,

.

White pixels are background and blue is the foreground.
The rectangle with the largest area is indicated in red.
The images you encounter in practical application will be much larger than this example, so efficiency is going to be important.

Let's start with the types of images and rectangles

-- An image is a 2D list of booleans, True is the foreground type Image = [[Bool]] -- An axis-aligned rectangle data Rect = Rect { left, top, width, height :: Int } deriving (Eq,Ord,Show)

And some properties of them,

-- The size of an image imWidth, imHeight :: Image -> Int imHeight = length imWidth (x:_) = length x imWidth [] = 0 -- The area and perimeter of a rectangle area, perimeter :: Rect -> Int area rect = width rect * height rect perimeter rect = 2 * width rect + 2 * height rect

I will say that an image 'contains' a rectangle if all pixels inside the rectangle are foreground pixels.

contains :: Image -> Rect -> Bool contains im (Rect x y w h) = and pixelsInRect where pixelsInRect = concatMap cols (rows im) rows = take h . drop y . (++repeat []) cols = take w . drop x . (++repeat False)

Now the obvious, stupid, way of finding the largest rectangle is to enumerate *all* rectangles in the image, and pick the largest from that list:

-- List all rectangles contained in an image allRects :: Image -> [Rect] allRects im = filter (im `contains`) rects where rects = [Rect x y w h | x <- [0..iw], y <- [0..ih] , w <- [1..iw-x], h <- [1..ih-y]] iw = imWidth im ih = imHeight im

For now, I will take 'largest rectangle' to mean one with the maximal area. I will come back to this choice soon.

largestRect_{spec}:: Image -> Rect largestRect_{spec}= maximalRectBy area . allRects -- Return the rectangle with maximum f, -- using lexicographical ordering to break ties -- return noRect if there are no rectangles in the input list. maximalRectBy :: Ord a => (Rect -> a) -> [Rect] -> Rect maximalRectBy f = maximumBy (comparing f `mappend` compare) . (noRect:) where noRect = Rect 0 0 0 0

The above code should hopefully be easy to understand. It will find the correct answer for the above example:

λ> largestRect_{spec}example Rect {left = 3, top = 2, width = 4, height = 5}

Of course `largestRect _{spec}` is horribly slow.
In an n by n image there are O(n

Before continuing, let's determine what it means for a rectangle to be the *largest*.
We could compare the area of rectangles, as we did before. But it is equally valid to look for the rectangle with the largest perimeter.

Can we pick the maximum according to any arbitrary function `f :: (Rect -> a)`?
Not all of these functions will correspond to the intuitive notion of 'largest'. For example `f = negate . area` will actually lead to the smallest rectangle.
In general there is going to be no efficient way of finding the rectangle that maximizes `f`. All we could do is optimize `contains`, to get an O(n^{4}) algorithm.

We should therefore restrict `f` to be *monotonic*.
What I mean by monotonic is that `f x >= f y` whenever rectangle `x` contains rectangle `y`.
In QuickCheck code:

prop_isMonotonic :: Ord a => (Rect -> a) -> Property prop_isMonotonic f = property $ \x y -> x `rectContains` y ==> f x >= f y rectContains :: Rect -> Rect -> Bool rectContains (Rect x y w h) (Rect x' y' w' h') = x <= x' && y <= y' && x+w >= x'+w' && y+h >= y'+h'

Area is a monotonic function, and so is perimeter. But you could also add weird constraints. For example, only consider rectangles that are at least 10 pixels tall, or only rectangles that contain the point (123,456).

Maximizing a monotonic function, as opposed to just any function, means that we can skip a lot of rectangles.
In particular, whenever rectangle `x` contains rectangle `y`, rectangle `y` doesn't need to be considered.
I will call rectangles in the image that are not contained in other (larger) rectangles *maximal*.
The strategy for finding the largest rectangle is then simply to enumerate only the maximal rectangles, and pick the best of those:

largestRect_{fast}:: Image -> Rect largestRect_{fast}= maximalRectBy area . allMaximalRects

For each maximal rectangle there is (trivially) a monotonic function that is maximal for that rectangle. So we can't do any better without taking the specific function `f` into account.

To find maximal rectangles, we are first of all going to need some machinery for working with images. In particular, zipping images together,

zip2d :: [[a]] -> [[b]] -> [[(a,b)]] zip2d = zipWith zip zipWith2d :: (a -> b -> c) -> [[a]] -> [[b]] -> [[c]] zipWith2d = zipWith . zipWith zipWith2d4 :: (a -> b -> c -> d -> e) -> [[a]] -> [[b]] -> [[c]] -> [[d]] -> [[e]] zipWith2d4 = zipWith4 . zipWith4

And accumulating/scanning over images.
This scanning can be done in four directions. Each `scanX` function takes a function to apply, and the initial value to use just outside the image. The scans that I use here are slightly different from `scanl` and `scanr`, because the output will have the same size as the input, instead of being one element larger.

scanLeftward, scanRightward, scanUpward, scanDownward :: (a -> b -> b) -> b -> [[a]] -> [[b]] scanLeftward f z = map (init . scanr f z) scanRightward f z = map (tail . scanl (flip f) z) scanUpward f z = init . scanr (\as bs -> zipWith f as bs) (repeat z) scanDownward f z = tail . scanl (\as bs -> zipWith f bs as) (repeat z)

Here is an example of a scan that calculates the x-coordinate of each pixel,

let x = scanRightward (\a x -> x + 1) (-1) a x = .

And the y-coordinates are of course

let y = scanDownward (\a x -> x + 1) (-1) a y = .

If we were looking for one-dimensional images, then a 'rectangle' would just be a single line of pixels. Now each pixel is contained in at most one maximal line of foreground pixels. To find the coordinates of this line, we just need to know the left and right endpoints.

For a foreground pixel, the left endpoint of the line it is in is the same as the left endpoint of its left neighbor.
On the other hand, a background pixel is not in any foreground line. So the left endpoint of all lines to the right of it will be at least `x+1`, where `x` is the x-coordinate of the background pixel.
In both these cases information flows from left to right; and so the left endpoint for all pixels can be determined with a rightward scan.

Unsurprisingly, we can find the right endpoints of all foreground lines with a leftward scan.
Now let's do this for all lines in the image.
Notice that we need the `x` coordinates defined previously:

let l = scanRightward (\(a,x) l -> if a then l else x+1) 0 (zip2d a x) l = let r = scanLeftward (\(a,x) r -> if a then r else x) (imWidth a) (zip2d a x) r =

In the images I have marked the left and right endpoints of the foreground lines in red. Note also, the values in the background pixels are not important, and you should just ignore them.

Vertically we can of course do the same thing, giving top and bottom endpoints:

let t = scanDownward (\(a,y) t -> if a then t else y+1) 0 (zip2d a y) t = let b = scanUpward (\(a,y) b -> if a then b else y) (imHeight a) (zip2d a y) b =

However, combining these left/right/top/bottom line endpoints does not yet give rectangles containing only foreground pixels.
Rather, it gives something like a cross. For example using the endpoints for (6,4) leads to the following incorrect rectangle,

.

In fact, there are many rectangles around this point (6,4):

,

and before looking at the area (or whatever function we are maximizing) there is way no telling which is the best one.

If there was some way to find just a single maximal rectangle for each pixel, then we would have an O(n^{2}) algorithm. Assuming of course that we do find all maximal rectangles.

Suppose that `Rect x y w h` is a maximal rectangle. What does that mean?
First of all, one of the points above the rectangle, `(x,y-1),(x+1,y-1),..,(x+w-1,y-1)`, must not be the a foreground pixel.
Because if all these points are foreground, then the rectangle could be extended upwards, and it would not be maximal.
So, suppose that `(u,y-1)` is a background pixel (or outside the image).
Then `(u,y)` is the top endpoint of the vertical line that contains `(u,y+h-1)`.

If we start from `(u,v)`, we can recover the height of a maximal rectangle using the top endpoint image `t`.
Just take `t!!(u,v)` as the top coordinate, and `u+1` as the bottom.
This image illustrates the idea:

.

Here the green point `(u,v)` has the red top endpoint, and it gives the height and vertical position of the yellow maximal rectangle.

Then to make this vertical line into a maximal rectangle, we just extend it horizontally as far as possible:

.

For this last step, we need to know the first background pixel that will be encountered when extending the rectangle to the left. That is the *maximum value* of all left endpoints in the rows `t,t+1,..,b-1`.
This maximum can again be determined with a scan over the image:

let lt = scanDownward (\(a,l) lt -> if a then max l lt else minBound) minBound (zip2d a l) lt =

For extending to the right the story is exactly the same, only taking the minimum right endpoint instead:

let rt = scanDownward (\(a,r) rt -> if a then min r rt else maxBound) maxBound (zip2d a r) rt =

Now we have all the ingredients for finding maximal rectangles:

- For a foreground pixel
`(u,v)`: - Take as top
`t!!(u,v)` - Take as left
`lt!!(u,v)` - Take as right
`rt!!(u,v)` - Take as bottom
`v+1`.

Every maximal rectangle can be found in this way. However, not all rectangles we get in this way are maximal. In particular, they could potentially still be extended downward. However, for finding the largest rectangle, it doesn't matter if we also see some non-maximal ones. There might also be duplicates, which again does not matter.

So now finishing up is just a matter of putting all the steps together in a function:

allMaximalRects :: Image -> [Rect] allMaximalRects a = catMaybes . concat $ zipWith2d4 mkRect lt rt t y where x = scanRightward (\_ x -> x + 1) (-1) a y = scanDownward (\_ y -> y + 1) (-1) a l = scanRightward (\(a,x) l -> if a then l else x+1) 0 (zip2d a x) r = scanLeftward (\(a,x) r -> if a then r else x) (imWidth a) (zip2d a x) t = scanDownward (\(a,y) t -> if a then t else y+1) 0 (zip2d a y) lt = scanDownward (\(a,l) lt -> if a then max l lt else minBound) minBound (zip2d a l) rt = scanDownward (\(a,r) rt -> if a then min r rt else maxBound) maxBound (zip2d a r) mkRect l r t y | l /= minBound = Just $ Rect l t (r-l) (y-t+1) | otherwise = Nothing

A quick QuickCheck shows that `largestRect _{fast}` finds the same answer as the slow specification:

prop_{fast_spec}= forAll genImage $ \a -> largestRect_{spec}a == largestRect_{fast}a

λ> quickCheck prop_{fast_spec}+++ OK, passed 100 tests.

It is possible to find all maximal rectangles that consist entirely of foreground pixels in an n*n image in O(n^{2}) time. That is linear in the number of pixels. Obviously it is not possible to do any better in general.

You may wonder whether this method also works in higher dimensions. And the answer to that question is *no*.
The reason is that there can be more than O(n^{3}) maximal cubes in a three dimensional image. In fact, there can be at least O(n^{(d-1)2}) maximal hypercubes in d dimensions. Just generalize this image to 3D:

.
Or click here for a 3D version.

## Comments

This is awesome. I've submitted it to reddit :)

I wanted something like this a while ago but the time was too short to think about a faster solution. So I switched to a C implementation for finding circles in one image. Now, I think that your solution can be adapted to that problem too.

I was never really into Minesweeper.

That's incredibly awesome! What a great solution. :D

Great article, just wanted to ask what program/method you used to create the pictures?

Haffi: I used the Haskell cairo package, via my own GridDiagrams package. The code to draw the pictures is in the .lhs source file of this post.

First of all, excellent article, and I also have to say I love your site design. It's simple, easily readable, with just the right amount of fancy (IMO)! Now, onto my question:

I'm afraid I am missing something. When reading this bit in

contains:I don't understand why rows has

take h . drop xinstead oftake h . drop y(and vice versa forcols).Could anyone shed some light on this?

Excellent article, thank you very much.

Nick Knowlson: you are right, that's what you get for changing such things at the last moment. The result will be that x and y coordinates are swapped for rectangles found by

largestRect. QuickCheck spots the error:_{spec}The quickCheck result in the post was from before I replaced

t,lbyx,y, what should have beeny,x. I corrected the error, and also took the opportunity to make thecontainsfunction slightly easier to understand, as suggested on reddit.Ah great, that makes sense now (and is also easier to read). Thanks!

There's an ACM ICPC problem similar to this one:

http://livearchive.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=294&page=show_problem&problem=1933

Please tell us about the tool chain you use to go from literate Haskell to such a lovely formatted blog post.

I was asked to solve this problem during an interview onsite with Google.

I'm also interested in hearing about the tool chain you use for blogging with literate Haskell.

Out of curiosity, in a world where precision mattered less, could we go past linearity on number of pixels by reducing number of pixels? eg, convert a 500x500 image to a 100x100? or would that process add enough additional boilerplate as to not improve O() time?

Kevin: To downsample an image you usually have to inspect every pixel. So just reducing an n*n image to a n/2*n/2 image already takes O(n

^{2}) time. Also, if you just scale each side by a constant factor, this doesn't reduce the time complexity; O(n^{2}/25) is still O(n^{2}).## Reply