Finding rectangles

This post is based on a part of my masters thesis. The topic of my thesis was OCR of historical documents. A problem that came up there was the following:

Given a binary image, find the largest axis aligned rectangle that consists only of foreground pixels.

These largest rectangles can be used, for instance, to find columns in a page of text. Although in that case one would use large rectangles of background pixels.

Here is an example image,
.
White pixels are background and blue is the foreground. The rectangle with the largest area is indicated in red. The images you encounter in practical application will be much larger than this example, so efficiency is going to be important.

Specification

Let's start with the types of images and rectangles

-- An image is a 2D list of booleans, True is the foreground
type Image = [[Bool]]
-- An axis-aligned rectangle data Rect = Rect { left, top, width, height :: Int } deriving (Eq,Ord,Show)

And some properties of them,

-- The size of an image
imWidth, imHeight :: Image -> Int
imHeight = length
imWidth (x:_) = length x
imWidth []    = 0
-- The area and perimeter of a rectangle area, perimeter :: Rect -> Int area rect = width rect * height rect perimeter rect = 2 * width rect + 2 * height rect

I will say that an image 'contains' a rectangle if all pixels inside the rectangle are foreground pixels.

contains :: Image -> Rect -> Bool
contains im (Rect x y w h) = and pixelsInRect
  where
    pixelsInRect = concatMap cols (rows im)
    rows = take h . drop y . (++repeat [])
    cols = take w . drop x . (++repeat False)

Now the obvious, stupid, way of finding the largest rectangle is to enumerate all rectangles in the image, and pick the largest from that list:

-- List all rectangles contained in an image
allRects :: Image -> [Rect]
allRects im = filter (im `contains`) rects
  where
    rects = [Rect x y w h | x <- [0..iw], y <- [0..ih]
                          , w <- [1..iw-x], h <- [1..ih-y]]
    iw = imWidth im
    ih = imHeight im

For now, I will take 'largest rectangle' to mean one with the maximal area. I will come back to this choice soon.

largestRectspec :: Image -> Rect
largestRectspec = maximalRectBy area . allRects
-- Return the rectangle with maximum f, -- using lexicographical ordering to break ties -- return noRect if there are no rectangles in the input list. maximalRectBy :: Ord a => (Rect -> a) -> [Rect] -> Rect maximalRectBy f = maximumBy (comparing f `mappend` compare) . (noRect:) where noRect = Rect 0 0 0 0

The above code should hopefully be easy to understand. It will find the correct answer for the above example:

λ> largestRectspec example
Rect {left = 3, top = 2, width = 4, height = 5}

Of course largestRectspec is horribly slow. In an n by n image there are O(n4) rectangles to consider, and checking if one is contained in the image takes O(n2) work, for a total of O(n6).

What is 'largest'?

Before continuing, let's determine what it means for a rectangle to be the largest. We could compare the area of rectangles, as we did before. But it is equally valid to look for the rectangle with the largest perimeter.

Can we pick the maximum according to any arbitrary function f :: (Rect -> a)? Not all of these functions will correspond to the intuitive notion of 'largest'. For example f = negate . area will actually lead to the smallest rectangle. In general there is going to be no efficient way of finding the rectangle that maximizes f. All we could do is optimize contains, to get an O(n4) algorithm.

We should therefore restrict f to be monotonic. What I mean by monotonic is that f x >= f y whenever rectangle x contains rectangle y. In QuickCheck code:

prop_isMonotonic :: Ord a => (Rect -> a) -> Property
prop_isMonotonic f = property $ \x y ->  x `rectContains` y  ==>  f x >= f y
rectContains :: Rect -> Rect -> Bool rectContains (Rect x y w h) (Rect x' y' w' h') = x <= x' && y <= y' && x+w >= x'+w' && y+h >= y'+h'

Area is a monotonic function, and so is perimeter. But you could also add weird constraints. For example, only consider rectangles that are at least 10 pixels tall, or only rectangles that contain the point (123,456).

Maximizing a monotonic function, as opposed to just any function, means that we can skip a lot of rectangles. In particular, whenever rectangle x contains rectangle y, rectangle y doesn't need to be considered. I will call rectangles in the image that are not contained in other (larger) rectangles maximal. The strategy for finding the largest rectangle is then simply to enumerate only the maximal rectangles, and pick the best of those:

largestRectfast :: Image -> Rect
largestRectfast = maximalRectBy area . allMaximalRects

For each maximal rectangle there is (trivially) a monotonic function that is maximal for that rectangle. So we can't do any better without taking the specific function f into account.

Machinery

To find maximal rectangles, we are first of all going to need some machinery for working with images. In particular, zipping images together,

zip2d :: [[a]] -> [[b]] -> [[(a,b)]]
zip2d = zipWith zip
zipWith2d :: (a -> b -> c) -> [[a]] -> [[b]] -> [[c]] zipWith2d = zipWith . zipWith
zipWith2d4 :: (a -> b -> c -> d -> e) -> [[a]] -> [[b]] -> [[c]] -> [[d]] -> [[e]] zipWith2d4 = zipWith4 . zipWith4

And accumulating/scanning over images. This scanning can be done in four directions. Each scanX function takes a function to apply, and the initial value to use just outside the image. The scans that I use here are slightly different from scanl and scanr, because the output will have the same size as the input, instead of being one element larger.

scanLeftward, scanRightward, scanUpward, scanDownward
    :: (a -> b -> b) -> b -> [[a]] -> [[b]]
scanLeftward f z = map (init . scanr f z) scanRightward f z = map (tail . scanl (flip f) z) scanUpward f z = init . scanr (\as bs -> zipWith f as bs) (repeat z) scanDownward f z = tail . scanl (\as bs -> zipWith f bs as) (repeat z)

Here is an example of a scan that calculates the x-coordinate of each pixel,

let x = scanRightward (\a x -> x + 1) (-1) a
x = .

And the y-coordinates are of course

let y = scanDownward (\a x -> x + 1) (-1) a
y = .

Finding lines

If we were looking for one-dimensional images, then a 'rectangle' would just be a single line of pixels. Now each pixel is contained in at most one maximal line of foreground pixels. To find the coordinates of this line, we just need to know the left and right endpoints.

For a foreground pixel, the left endpoint of the line it is in is the same as the left endpoint of its left neighbor. On the other hand, a background pixel is not in any foreground line. So the left endpoint of all lines to the right of it will be at least x+1, where x is the x-coordinate of the background pixel. In both these cases information flows from left to right; and so the left endpoint for all pixels can be determined with a rightward scan.

Unsurprisingly, we can find the right endpoints of all foreground lines with a leftward scan. Now let's do this for all lines in the image. Notice that we need the x coordinates defined previously:

let l = scanRightward (\(a,x) l -> if a then l else x+1) 0 (zip2d a x)
l = 
let r = scanLeftward (\(a,x) r -> if a then r else x) (imWidth a) (zip2d a x)
r = 

In the images I have marked the left and right endpoints of the foreground lines in red. Note also, the values in the background pixels are not important, and you should just ignore them.

Vertically we can of course do the same thing, giving top and bottom endpoints:

let t = scanDownward (\(a,y) t -> if a then t else y+1) 0 (zip2d a y)
t = 
let b = scanUpward (\(a,y) b -> if a then b else y) (imHeight a) (zip2d a y)
b = 

However, combining these left/right/top/bottom line endpoints does not yet give rectangles containing only foreground pixels. Rather, it gives something like a cross. For example using the endpoints for (6,4) leads to the following incorrect rectangle,
.

In fact, there are many rectangles around this point (6,4):
,
and before looking at the area (or whatever function we are maximizing) there is way no telling which is the best one.

If there was some way to find just a single maximal rectangle for each pixel, then we would have an O(n2) algorithm. Assuming of course that we do find all maximal rectangles.

Finding maximal rectangles

Suppose that Rect x y w h is a maximal rectangle. What does that mean? First of all, one of the points above the rectangle, (x,y-1),(x+1,y-1),..,(x+w-1,y-1), must not be the a foreground pixel. Because if all these points are foreground, then the rectangle could be extended upwards, and it would not be maximal. So, suppose that (u,y-1) is a background pixel (or outside the image). Then (u,y) is the top endpoint of the vertical line that contains (u,y+h-1).

If we start from (u,v), we can recover the height of a maximal rectangle using the top endpoint image t. Just take t!!(u,v) as the top coordinate, and u+1 as the bottom. This image illustrates the idea:
.
Here the green point (u,v) has the red top endpoint, and it gives the height and vertical position of the yellow maximal rectangle.

Then to make this vertical line into a maximal rectangle, we just extend it horizontally as far as possible:
.

For this last step, we need to know the first background pixel that will be encountered when extending the rectangle to the left. That is the maximum value of all left endpoints in the rows t,t+1,..,b-1. This maximum can again be determined with a scan over the image:

let lt = scanDownward (\(a,l) lt -> if a then max l lt else minBound) minBound (zip2d a l)
lt = 

For extending to the right the story is exactly the same, only taking the minimum right endpoint instead:

let rt = scanDownward (\(a,r) rt -> if a then min r rt else maxBound) maxBound (zip2d a r)
rt = 

Now we have all the ingredients for finding maximal rectangles:

Every maximal rectangle can be found in this way. However, not all rectangles we get in this way are maximal. In particular, they could potentially still be extended downward. However, for finding the largest rectangle, it doesn't matter if we also see some non-maximal ones. There might also be duplicates, which again does not matter.

So now finishing up is just a matter of putting all the steps together in a function:

allMaximalRects :: Image -> [Rect]
allMaximalRects a = catMaybes . concat $ zipWith2d4 mkRect lt rt t y
  where
    x  = scanRightward (\_ x -> x + 1) (-1) a
    y  = scanDownward  (\_ y -> y + 1) (-1) a
    l  = scanRightward (\(a,x) l -> if a then l else x+1) 0 (zip2d a x)
    r  = scanLeftward  (\(a,x) r -> if a then r else x) (imWidth a) (zip2d a x)
    t  = scanDownward  (\(a,y) t -> if a then t else y+1) 0 (zip2d a y)
    lt = scanDownward  (\(a,l) lt -> if a then max l lt else minBound) minBound (zip2d a l)
    rt = scanDownward  (\(a,r) rt -> if a then min r rt else maxBound) maxBound (zip2d a r)
    mkRect l r t y
        | l /= minBound = Just $ Rect l t (r-l) (y-t+1)
        | otherwise     = Nothing

A quick QuickCheck shows that largestRectfast finds the same answer as the slow specification:

propfast_spec = forAll genImage $ \a -> largestRectspec a == largestRectfast a
λ> quickCheck propfast_spec
+++ OK, passed 100 tests.

Conclusion

It is possible to find all maximal rectangles that consist entirely of foreground pixels in an n*n image in O(n2) time. That is linear in the number of pixels. Obviously it is not possible to do any better in general.

You may wonder whether this method also works in higher dimensions. And the answer to that question is no. The reason is that there can be more than O(n3) maximal cubes in a three dimensional image. In fact, there can be at least O(n(d-1)2) maximal hypercubes in d dimensions. Just generalize this image to 3D:
. Or click here for a 3D version.

Comments

Mihai MaruseacDate: 2011-09-29T09:22Zx

This is awesome. I've submitted it to reddit :)

I wanted something like this a while ago but the time was too short to think about a faster solution. So I switched to a C implementation for finding circles in one image. Now, I think that your solution can be adapted to that problem too.

MikeDate: 2011-09-29T14:23Zx

I was never really into Minesweeper.

Michael RoseDate: 2011-09-29T16:39Zx

That's incredibly awesome! What a great solution. :D

HaffiDate: 2011-09-29T17:55Zx

Great article, just wanted to ask what program/method you used to create the pictures?

Twan van LaarhovenDate: 2011-09-29T18:24Zx

Haffi: I used the Haskell cairo package, via my own GridDiagrams package. The code to draw the pictures is in the .lhs source file of this post.

Nick KnowlsonDate: 2011-09-29T19:04Zx

First of all, excellent article, and I also have to say I love your site design. It's simple, easily readable, with just the right amount of fancy (IMO)! Now, onto my question:

I'm afraid I am missing something. When reading this bit in contains:

where rows = take h . drop x . (++repeat [])
      cols = take w . drop y . (++repeat False)

I don't understand why rows has take h . drop x instead of take h . drop y (and vice versa for cols).

Could anyone shed some light on this?

Charles KeepaxDate: 2011-09-29T19:13Zx

Excellent article, thank you very much.

Twan van LaarhovenDate: 2011-09-29T20:06Zx

Nick Knowlson: you are right, that's what you get for changing such things at the last moment. The result will be that x and y coordinates are swapped for rectangles found by largestRectspec. QuickCheck spots the error:

*Main> quickCheck propfast_spec
*** Failed! Falsifiable (after 7 tests):  
[[True,False,False,True]]
*Main> largestRectspec [[True,False,False,True]]
Rect {left = 0, top = 0, width = 1, height = 1}
*Main> largestRectfast [[True,False,False,True]]
Rect {left = 3, top = 0, width = 1, height = 1}

The quickCheck result in the post was from before I replaced t,l by x,y, what should have been y,x. I corrected the error, and also took the opportunity to make the contains function slightly easier to understand, as suggested on reddit.

Nick KnowlsonDate: 2011-09-29T20:10Zx

Ah great, that makes sense now (and is also easier to read). Thanks!

alejolpDate: 2011-09-30T00:18Zx

There's an ACM ICPC problem similar to this one:

http://livearchive.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=294&page=show_problem&problem=1933

Mark LentcznerDate: 2011-09-30T03:49Zx

Please tell us about the tool chain you use to go from literate Haskell to such a lovely formatted blog post.

ErikDate: 2011-10-05T20:39Zx

I was asked to solve this problem during an interview onsite with Google.

JoshuaDate: 2012-02-23T16:35Zx

I'm also interested in hearing about the tool chain you use for blogging with literate Haskell.

KevinDate: 2016-06-03T18:21Zx

Out of curiosity, in a world where precision mattered less, could we go past linearity on number of pixels by reducing number of pixels? eg, convert a 500x500 image to a 100x100? or would that process add enough additional boilerplate as to not improve O() time?

Twan van LaarhovenDate: 2016-06-04T11:29Zx

Kevin: To downsample an image you usually have to inspect every pixel. So just reducing an n*n image to a n/2*n/2 image already takes O(n2) time. Also, if you just scale each side by a constant factor, this doesn't reduce the time complexity; O(n2/25) is still O(n2).

Reply

(optional)
(optional, will not be revealed)
Name a function of type [[a]] -> [a]:
Use > code for code blocks, @code@ for inline code. Some html is also allowed.