In part 2 of this series, I looked at finding axis aligned rectangles in binary images. I left you hanging with a hint of a more efficient algorithm than the O(n3) one from that post. Formally, the problem we were trying to solve was:
Given a binary image, find the largest axis aligned rectangle with a 1 pixel wide border that consists entirely of foreground pixels.
Here is the same example as last time,
where white pixels are the background and blue is the foreground. The rectangle with the largest area is indicated in red.
The idea behind the more efficient algorithm is simple.
Draw a vertical line x=xmid through the middle of the image,
If the largest rectangle in an image is large enough, then it will intersect this line. The one in the example above certainly does. So, the idea is to only look at rectangles that intersect x=xmid. We will worry about other cases later.
Each rectangle that intersects the vertical line consists of of a left bracket and a right bracket, just look at this ASCII art: , or at these images:
To find all rectangles intersecting x=xmid, we need to find these left and right brackets, and combine them. Note that the middle column is included in both the left and right bracket, because that makes it easier to handle rectangles and bracked of width=1.
Let's focus on finding the right brackets first. For each pair of y-coordinates and height, there is at most one largest right bracket. We don't need to consider the smaller ones. So, let's define a function that finds the width of the largest right bracket for all y-coordinates and heights. The function takes as input just the right part of the image, and it will return the result in list of lists:
rightBracketWidths :: Image -> [[Maybe Int]]
Here is a slow 'specification-style' implementation
rightBracketWidthsslow im = [ [ findLast (containsBracket im y h) [1..imWidth im] | h <- [1..imHeight im-y] ] | y <- [0..imHeight im-1] ] where findLast pred = find pred . reverse
How do we even check that a right bracket is in an image? For that we can look at right and bottom endpoints:
-- pseudo code containsBracket im y h w = r !! y !! 0 >= w -- top border && r !! (y+h-1) !! 0 >= w -- bottom border && b !! y !! (w-1) >= y+h -- right border
Here I used two dimensional indexing. So r!!y!!0 stands for the right endpoint for column 0 of row y; or in other words, the number of foreground pixels at the start of that row.
This image illustrates the right endpoints of the top and bottom border in green, and the bottom endpoint of the right border in red. The bracket (in glorious pink) has to fit between these indicated endpoints.
The rightBracketWidthsslow function is, as the name suggests, slow. It does a linear search over the possible widths. With that it would take O(m2*n) to find all widths for an m by n image. That is no better than the complexity of the algorithm from last time.
In my previous blog post, I introduced a SearchTree type that answers just the type findLast query that we need. In fact, this rectangle problem was why I made that SearchTree data structure in the first place.
There are three conditions in containsBracket. We will handle the one for the top border, r!!y!!0 >= w by building a separate search tree for each y. This search tree then only the widths w <- [1..r!!y!!0].
That leaves the conditions on the bottom and right borders. Since we fixed y, we can directly write these conditions in terms of a SearchTree query: For the bottom border we need satisfy (Ge (y+h)) (b!!y!!(w-1)), and for the right border satisfy (Le (r!!(y+h-1)!!0)) w. As you can hopefully see, these are exactly the same as the conditions in the containsBracket function above.
We can combine the two conditions into a pair, to give (Ge (y+h), Le (r!!(y+h-1)!!0)). While, to build the search tree, we need to pair b!!y!!(w-1) with w. That is just a matter of zipping two lists together:
bracketsAtY :: Int -> [Int] -> [Int] -> [Maybe Int] bracketsAtY y bsy rs@(ry:_) = [ fmap (\(Max byw1, Min w) -> w) (findLast (Ge (y+h),Le ryh1) searchTree) | (h,ryh1) <- zip [1..] rs ] where searchTree = fromList [ (Max byw1, Min w) | (byw1,w) <- zip bsy [1..ry] ] -- notation: -- bsy = [b!!y!!0, b!!y!!1, ..] -- rs = [r!!y!!0, r!!(y+1)!!0, ..] -- byw1 = b!!y!!(w-1) -- ry = r!!y!!0 -- ryh1 = r!!(y+h-1)!!0
We need to call bracketsAtY for each y, together with the right row of bottom endpoints, and right endpoints:
rightBracketWidths a = zipWith3 bracketsAtY [0..] b (tails (map head r)) where -- as in the previous posts x = scanRightward (\_ x -> x + 1) (-1) a :: [[Int]] y = scanDownward (\_ y -> y + 1) (-1) a :: [[Int]] r = scanLeftward (\(a,x) r -> if a then r else x) (imWidth a) (zip2d a x) b = scanUpward (\(a,y) b -> if a then b else y) (imHeight a) (zip2d a y)
QuickCheck will confirm that this function is the same as the slow version above:
prop_rightBracketWidths = forAll genImage $ \im -> rightBracketWidths im == rightBracketWidthsslow im
λ> quickCheck prop_rightBracketWidths +++ OK, passed 100 tests.
With the efficient search trees, bracketsAtY takes O((m+n)*log n) time, and rightBracketWidths takes O(m*(m+n)*log n) time for an m by n image. For large images this is much faster than the O(m2*n) linear search.
If we have a way of finding right brackets, we can easily reuse that function for left brackets, by just flipping the image.
leftBracketWidths :: Image -> [[Maybe Int]] leftBracketWidths = rightBracketWidths . flipHorziontal where flipHorziontal = map reverse
Once we have the left and right brackets, we can combine them into rectangles. Suppose that a left bracket has width lw, and the right bracket rw. Then the width of the rectangle they form is lw+rw-1, since both include the middle line.
combineBrackets :: Int -> [[Maybe Int]] -> [[Maybe Int]] -> [Rect] combineBrackets xmid lbrackets rbrackets = [ Rect (xmid-lw+1) y (lw+rw-1) h | (y,lws,rws) <- zip3 [0..] lbrackets rbrackets , (h,Just lw,Just rw) <- zip3 [1..] lws rws ]
And finding (a superset of) all the maximal rectangles intersecting the vertical line x=xmid can be done by cutting the image on that line, finding brackets, and combining them.
rectsIntersecting :: Int -> Image -> [Rect] rectsIntersecting xmid im = combineBrackets xmid bracketsleft bracketsright where imleft = map (take (xmid+1)) im imright = map (drop xmid) im bracketsleft = leftBracketWidths imleft bracketsright = rightBracketWidths imright
We left out one important case: what if the largest rectangle does not intersect the mid-line? For that purpose man has invented recursion: First look for rectangles intersecting the middle, and then look for rectangles not intersecting the middle. For that we need to look in the left and right halves.
To make this asymptotically fast, we have to ensure that both the width and height decrease. Since the time complexity of rectsIntersecting includes a log n term, it is faster for wide images. So, if the image is tall, we just transpose it to make it wide instead.
The recursion pattern of vertical and horizontal and middle lines will in the end look something like this:
with the first level in yellow, the second in green, then magenta and red. So in the first level we find all rectangles intersecting the yellow line. Then in the second level all rectangles intersecting a green line, and so on.
Here is the code:
allRectsRecurse :: Image -> [Rect] allRectsRecurse im -- empty image ==> no rectangles | imWidth im == 0 || imHeight im == 0 =  -- height > width? ==> transpose | imHeight im > imWidth im = map transposeRect . allRectsRecurse . transpose $ im -- find and recruse | otherwise = rectsIntersecting xmid im -- find ++ findRectsRecurse imleft -- recurse left ++ map (moveRect (xmid+1) 0) (allRectsRecurse imright) -- recurse right where xmid = imWidth im `div` 2 imleft = map (take xmid) im -- *excluding* the middle line imright = map (drop (xmid+1)) im
transposeRect :: Rect -> Rect transposeRect (Rect x y w h) = Rect y x h w moveRect :: Int -> Int -> Rect -> Rect moveRect dx dy (Rect x y w h) = Rect (x+dx) (y+dy) w h
Since the image is roughly halved in each recursion step, the recursio will have depth O(log n) for an n by n image. At each level, the rectsIntersecting calls will take O(n2*log n) time, for a total of O(n2*(log n)2). This is significantly faster than the O(n3) from the previous post.
For the complexity theorists: it is possible to do slightly better by using a disjoint-set (union-find) data structure instead of a search tree for finding brackets. I believe that would bring the runtime down to O(n2*log n*α(n)) where α is the inverse Ackermann function. Unfortunately such a data structure requires mutation, the correctness proofs are much harder, and the gain is quite small.
Let me end by checking that the set of maximal rectangles we find are the same as those found by the specification from the previous post. Then by extension the largest rectangle found will also be the same.
prop_maximalRects = forAll genImage $ \im -> (sort . onlyTrulyMaximalRects . allRects) im == (sort . onlyTrulyMaximalRects . allRectsRecurse) im
λ> quickCheck prop_maximalRects +++ OK, passed 100 tests.