Extra unsafe sequencing of IO actions

Published: 2015-10-10T22:00Z
License: CC-BY

Warning: evil ahead!

A while ago Neil Mitchell wrote about a different implementation of sequence for the IO monad. The issue with the usual definition is that it is not tail recursive. Neil's version uses some hacks to essentially break out of the IO monad. But the solution does require two traversals of the list.

Now in any language other than Haskell this IO monad wouldn't exist at all, and with a bit of luck lists would be mutable. Then you could implement sequence by just appending items to the end of a list. In Haskell you can not do that. Or can you?

A obvious way to implement mutable lists in haskell is with IORefs. But then you end up with something that is not an ordinary list, and you would have to use the IO monad even for reading from it. Instead, why not be unsafe? Just because Haskell doesn't let you change the tail of a list doesn't mean that it is impossible.

Now obviously this requires something beyond ordinary haskell. And even doing it from C via the foreign function interface is hard, because GHC will try to marshall values you pass to C functions. But GHC also allows you to write primitive operations in C--, which is essentially a portable assembly language. In C-- you *can* just overwrite the tail pointer of a (:) constructor to point to something else.

So I wrote a simple unsafeSetField function.

unsafeSetFieldzh (W_ i, gcptr x, gcptr y) {
  W_ bd;
  x = UNTAG(x);
  P_[x + SIZEOF_StgHeader + WDS(i)] = y; // write in memory
bd = Bdescr(x); if (bdescr_gen_no(bd) != 0 :: bits16) { recordMutableCap(x, TO_W_(bdescr_gen_no(bd))); return (); } else { return (); } }

There are several things going on here. First of all, GHC uses pointer tagging, meaning that we need to untag the incomming pointer. Secondly, it might be the case that the x lives in the old GC generation, in which case we have to mark the fact that we changed it, since otherwise y might get garbage collected. By the way, the zh in the end of the function name is the z encoding for the # character.

Now to use this function from Haskell we import it and add some unsafeCoerce,

foreign import prim "unsafeSetFieldzh" unsafeSetField#
  :: Int# -> Any -> Any -> (##)
unsafeSetField :: Int -> a -> b -> IO () unsafeSetField (I# i) !x y = case unsafeSetField# i (unsafeCoerce# x :: Any) (unsafeCoerce# y :: Any) of (##) -> return () {-# INLINEABLE unsafeSetField #-}

With it we can implement sequence as follows

sequenceU :: [IO a] -> IO [a]
sequenceU [] = return []
sequenceU (mx0:xs0) = do
    x0 <- mx0
    let front = x0:[]
    go front xs0
    return front
  go back [] = return ()
  go back (mx:xs) = do
    x <- mx
    let back' = x:[]
    unsafeSetField 1 back back'
    go back' xs
{-# INLINEABLE sequenceT #-}

Now for the big questions: Does it work? The answer is that, yes it does! Benchmarking shows that the unsafe sequenceU is between 11% and 23% faster than Neil's sequenceIO in all cases. For small lists the standard sequence implementation is still marginally faster.

You should be aware that GHC sometimes shares values, so overwriting part of one might overwrite them all. And also that constant lists might become static values, meaning not allocated on the heap. So trying to overwrite parts of those will just crash the program.

I also wouldn't be surprised if the above code is subtly wrong. Perhaps I am missing a write barrier or doing something wrong with the generation check. So I wouldn't use it in production code if I were you.

What if you don't even know beforehand what constructor to use? The GHC runtime system has something called indirections. These are used to replace a thunk with its result after evaluation. But we can also use indirections to replace a value itself. But because of pointer tagging you can't just replace one constructor by another, because the tag would be wrong.

Instead the idea is to first allocate a special "hole" value, and then later fill that hole by overwriting it with an indirection. Note that you can only do that once, because the runtime system will follow and remove indirections when possible. So you get an API that looks like

newHole :: IO a
setHole :: a -> a -> IO a

It is also possible to implement sequence with holes. But, perhaps unsurprisingly, this turns out to be a bit slower. I'll leave the actual implementation as an exercise for the interested reader, as well as the question of what other evil you can commit with it.

I wanted to publish the functions from this post on hackage, but unfortunately I haven't yet figured out how to include C-- files in a cabal package. So instead everything is on github.


Dan DoelDate: 2015-10-14T02:16Zx

This can also be implemented with unsafeInterleaveIO (because it can be used to implement IVars, which are equivalent to your "holes"). But it is also less efficient than sequence.

Joachim BreitnerDate: 2015-10-14T10:59Zx

Great! I have been thinking about this issue a while ago, see https://www.joachim-breitner.de/blog/620-Constructing_a_list_in_a_Monad.<7a>. Basically you are answering my concluding question

I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?

I wonder if it is possible to wrap your hack in a safe API, with operations such as

  • Allocating a new empty mutable (snoc’able) list
  • Appending a new value
  • Closing the list, returning a regular list

The interface should prevent you from using the list while it is still under constraction, and such a thing must only be used once (either by appending or by returning).

Dan, is you approach using unsafeInterleaveIO the one that I describe in my blog post?

Joachim BreitnerDate: 2015-10-14T11:00Zx

Also, see http://hackage.haskell.org/package/ghc-heap-view for an example of a Cabal package that ships C-- files.

Joachim BreitnerDate: 2015-10-14T11:25Zx

Surprisingly, it is your approach is still 47% slower than a naive recursion in my benchmark. I wonder why that is the case.

BTW, re your captcha: "head" is also a function of type `a->[a]`.

Joachim BreitnerDate: 2015-10-14T11:29Zx

Ignore the last comment. I can reproduce that the hack gets much faster as the list becomes longer.

Joachim BreitnerDate: 2015-10-14T14:47Zx

I included this in my benchmark from two years ago, see http://www.joachim-breitner.de/blog/684-Constructing_a_list_in_a_monad_revisited for the result.

Dan DoelDate: 2015-10-14T14:48Zx

Joachim, the unsafeInterleaveIO version in your post is more straight forward than what I was talking about. Rather, I meant you can use it like so:

data IVar a = IVar (MVar a) a
newIVar :: IO (IVar a) newIVar = do v <- newEmptyMVar IVar v <$> unsafeInterleaveIO (readMVar v)
readIVar :: IVar a -> a readIVar (IVar _ result) = result
writeIVar :: IVar a -> a -> IO () writeIVar (IVar v _) x = do b <- tryPutMVar v x unless b $ error "Wrote IVar twice"

This API is similar to the newHole/setHole API Twan mentioned, but probably more agreeable to most people. It's, of course, considerably slower, and probably uses more memory, because of all the MVars and whatnot.

Doing mapM for IO your way is actually faster for very large lists than the usual mapM which I can't say for any of the other things I've tested (my IVar thing, a version that does it more explicitly, and with IORef instead of MVar to avoid some overhead, and also the version that accumulates and reverses the result list---I haven't seen that last one actually be faster, even for pretty large lists).

Neil MitchellDate: 2015-10-14T14:50Zx

Very cool - I thought I was being devious with my code, but you've gone way past that.

Dan DoelDate: 2015-10-14T15:09Zx

Actually, now that I think of it, Joachim's unsafeInterleaveIO version is probably almost identical to the code in Neil's post under the hood. It probably get optimized rather less, and it has some guards against duplicate evaluation which probably aren't necessary in this case, but if you stepped through some evaluation, it'd probably be doing the same thing.

Edward KmettDate: 2015-10-14T15:11Zx

Clever! I wonder if I can use this to improve the performance of my promises package. =)

Joachim BreitnerDate: 2015-10-14T15:12Zx

It would be worth trying if using unsafeFixIO makes the differences go away. But I would expect that the IORefs involves still incur a noticeable overhead.

Edward KmettDate: 2015-10-14T15:12Zx

Re: cabal, just include the .cmm in c-sources, e.g.

c-sources: cbits/mpfr-wrappers.cmm

Derek ElkinsDate: 2015-10-14T18:39Zx

So we can have Recycling Continuations in Haskell now?

Edward KmettDate: 2015-10-16T02:27Zx

Would it be possible to maybe put in an actual Blackhole in the indirection while it is being started? That way trying to enter a newHole would block waiting for the computation.

Edward KmettDate: 2015-10-16T02:47Zx

It would seem that in order to pull that off we'd need some thread we can declare as the tso that is the blackhole's owner?

Can we just spin up a thread that blocks indefinitely on something not-detectable to the RTS, and steal its thread ID as the owner of such a scheme?


(optional, will not be revealed)
Name of the lazy functional programming language I write about:
Use > code for code blocks, @code@ for inline code. Some html is also allowed.