The Type
Haskell is a types-first language, so we will begin by choosing an appropriate type to represent threads. First we must state in plain English what we want threads to do:
- Threads must extend existing sequences of instructions
- Threads must permit a set of operations: forking, yielding control, and terminating.
- Threads should permit multiple types of schedulers
- When you hear "multiple interpreters/schedulers/backends" you should think "free" (as in "free object")
- When you hear "sequence of instructions" you should think: "monad".
- When you qualify that with "extend" you should think: "monad transformer".
Syntax trees
"Free monad transformer" is a fancy mathematical name for an abstract syntax tree where sequencing plays an important role. We provide it with an instruction set and it builds us a syntax tree from those instructions.
We said we want our thread to be able to fork, yield, or terminate, so let's make a data type that forks, yields, or terminates:
{-# LANGUAGE DeriveFunctor #-} data ThreadF next = Fork next next | Yield next | Done deriving (Functor)ThreadF represents our instruction set. We want to add three new instructions, so ThreadF has three constructors, one for each instruction: Fork, Yield, and Done.
Our ThreadF type represents one node in our syntax tree. The next fields of the constructors represent where the children nodes should go. Fork creates two execution paths, so it has two children. Done terminates the current execution path, so it has zero children. Yield neither branches nor terminates, so it has one child. The deriving (Functor) part just tells the free monad transformer that the next fields are where the children should go.
Now the free monad transformer, FreeT, can build a syntax tree from our instruction set. We will call this tree a Thread:
import Control.Monad.Trans.Free -- from the `free` package type Thread = FreeT ThreadFAn experienced Haskell programmer would read the above code as saying "A Thread is a syntax tree built from ThreadF instructions".
Instructions
Now we need primitive instructions. The free package provides the liftF operation which converts a single instruction into a syntax tree one node deep:
yield :: (Monad m) => Thread m () yield = liftF (Yield ()) done :: (Monad m) => Thread m r done = liftF Done cFork :: (Monad m) => Thread m Bool cFork = liftF (Fork False True)You don't need to completely understand how that works, except to notice that the return value of each command corresponds to what we store in the child fields of the node:
- The yield command stores () as its child, so its return value is ()
- The done command has no children, so the compiler deduces that it has a polymorphic return value (i.e. r), meaning that it never finishes
- The cFork command stores boolean values as children, so it returns a Bool
We can combine cFork and done to reimplement a more traditional Haskell-style fork, using the convention that the left branch is the "parent" and the right branch is the "child":
import Control.Monad fork :: (Monad m) => Thread m a -> Thread m () fork thread = do child <- cFork when child $ do thread doneThe above code calls cFork and then says "if I am the child, run the forked action and then stop, otherwise proceed as normal".
Free monads
Notice that something unusual happened in the last code snippet. We assembled primitive Thread instructions like cFork and done using do notation and we got a new Thread back. This is because Haskell lets us use do notation to assemble any type that implements the Monad interface and our free monad transformer type automatically deduces the correct Monad instance for Thread. Convenient!
Actually, our free monad transformer is not being super smart at all. When we assemble free monad transformers using do notation, all it does is connect these primitive one-node-deep syntax trees (i.e. the instructions) into a larger syntax tree. When we sequence two commands like:
do yield done... this desugars to just storing the second command (i.e. done) as a child of the first command (i.e. yield).
The scheduler
Now we're going to write our own thread scheduler. This will be a naive round-robin scheduler:
import Data.Sequence -- Queue with O(1) head and tail operations roundRobin :: (Monad m) => Thread m a -> m () roundRobin t = go (singleton t) -- Begin with a single thread where go ts = case (viewl ts) of -- The queue is empty: we're done! EmptyL -> return () -- The queue is non-empty: Process the first thread t :< ts' -> do x <- runFreeT t -- Run this thread's effects case x of -- New threads go to the back of the queue Free (Fork t1 t2) -> go (t1 <| (ts' |> t2)) -- Yielding threads go to the back of the queue Free (Yield t') -> go (ts' |> t') -- Thread done: Remove the thread from the queue Free Done -> go ts' Pure _ -> go ts'... and we're done! No really, that's it! That's the whole threading implementation.
User-land threads
Let's try out our brave new threading system. We'll start off simple:
mainThread :: Thread IO () mainThread = do lift $ putStrLn "Forking thread #1" fork thread1 lift $ putStrLn "Forking thread #2" fork thread2 thread1 :: Thread IO () thread1 = forM_ [1..10] $ \i -> do lift $ print i yield thread2 :: Thread IO () thread2 = replicateM_ 3 $ do lift $ putStrLn "Hello" yieldEach of these threads has type Thread IO (). Thread is a "monad transformer", meaning that it extends an existing monad with additional functionality. In this case, we are extending the IO monad with our user-land threads, which means that any time we need to call IO actions we must use lift to distinguish IO actions from Thread actions.
When we call roundRobin we unwrap the Thread monad transformer and our threaded program collapses to a linear sequence of instructions in IO:
>>> roundRobin mainThread :: IO () Forking thread #1 Forking thread #2 1 Hello 2 Hello 3 Hello 4 5 6 7 8 9 10Moreover, this threading system is pure! We can extend monads other than IO, yet still thread effects. For example, we can build a threaded Writer computation, where Writer is one of Haskell's many pure monads:
import Control.Monad.Trans.Writer logger :: Thread (Writer [String]) () logger = do fork helper lift $ tell ["Abort"] yield lift $ tell ["Fail"] helper :: Thread (Writer [String]) () helper = do lift $ tell ["Retry"] yield lift $ tell ["!"]This time roundRobin produces a pure Writer action when we run logger:
roundRobin logger :: Writer [String] ()... and we can extract the results of that logging action purely, too:
execWriter (roundRobin logger) :: [String]Notice how the type evaluates to a pure value, a list of Strings in this case. Yet, we still get real threading of logged values:
>>> execWriter (roundRobin logger) ["Abort","Retry","Fail","!"]
Conclusion
You might think I'm cheating by off-loading the real work onto the free library, but all the functionality I used from that library boils down to 12 lines of very generic and reusable code (see the Appendix). This is a recurring theme in Haskell: when we stick to the theory we get reusable, elegant, and powerful solutions in a shockingly small amount of code.
The inspiration for this post was a computer science paper written by Peng Li and Steve Zdancewic titled A Language-based Approach to Unifying Events and Threads . The main difference is that I converted their continuation-based approach to a simpler free monad approach.
Edit: aseipp on /r/haskell just pointed out that my post is scarily similar to a pre-existing functional pearl: A Poor Man's Concurrency Monad by Koen Classen.
Appendix: Free monad transformer code
The essence of a syntax tree distilled into 12 lines of code:
data FreeF f a x = Pure a | Free (f x) newtype FreeT f m a = FreeT { runFreeT :: m (FreeF f a (FreeT f m a)) } instance (Functor f, Monad m) => Monad (FreeT f m) where return a = FreeT (return (Pure a)) FreeT m >>= f = FreeT $ m >>= \v -> case v of Pure a -> runFreeT (f a) Free w -> return (Free (fmap (>>= f) w)) instance MonadTrans (FreeT f) where lift = FreeT . liftM Pure liftF :: (Functor f, Monad m) => f r -> FreeT f m r liftF x = FreeT (return (Free (fmap return x)))
Nice wrap up.. thanks!
ReplyDeleteThanks for the article Gabriel ! Your explanation of free monads is probably the best one that you can find online.
ReplyDeletebtw there is a small typo - we are logging
lift $ putStrLn "Forking thread #1"
twice, which is a bit confusing
Thanks! Fixed
DeleteI know this is an old post but you could also fix the output :-)
ReplyDelete>>> roundRobin mainThread :: IO ()
Forking thread #1
Forking thread #2
1
Hello
2
Hello
3
Hello
4
5
6
7
8
9
10
Thanks! I fixed it
Delete