I will begin using the same example from the previous post: computing an average. The naive version would be written like this:
average :: [Int] -> Int average xs = sum xs `div` length xsThis does not work well because it:
- loads the entire list into memory, and:
- cannot fuse away the intermediate list.
import Control.Applicative import qualified Control.Foldl as L average :: L.Fold Int Int average = div <$> L.sum <*> L.lengthAll folds built this way have two nice properties which foldl guarantees:
- They only traverse the list once
- They will not leak space
main = print $ L.fold average [0..100000000]This gives good efficiency at about 1 nanosecond per list element because it fuses away the intermediate list:
500000000 real 0m0.956s user 0m0.952s sys 0m0.000sWe can see this if we inspect the core, too, which produces this reasonably tight loop (which I've cleaned up):
$wgo = \ (x :: Int#) (y :: Int#) -- y is keeping tally of the sum (z :: Int#) -> -- z is keeping tally of the length case x of x' { __DEFAULT -> $wgo (x' +# 1 ) (y +# x') (z +# 1 ); 1000000000 -> (# I# (y +# 1000000000), I# (z +# 1) #) }This works because fold is implemented in terms of foldr to trigger build/foldr fusion. However, not all lists will fuse this way. I find, for example, that lists of Doubles refuse to fuse in this way:
import Control.Applicative import qualified Control.Foldl as L -- This gives worse performance: ~40 ns per step average :: L.Fold Double Double average = (/) <$> L.sum <*> L.genericLength main = print $ L.fold average [0..1000000000]I haven't yet figured out how to trick GHC into fusing away these lists. If anybody knows how to do this, please contribute a patch.
Fortunately, these folds preserve the original accumulator, step function, and extraction function, so you can always unpack the Fold type and then implement the list fusion yourself:
main = case average of L.Fold step start done -> do let go n x = if (n > 1000000000) then x else go (n + 1) $! step x n print $ done (go 0 start)So while reliable list fusion is still an unsolved problem, you can still use foldl to get reasonable performance and guarantee no space leaks. Also, there is always the option of using foldl to do the work of assembling derived strict and streaming folds and then implementing list fusion yourself if you want to squeak out every last drop of performance.
I wrote up this library to dispel the myth that only experts can fold things efficiently in Haskell. foldl lets you begin from simple primitive folds that you can fit in your head and then chain them together into complex folds with little effort.
I'm surprised this isn't done in terms of your pipes library. Couldn't it be massively generalise to that framework?
ReplyDeleteIt's actually *more* general to do it this way. The `Fold` type can be unpackaged and used within any library that needs strict left folds.
DeleteFor example, right now `pipes` has this function:
foldl :: (Monad m) => (x -> a -> x) -> x -> (x -> b) -> Producer a m r -> m b
You can use `Fold`s to fold `pipes` just by unpackaging the type and passing the fields to that `foldl` command. I was considering making `foldl` do the unpacking automatically but I've currently chosen not to in order to avoid the dependency.
You can imagine that any library (like `conduit` or `machines`) can reuse `Fold` in the exact same way. It's very reusable.
The reason that this `Fold` type is more general is that it preserves the minimal information necessary to encode the fold without coupling it to any framework. Also, it wouldn't work anyway if you specialized it to `pipes`. You'd lose the information necessary to combine folds if you did that.
I think your naive version of `average` misses a `xs` as its parameter. ;)
ReplyDeleteYeah, you're right! I fixed it.
ReplyDelete