- A correct-by-construction ListT implementation that converts to and from proxies
- The CodensityP proxy transformer, which improves the time complexity of left-associated binds
- ArrowChoice operations for selectively applying pipes to subsets of an input stream
- "Pointful" operators
- Many performance improvements
ListT
Many people know that ListT in transformers is broken, thanks to the wonderful ListT done right article. Fewer people know that there is a correct-by-construction implementation of ListT up on Hackage in the List package. However, there are a couple of problems with the List package:
- It's difficult to build custom ListT actions.
- It's difficult to read out the result of List.
If you thought to yourself "Oh, no.... Gabriel added a List dependency to pipes", you'd be wrong! In fact, pipes has had ListT support since version 2.4 and I didn't even realize it until I was working on a perfect backtracking parser for the upcoming pipes-parse library.
The key lies in the Interact class that I introduced in pipes-2.4, which I've renamed to the ListT class in this release. People familiar with pipes know that this mysterious class provided two extra operators: (/>/) and (\>\), and that these operators happened to form two extra categories, with respond and request as their respective identities:
respond />/ f = f f />/ respond = f (f />/ g) />/ h = f />/ (g />/ h) request \>\ f = f f \>\ request = f (f \>\ g) \>\ h = f \>\ (g \>\ h)However, at the time I discovered these two additional categories I dismissed them as less interesting than the proxy composition categories, even mentioning in the tutorial that "they are more exotic and you probably never need to use them". Little did I know how wrong I was!
I later discovered that these two categories were both ListT Kleisli categories. The "respond" category (i.e. respond and (/>/))) is actually a monad over the downstream output of proxies, or in other words:
- respond corresponds to return
- (/>/) corresponds to (>=>)
newtype RespondT p a' a b' m b = RespondT { runRespondT :: p a' a b' b m b' } instance (Monad m, ListT p) => Monad (RespondT p a' a b' m) where return a = RespondT (respond a) m >>= f = RespondT ( runRespondT m //> \a -> runRespondT (f a)) -- (//>) is the "pointful" version of (/>/), -- just like (>>=) is the "pointful" version of (>=>)Using RespondT, you can bind a proxy's output as if you were binding a list within the list monad:
import Control.Proxy twoNumbers :: (Proxy p) => () -> Producer p Int IO () twoNumbers = readLnS >-> execU (putStrLn "Enter a number: ") >-> takeB_ 2 stringsTillQuit :: (Proxy p) => () -> Producer p String IO () stringsTillQuit = stdinS >-> execU (putStrLn "Enter a string: ") >-> takeWhileD (/= "quit") -- 'ProduceT' is a convenient type synonym around 'RespondT' exampleListT :: (ListT p) => () -> ProduceT p IO (Int, String) exampleListT () = do n <- RespondT $ twoNumbers () str <- RespondT $ stringsTillQuit () return (n, str)You can then compile RespondT back to a proxy just by unwrapping the newtype, generating a proxy that produces one output per permutation of bound values:
exampleProxy :: (List p) => () -> Producer p (Int, String) IO () exampleProxy = runRespondK exampleListTMore often, you would just combine these two steps into one:
exampleProxy :: (List p) => () -> Producer p (Int, String) IO () exampleProxy () = runRespondT $ do n <- RespondT $ twoNumbers () str <- RespondT $ stringsTillQuit () return (n, str)
>>> runProxy $ exampleProxy >-> printD Enter a number: 1<Enter> Enter a string: Hello<Enter> (1,"Hello") Enter a string: world<Enter> (1,"world") Enter a string: quit<Enter> Enter a number: 2<Enter> Enter a string: Testing<Enter> (2,"Testing") Enter a string: 123<Enter> (2,"123") Enter a string: quit<Enter>Notice how reading out the ListT is trivial. You just use the pipes machinery you know and love to read out the resulting lazy stream of values.
As you might have guessed, there is a symmetric ListT monad over upstream outputs, too, which I've named RequestT. RespondT and RequestT are correct by construction, meaning that they always satisfy the monad and monad transformer laws.
However, RespondT and RequestT are much more powerful than meets the eye. For example, you need not limit yourself to Producers when you use RespondT, as the following example demonstrates:
pipeT :: (ListT p) => () -> Pipe p Int (Int, Int) IO () pipeT () = runRespondT $ do x <- RespondT $ takeB_ 2 () y <- RespondT $ takeB_ 3 () return (x, y)You can non-deterministically select outputs from any Proxy type and RespondT just magically does the right thing:
>>> runProxy $ enumFromS 1 >-> pipeT >-> printD (1,2) (1,3) (1,4) (5,6) (5,7) (5,8)Similarly, you need not restrict yourself to unidirectional pipes. RespondT and RequestT won't bat an eyelash if you try to use them for bidirectional general-purpose proxies. pipes goes above and beyond a traditional ListT implementation.
The proxy prelude provides convenience functions for common operations, such as ranges or iterating over lists:
pairs :: (ListT p) => () -> Producer p (Int, Int) IO () pairs () = runRespondT $ do x <- rangeS 1 2 lift $ putStrLn $ "x = " ++ show x y <- eachS [4, 6, 8] lift $ putStrLn $ "y = " ++ show y return (x, y)eachS is named after Ruby's each function and rangeS is named after Python's range function. You can bind each one within RespondT to non-deterministically select from a list or range, respectively.
>>> runProxy $ pairs >-> printD x = 1 y = 4 (1,4) y = 6 (1,6) y = 8 (1,8) x = 2 y = 4 (2,4) y = 6 (2,6) y = 8 (2,8)It also wouldn't be a "ListT done right" unless it got the examples from that article correct, too:
myTest :: (ListT p) => Int -> () -> ProduceT p IO (Int, Int) myTest n () = do let squares = eachS $ takeWhile (<= n) $ map (^2) [0..] x <- squares y <- squares lift $ print (x, y) guard $ x + y == n lift $ putStrLn "Sum of squares." return (x, y)However, that example had a much more difficult time reading out just the first result. We can do so quite easily just by using the headD_ fold, which only drives the RespondT block long enough to retrieve the first result:
>>> let p = raiseK (runRespondK (myTest 5)) >-> headD_ >>> execWriterT $ runProxy p (0,0) (0,1) (0,4) (1,0) (1,1) (1,4) Sum of squares. First {getFirst = Just (1,4)}If you want to learn more, you can read the newly added ListT section of the tutorial, which provides even more code examples.
So now proxies now possess two symmetric ListT implementations you can add to your toolbox, and they improve on the state of the art by reusing the elegant pipes machinery for both building ListT actions and reading out their values.
Codensity Proxy Transformer
In addition to conduit, I must also contend with Edward's machines library (currently on Github). Until recently, machines possessed one notable advantage over pipes: it used a codensity transformation of its internal free monad to avoid a quadratic blowup from a large series of left-associated binds. Normally these do not arise commonly in practice, but it is still a nice feature to have.
Now pipes has assimilated this feature, too. You can improve the time complexity of any pipe just by wrapping the pipe in runCodensityP or runCodensityK, both of which behave like the improve function from the free package. These automatically fix any quadratic time complexity of left-associated binds.
replicateM is a great example of a function that generates lots of left-associated binds. If I try to use it within a pipe, I will get a quadratic blowup:
import Control.Monad import Control.Proxy leftAssociate () = replicateM 10000 (request ()) main = do xs <- runProxy $ enumFromS (1 :: Int) >-> leftAssociate print xs
$ time ./main >/dev/null real 0m3.773s user 0m3.716s sys 0m0.052s... but if you wrap the pipeline in runCodensityK, it switches to linear time complexity:
import Control.Monad import Control.Proxy import Control.Proxy.Trans.Codensity leftAssociate () = replicateM 10000 (request ()) main = do xs <- runProxy $ runCodensityK $ enumFromS (1 :: Int) >-> leftAssociate print xs
$ time ./main >/dev/null real 0m0.031s user 0m0.024s sys 0m0.000sEven better, you can wrap just the pathological pipe in runCodensityK:
main = do xs <- runProxy $ enumFromS (1 :: Int) >-> runCodensityK leftAssociate print xs... which gives a minor performance improvement (more easily detectable for larger numbers of requests):
time ./main >/dev/null real 0m0.027s user 0m0.024s sys 0m0.000sMy own performance measurements show that while the codensity transformation does improve time complexities for left-associated binds, it yields worse constant factors (about 6-fold slower for entirely pure code, but much less for IO-bound code), which is why I do not enable it on by default. The main reason the naive free monad in the ProxyFast implementation outperforms the codensity version is that it can use rewrite RULES to rewrite your code into the optimal tuned form, but the codensity-transformed version cannot.
ArrowChoice
Ever since I first released pipes, I've received numerous questions about whether or not pipes can be made an instance of Arrow. While you can't make Arrow work, you CAN make proxies implement ArrowChoice, although I don't provide an actual instance because there are two such instances (one for downstream and one for upstream) and the requisite newtypes would be very cumbersome.
You can find these combinators in the ArrowChoice section of the proxy prelude, which provides left{D/U} and right{D/U}. Using these combinators, you can selectively apply pipes to a subset of a stream:
stream :: (Monad m, Proxy p) => () -> Producer p (Either Int Char) m () stream = fromListS [Left 3, Right 'C', Right 'D', Left 4, Right 'E', Left 5]
>>> let p = stream >-> leftD (takeB_ 2 >-> mapD show) >-> printD >>> runProxy p Left "3" Right 'C' Right 'D' Left "4"This lets you dynamically switch behavior in response to stream values. For example, one person recently asked me how you would switch content handling in the middle of a pipeline (for example, after negotiating encryption and compression). One option I proposed is that they could use Either to distinguish between values before and after negotiation and then filter them differently using the ArrowChoice combinators:
negotiation () = do (before >-> mapD Left ) () (after >-> mapD Right) () main = runProxy $ source >-> negotiation >-> leftD idT >-> rightD (decompress >-> decrypt) >-> sinkHowever, that's purely a theoretical idea I threw out there. I haven't actually tried this solution in practice, yet.
Point-ful operators
All type classes now use the "point-ful" equivalents of their original point-free composition operators, and the point-free operators are now derived from the point-ful ones. There are two significant advantages of this:
- Types. Some previous perfectly safe code required the following hack: ((\_ -> p) >-> k) undefined. These point-ful operators now naturally lead to the correct and more general types of their corresponding point-free composition operators.
- Performance. I actually used these operators internally to get pipes performance so high. Exposing them directly in the type class removes a lot of code indirection and improves the performance of the base proxy implementations (typically by about 10-15%) and the proxy transformers because there is less indirection.
Performance
Other performance improvements include better rewrite RULES. I found several cases where the original rules were not firing, which is why the proxy prelude still depended on the manual worker/wrapper code to ensure that they fired. I recently found a more general set of rewrite RULES that work even more reliably and the proxy prelude now seems to optimize correctly even when written using the most naive code. However, I haven't committed the newer naive versions yet, as a precaution.
Stability
Many people probably want to know when I still stabilize the pipes package so that it can eventually go in the Haskell platform. The answer is that I won't stabilize it officially until I complete three upcoming libraries:
- pipes-parse, which provides a native parsing extension and a standard set of end-of-input machinery for pipes
- pipes-free, which will expose the underlying free monad and also provide an iostreams-like interface to pipes
- pipes-stm, which will be the basis of an FRP system built on top of pipes
Open Design Issues
There remains one major performance bottleneck in the fast proxy implementation: hoist. I can dramatically speed it up by removing an normalization step, but doing so means that you can only safely supply hoist with monad morphisms (such as lift), otherwise you will violate the monad transformer laws. In theory, you should really only use monad morphisms for hoist anyway, but in practice people violate this all the time and try to do "weird" things like hoist (runStateT 0), which is very insensible, and I can't easily use the types to forbid that kind of thing. So I would like feedback on whether people prefer speed or safety for the ProxyFast implementation. Alternatively, I could release a third base Proxy instance not selected by default that is identical to ProxyFast in all respects except for providing the faster hoist.
Another question I have is whether or not to merge the proxy prelude into a single module. I welcome any feedback on that. Some people comment that the the module hierarchy is a little-bit too fine-grained and that's one opportunity to condense four modules into one.
Upcoming libraries
The next library on the docket is pipes-parse, followed shortly by pipes-bytestring. I know that many people need the pipes-bytestring library to progress further with pipes, but it depends on pipes-parse, which establishes some important higher-level idioms for the pipes ecosystem in general. My rate of progress has slowed recently mainly because I'm wrapping up my PhD, but I expect that I can finish both within about two months even at my current rate.