pipes-parse is finally out!
pipes users know that
pipes has lagged behind
conduit and
io-streams in the parsing arena and this library provides the utilities necessary to close the gap. You can find the
pipes-parse library here, and I recommend reading
the tutorial. This post will mainly discuss the development of
pipes-parse and compare it to parsing solutions from other streaming libraries.
End of Input
pipes-parse copies both
io-streams and
conduit for the end of input protocol: wrap values in
Just and end with a stream of
Nothings. There are two ways you can modify an input stream to obey this protocol.
The first approach is to use the
wrap function, which enforces this protocol:
wrap :: (Monad m, Proxy p)
=> p a' a b' b m r -> p a' a b' (Maybe b) m s
Then you can just write:
wrap . producer >-> consumer
wrap proves its termination safety by having a polymorphic return value (because it ends with a never-ending stream of
Nothings).
The second approach is to rewrite your producer as an input stream in the style of
io-streams (see
this post for details):
source' :: (Monad m, Proxy p) => () -> Session p m (Maybe a)
... and use
request composition to connect the producer:
source' \>\ consumer
This approach proves its termination safety by virtue of using
request composition. The composition operator specializes to:
(\>\) :: (Monad m, Proxy p)
=> (() -> Session p m (Maybe a))
-> (() -> Consumer p (Maybe a) m b )
-> (() -> Session p m b )
The composite pipe's return value only derives from the downstream pipe (i.e.
consumer in this case). This is because
request composition is automatically safe against termination from the upstream pipe. In the above example,
source' just replaces every
request within
consumer and if
source' terminates all that means is that the
request completes.
What's nice is that both approaches are 100% compatible with each other. You, the pipe writer, do not need to anticipate which way users will supply input. You just write a pipe that consumes values of type
Maybe a and both of the above approaches will work with your pipe. Also, both of these approaches guarantee that you can return values directly from the downstream pipe without guarding the return value with a
Maybe.
Pushback and Leftovers
pipes implements pushback using the
StateP proxy transformer:
-- Like @request ()@, except try to use the leftovers buffer first
draw :: (Monad m, Proxy p)
=> StateP [a] p () (Maybe a) y' y m (Maybe a)
-- Push an element back onto the leftovers buffer
unDraw :: (Monad m, Proxy p)
=> a -> StateP [a] p x' x y' y m ()
This is a great example of how the proxy transformer system makes it easy to extend
pipes with new features without baking them into the core implementation. I can use the (newly-fixed)
StateP proxy transformer to add a leftovers buffer that
draw and
unDraw both use.
Pushback is where
pipes-parse significantly improves on the competition. To motivate the
pipes-parse solution, consider the type for
conduit's most general composition operator:
(>+>) :: Monad m
=> Pipe l a b r0 m r1
-> Pipe Void b c r1 m r2
-> Pipe l a c r0 m r2
-- ^
-- |
-- +-- Leftovers
The downstream conduit cannot provide leftovers because they will be lost after composition. With
pipes-parse you can save leftovers from both composed pipes very easily. To see how, imagine we have the following two pipe types:
p1 :: (Monad m, Proxy p)
=> () -> Pipe (StateP [a] p) (Maybe a) (Maybe b) m r
p2 :: (Monad m, Proxy p)
=> () -> Pipe (StateP [b] p) (Maybe b) (Maybe c) m r
-- ^
-- |
-- +-- Leftovers
Each of these pipes stores a leftovers buffer equal to its input type, but we can't yet compose these pipes because their leftovers buffers don't match. However,
pipes-parse provides lens support in the form of the
zoom function so that you can easily unify two leftovers buffers in order to compose them:
zoom _fst . p1
:: (Monad m, Proxy p)
=> () -> Pipe (StateP ([a], [b]) p) (Maybe a) (Maybe b) m r
zoom _snd . p2
:: (Monad m, Proxy p)
=> () -> Pipe (StateP ([a], [b]) p) (Maybe b) (Maybe c) m r
zoom _fst . p1 >-> zoom _snd . p2
:: (Monad m, Proxy p)
=> () -> Pipe (StateP ([a], [b]) p) (Maybe a) (Maybe c) m r
But you can do more than that! You can still access the leftovers buffers afterwards, too, again using
zoom:
example = do
(zoom _fst . p1 >-> zoom _snd . p2) ()
-- Draw, reusing the leftovers from @p1@
ma <- zoom _fst draw
-- Retrieve the leftovers from @p2@
mb <- zoom _snd get
...
This kind of multiple-buffer management isn't possible using
conduit.
zoom is a perfect example of the
functor design pattern. We lift two existing proxies to agree on a common global state for compatibility purposes. Therefore, we expect that three should be functor laws at play:
zoom id = id
zoom (f . g) = zoom f . zoom g
pipes also improves upon
io-streams pushback, too. With
io-streams all the push-back is done using
IORefs, meaning that:
- It isn't pure
- You can't easily control which streams share leftovers and which ones do not
- None of the state is reflected in the types
With
pipes-parse you get pure and precise control over leftovers. Moreover, you do not need to instrument streams to correctly forward values that you push back upstream, because
StateP abstracts over that for you.
Nesting and delimiting parsers
Like other streaming libraries,
pipes-parse makes it very easy to run a parser on a subset of the stream. This was probably the #1 feature requested, followed shortly by...
Resumable parsing
pipes-parse uses
StateP, so if you want to interrupt parsing you can just use
runStateK to return the current state of the leftovers for use in a later computation. Simple!
Perfect streaming
One of the more advanced features to come out of the last wave of development was what I like to call "perfect streaming". This has a very specific meaning: grouping the input and interacting with each group as a stream without bringing more than one chunk into memory.
For example, consider the following conduit:
lines :: Monad m => Conduit ByteString m ByteString
This will load each line into memory, which means that if your file is one long line then you will load the entire file into memory, defeating the purpose of streaming!
io-streams has the same problem, but, unlike
conduit,
io-streams can easily fix its
lines utility to stream perfectly and I plan to show Greg how to do this so that
io-streams users can benefit from the same trick.
pipes-parse does not teach how to use this trick, but it does lay the groundwork for it and the upcoming
pipes-bytestring library will provide examples of this idiom. If you want to see a concrete example of this trick in action, check out
Oliver Charles's upcoming pipes-tar library on Github to see a preview of this idiom, where he streams individual files from a TAR archive without ever loading more than one chunk in memory. His very interesting use case was the inspiration for this trick, and I also preview this idiom in this
Stack Overflow answer.
More generally, perfect streaming uses the
respond category's composition operator, which has the following general type:
(/>/) :: (Monad m, Proxy p)
=> (a -> p x' x b' b m a')
-> (b -> p x' x c' c m b')
-> (a -> p x' x c' c m a')
When you use
respond composition, both pipes share the same upstream interface meaning that you can group the input into subsections but still allow each subsection to access the original upstream interface. With appropriate information hiding you can set up pipes which behave like lenses to specific subsections of the stream and allow the user to stream from each subsection independently.
Compatibility
pipes-parse takes great care to ensure that non-parsing pipes are completely interoperable with parsing pipes, thanks to the following compatibility functions:
fmapPull
:: (Monad m, Proxy p)
=> (x -> p x a x b m r)
-> (x -> p x (Maybe a) x (Maybe b) m r)
returnPull
:: (Monad m, Proxy p)
=> x -> p x a x (Maybe a) m r
bindPull
:: (Monad m, Proxy p)
=> (x -> p x a x (Maybe b) m r)
-> (x -> p x (Maybe a) x (Maybe b) m r)
These three functions define functors and monads in the category where the objects are the downstream components of each proxy interface and the morphisms are pull-based pipes.
As you might guess,
fmapPull satisfies the functor laws:
fmapPull f >-> fmapPull g = fmapPull (f >-> g)
fmapPull pull = pull
Similarly,
returnPull and
bindPull satisfy the monad laws:
-- Using: f >>> g = f >-> bindPull g
returnPull >>> f = f
f >>> returnPull = f
(f >>> g) >>> h = f >>> (g >>> h)
... which are equivalent to:
returnPull >-> bindPull f = f
bindPull returnPull = pull
bindPull (f >-> bindPull g) = bindPull f >-> bindPull g
... and we can derive the functor from the monad:
fmapPull f = bindPull (f >-> returnPull)
-- i.e. fmap f = (>>= return . f)
These functions could replace
Maybe and parametrize it with a type class like:
class FunctorPull f where
fmapPull
:: (Monad m, Proxy p)
=> (x -> p x a x b m r)
-> (x -> p x (f a) x (f b) m r)
... and there is a sensible instance for
Either, too (in fact, that's how
rightD from the
pipes prelude works). However, I decided to keep them monomorphic for now for simplicity.
Conclusion
pipes-parse, like most
pipes libraries, keeps the best spirit of Haskell programming by:
- composing features from smaller, simpler, and correct building blocks,
- using higher-order functions to lift existing functions for compatibility,
- isolating features from each other to statically prevent accidental complexity
pipes-parse is the last of the three core libraries, the other two being
pipes-safe and
pipes-concurrency. These libraries define the central idioms for the
pipes ecosystem and they were all designed to be instructive and convention-setting in areas where there isn't a perfectly elegant solution and some pragmatic trade-offs had to be made.
The completion of these libraries marks the point where I feel the core
pipes API has proven itself to be sufficiently versatile and future-proof. The proxy transformer system makes the central API unusually stable because I don't need to bake in any new features that I want to add.
This means I will be upgrading
pipes to version
4.0 soon to mark the transition to a stabler API in preparation for eventual inclusion in the Haskell platform. Also, most development work will shift to derived libraries now.
That does not mean that the derived libraries are complete, yet. For example, I am currently writing up a
pipes-safe-2.0.0 which will feature improved promptness guarantees and eliminate the need for unsafe finalization primitives. Similarly, I am about to release a
pipes-safe-1.2.0 at the end of this week which will add broadcasts and continuous behaviors. More generally, I will only consider the derived core libraries to be mature when more code is built on top of them on the scale of what
conduit has right now.
The next library on the development docket is
pipes-bytestring. Now that
pipes-parse is complete I feel much more comfortable about the stability of
pipes-bytestring API. Also,
pipes now has
an official mailing list where you can ask questions, follow
pipes development, or offer feedback and contribute to upcoming
pipes libraries.