I wanted to share a few quick ways that beginning Haskell programmers can contribute to the Haskell ecosystem. I selected these tasks according to a few criteria:
- They are fun! These tasks showcase enjoyable tricks
- They are easy! They straightforwardly apply existing libraries
- They are useful! You can probably find something relevant to your project
For each task I'll give a brief end-to-end example of what a contribution might look like and link to relevant educational resources.
This post only assumes that you have the stack build tool installed, which you can get from haskellstack.com. This tool takes care of the rest of the Haskell toolchain for you, so you don't need to install anything else.
Contribution #1: Write a parser for a new file format
Writing parsers in Haskell is just about the slickest thing imaginable. For example, suppose that we want to parse the PPM "plain" file format, which is specified like this [Source]:
Each PPM image consists of the following:
- A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P3".
- Whitespace (blanks, TABs, CRs, LFs).
- A width, formatted as ASCII characters in decimal.
- Whitespace.
- A height, again in ASCII decimal.
- Whitespace.
- The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
- A single whitespace character (usually a newline).
- A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. Each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented as an ASCII decimal number.
The equivalent Haskell parser reads almost exactly like the specification:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (guard)
import Data.Attoparsec.Text
data PPM = PPM
    { width             :: Int
    , height            :: Int
    , maximumColorValue :: Int
    , image             :: [[RGB]]
    } deriving (Show)
data RGB = RGB
    { red   :: Int
    , green :: Int
    , blue  :: Int
    } deriving (Show)
ppm3 :: Parser PPM
ppm3 = do
    "P3"                          -- magic number
    skipMany1 space
    w <- decimal                  -- width
    skipMany1 space
    h <- decimal                  -- height
    skipMany1 space
    maxVal <- decimal             -- maximum color value
    guard (0 < maxVal && maxVal < 65536)
    space                         -- exactly one whitespace character
    -- Each sample is a decimal number followed by optional whitespace,
    -- so a file without a trailing newline still parses
    let sample = do
            lo <- decimal
            skipSpace
            return lo
    let pixel = do
            r <- sample
            g <- sample
            b <- sample
            return (RGB r g b)
    -- Height rows, each containing Width pixels
    rows <- count h (count w pixel)
    return (PPM w h maxVal rows)
We can test our parser on the following example file:
$ cat example.ppm
P3
4 4
255
0 0 0 100 0 0 0 0 0 255 0 255
0 0 0 0 255 175 0 0 0 0 0 0
0 0 0 0 0 0 0 15 175 0 0 0
255 0 255 0 0 0 0 0 0 255 255 255
We don't even have to compile a program to test our code. We can load our code into the Haskell REPL for quick feedback on whether or not our code works:
$ stack ghci attoparsec --resolver=lts-3.14
...
Prelude> :load ppm.hs
[1 of 1] Compiling Main ( ppm.hs, interpreted )
Ok, modules loaded: Main.
*Main> txt <- Data.Text.IO.readFile "example.ppm"
*Main> parseOnly ppm3 txt
Right (PPM {width = 4, height = 4, maximumColorValue = 255, image =
[[RGB {red = 0, green = 0, blue = 0},RGB {red = 100, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0},RGB {red = 255, green = 0, blue = 255}],
[RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green = 255, blue = 175},RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0}],
[RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green = 15, blue = 175},RGB {red = 0, green = 0, blue = 0}],
[RGB {red = 255, green = 0, blue = 255},RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0},RGB {red = 255, green = 255, blue = 255}]]})
Works like a charm!
You can very quickly get your hands dirty with Haskell by writing a parser that converts a file format you know and love into a more structured data type.
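If you'd like to experiment before installing any packages, you can sketch the same spec-shaped structure with Text.ParserCombinators.ReadP, which ships with GHC's base library. This is a swapped-in library rather than the attoparsec parser used above. The helpers whitespace1, decimal and parsePPM are hypothetical names defined here. ReadP works on String and is generally much slower than attoparsec, so treat this as a learning aid rather than a production parser:

```haskell
import Control.Monad (guard)
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP

-- Same shape as the attoparsec version above
data PPM = PPM
    { width             :: Int
    , height            :: Int
    , maximumColorValue :: Int
    , image             :: [[RGB]]
    } deriving (Show, Eq)

data RGB = RGB { red :: Int, green :: Int, blue :: Int }
    deriving (Show, Eq)

-- One or more whitespace characters (hypothetical helper)
whitespace1 :: ReadP ()
whitespace1 = munch1 isSpace >> return ()

-- An unsigned ASCII decimal number (hypothetical helper)
decimal :: ReadP Int
decimal = fmap read (munch1 isDigit)

ppm3 :: ReadP PPM
ppm3 = do
    _ <- string "P3"              -- magic number
    whitespace1
    w <- decimal                  -- width
    whitespace1
    h <- decimal                  -- height
    whitespace1
    maxVal <- decimal             -- maximum color value
    guard (0 < maxVal && maxVal < 65536)
    _ <- satisfy isSpace          -- exactly one whitespace character
    let sample = do
            x <- decimal
            skipSpaces            -- zero or more whitespace characters
            return x
        pixel = RGB <$> sample <*> sample <*> sample
    rows <- count h (count w pixel)
    eof                           -- reject trailing garbage
    return (PPM w h maxVal rows)

-- Keep only a parse that consumed the entire input
parsePPM :: String -> Maybe PPM
parsePPM txt = case [p | (p, "") <- readP_to_S ppm3 txt] of
    [p] -> Just p
    _   -> Nothing
```

In GHCi you could try it with `fmap parsePPM (readFile "example.ppm")`.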
To learn more about parser combinators in Haskell, I highly recommend this "functional pearl":
... as well as this attoparsec tutorial:
To see a "long form" example of attoparsec, check out this HTTP request parser written using attoparsec:
I use "long form" in quotes because the entire code is around 60 lines long.
Contribution #2: Write a useful command-line tool
Haskell's turtle library makes it very easy to write polished command-line tools in a tiny amount of code. For example, suppose that I want to build a simple command-line tool for managing a TODO list stored in a TODO.txt file. First I just need to provide a subroutine for displaying the current list:
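{-# LANGUAGE OverloadedStrings #-}
import Turtle
todoFile = "TODO.txt"
-- Format string for a numbered item, e.g. "0: Brush my teeth"
todoItem = d%": "%s
display :: IO ()
display = sh (do
    -- Number each line of the file and print it
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )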
... a subroutine for adding an item to the list:
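add :: Text -> IO ()
add txt = runManaged (do
    -- Write the existing items followed by the new one to a temporary file ...
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    -- ... then replace the TODO file with it
    mv tempfile todoFile )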
... and a subroutine for removing an item from the list:
remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    -- Keep every line except the one numbered m
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )
... then I can just wrap them in a command line API. I create a command line parser that runs display by default if the command line is empty:
parseDisplay :: Parser (IO ())
parseDisplay = pure display
... then a command line parser for the add subcommand:
parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )
... and a command line parser for the remove subcommand:
parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )
Finally, I combine them into a single composite parser for all three subcommands:
parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove
... and run the parser:
main = do
    command <- options "A TODO list manager" parseCommand
    exists <- testfile todoFile
    when (not exists) (touch todoFile)
    command
... and I'm done! That's the full program:
{-# LANGUAGE OverloadedStrings #-}
import Turtle
todoFile = "TODO.txt"
todoItem = d%": "%s
display :: IO ()
display = sh (do
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )
add :: Text -> IO ()
add txt = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    mv tempfile todoFile )
remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )
parseDisplay :: Parser (IO ())
parseDisplay = pure display
parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )
parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )
parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove
main = do
    command <- options "A TODO list manager" parseCommand
    exists <- testfile todoFile
    when (not exists) (touch todoFile)
    command
We can compile that program into a native binary on any platform (e.g. Windows, OS X, or Linux) with a fast startup time:
$ stack build turtle --resolver=lts-3.14
$ stack ghc --resolver=lts-3.14 -- -O2 todo.hs
... and verify that the program works:
$ ./todo add "Brush my teeth"
$ ./todo add "Shampoo my hamster"
$ ./todo
0: Brush my teeth
1: Shampoo my hamster
$ ./todo rm 0
$ ./todo
0: Shampoo my hamster
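The renumbering in that session happens because turtle's nl numbers lines starting from 0 every time the file is read. Also, rm rewrites the file without the matching line, and add relies on `<|>` for Shell concatenating two streams. The sketch below models that bookkeeping as pure functions on a list of lines. addItem, removeItem and displayItems are hypothetical helpers written for illustration, not turtle functions:

```haskell
-- Pure models of the todo subroutines (hypothetical helpers, not turtle's API)

-- Append a new item, like `input todoFile <|> pure txt`
addItem :: String -> [String] -> [String]
addItem txt items = items ++ [txt]

-- Number items from 0, like `nl`, and drop the one numbered m
removeItem :: Int -> [String] -> [String]
removeItem m items = [line | (n, line) <- zip [0 ..] items, n /= m]

-- Render each item as "n: item", like the `todoItem` format string
displayItems :: [String] -> [String]
displayItems items = [show n ++ ": " ++ line | (n, line) <- zip [0 :: Int ..] items]
```

Because numbers are recomputed on every read rather than stored in the file, removing item 0 shifts every later item down by one, which matches the session above.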
The program also auto-generates the usage and help information:
$ ./todo --help
A TODO list manager
Usage: todo ([add] | [rm])
Available options:
-h,--help Show this help text
Available commands:
add
rm
$ ./todo add
Usage: todo add ITEM
$ ./todo rm
Usage: todo rm INDEX
Amazingly, you can delete all the type signatures from the above program and the program will still compile. Try it! Haskell's type inference and fast type-checking algorithm make it feel very much like a scripting language. The combination of type inference, fast startup time, and polished command line parsing makes Haskell an excellent choice for writing command-line utilities.
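For comparison, here is a hand-rolled approximation of the dispatch that parseCommand performs, written with plain pattern matching on the argument list. The Command type and parseArgs are hypothetical names for illustration only. Unlike the turtle version, this sketch generates no usage or help text and gives only a bare error message:

```haskell
-- A hand-rolled stand-in for parseCommand (hypothetical, not turtle's API)
data Command
    = Display          -- no arguments
    | Add String       -- todo add ITEM
    | Remove Int       -- todo rm INDEX
    deriving (Show, Eq)

-- Decide which subcommand to run from the raw argument list
parseArgs :: [String] -> Either String Command
parseArgs []            = Right Display
parseArgs ["add", item] = Right (Add item)
parseArgs ["rm", index] =
    case reads index of
        [(n, "")] -> Right (Remove n)
        _         -> Left ("not a numeric index: " ++ index)
parseArgs _             = Left "unrecognized arguments"
```

In a real program you would feed it the result of `getArgs` from System.Environment. Every new subcommand then needs another equation plus its own error handling, which is the bookkeeping that turtle's `subcommand` and `<|>` take care of for you.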
You can learn more about scripting in Haskell by reading the turtle tutorial, written for people who have no prior background in Haskell programming:
Contribution #3: Client bindings to a web API
Haskell's servant library lets you write very clean and satisfying bindings to a web API. For example, suppose that I want to define a Haskell client to the JSONPlaceholder test API. We'll use two example endpoints that the API provides.
A GET request against the /posts endpoint returns a list of fake posts:
[
  {
    "userId": 1,
    "id": 1,
    "title": "sunt aut facere repellat ...",
    "body": "quia et suscipit\nsuscipit ..."
  },
  {
    "userId": 1,
    "id": 2,
    "title": "qui est esse",
    "body": "est rerum tempore vitae\nsequi ..."
  },
  ...
... and a POST request against the same endpoint accepts a list of posts and returns them back as the response.
To write a client binding to this API, we just need to define a record representing APost:
data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)
The last line instructs the Haskell compiler to auto-derive conversion functions between APost and JSON.
Now we just encode the REST API as a type:
-- We can `GET` a list of posts from the `/posts` endpoint
type GetPosts = "posts" :> Get '[JSON] [APost]
-- We can `POST` a list of posts to the `/posts` endpoint
-- using the request body and get a list of posts back as
-- the response
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]
type API = GetPosts :<|> PutPosts
... and then the compiler will "automagically" generate API bindings:
getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)
Now anybody can use our code to GET or POST lists of posts. We can also quickly test out our code within the Haskell REPL to verify that everything works:
$ stack ghci servant-server servant-client --resolver=lts-3.14
ghci> :load client.hs
[1 of 1] Compiling Main ( client.hs, interpreted )
Ok, modules loaded: Main.
*Main> import Control.Monad.Trans.Either as Either
*Main Either> -- Perform a `GET` request against the `/posts` endpoint
*Main Either> runEitherT getPosts
Right [APost {userId = 1, id = 1, title = "sunt aut facere ...
*Main Either> -- Perform a `POST` request against the `/posts` endpoint
*Main Either> runEitherT (putPosts [APost 1 1 "foo" "bar"])
Right [APost {userId = 1, id = 1, title = "foo", body = "bar"}]
Here's the full code with all the extensions and imports that enable this magic:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators #-}
import Data.Aeson (FromJSON, ToJSON)
import Data.Text (Text)
import GHC.Generics (Generic)
import Servant
import Servant.Client
data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)
type GetPosts = "posts" :> Get '[JSON] [APost]
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]
type API = GetPosts :<|> PutPosts
getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)
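For a rough intuition of how a type can drive code generation, here is a toy that only uses base. It is NOT servant's implementation. The `:/` and End combinators, the HasRoute class and endpointUrl are hypothetical names chosen to avoid confusion with servant's own `:>`. A type class walks the API type one combinator at a time, and each instance contributes its piece (here, a URL path segment read from a type-level string):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy API combinators (hypothetical; not servant's definitions).
-- `"posts" :/ End` describes the path /posts at the type level.
data (segment :: Symbol) :/ rest
infixr 4 :/
data End

-- Walks an API type and collects its path segments
class HasRoute api where
    routeSegments :: Proxy api -> [String]

-- The end of the path contributes no segments
instance HasRoute End where
    routeSegments _ = []

-- A segment contributes its type-level string, then we recurse on the rest
instance (KnownSymbol segment, HasRoute rest) => HasRoute (segment :/ rest) where
    routeSegments _ =
        symbolVal (Proxy :: Proxy segment) : routeSegments (Proxy :: Proxy rest)

-- Build a URL for an endpoint purely from its type
endpointUrl :: HasRoute api => String -> Proxy api -> String
endpointUrl base api = base ++ concatMap ('/' :) (routeSegments api)

type GetPostsRoute = "posts" :/ End
```

Here `endpointUrl "http://jsonplaceholder.typicode.com" (Proxy :: Proxy GetPostsRoute)` yields `"http://jsonplaceholder.typicode.com/posts"`. Servant's `client` is built around a type class with one instance per combinator in the same spirit, but its instances assemble real HTTP requests and response decoders rather than strings.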
To learn more about how this works, check out the servant tutorial here:
Note that servant is both a client and server library, so everything you learn about auto-generating client side bindings can be reused to auto-generate a server, too!
To see a more long-form example of bindings to the Google Translate API, check out this code:
Conclusion
Suppose that you write up some useful code and you wonder: "What's next? How do I make this code available to others?" You can learn more by reading the stack user guide, which contains complete step-by-step instructions for authoring a new Haskell project, including beginning from a pre-existing project template:
excellent idea for a blogpost... as usual !
A lot of great ideas here, thanks! Personally i have been blocked often by the feeling that writing an API wrapper should be exhaustive. Everyone of us often needs just a subset of an API for his own purposes. One might think to start small and let the community extend the project over time, but this seems to be an antipattern in comparison with automatically generated APIs like those provided by https://github.com/brendanhay, for example
Another idea is to optimize code that's used in benchmarking here https://benchmarksgame.alioth.debian.org/u64q/haskell.html
And, perhaps, write a blog post about how the optimization process was done. The gist is that for newbies in Haskell it pretty hard to think about performance, because most mainstream langs doesn't have notion of thunks and lazyness. So, the more blog posts about it, the easier it is to build the new mindset.
I'm pretty sure the PPM parser doesn't match its specification (nor does the example PPM file you give). See the words "pure binary" in point 9 of the spec? I don't think the parser implements that; instead, it seems to be reading decimal numbers in ASCII.
Oops! You're right. I was accidentally implementing PPM3, so I just updated the example to be a PPM3 parser instead.
Inspiring article - I'd really like to get started! Where's a good place to find these kinds of projects? Are there any open source projects that are in need of these kinds of parsers, client bindings, etc?
Most projects like that are one-man shows since they are pretty easy. For the larger projects (like bindings to large APIs such as Amazon's or Googles) the bindings are auto-generated.
I don't know if there is a centralized place that lists projects in need of help, but you can try the Github trending list for Haskell and see if any project interests you:
https://github.com/trending?l=haskell
If you want a high-impact area to contribute to, I'd personally recommend one of the Haskell tooling projects like:
* https://github.com/fpco/ide-backend
* https://github.com/leksah/leksah
* https://github.com/haskell/haskell-mode