Haskell for all: Detailed walkthrough for a beginner Haskell program

post

This post walks through the development of a small Haskell program for aligning equals symbols in a block of text. This walkthrough targets a beginning programmer by describing several steps and concepts in extra detail.

Note that this post describes how to author, compile, and run single-file Haskell programs for ease of experimentation and learning. For larger Haskell projects you will want to use cabal or stack to scaffold and run the project and to share your work with others. I mainly introduce things this way because this provides a lightweight way to get started and try the language.

Background

I obsess about visual cleanliness of the code I write because I optimize for ease of reading rather than ease of writing. One way I try to improve appearance is aligning equals signs. For example, the code might begin like this:

address = "192.168.0.44"
port = 22
hostname = "wind"

… and then I manually indent the equals signs to all align with each other:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

My editor is vim, and you can install support for that using the Tabular plugin. However, I thought this would be an instructive example to implement from scratch to illustrate how to program in a functional style.

One neat feature of vim is that you can transform text inside your editor using any command line program. For example, I can select text using visual mode and then type:

:!some-command

… and vim will feed the selected text as standard input to the command-line program named some-command and replace the selected text with whatever the command emits to standard output.

This means that I only need to write a program that takes text to align on standard input and then emits the aligned text on standard output. I’ll call this program: align-equals.

Development environment

The command line is my “IDE”,so I will usually keep up to three terminal windows open:

One window edits text using vim
One window displays type errors using ghcid
One window tests the Haskell code I write in a REPL

I also use Nix, specifically a nix-shell to provision my development tools. I prefer Nix to provision development tools is because I don’t want to accumulate globally installed cruft on my system. Using Nix, I can provision any development tool or library transiently using nix-shell.

I will run all of the following examples inside of the following Nix shell (created for each new terminal):

$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: [ pkgs.text pkgs.safe ])' haskellPackages.ghcid

That creates a transient shell that provides ghc, ghci, ghcid, and hoogle with the text and safe Haskell packages available. If I need to change the set of available Haskell packages I edit the command line and recreate the shell.

I turn on live type-checking by running:

$ ghcid --command='ghci align-equals.hs'

… which will then automatically reload align-equals.hs every time the file changes and display any errors or warnings that the Haskell compiler has found.

In another terminal, I open the code I’m editing inside the ghci REPL so that I can interactively test the functions that I am writing:

$ ghci align-equals.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( test.hs, interpreted )
Ok, one module loaded.
*Main>

Then in the third terminal, I actually edit the file:

$ vi align-equals.hs

The program

First, let’s describe what we plan to do using English prose, before we translate that to code:

We want to find the longest prefix preceding an = sign and then add spaces to the end of all other prefixes to match the length of the longest prefix.

Prefix length

In order to do that, we first need a function that computes the number of characters preceding an = sign for a given line. This function should have the following type:

import Data.Text (Text)

prefixLength :: Text -> Int

You can read that type signature as saying: “prefixLength is a function whose input is a value of type Text (i.e. a line of input) and whose output is an Int (the number of characters preceding the first = sign)”. We can even add comments to that effect:

prefixLength
    :: Text
    -- ^ Line of input
    -> Int
    -- ^ Number of characters preceding the first @=@ sign

I import Data.Text because I prefer not to use the default String type provided by Haskell’s Prelude, which is inefficient. The text package provides a high-performance alternative type named Text along with many useful utilities.

The implementation of the prefixLength function directly follows the description:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

import qualified Data.Text

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

As the name suggests, the prefixLength is the length of the prefix. The only non-trivial part is just knowing how to discover the Data.Text.breakOn function created for this purpose.

I usually browse the online documentation for Haskell packages by doing a Google search for “hackage ${package name}” (i.e. “hackage text”). That’s how I found the breakOn function here:

breakOn

However, some people also prefer to use hoogle, which indexes and searches Haskell functions by name and by type. For example, if I’m looking for a function that splits a Text value into a prefix and suffix, I can run:

$ hoogle 'Text -> (Text, Text)'
Data.Text breakOn :: Text -> Text -> (Text, Text)
Data.Text breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOn :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text transpose :: [Text] -> [Text]
Data.Text.Lazy transpose :: [Text] -> [Text]
Data.Text intercalate :: Text -> [Text] -> Text
Data.Text.Lazy intercalate :: Text -> [Text] -> Text
Data.Text splitOn :: Text -> Text -> [Text]
Data.Text.Lazy splitOn :: Text -> Text -> [Text]
-- plus more results not shown, pass --count=20 to see more

I can also verify that my function works as expected by testing my function in the long-running REPL I have open:

*Main> :reload
Ok, one module loaded.
*Main> :set -XOverloadedStrings
*Main> Data.Text.breakOn "=" "foo = 1"
("foo ","= 1")
*Main> prefixLength "foo = 1"
4

I have to enable theOverloadedStrings extension because I’m not using the default String type from the Haskell Prelude. This extension allows other packages to override how string literals behave to work with alternative implementations like Text.

One thing that’s neat about Haskell is how order-insensitive the language is. You can define things out of order and the compiler doesn’t mind. That’s why we can write our code as if it says: “prefixLength is the length of the prefix … and by the way you can compute the prefix and suffix by splitting the string using breakOn”.

prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

This out-of-order coding style also jives well with lazy evaluation. Haskell is a “lazy” language, meaning that the program can evaluate things out-of-order, too, or even not at all if they are never used. For example, our prefixLength function never actually uses the suffix, so the program will never actually bother to compute or allocate that value.

The more you program in Haskell the less you think about your program as a sequence of statements and the more you think about your program as a graph of dependent computations.

Indenting a line

Now we need a function that pads a prefix with spaces up to a desired number of characters:

adjustLine :: Int -> Text -> Text

… or with comments:

adjustLine
    :: Int
    -- ^ The desired prefix length
    -> Text
    -- ^ The line to pad
    -> Text
    -- ^ The padded line

This function is slightly longer, but still pretty straightforward:

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

Note that we can reorder all the lines after the where with no change to the program’s behavior. However, to keep things readable we order them so that the reader can understand them from top to bottom:

Split the line into a prefix and suffix
Compute the actual length of the prefix
Compute the number of spaces to pad by subtracting the actual length from the desired length
Create the padding by repeating a space the specified number of times
Create a new line by inserting the padding in between the prefix and suffix

Note that this reads a lot like a function defined in an imperative language. For example, the analogous Python code is not much different:

def adjustLine(desiredPrefixLength, oldLine):
    (prefix, suffix) = oldLine.split("=")

    actualPrefixLength = len(prefix)

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = " " * additionalSpaces

    # Python's `split` strips the `=`, so we have to reintroduce it
    newLine = "".join([ prefix, spaces, "=", suffix ])

    return newLine

In general, you can port functional programs to imperative programs in this way if the functional program sticks to simple types (i.e. strings, numbers, records, etc.). For these simple programs, the functional program is essentially an imperative program where you forbid reassigning a value (i.e. “mutation”), which is generally good practice anyway for ease of program comprehension.

We can confirm that our function works by saving the complete program so far:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

import qualified Data.Text

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

… and then :reloading that program in our REPL:

*Main> :reload
Ok, one module loaded.
*Main> adjustLine 10 "foo = 1"
"foo       = 1"

Indenting multiple lines

Now we can define a function to indent multiple lines:

import Safe

adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText

    prefixLengths = map prefixLength oldLines

    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines

    newText = Data.Text.unlines newLines

This function takes advantage of two convenient utilities called lines and unlines.

Data.Text.lines splits a block of Text into a list of lines:

*Main> :type Data.Text.lines
Data.Text.lines :: Text -> [Text]

… and Data.Text.unlines combines a list of lines back into a block of Text:

*Main> :type Data.Text.unlines
Data.Text.unlines :: [Text] -> Text

You can use these two utilities to simplify line-oriented Text transformations in Haskell:

Split the block of Text into a list of lines
Process the list of lines into a new list of lines
Join the new list of lines back into a block of Text

The interesting part of our adjustText function is how we process the list of lines:

    prefixLengths = map prefixLength oldLines

    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines

You can read the above code as saying:

Apply (“map”) the prefixLength function over each element of our list of lines to get a list of prefix lengths
Find the maximum length
If there is no maximum length, return an empty list of lines
If there is a maximum length, pad each line using that length

You might wonder: “Why would there not be a maximum length?” Well, consider the case where we begin with 0 lines of input: what is the maximum value of an empty list? The maximumMay function doesn’t throw an exception or return a bogus sentinel value that might be confused for real data. Instead, the maximumMay function returns an optional result:

data Maybe a = Just a | Nothing

maximumMay :: Ord a => [a] -> Maybe a

The a in the type of maximumMay can be any type that is comparable ( i.e. implements Ord), and in the context of our code that type is Int, so we can pretend that the maximumMay function actually has this more specific type:

maximumMay :: [Int] -> Maybe Int

In other words, given a list of Ints, the maximumMay function will Maybe return an Int (i.e. it might return an Int or it might not). This means that the result will either be a Nothing (i.e. no result) or an Int value wrapped inside of a Just.

We consume the result of maximumMay by “pattern matching” on the result, like this:

        case Safe.maximumMay prefixLengths of
            Nothing ->
                ...  -- First branch
            Just desiredPrefixLength ->
                ...  -- Second branch

The first branch covers the case where the list is empty. Here, the desiredPrefixLength is not in scope, so if we try to use that value we will get a type error. This provides a nice safeguard so that we can’t attempt to access a result that does not exist. In other languages this might be caught at runtime as a java.lang.NullPointerException or AttributeError: 'NoneType' object has no attribute 'x', but with pattern matching the type-checker can detect these sorts of bugs before we even attempt to run the program.

The second branch covers the case where the list is non-empty and does have a sensible maximum length. There we use that length to adjust each line.

The nice thing about pattern matching is that you can’t forget to handle these cases. If you try to use the result of maximumMay directly as an Int then you will get a type error. The maximumMay function wraps its own result in a Maybe to force downstream users to carefully consider how to handle the case where the list may be empty.

Tying it all together

So far all of our functions are “pure” functions, meaning that they are deterministic conversions from their inputs to their outputs that don’t mutate any variables and don’t have any side effects that we care about.

Carefully note the key phrase: “side effects that we care about”. These functions do technically have side effects, including:

Allocating memory/registers
Taking a non-zero amount of time to compute

Sometimes we do care about these types of side effects in certain contexts, like cryptography (where secure information can leak through these side channels) or embedded programming (where programs require careful attention to time/memory). However, for our simple utility these functions are effectively “pure”.

We can’t use our utility functions from the command line, though, unless we wrap them in a main function which a program can run, like this:

import qualified Data.Text.IO

main :: IO ()
main = Data.Text.IO.interact adjustText

The interact function converts any pure Text transformation into a runnable program that transforms standard input into standard output using the supplied transformation:

*Main> :type Data.Text.IO.interact
Data.Text.IO.interact :: (Text -> Text) -> IO ()

This is an example of a “higher-order” function: a function whose input is another function. The input to the interact function is another function of type Text -> Text. Conveniently, our adjustText function has exactly the correct type to be supplied as an argument to the interact function:

adjustText :: Text -> Text

Data.Text.IO.interact adjustText :: IO ()

… and then any value of type IO () can be assigned to main, which is what our program will run when invoked as a command-line executable.

If we save the following complete example to align-equals.hs:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Text (Text)

import qualified Data.Text
import qualified Data.Text.IO
import qualified Safe

prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line

adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine

    actualPrefixLength = Data.Text.length prefix

    additionalSpaces = desiredPrefixLength - actualPrefixLength

    spaces = Data.Text.replicate additionalSpaces " "

    newLine = Data.Text.concat [ prefix, spaces, suffix ]

adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText

    prefixLengths = map prefixLength oldLines

    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines

    newText = Data.Text.unlines newLines

main :: IO ()
main = Data.Text.IO.interact adjustText

… then we can compile that using:

$ ghc -O2 align-equals.hs

… and verify that the executable works using:

$ ./align-equals
foo = 1
a = 2
asdf = 3
<Ctrl-D>
foo  = 1
a    = 2
asdf = 3

That means that I can now use ./align-equals to transform my text buffer by selecting a text block like this:

address = "192.168.0.44"
port = 22
hostname = "wind"

… and running :!./align-equals to align the block:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

… and now I don’t have to tediously align the code manually.

Conclusion

Hopefully this illustrates one way to learn the Haskell language in the course of building small but practical programs. The language introduces many cool features and concepts and this post only touches the tip of the iceberg.

20 comments:

Ludovic KutyOctober 9, 2018 at 3:55 AM
Your setup is very interesting. Thanks for sharing. Will read the rest at a later time...
BoazOctober 9, 2018 at 1:23 PM
Thanks! This is an awesome exercise for beginners like me :)

Still in the middle of trying this out, but so far I've noticed that you forgot a couple of instructions:
1. Each terminal needs to call nix-shell before running ghcid, ghci, and vi.
2. Need to import Safe in order to be able to compile.

Thanks again!
karakfaOctober 9, 2018 at 1:34 PM
Great post. Just to remind you the little command that can do the same `column -ts= -o=`
UnknownOctober 10, 2018 at 8:15 AM
"… and Data.Text.lines combines a list of lines back into a block of Text:" should read "... and Data.Text.unlines"?
jduanOctober 11, 2018 at 8:11 AM
Great intro to haskell. I am a haskell beginner and I have been reading a few books. Whats your suggestion for advancing to the intermediate level? Thanks!
Lao TzuOctober 12, 2018 at 12:18 PM
way too long
UnknownOctober 16, 2018 at 11:41 AM
Thanks for this, lots of great info!
AnonymousOctober 16, 2018 at 1:05 PM
Seeing a fully worked-out, detailed application is very helpful, thank you!
Marnen Laibow-KoserNovember 7, 2018 at 6:54 AM
One minor note: in my experience, aligning equals signs actually makes code *harder* to read, because it emphasizes the columns of names and values (which we don’t care about), rather than the individual assignments. In other words, we mostly care about reading assignments horizontally, but the extra space introduced by the = alignment actually makes it *harder* to do that.
Marnen Laibow-KoserNovember 7, 2018 at 6:56 AM
About my earlier comment: I think this is a great example for a basic Haskell program; I just wish people would stop claiming that there’s any value to aligning equals signs in particular. :) If this had been an example for aligning Markdown tables on | , day, I’d have been right there with you.
Denis GribDecember 13, 2018 at 2:02 PM
Very Thanks!
Rick FleischerJuly 29, 2020 at 10:54 AM
On a line with no "=", doesprefixLength return the length of the entire line?

Haskell for all

Monday, October 8, 2018

Detailed walkthrough for a beginner Haskell program