Map
s and tuples are useful data types for modeling relational operations. For example, suppose we have the following table, indexed by the Id
column:
| Id | First Name | Last Name |
|----|------------|-----------|
| 0 | Gabriel | Gonzalez |
| 1 | Oscar | Boykin |
| 2 | Edgar | Codd |
We can model that as a Map
where the key is the Id
column and the value is a tuple of FirstName
and LastName
:
import Data.Map (Map) -- from the `containers` library
import qualified Data.Map as Map
type Id = Int
type FirstName = String
type LastName = String
names :: Map Id (FirstName, LastName)
names = Map.fromList
[ (0, ("Gabriel", "Gonzalez"))
, (1, ("Oscar" , "Boykin" ))
, (2, ("Edgar" , "Codd" ))
]
Now suppose we have another table containing Twitter handles, also indexed by Id
:
| Id | Twitter Handle |
|----|----------------|
| 0 | GabrielG439 |
| 1 | posco |
| 3 | avibryant |
We can also encode that as a Map
:
type TwitterHandle = String
handles :: Map Id TwitterHandle
handles = Map.fromList
[ (0, "GabrielG439")
, (1, "posco" )
, (3, "avibryant" )
]
One of the nice properties of Map
s is that you can join them together:
import Control.Applicative
-- I'm using `join2` to avoid confusion with `Control.Monad.join`
join2 :: Map k v1 -> Map k v2 -> Map k (v1, v2)
join2 = ... -- Implementation elided for now
What if we could generalize join2
to work on types other than Map
. Perhaps we could use the Applicative
interface to implement join2
:
join2 :: Applicative f => f a -> f b -> f (a, b)
join2 = liftA2 (,)
However, the Map
type cannot implement Applicative
in its current form. The reason why is that there is no sensible definition for pure
:
pure :: v -> Map k v
pure = ???
This would require a Map
that was defined for every key, which we cannot encode. Or can we?
Tables
Well, who says we need to use the Map
type from containers? What if I were to encode my Map
this way:
import Prelude hiding (lookup)
-- | A map encoded as a lookup function
newtype Table k v = Table { lookup :: k -> Maybe v }
-- | Encode a traditional map as a lookup function
from :: Ord k => Map k v -> Table k v
from m = Table (\k -> Map.lookup k m)
This new type of Map
only permits a single operation: lookup
, but because we constrain our API to this single operation we can now implement the full Applicative
interface:
instance Functor (Table k) where
fmap f (Table g) = Table (\k -> fmap f (g k))
-- Same as: Table (fmap (fmap f) g)
instance Applicative (Table k) where
pure v = Table (\k -> Just v)
-- Same as: Table (pure (pure v))
Table f <*> Table x = Table (\k -> f k <*> x k)
-- Same as: Table (liftA2 (<*>) f x)
We can promote conventional Map
s to this new Table
type using the above from
function:
names' :: Table Id (FirstName, LastName)
names' = from names
handles' :: Table Id TwitterHandle
handles' = from handles
... and now the more general Applicative
join2
will work on these two tables:
>>> let table = join2 names' handles'
>>> :type table
table :: Table Id ((FirstName, LastName), TwitterHandle)
>>> lookup table 0
Just (("Gabriel","Gonzalez"),"GabrielG439")
>>> lookup table 1
Just (("Oscar","Boykin"),"posco")
>>> lookup table 2
Nothing
However, in its present form we can't dump the table's contents because we don't know which keys are present in the table. Let's fix that by adding an additional field to the Table
type listing the keys. We will treat functions as being defined for all keys:
import Data.Set (Set)
import qualified Data.Set as Set
data Keys k = All | Some (Set k)
instance Ord k => Num (Keys k) where
fromInteger 0 = Some Set.empty
fromInteger n | n > 0 = All
All + _ = All
_ + All = All
Some s1 + Some s2 = Some (Set.union s1 s2)
All * ks = ks
ks * All = ks
Some s1 * Some s2 = Some (Set.intersection s1 s2)
-- | A map encoded as a lookup function
data Table k v = Table
{ keys :: Keys k
, lookup :: k -> Maybe v
}
-- | Encode a traditional map as a lookup function
from :: Ord k => Map k v -> Table k v
from m = Table
{ keys = Some (Set.fromList (Map.keys m))
, lookup = \k -> Map.lookup k m
}
Even after extending Table
with keys
we can still implement the Applicative
interface:
instance Functor (Table k) where
fmap f (Table ks g) = Table ks (fmap (fmap f) g)
instance Ord k => Applicative (Table k) where
pure v =
Table 1 (pure (pure v))
Table s1 f <*> Table s2 x =
Table (s1 * s2) (liftA2 (<*>) f x)
... and now we can add a Show
instance, too!
instance (Show k, Show v) => Show (Table k v) where
show (Table ks f) = case ks of
All -> "<function>"
Some s -> unlines (do
k <- Set.toList s
let Just v = f k
return (show (k, v)) )
Let's give it a test drive:
>>> names'
(0,("Gabriel","Gonzalez"))
(1,("Oscar","Boykin"))
(2,("Edgar","Codd"))
>>> handles'
(0,"GabrielG439")
(1,"posco")
(3,"avibryant")
>>> join2 names' handles'
(0,(("Gabriel","Gonzalez"),"GabrielG439"))
(1,(("Oscar","Boykin"),"posco"))
So far, so good!
Alternative
However, we need to support more than just inner joins. We'd also like to support left, right, and outer joins, too.
Conceptually, a left join is one in which values from the right table may be optionally present. One way we could implement this would be to define a function that converts a finite map to a function defined on all keys. This function will return Nothing
for keys not present in the original finite map and Just
for keys that were present:
optional :: Table k v -> Table k (Maybe v)
optional (Table ks f) =
Table All (\k -> fmap Just (f k) <|> pure Nothing)
Then we could define leftJoin
in terms of join2
and optional
:
leftJoin :: Table k a -> Table k b -> Table k (a, Maybe b)
leftJoin t1 t2 = join2 t1 (optional t2)
However, if we try to compile the above code, the compiler will give us a really interesting error message:
Ambiguous occurrence ‘optional’
It could refer to either ‘Main.optional’,
or ‘Control.Applicative.optional’
Apparently, Control.Applicative
has an optional
function, too. Let's pause to check out the type of this mysterious function:
optional :: Alternative f => f v -> f (Maybe v)
Wow! That type signature is suprisingly similar to the one we wrote. In fact, if Table k
implemented the Alternative
interface, the types would match.
Alternative
is a type class (also provided by Control.Applicative
) that greatly resembles the Monoid
type class:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
... and the core Alternative
laws are identical to the Monoid
laws:
x <|> empty = x
empty <|> x = x
(x <|> y) <|> z = x <|> (y <|> z)
Let's dig even deeper and see how optional
uses this Alternative
type class:
optional v = fmap Just v <|> pure Nothing
Even the implementation is eerily similar! This strongly suggests that we should find a way to make our Table
type implement Alternative
. Perhaps something like this would work:
instance Ord k => Alternative (Table k) where
empty =
Table 0 (pure empty)
Table ks1 f1 <|> Table ks2 f2 =
Table (ks1 + ks2) (liftA2 (<|>) f1 f2)
Compare this to our Applicative
instance (duplicated here):
instance Ord k => Applicative (Table k) where
pure v =
Table 1 (pure (pure v))
Table s1 f <*> Table s2 x =
Table (s1 * s2) (liftA2 (<*>) f x)
The Alternative
instance mirrors our Applicative
instance except that we:
- replace
pure v
withempty
- replace
(<*>)
with(<|>)
- replace
1
with0
- replace
(*)
with(+)
... and what's really neat is that now the optional
function from Control.Applicative
behaves just like the original optional
function we wrote. (Exercise: Use equational reasoning to verify this)
Derived joins
Armed with this Alternative
instance, we can now implement leftJoin
in terms of the optional
provided by Control.Applicative
:
leftJoin t1 t2 = join2 t1 (optional t2)
Sure enough, it works:
>>> leftJoin names' handles'
(0,(("Gabriel","Gonzalez"),Just "GabrielG439"))
(1,(("Oscar","Boykin"),Just "posco"))
(2,(("Edgar","Codd"),Nothing)
Let's check out the type that the compiler infers for leftJoin
:
>>> :type leftJoin
leftJoin :: Alternative f => f a -> f b -> f (a, Maybe b)
Notice how there's no longer anything Table
-specific about leftJoin
. It works for anything that implements the Alternative
interface. I could leftJoin
two Maybe
s if I really wanted to:
>>> leftJoin (Just 1) (Just 2)
Just (1,Just 2)
>>> leftJoin (Just 1) Nothing
Just (1,Nothing)
>>> leftJoin Nothing (Just 1)
Nothing
... or two lists:
>>> leftJoin [1, 2] [3, 4]
[(1,Just 3),(1,Just 4),(1,Nothing),(2,Just 3),(2,Just 4),(2,
Nothing)]
In fact, I don't even really need specialized leftJoin
or rightJoin
functions. optional
is sufficiently light-weight that I could inline a right join on the fly:
>>> join2 (optional names') handles'
(0,(Just ("Gabriel","Gonzalez"),"GabrielG439"))
(1,(Just ("Oscar","Boykin"),"posco"))
(3,(Nothing,"avibryant"))
What happens if I try to do an "outer join"?
>>> -- DISCLAIMER: Technically not the same as an SQL outer join
>>> let o = join2 (optional names') (optional handles')
>>> o
<function>
The above "outer join" is defined for all keys (because both sides are optional
), so we get back a function! While we can't list the Table
(because it's conceptually infinite), we can still perform lookup
s on it:
>>> lookup o 0
Just (Just ("Gabriel","Gonzalez"),Just "GabrielG439")
>>> lookup o 2
Just (Just ("Edgar","Codd"),Nothing)
>>> lookup o 3
Just (Nothing,Just "avibryant")
>>> lookup o 4
Just (Nothing, Nothing)
... and if we were to join our "infinite" table against a finite table, we get back a finite table (Exercise: Try it! Define a new finite table to join against o
and see what happens)
What's nice about optional
is that we can easily left-join or right-join in multiple tables at a time. If I had four tables of types:
t1 :: Table k a
t2 :: Table k b
t3 :: Table k c
t4 :: Table k d
... I could left join t2
, t3
, and t4
into t1
by just writing:
liftA4 (,,,) t1 (optional t2) (optional t3) (optional t4)
:: Table k (a, Maybe b, Maybe c, Maybe d)
Now that I think about it, I don't even really need to provide join2
/join3
/join4
/join5
since they are not much shorter than using the liftA
family of functions in Control.Applicative
:
-- Exercise: What would `join1` be?
join2 = liftA2 (,)
join3 = liftA3 (,,)
join4 = liftA4 (,,,)
join5 = liftA5 (,,,,)
In other words, I can implement almost any imaginable join just by using liftA{n}
and some permutation of optional
s. I don't even know what I'd call this join:
liftA5 (,,,,) t1 (optional t2) t3 (optional t4) t5
... but the beauty is that I don't have to give a name for it. I can easily write anonymous joins on the fly using the Control.Applicative
module. Moreover, the above code will work for anything that implements the Alternative
interface.
Conclusion
Control.Applicative
provides a very general API for relational joins: the Alternative
type class (which includes Applicative
, since Applicative
is a super-class of Alternative
). Perhaps Control.Applicative
could be improved slightly by providing the join{n}
family of functions listed above, but it's still highly usable in its present state.
Note that this trick only works for relational abstractions embedded within Haskell
. This API can be generalized for external relational data stores (i.e. Postgres), which I will cover in a subsequent post.
I've made something very similar to these before to describe the interface to a database system. I called the composition of a function with Maybe a "partial function" (which unfortunately is strongly associated with computability) and a function with known keys an "enumerable partial function". The known keys can be viewed as a meet semi-lattice with an absorbing element bottom. This doesn't cover All keys, which I handled by considering any bounded lattice where we can test for bottom. A nice thing about working with lattices is when implementing the database you can define generalized indexed search trees elegantly in terms of a lattice equipped with a measure. In a way you are breaking each relation into the product of a lattice that can be reasoned about and a function that can't.
ReplyDeleteWhat I didn't realize, and your post demonstrates very nicely, is that the bounded lattice gives rise to an Alternative instance.
Amazing!
ReplyDelete