Algebraic data types are types that can be modeled using simple algebra, namely addition and multiplication. Wait, what? I can add types? Yes:
a + b = Either a bAddition is associative, and Eithers are "sort of" associative, which is to say that if we squint, the following two types are essentially the same:
Either (Either a b) c ~ Either a (Either b c) (a + b) + c = a + (b + c)... where I use ~ to denote "essentially the same". We can show they are equivalent by creating functions that transform each type to the other:
forward x = case x of Left e -> case e of Left a -> Left a Right b -> Right (Left b) Right c -> Right (Right c) reverse x = case x of Left a -> Left (Left a) Right e -> case e of Left b -> Left (Right b) Right c -> Right c -- or more tersely: forward = either (either Left (Right . Left)) (Right . Right) reverse = either (Left . Left) (either (Left . Right) Right)We can also multiply types:
a * b = (a, b)This is also associative:
((a, b), c) ~ (a, (b, c)) (a * b) * c = a * (b * c) forward ((a, b), c) = (a, (b, c)) reverse (a, (b, c)) = ((a, b), c)More importantly, multiplication distributes over addition, just like in normal algebra:
(a, Either b c) ~ Either (a, b) (a, c) a * (b + c) = a * b + a * c forward (a, e) = either (Left . (a ,)) (Right . (a ,)) e reverse = (either fst fst, either (Left . snd) (Right . snd))We can also create an analog of 0:
data Zero -- a data type with no constructor Either Zero a ~ a 0 + a = a forward (Right a) = a reverse a = Right a -- or reverse = RightThe forward function doesn't need to check the alternative case for the Left constructor because we can't create a value of type Zero.
The analog of 1 is any type with a single empty constructor, such as the () type:
type One = () (One, a) ~ a 1 * a = a forward (One, a) = a reverse a = (One, a)Multiplying by Zero has the expected effect:
(Zero, a) ~ Zero 0 * a = 0 -- Neither type can be builtIncredibly, we can "essentially" create
data Bool = True | False Bool ~ 1 + 1 -- i.e. Either () () = 2 data Maybe = Just a | Nothing Maybe a ~ a + 1 -- i.e. Either a () data Trool = True_ | False_ | Empty -- "troolean" Trool ~ 1 + 1 + 1 -- i.e. Either (Either () ()) () = 3 -- or type Trool = Maybe Bool Trool ~ Bool + 1 -- i.e. Either (Either () ()) () ~ 2 + 1 = 3 data TwoBools = TwoBools Bool Bool TwoBools ~ Bool * Bool -- i.e. (Bool, Bool) ~ 2 * 2 = 4Each type corresponds to an algebraic number which tells us how many states that type can represent. The Bool type represents 2 states. The two Trool types represent 3 states.
This state-counting explains why the algebraic equivalence works: If two types represent the same number of states, you can always reversibly convert them back and forth. When we multiply two types, we multiply the number of states they represent. When we add two types, we add the number of states they represent.
We can follow some simple rules to translate Haskell data types into algebraic expressions:
- Replace | with +
- Replace a constructor (or tuple) that holds multiple values with a product
- Replace an empty constructor with 1.
- Replace a type with no constructor with 0
data Sum a b c = S1 a | S2 b | S3 c Sum a b c ~ a + b + c type Sum2 a b c = Either a (Either b (Either c Zero))) Sum2 a b c = a + (b + (c + 0)) = a + b + c data Product a b c d e = P a b c d e Product a b c d e ~ a * b * c * d * e type Product2 a b c d e = (a, b, c, d, e) Product2 ~ a * b * c * d * e type Product3 a b c d e = (a, (b, (c, (d, (e, ()))))) Product 3 ~ a * (b * (c * (d * (e * 1)))) = a * b * c * d * e data Term a b c = T1 a b | T2 c Term a b c ~ a * b + c type Term2 a b c = Either (a, b) c Term2 a b c ~ a * b + cIn the above example, we can think of the type constructors as algebraic functions. Sum computes the sum of its three arguments. Product computes the product of its five arguments.
But what about recursive types? They work, too!
data List a = Nil | Cons a (List a) List a ~ 1 + a * List a ~ 1 + a * (1 + a * List a) ~ 1 + a * (1 + a * (1 + a * List a)) ~ ... ~ (1) + (a) + (a * a) + (a * a * a) + ...This tells us that a List is either empty (1) or (+) has 1 element (a) or (+) has 2 elements (a * a) or (+) has 3 elements (a * a * a) and so on. That's pretty slick!
So far I've only mentioned algebraic equivalents to data types, but we can even represent functions as algebraic expressions:
a -> b = b ^ aTo see why, imagine that our function has a finite domain (i.e. a has n possible states, where n is finite). Then, we could encode our function by just asking what will its result be for all possible values of a and storing those results in an n-tuple.
For example, if we write a function of type Bool -> Trool, then we could instead rewrite our function as a simple data type of type (Trool, Trool) (one Trool for each state of Bool). This immediately tells us that there are only 9 unique functions with that type, and the algebraic translation agrees:
Bool -> Trool ~ Trool^Bool ~ 3^2 = 9For example, if our function were:
data Trool = True_ | False_ | Empty f :: Bool -> Trool f x = case x of True -> Empty False -> False_I could instead encode that function as a tuple, as long as I use a consistent encoding/decoding convention:
f' :: (Trool, Trool) f' = (Empty, False_) f' ~ fIf -> really exponentiated types, we'd expect the following two types to be essentially equal:
a -> b -> c ~ (a, b) -> c (c^b)^a = c^(a * b) forward = uncurry reverse = curry... and they are!
Obviously, we work with many types that do
-- Base types data Zero type One = () -- Type Functions a + b = Either a b a * b = (a, b) a -> b = b^aUnfortunately, an "essentially equal" type won't cut it in Haskell, since it won't type-check, but I hope this helps explain some of the motivation behind Haskell's data type system.
It would be really elegant if Haskell's type system was just syntactic sugar on top of the above minimal type system. In other words, alternate constructors would just get translated into a sum fold:
data For = A a | B b | C c -- gets translated into newtype For = For { unFor :: Either a (Either b (Either c Zero))) }... and constructors that contained several values would get translated into a product fold:
data All = A a b c d e -- gets translated into newtype All = All { unAll :: (a, (b, (c, (d, (e, ()))))) }It would be even nicer if these newtypes could automatically derive class instances from their components types based on the algebra (i.e. a more powerful version of GeneralizedNewtypeDeriving). To elaborate, just imagine having a type H that was an instance of some interface:
class HasHeight t where getHeight :: t -> Double instance HasHeight H where getHeight = ...Then you define some other type based off of H:
data Object x y = Person H x | Building H y | Chair H deriving (HasHeight)Then the class system does some algebra behind the scenes to confirm that it can treat Object as a super-set of H:
Object x y = H * x + H * y + H = H * (x + y + 1) -- confirmed!Then it automatically derives the instance:
instance HasHeight Object where getHeight o = case o of Person h _ -> getHeight h Building h _ -> getHeight h Char h -> getHeight hUnfortunately, Haskell's class system isn't strong enough to express this kind of behavior, so we are stuck bickering over various record implementations instead of enjoying a truly algebraic type system.
No comments:
Post a Comment