This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.
What is inlining?
“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:
module Example where

x :: Int
x = 5

y :: Int
y = x + 1
… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):
y :: Int
y = 5 + 1
… which then allows the compiler to further simplify the code to:
y :: Int
y = 6
In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:
$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs
… which will produce this output:
==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 20, types: 7, coercions: 0, joins: 0/0}
x = I# 5#
$trModule4 = "main"#
$trModule3 = TrNameS $trModule4
$trModule2 = "Example"#
$trModule1 = TrNameS $trModule2
$trModule = Module $trModule3 $trModule1
y = I# 6#
… which we can squint at a little bit and read as:
x = 5
y = 6
… and ignore the other stuff.
A slightly more interesting example of inlining is a function call, like this one:
f :: Int -> Int
f x = x + 1

y :: Int
y = f 5
The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):
y :: Int
y = 5 + 1
… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:
y = I# 6#
What is specialization?
“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:
-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:
example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)
Notice that example and example2 are defined in the same way, but they are not exactly the same function:

- example is more flexible and works on strictly more type constructors

  example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

- example and example2 compile to very different core representations
In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if we dump the core output (I’ve tidied up the output a lot for clarity):
example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
The two extra function arguments are:
- @f: This represents the type variable f

  Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

- $Functor: This represents the Functor instance for f

  Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.
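If it helps to build intuition for that second hidden argument, here is a rough sketch of dictionary-passing as ordinary Haskell. This is not GHC’s actual internal representation, and FunctorDict, functorMaybe, and exampleDesugared are made-up names:

{-# LANGUAGE RankNTypes #-}

-- A record standing in for the Functor class dictionary
data FunctorDict f = FunctorDict
    { fmapDict :: forall a b. (a -> b) -> f a -> f b }

-- The dictionary for Maybe, playing the role of the hidden
-- $Functor argument at the Maybe type
functorMaybe :: FunctorDict Maybe
functorMaybe = FunctorDict
    { fmapDict = \f m ->
        case m of
            Nothing -> Nothing
            Just a  -> Just (f a)
    }

-- `example` with the instance as a visible argument
exampleDesugared :: FunctorDict f -> f Int -> f Int
exampleDesugared dict = fmapDict dict (+ 1)

Under this analogy, exampleDesugared functorMaybe (Just 1) evaluates to Just 2, mirroring what example (Just 1) does at the Maybe type.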
Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.
In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example
This compiles to the exact same code as before (you can check for yourself if you don’t believe me).
Here we would say that example2 is “example specialized to the Maybe type constructor”. When we write something like this:
example2 :: Maybe Int -> Maybe Int
example2 = example
… what’s actually happening under the hood is that the compiler is doing something like this:
example2 = example @Maybe $FunctorMaybe
In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).
In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:
$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs
This outputs (among other things):
…
example2 = example @Maybe $FunctorMaybe
…
And when we enable optimizations (with -O2):
$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs
… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:
example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:
- specialization simplifies function calls with “type-level” arguments

  By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

- inlining simplifies function calls with “term-level” arguments

  By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love
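To make that distinction concrete, here is a small sketch (the definitions are illustrative):

double :: Int -> Int
double x = x * 2

increment :: Functor f => f Int -> f Int
increment = fmap (+ 1)

-- Inlining simplifies the term-level call `double 21` to `21 * 2`
-- (and then to 42)
a :: Int
a = double 21

-- Specialization simplifies the hidden type-level application
-- `increment @Maybe $FunctorMaybe` to a Maybe-only function
b :: Maybe Int
b = increment (Just 41)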
Does GHC always inline or specialize code?
NO. GHC does not always inline or specialize code, for two main reasons:
- Inlining is not always an optimization

  Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

- Inlining a function requires access to the function’s source code

  In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.
To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:
- a .o file containing object code (e.g. Example.o)

  This object code is what is linked into the final executable to generate a runnable program.

- a .hi file containing (among other things) source code

  The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.
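For example, compiling a single module without linking produces both outputs side by side:

$ ghc -c Example.hs   # produces Example.o and Example.hi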
However, the compiler does not always save the source code for all functions that it compiles because there are downsides to storing source code for functions:
- this slows down compilation

  This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

- this makes the .hi file bigger

  The .hi file gets bigger because it’s storing the source code of the function.

- this can also make the object code larger, too

  Inlining a function multiple times can lead to duplicating the corresponding object code for that function.
This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.
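If you’re curious about what the compiler chose to save for a given module, GHC’s --show-iface flag prints the contents of an interface file (the exact output format varies between GHC versions):

$ ghc --show-iface Example.hi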
You can override the compiler’s heuristic, though, using …
Compiler directives
There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:
- INLINABLE
- INLINE
- NOINLINE
- SPECIALIZE
My general rule of thumb for these compiler directives is:
- don’t use any compiler directive until you benchmark your code to show that it helps
- if you do use a compiler directive, INLINABLE is probably the one you should pick
I’ll still explain what all the compiler directives mean, though.
INLINABLE
INLINABLE is a compiler directive that you use like this:
f :: Int -> Int
f x = x + 1

{-# INLINABLE f #-}
The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream.
HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).
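For example, here is a sketch of cross-module inlining (Upstream and Downstream are hypothetical module names):

-- Upstream.hs
module Upstream where

f :: Int -> Int
f x = x + 1

{-# INLINABLE f #-}

-- Downstream.hs
module Downstream where

import Upstream (f)

-- Because f’s source code is saved in Upstream.hi, GHC is free to
-- inline this call and simplify y down to 6 at compile time (if it
-- judges that to be worthwhile)
y :: Int
y = f 5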
INLINE
INLINE is a compiler directive that you use in a similar manner as INLINABLE:
f :: Int -> Int
f x = x + 1

{-# INLINE f #-}
INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler’s own judgment for whether or not to inline the function.
I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.
NOINLINE
If you mark a function as NOINLINE:
f :: Int -> Int
f x = x + 1

{-# NOINLINE f #-}
… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:
Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
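The classic instance of this pattern is a top-level mutable reference (a sketch; counter is an illustrative name):

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global mutable counter. NOINLINE guarantees that the newIORef
-- action runs exactly once; if this definition were inlined, each
-- call site could end up with its own separate IORef.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}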
SPECIALIZE
SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialization does and does not happen.
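You can also declare more than one specialization for the same function. For example (a sketch), if we anticipate callers using example at both Maybe and lists:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
{-# SPECIALIZE example :: [Int] -> [Int] #-}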
In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.