Further demistifying the Monad in Scala: a Category Theory approach

Some of you may have read my article on monads. This time I want to talk about monads from a different, more theoretical point of view. It is not mandatory to read the previous article first (in case you haven’t), but if you don’t know anything about the topic, I guess it wouldn’t hurt to become familiar with it before jumping on this one.

Let’s start. Some of you may have heard the popular quote:

A monad is just a monoid in the category of endofunctors, what’s the problem?

Original statement is from Categories for the Working Mathematician by Saunders Mac Lane, but it has been quoted and rephrased numerous times. I took this statement as a reference point; a teaching objective, if you will. So I will attempt to explain monads from the category theory point of view by explaining that particular sentence.

Important note: A lot of stuff written here is simplified and may not be 100% mathematically accurate. Category theory is a very complex field of mathematics (sometimes even jokingly referred to as “abstract nonsense”).
This is not a math article. This is simply my attempt of creating an introductory, easy-to-follow text that should serve as a starting point to get you all excited about category theory and continue the exploration on your own.

Algebraic structures

What is a category? A category is an algebraic structure. OK, but what is an algebraic structure? Algebraic structure is basically a set (when used in this context of being an algebraic structure’s set, it is commonly referred to as underlying set) of one or more elements with one or more operations on them.

There are two main things that differentiate algebraic structures from one another:

laws that need to hold for operations on set elements
existence of identity element

Here are some laws that appear across different algebraic structures (operation in an algebraic structure is denoted as ○):

Closure: For all a, b in algebraic structure M, the result of operation a ○ b is also in M
Invertibility: For each element a in M there is an element b where a ○ b = b ○ a = e , where e is the identity
Commutativity: for all a, b in M: a ○ b = b ○ a
Associativity: for all a, b, c in M: (a ○ b) ○ c = a ○ (b ○ c)

Identity element is an element which (quoting the wiki) “leaves other elements unchanged when combined with them”. It’s the element that doesn’t affect the result when used in operation ○. We can formulate that into a law too:

Identity: for any a in M there is an element e where e ○ a = a ○ e = a

So, starting from a simple set with one or more operations and by adding a certain combination of the laws that are required, we can end up with various algebraic structures. Here are some:

Monoid

Getting warmer. What is a monoid?

Monoid can be defined as a:

- single set S
- with an associative binary operation ○
- and an identity element e
- following two laws:
(a ○ b) ○ c = a ○ (b ○ c)
a ○ e = e ○ a = a

Note how similar this looks to a category. We will soon see what this means.

But first an example for a monoid. OK, so we need a set with an associative binary operation and with some identity element. We could take addition on the set of integers Z. Alright, we have an identity element (it’s zero) and we have the associativity, because (x + y) + z= x + (y + z). If you take another look at that table of laws for various algebraic structures, you’ll notice that addition in Z is in fact a group, because it also has an inverse (subtraction).

But here’s something cool and possibly a bit disturbing at the same time — monoid is actually a category, just a bit of a special one: it has only one object.

If you scroll back to the table we saw earlier you’ll see that monoid is actually pretty similar to a category, only difference being the closure law. Recall that closure says “for all a, b in algebraic structure M, the result of operation a ○ b is also in M”. It’s pretty logical we got the closure property in a monoid since we always start from and end up in the same object (the only one we have). Categories don’t have closure because not all arrows can necessarily be composed (e.g. you can’t compose a → b and c → d) so the property “combine any two elements and you’ll arrive at something that’s also a part of the set” doesn’t hold. Other two laws (identity and associativity) are the same.

When switching the viewpoint from monoid as a set to monoid as a category, elements of the set become arrows of the category, identity element becomes the identity arrow, and binary operation becomes arrow composition. And yes, as soon as we enter the twilight zone also known as category theory, stuff gets weird. But with a bit of mindset shifting, our addition-in-Z example can be seen as a category.

First of all, numbers are now arrows. Think of an arrow as an “add X” function. These functions compose quite nicely, since adding 3 to a number and adding 5 to the result is the same as adding 8 to the original number. Furthermore, composing add3 with add5 followed by composing with add7 is the same as composing add3 with the composition of add5 and add7:

(add3 ○ add5) ○ add7 = add3 ○ (add5 ○ add7)

To cite a particular great source on category theory from which I took the adders example:

Instead of giving you the traditional rules of addition, I could as well give you the rules of composing adders, without any loss of information.

So, by shifting our viewpoint a little, we were able to see the monoid both as a set (on the left) and a single-object category (on the right).

b3uWNx07IXLiBE21ysOYyAeBiDsDF5rH29vr — (I took the photo from here)

Also, all the laws that we had in the first case are also present in the second case. Nothing more, nothing less.

OK, so we no longer have numbers; we have add-arrows which represent single-parameter adding functions. Second operand of adding operation is “hardcoded” (operation is now unary, not binary) so we have one function for each natural number. Total count of natural numbers and total count of arrows in our category are the same. Finally, arrows all share the same starting and ending object.

This object is completely irrelevant; think of it as a massless blob of nothingness. If you want, we could rename arrows from “addX’ to simply “x” and then the whole thing becomes really similar to our normal good old set Z with addition, just instead of “adding two and two equals four” we would say “composing two and two equals four”.

Here are the wikis for both viewpoints, perhaps you’ll find them useful:

Functor

Alright, now we know what algebraic structures are and how they are defined, and we also took a closer look into two of them — category and monoid. We saw that monoid is a category too, just with only one object.

What we need now is a functor (by the way, I already wrote a bit about functors too).

Compared to monoids and categories, functor is a bit different. It is not really an algebraic structure; it’s more of a function that maps from algebraic structures to algebraic structures. But not just any algebraic structures: functor is a mapping between categories.

Category, as we know by now, is a set of objects with arrows between them. Well, functor knows how to map the category to another category. It will map all the objects and all the arrows, and it will preserve the connections between objects (if there is an arrow between a and b in original category, there will be one between Fa and Fb in the resulting category). Given some object a or some arrow f: a → b from the original category, corresponding object (the one functor maps into) is denoted as F(a) and corresponding arrow is denoted as F(f): F(a) → F(b).

Functor that maps a category back to that same category is called an endofunctor. Quick detour from the category theory into the real world — in programming, we deal with functors in category of types, and they are all endofunctors. They map a category of types back to category of types. Whenever you mapped something (map in Scala, fmap in Haskell) you had an endofunctor in your hands. For example, Option, List and Future are all valid endofunctors. Basically anything that has a map() function is an endofunctor.

Don’t take this for granted though; it’s merely a convention. Some library may provide you with an object whose map() doesn’t obey the functor laws (see the old article) which would mean that the object itself cannot be considered a functor.

Example: in Scala we can map from List[Int] to List[String] (in other words, to map all elements of a List from Int to String):

List(1, 2, 3).map((x: Int) => x.toString)

We can have all kinds of mappings: from Int to String, from String to List[String], from List[String] to Banana etc. All of them are Scala types. This means we always map from a category of Scala types back to the category of Scala types. Hence the “endo” part.

Now let’s “rip out” the map() method from some functor F and view it as a function of two arguments — a functor object and a function which we map it with. For example, List(1, 2, 3).map(f) becomes map(f, List(1, 2, 3)).

If we curry that, it’s easy to see that signature of map is:

(a → b) → F[a] → F[b]

Function which we map with is denoted as a → b, starting functor object is F[a] and result functor is F[b]. Note that instead of providing both arguments, function a → b (e.g. (x: Int) => _.toString()) and a functor instance F[a] (e.g. List(1, 2, 3)), we can provide only the function f, in which case we get back a function F[a] → F[b] (if this is not something you’re comfortable with, do a quick research on currying and partially applied functions). So yeah, instead of providing both arguments of type A and B and getting back a value of type C, we can provide only parameter of type A and get back a function B → C. Providing only number 4 to a function which sums two numbers gives us back a function that takes some number n and returns n+4.

Let’s take another look at this interesting function (a → b) → F[a] → F[b]. It’s a function from (a → b) to F[a] → F[b]. To put it in category theory terms — it maps a category to another category, where objects of destination category carry the F symbol. Remember what we said earlier? “Given some object a or some arrow a → b from the original category, corresponding object (the one functor maps into) is denoted as F(a) and corresponding arrow is denoted as F(a) → F(b).” That’s exactly what map does.

But we are missing something. We saw just one half of being a functor: the one that maps morphisms. A function on functions. It takes a function and returns the “lifted” version of that function, that is, one that instead of taking a and returning b now takes F[a] and returns F[b]. By the way, in practice we often don’t just lift functions and save them for later. Instead we immediately apply them to a functor instance. For example, instead of lifting a function Int → String to List[Int] → List[String], we usually pass in an instance of List[Int] right away (or, in case of Scala, we invoke map() as a method on an instance of List[Int])).

However, we know that functor not only maps all source arrows to destination arrows, but also maps all source objects to destination objects, right? But we only saw the mapping of arrows, when we lifted a → b to F(a) → F(b). How to map a itself (and b, for that matter)? This is another side of a functor: apart from the function on functions, there is also a (somewhat implicit) function on types. Functor’s mapping of arrows is represented by the function-on-functions part, while mapping of objects is represented by the function-on-types part.

What is this function on types? It’s a unary type constructor. This means that, given some type, functor produces another new type. So if we take List functor as an example, given a String it gives us a List[String]. Given some type A, it gives us a List[A]. Type constructor for functors can be described in a more general way as * → * (takes one type as a parameter and produces a new type from it). Here’s some more useful reading on type constructors.

Quick wrap-up of functors. Each functor is a mapping of categories. It maps some source category C to destination category F by mapping objects in C to objects in F and morphisms in C to morphisms in F. We say that (quoting myself) “given some object a or some arrow f: a → b from the original category, corresponding object (the one functor maps into) is denoted as F(a) and corresponding arrow is denoted as F(f): F(a) → F(b)”.

As (Scala?) programmers, we are working with the category of types in Scala (category of types in Haskell is called Hask, so I guess the Scala one could be called Sca or something). Here objects are Scala types and morphisms are functions between those types. Each functor in Sca maps the objects (Scala types) via the type constructor, and morphisms (functions on Scala types) via function map. If we take Option as an example, it maps every object in Sca (that is, every Scala type) into Option(thatObject), and every morphism between objects in Scala (that is, function between Scala types) into Option(thatFunction).

And since it’s a functor Sca → Sca, it’s more specifically an endofunctor.

Monad

Now that we know what category, monoid and endofunctor are, we can imagine a category of endofunctors and try to find a monoid in that category. As the famous statement by mr. Mac Lane tells us, monoid in the category of endofunctors is in fact a monad.

What does it mean to be a monoid in some category? For example, a monoid in the category Set (remember, that’s the category of all sets). What is it? Well, take a look at the category; it contains all imaginable sets. Now pick out those that satisfy monoid laws, that is, pick out all those sets for whose elements we can find an identity element and define an associative binary operation.

One such set is the set of integers with identity element being zero and binary operation being addition. Note that monoidal binary operation on set S must operate on two operands, both from S, and return something that is also an element of S.

Cool, so a monoid in the category Set is any set that has those properties, that is, those two operations (for example, the set of natural numbers with addition). Here are some more examples of monoids in the category of X.

Let’s now see about “monoid in the category of endofunctors”. In this category, objects are endofunctors and arrows are mappings between those functors. An extra bit of terminology before we continue: mappings between functors are called natural transformations. They operate on a yet higher level of abstraction:

arrows map objects within a category from one to another
functors map categories from one to another
natural transformations map functors from one to another

We could also say that a natural transformation is an “arrow between functors”, and that’s exactly what we have here — we have a category of endofunctors, which means that objects are endofunctors themselves, and arrows between those endofunctors are by definition natural transformations.

Back to our search for monads. So what we are doing is reaching into the category of endofunctors and looking for such objects of that category for which two particular arrows are defined — identity and associative binary operation. When we have those two arrows, we have a monoid (in the category of endofunctors).

So we need an endofunctor F for which the following operations are defined:

η: I → F
μ: F x F → F

where I is the identity endofunctor, η (eta) is the fancy math name for identity on endofunctors, and μ (mu) is the fancy math name for associative binary operation on endofunctors. If you’re unsure where these two came from all of a sudden, remember that when we talked about monoid as a set, I showed you a wiki link which, as you can see, uses those exact two operations (more precisely, arrows in the monoidal category) to define a monoid (in the monoidal category).

Let’s examine those two operations in the context of category of endofunctors and see what they become in that case:

Identity is a natural transformation mapping from an identity endofunctor (a functor which maps a category to itself) to another endofunctor.
Asssociative binary operation is some operation x that knows how to take two endofunctors and turn them into one. Keep in mind that this operation must be associative, so (F x F) x F must yield the same result as F x (F x F).

Now let’s see how each of these two operations materialize in the Scala world.

First, η: I → F. What identity functor I does is it maps any category back to itself. That means its “type constructor” doesn’t actually create any new type F(value). It just leaves the value as it is. So what nat. transformation η really does is — it takes an identity functor and wraps it into a functor F context. In category of Scala types, identity functor maps a type back to itself, so we can view this whole transformation as lifting a Scala type into some F context, where F can be any functor.

What about μ: F x F → F? It’s a composition of two identical functors which results in just one. It’s a way to turn F[F[T]] into F[T].

Based on what we just said, it’s pretty logical to have apply as η and flatten as μ. Method apply constructs a type F[x] from x , while flatten “flattens” the composition F[F[x]] into F[x].

So whenever you see an endofunctor in Scala (in practice this is almost any object with method map, but remember to check the functor laws before you declare something a functor; for example, there’s a way to break those laws for Set) that also has flatten and unit, you are in fact dealing with a monad.

Three monad definitions and monad laws

We saw that monad is an endofunctor (remember, any functor in a programming language is actually an endofunctor because it maps from the category of types back to category of types) with two natural transformations:

identity/unit implemented as apply, and
associative binary operation implemented as flatten (note that terms unit and identity are used interchangeably; I tend to use unit when talking from a practical, programming viewpoint, and identity when talking about category theory, but sometimes I may mix them up; they’re the same thing).

Also, since we are talking about an (endo)functor, we know that there is a map method available too.

You may have heard about monad consisting of unit and flatMap. Yep, that’s correct; we can simply “squash” the flatten (coming from μ) and map (coming from its functor nature) methods into a single flatMap method. Our monad would still retain the same properties, it’s just that its associative binary operation would no longer be flatten, but flatMap. Monad laws (explained later) are not harmed.

So there are two valid definitions of a monad? One that involves unit and flatMap and one that involves unit, flatten and map?

Yes. And not just that — there’s also a third one. Monads can be expressed in three different ways. All three definitions are equivalent and we can easily transform one into another. Here they are:

unit + flatMap
unit + flatten + map
unit + compose

All three are equally powerful and each one can be expressed by using one of the other two. We’ll talk more about them later.

Now, every monad implementation (e.g. List, Option etc.) needs to obey the monad laws. Monad laws can be expressed in three different ways, depending on which monad definition you are using, but they all resemble the same core concept.

Here are the monad laws presented using the unit + flatMap definition:

left-identity law:
unit(x).flatMap(f) == f(x)
right-identity law:
m.flatMap(unit) == m
associativity law:
m.flatMap(f).flatMap(g) == m.flatMap(x ⇒ f(x).flatMap(g))

It is obvious that unit + flatMap definition is easily transformed to unit + flatten + map and vice versa because flatMap = flatten + map. I will now show the connection between unit + flatMap and unit + compose. I like the unit + compose definition because it makes the laws easier to express.

Let’s see the compose function:

def compose: (A => F[B]) => (B => F[C]) => A => F[C]

This is a two-parameter function where both parameters are functions of type A → F[B], and the result is also a function of the same type. Note that types A and B are completely free and may represent any concrete types in Scala (A→F[B] could be e.g. Int → List[String] or Int → Set[Int]). We use the letters simply to denote that the parameter of the first function is the same type as the parameter of the result function (here denoted as “A”).

By the way, functions of type A → F[B] are called Kleisli arrows, just in case you stumble upon the term somewhere. We say that the function we just defined has Kleisli arrows as parameters and also as return type. It’s a composition of Kleisli arrows.

Now let’s also define “unit”:

def unit: A => F[A]

We can easily check that this is indeed unit: it’s a neutral element that, when composed with, gives back the original element. Just take the signature and replace F[C] with F[B] so that the second parameter becomes an identity function:

(A => F[B]) => (B => F[B]) => A => F[B]

Again, types A, B and C are not fixed and can represent any concrete type so A → F[A] from the first identity expression is the same function as B → F[B] in this expression. You can think of identity function not as A → F[A], but as Whatever → F[Whatever] if you wish:

(A => F[Whatever]) => (Whatever => F[Whatever]) => A => F[Whatever]

So if we apply identity function as a second operand to compose, we will arrive back at our first operand. Good, composing identity with some function results in the same function. Identity holds. Associativity is pretty clear too; I will leave it to you to try and prove it by yourself.

What I wanted to show you is that compose can be expressed by using flatMap. Same goes in the other direction too; flatMap can be expressed by using compose.

Here it is:

trait Monad[F[_]] {

  def flatMap[A, B](fa: F[A], f: A => F[B]): F[B] = ???
  
  def compose[A, B, C](f1: A => F[B], f2: B => F[C]): A => F[C] = {
    a => flatMap[B, C](f1(a), f2)
  }
}

And the other direction:

trait Monad[F[_]] {

  def compose[A, B, C](f1: A => F[B], f2: B => F[C]): A => F[C] = ???

  def flatMap[A, B](fa: F[A], f: A => F[B]): F[B] = {
    compose[Unit, A, B]((u: Unit) => fa, f)()
  }
}

As I said earlier, unit + compose is particularly great for one specific purpose, and that is expressing monad laws. They become simpler to read and understand. Here they are again, this time using compose instead of flatMap:

left-identity law:
unit.compose(f) == f
right-identity law:
f.compose(unit) == f
associativity law:
f.compose(g.compose(h)) == (f.compose(g)).compose(h)

Remember that all three definitions are equally “good”, that is, all three are minimal sets of operations needed to define a monad and each one can be expressed by using one of the other two.

Always remember that monad laws are the essence of what makes monads what they are. Having monad operations (e.g. unit + flatMap) is not enough if the laws are not satisfied.

Summary

Our initial objective is complete. We now understand that famous sentence. Monoid in the category of endofunctors is any endofunctor with operations η and μ, and we call such endofunctor a monad (reminder: objects of that category are endofunctors and arrows are natural transformations).

So, monad can be defined in many ways, such as:

monoid in the category of endofunctors
object in the category of endofunctors with arrows η and μ
endofunctor with natural transformations η and μ

In Scala, η is implemented as apply and μ is implemented as flatten, which means that monad is any functor (that is, a construct with map method) that is additionally equipped with apply and flatten. There are two more completely equally valid ways of defining a monad in terms of its operations: unit + flatMap and unit + compose. They are completely equal and there is nothing one can do that others can’t. We saw how flatMap can be expressed using compose and vice versa; this is possible for all combinations.

And don’t forget about the monad laws.

Final word

Hopefully this article helped you gain some perception as to how certain category theory constructs fit together and, more specifically, what our old friends monads “really are” — monoids in the category of endofunctors.

By the way, the name monad comes from “monoid” and “triad”; monoid because it’s a monoid in the category of endofunctors, and triad because it’s a package of three things: an endofunctor equipped with two natural transformations.

That’s all for now. As usual, if you find any mistakes please do let me know via email (sinisalouc[at]gmail[dot]com). You can also find me on Twitter.

Cheers!