Contents
I’ve been reading up on subtyping lately. Subtyping helps us understand what valid changes we can make to a program—amongst other helpful uses—but the rules can be a bit confusing.
I’d been thinking about this for quite some time, musing over it, when I had a sudden flash of inspiration that helped it all click in my head.
Instead of trying to brute-force memorise the subtyping rules on algebraic data types, we can understand those rules by writing total functions which map subtypes to supertypes. The existence of those functions acts as a proof that there is a subtyping relation between the two types.
What is Subtyping? Why Do We Care About It? #
Subtyping is about substituting values of one type for values of another type. If we have some type that is a subtype of a type , then we can provide a value of type whenever our program expects a value of type This has two consequences for our code.
Firstly, if we create code fragments which operate on simple supertypes (i.e. there are many types which are subtypes of these) then we can reuse functionality in many parts of our program by providing subtypes of the demanded supertypes. This helps us keeps our code decoupled.
Secondly, when we make changes to our program’s data structures, if our new versions of the type are subtypes of the old versions of the type, our old code will operate on the updated data structures without needing to be edited.
Subtyping helps us keep our programs modular over both space (different module boundaries) and time (our program’s change over time).
We can memorise the rules after we’ve properly understood how subtyping works.
Total Functions #
Rather than just memorising subtyping rules for algebraic data types, I want to provide a richer understanding under the lens of total functions—functions which are defined over the entire range of a type. Functions which aren’t defined for every instance of a type are known as partial functions.
A famous example of a partial function in Haskell is head, which extracts the
first element of a non-empty list. head is implemented like so:
I’m ignoring the HasCallStack typeclass requirement that head needs to be
able to throw errors on empty lists.
head :: [a] -> a
head (x:_) = xBecause head only contains a pattern for non-empty lists, it succeeds on lists
with at least one element:
ghci> head (1 : [])
1
ghci> head (2 : 3 : [])
2But fails on empty lists:
ghci> head []
*** Exception: Prelude.head: empty list
CallStack (from HasCallStack):
error, called at libraries/base/GHC/List.hs:1644:3 in base:GHC.List
errorEmptyList, called at libraries/base/GHC/List.hs:87:11 in base:GHC.List
badHead, called at libraries/base/GHC/List.hs:83:28 in base:GHC.List
head, called at <interactive>:1:1 in interactive:Ghci1Contrast this with the total function safeHead implemented like so:
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:_) = Just xWhich works on both empty and non-empty lists:
ghci> safeHead []
Nothing
ghci> safeHead (1 : [])
Just 1
ghci> safeHead (2 : 3 : [])
Just 2And is therefore a total function.
So when I say that we want to write “total” functions that map subtypes to supertypes, I mean that they must be well-defined on every value of the subtype—no throwing errors.
A Brief Review of Algebraic Data Types #
Algebraic data types are about composing together types to make new types. Besides your regular built-in compound types (strings, lists, and so on), there are a few special types to go over:
VoidUnitProductSumFunction
Let’s zip over these quickly.
The Void Type #
The void type is a data type that cannot be constructed:
data VoidWe’ve defined the data type, but provided no constructors.
This type is also known as the empty type. It has 0 occupants
When I’m talking about “occupants”, I’m really talking about
the cardinality of the type, i.e. how many unique elements there are in the
type. since there is no way to make a value of type Void:
It’s quite rare in programming languages. Interestingly, Rust uses this in the
panic! macro (which throws a runtime exception) to notate that it doesn’t
return a value, but calls it the never type (as in, it never returns a value).
The Unit Type #
The unit type is a data type that can only be constructed one way, and as such has only 1 occupant:
In Haskell, it’s represented with the empty tuple, but for ease of
pronunciation I’ll distinguish between the type Unit and its constructor ():
data Unit = ()Rust also calls it the () type.
Confusingly, the C languages and Java call it void.
Python calls it None.
Product Types #
Here we get to the interesting part—type constructors which take multiple types and create a new compound type.
The first type constructor we’ll look at is the product type which contains values of two types at once, meaning it corresponds with the logical predicate of conjunction—it says “we have an element of type AND an element of type ”.
These are everywhere, under the guise of tuples (product types indexed by integer):
data Tuple2 a b = (,) a b
-- or, infix: = (a, b)and records (product types indexed by a name):
There isn’t a generic record type in Haskell unless you use a library.Tuple2 is a real type, though—the type of 2-tuples—so by analogy I defined
the generic 2-record as Record2.
data Record2 a b = Record2 { recordA :: a, recordB :: b }There’s lots of different names for records—Rust and C call these structs, Python calls these dictionaries or (slightly fancier) dataclasses, Erlang has records and maps Interestingly, Erlang’s records are compiled down to tuples. They’re basically a simple affordance for indexing into tuples with names—which works because tuples and records are isomorphic after reindexing.Conversely, maps are stored in different ways based on their size. Small maps (which store less than 32 key-value pairs) are stored as a pair of arrays (one of sorted keys, the other of values). Large maps (more than 32 key-value pairs) are stored as hash-array mapped tries., Elixir adds structs on top of Erlang. Elixir’s structs basically let us add compile-time checks on top of maps.
As for why these are called product types, it’s because the number of types in a product type is equal to the product of the cardinality of its component types:
Take the Boolean type, which has 2 occupants; True and False. If we have a
tuple (Bool, Bool), it has 2 × 2 = 4 occupants:
(False, False)
(False, True)
(True, False)
(True, True)Or take the tuple of type (Bool, Unit), which has 2 × 1 = 2 occupants:
(False, ())
(True, ())Sum Types #
The sum type has much rarer language-level support. Where a product type says “we have an and a ”, a sum type says “we have an OR a ” and therefore corresponds with the logical predicate of disjunction.
In Haskell, it’s represented by the Either type:
data Either a b = Left a
| Right bIt’s often doable in other languages by manually compiling to a tagged union, say in C:
typedef enum {
VALUE_INT,
VALUE_STRING
} ValueType;
typedef struct {
ValueType type;
union {
int i;
const char *s;
} data;
} Value;Or in old-school Java:
public class TaggedUnionExample {
enum ValueType {
INT,
STRING
}
static class Value {
ValueType type;
int intValue;
String stringValue;
static Value ofInt(int value) {
Value v = new Value();
v.type = ValueType.INT;
v.intValue = value;
return v;
}
static Value ofString(String value) {
Value v = new Value();
v.type = ValueType.STRING;
v.stringValue = value;
return v;
}
}
}Although modern Java lets us do this with sealed interfaces and records:
sealed interface Value permits IntValue, StringValue {}
record IntValue(int value) implements Value {}
record StringValue(String value) implements Value {}Rust, like Haskell, just lets us define sum types without any fuss:
enum Value {
Int(i64),
String(String)
}The sum type is also sometimes called the union type, like in Python, where we
can notate this with the | operator and do case analysis with chains of
if/elif isinstance checks (old-school) or pattern matching with match and
case (as of Python 3.10):
Value = int | str
def do_old(value: Value):
if isinstance(value, int):
...
elif isinstance(value, str):
...
def do_new(value: Value):
match value:
case int:
...
case str:
...As you might have guessed, the sum type is called that because its cardinality is the sum of the cardinality of its variant types:
Let’s work some examples. Take the data type Either Bool Bool. We should
expect it to have 2 + 2 = 4 occupants. We can enumerate all the possible
combinations and check:
Left False
Left True
Right False
Right TrueAnd if we take the data type Either Unit Bool, we can check that it has 1 + 2
= 3 occupants:
Left ()
Right False
Right TrueIn fact we can even show an isomorphism between Bool, which has 2 occupants,
and Either Unit Unit, which has 1 + 1 = 2 occupants as well:
class (Isomorphism b a) => Isomorphism a b where
to :: a -> b
instance Isomorphism Bool (Either Unit Unit) where
to False = Left ()
to True = Right ()
instance Isomorphism (Either Unit Unit) Bool where
to (Left ()) = False
to (Right ()) = TrueFunction Types #
And back into our comfort zone we go with the function type, which denotes a function from a value of type to a value of type .
On the logical level, function types correspond to implication—a function implies that we can go from having an to having a .
For each parameter of type , there are possible values that the function could return, so the cardinality of is:
And hence functions are exponential types:
Let’s say we have a Colour type with 3 inhabitants, and we want to create a
function of type Colour -> Bool. We should expect there to be 23
= 8
possible functions. Let’s enumerate and check:
data Colour = Red | Green | Blue
f0, f1, f2, f3, f4, f5, f6, f7 :: Colour -> Bool
f0 Red = False
f0 Green = False
f0 Blue = False
f1 Red = False
f1 Green = False
f1 Blue = True
f2 Red = False
f2 Green = True
f2 Blue = False
f3 Red = False
f3 Green = True
f3 Blue = True
f4 Red = True
f4 Green = False
f4 Blue = False
f5 Red = True
f5 Green = False
f5 Blue = True
f6 Red = True
f6 Green = True
f6 Blue = False
f7 Red = True
f7 Green = True
f7 Blue = TrueSubtyping Rules #
Before we get into the subtyping rules, let’s briefly go over some notation—as I go through the subtyping rules, I’ll provide a formal definition, a translation of the notation, and the total function that justifies why the rule is true.
The subtype operator means “is a subtype of”. So should be read as “A is a subtype of B”.
I’ll use Gentzen’s notation to represent inferences; premises are shown above a horizontal inference line, separated by commas, and the conclusion is shown below the inference line. For example, the logical statement “If P, then Q. P. Therefore Q.” would be depicted as:
Now, let’s define how we’ll notate the function that maps subtypes to supertypes.
As I mentioned, a big part of the value of subtyping is about ensuring that new versions of our types (when we make changes to our program) are subtypes of the old types—subtyping over time.
Imagine we have some function mapToOld that maps the new version of a type to
the old version of the same type. You can think of this as taking some type,
indexed by its version, and mapping it to the same type, but indexed by its
previous version. Morally, something like:
I won’t put in the
effort to make this computable by the GHC. Each time we created a new version
of a type, we’d need to add a completely new implementation of the type, then
create a new case for mapToOld from that version to the previous.Plus our old functions would all be operating on the old types, and we’d have to
either change them manually or iteratively apply mapToOld until we got the
correct version of the type. What a pain.
mapToOld :: Versioned (Succ Nat) a -> Versioned Nat aWhere Versioned is a GADT which constructs a versioned type, indexed by the
natural numbers.
In fact, when we make some changes to a type, if we’re able to define a simple
implementation of mapToOld (that doesn’t take any additional information
outside of what’s in the new type), then that implies the new type is a
subtype of the old one. Mapping a new type to an old type is really about
casting to a supertype. To make this subtyping relation implementable across a
wide range of types, I’ll notate it as a typeclass that we provide instances of:
class Subtype a b where
upcast :: a -> bThis typeclass should be read as “ is a subtype of , and to prove it, I have a function which lets me convert all values of type to values of type .”
This will be my template for whenever we’re trying to prove that is a subtype of by defining a concrete implementation. The subtyping rules I’m going to explain are generic over combinations of types, so I’ll define them using parametric polymorphism.We don’t care about what specific types and are (e.g. strings or integers or floats), we just care about the shape of the generic type constructors that are parameterised over them.
Strictly speaking, the definitions of the new typed versions should be done as variants within a GADT, where different constructors create different indexed types:
data Account (version :: Nat) where
Account :: Int -> Account 1
Account' :: String -> Int -> Account 2
instance Subtype (Account 2) (Account 1) where
upcast (Account' _ balance) = Account balanceAnd let’s not even consider implementing Versioned.
Maybe after I get in some practice with type-level programming. I’ll keep the Haskell notation simple by just providing instances
of Subtype between different types, no type-level version indexing.
Also, for context, whenever I notate some concrete datatype as <type>', I’m
really talking about the same data type (i.e. a programmer has edited the type)
but at a later point in time. If the programmer were to literally create a new
type, we would of course have to update all the call sites to use the new type.
So when I provide an implementation of Subtype that maps a new (sub-) type to
the old (super-) type, I wouldn’t expect this to run in a program,
Unless, of course, we do the GADT-indexed-by-version-type shenanigans. we are merely using it as a way to help us think about how there
are legal ways to change a type (i.e. by creating a subtype of it) while not
requiring any changes to the rest of the program.
With that bookkeeping squared away, let’s finally dive into the rules of subtyping.
The Base Case #
All types are subtypes of themselves.
Considering that the purpose of subtyping is “you can substitute a subtype wherever a type is expected and the program will run fine”, this is pretty obvious—of course you can give a value of type where a value of type is expected. It’s like saying . It’s true by tautology.
instance Subtype a a where
upcast = idTransitivity #
Subtyping is transitive, which is to say that if is a subtype of and is a subtype of , then is a subtype of
This should also be fairly self-explanatory. Morally, this would look like:
instance (Subtype a b, Subtype b c)
=> Subtype a c where
upcast = upcast . upcastBut this would cause problems if we were to try to run it in Haskell (due to
undecidability), so if you were to use the Subtype typeclass, you’d have to
manually compose upcast with itself whenever you need to cast a type up
multiple levels of the supertyping hierarchy.
Product Types #
Adding Fields to a Product #
A product type created by adding a field of type to a type is always a subtype of
To give a concrete example, imagine we have a Customer type that we use in our
program, which currently only stores their name and age:
data Customer = Customer {
customerAge :: Int,
customerName :: String
}What happens to the rest of our program if we decide to add a new field for their contact details?
data Customer' = Customer' {
customerAge :: Int,
customerName :: String
customerContactDetails :: ContactDetails
}Nothing! The rest of the program doesn’t expect to operate on a customer’s contact details, and therefore silently ignores the additional information.
Every Customer' can be mapped into a Customer simply by forgetting the
additional field, which lets us define upcast:
upcast :: Customer' -> Customer
upcast (Customer' age name _) = Customer age nameAnd so our new and updated Customer type is a subtype of the old one.
Generically speaking:
instance Subtype (a, b) a where
upcast (x, _) = x
instance Subtype (a, b) b where
upcast (_, y) = ySubtyping of Fields #
If you make the fields of a product type into subtypes, that creates a new subtype product type:
Imagine we have a product type Order which includes our Customer type:
data Order = Order {
orderCustomer :: Customer,
orderItem :: Item
}After we upgrade the Customer type to its subtype Customer', all the old
places which use Order don’t need to change because they can ignore the
additional information in orderCustomer (and orderItem is a subtype of
itself).
Concretely:
data Order' = Order {
orderCustomer :: Customer',
orderItem :: Item
}
upcast :: Order' -> Order
upcast (Order' customer item) =
Order (upcast customer) itemOr generically:
instance (Subtype a' a, Subtype b' b)
=> Subtype (a', b') (a, b) where
upcast (x, y) = (upcast x, upcast y)Sum Types #
Deleting Cases From a Sum Type #
Deleting cases from a sum type always produces a subtype.
This is a bit confusing. Why doesn’t it work like products, where adding additional information creates a subtype?
Adding a case adds uncertainty to a sum type—there are more possibilities of what case the type could be, which must be tamed via case analysis.
Say we have some ContactDetails data type which we’ve simplified after some
time (as none of our customers ever want us to contact them by phone):
data ContactDetails = CDEmail EmailAddress
| CDPhone PhoneNumber
data ContactDetails' = CDEmail EmailAddressAll the places that expect a subtype of ContactDetails can operate just fine
if we provide them the type ContactDetails'—they’ll have a case
analysis branch that will never get run (and is hence dead code), but otherwise
they’ll cope just fine.
Put generically:
instance Subtype a (Either a b) where
upcast x = Left x
instance Subtype b (Either a b) where
upcast y = Right yAlternatively, imagine we tried to produce a subtype by adding cases to a sum type. How would we define it?
instance Subtype (Either a b) a where
upcast (Left x) = x
upcast (Right y) = ???Subtyping of cases #
If you make the cases of a sum type into subtypes, that creates a new subtype sum type: Say that three times fast!
This is similar to subtyping the fields of a product type. Instead of mapping upcast over all the fields at once, we check to see which case the sum type has, then map upcast over it (while keeping the same
structure):
instance (Subtype a' a, Subtype b' b)
=> Subtype (Either a' b') (Either a b) where
upcast (Left x) = Left (upcast x)
upcast (Right y) = Right (upcast y)
-- or, in point-free style:
upcast = bimap upcast upcastA Brief Interlude on Type Variance #
Before we get to the last subtyping rule of the bunch, we need to discuss type variance. I started off this article with a bit of an inaccurate generalisation, saying that subtyping is about substituting types for other types. While this is true, it’s not the case that we are always substituting subtypes for supertypes (like I may have led you to believe)—sometimes, we want to substitute supertypes for subtypes.
I’ll start with a concrete example and then we can generalise from there.
We’ll create a simple subtyping hierarchy of three types:
- Music
- Rock music
- Progressive rock music If King Crimson has million fans, then I’m one of them.If King Crimson has one fan, then I’m THAT ONE.If King Crimson has no fans, that means I’m dead.
Progressive rock is a subtype of rock, rock is a subtype of music.
Let’s talk about a Producer<RockMusic>. If some part of our program expects
something which produces rock music, we can happily give it a
Producer<ProgRockMusic>, because progressive rock is a subtype of rock. We
couldn’t give it a Producer<Music>, though—what if it produces rap, or pop?
Note that the ordering of the subtyping goes the same way between the Producer<T>
and its underlying type.
We call this covariance—the type constructor and its type argument co- (“together”, “with”, “jointly”) vary.
Where is a covariant type constructor of type .
Both product types and sum types can be thought of as producers of their component values via pattern matching, which is essentially a message to the type asking for them to produce the values they wrap. Hence, they are both covariant—subtyping the fields makes a subtype product type, and subtyping the cases makes a subtype sum type.
Ah, but now let’s talk about a Consumer<RockMusic>. If some part of our
program expects something which consumes rock music, we can’t provide it a
Consumer<ProgRockMusic>, because the consumer is too discerning—it won’t be
able to handle being given glam rock, or heavy metal, or surf rock, and will
crash the program. But we can give it a more general Consumer<Music>, which
will happily process the rock music that the rest of the program feeds it.
So, the ordering of the subtyping between Consumer<T> and T is reversed!
This is called contravariance, because the type constructor and its type argument contra- (“against”, “opposite”) vary.
Where is a contravariant type constructor of type .
Finally, let’s talk about a type constructor which both consumes and produces
RockMusic.
By the subtyping rules on production, the type constructor can only accept
subtypes of RockMusic. By the subtyping rules on consumption, the type
constructor can only accept supertypes of RockMusic. And so, much like how the
only number both greater-than-or-equal-to and lesser-than-or-equal-to some
number x is x itself, this type constructor can only accept the same
type.
This is called invariance, because the type constructor and its type argument in- (“not”) vary, i.e. they don’t vary at all.
Incidentally, this is one of the problems with mutation. With mutable data types, every data type is both a consumer of its underlying values (via set) and producer of its underlying values (via get) and therefore mutable types are always invariant.
I won’t provide an inference rule as there’s no subtyping relations on an invariant type constructor.
To recap:
Covariant type constructors let us substitute subtypes for supertypes.
Contravariant type constructors let us substitute supertypes for subtypes.
Invariant type constructors don’t let us substitute types at all.
Function Types #
And now, back to our regularly scheduled discussion of subtyping on functions.
There is only one subtyping rule for functions.
A function consumes values of type and produces values of type , so they are contravariant in their parameter types, and covariant in their return type.
To spell out the variance rules:
- The parameters of the subtype function must be supertypes of the supertype function’s parameters
- The return value of the subtype function must be subtypes of the supertype function’s return value
In Haskell code, this looks like:
instance (Subtype a a') (Subtype b' b)
=> Subtype (a' -> b') (a -> b) where
upcast f = \x -> upcast (f (upcast x))
-- or, in point-free style
upcast f = upcast . f . upcastConcretely, we can create subtypes of our function by either making our parameters into supertypes: by adding cases to a sum type parameter value:
-- supertype of parameter: adding cases to a sum type
f0 :: Either String Int -> String
f0' :: Either Bool (Either String Int) -> String
-- supertype of parameter: removing fields from a product type
f1 :: (Bool, Int, String) -> String
f1' :: (Bool, Int) -> StringOr by making our return value into a subtype:
-- subtype of return value: removing cases from a sum type
f2 :: Int -> Either Bool String
f2' :: Int -> Bool
-- subtype of return value: adding fields to a product type
f3 :: String -> (Bool, Int)
f3' :: String -> (Bool, Int, Float)The Liskov Substitution Principle #
Finally, there is one thing I’ve not mentioned yet, and that’s the Liskov Substitution Principle:
A subtype should behave like a supertype as far as you can tell by using the supertype methods.
Unfortunately, unlike the structural subtyping rules I’ve gone over, the Liskov Substitution Principle is undecidable by a simple type checker A simple proof: by LSP rules, we would at the least need to be able to prove that the termination/non-termination of each method was the same for a type and its subtype.This is the halting problem. Famously undecidable:w .—it defines subtyping by behaviour essentially in terms of implementing an interface, not in terms of storing information.
Of course, the confusion of subclassing with subtyping is very common thanks to object-oriented languages, but this is a category error.
Of course, this has been confused with the use of inheritance in object-oriented languages, but subclassing is not subtyping. This is a very common mistake, but it’s a category error—subtyping talks about substituting types for other types, subclassing is usually about reusing implementation behaviour.It’s very easy to create a subclass of a class that is nevertheless not a subtype of it.
I will have to revisit behavioural subtyping another time.
Conclusion #
Let me wrap up by collating all the implementations of upcast:
-- all types are their own subtypes
upcast x = x
-- subtyping is transitive
upcast = upcast . upcast
-- product: adding a field
upcast (x, _) = x
upcast (_, y) = y
-- product: subtyping over fields
upcast (x, y) = (upcast x, upcast y)
-- sum: removing a case
upcast x = Left x
upcast y = Right y
-- sum: subtyping over cases
upcast (Left x) = Left (upcast x)
upcast (Right y) = Right (upcast y)
-- functions: subtyping by supertyping on parameters
-- and subtyping on return value
upcast f = upcast . f . upcastTo sum up, all of these implementations of upcast are total functions which
map a subtype to its supertypes. Thus whenever we have some type in a code
fragment, we can replace it with a subtype, upcasting if necessary.
Feel free to memorise the rules now. Flashcards are handy.
Anyways, we don’t need to implement upcast in our code for subtyping to be
helpful. Just thinking in terms of subtyping can help us write better software,
by being mindful of modularity over both time (measured along the world-line of
our software’s changing existence) and space (by helping our code stay
decoupled).
Firstly, when we update our program’s code, if we can keep our updated data types as subtypes of their previous versions, then we don’t need to update the rest of the program around them–the code will happily operate on the new subtype.
Secondly, if we design our code to operate on simple supertypes, then our old code can operate on very many subtypes and will not need to change in response to us changing data structures around. Our code will have low coupling and we’ll be free to update our data structures as user needs change.
Subtyping lets us safely substitute values of one type for another.