Dmitriy Kubyshkin

Type Checking as Evaluation

If you have never been exposed to a language with a static type checking, it might be hard to wrap your head about what it is and how it works. One way to look at type checking is as evaluation of a secondary program that mirrors the structure but uses types as values. That sounds way more confusing than it actually is. Let me explain...

Two Programs For The Price of One

Let's consider the following simple program that if run in the browser produces a popup saying "Hello, World!":

function hello(name: string): void {
  const message: string = "Hello, " + name + "!";
  alert(message);
}

const world: string = "World";
hello(world);

For the examples in this article, I will TypeScript, as it is the easiest to showcase due to both having static types and having an interpreted runtime, but the idea is universal.

If you are familiar with TypeScript, you might be a bit surprised to see all types to be explicitly annotated while it is not necessary. This is to avoid discussion of type inference. I might do a separate article later on it as well.

In the code above, we tell through special annotations that name, message, and world are all strings. Functions also have their types, namely (name: string): void for hello.

Now that the compiler has the knowledge of all the types, how does it check that they are correct? The way I found the easiest to explain this to the developers new to static type checking is to consider that we are writing two programs at the same time. One is the actual program we want to write to be later run by the user. The other program is the one compiler evaluates during compilation.

Since TypeScript is (mostly) just JavaScript with static types, the user program is exactly the same just without types:

function hello(name) {
  const message = "Hello, " + name + "!";
  alert(message);
}

const world = "World";
hello(world);

The other program is much more interesting. I will use JavaScript syntax as well, but in reality compilers typically use a custom representation and evaluation rules. Here's roughly what it looks like:

function hello(nameType) {
  if (nameType !== "string") {
    throw new Error(
      `Expected a string, got a ${nameType}`
    );
  }
  const messageType =
    _plus(_plus("string", nameType), "string");
  alert(messageType);
}

const worldType = "string";
hello(worldType);
console.log("All good!");

In the example above, I have replaced string literals with their type names instead, so "string". In JavaScript, we can not define our own +, instead, I'm using a special _plus function. We also need to have some definition of alert. Here's what they might look like in JavaScript syntax:

function alert(anyType) {
  // alert can accept any type
  // so nothing to check here
}

function _plus(a, b) {
  if (a !== b) {
    throw new Error(
      `Operands of + must have` +
      `the same type`
    );
  }
  // TypeScript allows to use + operator
  // on strings and numbers.
  if (a !== "string" && a !== "number") {
    throw new Error(
      `Expected a string or a number`
    );
  }
  // Result of + is of the
  // same type as arguments
  return a;
}

You can run the example above in any modern JavaScript interpreter. When run, "All good!" should be printed to the console. If you can try to change worldType to something else, for example:

const worldType = "number";

If you try to run the program again, you should see an error similar to this printed to the console that among other things says: "Error: Expected a string, got a number". Magic!

Before we move on further I will introduce a small helper function _assertType that will do the error throwing if types are not the same. It will be useful in the following examples

function _assertType(
  expected, actual
) {
  if (expected !== actual) {
    throw new Error(
      `Expected ${expected},` +
      `got ${actual}`
    );
  }
  return expected;
}

Control Flow

The program above is trivial. It has, for example, no conditionals. How would we deal with something like this:

function isTheAnswer(n: number): string {
  const result = (n === 42) ? "Yes" : "No";
  return result;
}

In regular execution of a program we would take one of the branches depending on the value of n passed to the function. For type checking of conditional expressions we need to take both branches and check it against the expected type. In the case above we check against the return type:

function isTheAnswer(n) {
  const resultType = _ternary(
    // n === 42
    _assertType(n, "number"),
    "string", // "Yes"
    "string" // "No"
  );
  
  const returnType = "string";
  return _assertType(
    returnType, resultType
  );
}

This is quite a bit to unpack, but same as with the + operator I had to replace a comparison, and a ternary expression with the functions that are defined a bit below. The main point to note here is that both side of the ternary and passed to the function for checking:

function _eqEq(leftType, rightType) {
  _assertType(leftType, rightType);
  return "boolean";
}

function _ternary(
  conditionType,
  thenType,
  elseType
) {
  _assertType("boolean", conditionType);
  return _assertType(thenType, elseType);
}

In TypeScript neither ternary nor comparisons are required to have the same type. I wish they did though. Pretty much every other language requires the same types in these situations.

The statement version of if works similarly, except that it doesn't require then/else branches to have the same type as there isn't really one. If we write out the same TypeScript source code as above like this:

function isTheAnswerIf(n: number): string {
  let result: string;
  if (n === 42) {
    result = "Yes";
  } else {
    result = "No";
  }
  return result;
}

The corresponding type checking code is then:

function isTheAnswerIf(n) {
  // let result: string
  let resultType = "string";
  
  // n === 42
  _eqEq(n, "number");
  
  resultType = // result = "Yes"
    _assertType(resultType, "string");
  resultType = // result = "No"
    _assertType(resultType, "string");
  
  const returnType = "string";
  return _assertType(returnType, resultType);
}

As we need to check both branches, the if itself usually disappears. The exception is the type refinement case that I will describe in the next section.

Switch statements behave the same as a sequence of if/else, that is the actual statement disappears, but all the branches are checked. I will not show the code, but as an exercise, you could try to write it on your own.

Unions and Conditional Type Refinement

Some languages allow you to specify a type that can be multiple things. You can say, for example, that something is either a string or a number:

function ensureString(
  x: string | number
): string {
  if (typeof x === "number") {
    return numberToString(x);
  }
  return x;
}

function numberToString(
  n: number
): string {
  // implementation not important 
  return "0";
}

ensureString("foo");
ensureString(42);

If we just disregard if statement as we did in the examples above, the checking would fail for ensureString("foo") as numberToString can only work on numbers. Instead, we keep the if, but remove typeof part as we already have the type:

function ensureString(xType) {
  const returnType = "string";
  if (xType === "number") {
    return _assertType(
      returnType,
      numberToString(xType)
    );
  }
  return _assertType(returnType, xType);
}

function numberToString(n) {
  _assertType("number", n); 
  return "string";
}

ensureString("string");
ensureString("number");

This works great, but I'm cheating a bit here. The problem with the code above is that I can not produce something equivalent to the following TypeScript snippet:

declare const stringOrNumber: string | number;
ensureString(stringOrNumber);

If you are not familiar with declare - it allows you to specify a type definition for the compiler without giving a value.

We could try to say ensureString("string|number"); but that would not work as we are using simple !== to check for type mismatches. To solve this we need to switch our type representation from simple strings to something a bit more complex. For unions, we will use arrays, such that string|number is represented as ["string", "number"].

We also need a function to check that types match to replace our !==. Same as the operator, the function will return a boolean instead of throwing and right away when types don't match as assertType does.

function _matchType(expected, actual) {
  // For simplicity turn both
  // types into "unions", even if
  // it only has one type
  if (!Array.isArray(actual)) {
    actual = [actual];
  }
  if (!Array.isArray(expected)) {
    expected = [expected];
  }
  // Ensure that every actual type
  // is one of the expected ones
  for (const e of expected) {
    if (!actual.includes(e)) {
      return false;
    }
  }
  return true;
}

We can now update _assertType to use this function, along with better printing of union types:

function _assertType(
  expected, actual
) {
  if (!_matchType(expected, actual)) {
    throw new Error(
      `Expected ${_typeToStr(expected)},` +
      `got ${_typeToStr(actual)}`
    );
  }
  return expected;
}

function _typeToStr(type) {
  if (Array.isArray(type)) {
    return type.join("|");
  }
  return type;
}

With this change in hand, we should now have a version of ensureString that would work for a union:

function ensureString(xType) {
  const returnType = "string";
  // For `if` type testing we pass
  // args in reverse order!
  if (_matchType(xType, "number")) {
    return _assertType(
      returnType,
      numberToString(xType)
    );
  }
  return _assertType(returnType, xType);
}

ensureString(["string", "number"]);

As noted in the code comment, there is an important nuance to the usage of _matchType for assertion versus branching. In assertions, we want to ensure that the actual type is a subset of what is allowed. For branching used for type refinement, the question is whether whatever was passed could be what we check for, this case number. This distinction influences the order of arguments.

Nominal Typing

The majority of the statically typed languages, like C++, Java, C#, Rust and Haskell used what is called nominal typing. In general, nominal typing allows you to say that one or more types are grouped into a record/struct/object and are now known under a specific name.

The most crucial property of nominal typing lies in the fact that even if you have another group with the same types under a different name, it will not match during type checking.

TypeScript does not have proper nominal typing. It is possible to fake it, but that would go too much in-depth on TypeScript specifics. In the following example, I will pretend that TypeScript does have nominal typing for classes. If you translate that code to something like Java it will produce the error we want:

class A {
  constructor(
    public x: number,
    public y: number
  ){}
}
class B {
  constructor(
    public x: number,
    public y: number
  ){}
}

const a = new A(1, 2);
// Not an error in TypeScript, but
// should be with nominal typing
const b: B = a;

To implement this in our type checking problem we first need to decide on the representation of the classes. Remember that when we are going to from for example, a number 42 in a source program we "lose" the value, and just keep the type - number. The same thing needs to happen with classes. So from something constructible with new to produce an object, we just get a single literal object representing a type:

const AType = {
  x: "number",
  y: "number",
};

const BType = {
  x: "number",
  y: "number",
};

const a = AType;
// Error thrown here:
const b = _assertType(BType, AType); 

Since objects in JavaScript are not == unless they are the same instance, we don't even need to change any code in _assertType. Running this code would throw an error like expected.

Duck Typing

The term duck typing comes from the duck test in reasoning: "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."

For type systems, this means that if some type has the same structure (properties, methods) as what we expect, we consider it to match. Hence, another name for this is structural typing.

TypeScript is one of few languages that almost exclusively uses structural type checking. To implement these checks we need to do basically the same thing as what we did for union types. Let's start with expanding actual.includes into a manual loop:

function _matchType(expected, actual) {
  // For simplicity turn both
  // types into "unions", even if
  // it only has one type
  if (!Array.isArray(actual)) {
    actual = [actual];
  }
  if (!Array.isArray(expected)) {
    expected = [expected];
  }
  // Ensure that every actual type
  // is one of the expected ones
  for (const e of expected) {
    let includes = false;
    for (const a of actual) {
      if (_eqType(e, a)) {
        includes = true;
        break;
      }
    }
    if (!includes) return false;
  }
  return true;
}

function _eqType(expected, actual) {
  return expected == actual;
}

The next step is to replace expected == actual with something that will iterate over the properties whenever necessary:

function _eqType(expected, actual) {
  // Take care of comparing a class
  // which will be "object" to a primitive
  // type that we have as "string"
  if (typeof expected !== typeof actual) {
    return false;
  }
  // structural matching
  if (typeof expected === "object") {
    for (let prop in expected) {
      if (!(prop in actual)) {
        return false;
      }
      // Properties can also be unions
      // or objects, so need to recurse
      if (!_matchType(
        expected[prop],
        actual[prop]
      )) {
        return false;
      }
    }
    return true;
  }
  // Fallback for primitive types
  return expected === actual;
}

If we run the same following code as we had in the nominal typing section, there is no longer an error as types are structurally the same:

const AType = {
  x: "number",
  y: "number",
};

const BType = {
  x: "number",
  y: "number",
};

const a = AType;
// All good!
const b = _assertType(BType, AType); 

Now, what if we want to compare types structurally in some cases but nominally in others? The easiest way is to do structural typing but introduce a hidden property that is always different between types. And autoincrementing integer works great, but in JavaScript we can do better by using Symbols:

const NominalTag = Symbol("NominalTag");
const AType = {
  x: "number",
  y: "number",
  [NominalTag]: Symbol("AType"),
};

const BType = {
  x: "number",
  y: "number",
  [NominalTag]: Symbol("BType"),
};

const a = AType;
// No error, what is going on?!
const b = _assertType(BType, AType); 

The code above is correct, however Symbols are not present in for (.. in ..) loops in JavaScript. Need to modify our _eqType a bit:

function _eqType(expected, actual) {
  if (typeof expected !== typeof actual) {
    return false;
  }
  if (typeof expected === "object") {
    // get both string and symbol keys
    const expectedProps = [
      ...Object.getOwnPropertyNames(expected),
      ...Object.getOwnPropertySymbols(expected)
    ];
    for (const prop of expectedProps) {
      
      // the rest is same as before
      if (!(prop in actual)) {
        return false;
      }
      if (!_matchType(
        expected[prop],
        actual[prop]
      )) {
        return false;
      }
    }
    return true;
  }
  return expected === actual;
}

Now we get our error as expected. I mentioned above that we could force nominal behavior in TypeScript. This is done by exp wrote above just on type level and with a terrible syntax:

declare const NominalTag:
  {readonly _: unique symbol}["_"];
class NominalA {
  constructor(
    public x: number,
    public y: number
  ){}
  
  [NominalTag]?:
    {readonly _: unique symbol}["_"]
}
class NominalB {
  constructor(
    public x: number,
    public y: number
  ){}
  
  [NominalTag]?:
    {readonly _: unique symbol}["_"]
}

const a = new NominalA(1, 2);
// Error, as expected
const b: NominalB = a;

Property Access

Now that we have types that have nested properties we need to figure out how accessing them works. Let's consider the following simple example:

type Foo = {
  bar: number;
  buzz: number;
}
const foo: Foo = {
  bar: 10,
  buzz: 32
};
const sum: number =
  foo.bar + foo.buzz;

Since the object types are also represented as objects we could also use . operator for access, and it would work for success case:

const FooType = {
  bar: "number",
  buzz: "number",
};
const foo = _assertType(
  FooType,
  {
    bar: "number",
    buzz: "number"
  }
);

// All good!
const sum = _assertType(
  "number",
  _plus(foo.bar, foo.buzz)
);

If we make a typo though, say bat instead of `bar, we would still get an error, but it will be one of the famously cryptic things about undefined as it is what you get for a missing property in JavaScript.

It would be better to have another helper that actually checks for existence of the property before access and gives a nice error:

const sum = _assertType(
  "number",
  _plus(
    // Error: Can not access property bat
    _prop(foo, "bat"),
    _prop(foo, "buzz")
  )
);

function _prop(obj, prop) {
  if (!(prop in obj)) {
    throw new Error(
      `Can not access property ${prop}`
    );
  }
  return obj[prop];
}

Generic Types

Many statically typed languages allow types that include other types. The simplest example is probably Array<T>. We can pass any type instead of T, allowing us to have Array<number>, Arrayor evenArray<Array>`.

Let's consider the following TypeScript snippet:

const numbers: Array<number> = [1, 2, 3];
// Error!
const strings: Array<string> = numbers;

On the type level implementing this is mostly trivial as it is just a function call that returns a nominal object type that includes our T parameter:

const Index = Symbol("Index");

const ArrayTag = Symbol("Array");
function ArrayType(T) {
  return {
    length: "number",
    [Index]: T,
    [NominalTag]: ArrayTag,
  };
}

const numbers = ArrayType("number");
// Error, as expected
const strings = _assertType(
  ArrayType("string"),
  numbers
);

This is all mostly what we have already seen before, but what is that [Index] thing? Well, code can index into an array with arr[12] syntax. To support that we need to know which type should the value be in that. In this case, the type is T. Let's see this in action considering the following TypeScript source:

const array: Array<number>
  = [1, 2, 3];
// All good, as expected
const n: number = array[1];

For our code, we will need to introduce a special _index helper as with other operators, but the rest is pretty straightforward:

const numbers = ArrayType("number");
// All good now!
const n = _assertType(
  "number",
  _index(numbers, "number")
);

function _index(array, indexType) {
  _assertType("number", indexType);
  // JavaScript allows indexing strings
  if (array === "string") {
    return "string";
  }
  if (typeof array !== "object") {
    throw new Error(
      `Can not index primitive types`
    );
  }
  // Check that the object is defined
  // to support indexing operation like
  // our ArrayType above
  if (!(Index in array)) {
    throw new Error(
      `Can not index type`
    );
  }
  // Return the underlying type
  return array[Index];
}

I'm not supporting type indexing on unions as it is very messy and causes a bunch of weird problems in TypeScript as well.

Generic types also can be part of a function signature. Let's start with the simplest generic function - identity:

function id<T>(x: T):T {
  return x;
}

Interestingly enough it is super simple in the type land:

function id(x) {
  // Return what was passed
  return x;
}

To make it a bit more interesting, we can make a generic function for pushing an item to an array:

Do not use this function in real projects - it has terrible performance!

function push<T>(
  array: Array<T>,
  item: T
) {
  array[array.length] = item;
  array.length = array.length + 1;
}

const numbers: Array<number>
  = [1, 2, 3];

// All good
push(numbers, 4);

// Error
push(numbers, "foo");

The code for the type version is going to get a bit messy, but there is actually nothing new at all:

function push(array, item) {
  // function signature check
  _assertType(array, ArrayType(item));
  
  // array[array.length] = item
  _assertType(
    _index(array, _prop(array, "length")),
    item
  );

  // array.length = array.length + 1;
  _assertType(
    _prop(array, "length"),
    _plus(_prop(array, "length"), "number")
  );
}

const numbers = ArrayType("number");
// All good!
push(numbers, "number");

// Error!
push(numbers, "string");

That is all there is regarding type checking of generic types. There are some specifics about searching the common types in cases of subclassing, but it is more object-oriented programming specific and also has more to do with type inference rather than type checking, so I will leave it for another time.

Recursion

In all the examples above, I was very careful not to have any recursive functions. The reason for it is not necessarily that it is hard to deal with the recursion. It's just that it makes things messy. Let's take a common implementation of Fibonacci numbers:

function fib(n: number): number {
  if (n === 0 || n === 1) {
    return 1;
  }
  return fib(n - 1) + fib(n - 2);
}

// All good
fib(5);

If we do the same translation we have been doing before we get:

function fib(n) {
  const returnType = "number";
  _assertType(n, "number");
  
  // if (n === 0 || n === 1) {
  _or(
    _eqEq(n, "number"),
    _eqEq(n, "number")
  );
  // return 1;
  _assertType(returnType, "number");

  // return fib(n - 1) + fib(n - 2);
  _assertType(
    returnType,
    _plus(
      fib(_minus(n, "number")),
      fib(_minus(n, "number"))
    )
  );
  
  return returnType;
}

// StackOverlfow Error
fib("number");

In the example above, _minus has the same implementation as _plus and _or takes two booleans and returns a boolean. I will leave them out for conciseness.

The problem though is if you try to run this code it will error out with a StackOverflow exception. Indeed, we are unconditionally calling fib inside of itself. The solution for this problem is pretty standard for detecting these kinds of recursion cycles - track the functions we have been in and short-circuit.

const CALL_STACK = [];

function fib(n) {
  const returnType = "number";
  _assertType(n, "number");

  // If we have been here before
  // just return the specified
  // return type right away
  if (CALL_STACK.includes(fib)) {
    return returnType;
  } else {
    // Add ourselves to stack
    CALL_STACK.push(fib);
  }

  // if (n === 0 || n === 1) {
  _or(
    _eqEq(n, "number"),
    _eqEq(n, "number")
  );
  // return 1;
  _assertType(returnType, "number");

  // return fib(n - 1) + fib(n - 2);
  _assertType(
    returnType,
    _plus(
      fib(_minus(n, "number")),
      fib(_minus(n, "number"))
    )
  );
  
  // Don't forget to remove
  // ourselves from the stack
  CALL_STACK.pop();

  return returnType;
}

// All good!
fib("number");

As you can see, it is just a global stack (array) with a couple of extra statements. Ideally, you want to add these checks to all the type functions because there are also cases of indirect recursion where a calls b who calls a back. It can be even more steps in-between, but the code above will catch them all.

Summary

That's it. The realization that type checking can be thought of as another program running at compile time was when things really "clicked" for me a few years ago. Hopefully, this article made the process of program compilation a bit less mysterious for you as well.

To make sure that you properly understand how this all works I would recommend you to play around with the examples in the article. Introduce deliberate type problems and see the errors. You could also try to convert some existing simple typed code to JavaScript type evaluation similar to the examples above.

You can get the final JavaScript type checking code here, and the original TypeScript code is available as well.

I'm not aware of any commercial compiler that would use the setup here, and there are many reasons why. The main one is probably the requirement to have some sort of runtime to run the type checking program. To do that in the same language would require it to work both in interpreted (without types!) and compiled modes. TypeScript / JavaScript is probably the closest thing we have.

Real-world compilers usually use systems that are closer to constraint/logic solvers than to the setup I've described here. Describing how they work is a story for another time.

Having said that, there is no reason one couldn't create a compiler that does type checking through evaluation. Maybe yours will be first?