Feb 1, 2023

Quasi-mathematical function signatures aka how to write cleaner functions

If you're getting this post for the second time, it's because the first version contained a bug (credit to Peyton for pointing out my example made no sense). In the words of Hannah Montana, everybody has those days.

The goal of this post is to highlight a "code smell" that I try to clean up when I can, and that you've almost certainly encountered if you've been programming for a while. I'm going to call it "Domain Filtering". But before diving into it, we need to learn a wee little bit of math terminology.

A function in math is a very specific thing. You can think of it as a mapping from one set to another set.

[Diagram from Wikipedia: a function mapping each item in one set to an item in another set]

Functions make a really specific promise: "Choose any item in my list and I will point you to exactly one related item in another list". That's it. The list of items you can choose from is called the "domain" and the list of related items your result resides in is called the "codomain". You may have been expecting me to say "range", but these are subtly different. The range is the collection of items that actually come out of the function. The codomain is the larger set, containing the range, that the function promises to map into. For example, in our diagram, the range is the set {D, C} and the codomain is the set {D, B, C, A}. Make sense?
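If it helps to see those three terms in code, here is a rough sketch in Rust. It assumes the diagram maps 1 to D and both 2 and 3 to C; the names `Input`, `Output`, `f`, and `range` are made up for illustration.

```rust
use std::collections::BTreeSet;

/// The domain: every value the function accepts (assumed from the diagram).
#[derive(Debug, Clone, Copy)]
pub enum Input {
    One,
    Two,
    Three,
}

/// The codomain: every value the function is allowed to return.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Output {
    A,
    B,
    C,
    D,
}

/// A function in the math sense: each input maps to exactly one output.
pub fn f(x: Input) -> Output {
    match x {
        Input::One => Output::D,
        Input::Two => Output::C,
        Input::Three => Output::C,
    }
}

/// The range: the outputs that actually come out when we try every input.
pub fn range() -> BTreeSet<Output> {
    [Input::One, Input::Two, Input::Three]
        .iter()
        .copied()
        .map(f)
        .collect()
}
```

The compiler enforces the "exactly one related item" promise here: the `match` has to cover every `Input`, and each arm has to produce an `Output`. A and B are in the codomain but never show up in the range.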

So wtf does this have to do with programming? Let's look at a very simple Rust function signature:

pub fn divide(n: i64, d: i64) -> i64 {
	// The code to perform division
}

This function signature is making you a promise just like our diagram. It's saying "if you give me two 64-bit signed integers, I will direct you to a 64-bit signed integer". The domain is the set of all possible pairs of 64-bit signed integers, and the codomain is the set of 64-bit signed integers.

I hope you've noticed by now that this function is lying to you. It's making promises it can't keep, because of course this function has a classic edge case: what about dividing by zero? If we were to implement this function, we would have to do something like the following:

pub fn divide(n: i64, d: i64) -> Result<i64, SomeErrorType> {
	if d == 0 {
		// Handle the error
		return Err(SomeErrorType);
	}

	// code for doing division
	Ok(n / d)
}

Now I'm making a promise I can keep, but my code is uglier and more complicated. I've had to introduce an error handling construct for the case where someone misuses my function. The original sin is still in place. I'm lying about my domain!
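To make that cost concrete, here is a minimal complete version of the `Result` approach. `DivideError` is a hypothetical stand-in for `SomeErrorType`, and `average_of_pair` is an invented caller that exists only to show what the error type does to code that uses `divide`.

```rust
/// Hypothetical error type standing in for `SomeErrorType`.
#[derive(Debug, PartialEq, Eq)]
pub enum DivideError {
    DivisionByZero,
}

pub fn divide(n: i64, d: i64) -> Result<i64, DivideError> {
    if d == 0 {
        return Err(DivideError::DivisionByZero);
    }
    // Note: i64::MIN / -1 would still overflow and panic;
    // this sketch only covers the d == 0 case.
    Ok(n / d)
}

/// Invented caller: it knows its divisor is never zero,
/// but it still has to do something with the error branch.
pub fn average_of_pair(a: i64, b: i64) -> i64 {
    match divide(a + b, 2) {
        Ok(avg) => avg,
        Err(DivideError::DivisionByZero) => unreachable!("2 is never zero"),
    }
}
```

The error handling leaks into every call site, including the ones that can never trigger it.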

I need a way to be able to say d must not be zero. I could create a struct called NonZeroInteger as a way of separating out the 0 check into the constructor for the struct, but that adds a lot of code to our code base. I personally like a crate called contracts, which allows using macros to add runtime checks to functions. The example below asserts at runtime (in debug builds, since it uses the debug_ variant of the macro) that nobody calls divide with d == 0.

use contracts::debug_requires;

#[debug_requires(d != 0, "We can't divide by zero!")]
pub fn divide(n: i64, d: i64) -> i64 {
	n / d
}

Now we're getting somewhere. Rather than add lots of complicated code to deal with a developer passing in an argument I don't want (0), we can add a single line and be reasonably certain nobody will misuse this function (anyone reading this function will know not to use 0, even if they skip the documentation).
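For comparison, the NonZeroInteger struct mentioned earlier could look roughly like this. It's a sketch under assumptions, not a recommendation: the type and its `new` and `get` methods are made up for illustration, and the point is how much extra code it takes to move the zero check into a constructor.

```rust
/// Hypothetical wrapper: a value of this type is never zero.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NonZeroInteger(i64);

impl NonZeroInteger {
    /// The only place the zero check happens.
    pub fn new(value: i64) -> Option<Self> {
        if value == 0 {
            None
        } else {
            Some(Self(value))
        }
    }

    /// Get the wrapped integer back out.
    pub fn get(self) -> i64 {
        self.0
    }
}

/// The signature now tells the truth about its domain: `d` cannot be zero.
pub fn divide(n: i64, d: NonZeroInteger) -> i64 {
    n / d.get()
}
```

The check doesn't disappear. It moves to wherever a `NonZeroInteger` gets constructed, which is usually the one place that actually knows whether zero is possible.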

Side note: It'd be really nice if programming languages had a built in NonZeroNumber type, it comes up surprisingly often.
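Worth noting: Rust's standard library does ship a family of these in std::num, including NonZeroI64. Here is a minimal sketch of divide using it; the function itself is still the running example from this post, not something the standard library provides.

```rust
use std::num::NonZeroI64;

/// `d` is guaranteed non-zero by its type, so there is no zero check here.
pub fn divide(n: i64, d: NonZeroI64) -> i64 {
    n / d.get()
}
```

`NonZeroI64::new` returns an `Option`, so a caller writes something like `NonZeroI64::new(4).map(|d| divide(12, d))` and decides for itself what a zero divisor should mean.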

You'll notice that because I don't have to handle an error case, my code is much nicer looking, all because I was willing to admit what my function was actually capable of. In my mind, this is one of the big benefits of an expressive type system: being able to write functions that tell the truth about their domains and codomains, at least most of the time. Rust may not have solved our 0 problem right off the bat with its built-in primitive types, but it solves much more common domain problems around questions like: Is this function argument ever null (Rust doesn't have null, so no)? Is this reference still valid (lifetimes)? Can someone change the state of this argument out from under me while the function is executing (only one &mut, so no)? Etc.
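A small sketch of what those guarantees look like in signatures. The function names are invented for illustration.

```rust
/// `name` is a `&str`, so it always refers to a valid string.
/// There is no null case to filter out in the body.
pub fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

/// When "no value" is a legitimate input, the signature has to say so.
pub fn greeting_or_default(name: Option<&str>) -> String {
    match name {
        Some(n) => greeting(n),
        None => greeting("stranger"),
    }
}

/// `&mut` is exclusive: nothing else can read or write `total`
/// while this function is running.
pub fn add_to(total: &mut i64, amount: i64) {
    *total += amount;
}
```

In each case the domain is spelled out in the types, so the function body has nothing to filter.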

But static typing isn't the only approach to this.

Many dynamically typed functional languages with pattern matching let you attach guards to function signatures, not just to ordinary pattern-matching expressions. Let's try implementing our function in Elixir.

def divide(n, d) when d != 0, do: n / d

This function is very different from our Rust function, as it performs float division rather than integer division, but it's just here to demonstrate handling the domain problem of division by zero. We've specified that d cannot be zero in the function signature instead of having to handle it in the function body; calling divide with d equal to 0 raises a FunctionClauseError because no clause matches. Pretty neat.

When you write functions in your programming language of choice, do you do a lot of "domain filtering" at the beginning of each function? Are you constantly checking for null, empty strings or lists, or even if your array is actually an array or a map this time (cough PHP cough)? I would encourage you to think about how you can solve this. Can you lift these checks out of the function body into the function signature? If not, can you cleanly separate the domain filtering logic out so the meaning of the function is clear to anyone who reads it? Most importantly, can you avoid introducing complex error handling logic to handle edge cases you know as a programmer should never come up, while ensuring other programmers don't misuse your functions?
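As one hedged example of lifting a check into the signature, here is the "is this list empty?" case in Rust. The name `max_of` and the first-plus-rest shape are one possible approach chosen for illustration, not the only way to do it.

```rust
/// Taking the first element separately means the input can never be empty,
/// so there is no empty-list check in the body and no `Option` in the
/// return type.
pub fn max_of(first: i64, rest: &[i64]) -> i64 {
    rest.iter().copied().fold(first, i64::max)
}
```

A caller holding a plain slice can use `split_first` to get the `(first, rest)` pair, which puts the empty-list decision at the call site, where the programmer usually knows whether it can happen.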

Happy programming :)
