Using Phantom Types to Create Builders Without Result or Unwrap in Rust

This blog post is a direct ripoff of a wonderful talk by Hayleigh Thompson at FOSDEM, which you can watch here:
https://www.youtube.com/watch?v=3lYHFctx2Ks
She also wrote a fabulous blog post about phantom types, which you can read here:
https://hayleigh-dot-dev.github.io/blog/phantom-types-in-gleam/


This blog post is about 70% plagiarism. It differs in two ways: 1. it specifically discusses Rust, where Hayleigh's content on this topic is mainly in a lovely programming language called Gleam, and 2. it doesn't extensively cover phantom types and their uses, instead focusing on one specific application: using phantom types to construct builder APIs for data structures intended to be serialized to and deserialized from some text format like JSON.

I'm gonna skip the intro and dive into an example. You're building up a request object with information about a meal order.

Entree: can be chicken, fish, steak, or tofu.
Soup: can be french onion or gumbo.
Salad: can be caesar or house.
Dessert: can be creme brulee, key lime pie, or coffee.

The object has a few conditionally required fields. You can purchase a "combo" which comes with a dessert and a soup or salad, or just an entree by itself.

So a request might look like

{
  "entree": "chicken",
  "soup": "gumbo",
  "dessert": "keyLimePie"
}

or

{
  "entree": "fish"
}

but

{
  "entree": "steak",
  "soup": "gumbo",
  "salad": "caesar"
}

is an invalid request for a number of reasons.
1. You can't have both a soup and a salad; it's pick one.
2. No dessert has been specified.

Modeling the request with untagged enums and serde.

You're a Rustacean, so you don't slap together a bunch of Options and call it a day. You decide to make invalid states unrepresentable, and start pumping out enums (using serde macros to serialize/deserialize correctly).

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
enum Entree {
    Chicken,
    Fish,
    Steak,
    Tofu,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
enum Dessert {
    KeyLimePie,
    CremeBrulee,
    Coffee,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
enum Soup {
    FrenchOnion,
    Gumbo,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
enum Salad {
    Caesar,
    House,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
enum SoupOrSalad {
    Soup(Soup),
    Salad(Salad),
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
#[serde(untagged)]
enum MealRequest {
    Combo {
        entree: Entree,
        #[serde(flatten)]
        side: SoupOrSalad,
        dessert: Dessert,
    },
    OnlyEntree {
        entree: Entree,
    }
}

You pat yourself on the back for being superior to your colleagues working in languages without tagged unions, and go get lunch while the project compiles.

So how is it to construct an object with this code?

let order = MealRequest::OnlyEntree {
    entree: Entree::Fish,
};
println!("{}", serde_json::to_string(&order).unwrap());
let complicated_order = MealRequest::Combo {
    entree: Entree::Chicken,
    side: SoupOrSalad::Soup(Soup::Gumbo),
    dessert: Dessert::KeyLimePie,
};
println!("{}", serde_json::to_string(&complicated_order).unwrap());

This prints:

{"entree":"fish"}
{"entree":"chicken","soup":"gumbo","dessert":"keyLimePie"}

You've made a nice data type that will automatically serialize/deserialize correctly. But let's acknowledge some drawbacks.

1. You have to import SO much stuff, just to build a very simple object.
2. Constructing the object requires having ALL the information at once. It's not possible to pass the object through a series of steps that enrich it with information.
3. The Rust code looks nothing like the JSON. It's conceptually difficult to understand.
4. Refactoring this code is likely to break the serde setup. (What would happen if I changed the order of the variants in MealRequest?)

You decide you want to create a builder, but you have some reservations. Setting up a builder will mean littering your code with Option and unwrap. Users of your builder will need to carefully read your documentation to make sure they don't crash the program, OR add error handling for a Result they know will never be an Err variant.
This is where phantom types can come to the rescue.
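Before getting to that, here's a rough sketch of the kind of builder I'm trying to avoid. Everything in it (RuntimeMealBuilder, BuildError and its variants) is made up for illustration and isn't from Hayleigh's talk or any library. It's just one plausible way the Option-and-Result version could look.

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Entree { Chicken, Fish, Steak, Tofu }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Soup { FrenchOnion, Gumbo }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Salad { Caesar, House }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Dessert { CremeBrulee, KeyLimePie, Coffee }

#[derive(Debug, PartialEq)]
struct Meal {
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>,
}

// Hypothetical error type: every rule the compiler isn't checking
// becomes a variant the caller has to handle at runtime.
#[derive(Debug, PartialEq)]
enum BuildError {
    SoupAndSalad,
    ComboMissingDessert,
    ComboMissingSide,
}

// Hypothetical runtime-checked builder: any setter can be called in any state.
struct RuntimeMealBuilder {
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>,
}

impl RuntimeMealBuilder {
    fn new(entree: Entree) -> Self {
        Self { entree, soup: None, salad: None, dessert: None }
    }

    fn set_soup(mut self, soup: Soup) -> Self {
        self.soup = Some(soup);
        self
    }

    fn set_salad(mut self, salad: Salad) -> Self {
        self.salad = Some(salad);
        self
    }

    fn set_dessert(mut self, dessert: Dessert) -> Self {
        self.dessert = Some(dessert);
        self
    }

    // All of the validation happens here, when the program is already running.
    fn build(self) -> Result<Meal, BuildError> {
        let has_soup = self.soup.is_some();
        let has_salad = self.salad.is_some();
        let has_dessert = self.dessert.is_some();
        if has_soup && has_salad {
            return Err(BuildError::SoupAndSalad);
        }
        match (has_soup || has_salad, has_dessert) {
            (true, false) => Err(BuildError::ComboMissingDessert),
            (false, true) => Err(BuildError::ComboMissingSide),
            _ => Ok(Meal {
                entree: self.entree,
                soup: self.soup,
                salad: self.salad,
                dessert: self.dessert,
            }),
        }
    }
}

fn main() {
    // The caller knows this one is fine, but still has to deal with the Result.
    let meal = RuntimeMealBuilder::new(Entree::Steak)
        .set_soup(Soup::Gumbo)
        .set_dessert(Dessert::Coffee)
        .build();
    println!("{:?}", meal);

    // This mistake compiles happily and only shows up at runtime.
    let bad = RuntimeMealBuilder::new(Entree::Fish).set_salad(Salad::House).build();
    println!("{:?}", bad);
}
```

The rules are all there, but they live in the body of build instead of in the method signatures, so neither the compiler nor your editor can tell you about a mistake before the program runs.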


What the Hell Is a Phantom Type?

The idea behind a phantom type is simply to have a type parameter which isn't used at runtime. It may seem strange at first to include an unnecessary generic, but phantom types allow us to do something really cool: change the compile-time constraints of a program without changing the runtime representation.
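As a tiny illustration, here's a sketch using some made-up names (Input, Raw, Sanitized, and store are all hypothetical, not from the talk). The marker types are never constructed; they only exist so the compiler can tell an Input<Raw> from an Input<Sanitized>.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker types: they carry no data and are never instantiated.
#[allow(dead_code)]
struct Raw;
#[allow(dead_code)]
struct Sanitized;

// `State` is the phantom parameter. Only `text` exists at runtime.
struct Input<State> {
    text: String,
    _state: PhantomData<State>,
}

impl Input<Raw> {
    fn new(text: &str) -> Self {
        Input { text: text.to_string(), _state: PhantomData }
    }

    // Consumes the raw input and hands back the same data under a new type.
    fn sanitize(self) -> Input<Sanitized> {
        Input { text: self.text.trim().to_lowercase(), _state: PhantomData }
    }
}

// Only sanitized input is accepted; passing an Input<Raw> is a type error.
fn store(input: &Input<Sanitized>) -> usize {
    input.text.len()
}

fn main() {
    let raw = Input::<Raw>::new("  Gumbo ");
    // store(&raw); // does not compile: the types don't match
    let clean = raw.sanitize();
    println!("stored {} bytes", store(&clean));

    // The phantom parameter adds nothing to the runtime representation.
    assert_eq!(size_of::<Input<Raw>>(), size_of::<String>());
    assert_eq!(size_of::<Input<Sanitized>>(), size_of::<String>());
}
```

PhantomData is a zero-sized type from the standard library, which is why both versions of Input are exactly as big as the String they wrap.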

The purpose of this blog post is not to explain phantom types, so if you need an intro/refresher, I highly recommend checking out one of the links mentioned at the top of this post. (Given you're reading this blog post, the syntax of Gleam will be incredibly straightforward, and you probably need no intro to read the post or watch the video. Other content on the topic tends to be in more foreign-looking languages like Elm and Haskell.)

A Builder Using Phantom Types in Rust

I'm gonna paste a bunch of code here, and then we'll talk about it, but feel free to create an empty Rust project, paste in the code, and play with it a little.

main.rs

mod meal;
use crate::meal::{Dessert, Entree, Meal, Salad, Soup};


fn main() {
    // The idea here is to use phantom types to define required and conditionally required optional fields
    // in order to achieve a builder API that has no `.unwrap()` step at runtime.

    // In this example, we are building a `Meal`. There are two pricing options: just the entree OR a three course meal.
    // Just the entree means there should only be an entree.
    // For the full course meal, there must be a dessert AND either a soup OR a salad.

    // Ways to use the builder

    // Just an entree is valid
    let meal: Meal = meal::new(Entree::Fish).build();
    println!("{:?}", meal);

    // An entree, dessert, and soup is valid
    let meal: Meal = meal::new(Entree::Steak)
        .set_dessert(Dessert::Coffee)
        .set_soup(Soup::Gumbo)
        .build();
    println!("{:?}", meal);

    // An entree, dessert, and salad is valid
    let meal: Meal = meal::new(Entree::Steak)
        .set_salad(Salad::Caesar)
        .set_dessert(Dessert::Coffee)
        .build();
    println!("{:?}", meal);

    // Compiler errors (uncomment them to see how helpful they are)

    // Trying to set both a soup and salad is invalid
    // let meal: Meal = meal::new(Entree::Steak).set_dessert(Dessert::Coffee).set_salad(Salad::Caesar).set_soup(Soup::Gumbo).build();
    // let meal: Meal = meal::new(Entree::Steak).set_dessert(Dessert::Coffee).set_soup(Soup::Gumbo).set_salad(Salad::Caesar).build();

    // Trying to build a meal with just a salad is invalid. Uncomment this and see how the type
    // error specifies our builders requirements exactly.
    // let meal: Meal = meal::new(Entree::Fish).set_salad(Salad::House).build();
}

meal.rs

//! Demonstration of how to use phantom types to avoid
//! having to unwrap builders at runtime.

use std::marker::PhantomData;

#[derive(Debug)]
pub enum Entree {
    Chicken,
    Fish,
    Steak,
    Tofu,
}

#[derive(Debug)]
pub enum Soup {
    FrenchOnion,
    Gumbo,
}

#[derive(Debug)]
pub enum Salad {
    Caesar,
    House,
}

#[derive(Debug)]
pub enum Dessert {
    CremeBrulee,
    KeyLimePie,
    Coffee,
}

pub struct MealOptions<PDessert, PSoupOrSalad> {
    // phantom fields
    phantom_dessert: PhantomData<PDessert>,
    phantom_soup_or_salad: PhantomData<PSoupOrSalad>,

    // regular fields
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>
}

#[derive(Debug)]
pub struct Meal {
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>
}

pub struct DessertUnselected;
pub struct DessertSelected;

pub struct SoupOrSaladNotSet;
pub struct SoupSet;
pub struct SaladSet;

pub fn new(entree: Entree) -> MealOptions<DessertUnselected, SoupOrSaladNotSet> {
    MealOptions {
        phantom_dessert: PhantomData,
        phantom_soup_or_salad: PhantomData,
        entree,
        soup: None,
        salad: None,
        dessert: None
    }
}

impl MealOptions<DessertUnselected, SoupOrSaladNotSet> {
    pub fn build(self) -> Meal {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup,
            salad,
            dessert
        } = self;
        Meal {
            entree,
            soup,
            salad,
            dessert
        }
    }
}

impl<T> MealOptions<DessertUnselected, T> {
    pub fn set_dessert(self, dessert: Dessert) -> MealOptions<DessertSelected, T> {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup,
            salad,
            dessert: _
        } = self;
        let ret: MealOptions<DessertSelected, T> = MealOptions {
            phantom_dessert: PhantomData,
            phantom_soup_or_salad: PhantomData,
            entree,
            soup,
            salad,
            dessert: Some(dessert)
        };
        ret
    }
}

impl<T> MealOptions<T, SoupOrSaladNotSet> {
    pub fn set_soup(self, soup: Soup) -> MealOptions<T, SoupSet> {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup: _,
            salad,
            dessert,
        } = self;
        let ret: MealOptions<T, SoupSet> = MealOptions {
            phantom_dessert: PhantomData,
            phantom_soup_or_salad: PhantomData,
            entree,
            soup: Some(soup),
            salad,
            dessert,
        };
        ret
    }

    pub fn set_salad(self, salad: Salad) -> MealOptions<T, SaladSet> {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup,
            salad: _,
            dessert,
        } = self;
        let ret: MealOptions<T, SaladSet> = MealOptions {
            phantom_dessert: PhantomData,
            phantom_soup_or_salad: PhantomData,
            entree,
            soup,
            salad: Some(salad),
            dessert,
        };
        ret
    }
}

impl MealOptions<DessertSelected, SoupSet> {
    pub fn build(self) -> Meal {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup,
            salad,
            dessert,
        } = self;
        Meal {
            entree,
            soup,
            salad,
            dessert,
        }
    }
}

impl MealOptions<DessertSelected, SaladSet> {
    pub fn build(self) -> Meal {
        let Self {
            phantom_dessert: _,
            phantom_soup_or_salad: _,
            entree,
            soup,
            salad,
            dessert
        } = self;
        Meal {
            entree,
            soup,
            salad,
            dessert
        }
    }
}

Seriously, paste these two files into a clean Rust project and explore the experience of using the builder from inside main.

We've created a builder in which the type system ensures the appropriate methods have been called before being allowed to call build. There's a bit of boilerplate, but the important thing to grasp is the signatures:

pub fn new(entree: Entree) -> MealOptions<DessertUnselected, SoupOrSaladNotSet>

We can create a new MealOptions (our builder), whose type indicates that the dessert and the soup/salad fields are unset.

impl MealOptions<DessertUnselected, SoupOrSaladNotSet> {
    pub fn build(self) -> Meal {
        ... some code
    }
}

Given we have a MealOptions with those fields unset, we can create a meal. This is our "just the entree" option.

impl<T> MealOptions<DessertUnselected, T> {
    pub fn set_dessert(self, dessert: Dessert) -> MealOptions<DessertSelected, T> {
        ...some code
    }
}

Given we have a MealOptions where the dessert is not selected, we should be able to call a method called set_dessert to, well, set the dessert. Notice the type signature of the returned MealOptions now says DessertSelected.

impl<T> MealOptions<T, SoupOrSaladNotSet> {
    pub fn set_soup(self, soup: Soup) -> MealOptions<T, SoupSet> {
        ... some code
    }

    pub fn set_salad(self, salad: Salad) -> MealOptions<T, SaladSet> {
        ... some code
    }
}

Given we have a MealOptions where the soup or salad is not set, we should be able to call either set_soup or set_salad. Notice again the type signature changes. Once we call either method, we no longer have SoupOrSaladNotSet as one of our types, and lose access to both of them.

impl MealOptions<DessertSelected, SoupSet> {
    pub fn build(self) -> Meal {
        ... some code
    }
}

impl MealOptions<DessertSelected, SaladSet> {
    pub fn build(self) -> Meal {
        ... some code
    }
}

Given we have a MealOptions<DessertSelected, SoupSet> or a MealOptions<DessertSelected, SaladSet>, we should be able to call build and get our Meal struct out.
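If the repeated destructuring and the three copies of build bother you, here's one possible way to trim them down. This is a sketch, not the approach from the talk: the Complete marker trait and the private retag helper are names I'm inventing here, and I've merged the two phantom fields into a single PhantomData<(D, S)>. The idea is to list the buildable state pairs once, as trait impls, and write build a single time.

```rust
use std::marker::PhantomData;

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Entree { Chicken, Fish, Steak, Tofu }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Soup { FrenchOnion, Gumbo }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Salad { Caesar, House }
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Dessert { CremeBrulee, KeyLimePie, Coffee }

#[derive(Debug, PartialEq)]
pub struct Meal {
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>,
}

pub struct DessertUnselected;
pub struct DessertSelected;
pub struct SoupOrSaladNotSet;
pub struct SoupSet;
pub struct SaladSet;

pub struct MealOptions<D, S> {
    _state: PhantomData<(D, S)>,
    entree: Entree,
    soup: Option<Soup>,
    salad: Option<Salad>,
    dessert: Option<Dessert>,
}

// Hypothetical marker trait: implemented only for state pairs that may be built.
pub trait Complete {}
impl Complete for (DessertUnselected, SoupOrSaladNotSet) {}
impl Complete for (DessertSelected, SoupSet) {}
impl Complete for (DessertSelected, SaladSet) {}

pub fn new(entree: Entree) -> MealOptions<DessertUnselected, SoupOrSaladNotSet> {
    MealOptions { _state: PhantomData, entree, soup: None, salad: None, dessert: None }
}

impl<D, S> MealOptions<D, S> {
    // Hypothetical private helper: same runtime data, different phantom state.
    fn retag<D2, S2>(self) -> MealOptions<D2, S2> {
        MealOptions {
            _state: PhantomData,
            entree: self.entree,
            soup: self.soup,
            salad: self.salad,
            dessert: self.dessert,
        }
    }
}

impl<S> MealOptions<DessertUnselected, S> {
    pub fn set_dessert(self, dessert: Dessert) -> MealOptions<DessertSelected, S> {
        MealOptions { dessert: Some(dessert), ..self.retag::<DessertSelected, S>() }
    }
}

impl<D> MealOptions<D, SoupOrSaladNotSet> {
    pub fn set_soup(self, soup: Soup) -> MealOptions<D, SoupSet> {
        MealOptions { soup: Some(soup), ..self.retag::<D, SoupSet>() }
    }

    pub fn set_salad(self, salad: Salad) -> MealOptions<D, SaladSet> {
        MealOptions { salad: Some(salad), ..self.retag::<D, SaladSet>() }
    }
}

// One `build`, available only when the state pair implements `Complete`.
impl<D, S> MealOptions<D, S>
where
    (D, S): Complete,
{
    pub fn build(self) -> Meal {
        Meal { entree: self.entree, soup: self.soup, salad: self.salad, dessert: self.dessert }
    }
}

fn main() {
    let meal = new(Entree::Steak)
        .set_salad(Salad::Caesar)
        .set_dessert(Dessert::Coffee)
        .build();
    println!("{:?}", meal);
    // new(Entree::Fish).set_salad(Salad::House).build(); // does not compile
}
```

The trade-off is that an invalid build call now fails on an unsatisfied trait bound rather than a missing method, so the compiler message may read a little differently from the version above. Try both and see which one you'd rather hand to your users.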

Let's discuss the pros and cons of this builder.


Pros:
1. Because the type system determines which methods are valid at any step, if it compiles it's a valid request and
2. the LSP can guide you through it and
3. there's no unwrap.
4. The Meal struct produced at the end looks exactly like the JSON of the request, no fancy serde magic required, just a simple #[derive(Serialize, Deserialize)].

Cons:
1. We don't get any of this safety when deserializing, only when building the struct up with our builder.
2. In the same vein, once the Meal struct is produced, we lose safety guarantees. You can use the MealOptions as the request to maintain this safety wherever it's used, but you'll need to write custom Serialize and Deserialize impls.

So when should I actually write code like this?

I think the phantom builder pattern makes a lot of sense when building clients. You can use the produced "simple" type internally without exposing the risk to your users, and provide a nice and easily discoverable builder API.
I think the serde + enums approach makes sense on the server (in your handlers), as it provides serde's automatic serialization/deserialization and encodes your business logic into the type more completely, preventing lots of calls to `unwrap` on the server for fields you know should or should not be present after an error check.

Or as an industry we could standardize around a message format that isn't JSON. Just sayin...
