Planting a tree in your rust binary

Last month I gave a talk at rustdelhi where I showcased my sdk-esque crate dmrc-rs. The crate pre-computes all the possible journeys in the Delhi Metro network at compile time and then embeds the entire tree into the user’s binary.

You can give it a watch on youtube.

Because of the time constraint, I heavily simplified how the embedding works. This post dives deep into the challenges one would face and the plumbing needed to get this construction working. It covers how to achieve it and the thought process behind every decision.

You can check out the example code over at github.com/keogami/keogami.

Getting Started

To keep things simple, we will start off with a simple structure with a dummy “generation” function. Assume that the following structure is hard to compute.

pub struct EmbeddedData {
    pub simple_string: String,
    pub simple_number: u32,
}

impl Default for EmbeddedData {
    fn default() -> Self {
        // super duper expensive operation
        Self {
            simple_string: "Our embedded string".into(),
            simple_number: 0xEFBEADDE,
        }
    }
}

Note

The funny looking number 0xEFBEADDE will make sense later.

The very first piece of the puzzle is figuring out how to perform the computation at compile time.

`build.rs`

Rust provides us with a really handy tool to run arbitrary code during cargo build: the build script. Crates like prost use it for code generation, and built uses it to grab cargo metadata during the build. This serves our purpose well.

You can read more about it in the cargo reference. It works by creating a build.rs file with a fn main() that can do everything a normal rust program can do. Cargo also exposes useful env vars like OUT_DIR, where we can store files that can later be picked up by our actual main.rs.

OUT_DIR is a directory path in the target/ dir that is shared between the build script and main.rs.

fn main() {
    let data_to_embed = EmbeddedData::default();
    let bytes = serialize(data_to_embed);
    let dest = create_path_using("OUT_DIR");

    fs::write(dest, bytes).unwrap();
}

The second piece is to figure out how to include the generated bytes into our main.rs.

`include_bytes!()`

Rust also provides us with a macro called include_bytes!, which does exactly as it says on the tin. It includes the bytes from a specified file into the source code. This, when used in conjunction with static, allows us to include files into our binary.

// include_bytes! yields a &'static [u8; N], so we dereference it to get the array
static OUR_DATA: [u8; include_bytes!(..).len()] = *include_bytes!(..);
// we will fill in the dots later.

fn main() {
    // line intentionally blank
}

Then we can just deserialize and be done.

static OUR_DATA: [u8; include_bytes!(..).len()] = *include_bytes!(..);
// we will fill in the dots later.

fn main() {
    // deserialize is a placeholder for whichever format we end up using
    let data: EmbeddedData = deserialize(&OUR_DATA).unwrap();
}

However, we quickly encounter a hurdle with this approach. We cannot share the type EmbeddedData between build.rs and main.rs that easily.

Circular Dependency

To understand the issue, we must understand the relationship between build.rs and main.rs.

Tip

I encourage you to try building the construction described above and check out the actual error spat out by the rust compiler.

During the build, the build script is run before compilation of main.rs. If we define the struct in our main.rs, then build.rs will need to depend on main.rs, creating what’s called a Circular Dependency.

+---------+                                           +----------+
| main.rs | ----[depends on the serialized data]----> | build.rs |
+---------+                                           +----------+
    ^                                                      |
    +----------[depends on the struct definition]----------+

Note how the dependencies form a circle.

To resolve this cycle, we can define the struct in build.rs and then use it in main.rs. This does fix the issue, but there’s no clean way of doing it. It forces us to use hacky stuff like accessing the AST for our struct and then writing it down as text in OUT_DIR. Essentially, we would be doing code generation for a static bit of code. Disgusting.

Shared crate

The solution I have opted for is to create a shared crate, have both build.rs and our main package depend on it, and define the struct in the shared lib.rs.

project
├── build.rs
├── Cargo.toml
├── shared
│   ├── Cargo.toml
│   └── src
│       └── lib.rs
└── src
    └── main.rs

We then set up our dependencies like so.

[package]
name = "project"
# ...snipped...

[dependencies]
shared = { path = "./shared" }
# ...snipped...

[build-dependencies]
shared = { path = "./shared" }
# ...snipped...

And with that, the cyclic dependency is resolved by having a common denominator. We can now turn our attention to the holes we left and make things more concrete.
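For reference, here is a minimal sketch of what shared/src/lib.rs could hold at this point: the struct and its placeholder Default impl from earlier, moved over as they were. The Debug and PartialEq derives are my additions for convenience, and the rkyv derives are left out until later.

```rust
// shared/src/lib.rs (sketch)

/// The data we want to compute once, at build time.
/// The derives here are an addition for illustration.
#[derive(Debug, PartialEq)]
pub struct EmbeddedData {
    pub simple_string: String,
    pub simple_number: u32,
}

impl Default for EmbeddedData {
    fn default() -> Self {
        // stand-in for the super duper expensive operation
        Self {
            simple_string: "Our embedded string".into(),
            simple_number: 0xEFBEADDE,
        }
    }
}
```

Both build.rs and main.rs can then pull the type in with use shared::EmbeddedData.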

Generating the path to store our serialized data

It is quite straightforward for the build script.

use std::path::Path;

use shared::EmbeddedData;

fn main() {
    // ...snipped...

    // recall that OUT_DIR is where we can keep our generated files
    let output_path = std::env::var("OUT_DIR").unwrap();
    let output_path = Path::new(&output_path).join("serialized_data.dat");

    // ...snipped...
}

Here, serialized_data.dat can be named anything. We just need to update our main.rs accordingly.

static OUR_DATA: [u8; include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat")).len()] =
    *include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat"));

fn main() {
    // ...snipped...
}

Oh! uh- now that’s a bit complicated. Let’s break this down a bit. Starting from the right hand side.

include_bytes!("path/to/file");
// we know what this is

concat!("literally.", "anything.", "here");
// this macro concatenates all its inputs.
// the above will resolve to "literally.anything.here" as a single string

env!("OUT_DIR");
// this macro grabs the env variable during compile time, as a single string

Putting all of these together, we are doing the same thing we did in the build script, but at compile time.

On the left hand side, we are defining our static variable, which requires that we fully define the type. Additionally, we need to specify the exact length of the byte array.

static OUR_DATA: [u8; include_bytes!(..).len()];
// To figure out the length, we simply include the bytes and then check the length

And with this we are ready to actually start using OUR_DATA.
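One detail is worth trying in isolation: include_bytes! evaluates to a reference to an array, &'static [u8; N], so storing the array itself in a static takes a dereference. In this sketch a byte-string literal stands in for include_bytes!, since it has the same type. RAW, its contents, and our_data are made up for illustration.

```rust
// A byte-string literal has the same type as the result of include_bytes!:
// a reference to a fixed-size array, &'static [u8; N].
const RAW: &[u8; 5] = b"hello";

// `.len()` is usable in a const context, so it can appear inside the type.
// The `*` copies the array out from behind the reference.
static OUR_DATA: [u8; RAW.len()] = *RAW;

/// Returns the embedded bytes as a slice.
fn our_data() -> &'static [u8] {
    &OUR_DATA
}
```

Dropping the `*` makes the types disagree ([u8; 5] on the left, &[u8; 5] on the right), and the compiler rejects it.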

Side Quest: share the filename

One nitpick I have with the above approach is that we need to specify serialized_data.dat in two places, which goes against the DRY principle.

This doesn’t affect the behavior. So if this doesn’t matter to you, feel free to skip to the next section.

Now, we can’t just create a variable and share it using our shared crate. Because of course we can’t.

include_bytes! can’t accept anything but a string literal, or a built-in macro such as concat! or env! that expands to one. This is because macros are expanded before const evaluation, so the value of a const or static is not yet known when include_bytes! needs its path. Moreover, include_bytes! is a special macro implemented directly in the compiler, and it yields an expression of type &'static [u8; N].

Bottom line is that we can’t just create a variable. So I have decided to use a macro instead. How fun.

Luckily, it’s a very simple macro.

#[macro_export]
macro_rules! ARCHIVE_NAME {
    () => {
        // subtle foreshadow
        "archived_bytes.rkyv"
    };

    ($env_name:literal) => {
        concat!(env!($env_name), "/", ARCHIVE_NAME!())
    };
}

// USAGE

ARCHIVE_NAME!(); // expands to "archived_bytes.rkyv"
ARCHIVE_NAME!("OUT_DIR"); // expands to what?

Now it’s trivial to go back and update our use sites, and the compiler can have our back and make sure we don’t make a typo.
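As a rough sketch of what the updated use sites could look like: the build script reads OUT_DIR at run time, so it only needs the zero-argument form, while main.rs uses the one-argument form inside include_bytes!. The macro is repeated here so the sketch stands on its own, and archive_path is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

// Same macro as above, repeated so the sketch is self-contained.
// In the real project it lives in the shared crate.
macro_rules! ARCHIVE_NAME {
    () => {
        "archived_bytes.rkyv"
    };

    ($env_name:literal) => {
        concat!(env!($env_name), "/", ARCHIVE_NAME!())
    };
}

/// build.rs side: OUT_DIR is read at run time, so only the file name
/// comes from the macro. `archive_path` is a hypothetical helper.
fn archive_path(out_dir: &Path) -> PathBuf {
    out_dir.join(ARCHIVE_NAME!())
}

// main.rs side (needs cargo to set OUT_DIR at compile time, so it is
// left as a comment here):
//
// static OUR_DATA: [u8; include_bytes!(ARCHIVE_NAME!("OUT_DIR")).len()] =
//     *include_bytes!(ARCHIVE_NAME!("OUT_DIR"));
```

In build.rs this would be called as archive_path(Path::new(&std::env::var("OUT_DIR").unwrap())).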

Why should I Deserialize?

So far we have taken the serialization format for granted. Using serde, be it json or bincode, requires that we deserialize every time we boot our application. This is unnecessary: at build time we already knew how our struct is laid out in memory, but we threw all that information away during serialization. Additionally, most implementations of deserialization perform allocations, essentially copying data and bloating the memory.

We can completely avoid the deserialization step by using the best zero-copy deserialization crate offered by the rust ecosystem, rkyv.

Important

It is not required that we use rkyv format. We can very well use json or bincode; it is simply undesirable. I suggest reading the rkyv book to make your own decision.

Rkyv is a crate that allows us to directly access the structure in our serialized buffer without a deserialization step. This solves both of the problems mentioned above.

It does so by using relative pointers and enforcing strict alignment requirements on the buffer.
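To make “zero-copy” concrete, here is a toy comparison that does not use rkyv at all. All three functions are hypothetical helpers: one deserializes into an owned String (allocating and copying), one hands out a &str that points straight into the buffer, and one checks which of the two happened.

```rust
/// "Deserializing": allocates a new String and copies the bytes into it.
fn read_owned(buf: &[u8]) -> String {
    String::from_utf8(buf.to_vec()).expect("buffer should be valid UTF-8")
}

/// "Zero-copy": reinterprets the bytes in place and borrows from the buffer.
fn read_borrowed(buf: &[u8]) -> &str {
    std::str::from_utf8(buf).expect("buffer should be valid UTF-8")
}

/// True when `s` starts at the same address as `buf`, i.e. nothing was copied.
fn points_into(s: &str, buf: &[u8]) -> bool {
    s.as_ptr() == buf.as_ptr()
}
```

The borrowed version is the shape of access rkyv aims for, extended from a single string to whole structures.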

The usage is rather simple. We derive the Archive and Serialize traits, and the derive produces a new struct Archived*, which allows zero-copy access.

use rkyv::{Archive, Serialize};

#[derive(Archive, Serialize, Debug)]
#[rkyv(derive(Debug))]
pub struct EmbeddedData {
    // ...snipped...
}

Use rkyv for serialization during build.

use shared::EmbeddedData;
use rkyv::{to_bytes, rancor};

fn main() {
    let data_to_embed = EmbeddedData::default();
    // to_bytes is generic over the error strategy; rancor::Error is the simplest choice
    let archived_bytes =
        to_bytes::<rancor::Error>(&data_to_embed).unwrap();

    // ...snipped...
}

Note

rancor is an alternative to thiserror with some interesting benefits. rkyv has decided to go with rancor, but that is not very relevant for us.

Then finally use it in our program.

Note

On safety: rkyv provides a method to validate the buffer. But because the buffer is part of our binary itself, we can be quite sure that it is valid and safe. If you can’t trust the contents of your binary, then you have much bigger problems to solve.

use shared::ArchivedEmbeddedData;

static ARCHIVED_DATA: [u8; include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat")).len()] =
    *include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat"));

fn main() {
    // SAFETY: the buffer was produced by rkyv in our own build script
    let data: &'static ArchivedEmbeddedData = unsafe {
        rkyv::access_unchecked(&ARCHIVED_DATA)
    };
}

The ArchivedEmbeddedData type is generated by the Archive derive macro and mirrors the fields of our original struct, so we can access them directly.

println!("{}", data.simple_string);
println!("{:#X}", data.simple_number);

And with that we are ready to use our data! Or, are we? (Vsauce theme plays)

A very subtle gotcha

As mentioned above, rkyv enforces strict alignment requirements. Or rather, the machine does: every archived type must sit at an address that is a multiple of its alignment. Rkyv handles this for us when we use the derive macro. However, a byte array [u8; N] has an alignment of 1 byte. So when we include_bytes! our archive, the compiler is free to place it anywhere within our binary, which means there’s a non-zero probability that our access will be misaligned.

It’s one of those “it works sometimes but panics the other times” bugs.

Luckily, the fix is quite simple. rkyv provides us with a struct Align, which does exactly that. It instructs the compiler to use an alignment that rkyv expects. All we need to do is wrap our buffer.

static ARCHIVED_DATA: Align<[u8; include_bytes!(ARCHIVE_NAME!("OUT_DIR")).len()]> =
    Align(*include_bytes!(ARCHIVE_NAME!("OUT_DIR")));
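The effect of such a wrapper can be checked without rkyv. In this sketch, Align16 is a hypothetical stand-in for rkyv’s Align, and the 16-byte figure is my assumption about what rkyv uses. PLAIN, ALIGNED, and is_aligned are likewise made up for illustration.

```rust
/// Hypothetical stand-in for rkyv's Align: forces 16-byte alignment
/// on whatever it wraps.
#[repr(C, align(16))]
struct Align16<T>(T);

// A plain byte array only needs 1-byte alignment, so it may end up
// at any address.
static PLAIN: [u8; 5] = *b"hello";

// Wrapped, the same bytes are guaranteed to start on a 16-byte boundary.
static ALIGNED: Align16<[u8; 5]> = Align16(*b"hello");

/// True when the slice starts on a multiple of `align` bytes.
fn is_aligned(bytes: &[u8], align: usize) -> bool {
    bytes.as_ptr() as usize % align == 0
}
```

Here std::mem::align_of::<[u8; 5]>() is 1, while std::mem::align_of::<Align16<[u8; 5]>>() is 16. With the real wrapper, one way to get at the bytes for the access call is the tuple field, as in &ARCHIVED_DATA.0.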

And with that, our construction is done and we can start building whatever it was we started with. Go check out the example code on my github for a working example.

But where’s the tree

Good question. You can use any structure that can be serialized with rkyv. This includes not only the ubiquitous Box, Vec, HashMap, and BTreeMap, but also shared pointers like Rc and Arc.

I wanted a catchy title, but I’m too tired to write a tree structure. So instead I leave you with some tools to help with debugging and development.

Inspecting the output

To check out the OUT_DIR, you can use cargo-outdir to grab the file path. Of course you can also manually scour the target dir.

cargo outdir --no-names

Running `cargo check`:
   Compiling rkyv-embedding v0.1.0 (/home/keogami/code/keogami/examples/rkyv-embedding)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
/home/keogami/code/keogami/examples/rkyv-embedding/target/debug/build/rkyv-embedding-ab81c85a4461c3c7/out

Then use hexdump to check out the contents.

hexdump -C /home/keogami/code/keogami/examples/rkyv-embedding/target/debug/build/rkyv-embedding-ab81c85a4461c3c7/out/archived_bytes.rkyv

00000000  4f 75 72 20 65 6d 62 65  64 64 65 64 20 73 74 72  |Our embedded str|
00000010  69 6e 67 00 93 00 00 00  ec ff ff ff de ad be ef  |ing.............|
00000020

Note

Hey look! The weird number turned into deadbeef as the last four bytes of the second line. neat :D
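The rest of the dump can be read by hand too. What follows is my reading of these 32 bytes, an assumption based on this one dump rather than a statement of rkyv’s format: the string bytes sit at offset 0, the string field starts at 0x14 with a length word (0x93, where the low 6 bits give 19) followed by a signed relative offset (0xffffffec, which is -20, pointing from 0x14 back to 0), and simple_number sits at 0x1c in little-endian order. The helper names are hypothetical.

```rust
// The 32 bytes from the hexdump above.
const DUMP: &[u8; 32] = b"Our embedded string\0\x93\0\0\0\xec\xff\xff\xff\xde\xad\xbe\xef";

/// Reads a little-endian u32 starting at byte `at`.
fn u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Reads a little-endian i32 starting at byte `at`.
fn i32_le(buf: &[u8], at: usize) -> i32 {
    i32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Decodes the dump under the layout assumed in the text above.
fn decode(buf: &[u8]) -> (&str, u32) {
    const STRING_FIELD: usize = 0x14;
    // Assumption: for a string this short, the low 6 bits hold the length.
    let len = (u32_le(buf, STRING_FIELD) & 0x3f) as usize;
    // The offset is relative to the start of the string field itself.
    let offset = i32_le(buf, STRING_FIELD + 4) as isize;
    let start = (STRING_FIELD as isize + offset) as usize;
    let text = std::str::from_utf8(&buf[start..start + len]).unwrap();
    (text, u32_le(buf, 0x1c))
}
```

Calling decode(DUMP) gives back the original string and 0xEFBEADDE, which is the relative-pointer idea in miniature: the field stores how far away the data is, not an absolute address.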

Alternatively, you can read the symbol from the generated binary. This needs the symbol table, so the binary must not be stripped.

nm -C -S target/debug/rkyv-embedding | rg ARCHIVED_DATA

0000000000008ce0 0000000000000020 r rkyv_embedding::ARCHIVED_DATA

This gives you the start address (0x8ce0) and the size of the buffer (0x20). Knowing the size is very useful to check for bloat. Finally, the actual contents can be printed using objdump. Here the stop address is start + size = 0x8ce0 + 0x20 = 0x8d00.

objdump -s -j .rodata --start-address=0x8ce0 --stop-address=0x8d00 target/debug/rkyv-embedding

target/debug/rkyv-embedding:     file format elf64-x86-64

Contents of section .rodata:
 8ce0 4f757220 656d6265 64646564 20737472  Our embedded str
 8cf0 696e6700 93000000 ecffffff deadbeef  ing.............

La fin.
