Last month I gave a talk at rustdelhi where I showcased my sdk-esque crate dmrc-rs. The crate pre-computes all the possible journeys in the Delhi Metro network at compile time and then embeds the entire tree into the user’s binary itself.
You can give it a watch on youtube.
Due to the time constraint, I heavily simplified how the embedding works. This post dives deep into the challenges one would face and the plumbing needed to get this construction working. It talks about how to achieve it, and the thought process behind every decision.
You can check out the example code over at github.com/keogami/keogami.
Getting Started
To keep things simple, we will start off with a simple structure with a dummy “generation” function. Assume that the following structure is hard to compute.
pub struct EmbeddedData {
    pub simple_string: String,
    pub simple_number: u32,
}

impl Default for EmbeddedData {
    fn default() -> Self {
        // super duper expensive operation
        Self {
            simple_string: "Our embedded string".into(),
            simple_number: 0xEFBEADDE,
        }
    }
}

Note
The funny looking number 0xEFBEADDE will make sense later.
The very first piece of the puzzle is figuring out how to perform the computation at compile time.
build.rs
Rust provides us with a really handy tool to run arbitrary code during
cargo build: the build script. It is quite frequently used for code generation
by crates like prost, or by built, which uses it to grab cargo metadata during
the build. This serves our purpose well.
You can read more about it in the cargo reference. It works by creating a
build.rs file with a fn main() that can do everything a normal Rust program
can do. Cargo also exposes useful env vars like OUT_DIR, where we can store
files which can later be picked up by our actual main.rs. OUT_DIR is a
directory path in the target/ dir that is shared between the build script and
main.rs.
fn main() {
    let data_to_embed = EmbeddedData::default();
    let bytes = serialize(data_to_embed);
    let dest = create_path_using("OUT_DIR");

    fs::write(dest, bytes).unwrap();
}

The second piece is to figure out how to include the generated bytes into our
main.rs.
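Before moving on, here is a rough, std-only sketch of the build-script half. The helper name write_artifact and the placeholder payload are made up for illustration, and the fallback to the temp directory exists only so the sketch also runs outside of cargo; a real build script would simply read OUT_DIR.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: writes `bytes` to `<out_dir>/<file_name>` and returns the full path.
fn write_artifact(out_dir: &Path, file_name: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let dest = out_dir.join(file_name);
    fs::write(&dest, bytes)?;
    Ok(dest)
}

fn main() {
    // Cargo sets OUT_DIR when it runs a build script. The temp-dir fallback is
    // an assumption of this sketch so that it also runs as a plain program.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);

    // Placeholder payload standing in for the serialized struct.
    let bytes = b"Our embedded string";
    let dest = write_artifact(&out_dir, "serialized_data.dat", bytes)
        .expect("failed to write the artifact");

    // Tell cargo to re-run this script only when build.rs itself changes.
    println!("cargo:rerun-if-changed=build.rs");
    println!("wrote {} bytes to {}", bytes.len(), dest.display());
}
```

Keeping the write in a small function makes it easy to point it at a scratch directory while experimenting.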
include_bytes!()
Rust also provides us with a macro called include_bytes!, which does exactly
what it says on the tin. It includes the bytes from a specified file into the
source code. This, when used in conjunction with static, allows us to include
files into our binary.
static OUR_DATA: [u8; include_bytes!(..).len()] = *include_bytes!(..);
// we will fill in the dots later.
// include_bytes! yields a &[u8; N], so the leading * copies the array out of the reference.

fn main() {
    // line intentionally blank
}

Then we can just deserialize and be done.

static OUR_DATA: [u8; include_bytes!(..).len()] = *include_bytes!(..);
// we will fill in the dots later.

fn main() {
    let data: EmbeddedData = deserialize(OUR_DATA).unwrap();
}

However, we quickly encounter a hurdle with this approach. We cannot share the
type EmbeddedData as easily.
Circular Dependency
To understand the issue, we must understand the relationship between build.rs
and main.rs.
Tip
I encourage you to try building the above-mentioned construction and check out the actual error spat out by the Rust compiler.
During the build, the build script is run before compilation of
main.rs. If we define the struct in our main.rs, then build.rs
will need to depend on main.rs, creating what’s called a Circular
Dependency.
+---------+                                           +----------+
| main.rs | ----[depends on the serialized data]----> | build.rs |
+---------+                                           +----------+
     ^                                                      |
     +----------[depends on the struct definition]----------+

Note how the dependencies form a circle.
To resolve this cycle, we can define the struct in build.rs and then use it in
main.rs. This does fix the issue, but there’s no clean way of doing that, and
it forces us to use hacky stuff like accessing the AST for our struct and then
writing it down as text in OUT_DIR. Essentially doing code generation for a
static bit of code. Disgusting.
Shared crate
The solution I have opted for is to create a shared crate, and have both
build.rs and our main package depend on it. The struct is then defined in the
shared crate’s lib.rs.
project
├── build.rs
├── Cargo.toml
├── shared
│   ├── Cargo.toml
│   └── src
│       └── lib.rs
└── src
    └── main.rs

We set up our dependencies as such.

[package]
name = "project"
# ...snipped...

[dependencies]
shared = { path = "./shared" }
# ...snipped...

[build-dependencies]
shared = { path = "./shared" }
# ...snipped...

And with that the cyclic dependency is resolved by having a common denominator. We can now turn our attention to the holes we left, and make things more concrete.
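To make the shape of this concrete, here is a single-file sketch where three modules play the roles of the shared crate, build.rs and main.rs. The encode and decode functions are a toy byte format invented for this sketch (they are not serde, bincode or rkyv), and the module names are stand-ins for the real crates.

```rust
/// Stand-in for the `shared` crate: the single home of the type.
mod shared {
    #[derive(Debug, PartialEq)]
    pub struct EmbeddedData {
        pub simple_string: String,
        pub simple_number: u32,
    }

    fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
        let b = bytes.get(at..at.checked_add(4)?)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    impl EmbeddedData {
        /// Toy format: number (u32 LE), string length (u32 LE), UTF-8 bytes.
        pub fn encode(&self) -> Vec<u8> {
            let mut out = Vec::new();
            out.extend_from_slice(&self.simple_number.to_le_bytes());
            out.extend_from_slice(&(self.simple_string.len() as u32).to_le_bytes());
            out.extend_from_slice(self.simple_string.as_bytes());
            out
        }

        /// Inverse of `encode`; returns None on truncated or invalid input.
        pub fn decode(bytes: &[u8]) -> Option<Self> {
            let simple_number = read_u32(bytes, 0)?;
            let len = read_u32(bytes, 4)? as usize;
            let text = bytes.get(8..8usize.checked_add(len)?)?;
            let simple_string = std::str::from_utf8(text).ok()?.to_owned();
            Some(Self { simple_string, simple_number })
        }
    }
}

/// Stand-in for build.rs: depends only on `shared`.
mod build_script {
    use super::shared::EmbeddedData;

    pub fn generate() -> Vec<u8> {
        let data = EmbeddedData {
            simple_string: "Our embedded string".into(),
            simple_number: 0xEFBEADDE,
        };
        data.encode()
    }
}

/// Stand-in for main.rs: also depends only on `shared`.
mod app {
    use super::shared::EmbeddedData;

    pub fn load(bytes: &[u8]) -> Option<EmbeddedData> {
        EmbeddedData::decode(bytes)
    }
}

fn main() {
    let bytes = build_script::generate();
    let data = app::load(&bytes).expect("toy decoding failed");
    println!("{} {:#X}", data.simple_string, data.simple_number);
}
```

Neither build_script nor app refers to the other; both only know about shared, which is exactly what breaks the cycle.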
Generating the path to store our serialized data
It is quite straightforward for the build script.
use std::path::Path;

use shared::EmbeddedData;

fn main() {
    // ...snipped...

    // recall OUT_DIR is where we can keep our generated files
    let output_path = std::env::var("OUT_DIR").unwrap();
    let output_path = Path::new(&output_path).join("serialized_data.dat");

    // ...snipped...
}

Here, serialized_data.dat can be named anything. We just need to update our
main.rs accordingly.
static OUR_DATA: [u8; include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat")).len()] =
    *include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat"));
// the leading * copies the array out of the &[u8; N] that include_bytes! yields

fn main() {
    // ...snipped...
}

Oh! uh- now that’s a bit complicated. Let’s break this down a bit, starting from the right-hand side.
include_bytes!("path/to/file");
// we know what this is

concat!("literally.", "anything.", "here");
// this macro concatenates all its inputs.
// the above will resolve to "literally.anything.here" as a single string

env!("OUT_DIR");
// this macro grabs the env variable during compile time, as a single string

Putting all of these together, we are doing the same thing we did in the build script, but at compile time.
On the left hand side, we are defining our static variable, which requires that we fully define the type. Additionally, we need to specify the exact length of the byte array.
static OUR_DATA: [u8; include_bytes!(..).len()];
// To figure out the length, we simply include the bytes and then check the length

And with this we are ready to actually start using OUR_DATA.
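If you want to poke at the typing without setting up a build script, a byte-string literal has the same type as the result of include_bytes!, namely &'static [u8; N]. The sketch below uses one as a stand-in for the file; the SOURCE constant is made up for this illustration.

```rust
// Stand-in for include_bytes!(..): a byte-string literal is also a
// `&'static [u8; N]`, so the typing works out the same way.
const SOURCE: &[u8; 19] = b"Our embedded string";

// The length in the type is computed from the very expression we embed, and
// the leading `*` copies the array out from behind the reference.
static OUR_DATA: [u8; SOURCE.len()] = *SOURCE;

fn main() {
    println!("{} bytes embedded", OUR_DATA.len());
    println!("{}", String::from_utf8_lossy(&OUR_DATA));
}
```

Because the length is derived rather than hard-coded, changing the embedded contents never needs a matching edit to the type.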
Side Quest: share the filename
One nitpick I have with the above approach is we need to specify
serialized_data.dat in two places, which goes against the DRY
principle.
This doesn’t affect the behavior. So if this doesn’t matter to you, feel free to skip to the next section.
Now, we can’t just create a variable and share it using our shared crate.
Because of course we can’t.
include_bytes! can’t accept anything but a string literal (or a macro such as
concat! that expands to one). This is because during compilation, macros are
expanded before const evaluation, which is when constants and statics get
their values. Moreover, include_bytes! is a special macro defined directly in
the compiler, and it expands to an expression of type &'static [u8; N].
Therefore, the path has to be known at macro-expansion time.
Bottom line is that we can’t just create a variable. So I have decided to use a macro instead. How fun.
Luckily, it’s a very simple macro.
#[macro_export]
macro_rules! ARCHIVE_NAME {
    () => {
        // subtle foreshadow
        "archived_bytes.rkyv"
    };

    ($env_name:literal) => {
        concat!(env!($env_name), "/", ARCHIVE_NAME!())
    };
}

// USAGE

ARCHIVE_NAME!(); // expands to "archived_bytes.rkyv"
ARCHIVE_NAME!("OUT_DIR"); // expands to what?

Now it’s trivial to go back and update our use sites, and the compiler can have our back and make sure we don’t make a typo.
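To see the expansion without a build script in the loop, here is a runnable variant of the macro. The third arm, which takes the directory as a literal, is a hypothetical addition for this sketch only, since env!("OUT_DIR") does not compile outside of a cargo build that has a build script.

```rust
macro_rules! ARCHIVE_NAME {
    () => {
        "archived_bytes.rkyv"
    };

    // Same arm as in the post; it is only expanded when actually invoked.
    ($env_name:literal) => {
        concat!(env!($env_name), "/", ARCHIVE_NAME!())
    };

    // Hypothetical arm for this sketch: the directory is given as a literal.
    (dir: $dir:literal) => {
        concat!($dir, "/", ARCHIVE_NAME!())
    };
}

// Both expand to string literals at compile time, which is why they are
// usable inside include_bytes! and in const items.
const NAME: &str = ARCHIVE_NAME!();
const FULL: &str = ARCHIVE_NAME!(dir: "target/out");

fn main() {
    println!("{}", NAME);
    println!("{}", FULL);
}
```

The env-based arm behaves the same way, with the directory coming from the environment at compile time instead of a literal.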
Why should I Deserialize?
So far we have taken the serialization format for granted. Using serde, be
it json or bincode, requires that we deserialize every time we boot our
application. This is unnecessary because we knew how our struct must live in
memory, but then threw away all that information during serialization.
Additionally, most implementations of deserialization perform allocations,
essentially copying data and bloating the memory.
We can completely avoid the deserialization step by using the best zero-copy deserialization crate offered by the rust ecosystem, rkyv.
Important
It is not required that we use rkyv format. We can very well use json or bincode; it is simply undesirable. I suggest reading the rkyv book to make your own decision.
Rkyv is a crate that allows us to directly access the structure in our serialized buffer without a deserialization step. This solves both of the problems mentioned above.
It does so by using relative pointers and by requiring the buffer to be strictly aligned.
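To get a feel for relative pointers, here is a toy layout of my own making. It is not rkyv’s actual format, and the function names are invented for this sketch. The string bytes come first, followed by a small header that stores a signed offset from the header back to the string, plus the string’s length.

```rust
/// Toy layout (NOT rkyv's real format):
///   [string bytes][i32 LE offset from header start to string][u32 LE length]
/// Returns the buffer and the position of the header inside it.
fn build_toy_buffer(text: &str) -> (Vec<u8>, usize) {
    let mut buf = text.as_bytes().to_vec();
    let header_pos = buf.len();
    // The string starts at position 0, so the offset is negative.
    let rel = -(header_pos as i32);
    buf.extend_from_slice(&rel.to_le_bytes());
    buf.extend_from_slice(&(text.len() as u32).to_le_bytes());
    (buf, header_pos)
}

/// Reads the string in place: the returned &str borrows from `buf`,
/// nothing is copied or allocated.
fn read_toy_str(buf: &[u8], header_pos: usize) -> Option<&str> {
    let h = buf.get(header_pos..header_pos.checked_add(8)?)?;
    let rel = i32::from_le_bytes([h[0], h[1], h[2], h[3]]);
    let len = u32::from_le_bytes([h[4], h[5], h[6], h[7]]) as usize;

    // Resolve the relative offset against the header's own position.
    let start = header_pos as i64 + i64::from(rel);
    if start < 0 {
        return None;
    }
    let start = start as usize;
    let bytes = buf.get(start..start.checked_add(len)?)?;
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let (buf, header_pos) = build_toy_buffer("Our embedded string");
    let text = read_toy_str(&buf, header_pos).expect("toy buffer is well formed");
    println!("{}", text);

    // The offset is relative, so a copy living at a different address
    // resolves to its own bytes without any fix-ups.
    let moved = buf.clone();
    println!("{:?}", read_toy_str(&moved, header_pos));
}
```

An absolute pointer would have been stale after the copy; the relative offset is what lets a buffer be written at build time and read wherever the binary happens to be loaded.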
The usage is rather simple. We derive the Archive and Serialize traits, and
rkyv produces a new struct Archived*, which allows zero-copy access.
use rkyv::{Archive, Serialize};

#[derive(Archive, Serialize, Debug)]
#[rkyv(derive(Debug))]
pub struct EmbeddedData {
    // ...snipped...
}

Use rkyv for serialization during build.
use shared::EmbeddedData;
use rkyv::{to_bytes, rancor};

fn main() {
    let data_to_embed = EmbeddedData::default();
    // to_bytes is generic over the error strategy; rancor::Error is the simplest choice
    let archived_bytes = to_bytes::<rancor::Error>(&data_to_embed).unwrap();

    // ...snipped...
}

Note
rancor is an alternative to thiserror with some interesting benefits.
rkyv has decided to go with rancor, but it is not very relevant for us.
Then finally use it in our program.
Note
On safety: rkyv provides a method to validate the buffer. But because the buffer is part of our binary itself, we can be quite sure that it is valid and safe. If you can’t trust the contents of your binary, then you have much bigger problems to solve.
use shared::ArchivedEmbeddedData;

static ARCHIVED_DATA: [u8; include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat")).len()] =
    *include_bytes!(concat!(env!("OUT_DIR"), "/", "serialized_data.dat"));

fn main() {
    // unchecked access: we trust the bytes our own build script produced
    let data: &'static ArchivedEmbeddedData = unsafe { rkyv::access_unchecked(&ARCHIVED_DATA) };
}

The ArchivedEmbeddedData type is generated by the Archive derive macro and
mirrors the fields of our original struct, so we can access them directly.

println!("{}", data.simple_string);
println!("{:#X}", data.simple_number);

And with that we are ready to use our data! Or, are we? (vsauce theme plays)
A very subtle gotcha
As mentioned above, rkyv enforces a very strict alignment policy. Or rather,
the machine enforces the alignment. Rkyv handles this for us when we use
the derive macro. However, a byte array [u8; N] has an alignment of 1 byte. So
when we include_bytes! our archive, the compiler is free to place it anywhere
within our binary, which means there’s a non-zero probability that our access
will be misaligned.
It’s one of those “it works sometimes but panics the other times” bugs.
Luckily, the fix is quite simple. rkyv provides us with a struct Align, which
does exactly that. It instructs the compiler to use an alignment that rkyv
expects. All we need to do is wrap our buffer.
use rkyv::util::Align;

static ARCHIVED_DATA: Align<[u8; include_bytes!(ARCHIVE_NAME!("OUT_DIR")).len()]> =
    Align(*include_bytes!(ARCHIVE_NAME!("OUT_DIR")));

And with that, our construction is done and we can start building whatever it was we started with. Go check out the example code on my github for a working example.
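If you want to convince yourself of the alignment difference without pulling in rkyv, a wrapper with a repr(align) attribute shows the same effect using only std. The Aligned type below is a hypothetical stand-in for this sketch, and the 16-byte figure is an assumption chosen for illustration; check the rkyv docs for what its own wrapper guarantees.

```rust
use std::mem::align_of;

/// Hypothetical stand-in for an alignment wrapper.
/// The 16-byte alignment is an assumption made for this sketch.
#[repr(C, align(16))]
struct Aligned<T>(T);

// A bare byte array only needs 1-byte alignment, so its address can be anything.
static RAW: [u8; 20] = *b"Our embedded string\0";

// Wrapping the same bytes forces the static onto a 16-byte boundary.
static ALIGNED: Aligned<[u8; 20]> = Aligned(*b"Our embedded string\0");

fn main() {
    println!("align_of [u8; 20]          = {}", align_of::<[u8; 20]>());
    println!("align_of Aligned<[u8; 20]> = {}", align_of::<Aligned<[u8; 20]>>());

    // May or may not be a multiple of 16; nothing guarantees it.
    println!("RAW address % 16     = {}", RAW.as_ptr() as usize % 16);
    // Always a multiple of 16 because of the wrapper.
    println!("ALIGNED address % 16 = {}", ALIGNED.0.as_ptr() as usize % 16);
}
```

The raw static can still land on a 16-byte boundary by luck, which is exactly why the misalignment bug only shows up some of the time.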
But where’s the tree
Good question. You can use whatever structure that can be serialized with rkyv.
This includes not only the ubiquitous Box, Vec, HashMap and BTreeMap, but also
shared pointers like Rc and Arc.
I wanted a catchy title, but I’m too tired to write a tree structure. So instead I leave you with some tools to help with debugging and development.
Inspecting the output
To check out the OUT_DIR, you can use cargo-outdir to grab the file path. Of course you can also manually scour the target dir.
cargo outdir --no-names
Running `cargo check`:
   Compiling rkyv-embedding v0.1.0 (/home/keogami/code/keogami/examples/rkyv-embedding)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
/home/keogami/code/keogami/examples/rkyv-embedding/target/debug/build/rkyv-embedding-ab81c85a4461c3c7/out

Then use hexdump to check out the contents.

hexdump -C /home/keogami/code/keogami/examples/rkyv-embedding/target/debug/build/rkyv-embedding-ab81c85a4461c3c7/out/archived_bytes.rkyv
00000000  4f 75 72 20 65 6d 62 65  64 64 65 64 20 73 74 72  |Our embedded str|
00000010  69 6e 67 00 93 00 00 00  ec ff ff ff de ad be ef  |ing.............|
00000020

Note
Hey look! the weird number turned into deadbeef as the last four bytes of second line. neat :D
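The byte order in that dump is consistent with the number being written least-significant byte first. A tiny std-only check, with hex_line being a helper made up for this sketch:

```rust
/// Formats bytes the way the middle columns of `hexdump -C` do.
fn hex_line(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let number: u32 = 0xEFBEADDE;

    // Least-significant byte first (little-endian).
    println!("{}", hex_line(&number.to_le_bytes())); // prints "de ad be ef"

    // Most-significant byte first (big-endian), for comparison.
    println!("{}", hex_line(&number.to_be_bytes())); // prints "ef be ad de"
}
```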
Alternatively, you can read the symbol from the generated binary. This needs debuginfo enabled.
nm -C -S target/debug/rkyv-embedding | rg ARCHIVED_DATA
0000000000008ce0 0000000000000020 r rkyv_embedding::ARCHIVED_DATA

This gives you the start address (0x8ce0) and the size of the buffer (0x20).
Knowing the size is very useful to check for bloat. Finally, the actual contents
can be printed using objdump. Here the stop address is start + size = 0x8ce0 + 0x20 = 0x8d00
objdump -s -j .rodata --start-address=0x8ce0 --stop-address=0x8d00 target/debug/rkyv-embedding

target/debug/rkyv-embedding:     file format elf64-x86-64

Contents of section .rodata:
 8ce0 4f757220 656d6265 64646564 20737472  Our embedded str
 8cf0 696e6700 93000000 ecffffff deadbeef  ing.............

La fin.