Zig, resources, values, and destructors

2024-01-10 · eleven minute read · c++, programming-languages, rust, vale, zig

Classical Languages

In the vast world of programming, there are a lot of things that follow the same pattern: allocate a resource, then free it. Initialize, deinitialize. Create, destroy. Open, close.

Many languages approach this problem in different ways. Garbage-collected languages such as JavaScript, Lua, and Go just do it at runtime™, resulting in real computer science™ such as allocating ten gigabytes of ballast so garbage collection never runs. Peripherally related are reference-counted languages, such as Swift and (sometimes) C++ and (kind of) Python, which tend to make resource lifetimes more transparent than in garbage-collected languages.

Rust’s values aren’t tied to any specific memory address. The Rust compiler tracks every variable, and when a variable that hasn’t been moved out of goes out of scope, its drop method is called.

C has no such extravagance. For the specific case of whole-program resources, it could be argued that C has atexit, but apart from this—it’s all on the programmer. Zig is an improvement on this state of affairs, but only just: defer runs some code when its surrounding scope is left, whether by running till the end or early return, still requiring the programmer to track their resources carefully. errdefer is a welcome addition, allowing resources to be deinitialized iff the scope is left by early return, and if that return was of the error variant in that function’s error union.
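For readers more at home in C++, defer’s behaviour can be roughly approximated with a scope guard. This is only a sketch, assuming C++17; the Defer type and the work function are hypothetical names invented for illustration, and errdefer is not modelled at all.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard: runs `fn` when the enclosing scope is left.
template <class F>
struct Defer {
    F fn;
    explicit Defer(F f) : fn(std::move(f)) {}
    Defer(const Defer&) = delete;
    Defer& operator=(const Defer&) = delete;
    ~Defer() { fn(); }
};

// Records the order of events so the cleanup timing is visible.
void work(std::vector<std::string>& log, bool bail_early) {
    Defer cleanup([&log] { log.push_back("cleanup"); });
    log.push_back("start");
    if (bail_early) return;  // cleanup still runs on this early return
    log.push_back("finish");
}                            // ...and also when the scope simply ends
```

Just like defer, the guard does nothing to remind the programmer to write it in the first place.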

Values in C++ are inextricably linked to their memory address. Pass a std::vector into a function by value? That’ll be a deep copy. You should’ve used const std::vector& instead, or better yet, if you’re running C++20: std::span.

When a local variable leaves scope, its destructor is automatically called. Without moves, and unless NRVO kicks in, returning one of these would be a deep copy followed by a destructor call. Fortunately, C++ (since C++11) has a way around this: moves. Unlike Rust’s moves, these must be implemented for each type, instead of letting the compiler magic its way into compile times larger than an American football field. This comes with trade-offs, and the most user-facing of these is the moved-from state.

// needs <vector> and <utility>
std::vector<int> a = {1, 2};
std::vector<int> b = a; // copy construction
a[0] = 3;
std::vector<int> c = std::move(a); // move construction
// a: {}
// b: {1, 2}
// c: {3, 2}

After the copy into b, a is unaffected. But the move from a into c transitions a into the moved-from state. For a vector, that’s an empty vector. This move can be extremely cheap because c can reuse a’s old backing store.

But the moved-from state is, again, not a compiler thing. At the end of this scope, a’s destructor will still be run (if the compiler can’t optimize it out, that is). std::vector specifically doesn’t need any extra code in its destructor to deal with this state, because it’s just the “empty vector” state.
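To make the “destructor still runs” point concrete, here is a minimal sketch with a hypothetical Buffer type (not std::vector itself) whose destructor has to tolerate the moved-from state. The two counters exist only so the effect can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, used only to make the moved-from state visible.
struct Buffer {
    int* data;
    std::size_t len;
    int* frees;  // counts how many times a backing store was actually freed
    int* dtors;  // counts how many destructors ran

    Buffer(std::size_t n, int* frees_, int* dtors_)
        : data(new int[n]()), len(n), frees(frees_), dtors(dtors_) {}

    // The move constructor steals the backing store and leaves `other` empty.
    Buffer(Buffer&& other) noexcept
        : data(std::exchange(other.data, nullptr)),
          len(std::exchange(other.len, 0)),
          frees(other.frees),
          dtors(other.dtors) {}

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Runs for moved-from objects too, so it has to cope with the empty state.
    ~Buffer() {
        ++*dtors;
        if (data != nullptr) {
            delete[] data;
            ++*frees;
        }
    }
};

struct Counts {
    int frees;
    int dtors;
};

Counts move_then_destroy() {
    int frees = 0;
    int dtors = 0;
    {
        Buffer a(16, &frees, &dtors);
        Buffer c = std::move(a);  // a is now in its moved-from (empty) state
    }                             // both a and c are destroyed here
    return {frees, dtors};
}
```

Two destructors run, but only one of them frees anything: the null check is the “extra code” that std::vector gets to skip because its moved-from state is just the empty vector.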

Destruction is also linked to the language’s concept of panics, or exceptions. In Rust, panicking through stack frames will drop local variables as the stack is unwound; C++ will do the same for exceptions. Zig, comfortably staying in its systems-programming lane, simply does not have the concept of an unwinding panic: if an integer overflows, and the code is compiled with runtime safety on, the entire process is dismantled bit-by-bit with no escape hatch. C is a little better in this regard, because in C signed integer overflow is generally silent, causing undefined behaviour that optimizes out bounds checks, leading to an out-of-bounds write that is the root cause of a multi-million-dollar zero-day exploit no I’m not angry why would you think that.
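The C++ half of that claim is easy to observe. The sketch below uses a hypothetical Tracker type that logs from its destructor; the names are made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type that records its own destruction.
struct Tracker {
    std::vector<std::string>& log;
    std::string name;
    ~Tracker() { log.push_back("drop " + name); }
};

void fails(std::vector<std::string>& log) {
    Tracker a{log, "a"};
    Tracker b{log, "b"};
    throw std::runtime_error("boom");  // unwinding destroys b, then a
}

std::vector<std::string> unwind_demo() {
    std::vector<std::string> log;
    try {
        fails(log);
    } catch (const std::runtime_error&) {
        log.push_back("caught");  // runs only after the locals were destroyed
    }
    return log;
}
```

The locals are destroyed in reverse order of construction, and all of that happens before the handler body runs.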

Vale

Vale, by Evan “Verdagon” Ovadia (that’s four separate words with a V and this is unironically awesome) uses the term Higher RAII to describe its way of handling destructors: instead of drop being the One True Way to destroy any given value (like in Rust), it’s only the default; and if it’s not implemented for a type, local variables of that type can’t just be ignored when a scope ends: you have to decide which destructor must be called.

Higher RAII is basically linear typing, but unlike linear typing, my did-not-study-type-theory brain can understand it.

Separately, instead of member functions, Vale uses universal function call syntax, or, as the cool kids call it, UFCS. Again, this is super cool, not least because it means method implementations generally need only one tabstop, versus for example Rust, which needs two: one for the impl block, and then one for the function body.

While Higher RAII is interesting for Vale because it allows for multiple destructors with disjoint behaviors and for passing parameters to destructors, it’s mainly interesting to me and my systems programming languages, like Zig, for the latter reason.

You see, in C++ and Rust, destructors can never take extra arguments. This interacts poorly with (at least) one awesome Zig feature, allocators-as-values: in Zig, there’s no global allocator. (Well, there is std.heap.c_allocator, or in a pinch, std.heap.page_allocator, but common courtesy prevents their use.) Instead, any type that allocates memory must be passed an allocator from above, either once in the constructor or again at every method call (to properly foist all responsibility).

The second class of types, those that take the allocator at every method call, like std.ArrayListUnmanaged, immediately precludes an implicit destructor: it’d have to reach out into an outer scope and somehow pluck out the correct allocator!
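Transplanted into C++, the problem looks something like the sketch below. Both Allocator and IntListUnmanaged are hypothetical stand-ins written for this illustration, not ports of the Zig standard library; the allocator merely counts live bytes so that leaks are visible.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for a Zig-style allocator value; it counts live bytes.
struct Allocator {
    std::size_t live_bytes = 0;

    void* alloc(std::size_t bytes) {
        live_bytes += bytes;
        return ::operator new(bytes);
    }
    void free(void* ptr, std::size_t bytes) {
        live_bytes -= bytes;
        ::operator delete(ptr);
    }
};

// Hypothetical "unmanaged" list of ints: it never stores the allocator.
struct IntListUnmanaged {
    int* items = nullptr;
    std::size_t len = 0;
    std::size_t capacity = 0;

    void append(Allocator& allocator, int value) {
        if (len == capacity) {
            const std::size_t new_capacity = capacity == 0 ? 4 : capacity * 2;
            int* grown = static_cast<int*>(allocator.alloc(new_capacity * sizeof(int)));
            for (std::size_t i = 0; i < len; ++i) grown[i] = items[i];
            if (items != nullptr) allocator.free(items, capacity * sizeof(int));
            items = grown;
            capacity = new_capacity;
        }
        items[len++] = value;
    }

    // No ~IntListUnmanaged(): a destructor takes no arguments, so it could not
    // know which allocator to hand the buffer back to.
    void deinit(Allocator& allocator) {
        if (items != nullptr) allocator.free(items, capacity * sizeof(int));
        items = nullptr;
        len = 0;
        capacity = 0;
    }
};
```

Forgetting the deinit call compiles without complaint and simply leaks, which is exactly the gap the rest of this post is trying to close.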

This is one place Higher RAII would shine: what if the programmer were required to deinitialize every value? (The compiler verifying that you’re using the right allocator is, unfortunately, beyond the technology of our time.)

Now picture this: you’ve ported your own standard library to make sure anyone using it can set no foot wrong. Everything is great! And then the issue comes in (incidentally, GitHub assigns it the ID #1 because your programming language is just that good): “How do I store a homogenous, resizable, array?”

Oh, you haven’t gotten around to implementing collections yet. Obviously, the next move is to port Zig’s std.ArrayListUnmanaged API to your own language. This is pretty easy because, incidentally, your language’s syntax is identical to Zig’s.

Just a few copy-pastes later and the whole thing is done! …well, almost. There’s one method you’ve yet to implement, and you’ve certainly saved the best for last: drop.

fn drop(self: *Self,

The nexus of destructors, RAII, and language magic

“Well, that’s easy,” you think. “Let’s just copy this, too, from Zig!”

That’d look something like this:

fn drop(self: Self, allocator: Allocator) void {
	allocator.free(self.buffer());
}

Well, this works in most cases, except the case where the ArrayList’s individual items are resources. In that case, this’ll just leak everything, and that’s no good.

The next language you approach for divine inspiration is Rust, and the drop implementation for Rust’s Vec (analogous to std::vector or std.ArrayList) just drops every element in series, like this:

fn drop(self: Self, allocator: Allocator) void {
	for (self.items) |item| {
		item.drop();
	}

	allocator.free(self.buffer());
}

But this only works if every type has a single destructor that takes no parameters! And sure, this could work, even with custom allocators. Rust and C++ both have custom allocators, but they both chose to make the allocator’s type a compile-time generic parameter of the container, which then stores the allocator itself, instead of taking it as a runtime value at each call.
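In C++ terms, the compile-time-parameter approach looks roughly like this. CountingAllocator is a hypothetical minimal allocator written for this sketch; the std::vector parts are the real standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical minimal allocator that counts live bytes via a shared counter.
template <class T>
struct CountingAllocator {
    using value_type = T;
    std::size_t* live_bytes;

    explicit CountingAllocator(std::size_t* counter) : live_bytes(counter) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) : live_bytes(other.live_bytes) {}

    T* allocate(std::size_t n) {
        *live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* ptr, std::size_t n) {
        *live_bytes -= n * sizeof(T);
        ::operator delete(ptr);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.live_bytes == b.live_bytes;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}

// The allocator type is part of the vector's type, and the vector keeps its
// own copy of the allocator, so its destructor needs no arguments.
using CountedInts = std::vector<int, CountingAllocator<int>>;

std::size_t live_bytes_after_scope() {
    std::size_t live_bytes = 0;
    {
        CountingAllocator<int> allocator(&live_bytes);
        CountedInts numbers(allocator);
        for (int i = 0; i < 100; ++i) numbers.push_back(i);
    }  // ~vector() frees through the stored allocator
    return live_bytes;
}
```

The price is that the container carries the allocator around with it, and that a vector using one allocator type is a different type from a vector using another.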

This doesn’t scale well to anything more than memory. The Vulkan API’s VkImage type is a great example. A VkImage is basically pixel data; the details are beyond the scope of this article.

To create a VkImage, you need lots of ingredients:

The VkDevice the image should be created for;
the vkCreateImage function pointer, because Vulkan allows these functions to change per-device;
and the allocation callbacks, in case any userland memory (as opposed to GPU memory) needs to be allocated.

To destroy a VkImage, you also need lots of ingredients:

The VkDevice the image was created for;
the vkDestroyImage function pointer;
and a compatible set of allocation callbacks.

And because Vulkan has had actual people decide what the API should look like, pretty much every single resource is created the same way, unlike OpenGL.

But there’s one problem: how do you represent this kind of destruction in languages that don’t have Higher RAII? Vulkan’s C++ bindings have made the decision to store three extra pointers alongside every owning resource handle, quadrupling its size. And sure, if you want to be realistic (sigh) this really isn’t a big deal. Memory just grows on trees nowadays, and this is probably a fraction of the memory that vkCreateImage would have to allocate anyway.
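The size arithmetic can be sketched without touching Vulkan at all. Every name below is a hypothetical stand-in rather than a real Vulkan or Vulkan-Hpp type, and the static_asserts assume a mainstream platform where the handle, data pointers, and function pointers are all the same size.

```cpp
#include <cassert>
#include <cstddef>

// All names here are hypothetical stand-ins, not the real Vulkan types.
struct Device;
struct AllocationCallbacks;
using ImageHandle = void*;  // assumption: a pointer-sized opaque handle
using DestroyImageFn = void (*)(Device*, ImageHandle, const AllocationCallbacks*);

// Option 1: the handle carries everything its argument-free destructor needs.
struct OwningImage {
    ImageHandle handle;
    Device* device;
    DestroyImageFn destroy_image;
    const AllocationCallbacks* callbacks;

    ~OwningImage() { destroy_image(device, handle, callbacks); }
};

// Option 2: the handle is bare, and destruction takes its context as parameters.
struct BareImage {
    ImageHandle handle;

    void destroy(Device* device, DestroyImageFn destroy_image,
                 const AllocationCallbacks* callbacks) {
        destroy_image(device, handle, callbacks);
    }
};

// Assumes function pointers and data pointers have the same size.
static_assert(sizeof(BareImage) == sizeof(ImageHandle), "bare handle stays handle-sized");
static_assert(sizeof(OwningImage) == 4 * sizeof(ImageHandle), "three extra pointers");
```

Option 2 is the small one, but nothing in C++ forces anyone to call destroy, which is the part Higher RAII would supply.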

But I don’t want to be realistic, goshfrickingdarnit. I want to fix things, and this state of affairs is so obviously unacceptable. The next next idea you have is this: if “dropping an element” absolutely must be arbitrary code, what better way to pass arbitrary code than a closure?

fn drop(self: Self, allocator: Allocator, dropItem: fn(T) void) void {
	// `dropItem` is named so it doesn't shadow the `drop` function itself
	for (self.items) |item| {
		dropItem(item);
	}

	allocator.free(self.buffer());
}

See, the thing is, this actually works! Let’s use it in some code that is vaguely database-adjacent-shaped!

transactions.drop(allocator) |txn| {
	txn.commit(db);
};

Now this is the style of code that I genuinely love. But there’s one tiny roadblock: ZIG DOES NOT HAVE CLOSUcough

Sorry, my throat is in massive pain and talking hurts now. Let me repeat that at a normal volume: Zig does not have closures.
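For contrast, here is roughly what the closure-taking destructor looks like in a language that does have closures. This is a C++ sketch with hypothetical Database and Transaction types; drop_all is a made-up free function, not anything from a real library.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the database and transaction types.
struct Database {
    int committed = 0;
};
struct Transaction {
    int id;
    void commit(Database& db) { db.committed += 1; }
};

// Consumes the list, handing each element to the caller-supplied closure.
template <class T, class DropItem>
void drop_all(std::vector<T> list, DropItem drop_item) {
    for (T& item : list) {
        drop_item(std::move(item));
    }
}  // the by-value `list` frees its backing store here

int commit_all() {
    Database db;
    std::vector<Transaction> transactions{{1}, {2}, {3}};
    // The lambda captures `db`, which is precisely what a plain function
    // pointer cannot do.
    drop_all(std::move(transactions), [&db](Transaction txn) { txn.commit(db); });
    return db.committed;
}
```

The capture of db is the whole trick, and it is the one thing a bare fn(T) void cannot express.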

Anyway, here’s the idea that the rest of the article has been foreshadowing:

fn drain(self: Self, allocator: Allocator) ?(Self, T) {
	if (self.items.len == 0) {
		allocator.free(self.buffer());
		return null;
	}

	const last = self.pop();
	return (self, last);
}

This is more than a little hacky, and it just punts most of its complexity off onto the caller. And that’s with extensions to the language in the pattern matching and tuple departments, although to be fair I just really want every language under the sun to have pattern matching and tuples.

while (transactions.drain(allocator)) |(set transactions, txn)| {
	txn.commit(db);
}

This is stealing yet another 2050s innovation from Verdagon: the set keyword. Like most things from Vale, this is completely magical and conjures an astonishing amount of orthogonality out of thin air.
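The same shape can be approximated in today’s C++, minus the pattern matching and minus set. The sketch below assumes C++17 and uses std::vector as the list; drain and drain_order are hypothetical names, and the reassignment inside the loop is the manual stand-in for set.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Takes the list by value; returns either the shrunken list plus its last
// element, or nothing once the list is empty.
template <class T>
std::optional<std::pair<std::vector<T>, T>> drain(std::vector<T> self) {
    if (self.empty()) {
        return std::nullopt;  // `self` is destroyed on return, freeing the buffer
    }
    T last = std::move(self.back());
    self.pop_back();
    return std::make_pair(std::move(self), std::move(last));
}

std::vector<int> drain_order() {
    std::vector<int> transactions{1, 2, 3};
    std::vector<int> order;
    while (auto next = drain(std::move(transactions))) {
        transactions = std::move(next->first);  // manual version of `set transactions`
        order.push_back(next->second);
    }
    return order;
}
```

Unlike the hypothetical language, C++ won’t complain if the loop body forgets to put the list back, which is a fair summary of why the compiler support matters.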

Admittedly, this does drop the list in reverse order, but that’s easily solvable with a separate type which I won’t use again because it’s not quite as ergonomic:

Destructor-as-a-structure

fn drain(self: Self, allocator: Allocator) Drain(T) {
	return .{
		.items = self.items,
		.buffer = self.buffer(),

		.allocator = allocator,
	};
}

pub fn Drain(comptime T: type) type {
	return struct {
		const Self = @This();

		items: []T,
		buffer: []T,

		allocator: Allocator,

		pub fn pop(self: Self) ?(Self, T) {
			if (self.items.len == 0) {
				self.allocator.free(self.buffer);
				return null;
			}

			const item = @takeOwnership(self.items[0]);

			self.items = self.items[1..];

			return (self, item);
		}
	};
}

It is more code to call this kind of destructor, but not much more code:

var drain = transactions.drain(allocator);
while (drain.pop()) |(set drain, txn)| {
	txn.commit(db);
}
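A front-to-back drain can also be sketched in C++, again assuming C++17. This version takes a shortcut the design above avoids: the hypothetical Drain type is mutated in place instead of being consumed and returned, so nothing stops a caller from abandoning it halfway.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical front-to-back drain: owns the buffer and a cursor into it.
template <class T>
struct Drain {
    std::vector<T> buffer;
    std::size_t next = 0;

    // Hands out elements in their original order; frees the buffer at the end.
    std::optional<T> pop() {
        if (next == buffer.size()) {
            std::vector<T>().swap(buffer);  // release the backing store now
            next = 0;
            return std::nullopt;
        }
        return std::move(buffer[next++]);
    }
};

std::vector<int> drain_in_order() {
    Drain<int> drain{{1, 2, 3}};
    std::vector<int> order;
    while (auto txn = drain.pop()) {
        order.push_back(*txn);
    }
    return order;
}
```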

And with move types added to the language, adding defer-as-destruct is reasonably easy, as you can see in my previous blog post about this that you shouldn’t read because it’s awful.

After simply adding move types to the language (Rust-style, of course) and making it an error to do anything that could panic (calling functions, pointer casts, basic arithmetic) without registering a defer-destructor for every local variable and allowing destructors to be tagged as noexcept and pattern matching and new syntactic sugar, panic safety is basically free. But at least we didn’t add closures, and that is a trade I’ll take any day.

Here’s some demo code:

const f = File.open("hey");
defer (f) f.close();

var lines = std.ArrayListUnmanaged([]u8) {};
defer (lines) {
	while (lines.drain(allocator)) |(set lines, line)| {
		allocator.free(line);
	}
}

while (try f.reader().readUntilDelimiterOrEofAlloc(allocator, '\n', 64 * 1024)) |line| {
	defer (line) allocator.free(line);

	try lines.append(allocator, line);
}

Don’t try this terribly suboptimal code at home, though: every line is allocated separately, and the file’s being read unbuffered, byte-by-byte.

With current Zig, if lines were to escape this snippet, the defer would have to be rewritten into an errdefer. If you’re keeping score at home, under my proposal (which would double the size of the Zig spec) it’d instead require zero modifications, which (in my humble opinion) is pretty cool!

And y’know what? Properly freeing this list is less code than reading a file line-by-line (even if it is more than current Zig requires), and at the end of the day, I think that’s enough.

But at least it’s leak-free under errors, and if Zig ever adds the concept of panicking and unwinding, this’ll be safe under that, too.