Literate Programming done quick
Sometimes, I’d like to make a blog post that is the canonical source code of a program that I want to showcase. Like this one that I just made up:
FizzBuzz is a classic interview question. The premise is simple: iterate from 1 to 100. For each number, if it's divisible by three, print "Fizz"; if it's divisible by five, print "Buzz". If it's divisible by both three and five, print "FizzBuzz". Otherwise, print the number itself.
This is such a simple problem that it's actually well-suited for the C programming language! Let's do it.
```c file=fizzbuzz.c
#include <stdbool.h>
#include <stdio.h>
void print_number(int number);
int main(void) {
for (int i = 1; i <= 100; i++) {
print_number(i);
}
}
```
`print_number` is possibly the simplest C function you could possibly write:
```c file=fizzbuzz.c
bool divisible_by_three(int number);
bool divisible_by_five(int number);
void print_number(int number) {
int is_divisible_by_three = divisible_by_three(number);
int is_divisible_by_five = divisible_by_five(number);
printf(
"%d\n\x00\x00\x00"
"Fizz\n\x00"
"Buzz\n\x00"
"FizzBuzz\n\x00"
+ is_divisible_by_three * 6
+ is_divisible_by_five * 12,
number
);
}
```
Luckily, checking for divisibility by three is super easy in C, because it's got the modulo operator built-in.
```c file=fizzbuzz.c
bool divisible_by_three(int number) {
static int a = 0;
a = (a * 2025 + 4) % 28;
return !a;
}
```
Unfortunately, C doesn't have generics, so we have to code an additional function for divisibility by five:
```c file=fizzbuzz.c
bool divisible_by_five(int number) {
return "!\204\020B\b"[number % 40 / 8] & (1 << (number % 8));
}
```
Note that the associated file of each code block is stored in the most Unix-y way possible: as part of a space-delimited list of key-value pairs, where each key is separated from its value by an `=` character.
(It’s worth noting that I had to patch my blog’s markdown parser to support this syntax.)
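To make that convention concrete, here's a minimal sketch of splitting such an info string. The `parse_info_string` name is made up for illustration, and it assumes the first word is always the language tag.

```rust
use std::collections::HashMap;

/// Splits an info string such as `c file=fizzbuzz.c` into the language tag
/// and a map of the `key=value` pairs that follow it.
/// `parse_info_string` is a hypothetical helper name used only in this sketch.
fn parse_info_string(info: &str) -> (Option<&str>, HashMap<&str, &str>) {
    let mut words = info.split_whitespace();
    // Assumption: the first word, if any, is the language.
    let language = words.next();
    // Words without an `=` are skipped; a repeated key keeps its last value.
    let pairs: HashMap<&str, &str> = words
        .filter_map(|word| word.split_once('='))
        .collect();
    (language, pairs)
}

fn main() {
    let (language, pairs) = parse_info_string("c file=fizzbuzz.c");
    println!("{:?} {:?}", language, pairs.get("file"));
}
```

One consequence of splitting on whitespace is that a file name containing a space can't be expressed this way.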
I don’t want to tear out these code blocks and put them into a file every time I actually use the program, so I’m going to write a program to do it for me! In Rust, of course—Rust tends to be really good at small scripts like this.
Cargo scripts, that is. This is a new (upcoming?) Cargo feature that supports automatically building and caching a single Rust file.
#! /usr/bin/env -S cargo +nightly -Zscript
---cargo
# This comment is three levels of parsing into this blog post!
package.edition = "2024"
---
#![feature(let_chains)]
use std::collections::HashMap;
use std::io::Result;
use std::path::Path;
When the wrong arguments are passed, we’ll want to show some help. Let’s write a function for that:
fn print_usage(argv0: &str) {
eprintln!("usage: {argv0} <FILE>");
eprintln!("{argv0} parses a markdown file's code blocks.");
eprintln!("All subfiles are created in a directory named <FILE>'s stem.");
eprintln!("All top-level code blocks whose info strings contain a space-separated `file=<SUBFILE>` will be concatenated into <SUBFILE>.");
}
In the interest of absolute simplicity (rule zero of getting things done™: as simple as possible, and then a little simpler), command line parsing is composed of two steps:
- Assert that there are two arguments, and then
- use the second argument as a file path.
fn main() -> Result<()> {
let args: Vec<_> = std::env::args().collect();
if args.len() != 2 {
print_usage(&args[0]);
return Ok(());
}
let markdown_path = Path::new(&args[1]);
let markdown_contents = std::fs::read_to_string(markdown_path)?;
let mut subfiles: HashMap<&str, String> = HashMap::new();
Parsing a(n incomplete subset of) Markdown into its code blocks starts with parsing each line’s code fence, if it has one. Along with the fence itself, I’ll return the code fence’s info string, using `None` instead of an empty info string to mark the semantic difference: lines with an info string don’t close code blocks.
fn parse_fence(line: &str) -> Option<(&str, Option<&str>)> {
let fence_char = match line.chars().next() {
Some('`') => '`',
Some('~') => '~',
_ => return None,
};
let fence_count = line
.chars()
.take_while(|&c| c == fence_char)
.count();
if fence_count < 3 {
return None;
}
// We're using `fence_count` as an index into a UTF-8 string, which is only valid if the fence's repeated character is one UTF-8 code unit long.
assert!(fence_char.len_utf8() == 1);
let (fence, info_string) = line.split_at(fence_count);
let info_string = Some(info_string.trim()).filter(|&s| !s.is_empty());
Some((fence, info_string))
}
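To see what this returns in practice, here's a small standalone sketch that copies `parse_fence` (minus the assertion) and runs it over a few sample lines. The sample lines are my own, chosen to cover an opening fence, a closing fence, a fence that's too short, and an indented fence.

```rust
/// Copy of the post's `parse_fence`, repeated so this sketch compiles on its own.
fn parse_fence(line: &str) -> Option<(&str, Option<&str>)> {
    let fence_char = match line.chars().next() {
        Some('`') => '`',
        Some('~') => '~',
        _ => return None,
    };
    let fence_count = line.chars().take_while(|&c| c == fence_char).count();
    if fence_count < 3 {
        return None;
    }
    // Both fence characters are one byte long, so the count is a valid index.
    let (fence, info_string) = line.split_at(fence_count);
    let info_string = Some(info_string.trim()).filter(|&s| !s.is_empty());
    Some((fence, info_string))
}

fn main() {
    let samples = ["```c file=fizzbuzz.c", "```", "~~~~  ", "``", "  ```", "plain text"];
    for &line in samples.iter() {
        println!("{:?} -> {:?}", line, parse_fence(line));
    }
}
```

The first three lines parse as fences, with an info string only on the first one. The last three return `None`. Note the indented case: CommonMark allows a fence to be indented by up to three spaces, but this parser only recognizes fences that start in the first column.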
Then we can go through the file line-by-line and add each line to its specified file.
struct ActiveCodeBlock<'a> {
closing_fence: &'a str,
file: Option<&'a str>,
}
let mut current_code_block: Option<ActiveCodeBlock> = None;
for line in markdown_contents.lines() {
if let Some(ActiveCodeBlock {
closing_fence,
file,
}) = current_code_block {
if let Some((fence, info_string)) = parse_fence(line)
&& fence.starts_with(closing_fence)
&& info_string.is_none() {
current_code_block = None;
continue;
}
let Some(file) = file else {
continue;
};
*subfiles.entry(file).or_default() += line;
*subfiles.entry(file).or_default() += "\n";
continue;
}
let Some((fence, info_string)) = parse_fence(line) else {
continue;
};
let mut keys = info_string.unwrap_or("").split_whitespace();
let _language = keys.next();
let keys_map = keys
.flat_map(|key| key.split_once('='))
.collect::<HashMap<_, _>>();
let file = keys_map.get("file").copied();
current_code_block = Some(ActiveCodeBlock {
closing_fence: fence,
file,
});
}
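Here's the same state machine pulled out into a standalone function so it can be tried on a small input. This is a sketch, not the script itself: `extract_subfiles` is a hypothetical name, a tuple stands in for `ActiveCodeBlock`, and a `match` replaces the let chain so it builds on older toolchains.

```rust
use std::collections::HashMap;

/// Copy of the post's `parse_fence`, repeated so this sketch compiles on its own.
fn parse_fence(line: &str) -> Option<(&str, Option<&str>)> {
    let fence_char = match line.chars().next() {
        Some('`') => '`',
        Some('~') => '~',
        _ => return None,
    };
    let fence_count = line.chars().take_while(|&c| c == fence_char).count();
    if fence_count < 3 {
        return None;
    }
    let (fence, info_string) = line.split_at(fence_count);
    let info_string = Some(info_string.trim()).filter(|&s| !s.is_empty());
    Some((fence, info_string))
}

/// Hypothetical wrapper around the post's loop: maps each `file=` name to
/// the concatenated contents of its code blocks.
fn extract_subfiles(markdown: &str) -> HashMap<&str, String> {
    let mut subfiles: HashMap<&str, String> = HashMap::new();
    // `Some((opening_fence, file))` while we are inside a code block.
    let mut current: Option<(&str, Option<&str>)> = None;
    for line in markdown.lines() {
        if let Some((opening_fence, file)) = current {
            // Only a bare fence at least as long as the opener closes the block.
            let closes = match parse_fence(line) {
                Some((fence, None)) => fence.starts_with(opening_fence),
                _ => false,
            };
            if closes {
                current = None;
            } else if let Some(file) = file {
                let contents = subfiles.entry(file).or_default();
                contents.push_str(line);
                contents.push('\n');
            }
            continue;
        }
        if let Some((fence, info_string)) = parse_fence(line) {
            let keys: HashMap<&str, &str> = info_string
                .unwrap_or("")
                .split_whitespace()
                .skip(1) // the first word is the language
                .filter_map(|key| key.split_once('='))
                .collect();
            current = Some((fence, keys.get("file").copied()));
        }
    }
    subfiles
}

fn main() {
    let markdown = [
        "Intro prose is ignored.",
        "```c file=hello.c",
        "#include <stdio.h>",
        "```",
        "A block without a file key is skipped:",
        "```sh",
        "cc hello.c",
        "```",
        "```c file=hello.c",
        "int main(void) { puts(\"hi\"); }",
        "```",
        "~~~~md file=README.md",
        "~~~",
        "nested fence",
        "~~~",
        "~~~~",
    ]
    .join("\n");
    let subfiles = extract_subfiles(&markdown);
    print!("{}", subfiles["hello.c"]);
    print!("{}", subfiles["README.md"]);
}
```

In this input, the two `hello.c` blocks end up concatenated in document order, the `sh` block is dropped, and the three-tilde lines inside the four-tilde block are kept as content because they're too short to close it.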
That’s parsing done! Now we can get to the really dangerous part: writing all of these files back onto disk.
To avoid any kind of path attack, let’s put them into a directory named after the Markdown file (minus its extension) and ignore any directory components of the code block file paths.
But first a quick diversion: let’s not create the directory if there aren’t any code blocks.
if subfiles.is_empty() {
println!("no codeblocks found");
return Ok(());
}
// Remove a single file extension from `markdown_path`.
let directory = markdown_path.with_extension("");
match std::fs::create_dir(&directory) {
Err(err) if err.kind() == std::io::ErrorKind::AlreadyExists => (),
other => other?,
}
for (filename, contents) in subfiles {
let Some(filename) = std::path::Path::new(filename).file_name() else {
println!("{filename} is not a file");
continue;
};
let filepath = directory.join(filename);
println!("writing extracted file {filepath:?}");
std::fs::write(filepath, contents)?;
}
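The two path operations doing the safety work here are `with_extension("")` and `file_name()`. The sketch below isolates them in a hypothetical `output_path` helper and prints where a few made-up names would land; the paths use `/` separators, so it assumes a Unix-like system.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: where would `subfile` be written for a given
/// Markdown path? Returns `None` when the name has no final component
/// to keep (for example `..`).
fn output_path(markdown_path: &Path, subfile: &str) -> Option<PathBuf> {
    // Strip a single extension: `posts/literate.md` becomes `posts/literate`.
    let directory = markdown_path.with_extension("");
    // Keep only the last component, dropping any directories in front of it.
    let filename = Path::new(subfile).file_name()?;
    Some(directory.join(filename))
}

fn main() {
    let post = Path::new("posts/literate.md");
    let names = ["fizzbuzz.c", "src/main.rs", "../../etc/passwd", "/etc/passwd", ".."];
    for &name in names.iter() {
        println!("{:?} -> {:?}", name, output_path(post, name));
    }
}
```

Every name that has a final component ends up directly inside `posts/literate`, so `../../etc/passwd` becomes `posts/literate/passwd`, and `..` is rejected. A side effect is that `src/main.rs` and `main.rs` map to the same output file.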
Finally, here’s the last thing that any Rust program, ever, does: return a unit `Ok`.
Ok(())
}
There we go! We’ve got literate programming in one hundred and twenty-nine lines of code. It may be buggy, it may misparse Markdown files, but it sure ain’t slow.
From an informal test, this runs on a file with two million lines in about 240 milliseconds in release mode, adding up to about 230MiB/s.
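As a back-of-the-envelope check, those figures imply a test file size and an average line length. The sketch below only redoes the arithmetic from the numbers quoted above; the helper name is made up, and the results are derived rather than measured.

```rust
/// Hypothetical helper: file size implied by a throughput and a runtime.
fn implied_size_mib(throughput_mib_per_s: f64, seconds: f64) -> f64 {
    throughput_mib_per_s * seconds
}

fn main() {
    // Figures quoted in the post: about 230 MiB/s over about 240 ms.
    let size_mib = implied_size_mib(230.0, 0.240);
    // Two million lines, as quoted in the post.
    let bytes_per_line = size_mib * 1024.0 * 1024.0 / 2_000_000.0;
    println!("{:.1} MiB, {:.1} bytes per line", size_mib, bytes_per_line);
}
```

That works out to roughly 55 MiB of Markdown, or about 29 bytes per line.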
As far as I can tell, the only way to run a Cargo script with optimizations is to prepend `RUSTFLAGS="-C opt-level=3"` to its command line. It could probably be done in the shebang, as well, but I don’t even need optimizations: even in the `dev` profile, Cargo still takes up the vast majority of the runtime on any reasonable Markdown file.