Oxidizing argparsh

This post is a part of a series. Click here for the previous post.

argparsh has too much overhead for some use cases. In this post, I will examine why and use Rust to build a better solution.

Python is slow to start. Here’s a comparison of Python's startup time against bash's:

[I] (base) aneesh@earth:~/a/_/2024$ hyperfine -N 'python -c ""'
Benchmark 1: python -c ""
  Time (mean ± σ):      14.1 ms ±   4.9 ms    [User: 10.6 ms, System: 3.3 ms]
  Range (min … max):     5.9 ms …  28.3 ms    207 runs
 
[I] (base) aneesh@earth:~/a/_/2024$ hyperfine -N 'bash -c ""'
Benchmark 1: bash -c ""
  Time (mean ± σ):       2.8 ms ±   0.4 ms    [User: 0.8 ms, System: 1.9 ms]
  Range (min … max):     1.9 ms …   4.2 ms    744 runs

This shows that just starting Python takes about 5x as long as starting bash (14.1 ms vs. 2.8 ms on average). Why is this a problem? Because the argument parser I built in the last post is written in Python. This means that every invocation of argparsh adds, on average, about 14 ms to the script before accounting for any time spent executing real logic within argparsh. Consider the example from the end of the previous post:

# Create a parser program
parser=$({
  argparsh new $0 -d "argparsh example" -e "bye!"
  argparsh add_arg "a" -- \
    --choices "['a', 'b', 'c']"\
    --help "single letter arg"
  argparsh add_arg -i --interval -- --type int --default 10
  argparsh add_arg -f -- --action store_true

  argparsh subparser_init --required true
  argparsh subparser_add foo
  argparsh subparser_add bar

  argparsh add_arg --subparser foo qux
  argparsh set_defaults --subparser foo --myarg foo

  argparsh add_arg --subparser bar baz
  argparsh set_defaults --subparser bar --myarg bar
})

There are 11 argparsh calls! This means that before any “real” script logic is executed, we need to wait for Python to initialize 11 times and build up the argument parsing state. To quantify the impact of this, I wrote a small benchmark that simulates running a script that uses argparsh:

# run as PARSER=argparsh bash bench.sh - later in the post we will use "other" parsers
parser=$({
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
  $PARSER new $0
})

echo $parser

Running this benchmark I see:

[I] (base) aneesh@earth:~/a/r/argparsh$ hyperfine 'env PARSER=argparsh bash bench.sh'
Benchmark 1: env PARSER=argparsh bash bench.sh
  Time (mean ± σ):     262.4 ms ±  11.1 ms    [User: 222.2 ms, System: 40.4 ms]
  Range (min … max):   253.0 ms … 283.4 ms    10 runs

Since we expect Python to take around 140 ms just to start across the 10 calls in this benchmark, roughly 120 ms is spent running the actual code.
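
That estimate can be reproduced with a few lines of arithmetic. This is only a rough sketch using the mean values from the hyperfine runs above; the variable names are made up for illustration, and it assumes each call pays the same average startup cost.

```python
# Mean timings copied from the hyperfine output above, in milliseconds.
python_startup_ms = 14.1    # mean of `python -c ""`
benchmark_total_ms = 262.4  # mean of bench.sh with PARSER=argparsh
calls = 10                  # number of `$PARSER new $0` lines in bench.sh

# Time spent just starting the interpreter, and whatever is left over.
startup_total_ms = python_startup_ms * calls
remaining_ms = benchmark_total_ms - startup_total_ms

print(f"startup: {startup_total_ms:.0f} ms, remaining: {remaining_ms:.0f} ms")
```

The leftover time is not pure argparsh logic: it also includes bash itself, process spawning, and imports, so it is best read as an upper bound.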

Rewrite It In Rust

Rewrite It In Rust (or RIIR) is something of a meme in the software development community. As a sucker for memes, I hopped on the bandwagon and began rewriting my slow Python script as a fast Rust program.

In seriousness, beyond Rust’s reputation for being a more humane way to write high performance programs than C or C++, my prior experiences using PyO3 left a lasting impression on me. The Python/Rust boundary is far more pleasant to work with than the C++/Python combination (and I’m no stranger to codebases that mix C++ and Python! I’ve been working in them professionally for a few years).

Working in Rust was a lot of fun, and I think I would have been only marginally faster writing this in C++. A large part of that is the awesome ecosystem of third-party libraries: being able to easily change the format I was serializing to and deserializing from during development was very convenient.

Here’s how my rewrite performs:

[I] (base) aneesh@earth:~/a/r/argparsh$ hyperfine 'env PARSER=./target/release/argparsh bash bench.sh'
Benchmark 1: env PARSER=./target/release/argparsh bash bench.sh
  Time (mean ± σ):      68.9 ms ±   4.5 ms    [User: 16.8 ms, System: 53.0 ms]
  Range (min … max):    55.0 ms …  76.7 ms    43 runs

About 4x faster than the Python version! Here are some of the fun things I encountered during development:

Embedding code in PyO3

The final steps of assembling the parser and parsing the CLI args are still done in Python. This is because Python’s argparse allows for easy untyped access to all parsed values, something that clap and other Rust parsers don’t easily allow. I think there is a future where I could rewrite this to not rely on any Python, but it requires more time than I wanted to spend on this project this week. PyO3 made it super easy to embed parts of the original Python code in my Rust program. I think that incrementally upgrading a codebase from Python to Rust might be far easier than going from Python to C++, or even C++ to Rust!

Finding a fast serialization format

My initial implementation used serde-qs to serialize the arguments. However, serde-qs fails to serialize/deserialize structs where all fields are Options that are set to None. I then realized that, similar to the Python implementation, I could instead use any binary serialization format and then encode the bytes as a query string. I found this benchmark comparing a bunch of different serialization formats, and chose bitcode since it did well there. Along the way I tried a few others, and I liked how easy it was to try different formats quickly. This was mainly because modifying the dependencies of the program was super easy; it would have been a nightmare to do in C++.
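
The "binary format first, text encoding second" idea is easy to sketch. The example below is in Python rather than Rust, with pickle standing in for bitcode and percent-encoding standing in for the query-string step. Both are stand-ins picked for illustration, not what argparsh actually uses, and encode_state/decode_state are hypothetical names.

```python
import pickle
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_state(state: dict) -> str:
    """Serialize to bytes, then percent-encode so the result is shell-friendly text."""
    raw = pickle.dumps(state)  # stand-in for a binary format such as bitcode
    return quote_from_bytes(raw, safe="")


def decode_state(text: str) -> dict:
    """Reverse of encode_state: percent-decode, then deserialize."""
    # Only unpickle data you produced yourself; pickle is not safe on untrusted input.
    return pickle.loads(unquote_to_bytes(text))


# A record whose fields are all "None", the shape that was a problem for serde-qs.
state = {"help": None, "default": None, "choices": None}
encoded = encode_state(state)
assert decode_state(encoded) == state
```

Because the binary layer handles the data model and the text layer only ever sees opaque bytes, swapping the serialization format does not affect how the string is passed around in the shell script.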

Flushing buffers

One of the issues I ran into was that my existing test suite was failing when running the new binary! In particular, in cases where the test suite was supposed to detect argparsh emitting an exit command after invalid arguments were parsed, no output was detected on stdout. After poking around a bit, I determined that the Python interpreter was buffering stdout/stderr but was unable to flush the buffer before the program exited. I tried to find a PyO3 equivalent to python -u, but in the end I solved this with explicit calls to stdout.flush().
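
The same failure mode can be reproduced in plain Python, without PyO3. This sketch is an analogy rather than the actual argparsh code: it uses os._exit() to stand in for a host process that exits before the interpreter gets a chance to flush, and run_child is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_child(flush: bool) -> str:
    """Run a child interpreter that prints a line and then exits abruptly."""
    child_code = (
        "import os, sys\n"
        "print('exit 1')\n"
        + ("sys.stdout.flush()\n" if flush else "")
        # os._exit() ends the process without flushing Python's stdio buffers.
        + "os._exit(0)\n"
    )
    # Drop PYTHONUNBUFFERED so the child keeps its default buffering on a pipe.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stdout


lost = run_child(flush=False)  # text is still sitting in the buffer at exit
kept = run_child(flush=True)   # explicit flush pushes it out before exit
print(repr(lost), repr(kept))
```

When stdout is a pipe (as it is under a test harness or inside `$(...)`), Python block-buffers it, which is why the missing flush only shows up in those settings and not when running interactively in a terminal.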

Conclusions

Here are some of my overall thoughts from this process.

Python startup time is atrocious.
Python isn’t that slow for short programs.
Python is great for prototyping, but there isn’t an easy way to go from a Python program to a fast native program.
- Existing Python compilers focus on numerical/data science workloads, not general programs.
Rust’s strong ecosystem makes it a good choice for rewriting.
Command line interfaces are interfaces - if the interface is well designed it’s not too hard to change the implementation. (I’ve always been vaguely aware of this idea, but gained a greater appreciation for it during this project)

Written on November 18, 2024