Building argparsh - a shell-agnostic argument parser

Argument parsing in the shell doesn’t have to suck, so I made it not suck.


Writing a robust argument parser and good help/usage text is like eating healthy. There are a few really disciplined people out there who do it every day, but most of us don’t really pay attention to it until our habits catch up with us and we find ourselves facing the consequences of our past actions. For me, this usually takes the form of picking up a script that I wrote weeks, months, or even years ago and not remembering what the arguments were. I then have to read the code and work out how the arguments are consumed and what each one represents. It takes what should be an easy task and turns it into an annoying chore. Just like with healthy food habits, I don’t think it’s fair to just blame people. Often, a bad diet is indicative of regional food supply issues more than personal responsibility, and in this analogy those issues are poor devtools.

So, I decided to build argparsh. It uses Python’s argparse as the underlying implementation, which not only lets me avoid reinventing the wheel, but also provides an interface that many people are already familiar with. I’ve been using argparsh in some other projects and I’m very happy with what it can do.

Why not use getopts?

There’s already a tool that claims to do what I want - getopts.

getopts is a longstanding utility for argument handling, but in my opinion it has not aged gracefully. My main gripes are that options are not self-documenting, positional arguments aren’t easily described, and it’s still incumbent upon the user to write their own usage/help text and to handle incorrect invocations robustly.

Designing a new stateful argument parser generator

I believe that the approach of getopts, with a DSL as an argument to the parsing program, is flawed. Here’s some of my reasoning:

  • There’s little room for users to get feedback on whether or not their invocation is correct with the DSL being embedded as a script.
  • It’s hard to edit/compose existing parsers/fragments of parsers.
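As a rough illustration of the DSL style, here is a sketch using Python’s getopt module, which follows the same option-string convention. This is an analogy rather than shell getopts itself, and the option names, defaults, and usage text are made up for the example. Note how the option string "fi:" carries no help text or types, so the usage message, type conversion, and positional checks all have to be written by hand.

```python
import getopt

# Hypothetical usage text: the option string below cannot generate this,
# so it has to be written and kept in sync by hand.
USAGE = "usage: demo [-f] [-i INTERVAL] VALUE"


def parse(argv):
    """Parse argv with a getopt-style option string ("i:" takes a value)."""
    try:
        opts, positional = getopt.getopt(argv, "fi:")
    except getopt.GetoptError as err:
        raise SystemExit(f"{err}\n{USAGE}")

    settings = {"f": False, "interval": 10}
    for flag, value in opts:
        if flag == "-f":
            settings["f"] = True
        elif flag == "-i":
            settings["interval"] = int(value)  # type conversion is manual too
    # Positional arguments are not described by the option string at all.
    if len(positional) != 1:
        raise SystemExit(USAGE)
    settings["value"] = positional[0]
    return settings
```

Calling parse(["-f", "-i", "3", "x"]) returns a plain dict of settings; everything a reader would want from --help lives only in the hand-written USAGE string.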

This led me to explore other existing methods of argument parsing. I found two main approaches:

  • Function Annotations
  • Parser Objects

Function Annotations describes libraries like click, which annotate functions to build a transformation that treats command line arguments as the arguments and invocations of functions. However, these kinds of libraries are deeply integrated with the language they’re built upon and don’t easily lend themselves to being a parser that’s invoked as a subprocess.

import click

@click.command()
@click.option("--value", help="some value")
def app(value):
    print(f"{value=}")

if __name__ == "__main__":
    app()

Parser Objects describes libraries where a stateful object that represents the argument parser is constructed over a series of method calls before being used to parse input arguments (e.g. argparse). This construction lends itself well to composition, as the method calls can be composed.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("value", help="some value")

if __name__ == "__main__":
    args = parser.parse_args()
    print(f"{args.value=}")
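To make the composition point concrete, here is a minimal sketch in which a group of shared options is factored into a helper and applied to any parser. The names add_common_args and build_parser and the specific options are hypothetical, chosen only for illustration.

```python
import argparse


def add_common_args(parser):
    """Hypothetical reusable fragment: adds options shared by several tools."""
    parser.add_argument("-i", "--interval", type=int, default=10,
                        help="polling interval")
    parser.add_argument("-f", action="store_true", help="force")
    return parser


def build_parser(prog):
    """Build a tool-specific parser, then layer the shared fragment on top."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("value", help="some value")
    return add_common_args(parser)


args = build_parser("demo").parse_args(["-i", "3", "hello"])
print(args)
```

Because the parser is just an object that accumulates method calls, fragments like add_common_args can be reused across scripts without touching a format string.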

To me, this seemed like the correct interface for building an argument parser in a language agnostic way. The only problem was figuring out how to model the state.

My initial thought was to have an interface like the following:

parser=$(argparsh new "$0")
parser=$(argparsh add_arg "$parser" value --help="some value")
...
eval $(argparsh parse "$parser" "$@")

In this approach, the idea was that each command would produce part of a python script:

~$ argparsh new foo
import argparse
parser = argparse.ArgumentParser("foo")

~$ argparsh add_arg "$parser" value --help="some value"
import argparse
parser = argparse.ArgumentParser("foo")
parser.add_argument("value", help="some value")

and then parse would simply append parser.parse_args(CLI ARGS) and then convert the resulting value into something that could be evaluated by the shell. However, this seemed cumbersome, and I realized that it was possible to get this to work without having to pass the “state” object to every call:

~$ {
 argparsh new foo
 argparsh add_arg value --help="some value"
}
import argparse
parser = argparse.ArgumentParser("foo")
parser.add_argument("value", help="some value")

This approach limits the error checking that can be done in each call, but that seemed acceptable to me. The next issue was that, because each command outputs pieces of a Python script, the variable holding the state would contain whitespace. This means that omitting quotes when using the variable would cause weird error states for users. To work around this, I instead serialized some values for each command with pickle, then URL-encoded those bytes to get a string with no spaces. Each command then prints & followed by the URL-encoded data and ends without printing a newline, so that the concatenated state variable is a single string with no whitespace.
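The encoding scheme can be sketched roughly as follows. This is an illustration under assumptions, not the actual argparsh code: the (method, args, kwargs) tuple layout and the helper names encode_command, decode_state, and build_parser are invented for the example. Only the use of pickle, URL-encoding, and the & separator comes from the description above.

```python
import argparse
import pickle
from urllib.parse import quote, unquote_to_bytes


def encode_command(method, *args, **kwargs):
    """Serialize one parser-building call as '&' + URL-encoded pickle bytes."""
    payload = pickle.dumps((method, args, kwargs))
    # safe="" percent-encodes every reserved byte, including '&' and '/',
    # so '&' can only ever appear as the separator between commands.
    return "&" + quote(payload, safe="")


def decode_state(state):
    """Split the concatenated state on '&' and unpickle each command."""
    # Note: unpickling is only reasonable here because the state is
    # produced by the same script that consumes it.
    return [pickle.loads(unquote_to_bytes(chunk))
            for chunk in state.split("&") if chunk]


def build_parser(state):
    """Replay the recorded calls against a real argparse parser."""
    parser = None
    for method, args, kwargs in decode_state(state):
        if method == "new":
            parser = argparse.ArgumentParser(*args, **kwargs)
        else:
            getattr(parser, method)(*args, **kwargs)
    return parser


# Concatenating command outputs mimics what $( { ...; } ) collects in the shell.
state = (encode_command("new", prog="foo")
         + encode_command("add_argument", "value", help="some value"))
assert not any(ch.isspace() for ch in state)
args = build_parser(state).parse_args(["hello"])
```

Since every chunk is self-delimiting, the commands never need to read the previous state; the final parse step is the only place where the whole string is decoded and replayed.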

In this process, I found myself wishing that shells had some kind of standard interface for an object with methods to mutate internal state.

The final version of the tool looks like this:

# Create a parser program
parser=$({
  argparsh new $0 -d "argparsh example" -e "bye!"
  argparsh add_arg "a" -- \
    --choices "['a', 'b', 'c']"\
    --help "single letter arg"
  argparsh add_arg -i --interval -- --type int --default 10
  argparsh add_arg -f -- --action store_true

  argparsh subparser_init --required true
  argparsh subparser_add foo
  argparsh subparser_add bar

  argparsh add_arg --subparser foo qux
  argparsh set_defaults --subparser foo --myarg foo

  argparsh add_arg --subparser bar baz
  argparsh set_defaults --subparser bar --myarg bar
})

eval $(argparsh parse $parser --prefix arg_ -- "$@")
echo "Parsed args as shell variables:"
echo "[arg]: a="$arg_a
echo "[arg]: interval="$arg_interval
echo "[arg]: f="$arg_f
if [ "$arg_myarg" = "foo" ]; then
  echo "[arg]: qux="$arg_qux
elif [ "$arg_myarg" = "bar" ]; then
  echo "[arg]: baz="$arg_baz
fi
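The eval line in the example works because the parse step prints something the shell can evaluate as variable assignments. A minimal sketch of how that conversion might look is below, assuming one name=value assignment per parsed argument with an optional prefix. The to_shell helper is hypothetical, and how argparsh actually renders booleans, lists, or None is not covered here.

```python
import argparse
import shlex


def to_shell(namespace, prefix=""):
    """Render parsed arguments as shell variable assignments, one per line."""
    lines = []
    for name, value in vars(namespace).items():
        # shlex.quote keeps spaces and quotes in values from being
        # re-interpreted when the shell evals the output.
        lines.append(f"{prefix}{name}={shlex.quote(str(value))}")
    return "\n".join(lines)


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("a", choices=["a", "b", "c"])
parser.add_argument("-i", "--interval", type=int, default=10)
output = to_shell(parser.parse_args(["b", "-i", "3"]), prefix="arg_")
print(output)
```

Quoting each value matters here: without it, an argument containing spaces or shell metacharacters would be split or executed when the calling script evals the result.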

Conclusions

There are other details involved in making this project not only work, but work robustly enough for me to feel comfortable including it in other projects I maintain. To see more about that, I recommend taking a look at the code here. In the next post, I will write a bit about optimizing argparsh, and why that became necessary.

Here are some key takeaways I had while working on and using the initial prototype:

  • Portable tools for building argument parsers are actually a very good idea
  • Shells could benefit from objects
  • Python works well as a means of augmenting the shell

This post is a part of a series. Click here for the next post.
Written on November 17, 2024