Why getopt?
I use getopt almost exclusively in the software that I write by myself, and often insist on using it when collaborating with others, even when the language convention is to use something else.
The reason is simple: getopt is a part of the user interface, and user
interfaces should strive to be simple and consistent. As an end user,
I find it jarring when, for example, I have to run a script by
specifying the interpreter by hand, or when the language-specific
extension is a part of the file name. This is an implementation detail
which should not concern me - the #! should take care of that for
me. Similarly, getopt is over 40 years old, is supported nearly
universally, and is easy to understand both for the user and the
programmer.
It’s a matter of UX
Users don’t like to be surprised when interacting with a program. If the platform’s convention is to put the “OK” button on the right, and the “cancel” button on its left, presenting them in the opposite order is like laying a trap; even if you don’t get tripped by it, you’ve still had to spend additional energy on interpreting the situation.
It’s the same with command line argument parsing. Some people might be
used to typing rm -rf, others have rm -fr in their muscle memory.
However, a program written using e.g. Go’s flag package might
trip someone up, since a single dash is allowed to introduce a long
option, rather than a set of short options; in an extreme example,
-fr and -rf can mean completely different things.
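For contrast, here is a minimal sketch of the getopt behaviour, using Python's standard getopt module. The option letters r and f and the helper name parse_flags are made up for this illustration. Clustered short options parse to the same set of flags no matter how they are ordered or grouped:

```python
import getopt


def parse_flags(argv):
    """Return the set of short options seen, plus the remaining operands."""
    # "rf" declares two short options, -r and -f, neither taking an argument.
    opts, operands = getopt.getopt(argv, "rf")
    return {name for name, _ in opts}, operands


# All three spellings yield the same flags and the same operand.
print(parse_flags(["-rf", "dir"]))
print(parse_flags(["-fr", "dir"]))
print(parse_flags(["-r", "-f", "dir"]))
```

Whichever spelling is in the user's muscle memory, the program sees the same thing.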
It’s a matter of code and documentation quality
Complex libraries, such as Python’s argparse, hide what is actually going on in your program’s argument handling code. While they allow very fancy things to be expressed tersely, the actual logic becomes opaque to the reader. Consider this example from argparse’s introduction:
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))
Here’s (almost) identical logic written using getopt:
import sys
import getopt

# getopt returns the parsed options first, then the remaining arguments.
opts, args = getopt.getopt(sys.argv[1:], "", ["sum"])
func = max
for opt, arg in opts:
    if opt == "--sum":
        func = sum
print(func(int(arg) for arg in args))
Now there are of course two things missing (which offers a very good counter-argument against getopt): documentation, and validation/error handling.
Let’s have another look at the documentation that was auto-generated by argparse:
usage: prog.py [-h] [--sum] N [N ...]

Process some integers.

positional arguments:
  N           an integer for the accumulator

options:
  -h, --help  show this help message and exit
  --sum       sum the integers (default: find the max)
In my opinion, this message could just as well be hardcoded in the
program source. Its existence provides an excellent reference to
whoever is reading the code, and encourages focusing on the clarity of
the message. It is a good idea to start writing the program by first
writing this help message. If I were to implement prog.py from
scratch, I would write the help message as follows:
Usage: accumulate [-h | --help] [--func=F] ARGS

This utility accumulates ARGS (each interpreted as a number),
according to the function F (which by default is max).

Options:
  -h, --help  Show this help message and exit.
  --func=F    Use function F to accumulate the numbers.

The function F can be one of:
  max  Find the largest number among the arguments. (Default.)
       You must provide at least one argument.
  sum  Sum the arguments. A sum of zero arguments is zero.
By writing the documentation first, we’ve achieved the following:
- Our program now has a name (accumulate), and a more clearly defined purpose.
- We’ve identified the edge/error cases, such as attempting to find the maximum of zero numbers; meanwhile a sum of zero numbers is the additive identity (zero), so it would make sense to allow that.
- We’ve generalized our program to handle numbers, rather than integers. Python has a module for exact decimal arithmetic, so why not use that?
- We’ve made the interface more extensible, leaving enough space to allow adding a hundred more functions in the future, without cluttering the option namespace, or painting ourselves into a corner by introducing mutually exclusive options.
- The printed text is (somewhat structured) hand-written prose, which reads more easily than the auto-generated text.
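The edge cases and the choice of exact decimal arithmetic mentioned in the list can be checked directly with Python built-ins. This is only a small sketch; the variable names are made up for the illustration:

```python
from decimal import Decimal

# A sum of zero numbers is well defined: it is the additive identity.
empty_sum = sum(Decimal(x) for x in [])

# The maximum of zero numbers is not, so Python raises ValueError.
try:
    max(Decimal(x) for x in [])
    empty_max_fails = False
except ValueError:
    empty_max_fails = True

# Decimal keeps decimal fractions exact, unlike binary floats.
exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
inexact = 0.1 + 0.2 == 0.3

print(empty_sum, empty_max_fails, exact, inexact)  # prints: 0 True True False
```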
So what does the code to handle all of that look like now?
import sys
import getopt
import decimal


def show_usage():
    print("Usage: accumulate [-h | --help] [--func=F] ARGS")


def show_help():
    show_usage()
    print("""
This utility...
""")  # omitted for brevity


def main():
    try:
        # getopt returns the parsed options first, then the remaining arguments.
        opts, args = getopt.getopt(sys.argv[1:], "h", ["help", "func="])
    except getopt.GetoptError:
        show_usage()
        sys.exit(1)
    funcs = {"max": max, "sum": sum}
    func = max
    for opt, arg in opts:
        if opt in ["-h", "--help"]:
            show_help()
            sys.exit()
        elif opt == "--func":
            try:
                func = funcs[arg]
            except LookupError:
                show_usage()
                sys.exit(1)
    # max() of nothing is undefined; a sum of zero numbers is simply zero.
    if func == max and len(args) == 0:
        print("Error: cannot find the maximum of zero numbers.")
        sys.exit(1)
    print(func(decimal.Decimal(arg) for arg in args))


if __name__ == "__main__":
    main()
So, is this a lot of error handling code? No, I don’t think so. Real-world programs need to handle such edge cases all of the time.
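Much of the validation comes for free from getopt itself. As a rough sketch, reusing the same option specification as above (the helper name try_parse is hypothetical), an unknown option or a missing option argument raises getopt.GetoptError, whose opt attribute names the offending option:

```python
import getopt


def try_parse(argv):
    """Parse argv; return (opts, args), or the name of the offending option."""
    try:
        return getopt.getopt(argv, "h", ["help", "func="])
    except getopt.GetoptError as err:
        return err.opt  # the option that caused the failure


print(try_parse(["--func=sum", "1", "2"]))  # parsed normally
print(try_parse(["--bogus"]))               # unknown option
print(try_parse(["--func"]))                # missing required argument
```

The program only has to decide what to print when that exception arrives, which is what the call to show_usage() does.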
Is this too much code for such a small utility? After all, we’ve gone
from ten to dozens of lines of code. Again, I don’t think so. Even the
tiniest utility (many of which will never ever get a proper manual
page) will greatly benefit from a carefully written --help-style
reference. The task at hand happens to fit the example given in
argparse’s introduction, but many real-world utilities won’t.
Resorting to using every single one of argparse’s capabilities in an
attempt to write fewer lines of code is just golf.
Click, Typer, etc
Don’t even get me started!
Appendix A: boilerplate
The argument-parsing boilerplate for different languages can be trivially copy-pasted from a template; I keep a couple of such copypastas in my dotfiles:
I’ve taken on maintaining a fork of an excellent getopt library for Go, and provided some boilerplate in the examples directory:
Appendix B: support for getopt
If you see a glaring omission, feel free to tickle me and suggest an edit!
Operating systems / platforms
- POSIX; and via POSIX: glibc, musl, and therefore, probably every Linux distro in existence.
- Windows/WSL, per installed Linux distro.
- OpenBSD
- NetBSD
- FreeBSD
- DragonFly BSD
- illumos
- Solaris
- SerenityOS
Programming languages
Also, check out the article on Rosetta Code.