The Tao of Option Parsing
=========================

Optik was explicitly designed to encourage the creation of programs with
straightforward, conventional command-line interfaces.  To that end, it
supports only the most common command-line syntax and semantics
conventionally used under UNIX.  If you are unfamiliar with these
conventions, read this document to acquaint yourself with them.


Terminology
-----------

argument
  a string entered on the command line and passed by the shell to
  ``execl()`` or ``execv()``.  In Python, arguments are elements of
  ``sys.argv[1:]`` (``sys.argv[0]`` is the name of the program being
  executed).  UNIX shells also use the term "word".

  It is occasionally desirable to substitute an argument list other
  than ``sys.argv[1:]``, so you should read "argument" as "an element of
  ``sys.argv[1:]``, or of some other list provided as a substitute for
  ``sys.argv[1:]``".

option
  an argument used to supply extra information to guide or customize the
  execution of a program.  There are many different syntaxes for
  options; the traditional UNIX syntax is a hyphen ("-") followed by a
  single letter, e.g. ``"-x"`` or ``"-F"``.  Also, traditional UNIX
  syntax allows multiple options to be merged into a single argument,
  e.g.  ``"-x -F"`` is equivalent to ``"-xF"``.  The GNU project
  introduced ``"--"`` followed by a series of hyphen-separated words,
  e.g. ``"--file"`` or ``"--dry-run"``.  These are the only two option
  syntaxes provided by Optik.

  Some other option syntaxes that the world has seen include:

  * a hyphen followed by a few letters, e.g. ``"-pf"`` (this is
    *not* the same as multiple options merged into a single argument)
  * a hyphen followed by a whole word, e.g. ``"-file"`` (this is
    technically equivalent to the previous syntax, but they aren't
    usually seen in the same program)
  * a plus sign followed by a single letter, or a few letters,
    or a word, e.g. ``"+f"``, ``"+rgb"``
  * a slash followed by a letter, or a few letters, or a word, e.g.
    ``"/f"``, ``"/file"``

  These option syntaxes are not supported by Optik, and they never will
  be.  This is deliberate: the first three are non-standard in any
  environment, and the last only makes sense if you're exclusively
  targeting VMS, MS-DOS, and/or Windows.

option argument
  an argument that follows an option, is closely associated with that
  option, and is consumed from the argument list when that option is.
  With Optik, option arguments may either be in a separate argument
  from their option::

    -f foo
    --file foo

  or included in the same argument::

    -ffoo
    --file=foo

  Typically, a given option either takes an argument or it doesn't.
  Lots of people want an "optional option arguments" feature, meaning
  that some options will take an argument if they see it, and won't if
  they don't.  This is somewhat controversial, because it makes parsing
  ambiguous: if ``"-a"`` takes an optional argument and ``"-b"`` is
  another option entirely, how do we interpret ``"-ab"``?  Because of
  this ambiguity, Optik does not support this feature.
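
  As a small sketch of the two spellings shown above, the following uses
  ``optparse``, the standard-library descendant of Optik.  The
  ``-f``/``--file`` option and the ``parse`` helper are assumptions made
  for illustration; all four argument lists should yield the same option
  value::

    from optparse import OptionParser

    def parse(argv):
        """Parse argv with a parser that knows one option taking an argument."""
        parser = OptionParser()
        parser.add_option("-f", "--file", dest="filename")
        options, args = parser.parse_args(argv)
        return options.filename

    # Separate-argument and same-argument forms, short and long.
    spellings = [["-f", "foo"], ["-ffoo"], ["--file", "foo"], ["--file=foo"]]
    results = [parse(argv) for argv in spellings]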

positional argument
  something left over in the argument list after options have been
  parsed, i.e. after options and their arguments have been parsed and
  removed from the argument list.

required option
  an option that must be supplied on the command line; note that the
  phrase "required option" is self-contradictory in English.  Optik
  doesn't prevent you from implementing required options, but doesn't
  give you much help at it either.  See ``examples/required_1.py`` and
  ``examples/required_2.py`` in the Optik source distribution for two
  ways to implement required options with Optik.

For example, consider this hypothetical command line::

  prog -v --report /tmp/report.txt foo bar

``"-v"`` and ``"--report"`` are both options.  Assuming that
``"--report"`` takes one argument, ``"/tmp/report.txt"`` is an option
argument.  ``"foo"`` and ``"bar"`` are positional arguments.
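
As a hedged sketch, the same command line can be fed to ``optparse`` (the
standard-library descendant of Optik) by passing an explicit argument
list in place of ``sys.argv[1:]``.  The option definitions below (``-v``
as a simple flag, ``--report`` taking one argument) are assumptions
chosen to match the example::

    from optparse import OptionParser

    parser = OptionParser(prog="prog")
    # "-v" takes no argument: it just sets a flag.
    parser.add_option("-v", action="store_true", dest="verbose", default=False)
    # "--report" consumes the next argument as its option argument.
    parser.add_option("--report", dest="report")

    # A substitute argument list, standing in for sys.argv[1:].
    argv = ["-v", "--report", "/tmp/report.txt", "foo", "bar"]
    options, args = parser.parse_args(argv)

    # options.verbose and options.report now hold the option values;
    # args holds the leftover positional arguments.

Here ``options.report`` is the option argument, and ``args`` is what
remains once the options and their arguments have been removed.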


What are options for?
---------------------

Options are used to provide extra information to tune or customize the
execution of a program.  In case it wasn't clear, options are usually
*optional*.  A program should be able to run just fine with no options
whatsoever.  (Pick a random program from the UNIX or GNU toolsets.  Can
it run without any options at all and still make sense?  The main
exceptions are ``find``, ``tar``, and ``dd`` -- all of which are mutant
oddballs that have been rightly criticized for their non-standard syntax
and confusing interfaces.)

Lots of people want their programs to have "required options".  Think
about it.  If it's required, then it's *not optional*!  If there is a
piece of information that your program absolutely requires in order to
run successfully, that's what positional arguments are for.

As an example of good command-line interface design, consider the humble
``cp`` utility, for copying files.  It doesn't make much sense to try to
copy files without supplying a destination and at least one source.
Hence, ``cp`` fails if you run it with no arguments.  However, it has a
flexible, useful syntax that does not require any options at all::

    cp SOURCE DEST
    cp SOURCE ... DEST-DIR

You can get pretty far with just that.  Most ``cp`` implementations
provide a bunch of options to tweak exactly how the files are copied:
you can preserve mode and modification time, avoid following symlinks,
ask before clobbering existing files, etc.  But none of this distracts
from the core mission of ``cp``, which is to copy either one file to
another, or several files to another directory.
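
A minimal sketch of a ``cp``-like interface using ``optparse``: the
required information arrives as positional arguments, while every option
stays optional.  The function name ``parse_cp_args`` and the particular
options are hypothetical, chosen only for illustration::

    from optparse import OptionParser

    def parse_cp_args(argv):
        """Return (options, sources, dest) for a cp-like command line."""
        parser = OptionParser(usage="%prog [options] SOURCE... DEST")
        parser.add_option("-p", "--preserve", action="store_true",
                          default=False,
                          help="preserve mode and modification time")
        parser.add_option("-i", "--interactive", action="store_true",
                          default=False,
                          help="ask before overwriting existing files")
        options, args = parser.parse_args(argv)
        # A destination and at least one source are required, so they
        # are positional arguments; report an error if they are missing.
        if len(args) < 2:
            parser.error("need at least one SOURCE and a DEST")
        return options, args[:-1], args[-1]

    options, sources, dest = parse_cp_args(["-p", "a.txt", "b.txt", "backup"])

Calling ``parser.error()`` prints the usage message and exits, which is
the conventional response to a missing positional argument.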


What are positional arguments for?
----------------------------------

Positional arguments are for those pieces of information that your
program absolutely, positively requires to run.

A good user interface should have as few absolute requirements as
possible.  If your program requires 17 distinct pieces of information in
order to run successfully, it doesn't much matter *how* you get that
information from the user -- most people will give up and walk away
before they successfully run the program.  This applies whether the user
interface is a command line, a configuration file, or a GUI.

In short, try to minimize the amount of information that users are
absolutely required to supply -- use sensible defaults whenever
possible.  Of course, you also want to make your programs reasonably
flexible.  That's what options are for.  Again, it doesn't matter if
they are entries in a config file, widgets in the "Preferences" dialog
of a GUI, or command-line options -- the more options you implement, the
more flexible your program is, and the more complicated its
implementation becomes.  Too much flexibility has drawbacks as well, of
course; too many options can overwhelm users and make your code much
harder to maintain.
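
To illustrate sensible defaults, here is a sketch using ``optparse``; the
option names and default values are invented for the example.  With an
empty argument list the program still has every setting it needs, and
each option merely overrides one default::

    from optparse import OptionParser

    parser = OptionParser()
    parser.add_option("-o", "--output", dest="output", default="out.txt",
                      help="where to write results [default: %default]")
    parser.add_option("-n", "--count", dest="count", type="int", default=10,
                      help="how many items to process [default: %default]")

    # No options at all: every setting falls back to its default.
    defaults, _ = parser.parse_args([])
    # One option given: only that setting changes.
    custom, _ = parser.parse_args(["--count", "3"])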

.. $Id: tao.txt 413 2004-09-28 00:59:13Z greg $
