genser - generate serialising code

This is a pair of programs which generate code to serialise and
deserialise data structures, given a description of them.  I wrote
them for the protocol used by userfs, but they are quite general.  One
program generates code for encoding, decoding and finding the size of
the encoded representation, and the other generates prototypes for
them, and emits the types in ANSI C.

GENHDR

Usage: genhdr [-C] input.ty 

This emits all the datatypes and function prototypes in ANSI C to
standard output.  If -C is specified then array structures are
generated with destructors which free memory allocated by decoding.

GENCODE

Usage: gencode [-sedC] [-l dir] [-s suff] input.ty [> output.c]

Gencode generates C code for the encode/decode/sizeof functions
(generated with the -e, -d, and -s options respectively).  -l dir will
generate one function per file into directory "dir" for the generation
of archive libraries so that programs don't have to link everything
in.  -s sets the suffix of the output files (.c by default).  -C
generates code to work with the destructors generated with genhdr -C,
and sets the suffix to .cc when used with -l.

INPUT FILE FORMAT

The input file is essentially C type definitions, with a few
exceptions.  By default, code is generated for any type named with
typedef, and any anonymous type used in a typedef.

Arrays are defined as follows:

	typedef int foo[];

This defines a type "foo", which is an unbounded array of ints (signed
32 bit words).  This generates a structure of the form:

	struct {
		int *elems;
		long nelem;
	};

which is a pointer to the base of the array, and the number of
elements.
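
As an illustration only, the sketch below shows how calling code might
walk such an array.  It assumes the generated structure is given the
typedef name "foo"; the helper foo_sum() is hypothetical and is not
something genser generates.

```c
/* Assumed layout: the structure shown above, under the typedef name "foo". */
typedef struct {
	int *elems;	/* pointer to the base of the array */
	long nelem;	/* number of elements */
} foo;

/* Hypothetical helper (not generated by genser): sum all elements. */
long foo_sum(const foo *a)
{
	long total = 0;
	long i;

	for (i = 0; i < a->nelem; i++)
		total += a->elems[i];
	return total;
}
```

An empty array is simply one with nelem set to 0, in which case elems
is never dereferenced by the loop above.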

This array syntax, and the complete exclusion of functions from the
type system, are the main differences from pure C syntax.  A number of
parsing hacks are in place so that C constructs which have no meaning
to genser can still be parsed.

Structures may be named with the "struct foo {...};" syntax, but they
are ignored until they are used in a named type.

Often you want to include a system include file for a couple of types,
but it defines dozens.  Typedefs can be marked as "generate on demand"
(when used in other types) by enclosing them in a notypedef block:

notypedef {
#include <sys/types.h>
}

This will only generate code and definitions for the types in
<sys/types.h> if another type uses them.

The input file is run through cpp ("/lib/cpp -Ulinux -C").

It is possible to quote parts of the input file directly into the
output file, by putting '%' at the beginning of the line.  These lines
are completely uninterpreted and are copied through with the '%'
stripped off.  The order of these lines is maintained in the output.
Their position relative to the genser output for the surrounding input
is not specified, but they will generally appear before any generated
output.  Quoted lines are copied through only by genhdr, not by
gencode, so they appear only in the header output.

%/* Copy into output file */
%#include <sys/types.h>

When decoding arrays of variable size and pointers to objects, the
decode routine calls a function or macro void *ALLOC(size_t size) to
allocate memory.  It expects this function will always return a valid
pointer to free memory.  By default, it is defined as malloc(), but it
can be redefined in the quoted section to something appropriate to
local conditions.  The memory allocated in the decode function must be
manually freed when you've finished.  If one generates C++ code (-C
option to gen(code|hdr)) then destructors which call FREE() are
generated for arrays.  They will only attempt to free memory if it was
allocated by a decode function.
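
Since malloc() can return NULL while the decode routines expect a valid
pointer, one possible replacement is an allocator that aborts on
failure.  The following is a minimal sketch under that assumption;
checked_alloc() is a hypothetical name, and whether the default ALLOC
definition needs the #undef shown here depends on how the generated
header defines it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator: never returns NULL, aborts the program instead. */
void *checked_alloc(size_t size)
{
	/* malloc(0) may legally return NULL, so ask for at least one byte. */
	void *p = malloc(size ? size : 1);

	if (p == NULL) {
		fprintf(stderr, "out of memory (%lu bytes)\n",
			(unsigned long)size);
		abort();
	}
	return p;
}

/* Assumed override of the allocation hook used by the decode routines. */
#undef ALLOC
#define ALLOC(size) checked_alloc(size)
```

In an input file the override would be written as a quoted line, for
example "%#define ALLOC(size) checked_alloc(size)", with a prototype
for checked_alloc() quoted alongside it.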

coder.h is a file that must always be included.  It contains the
definitions of the encode/decode/sizeof functions for the base types
used by genser.  It is normally included as "coder.h", but if one
defines _NO_CODER_H_ in the output file (with a quoted define), it
will not be automatically included.  One can then include it in a more
appropriate way, or replace it altogether.

There are no known actual *bugs*, but there are a few limitations and
desirable features.  Most importantly, C++ support could be much
better.  Each type could be a class with encode/decode/sizeof methods,
and memory could be allocated with new and delete if required.  Also,
genser should have a better grasp of C so that it can always parse
Linux header files.

Also, this readme file could be turned into a real man page or texinfo
page.

Bug reports and comments to
	Jeremy Fitzhardinge <jeremy@sw.oz.au>
