.\"=====================================================================
.\"  @Troff-man-file{
.\"     author          = "Nelson H. F. Beebe",
.\"     version         = "1.06",
.\"     date            = "23 September 2004",
.\"     time            = "15:02:19 MDT",
.\"     filename        = "biblex.man",
.\"     address         = "University of Utah
.\"                        Department of Mathematics, 110 LCB
.\"                        155 S 1400 E RM 233
.\"                        Salt Lake City, UT 84112-0090
.\"                        USA",
.\"     telephone       = "+1 801 581 5254",
.\"     FAX             = "+1 801 581 4148",
.\"     URL             = "http://www.math.utah.edu/~beebe",
.\"     checksum        = "18684 236 926 6983",
.\"     email           = "beebe@math.utah.edu, beebe@acm.org,
.\"                        beebe@computer.org  (Internet)",
.\"     codetable       = "ISO/ASCII",
.\"     keywords        = "bibliography, BibTeX, lexical analysis",
.\"     supported       = "yes",
.\"     docstring       = "This file is the UNIX nroff/troff manual
.\"                        page documentation for biblex, a tool for
.\"                        lexically analyzing BibTeX bibliography
.\"                        data base files into a token stream that
.\"                        can be conveniently processed by other
.\"                        tools, or reconstructed into a BibTeX file
.\"                        by bibunlex.
.\"
.\"                        The checksum field above contains a CRC-16
.\"                        checksum as the first value, followed by the
.\"                        equivalent of the standard UNIX wc (word
.\"                        count) utility output of lines, words, and
.\"                        characters.  This is produced by Robert
.\"                        Solovay's checksum utility.",
.\"  }
.\"=====================================================================
.\"
.if t .ds Bi B\s-2IB\s+2T\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X
.if n .ds Bi BibTeX
.\"
.if t .ds Sc S\s-2CRIBE\s+2
.if n .ds Sc Scribe
.\"
.\"=====================================================================
.TH BIBLEX 1 "23 September 2004" "1.06"
.\"=====================================================================
.SH NAME
biblex \- lexically analyze BibTeX bibliography data base files
.\"=====================================================================
.SH SYNOPSIS
.B biblex
.I "<infile"
.I ">outfile"
.nf
or
.fi
.B biblex
.I "bibfile1 bibfile2 bibfile3 .\|.\|."
.I ">outfile"
.\"=====================================================================
.SH DESCRIPTION
.B biblex
converts one or more bibliography data base files
in \*(Bi format to a lexical token stream that is
convenient for processing by other tools.
.PP
The companion
.BR bibunlex (1)
program can be used to recombine such a token
stream back into a \*(Bi file.
.PP
\*(Sc-format bibliography files can be handled as
well if they are first converted to \*(Bi form by
.BR bibclean (1).
.PP
Only minimal checks are made on the correctness of
the input stream, and
.B biblex
will happily carry out a lexical analysis of
nonsensical input, without issuing warnings or
errors of any kind, other than a possible internal
string buffer overflow.  To verify that
.BR biblex 's
output token stream is meaningful, the input files
can be given to
.BR bibparse (1)
for parsing analysis according to a proposed
grammar for \*(Bi.
.\"=====================================================================
.SH "LEXICAL ANALYSIS"
.B biblex
produces output in lines of the form
.PP
.RS
.nf
<token-number><tab><token-name><tab>"<token-value>"
.fi
.RE
.PP
Each output line contains a single complete token,
identified by a small integer for use by computer
programs, a token type name for human readers, and
the token's string value in double quotes.
.PP
Special characters in the token value string are
represented with ANSI/ISO Standard C escape
sequences, so all characters other than NUL are
representable, and multi-line values can be
represented in a single line.
.PP
Here are the token numbers and token type names
that can appear in the output:
.PP
.RS
.nf
 0   UNKNOWN
 1   ABBREV
 2   AT
 3   COMMA
 4   COMMENT
 5   ENTRY
 6   EQUALS
 7   FIELD
 8   INCLUDE
 9   INLINE
10   KEY
11   LBRACE
12   LITERAL
13   NEWLINE
14   PREAMBLE
15   RBRACE
16   SHARP
17   SPACE
18   STRING
19   VALUE
.fi
.RE
.PP
Programs that parse such output should also be
prepared for lines beginning with the warning
prefix, %%, or the error prefix, ??, and for
ANSI/ISO Standard C line number directives of the
form
.RS
# line 273 "texbook1.bib"
.RE
which record the line number and file name
of the current input file.
.PP
As an example of the use of
.BR biblex ,
the UNIX command pipeline
.RS
.nf
\fBbiblex\fP \fImylib.bib\fP | \e
    \fBawk\fP '$2 == "KEY" {print $3}' | \e
    \fBsed\fP -e 's/"//g' | \e
    \fBsort\fP
.fi
.RE
will extract a sorted list of all citation keys in
the file
.IR mylib.bib .
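.PP
A similar pipeline, sketched here on the
assumption that only the output format described
above is relied upon, should tally how often each
token type occurs in
.IR mylib.bib :
.RS
.nf
\fBbiblex\fP \fImylib.bib\fP | \e
    \fBgrep\fP '^[0-9]' | \e
    \fBcut\fP -f2 | \e
    \fBsort\fP | \e
    \fBuniq\fP -c
.fi
.RE
The
.BR grep (1)
stage keeps only token lines, which begin with a
token number, so warning, error, and line number
lines are discarded;
.BR cut (1)
then selects the second tab-separated field, the
token type name.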
.PP
The LITERAL token type is used for arbitrary text
that
.B biblex
does not examine further, such as the contents of
a @Preamble{.\|.\|.} or a @Comment{.\|.\|.}.
.PP
The UNKNOWN token type should never appear in the
output stream.  It is used internally to
initialize token type variables.
.\"=====================================================================
.SH BUGS
Limitations of the
.BR lex (1)
lexical analyzer generator used to construct
.B biblex
prevent handling of files containing ASCII NUL;
that character will be interpreted as an
end-of-file condition.
.PP
Older versions of
.BR lex (1)
are not
.IR "8-bit clean" ;
they will not reliably handle characters 128\(en255.
This latter deficiency is being remedied by the
X/Open Consortium's efforts to internationalize
and standardize UNIX applications.
.\"=====================================================================
.SH "SEE ALSO"
.BR bibcheck (1),
.BR bibclean (1),
.BR bibdup (1),
.BR bibextract (1),
.BR bibjoin (1),
.BR biblabel (1),
.BR biborder (1),
.BR bibparse (1),
.BR bibsearch (1),
.BR bibsort (1),
.BR bibtex (1),
.BR bibunlex (1),
.BR citefind (1),
.BR citesub (1),
.BR citetags (1),
.BR latex (1),
.BR scribe (1),
.BR tex (1).
.br
X/Open Company, Ltd.,
.IR "X/Open Portability Guide, XSI Commands and Utilities" ,
volume 1.  Prentice-Hall, Englewood Cliffs, NJ
07632, USA, 1989.  ISBN 0-13-685835-X.
.\"=====================================================================
.SH AUTHOR
.nf
Nelson H. F. Beebe
University of Utah
Department of Mathematics, 110 LCB
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
Email: \fCbeebe@math.utah.edu\fP, \fCbeebe@acm.org\fP, \fCbeebe@computer.org\fP (Internet)
WWW URL: \fChttp://www.math.utah.edu/~beebe\fP
Telephone: +1 801 581 5254
FAX: +1 801 581 4148
.fi
.\"=====================================================================
.\" This is for GNU Emacs file-specific customization:
.\" Local Variables:
.\" fill-column: 50
.\" End:
