Using Structured Text

  The goal of StructuredText is to make it possible to express
  structured text using a relatively simple plain text format. Simple
  structures, like bullets or headings are indicated through
  conventions that are natural, for some definition of
  "natural". Hierarchical structures are indicated through
  indentation. The use of indentation to express hierarchical
  structure is inspired by the Python programming language.

  Use of StructuredText consists of one to three logical steps. In the
  first step, a text string is converted to a network of objects using
  the 'structurize' facility, as in the following example::

    raw = open("mydocument.txt").read()
    from zope.structuredtext.stng import structurize
    st = structurize(raw)

  The output of 'structurize' is simply a StructuredTextDocument
  object containing StructuredTextParagraph objects arranged in a
  hierarchy. Paragraphs are delimited by strings of two or more
  whitespace characters beginning and ending with newline
  characters. Hierarchy is indicated by indentation. The indentation
  of a paragraph is the minimum number of leading spaces in a line
  containing non-white-space characters after converting tab
  characters to spaces (assuming a tab stop every eight characters).

  StructuredTextNode objects support the read-only subset of the
  Document Object Model (DOM) API. It should be possible to process
  'StructuredTextNode' hierarchies using XML tools such as XSLT.

  The second step in using StructuredText is to apply additional
  structuring rules based on text content. A variety of differentText
  rules can be used. Typically, these are used to implement a
  structured text language for producing documents, but any sort of
  structured text language could be implemented in the second
  step. For example, it is possible to use StructuredText to implement
  structured text formats for representing structured data. The second
  step, which could consist of multiple processing steps, is
  performed by processing, or "coloring", the hierarchy of generic
  StructuredTextParagraph objects into a network of more specialized
  objects. Typically, the objects produced should also implement the DOM
  API to allow processing with XML tools.

  A document processor is provided to convert a StructuredTextDocument
  object containing only StructuredTextParagraph objects into a
  StructuredTextDocument object containing a richer collection of
  objects such as bullets, headings, emphasis, and so on using hints
  in the text. Hints are selected based on conventions of the sort
  typically seen in electronic mail or news-group postings. It should
  be noted, however, that these conventions are somewhat culturally
  dependent, fortunately, the document processor is easily customized
  to implement alternative rules. Here's an example of using the DOC
  processor to convert the output of the previous example::

    from zope.structuredtext.document import Document
    doc = Document()(st)

  The final step is to process the colored networks produced from the
  second step to produce additional outputs. The final step could be
  performed by Python programs, or by XML tools. A Python outputter is
  provided for the document processor output that produces Hypertext Markup
  Language (HTML) text::

    from zope.structuredtext.html import HTML
    html = HTML()(doc)

Customizing the document processor

  The document processor is driven by two tables. The first table,
  named 'paragraph_types', is a sequence of callable objects or method
  names for coloring paragraphs. If a table entry is a string, then it
  is the name of a method of the document processor to be used. For
  each input paragraph, the objects in the table are called until one
  returns a value (not 'None'). The value returned replaces the
  original input paragraph in the output. If none of the objects in
  the paragraph types table return a value, then a copy of the
  original paragraph is used.  The new object returned by calling a
  paragraph type should implement the ReadOnlyDOM,
  StructuredTextColorizable, and StructuredTextSubparagraphContainer
  interfaces. See the 'document.py' source file for examples.

  A paragraph type may return a list or tuple of replacement
  paragraphs, this allowing a paragraph to be split into multiple
  paragraphs. 

  The second table, 'text_types', is a sequence of callable objects or
  method names for coloring text. The callable objects in this table
  are used in sequence to transform the input text into new text or
  objects.  The callable objects are passed a string and return
  nothing ('None') or a three-element tuple consisting of:

    - a replacement object,

    - a starting position, and

    - an ending position

  The text from the starting position is (logically) replaced with the
  replacement object. The replacement object is typically an object
  that implements that implements the ReadOnlyDOM, and
  StructuredTextColorizable interfaces. The replacement object can
  also be a string or a list of strings or objects. Replacement is
  done from beginning to end and text after the replacement ending
  position will be passed to the character type objects for processing.

Example: adding wiki links

  We want to add support for Wiki links. A Wiki link is a string of
  text containing mixed-case letters, such that at least two of the
  letters are upper case and such that the first letter is upper case.
