12.3 Definite Clause Grammars — DCGs
Grammar rules are described in many standard Prolog texts ([2]).
In ECLiPSe they are provided by a predefined global3 macro for
-->/2.
When the parser reads a clause whose main functor is -->/2, it transforms
it according to the standard rules.
The syntax for DCGs is as follows:
grammar_rule --> grammar_head, ['-->'], grammar_body.
grammar_head --> non_terminal.
grammar_head --> non_terminal, [','], terminal.
grammar_body --> grammar_body, [','], grammar_body.
grammar_body --> grammar_body, [';'], grammar_body.
grammar_body --> grammar_body, ['->'], grammar_body.
grammar_body --> grammar_body, ['|'], grammar_body.
grammar_body --> iteration_spec, ['do'], grammar_body.
grammar_body --> ['-?->'], grammar_body.
grammar_body --> grammar_body_item.
grammar_body_item --> ['!'].
grammar_body_item --> ['{'], Prolog_goals, ['}'].
grammar_body_item --> non_terminal.
grammar_body_item --> terminal.
The non-terminals are syntactically identical to prolog goals (atom, compound
term or variable), the terminals are lists of prolog terms (typically
characters or tokens). Every
term is transformed, unless it is enclosed in curly brackets. The control
constructs like conjunction ,/2, disjunction (;/2 or |/2),
the cut (!/0), the condition (->/1) and do-loops do not need to
be enclosed in curly brackets.
The grammar can be accessed with the built-in phrase/3.
The first argument of phrase/3 is the name of the
grammar to be used, the
second argument one is a list containing the input to be parsed. If the
parsing is successful the built-in will succeed.
For instance with the grammar
a --> [] | [z], a.
phrase(a, X, []) will give on backtracking: X = [z] ; X = [z, z] ; X = [z, z, z] ; ....
12.3.1 Simple DCG example
The following example illustrates a simple grammar declared using the DCGs.
sentence --> imperative, noun_phrase(Number), verb_phrase(Number).
imperative, [you] --> [].
imperative --> [].
noun_phrase(Number) --> determiner, noun(Number).
noun_phrase(Number) --> pronom(Number).
verb_phrase(Number) --> verb(Number).
verb_phrase(Number) --> verb(Number), noun_phrase(_).
determiner --> [the].
noun(singular) --> [man].
noun(singular) --> [apple].
noun(plural) --> [men].
noun(plural) --> [apples].
verb(singular) --> [eats].
verb(singular) --> [sings].
verb(plural) --> [eat].
verb(plural) --> [sing].
pronom(plural) --> [you].
The above grammar may be successfully parsed
using phrase/3. If the predicate
succeeds then the input has been parsed successfully.
[eclipse 1]: phrase(sentence, [the,man,eats,the,apple], []).
yes.
[eclipse 2]: phrase(sentence, [the,men,eat], []).
yes.
[eclipse 3]: phrase(sentence, [the,men,eats], []).
no.
[eclipse 4]: phrase(sentence, [eat,the,apples], []).
yes.
[eclipse 5]: phrase(sentence, [you,eat,the,man], []).
yes.
The predicate phrase/3 may be used to return the point at which
parsing of a grammar fails — if the returned list is empty then the
input has been successfully parsed.
[eclipse 1]: phrase(sentence, [the,man,eats,something,nasty],X).
X = [something, nasty] More? (;)
no (more) solution.
[eclipse 2]: phrase(sentence, [eat,the,apples],X).
X = [the, apples] More? (;)
X = [] More? (;)
no (more) solution.
[eclipse 3]: phrase(sentence, [hello,there],X).
no (more) solution.
12.3.2 Mapping to Prolog Clauses
Grammar rule are translated to Prolog clauses by adding two arguments
which represent the input before and after the nonterminal which is
represented by the rule.
The effect of the transformation can be observed, e.g. by calling biprefexpand_clause/2../bips/kernel/compiler/expand_clause-2.html:
[eclipse 1]: expand_clause(p(X) --> q(X), Expanded).
X = X
Expanded = p(X, _250, _243) :- q(X, _250, _243)
Yes (0.00s cpu)
[eclipse 2]: expand_clause(p(X) --> [a], Expanded).
X = X
Expanded = p(X, _251, _244) :- 'C'(_251, a, _244)
Yes (0.00s cpu)
12.3.3 Parsing other Data Structures
DCGs are in principle not limited to the parsing of lists.
The predicate 'C'/3 is responsible for reading resp. generating
the input tokens. The default definition is
'C'([Token|Rest], Token, Rest).
The first argument represents the parsing input before consuming
Token and Rest is the input after consuming Token.
By redefining 'C'/3, it is possible to apply a DCG to other
input sources than a list, e.g. to parse directly from an I/O stream:
:- local 'C'/3.
'C'(Stream-Pos0, Token, Stream-Pos1) :-
seek(Stream, Pos0),
read_string(Stream, " ", _, TokenString),
atom_string(Token, TokenString),
at(Stream, Pos1).
sentence --> noun, [is], adjective.
noun --> [prolog] ; [lisp].
adjective --> [boring] ; [great].
This can then be applied to a string as follows:
[eclipse 1]: String = "prolog is great", open(String, string, S),
phrase(sentence, S-0, S-End).
...
End = 15
yes.
Here is another redefinition of 'C'/3, using a similar idea, which allows
the direct parsing of ECLiPSe strings as sequences of characters:
:- local 'C'/3.
'C'(String-Pos0, Char, String-Pos1) :-
Pos0 =< string_length(String),
string_code(String, Pos0, Char),
Pos1 is Pos0+1.
anagram --> [].
anagram --> [_].
anagram --> [C], anagram, [C].
This can then be applied to a string as follows:
[eclipse 1]: phrase(anagram, "abba"-1, "abba"-5).
yes.
[eclipse 2]: phrase(anagram, "abca"-1, "abca"-5).
no (more) solution.
Unlike the default definition, these redefinitions of 'C'/3 are not bi-directional.
Consequently, the grammar rules using them can only be used for parsing,
not for generating sentences.
Note that every grammar rule uses that definition of 'C'/3 which is visible in
the module where the grammar rule itself is defined.