Lookup Definite Clause Grammars

(c) Carlos Viegas Damásio, October 2003

1. Description
This small application implements an extension of
Definite Clause Grammars (DCGs) that introduces lookahead symbols in the
compiled code. Ordinary DCGs add two arguments to each
compiled clause: one for the input list to parse and another for the
list that remains after the predicate (production) has executed. Our
compilation method adds four arguments:
This technique allows the lookup DCG code to exploit the indexing facilities of most Prolog implementations and lets the user write grammars in a more natural way, with significant performance improvements. However, to make lookahead information available, the input string must be terminated with a special symbol (usually -1). To support the development of large applications, we have introduced additional syntactic sugar. To simplify the computation of lookahead symbol information, the lookup DCG compiler relies on the tabling features of XSB Prolog and is therefore not portable to other Prolog systems. However, the generated code is fully standard and can be used in any Prolog system. This parser generator has been used to implement a full non-validating XML parser.
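The effect of threading a lookahead symbol can be illustrated with a small hand-written sketch. This is not output of the generator: the predicate names sign/5 and digits/5 are hypothetical, and the argument order (current lookahead, rest of the input, user arguments, final lookahead, final rest) is an assumption inferred from the query shown in Section 4.

```prolog
% Hand-written illustration only, NOT code produced by the generator.
% Assumed argument order (inferred from the query in Section 4):
%   p(+Lookahead0, +Rest0, ...user arguments..., -Lookahead, -Rest)

% sign(+LA0, +Rest0, -Sign, -LA, -Rest)
% The lookahead is the first argument, so first-argument indexing can
% select the matching clause without leaving a choice point.
sign(0'+, [LA|Rest], plus,  LA, Rest).
sign(0'-, [LA|Rest], minus, LA, Rest).

% digits(+LA0, +Rest0, -Digits, -LA, -Rest)
% Consumes digits while the lookahead is in 0'0..0'9. The end marker
% (-1) guarantees that a new lookahead can be popped after every
% real symbol.
digits(LA0, [LA1|Rest1], [LA0|Ds], LA, Rest) :-
    LA0 >= 0'0, LA0 =< 0'9, !,
    digits(LA1, Rest1, Ds, LA, Rest).
digits(LA, Rest, [], LA, Rest).
```

For the input "42x" followed by the end marker, the call digits(0'4, [0'2,0'x,-1], Ds, LA, Rest) binds Ds to the codes of "42", LA to 0'x and Rest to [-1]. Without the end marker, the first clause could not pop a new lookahead after the last digit, which is one way to see why the terminator is required.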

2. Lookup DCG syntax
Productions can have two forms:
The bodies of productions have a syntax similar to that of ordinary DCGs, except that we introduced additional syntax for terminal symbols, permitting the specification of (unions of) interval ranges. For non-terminals, we allow the inline expansion of a non-terminal by its rules. Cuts are allowed in production bodies, as are actions with the usual { Prolog Code } syntax. The full syntax is described next:
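Separately from the full syntax description, the idea behind range terminals can be sketched in ordinary DCG notation. The reading assumed here is that a terminal such as [[0'A-0'Z,0'a-0'z]]/[C], used in the example of Section 4, matches one input symbol lying in any of the listed intervals and binds it to C. That reading is an interpretation of the example, and the helper in_ranges/2 is a hypothetical name introduced only for this illustration.

```prolog
% in_ranges(+Code, +Ranges)
% Succeeds if Code lies in at least one Low-High interval of Ranges.
% Hypothetical helper, written without library predicates for portability.
in_ranges(Code, [Low-High|_]) :-
    Low =< Code, Code =< High, !.
in_ranges(Code, [_|Ranges]) :-
    in_ranges(Code, Ranges).

% Ordinary-DCG counterpart of the ASSUMED meaning of
%   startchar( C ) --> [[0'A-0'Z,0'a-0'z]]/[C].
% It reads one symbol and checks it against the union of both intervals.
startchar(C) --> [C], { in_ranges(C, [0'A-0'Z, 0'a-0'z]) }.
```

With this definition, phrase(startchar(C), [0'q], []) succeeds with C bound to 0'q, while a digit or the end marker -1 is rejected. The sketch tests the symbol after reading it, whereas the lookup DCG compiler is described as using such information at compile time.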
3. Installation and usage of the Lookup DCG parser generator

4. Example
The following grammar parses lists of natural numbers and names separated by line feeds, either 0xA or 0xD.

    % An example Look Up DCG

    :- start( example/1 ).
    :- end( -1 ).

    :- - digit/1.

    example( Is ) ::= lf, !, example( Is ).
    example( [] ) ::= [].
    example( [I|Is] ) --> item( I ), !, lf, example( Is ).

    item( I ) ::= !, number( I ).
    item( I ) ::= name( I ).

    number( N ) --> + digit(D), !, rest_digits( Ds ), { number_codes( N, [D|Ds] ) }.

    rest_digits( [D|Ds] ) --> + digit( D ), !, rest_digits( Ds ).
    rest_digits( [] ) ::= [].

    digit( 0'0 ) --> "0".
    digit( 0'1 ) --> "1".
    digit( 0'2 ) --> "2".
    digit( 0'3 ) --> "3".
    digit( 0'4 ) --> "4".
    digit( 0'5 ) --> "5".
    digit( 0'6 ) --> "6".
    digit( 0'7 ) --> "7".
    digit( 0'8 ) --> "8".
    digit( 0'9 ) --> "9".

    name( N ) --> startchar(C), !, rest_name( Cs ), { atom_codes( N, [C|Cs] ) }.

    rest_name( [C|Cs] ) --> namechar( C ), !, rest_name( Cs ).
    rest_name( [] ) --> [].

    startchar( C ) --> [[0'A-0'Z,0'a-0'z]]/[C], !.

    namechar( D ) ::= + digit(D), !.
    namechar( C ) ::= startchar(C).

    lf --> [16'A].
    lf --> [16'D].

To generate the parser for this grammar, consult the parser generator file and then call gen_parser/2:

    | ?- [lookupdcg].
    [lookupdcg loaded]
    [readgram loaded]
    [predparserint loaded]
    [parserexp loaded]

    yes
    | ?- gen_parser( ['example.G'], 'example.P' ).
    example / 1
    item / 1
    number / 1
    rest_digits / 1
    name / 1
    rest_name / 1
    startchar / 1
    namechar / 1
    lf / 0

    yes

The generated code is stored in example.P. Users are encouraged to read the generated code and try to understand it. Notice that no rules for digit/1 are generated, since all occurrences of digit in the grammar are expanded in-line using the + digit(D) facility. The use of cuts can be very subtle, as the rules for item/1 and startchar/1 show.

    | ?- example( 10, [0'a,0'0,0'Z,10,0'1,0'0,10,10,0'1,10,-1], Is, -1, [] ).

    Is = [a0Z,10,1]

    yes
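In the query above, the first symbol of the input (10) is passed separately as the initial lookahead, and the remaining list ends with the end marker -1. A small helper can prepare input in that shape. The sketch below is not part of the tool: the names add_end_marker/3 and prepare_input/3 are hypothetical, and the marker value -1 is taken from the :- end( -1 ) declaration of the example grammar.

```prolog
% add_end_marker(+Codes, +End, -Terminated)
% Appends the end marker to a list of character codes.
add_end_marker([], End, [End]).
add_end_marker([C|Cs], End, [C|Ts]) :-
    add_end_marker(Cs, End, Ts).

% prepare_input(+Codes, -Lookahead, -Rest)
% Terminates Codes with -1 and splits off the first symbol as the
% initial lookahead, matching the shape of the query in this section.
prepare_input(Codes, Lookahead, Rest) :-
    add_end_marker(Codes, -1, [Lookahead|Rest]).

% Hypothetical wrapper; usable only after example.P has been loaded,
% and assuming the argument layout seen in the query above:
% parse_example(Codes, Items) :-
%     prepare_input(Codes, LA0, Rest0),
%     example(LA0, Rest0, Items, -1, []).
```

For the code list [10,0'a,0'0,0'Z,10,0'1,0'0,10,10,0'1,10], prepare_input/3 yields the lookahead 10 and the terminated rest list used in the query above. For empty input, the lookahead is the end marker itself and the rest list is empty.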

5. Copyright
This is an academic and experimental tool. It cannot be used for commercial purposes without the explicit consent of the author.

6. Disclaimer
This is an academic and experimental tool. I give no guarantee of any kind regarding the use of this tool.

Last update: October 28th, 2003