***********************
5.2 Using Existing Grammars as a Guide
***********************
Nothing much of interest here, but one sentence is worth keeping in mind:
Identifying the grammar rules and expressing their right sides in pseudocode
is challenging at first but gets easier and easier as you build grammars for
more languages.
*******************************
5.3 Recognizing Common Language Patterns with ANTLR Grammars
*******************************
Now that we have a general top-down strategy for roughing out a grammar,
we need to focus on the common language patterns: sequence, choice, token
dependence, and nested phrase.
Pattern: Sequence
The structure you’ll see most often in computer languages is a sequence of
elements, such as the sequence of methods in a class definition.
Variations on this pattern include the sequence with terminator and sequence
with separator. CSV files demonstrate both nicely. Here’s how we can express
the pseudocode grammar from the previous section in ANTLR notation:
file : (row '\n')* ; // sequence with a '\n' terminator
row : field (',' field)* ; // sequence with a ',' separator
field: INT ; // assume fields are just integers
Rule file uses the list with terminator pattern to match zero or more row '\n'
sequences. The '\n' token terminates each element of the sequence. Rule row
uses the list with separator pattern by matching a field followed by zero or
more ',' field sequences. The ',' separates the fields. row matches sequences like
1 and 1,2 and 1,2,3, and so on.
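To make the two sequence variations concrete, here is a minimal hand-written Python sketch of what the file and row rules recognize. It is not the parser ANTLR generates; the function names match_row and match_file and the INT regex are assumptions made purely for illustration.

```python
import re

# Assumed token definition: the notes only say fields are integers.
INT = re.compile(r"[0-9]+")

def match_row(text, pos):
    """row : field (',' field)* ;  Return the index after the row, or -1."""
    m = INT.match(text, pos)
    if not m:
        return -1
    pos = m.end()
    # Sequence with separator: every ',' must be followed by another field.
    while pos < len(text) and text[pos] == ",":
        m = INT.match(text, pos + 1)
        if not m:
            return -1
        pos = m.end()
    return pos

def match_file(text):
    """file : (row '\\n')* ;  True if text is zero or more terminated rows."""
    pos = 0
    # Sequence with terminator: every row must end with a newline.
    while pos < len(text):
        pos = match_row(text, pos)
        if pos < 0 or pos >= len(text) or text[pos] != "\n":
            return False
        pos += 1
    return True
```

Note the difference: a separator appears only between elements, so "1,2," is rejected, while a terminator must follow every element, so a last row without a trailing newline is rejected too.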
Pattern: Choice (Alternatives)
To express the notion of choice in a language, we use | as the “or” operator
in ANTLR rules to separate grammatical choices called alternatives or productions.
Grammars are full of choices.
Returning to our CSV grammar, we can make a more flexible field rule by
allowing the choice of integers or strings.
field : INT | STRING ;
Any time you find yourself saying “language structure x can be either this or
that,” then you’ve identified a choice pattern. Use | in rule x.
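As a rough Python sketch of the choice pattern, a recognizer can try each alternative in order and report which one matched. The STRING definition (double-quoted, no escapes) and the name match_field are assumptions for illustration; the notes do not define STRING.

```python
import re

# Assumed token definitions for illustration only.
INT = re.compile(r"[0-9]+")
STRING = re.compile(r'"[^"\n]*"')  # double-quoted, no escape sequences

def match_field(text, pos=0):
    """field : INT | STRING ;  Return (alternative, end_index) or None."""
    # Each entry corresponds to one alternative separated by | in the rule.
    for name, pattern in (("INT", INT), ("STRING", STRING)):
        m = pattern.match(text, pos)
        if m:
            return name, m.end()
    return None
```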
Pattern: Token Dependency
We need a way to express dependencies between tokens. If we see one symbol
in a sentence, we must find its matching counterpart elsewhere in the sentence.
To express this with a grammar, we use a sequence that specifies both
symbols, usually enclosing or grouping other elements. For example, we can
completely specify bracketed vectors of integers like this:
vector : '[' INT+ ']' ; // [1], [1 2], [1 2 3], ...
Keep in mind that dependent symbols don’t necessarily have to match. C-derived
languages also have the a?b:c ternary operator, where the ? sets up a
requirement to see : later in the phrase.
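The vector rule can be sketched in Python as a check that the opening '[' is eventually answered by a closing ']' with one or more integers in between. The function name is_vector and the whitespace-skipping tokenizer are assumptions for illustration, not something ANTLR produces.

```python
import re

INT = re.compile(r"[0-9]+")

def is_vector(text):
    """vector : '[' INT+ ']' ;  True if text is a bracketed list of integers."""
    # Assumed tokenizer: integers or any single non-space character.
    tokens = re.findall(r"[0-9]+|\S", text)
    if len(tokens) < 3:  # need '[', at least one INT, and ']'
        return False
    # Token dependency: seeing '[' first requires ']' at the end.
    if tokens[0] != "[" or tokens[-1] != "]":
        return False
    return all(INT.fullmatch(t) for t in tokens[1:-1])
```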
Pattern: Nested Phrase
A nested phrase has a self-similar language structure, one whose subphrases
conform to that same structure.
Let’s see how nesting works for code blocks. A while statement is the keyword
while followed by a condition expression in parentheses followed by a statement.
We can also treat multiple statements as a single block statement by wrapping
them in curly braces. Expressing that grammatically looks like this:
stat: 'while' '(' expr ')' stat // match WHILE statement
| '{' stat* '}' // match block of statements in curlies
... // and other kinds of statements
;
The looping statement, stat, of the while can be a single statement or a group
of statements if we enclose them in {...}. Rule stat is directly recursive because
it refers to itself in the first (and second) alternatives. If we moved the second
alternative to its own rule, rules stat and block would be mutually indirectly
recursive.
stat: 'while' '(' expr ')' stat // match WHILE statement
| block // match a block of statements
... // and other kinds of statements
;
block: '{' stat* '}' ; // match block of statements in curlies
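The recursion in stat and block maps naturally onto functions that call each other. Below is a minimal recursive-descent sketch in Python, not ANTLR's generated code. Since the grammar above elides expr and the other statement kinds, this sketch assumes that expr is a single identifier or integer and that the only other statement is an expr followed by ';'. All function names are hypothetical.

```python
import re

def tokenize(text):
    # Identifiers/keywords, integers, or any single non-space character.
    return re.findall(r"[A-Za-z_]\w*|[0-9]+|\S", text)

def expect(tokens, i, symbol):
    """Consume one specific token or fail."""
    if i < len(tokens) and tokens[i] == symbol:
        return i + 1
    raise SyntaxError(f"expected {symbol!r} at token {i}")

def parse_expr(tokens, i):
    # Assumption: expr is a single identifier or integer (not the keyword).
    if i < len(tokens) and re.fullmatch(r"\w+", tokens[i]) and tokens[i] != "while":
        return i + 1
    raise SyntaxError(f"expected expression at token {i}")

def parse_stat(tokens, i=0):
    """stat : 'while' '(' expr ')' stat | block | expr ';' (last one assumed)."""
    if i < len(tokens) and tokens[i] == "while":
        i = expect(tokens, i + 1, "(")
        i = parse_expr(tokens, i)
        i = expect(tokens, i, ")")
        return parse_stat(tokens, i)   # direct recursion: the body is a stat
    if i < len(tokens) and tokens[i] == "{":
        return parse_block(tokens, i)  # indirect recursion via block
    return expect(tokens, parse_expr(tokens, i), ";")

def parse_block(tokens, i):
    """block : '{' stat* '}' ;"""
    i = expect(tokens, i, "{")
    while i < len(tokens) and tokens[i] != "}":
        i = parse_stat(tokens, i)      # block calls back into stat
    return expect(tokens, i, "}")

def is_stat(text):
    tokens = tokenize(text)
    try:
        return parse_stat(tokens) == len(tokens)
    except SyntaxError:
        return False
```

Because parse_stat calls parse_block and parse_block calls parse_stat, blocks and while loops can nest to any depth, which is exactly what the nested phrase pattern describes.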
**************************************************
**************************************************
For future reference, here’s a table summarizing ANTLR’s core grammar
notation:
And here is a table summarizing what we’ve learned so far about common
computer language patterns: