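Support on Windows was reportedly poor, but that was a long time ago and I don't know how it is now. Then again, Microsoft still doesn't seem to have gotten involved with LLVM.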

That's what the head of Google's LLVM work said, back in 2012.
http://channel9.msdn.com/Events/ ... y-s-Million-Monkeys

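    Thanks. I plan to learn ANTLR first, then Linux, and then LLVM (if I still have the energy, and if things go smoothly, and if World War III doesn't break out, and if the Two Sessions wrap up successfully).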

Since ANTLR is fairly easy to use and comes with companion materials, I'm studying ANTLR for now.

4.4 Making Things Happen During the Parse


To demonstrate actions embedded in a grammar, let’s build a program that
prints out a specific column from rows of data.

Write the grammar file and the Java file.

The only
thing different here is that we’re passing in a column number to the parser
using a custom constructor and telling the parser not to build a tree.


These actions extract and print values matched by the parser, but they don’t
alter the parse itself. Actions can also finesse how the parser recognizes input.
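The grammar and Java harness aren't reproduced here. As a rough substitute, here's a plain-Java sketch (no ANTLR runtime involved) of what the embedded actions end up doing: walk each row, count its fields, and print the field at the requested column. The class name RowsColumnSketch, tab-separated fields, and 1-based column numbers are my own assumptions, not the book's code. The constructor argument plays the role of the column number passed through the custom parser constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the grammar's embedded actions (not ANTLR-generated code).
// Assumptions: fields are tab-separated and columns are numbered from 1.
class RowsColumnSketch {
    private final int col; // column to print, like the value passed to the custom parser constructor

    RowsColumnSketch(int col) {
        this.col = col;
    }

    // Plays the role of the start rule: one row per line; returns what was printed.
    List<String> file(String input) {
        List<String> printed = new ArrayList<>();
        for (String line : input.split("\r?\n")) {
            String value = row(line);
            if (value != null) printed.add(value);
        }
        return printed;
    }

    // Plays the role of the row rule: a local counter i is bumped for every field matched.
    private String row(String line) {
        int i = 0;
        for (String field : line.split("\t")) {
            if (field.isEmpty()) continue; // tabs themselves are just separators
            i++;
            if (i == col) {
                System.out.println(field); // the action's side effect: print during the parse
                return field;
            }
        }
        return null; // row too short: nothing printed
    }
}
```

For example, new RowsColumnSketch(2) applied to the two rows "a b c" and "d e f" (tab-separated) prints b and then e. The printing happens while the input is being recognized, which is why no parse tree is needed.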


Reading books in English really is tiring; I'm not used to it.

Semantic predicates
Altering the Parse with Semantic Predicates

Let’s look at a grammar that reads in sequences of integers. The
trick is that part of the input specifies how many integers to group together.

The key in the following Data grammar is a special Boolean-valued action
called a semantic predicate: {$i<=$n}?. That predicate evaluates to true until
we surpass the number of integers requested by the sequence rule parameter n.
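Here's a hand-written Java sketch (not ANTLR output) showing how the predicate gates the loop. It assumes the input is a count followed by that many integers, repeated, so 2 9 10 3 1 2 3 means the groups [9 10] and [1 2 3]. The class and method names are hypothetical. The while condition i <= n stands in for {$i<=$n}?: as soon as it turns false, the loop stops matching integers and the next one starts a new group.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the grouping logic (not ANTLR output).
// Assumed input shape: a count n followed by n integers, repeated.
class DataGroupsSketch {
    private final int[] ints;
    private int pos = 0;

    DataGroupsSketch(String input) {
        String[] parts = input.trim().split("\\s+");
        ints = new int[parts.length];
        for (int k = 0; k < parts.length; k++) ints[k] = Integer.parseInt(parts[k]);
    }

    // One group after another until the input runs out.
    List<List<Integer>> file() {
        List<List<Integer>> groups = new ArrayList<>();
        while (pos < ints.length) groups.add(group());
        return groups;
    }

    // A group: the first integer is the count n, then a sequence of n integers.
    private List<Integer> group() {
        int n = ints[pos++];
        return sequence(n);
    }

    // Like a rule with parameter n and a local i starting at 1.
    private List<Integer> sequence(int n) {
        List<Integer> seq = new ArrayList<>();
        int i = 1;
        // i <= n plays the part of the semantic predicate {$i<=$n}?:
        // once it is false, the loop stops even if more integers follow.
        while (i <= n && pos < ints.length) {
            seq.add(ints[pos++]); // match INT
            i++;                  // the {$i++;} action
        }
        return seq;
    }
}
```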




4.5 Processing XML



An XML parser treats everything other than tags and
entity references (such as &pound;) as text chunks. When the lexer sees <, it
switches to “inside” mode and switches back to the default mode when it sees
> or />.
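Below is a simplified, hand-written Java sketch of the mode switch described above. It is not the lexer ANTLR generates, and the token names OPEN, NAME, TEXT, ENTITY, and so on are made up for illustration. Outside tags it produces only text chunks and entity references. After < it produces tag-level tokens until > or /> flips it back.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written, simplified lexer with two modes (not ANTLR-generated code).
// DEFAULT mode: text chunks and entity references; '<' switches to INSIDE mode.
// INSIDE mode: tag names, '=', quoted strings; '>' or '/>' switches back to DEFAULT.
class XmlModeLexerSketch {
    enum Mode { DEFAULT, INSIDE }

    private static boolean isNameChar(char ch) {
        return Character.isLetterOrDigit(ch) || ch == ':' || ch == '-' || ch == '_' || ch == '.';
    }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (mode == Mode.DEFAULT) {
                if (c == '<') {
                    tokens.add("OPEN");
                    mode = Mode.INSIDE;                 // saw '<': go "inside"
                    i++;
                } else if (c == '&' && s.indexOf(';', i) > i) {
                    int end = s.indexOf(';', i);        // entity reference such as &pound;
                    tokens.add("ENTITY:" + s.substring(i, end + 1));
                    i = end + 1;
                } else {
                    int start = i++;                    // text chunk up to the next '<' or '&'
                    while (i < s.length() && s.charAt(i) != '<' && s.charAt(i) != '&') i++;
                    tokens.add("TEXT:" + s.substring(start, i));
                }
            } else {
                if (c == '>') {
                    tokens.add("CLOSE");
                    mode = Mode.DEFAULT;                // saw '>': back to default mode
                    i++;
                } else if (s.startsWith("/>", i)) {
                    tokens.add("SLASH_CLOSE");
                    mode = Mode.DEFAULT;                // saw '/>': back to default mode
                    i += 2;
                } else if (c == '/') {
                    tokens.add("SLASH");
                    i++;
                } else if (c == '=') {
                    tokens.add("EQUALS");
                    i++;
                } else if (c == '"' && s.indexOf('"', i + 1) > i) {
                    int end = s.indexOf('"', i + 1);
                    tokens.add("STRING:" + s.substring(i, end + 1));
                    i = end + 1;
                } else if (Character.isWhitespace(c)) {
                    i++;                                // whitespace is skipped only inside tags
                } else if (isNameChar(c)) {
                    int start = i;
                    while (i < s.length() && isNameChar(s.charAt(i))) i++;
                    tokens.add("NAME:" + s.substring(start, i));
                } else {
                    tokens.add("ERROR:" + c);
                    i++;
                }
            }
        }
        return tokens;
    }
}
```

For instance, tokenizing <b>Price: &pound;5</b> yields OPEN, NAME:b, CLOSE, TEXT:Price: , ENTITY:&pound;, TEXT:5, OPEN, SLASH, NAME:b, CLOSE. Whitespace is skipped only inside tags, because outside them it is part of the text.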


Rewriting the Input Stream


Let’s build a tool that processes Java source code to insert serialization
identifiers, serialVersionUID, for use with java.io.Serializable (like Eclipse does).

This grammar file is too long, so I won't post a picture of it here.

The key is the TokenStreamRewriter object that knows how to give altered views
of a token stream without actually modifying the stream.
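Here's a toy Java class that illustrates producing an altered view without touching the stream. It is NOT ANTLR's TokenStreamRewriter, which works on Token objects and supports more operations such as replace and delete. MiniRewriter and its plain-string token list are hypothetical. Edits are only recorded, and the altered text is assembled on demand in getText().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the "altered view" idea (NOT ANTLR's TokenStreamRewriter).
class MiniRewriter {
    private final List<String> tokens;  // the original stream; never modified
    private final Map<Integer, StringBuilder> insertsAfter = new HashMap<>();

    MiniRewriter(List<String> tokens) {
        this.tokens = tokens;
    }

    // Record an insertion after the token at tokenIndex; nothing is changed yet.
    void insertAfter(int tokenIndex, String text) {
        insertsAfter.computeIfAbsent(tokenIndex, k -> new StringBuilder()).append(text);
    }

    // Render the altered view: original tokens interleaved with recorded insertions.
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(tokens.get(i));
            StringBuilder extra = insertsAfter.get(i);
            if (extra != null) out.append(extra);
        }
        return out.toString();
    }

    // The untouched original text, for comparison.
    String originalText() {
        return String.join("", tokens);
    }
}
```

In this sketch, inserting a serialVersionUID field declaration after the { token of an empty class T produces the rewritten text, while originalText() still returns the input unchanged.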


It’s time to slow down our pace and revisit all of the concepts explored in this
chapter with the goal of learning all of the details. Each chapter in the next
part of the book will take us another step toward becoming language implementers.
We’ll start by learning ANTLR notation and figuring out how to derive
grammars from examples and language reference manuals.

Chapter 5: Designing Grammars
Back in those years, Chinese characters were still the preserve of the aristocracy.


Now we’re going to slow down and learn the details
needed to perform useful tasks such as building internal data structures,
extracting information, and generating a translation of the input. The first
step on our journey, though, is to learn how to build grammars.

The constraints of word order and dependency, derived from natural language,
blossom into four abstract computer language patterns.

• Sequence: This is a sequence of elements such as the values in an array.
• Choice: This is a choice between multiple, alternative phrases such as
the different kinds of statements in a programming language.
• Token dependence: The presence of one token requires the presence of
its counterpart elsewhere in a phrase such as matching left and right parentheses.
• Nested phrase: This is a self-similar language construct such as nested
arithmetic expressions or nested statement blocks in a programming language.

5.1 Deriving Grammars from Language Samples


Proper grammar design mirrors functional decomposition or top-down design
in the programming world. That means we work from the coarsest to the
finest level, identifying language structures and encoding them as grammatical rules.

For example, “a comma-separated-value (CSV) file is a sequence of
rows terminated by newlines.” The essential word, file, to the left of “is a” is the
rule name, and everything to the right of “is a” becomes the «stuff» on the right
side of a rule definition.

Then we step down a level in granularity by describing the elements identified
on the right side of the start rule.

Stepping down another level of detail, we could say that a row is a sequence
of fields separated by commas. Then, a field is a number or string. Our
pseudocode looks like this:

file : «sequence of rows that are terminated by newlines» ;
row : «sequence of fields separated by commas» ;
field : «number or string» ;


Let’s see how this technique works for describing some of the key structures
in a Java file. (We can make the rule names stand out by italicizing them.)
At the coarsest level, a Java compilation unit is an optional package specifier,
followed by one or more class definitions. Stepping down a level, a class definition
is keyword class, followed by an identifier, followed optionally by a
superclass specifier, followed optionally by an implements clause, followed by
a class body. A class body is a series of members enclosed in curly braces. A
member is a nested class definition, a field, or a method. From here, we would
describe fields and methods and then the statements within methods. You
get the idea. Start at the highest possible level and work your way down,
treating even large subphrases like Java class definitions as rules to define
later. In grammar pseudocode, we’d start out like this:

compilationUnit : «optional packageSpec then classDefinitions» ;
packageSpec : 'package' identifier ';' ;
classDefinition :
'class' identifier «optional superclassSpec optional implementsClause classBody» ;
superclassSpec : 'extends' identifier ;
implementsClause :
'implements' «one or more identifiers separated by comma» ;
classBody : '{' «zero-or-more members» '}' ;
member : «nested classDefinition or field or method» ;

It's clear by now that what the author wrote isn't an introductory ANTLR4 book at all.

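Also, I get the impression that the author doesn't know the Dragon Book very well: reading his text doesn't help at all with studying the Dragon Book, and the material in the Dragon Book doesn't help with learning ANTLR4 either. Is this the legendary divorce of theory from practice?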



5.2 Using Existing Grammars as a Guide

Identifying the grammar rules and expressing their right sides in pseudocode
is challenging at first but gets easier and easier as you build grammars for
more languages.

5.3 Recognizing Common Language Patterns with ANTLR Grammars


Now that we have a general top-down strategy for roughing out a grammar,
we need to focus on the common language patterns: sequence, choice, token
dependence, and nested phrase.

Pattern: Sequence

The structure you’ll see most often in computer languages is a sequence of
elements, such as the sequence of methods in a class definition.

Variations on this pattern include the sequence with terminator and sequence
with separator. CSV files demonstrate both nicely. Here’s how we can express
the pseudocode grammar from the previous section in ANTLR notation:

file : (row '\n')* ; // sequence with a '\n' terminator
row : field (',' field)* ; // sequence with a ',' separator
field: INT ; // assume fields are just integers

Rule file uses the list with terminator pattern to match zero or more row '\n'
sequences. The '\n' token terminates each element of the sequence. Rule row
uses the list with separator pattern by matching a field followed by zero or
more ',' field sequences. The ',' separates the fields. row matches sequences like
1 and 1,2 and 1,2,3, and so on.
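Here's how the two sequence variants map onto hand-written recursive-descent Java, with one method per rule. This is a sketch, not ANTLR-generated code. It handles only the INT-only fields assumed by the grammar, and the class name is hypothetical. The terminator version loops and then demands '\n' after every row. The separator version matches one field and then loops on ','.

```java
import java.util.ArrayList;
import java.util.List;

// Recursive-descent sketch of:
//   file : (row '\n')* ;        // sequence with terminator
//   row  : field (',' field)* ; // sequence with separator
//   field: INT ;
class CsvSequenceSketch {
    private final String in;
    private int p = 0;

    CsvSequenceSketch(String in) {
        this.in = in;
    }

    List<List<Integer>> file() {
        List<List<Integer>> rows = new ArrayList<>();
        while (p < in.length()) {   // (...)* : zero or more rows
            rows.add(row());
            expect('\n');           // every row must be terminated by '\n'
        }
        return rows;
    }

    private List<Integer> row() {
        List<Integer> fields = new ArrayList<>();
        fields.add(field());        // the first field is mandatory
        while (p < in.length() && in.charAt(p) == ',') { // (',' field)*
            p++;
            fields.add(field());
        }
        return fields;
    }

    private int field() {
        int start = p;
        while (p < in.length() && Character.isDigit(in.charAt(p))) p++;
        if (start == p) throw new IllegalStateException("expected INT at " + start);
        return Integer.parseInt(in.substring(start, p));
    }

    private void expect(char c) {
        if (p >= in.length() || in.charAt(p) != c)
            throw new IllegalStateException("expected terminator at " + p);
        p++;
    }
}
```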

Pattern: Choice (Alternatives)

To express the notion of choice in a language, we use | as the “or” operator
in ANTLR rules to separate grammatical choices called alternatives or productions.
Grammars are full of choices.

Returning to our CSV grammar, we can make a more flexible field rule by
allowing the choice of integers or strings.

field : INT | STRING ;

Any time you find yourself saying “language structure x can be either this or
that,” then you’ve identified a choice pattern. Use | in rule x.
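Here's a small Java sketch of how a parser can pick between the two alternatives by peeking at the first character. The grammar above doesn't define STRING, so treating it as a double-quoted run with no escape sequences is an assumption, and FieldChoiceSketch is a hypothetical name. Input that starts with neither a digit nor a quote matches no alternative.

```java
// field : INT | STRING ;  choose the alternative from the first character.
// Assumption: STRING is a double-quoted run with no escape sequences.
class FieldChoiceSketch {
    static String field(String s) {
        if (s.isEmpty()) throw new IllegalArgumentException("empty field");
        char c = s.charAt(0);
        if (Character.isDigit(c)) {                     // alternative 1: INT
            for (int k = 0; k < s.length(); k++)
                if (!Character.isDigit(s.charAt(k)))
                    throw new IllegalArgumentException("bad INT: " + s);
            return "INT:" + s;
        }
        if (c == '"') {                                 // alternative 2: STRING
            if (s.length() < 2 || s.charAt(s.length() - 1) != '"')
                throw new IllegalArgumentException("unterminated STRING: " + s);
            return "STRING:" + s.substring(1, s.length() - 1);
        }
        throw new IllegalArgumentException("no alternative matches: " + s);
    }
}
```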

Pattern: Token Dependency

We need a way to express dependencies between tokens. If we see one symbol
in a sentence, we must find its matching counterpart elsewhere in the sentence.
To express this with a grammar, we use a sequence that specifies both
symbols, usually enclosing or grouping other elements. In this case, we
completely specify vectors like this:

vector : '[' INT+ ']' ; // [1], [1 2], [1 2 3], ...

Keep in mind that dependent symbols don’t necessarily have to match. C-derived
languages also have the a?b:c ternary operator where the ? sets up a
requirement to see : later in the phrase.
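Here's a minimal Java sketch of the dependency in the vector rule: once the opening [ is seen, the input is valid only if a matching ] appears, with at least one INT in between. It assumes the INTs are non-negative and separated by spaces, and VectorSketch is a hypothetical name. The ternary a?b:c would follow the same shape, with ? committing the parser to finding a later :.

```java
import java.util.ArrayList;
import java.util.List;

// vector : '[' INT+ ']' ;
// Once '[' is matched, the rule is committed to finding the matching ']'.
// Assumption: INTs are non-negative and separated by spaces, e.g. "[1 2 3]".
class VectorSketch {
    static List<Integer> vector(String s) {
        String t = s.trim();
        if (!t.startsWith("[")) throw new IllegalArgumentException("expected '['");
        if (!t.endsWith("]")) throw new IllegalArgumentException("missing ']' to match '['");
        String body = t.substring(1, t.length() - 1).trim();
        if (body.isEmpty()) throw new IllegalArgumentException("INT+ needs at least one INT");
        List<Integer> ints = new ArrayList<>();
        for (String part : body.split("\\s+")) {
            if (!part.matches("\\d+")) throw new IllegalArgumentException("expected INT, found " + part);
            ints.add(Integer.parseInt(part));
        }
        return ints;
    }
}
```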

Pattern: Nested Phrase

A nested phrase has a self-similar language structure, one whose subphrases
conform to that same structure.

Let’s see how nesting works for code blocks. A while statement is the keyword
while followed by a condition expression in parentheses followed by a statement.
We can also treat multiple statements as a single block statement by wrapping
them in curly braces. Expressing that grammatically looks like this:

stat: 'while' '(' expr ')' stat // match WHILE statement
| '{' stat* '}' // match block of statements in curlies
... // and other kinds of statements


The looping statement, stat, of the while can be a single statement or a group
of statements if we enclose them in {...}. Rule stat is directly recursive because
it refers to itself in the first (and second) alternatives. If we moved the second
alternative to its own rule, rules stat and block would be mutually indirectly recursive:

stat: 'while' '(' expr ')' stat // match WHILE statement
| block // match a block of statements
... // and other kinds of statements
block: '{' stat* '}' ; // match block of statements in curlies
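To make the mutual recursion concrete, here's a hand-written recursive-descent sketch in Java of the stat and block rules above. The text elides the other statement kinds and the expr rule, so this sketch makes three hypothetical assumptions: an identifier followed by ; is the only other statement, expr is a single identifier, and tokens are separated by whitespace. stat() calls block() for {...}, and block() calls stat() for each statement inside, so the Java call stack handles nesting to any depth.

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent sketch of the nested-phrase rules:
//   stat  : 'while' '(' expr ')' stat | block | ID ';' ;  // ID ';' is an assumed stand-in for "other statements"
//   block : '{' stat* '}' ;
//   expr  : ID ;                                           // assumption: expressions are single identifiers
// stat and block call each other, so they are mutually recursive.
class NestedStatSketch {
    private final List<String> toks;
    private int p = 0;

    NestedStatSketch(String src) {
        this.toks = Arrays.asList(src.trim().split("\\s+")); // assumption: whitespace-separated tokens
    }

    private String peek() {
        return p < toks.size() ? toks.get(p) : "<EOF>";
    }

    private void match(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but found " + peek());
        p++;
    }

    String stat() {
        if (peek().equals("while")) {
            match("while");
            match("(");
            String cond = expr();
            match(")");
            return "(while " + cond + " " + stat() + ")"; // the loop body is itself a stat
        }
        if (peek().equals("{")) return block();
        String id = expr();                                // ID ';'
        match(";");
        return id;
    }

    private String block() {
        match("{");
        StringBuilder sb = new StringBuilder("(block");
        while (!peek().equals("}") && !peek().equals("<EOF>")) sb.append(' ').append(stat()); // stat*
        match("}");
        return sb.append(')').toString();
    }

    private String expr() {
        String t = peek();
        if (!t.matches("[A-Za-z_]\\w*") || t.equals("while"))
            throw new IllegalStateException("expected ID but found " + t);
        p++;
        return t;
    }
}
```

For example, while ( x ) { y ; while ( z ) { } } comes back as (while x (block y (while z (block)))).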


For future reference, here’s a table summarizing ANTLR’s core grammar notation.

And here is a table summarizing what we’ve learned so far about common
computer language patterns:

