Chinaunix
Thread starter: oldbeginner

Compiler Study Notes 02 (ANTLR, Citizen of the World), 2014_2_27

#11, posted 2014-03-03 20:26
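Windows support is said to be poor, but that was a long time ago and I don't know how things stand now. As far as I can tell, Microsoft still hasn't gotten involved in LLVM.

This comes from the head of LLVM work at Google, back in 2012.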
http://channel9.msdn.com/Events/ ... y-s-Million-Monkeys

#12, posted 2014-03-03 20:33 (last edited by oldbeginner on 2014-03-03 21:12)



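Thanks. I plan to learn ANTLR first, then Linux, then LLVM (if I still have the energy, and if things go smoothly, and if World War III hasn't broken out, and if the Two Sessions conclude successfully).

ANTLR is fairly simple to use and comes with companion material, so I'm studying ANTLR for now.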

#13, posted 2014-03-03 21:18
*************************************
4.4 Making Things Happen During the Parse

************************************



To demonstrate actions embedded in a grammar, let’s build a program that
prints out a specific column from rows of data.


Write the grammar file and the Java file.



The only
thing different here is that we’re passing in a column number to the parser
using a custom constructor and telling the parser not to build a tree.


Then run it.


These actions extract and print values matched by the parser, but they don’t
alter the parse itself. Actions can also finesse how the parser recognizes input
phrases.
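As a rough illustration of the idea (not the book's grammar or Java code), the sketch below does the same job by hand in Python: it takes a column number as a parameter and collects that field from every row as the row is recognized, without building any tree. The function name `extract_column`, the tab separator, and the sample data are assumptions made for this example.

```python
def extract_column(text, col):
    """Collect field number `col` (1-based) from every tab-separated row."""
    values = []
    for line in text.splitlines():
        fields = line.split("\t")
        # This is the "action": it runs as each row is matched and
        # only extracts a value; it does not change how rows are matched.
        if 1 <= col <= len(fields):
            values.append(fields[col - 1])
    return values


sample = "a\tb\tc\nd\te\tf\n"  # made-up rows for illustration
for value in extract_column(sample, 2):
    print(value)
```

Passing the column number in as an argument plays the role of the custom parser constructor mentioned above.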


   

#14, posted 2014-03-04 13:18
*****************************
Brushing Up on English

*****************************

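Reading books in English really is tiring; I'm not used to it.

I decided to find some English material to recharge with. Classic speeches seem like a good choice, ideally ones by Chinese speakers. As luck would have it, a striking commentary came out just last week, and someone took the trouble to translate it carefully into English.

Reading this piece lets me study both Chinese-to-English translation and English itself: two birds with one stone.

*****************************
I'm only reading the highlights, so there are gaps between paragraphs (for the original, see the third post at http://club.kdnet.net/dispbbs.asp?boardid=1&id=9881899)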

Farewell, Gary LOCKE!
The title is quite plain.

The article opens with a show of force, somewhat in the "Putler" style:
Gary Locke is a third generation Chinese American born in America. Like a banana, he’s yellow on the outside, but white on the inside. This character of his has been well exploited by the Obama administration for its diplomatic advantages

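The above is the translation; the original reads:
骆家辉是一个在美国出生的第三代华裔,他的“黄皮白心”的香蕉人属性成了奥巴马外交的优势。

The translator worked hard to spell out 黄皮白心 ("yellow skin, white heart") in full; the Chinese word 香蕉人 ("banana person") is unfamiliar even to many Chinese readers.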

So above the table there you have a black hair, yellow skinned Chinese diaspora generating favourable impressions for the U.S.; but beneath such impression is the America’s effort to stir up storms and create tensions in the Asia-Pacific region.

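The sentence above feels hard, so let's look at the original:
当美国在亚太不断搅起漩涡、制造矛盾的时候,却有一个表面上久居海外的游子、黑头发黄皮肤的皮囊来为美 国叫好。

The translation seems a bit off: 游子 (wanderer) and 皮囊 (outer shell) should be two separate nouns, but both were folded into "Chinese", which is not quite right.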

However, after a while, a banana will inevitably start to rot. Not only its “white core” deep inside will then be exposed, it will also have been rotted into a disgusting “black core”.

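I don't fully understand it, but it sounds formidable.
The original is pretty striking too, nearly on par with last year's North Korean verdict: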
然而香蕉放久了“黄皮”总归是要烂掉的,不但“白心”会露出来,也会变成反胃的“黑心”。

But Mr Locke, foreign to the words of his ancestors, unable to understand Chinese Laws, loved to interfere and lecture on China’s internal affairs.

但骆氏不但不认祖先的文字,还看不懂中国法律,特喜欢对着中国的内政指手画脚。

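So it seems that any overseas Chinese who can't read Chinese characters can be described as "foreign to the words of his ancestors".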

Mr Ambassador, do your ancestors know your “performances” [as an ambassador in China?] [I think] they’ll have you expelled from the family if they knew.

I'll just smile and say nothing. The original:
大使先生,您的“业绩”,您祖上知道么?您祖上要是知道,可要把您逐出门户了。

English probably has no such expression; this looks like a case of Chinese and English influencing each other in both directions.

Farewell, smog; farewell, the plague; Farewell Gary LOCKE!

The Chinese feels more vivid. The original:
送雾霾,送瘟神。Farewell,骆氏家辉!


****************************************************

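It suddenly occurs to me that this article is so blatant that there is about a 70% chance it is deliberate satire disguised as praise (高级黑): the author may be using a double-negative approach (things pushed to an extreme turn into their opposite) to enlighten the public, for reasons you can guess.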

****************************************

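One day, a member of Korea's movement to restore Chinese characters went to a Chinese-language teacher to learn some new character usages.
The teacher said offhandedly: 武松打鸡 ("Wu Song beats the chicken").
He was excited, but after hearing the explanation he was disappointed: Koreans have no use for this phrase.
The teacher: 香蕉人 ("banana person").
He got excited again, and after hearing the explanation he beat the teacher up.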


   

#15, posted 2014-03-04 13:43
********************************
Altering the Parse with Semantic Predicates
********************************

Let’s look at a grammar that reads in sequences of integers. The
trick is that part of the input specifies how many integers to group together.



The key in the following Data grammar is a special Boolean-valued action
called a semantic predicate: {$i<=$n}?. That predicate evaluates to true until
we surpass the number of integers requested by the sequence rule parameter
n.
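To make the role of the predicate concrete, here is a hand-written Python sketch (not ANTLR output and not the book's grammar). It assumes the input is a flat list of integers in which each group starts with a count, for example `2 9 10 3 1 2 3`. The loop guard `i <= n` stands in for the predicate `{$i<=$n}?`; the function name `parse_groups` is made up for this example.

```python
def parse_groups(tokens):
    """Split a flat list of ints into groups; each group is prefixed by its size."""
    groups = []
    pos = 0
    while pos < len(tokens):
        n = tokens[pos]  # the leading integer says how many values follow
        pos += 1
        group = []
        i = 1
        # Equivalent of the semantic predicate: keep matching integers
        # only while i <= n, then fall out of the loop.
        while i <= n and pos < len(tokens):
            group.append(tokens[pos])
            pos += 1
            i += 1
        groups.append(group)
    return groups


sample = [int(t) for t in "2 9 10 3 1 2 3".split()]
print(parse_groups(sample))
```

Without the guard, the inner loop would consume every remaining integer and the second count would be swallowed into the first group.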


With the preparation done,


then run it:



   

#16, posted 2014-03-04 22:41
************************
4.5 Compiling XML

***********************
One XML file and one grammar file.



The XML is simple:


The grammar:
An XML parser treats everything other than tags and
entity references (such as &pound;) as text chunks. When the lexer sees <, it
switches to “inside” mode and switches back to the default mode when it sees
> or />.
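The mode switching described here can be imitated with a small hand-written tokenizer. The Python sketch below is only an approximation: it keeps one flag for the current mode, omits entity references and comments, and does not handle quoted attribute values that contain `>`. The token names (`OPEN`, `CLOSE`, `SLASH_CLOSE`, `SLASH`, `EQUALS`, `NAME`, `TEXT`) are chosen for this example and are not claimed to match the book's grammar.

```python
def tokenize(xml):
    """Split xml into (type, text) pairs using a default mode and an inside mode."""
    tokens = []
    inside = False  # False = default mode (text), True = inside a tag
    buf = ""        # pending TEXT (default mode) or NAME (inside mode)

    def flush(kind):
        nonlocal buf
        if buf:
            tokens.append((kind, buf))
            buf = ""

    i = 0
    while i < len(xml):
        ch = xml[i]
        if not inside:
            if ch == "<":                # '<' switches to inside mode
                flush("TEXT")
                tokens.append(("OPEN", "<"))
                inside = True
            else:
                buf += ch                # everything else is a text chunk
            i += 1
        elif xml.startswith("/>", i):    # '/>' switches back to default mode
            flush("NAME")
            tokens.append(("SLASH_CLOSE", "/>"))
            inside = False
            i += 2
        elif ch == ">":                  # '>' switches back to default mode
            flush("NAME")
            tokens.append(("CLOSE", ">"))
            inside = False
            i += 1
        elif ch in "/=" or ch.isspace():
            flush("NAME")
            if ch == "/":
                tokens.append(("SLASH", "/"))
            elif ch == "=":
                tokens.append(("EQUALS", "="))
            i += 1
        else:
            buf += ch                    # part of a tag or attribute name
            i += 1
    flush("NAME" if inside else "TEXT")
    return tokens


for token in tokenize("<a>hi</a>"):
    print(token)
```

The point of the two modes is that the same character can mean different things: `hi` is a text chunk outside a tag, while `a` inside a tag is a name.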


Then,


#17, posted 2014-03-05 15:50
****************************
Rewriting the Input Stream

****************************


Let’s build a tool that processes Java source code to insert serialization
identifiers, serialVersionUID, for use with java.io.Serializable (like Eclipse does
automatically).

This grammar file is too large, so I won't post a picture of it.





The key is the TokenStreamRewriter object that knows how to give altered views
of a token stream without actually modifying the stream.
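The "altered view" idea can be sketched in a few lines of Python. This is not ANTLR's implementation; the class name `LazyRewriter` and the hand-made token list are assumptions for illustration. Edits are recorded against token indexes and only applied when the text is rendered, so the original token list is never changed.

```python
class LazyRewriter:
    """Records insertions by token index; the token list itself is never modified."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.inserts = {}  # token index -> text to emit right after that token

    def insert_after(self, index, text):
        self.inserts[index] = self.inserts.get(index, "") + text

    def get_text(self):
        # Render the altered view: original tokens plus any recorded insertions.
        parts = []
        for i, tok in enumerate(self.tokens):
            parts.append(tok)
            parts.append(self.inserts.get(i, ""))
        return "".join(parts)


tokens = ["class", " ", "T", " ", "{", "\n", "}", "\n"]  # made-up token texts
rewriter = LazyRewriter(tokens)
rewriter.insert_after(4, "\n\tpublic static final long serialVersionUID = 1L;")
print(rewriter.get_text())
```

Because the edits live beside the tokens rather than in them, several different rewrites of the same input can coexist.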




   

#18, posted 2014-03-05 16:08 (last edited by oldbeginner on 2014-03-05 16:16)

*************************************
Slowing Down in Order to Go Further

************************************



It’s time to slow down our pace and revisit all of the concepts explored in this
chapter with the goal of learning all of the details. Each chapter in the next
part of the book will take us another step toward becoming language implementers.
We’ll start by learning ANTLR notation and figuring out how to derive
grammars from examples and language reference manuals.



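This chart accompanies China's 2010 Gini coefficient of 0.6 (nobody has dared to survey it in recent years). It is vivid, but it has one fatal flaw: splitting the population into "urban" and "rural" is wrong and misleading.

At the very least it should be split into "inside the system or connected to it" and "outside the system with no connections".

   

Voice-over: what do these pictures have to do with ANTLR?

Another voice-over: learning ANTLR means slowing down and calming down, otherwise you end up like a certain big country...


The most down-to-earth people aren't in the Celestial Empire (天c); to get down to earth, learn from Africa.



Nice. Do you see now what getting down to earth level by level means?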

#19, posted 2014-03-06 15:43 (last edited by oldbeginner on 2014-03-06 15:59)

******************************
Chapter 5: Designing Grammars
Back when Chinese characters were still for the aristocracy
******************************




You watch Korean dramas every day; do your ancestors know?





Now we’re going to slow down and learn the details
needed to perform useful tasks such as building internal data structures,
extracting information, and generating a translation of the input. The first
step on our journey, though, is to learn how to build grammars.

The constraints of word order and dependency, derived from natural language,
blossom into four abstract computer language patterns.

• Sequence: This is a sequence of elements such as the values in an array
initializer.
• Choice: This is a choice between multiple, alternative phrases such as
the different kinds of statements in a programming language.
• Token dependence: The presence of one token requires the presence of
its counterpart elsewhere in a phrase such as matching left and right
parentheses.
• Nested phrase: This is a self-similar language construct such as nested
arithmetic expressions or nested statement blocks in a programming
language.

*******************************
5.1 Deriving Grammars from Language Samples

*******************************

Proper grammar design mirrors functional decomposition or top-down design
in the programming world. That means we work from the coarsest to the
finest level, identifying language structures and encoding them as grammatical
rules.

For example, “a comma-separated-value (CSV) file is a sequence of
rows terminated by newlines.” The essential word file to the left of is a is the
rule name, and everything to the right of is a becomes the «stuff» on the right
side of a rule definition.

Then we step down a level in granularity by describing the elements identified
on the right side of the start rule.

Stepping down another level of detail, we could say that a row is a sequence
of fields separated by commas. Then, a field is a number or string. Our
pseudocode looks like this:

file : «sequence of rows that are terminated by newlines» ;
row : «sequence of fields separated by commas» ;
field : «number or string» ;


***********************************

Let’s see how this technique works for describing some of the key structures
in a Java file. (We can make the rule names stand out by italicizing them.)
At the coarsest level, a Java compilation unit is an optional package specifier,
followed by one or more class definitions. Stepping down a level, a class definition
is keyword class, followed by an identifier, followed optionally by a
superclass specifier, followed optionally by an implements clause, followed by
a class body. A class body is a series of members enclosed in curly braces. A
member is a nested class definition, a field, or a method. From here, we would
describe fields and methods and then the statements within methods. You
get the idea. Start at the highest possible level and work your way down,
treating even large subphrases like Java class definitions as rules to define
later. In grammar pseudocode, we’d start out like this:

compilationUnit : «optional packageSpec then classDefinitions» ;
packageSpec : 'package' identifier ';' ;
classDefinition :
'class' identifier «optional superclassSpec optional implementsClause classBody» ;
superclassSpec : 'extends' identifier ;
implementsClause :
'implements' «one or more identifiers separated by comma» ;
classBody : '{' «zero-or-more members» '}' ;
member : «nested classDefinition or field or method» ;


***********************************
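I can see now that what the author wrote is not really an introductory book on ANTLR 4 at all.

Also, the author doesn't seem to know the Dragon Book very well, because reading his text doesn't help at all with studying the Dragon Book, and what's in the Dragon Book doesn't help with learning ANTLR 4 either. Is this the proverbial gap between theory and practice?

Then again, is the Dragon Book really necessary? Could Microsoft not have developed Windows without reading it?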

   

#20, posted 2014-03-06 17:29
***********************
5.2 Using Existing Grammars as a Guide
***********************


Not much of interest here, though one sentence is worth noting:
Identifying the grammar rules and expressing their right sides in pseudocode
is challenging at first but gets easier and easier as you build grammars for
more languages.


*******************************
5.3 Recognizing Common Language Patterns with ANTLR Grammars

*******************************

   
Now that we have a general top-down strategy for roughing out a grammar,
we need to focus on the common language patterns: sequence, choice, token
dependence, and nested phrase.

Pattern: Sequence

The structure you’ll see most often in computer languages is a sequence of
elements, such as the sequence of methods in a class definition.

Variations on this pattern include the sequence with terminator and sequence
with separator. CSV files demonstrate both nicely. Here’s how we can express
the pseudocode grammar from the previous section in ANTLR notation:

file : (row '\n')* ; // sequence with a '\n' terminator
row : field (',' field)* ; // sequence with a ',' separator
field: INT ; // assume fields are just integers


Rule file uses the list with terminator pattern to match zero or more row '\n'
sequences. The '\n' token terminates each element of the sequence. Rule row
uses the list with separator pattern by matching a field followed by zero or
more ',' field sequences. The ',' separates the fields. row matches sequences like
1 and 1,2 and 1,2,3, and so on.
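For comparison, the same three rules can be written as a hand-coded recursive-descent parser. The Python sketch below is one possible reading, with one function per rule; the function names and the tiny regex lexer are assumptions for this example, and the lexer silently skips any character that is not a digit, comma, or newline.

```python
import re


def lex(text):
    """Return INT, ',' and newline tokens; other characters are skipped."""
    return re.findall(r"\d+|,|\n", text)


def parse_file(tokens):
    # file : (row '\n')* ;   sequence with a '\n' terminator
    rows, pos = [], 0
    while pos < len(tokens):
        row, pos = parse_row(tokens, pos)
        if pos >= len(tokens) or tokens[pos] != "\n":
            raise SyntaxError("row must be terminated by a newline")
        pos += 1
        rows.append(row)
    return rows


def parse_row(tokens, pos):
    # row : field (',' field)* ;   sequence with a ',' separator
    value, pos = parse_field(tokens, pos)
    fields = [value]
    while pos < len(tokens) and tokens[pos] == ",":
        value, pos = parse_field(tokens, pos + 1)
        fields.append(value)
    return fields, pos


def parse_field(tokens, pos):
    # field : INT ;
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected INT")
    return int(tokens[pos]), pos + 1


print(parse_file(lex("1,2,3\n4\n")))
```

Note the difference between the two loops: the terminator must follow every row, including the last one, while the separator appears only between fields.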


Pattern: Choice (Alternatives)

To express the notion of choice in a language, we use | as the “or” operator
in ANTLR rules to separate grammatical choices called alternatives or productions.
Grammars are full of choices.

Returning to our CSV grammar, we can make a more flexible field rule by
allowing the choice of integers or strings.

field : INT | STRING ;

Any time you find yourself saying “language structure x can be either this or
that,” then you’ve identified a choice pattern. Use | in rule x.
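In hand-written code, a choice usually turns into an if/elif chain that tries each alternative in turn. The Python sketch below assumes a field arrives as a single token string and that a STRING is wrapped in double quotes; the function name `parse_field_choice` and both assumptions are made up for this example.

```python
def parse_field_choice(token):
    # field : INT | STRING ;
    if token.isdigit():                                   # alternative 1: INT
        return ("INT", int(token))
    if len(token) >= 2 and token[0] == token[-1] == '"':  # alternative 2: STRING
        return ("STRING", token[1:-1])
    raise SyntaxError("expected INT or STRING")


print(parse_field_choice("42"))
print(parse_field_choice('"hello"'))
```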


Pattern: Token Dependency

We need a way to express dependencies between tokens. If we see one symbol
in a sentence, we must find its matching counterpart elsewhere in the sentence.
To express this with a grammar, we use a sequence that specifies both
symbols, usually enclosing or grouping other elements. In this case, we
completely specify vectors like this:

vector : '[' INT+ ']' ; // [1], [1 2], [1 2 3], ...

Keep in mind that dependent symbols don’t necessarily have to match. C-derived
languages also have the a?b:c ternary operator where the ? sets up a
requirement to see : later in the phrase.
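A hand-written check for the vector rule shows the dependency directly: once `[` has been seen, the parse cannot succeed unless `]` turns up later. The Python sketch below assumes the input has already been split into whitespace-separated tokens; `parse_vector` is a name invented for this example.

```python
def parse_vector(tokens):
    # vector : '[' INT+ ']' ;
    if not tokens or tokens[0] != "[":
        raise SyntaxError("expected '['")
    pos = 1
    values = []
    while pos < len(tokens) and tokens[pos].isdigit():
        values.append(int(tokens[pos]))
        pos += 1
    if not values:
        raise SyntaxError("expected at least one INT")
    # The dependency: an opening '[' demands a closing ']'.
    if pos >= len(tokens) or tokens[pos] != "]":
        raise SyntaxError("'[' requires a matching ']'")
    if pos + 1 != len(tokens):
        raise SyntaxError("unexpected input after ']'")
    return values


print(parse_vector("[ 1 2 3 ]".split()))
```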



Pattern: Nested Phrase

A nested phrase has a self-similar language structure, one whose subphrases
conform to that same structure.

Let’s see how nesting works for code blocks. A while statement is the keyword
while followed by a condition expression in parentheses followed by a statement.
We can also treat multiple statements as a single block statement by wrapping
them in curly braces. Expressing that grammatically looks like this:

stat: 'while' '(' expr ')' stat // match WHILE statement
| '{' stat* '}' // match block of statements in curlies
... // and other kinds of statements

;

The looping statement, stat, of the while can be a single statement or a group
of statements if we enclose them in {...}. Rule stat is directly recursive because
it refers to itself in the first (and second) alternatives. If we moved the second
alternative to its own rule, rules stat and block would be mutually indirectly
recursive.

stat: 'while' '(' expr ')' stat // match WHILE statement
| block // match a block of statements
... // and other kinds of statements
;
block: '{' stat* '}' ; // match block of statements in curlies
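The recursion in these rules carries over directly to function calls in a hand-written parser. The Python sketch below is only illustrative: it treats the condition as a single token and assumes one extra kind of statement, `ID ';'`, to stand in for the "other kinds of statements" elided in the grammar. The tuple-shaped result and the function names are also inventions for this example.

```python
def parse_stat(tokens, pos=0):
    """Return (tree, next_pos) for one statement starting at tokens[pos]."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "while":
        # 'while' '(' expr ')' stat
        if tokens[pos + 1:pos + 2] != ["("] or tokens[pos + 3:pos + 4] != [")"]:
            raise SyntaxError("expected '(' expr ')'")
        cond = tokens[pos + 2]
        body, pos = parse_stat(tokens, pos + 4)  # direct recursion: stat inside stat
        return ("while", cond, body), pos
    if tok == "{":
        return parse_block(tokens, pos)
    # Assumed base case for this sketch: ID ';'
    if tokens[pos + 1:pos + 2] != [";"]:
        raise SyntaxError("expected ';'")
    return ("simple", tok), pos + 2


def parse_block(tokens, pos):
    # block : '{' stat* '}' ;
    pos += 1  # skip '{'
    body = []
    while pos < len(tokens) and tokens[pos] != "}":
        stat, pos = parse_stat(tokens, pos)  # indirect recursion back into stat
        body.append(stat)
    if pos >= len(tokens):
        raise SyntaxError("'{' requires a matching '}'")
    return ("block", body), pos + 1


tree, _ = parse_stat("while ( x ) { a ; while ( y ) b ; }".split())
print(tree)
```

`parse_stat` calls `parse_block` and `parse_block` calls `parse_stat`, which is the mutual recursion between stat and block described above.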



**************************************************
**************************************************

For future reference, here’s a table summarizing ANTLR’s core grammar
notation:


And here is a table summarizing what we’ve learned so far about common
computer language patterns:














