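Questions I ran into while learning Lex

wilbur8415 posted on 2008-02-10 12:51

I want to extract the hyperlinks from HTML files, and I plan to use Lex to do it.

This thread collects the problems I run into while learning Lex. My QQ is 724881183 and my email is wilbur8415@163.com

I hope we can all exchange ideas.

wilbur8415 posted on 2008-02-10 12:52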

2008.2.10

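1) After BEGIN switches Lex to some start condition, does it go on to execute the C statements that follow, until the whole action has finished?

For example:

<STATE>. {BEGIN INITIAL; int a =1;}

After any character is matched in STATE, the scanner switches back to INITIAL. Will int a=1 still be executed?

(I have always thought of BEGIN as something like a GOTO statement.)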
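
One way to picture question 1 is to model BEGIN as a plain assignment rather than a jump. The sketch below is only a simplified model in ordinary C, not flex's generated code: `current_start`, `state_dot_action` and the global `a` are hypothetical names made up for illustration, and the real scanner keeps its start condition in its own internal variable. Under that assumption, BEGIN only records which rules are active for the next match, so the statements after it still run.

```c
/* Simplified model (assumption): BEGIN behaves like an assignment to the
 * scanner's start-condition variable. All names here are illustrative. */
enum { INITIAL = 0, STATE = 1 };

static int current_start = INITIAL;  /* stands in for the scanner's start condition */
#define BEGIN current_start =         /* "BEGIN X;" expands to "current_start = X;" */

static int a = 0;

/* Models the action of:  <STATE>. { BEGIN INITIAL; a = 1; }  */
static void state_dot_action(void)
{
    BEGIN INITIAL;  /* only changes which rules apply to the NEXT match */
    a = 1;          /* still executed: nothing has jumped away */
}
```

If this model is right, calling `state_dot_action()` while `current_start` is `STATE` leaves `current_start` at `INITIAL` and `a` at 1, which is the opposite of what a GOTO would do.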

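2) After yy_push_state(STATE); has been executed, if yy_pop_state(); is called somewhere, does the scanner immediately jump to STATE?

3) About actions: a lone ; (semicolon) generally means "ignore".

So what does it mean when no action follows the pattern at all, only blank space? For example:

.|\n /* nothing at all */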

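cjaizss posted on 2008-02-10 13:33

1) How to test it:
{BEGIN XXX;printf("BEGIN XXX\n");}
2) This can be tested in the same way:
<A>xxx {printf("A:xxx\n");}
<B>xxx {printf("B:xxx\n");}/* same regex */
xxxx {/* push: call yy_push_state() with A or B */}
xxxxx {/* pop: call yy_pop_state() */}
3) It means do nothing, i.e. these matches are filtered out, and the matched text acts as a separator.

Try it yourself. Working out how to test something is the best way to learn, and you progress quickly that way.

[ Last edited by cjaizss on 2008-2-10 13:42 ]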

wilbur8415 posted on 2008-02-10 17:29

Three routines are available for manipulating stacks of start conditions:

`void yy_push_state(int new_state)'
pushes the current start condition onto the top of the start condition stack and switches to new_state as though you had used `BEGIN new_state' (recall that start condition names are also integers).
`void yy_pop_state()'
pops the top of the stack and switches to it via BEGIN.
`int yy_top_state()'
returns the top of the stack without altering the stack's contents.
The start condition stack grows dynamically and so has no built-in size limitation. If memory is exhausted, program execution aborts.

To use start condition stacks, your scanner must include a `%option stack' directive (see Options below).

So does push switch the state directly? (as though you had used `BEGIN new_state')
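
The manual text quoted above can be sketched in plain C. This is a model of the described behaviour, not flex's implementation: `push_state`, `pop_state`, `top_state`, `start` and the fixed `MAX_DEPTH` are all assumptions made for illustration (the manual says the real stack grows dynamically).

```c
#include <assert.h>

enum { INITIAL = 0, A = 1 };
#define MAX_DEPTH 16            /* fixed size for this sketch only */

static int start = INITIAL;     /* current start condition */
static int stack[MAX_DEPTH];    /* saved start conditions */
static int depth = 0;

/* Saves the CURRENT condition, then switches as "BEGIN new_state" would. */
static void push_state(int new_state)
{
    assert(depth < MAX_DEPTH);
    stack[depth++] = start;
    start = new_state;
}

/* Pops the saved condition and switches back to it. */
static void pop_state(void)
{
    assert(depth > 0);
    start = stack[--depth];
}

/* Returns the saved condition on top without changing the stack. */
static int top_state(void)
{
    assert(depth > 0);
    return stack[depth - 1];
}
```

Read this way, `push_state(A)` switches to A immediately, and a later `pop_state()` does not jump to A; it returns to whatever condition was active before the push.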

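[ Last edited by wilbur8415 on 2008-2-10 18:48 ]

wilbur8415 posted on 2008-02-10 18:43

cjaizss, could you give me the answer to question 2 first? I don't have a Linux environment here yet, so I can't test it.

cjaizss posted on 2008-02-10 20:46

You don't even have a test environment, and you're still trying to learn this???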

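cjaizss posted on 2008-02-10 21:35

yy_push_state pushes the current state onto the stack and then switches to the specified state; yy_pop_state pops the state on top of the stack and switches to it.
Example: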
%{
#include <stdio.h>
%}
%option stack
%s A
%%
<A>a {printf("hoho,Aa\n");}
a {printf("hoho,a\n");yy_push_state(A);}
<A>b {printf("hoho,Ab\n");yy_pop_state();}
b {printf("hoho,b\n");}
[\t \n]+ ;
. ;
%%
int main()
{
     yylex();
     return 0;
}

linux-0gt0:/tmp # flex 1.l;gcc lex.yy.c -ll;./a.out
a
hoho,a
a
hoho,Aa
b
hoho,Ab
b
hoho,b

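[ Last edited by cjaizss on 2008-2-10 21:46 ]

cjaizss posted on 2008-02-10 21:54

Finally, I personally think that using Lex to extract hyperlinks from HTML files may be overkill, like killing a chicken with an ox cleaver.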

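wilbur8415 posted on 2008-02-10 22:10

Thanks, cjaizss, haha.

I'm reading the source of a small open-source search engine, TSE, and it uses lex to extract links.

I may also extract other things, not just hyperlinks but image URLs and so on, so I think regular expressions may be more convenient.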
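
For comparison, here is a minimal hand-written sketch of the same task in plain C, with no Lex and no regular expressions. It is not how TSE does it; `next_href` is a hypothetical helper, and it assumes the attribute is written exactly as lowercase href="..." with double quotes, which real HTML does not guarantee.

```c
#include <string.h>

/* Copies the value of the next href="..." attribute found at or after
 * `html` into `out` (at most outsz - 1 bytes, always NUL-terminated).
 * Returns a pointer just past the closing quote, or NULL when no further
 * complete attribute is found. */
static const char *next_href(const char *html, char *out, size_t outsz)
{
    static const char key[] = "href=\"";
    const char *p, *end;
    size_t n;

    if (outsz == 0)
        return NULL;
    p = strstr(html, key);
    if (p == NULL)
        return NULL;
    p += sizeof key - 1;              /* skip past href=" */
    end = strchr(p, '"');             /* closing quote of the value */
    if (end == NULL)
        return NULL;
    n = (size_t)(end - p);
    if (n >= outsz)
        n = outsz - 1;                /* truncate overly long values */
    memcpy(out, p, n);
    out[n] = '\0';
    return end + 1;
}
```

Calling it in a loop, passing the returned pointer back in as `html`, walks through every matching link in a buffer. Image addresses would need the same treatment for src="...", which is where a pattern-based tool starts to pay off.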

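I'm downloading Cygwin from the Cygwin.cn mirror. I don't know whether the connection is the problem, but the download stops halfway through; I've tried 5 or 6 times.

cjaizss posted on 2008-02-10 22:36

:) If you're going to use it, just use a real Linux; a virtual machine is fine too. Of course, if this code runs very frequently in your program, lex/yacc is naturally the first choice, hoho. Otherwise a text-processing script is a very good option. The first thing that comes to my mind is awk/sed; perl should be even better, but unfortunately I don't know it, and awk/sed is already enough.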


Chinaunix's Archiver
