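How many ways are there to remove the comments from C code?

#1  Posted 2006-10-24 11:22
Over the past couple of days I ran into a struct while reading code, and it carries a staggering number of comments (a good habit, of course ^_^). But I am writing analysis notes and want the listing to look clean, so I started wondering how to strip those comments conveniently.

Take this struct as the example: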

/* The structure which defines the type of a value.  It should never
   be possible for a program lval value to survive over a call to the
   inferior (i.e. to be put into the history list or an internal
   variable).  */

struct value
{
  /* Type of value; either not an lval, or one of the various
     different possible kinds of lval.  */
  enum lval_type lval;

  /* Is it modifiable?  Only relevant if lval != not_lval.  */
  int modifiable;

  /* Location of value (if lval).  */
  union
  {
    /* If lval == lval_memory, this is the address in the inferior.
       If lval == lval_register, this is the byte offset into the
       registers structure.  */
    CORE_ADDR address;

    /* Pointer to internal variable.  */
    struct internalvar *internalvar;

    /* Number of register.  Only used with lval_reg_frame_relative.  */
    int regnum;
  } location;

  /* Describes offset of a value within lval of a structure in bytes.
     If lval == lval_memory, this is an offset to the address.
     If lval == lval_register, this is a further offset from
     location.address within the registers structure.
     Note also the member embedded_offset below.  */
  int offset;

  /* Only used for bitfields; number of bits contained in them.  */
  int bitsize;

  /* Only used for bitfields; position of start of field.
     For BITS_BIG_ENDIAN=0 targets, it is the position of the LSB.
     For BITS_BIG_ENDIAN=1 targets, it is the position of the MSB. */
    int bitpos;

  /* Frame value is relative to.  In practice, this ID is only used if
     the value is stored in several registers in other than the
     current frame, and these registers have not all been saved at the
     same place in memory.  This will be described in the lval enum
     above as "lval_reg_frame_relative".  */
  struct frame_id frame_id;

  /* Type of the value.  */
  struct type *type;

  /* If a value represents a C++ object, then the `type' field gives
     the object's compile-time type.  If the object actually belongs
     to some class derived from `type', perhaps with other base
     classes and additional members, then `type' is just a subobject
     of the real thing, and the full object is probably larger than
     `type' would suggest.

     If `type' is a dynamic class (i.e. one with a vtable), then GDB
     can actually determine the object's run-time type by looking at
     the run-time type information in the vtable.  When this
     information is available, we may elect to read in the entire
     object, for several reasons:

     - When printing the value, the user would probably rather see the
       full object, not just the limited portion apparent from the
       compile-time type.

     - If `type' has virtual base classes, then even printing `type'
       alone may require reaching outside the `type' portion of the
       object to wherever the virtual base class has been stored.

     When we store the entire object, `enclosing_type' is the run-time
     type -- the complete object -- and `embedded_offset' is the
     offset of `type' within that larger type, in bytes.  The
     VALUE_CONTENTS macro takes `embedded_offset' into account, so
     most GDB code continues to see the `type' portion of the value,
     just as the inferior would.

     If `type' is a pointer to an object, then `enclosing_type' is a
     pointer to the object's run-time type, and `pointed_to_offset' is
     the offset in bytes from the full object to the pointed-to object
     -- that is, the value `embedded_offset' would have if we
     followed the pointer and fetched the complete object.  (I don't
     really see the point.  Why not just determine the run-time type
     when you indirect, and avoid the special case?  The contents
     don't matter until you indirect anyway.)

     If we're not doing anything fancy, `enclosing_type' is equal to
     `type', and `embedded_offset' is zero, so everything works
     normally.  */
    struct type *enclosing_type;
    int embedded_offset;
    int pointed_to_offset;

    /* Values are stored in a chain, so that they can be deleted
       easily over calls to the inferior.  Values assigned to internal
       variables or put into the value history are taken off this
       list.  */
    struct value *next;

    /* Register number if the value is from a register.  */
    short regno;

    /* If zero, contents of this value are in the contents field.  If
       nonzero, contents are in inferior memory at address in the
       location.address field plus the offset field (and the lval
       field should be lval_memory).

       WARNING: This field is used by the code which handles
       watchpoints (see breakpoint.c) to decide whether a particular
       value can be watched by hardware watchpoints.  If the lazy flag
       is set for some member of a value chain, it is assumed that
       this member of the chain doesn't need to be watched as part of
       watching the value itself.  This is how GDB avoids watching the
       entire struct or array when the user wants to watch a single
       struct member or array element.  If you ever change the way
       lazy flag is set and reset, be sure to consider this use as
       well!  */
    char lazy;

    /* If nonzero, this is the value of a variable which does not
       actually exist in the program.  */
    char optimized_out;

    /* The BFD section associated with this value.  */
    asection *bfd_section;

    /* Actual contents of the value.  For use of this value; setting
       it uses the stuff above.  Not valid if lazy is nonzero.
       Target byte-order.  We force it to be aligned properly for any
       possible value.  Note that a value therefore extends beyond
       what is declared here.  */
    union
    {
      long contents[1];
      DOUBLEST force_doublest_align;
      LONGEST force_longest_align;
      CORE_ADDR force_core_addr_align;
      void *force_pointer_align;
    } aligner;
    /* Do not add any new members here -- contents above will trash them.  */
};

The result I want after editing is:
struct value
{
    enum lval_type lval;
    int modifiable;
    union
    {
        CORE_ADDR address;
        struct internalvar *internalvar;
        int regnum;
    } location;
    int offset;
    int bitsize;
    int bitpos;
    struct frame_id frame_id;
    struct type *type;
    struct type *enclosing_type;
    int embedded_offset;
    int pointed_to_offset;
    struct value *next;
    short regno;
    char lazy;
    char optimized_out;
    asection *bfd_section;
    union
    {
        long contents[1];
        DOUBLEST force_doublest_align;
        LONGEST force_longest_align;
        CORE_ADDR force_core_addr_align;
        void *force_pointer_align;
    } aligner;
};
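
Which methods do you all know for getting this done? Contributions welcome!
If you come up with a method, please describe the exact steps so that I can learn from it.

Plain manual deletion in a text editor counts as a single method (whether with vim's "x" or another editor's "Del"). A "shortcut" carried out inside an editor, such as a purpose-built regular expression, counts as a method of its own.

I hope to learn about the features of various text editors and scripting languages along the way, so please share freely!

I'll start with an awk script that does the job:
awkfile:
BEGIN { del = 0 }
/\/\*/ { del = 1 }
/\*\// { del = 0; next }
del == 0 && /^.+$/{ print }

Here is a test run, assuming the struct code above is stored in a file named value: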
[~]$
[~]$ cat awkfile
BEGIN { del = 0 }
/\/\*/ { del = 1 }
/\*\// { del = 0; next }
del == 0 && /^.+$/{ print }
[~]$
[~]$
[~]$ awk -f awkfile value
struct value
{
    enum lval_type lval;
    int modifiable;
    union
    {
        CORE_ADDR address;
        struct internalvar *internalvar;
        int regnum;
    } location;
    int offset;
    int bitsize;
    int bitpos;
    struct frame_id frame_id;
    struct type *type;
    struct type *enclosing_type;
    int embedded_offset;
    int pointed_to_offset;
    struct value *next;
    short regno;
    char lazy;
    char optimized_out;
    asection *bfd_section;
    union
    {
        long contents[1];
        DOUBLEST force_doublest_align;
        LONGEST force_longest_align;
        CORE_ADDR force_core_addr_align;
        void *force_pointer_align;
    } aligner;
};
[~]$
[~]$
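
[Last edited by 雨丝风片 on 2006-10-24 11:29]

#2  Posted 2006-10-24 11:26
The struct I picked as an example has one special property: its code and its comments never share a line, so the script I wrote only handles that case. I look forward to methods that can cope with code and comments mixed on the same line.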
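
To make that limitation concrete, here is a rough Python sketch of the same line-based logic as awkfile. The function name strip_comment_lines is made up for illustration, and this is a port for study rather than a different method. Because every line containing */ is skipped whole, any code sharing that line is lost with it:

```python
def strip_comment_lines(lines):
    """Line-based filter mirroring the four rules of awkfile."""
    deleting = False              # plays the role of `del` in the awk script
    kept = []
    for line in lines:
        if "/*" in line:          # rule 2: a comment opens on this line
            deleting = True
        if "*/" in line:          # rule 3: a comment closes, drop the whole line
            deleting = False
            continue
        if not deleting and line != "":   # rule 4: /^.+$/ needs at least one character
            kept.append(line)
    return kept
```

With this logic a line such as `int y; /* mixed */` disappears entirely, declaration included.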

#3  Posted 2006-10-24 11:40
Here to learn ^_^

#4  Posted 2006-10-24 11:41
Originally posted by congli on 2006-10-24 11:40:
Here to learn ^_^


I'm the one who is here to learn! Hurry up and think of some other methods ^_^

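#5  Posted 2006-10-24 11:59
Use regular expressions in Perl or PHP.

Go ask on the PHP board. For anyone who has written a scraper this should be a piece of cake.

(I can't do it myself, though.)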

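#6  Posted 2006-10-24 12:01
Originally posted by Macolex on 2006-10-24 11:59:
Use regular expressions in Perl or PHP.

Go ask on the PHP board. For anyone who has written a scraper this should be a piece of cake.

(I can't do it myself, though.)


Well, ideally someone would recommend a ready-made method here!
One of my goals is also to see the strengths and weaknesses of different scripting languages on this task.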

#7  Posted 2006-10-24 13:37
How does this script handle "/**/"?

Handling every case correctly looks like a lot of trouble...
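
One reason full coverage gets awkward is that /* and */ can also appear inside string or character literals, where they are not comments at all. A minimal sketch of a character-by-character scanner in Python follows. The function name and the state names are assumptions for illustration, and the sketch deliberately ignores // comments, trigraphs, and backslash-newline splicing. Each comment is replaced by one space, which is how a C compiler treats it, so cleaning up blank lines is left as a separate step:

```python
def strip_block_comments(text):
    """Remove /* ... */ comments, leaving string and char literals intact."""
    out = []
    i, n = 0, len(text)
    state = "code"                    # one of: code, comment, string, char
    while i < n:
        c = text[i]
        two = text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "comment"
                i += 2
                continue
            if c == '"':
                state = "string"
            elif c == "'":
                state = "char"
            out.append(c)
        elif state == "comment":
            if two == "*/":
                state = "code"
                out.append(" ")       # a comment counts as one space in C
                i += 2
                continue
            # any other character in the comment body is dropped
        else:                         # inside a string or char literal
            out.append(c)
            if c == "\\" and i + 1 < n:
                out.append(text[i + 1])   # keep the escaped character verbatim
                i += 2
                continue
            if (state == "string" and c == '"') or (state == "char" and c == "'"):
                state = "code"
        i += 1
    return "".join(out)
```

For instance, `char *s = "/**/";` passes through unchanged, while the `/**/` in `int a;/**/int b;` becomes a single space.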

#8  Posted 2006-10-24 14:10
Perl regular expressions can do it, but I haven't learned that far yet.

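#9  Posted 2006-10-24 14:47
Originally posted by antijp on 2006-10-24 13:37:
How does this script handle "/**/"?

Handling every case correctly looks like a lot of trouble...


If a line contains nothing but /* ... */, the earlier script already handles it. What it cannot handle is C code and a comment mixed on the same line. I wrote a new awk script:

BEGIN { del = 0 }
# Remove a comment that opens and closes on this line (greedy: first /* to last */).
/\/\*.*\*\// { sub(/\/\*.*\*\//, "") }
# A /* still present means a multi-line comment starts here.
/\/\*/ { del = 1 }
# A */ still present ends the multi-line comment; skip the line.
/\*\// { del = 0; next }
# Skip lines that are empty or contain only spaces.
/^ *$/ { next }
# Print whatever lies outside a comment.
del == 0 { print }


This script removes the following four forms of comment:

/* aaaaaaaaaaaaaaaaaaaaaaaaa
   bbbbbbbbbbbbbbbbbbbbbbbbb
   ccccccccccccccccccccccccc */

/* ddddddddddddddddddddddddd */

int foo; /* eeeeeeeeeeeeeeeee */

/* ffffffffffffffff */  int bar;


But it still cannot handle a comment of this form, where the comment starts after code and continues onto the next line:

int foobar; /* fffffffffffffffff
               ggggggggggggggggg */

[Last edited by 雨丝风片 on 2006-10-24 15:02]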
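
The multi-line form fails because awk sees one line at a time, so the code in front of the opening /* is discarded along with the comment. One possible way around it, sketched here in Python rather than awk, is to read the whole text at once and let a non-greedy pattern span line breaks. The names COMMENT and strip_comments are illustrative, and the sketch assumes that /* and */ never occur inside string literals:

```python
import re

# Non-greedy match; re.DOTALL lets "." cross line breaks.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_comments(text):
    """Delete /* ... */ comments, then drop lines left empty or blank."""
    without = COMMENT.sub("", text)
    lines = (line.rstrip() for line in without.splitlines())
    return "\n".join(line for line in lines if line)
```

Applied to the five forms listed in this post, it keeps `int foo;`, `int bar;` and `int foobar;` and drops everything else. Leading whitespace is preserved, so `int bar;` keeps the two spaces that followed its comment.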

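#10  Posted 2006-10-24 14:53
Originally posted by 大大狗 on 2006-10-24 14:10:
Perl regular expressions can do it, but I haven't learned that far yet.


The question is not whether it can be done. It is which concrete methods people can come up with, and how those methods compare.
If you write a script for the whole task you will most likely need regular expressions, and that has little to do with whether you use perl. Still, I would not rule out that differences between languages, in regular expressions and elsewhere, have a noticeable effect on how the problem is solved, and that is what makes it worth exploring.

My awk script uses "regex" too.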
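
One such difference, as a small illustration: awk's POSIX extended regular expressions have no non-greedy quantifier, so a pattern shaped like /\*.*\*/ runs from the first /* on a line to the last */. The Python sketch below (variable names are illustrative) contrasts that greedy match with a lazy one, and with an equivalent pattern that needs no lazy quantifier:

```python
import re

line = "/* a */ int x; /* b */"

# Greedy, the same shape as the pattern passed to sub() in the awk script:
# it swallows the code between the two comments.
greedy = re.sub(r"/\*.*\*/", "", line)

# Lazy: each match stops at the first */ it meets.
lazy = re.sub(r"/\*.*?\*/", "", line)

# Same effect without a lazy quantifier, for engines that lack one.
portable = re.sub(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/", "", line)

print(repr(greedy), repr(lazy), repr(portable))
```

The third pattern uses only constructs that extended regular expressions also offer, apart from the (?:...) group, which would be written with plain parentheses there.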