<DIV>
<P style="TEXT-ALIGN: center; mso-line-height-alt: 15.6pt" align=center><A name=OLE_LINK3></A><A name=OLE_LINK2></A><A name=OLE_LINK1></A><A name=OLE_LINK6></A><A name=OLE_LINK5></A><A name=OLE_LINK4><SPAN style="mso-bookmark: OLE_LINK5"><SPAN style="mso-bookmark: OLE_LINK6"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋体>Large-Scale Data Sorting Algorithms Based on <SPAN lang=EN-US>Hadoop</SPAN></FONT></SPAN></B></SPAN></SPAN></SPAN></SPAN></SPAN></A><SPAN style="mso-bookmark: OLE_LINK4"><SPAN style="mso-bookmark: OLE_LINK5"><SPAN style="mso-bookmark: OLE_LINK6"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US></SPAN></SPAN></SPAN></SPAN></SPAN></SPAN></SPAN></P><SPAN style="mso-bookmark: OLE_LINK6"></SPAN><SPAN style="mso-bookmark: OLE_LINK5"></SPAN><SPAN style="mso-bookmark: OLE_LINK4"></SPAN>
<P style="mso-line-height-alt: 15.6pt"><FONT face=宋体><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt" lang=EN-US><SPAN style="mso-spacerun: yes"> </SPAN></SPAN></B></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 15pt" lang=EN-US><SPAN style="mso-spacerun: yes"> </SPAN></SPAN></B></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 13.5pt" lang=EN-US>-------2011.10.26</SPAN></B></SPAN></SPAN></SPAN></FONT></P>
<P style="LINE-HEIGHT: 15.6pt"><FONT size=3><FONT face=宋体><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"> </SPAN></SPAN>Group members:</SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US></SPAN></SPAN></SPAN></SPAN></FONT></FONT></P>
<P style="LINE-HEIGHT: 15.6pt"><FONT size=3><FONT face=宋体><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"> </SPAN></SPAN>Group leader: 韩旭红<SPAN lang=EN-US> 1091000161</SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US></SPAN></SPAN></SPAN></SPAN></FONT></FONT></P>
<P style="LINE-HEIGHT: 15.6pt"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"> </SPAN></SPAN>Members: 李巍<SPAN lang=EN-US> 1091000167 </SPAN>李越<SPAN lang=EN-US> 1091000169<SPAN style="mso-spacerun: yes"> </SPAN></SPAN>闫悦<SPAN lang=EN-US> 1091000178</SPAN></FONT></FONT></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 15.6pt"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US><FONT face=宋体> </FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="TEXT-INDENT: -17.1pt; MARGIN-LEFT: 53.2pt; mso-line-height-alt: 15.6pt; mso-para-margin-left: 3.44gd; mso-char-indent-count: -.95"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆">Work over the past few weeks:<SPAN lang=EN-US></SPAN></SPAN></SPAN></SPAN></SPAN></P>
<P style="TEXT-INDENT: 27pt; MARGIN-LEFT: 53.25pt; mso-line-height-alt: 15.6pt; mso-para-margin-left: 5.07gd; mso-char-indent-count: 1.5"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆">Over the past few weeks we first set up our working environment and then got a general overview of <SPAN lang=EN-US>Java</SPAN>. We read through some source code and some material on large-scale data sorting algorithms. Today our group will introduce <SPAN lang=EN-US>Nutch</SPAN>.<SPAN lang=EN-US></SPAN></SPAN></SPAN></SPAN></SPAN></P>
<P style="TEXT-INDENT: 27pt; MARGIN-LEFT: 53.25pt; mso-line-height-alt: 15.6pt; mso-para-margin-left: 5.07gd; mso-char-indent-count: 1.5"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆" lang=EN-US>The Nutch</SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆"> material is presented here only for general awareness; take a look if you are interested. We will probably not have time to work on this part ourselves.</SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><I style="mso-bidi-font-style: normal"><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US></SPAN></I></SPAN></SPAN></SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-pagination: widow-orphan; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-outline-level: 2" class=MsoNormal align=left><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-hansi-font-family: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">I. Overview</SPAN></SPAN></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: #f6f6f6" class=MsoNormal align=center><a href="http://blog.chinaunix.net/attachment/201110/26/24677087_1319642683EnhB.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.net/attachment/201110/26/24677087_1319642683EnhB.jpg"></A></P>
<P style="TEXT-ALIGN: center; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: #f6f6f6" class=MsoNormal align=center>Figure 1</P>
<P style="TEXT-ALIGN: center; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: #f6f6f6" class=MsoNormal align=center> </P>
<P style="TEXT-ALIGN: center; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: #f6f6f6" class=MsoNormal align=center> </P><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US></SPAN></SPAN></SPAN></SPAN>
<P style="LINE-HEIGHT: 18pt; TEXT-INDENT: 22.6pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white; mso-char-indent-count: 2.0" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Nutch is an open-source web search engine implemented in Java.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; TEXT-INDENT: 22.6pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white; mso-char-indent-count: 2.0" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>As Internet technology continues to develop, search engines have become the main tool people use to find information online. The goal of research on search-engine page ranking is to place relevant and authoritative pages at the top of the many search results, so that users can quickly locate the web resources they need. The page-ranking algorithm directly affects a search engine's retrieval accuracy and its users' satisfaction. Nutch is an open-source search engine implemented in Java. An in-depth study of Nutch identifies two major problems at present: first, it does not implement the PageRank algorithm, which hurts the final ranking; second, it segments Chinese text into single characters, which hurts the accuracy of query results.<BR>First, to address the lack of PageRank computation in Nutch, the strength of the MapReduce parallel computing model in handling large data sets is exploited: a MapReduce-based distributed parallel PageRank algorithm is designed and implemented on a Nutch cluster. Experimental results show that the larger the data volume and the more nodes in the cluster, the more efficient the PageRank computation; in addition, the distributed parallel algorithm scales well. Next, to address the single-character segmentation of Chinese, the JE Chinese tokenizer is added to improve Nutch's Chinese word segmentation. Building on an analysis of the classic PageRank algorithm, the algorithm is improved by introducing a weighting factor that controls the relative weight of off-site and on-site links. To improve Nutch's Lucene-based composite page-ranking model, the improved PageRank factor is incorporated into Nutch's page scoring formula. Experiments show that the improved Nutch clearly raises the accuracy of query results and improves the ranking of Chinese pages.</FONT></SPAN></SPAN></SPAN></SPAN></P>
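<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>The MapReduce-based PageRank computation mentioned above can be sketched roughly as follows. This is only a minimal single-machine illustration in plain Java; it is not the implementation from the work summarized above and it does not use the Hadoop or Nutch APIs. The class name PageRankSketch, the methods map, reduce and iterate, the damping factor 0.85, and the assumption that every page has at least one outgoing link are all choices made for this example. Only classic PageRank is shown; the off-site/on-site weighting factor is not modeled.</P>

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop or Nutch.
final class PageRankSketch {
    // Damping factor; 0.85 is a commonly used value (an assumption of this sketch).
    static final double DAMPING = 0.85;

    // "Map" step: a page splits its current rank evenly among the pages it links to.
    static List<Map.Entry<String, Double>> map(double rank, List<String> outLinks) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (String target : outLinks) {
            double share = rank / outLinks.size();
            emitted.add(new AbstractMap.SimpleEntry<String, Double>(target, share));
        }
        return emitted;
    }

    // "Reduce" step: sum the contributions sent to one page and apply damping.
    static double reduce(List<Double> contributions, int pageCount) {
        double sum = 0.0;
        for (double c : contributions) {
            sum += c;
        }
        return (1.0 - DAMPING) / pageCount + DAMPING * sum;
    }

    // One iteration: map every page, group emitted values by target page
    // (the "shuffle"), then reduce each group to that page's new rank.
    // Assumes ranks contains an entry for every page in graph.
    static Map<String, Double> iterate(Map<String, List<String>> graph,
                                       Map<String, Double> ranks) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String page : graph.keySet()) {
            grouped.put(page, new ArrayList<Double>());
        }
        for (Map.Entry<String, List<String>> page : graph.entrySet()) {
            double rank = ranks.get(page.getKey());
            for (Map.Entry<String, Double> kv : map(rank, page.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<Double>())
                       .add(kv.getValue());
            }
        }
        Map<String, Double> next = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            next.put(e.getKey(), reduce(e.getValue(), graph.size()));
        }
        return next;
    }
}
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>Calling iterate repeatedly, feeding each result back in as the next ranks, moves the values toward the PageRank scores. In a Hadoop setting each such iteration would presumably correspond to one MapReduce pass over the link graph, which is why more data and more nodes can be put to use.</P>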
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-line-height-alt: 15.6pt"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆">二.</SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 'Times New Roman','serif'; FONT-SIZE: 7pt; mso-bidi-font-weight: bold; mso-fareast-font-family: 幼圆" lang=EN-US> </SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>Nutch </SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">致力于做到:<SPAN lang=EN-US></SPAN></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Nutch aims to:</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri> * Fetch several billion web pages per month</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri> * Maintain an index of these pages</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri> * Serve upward of a thousand searches per second against that index</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri> * Provide high-quality search results</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri> * Operate at minimal cost</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="mso-line-height-alt: 15.6pt"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>III. Components</SPAN></SPAN></SPAN></SPAN></P>
<DIV style="BORDER-BOTTOM: #dedfe1 1pt solid; BORDER-LEFT: medium none; PADDING-BOTTOM: 5pt; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BACKGROUND: white; BORDER-TOP: medium none; BORDER-RIGHT: medium none; PADDING-TOP: 0cm; mso-border-bottom-alt: solid #DEDFE1 .75pt; mso-element: para-border-div">
<P style="BORDER-BOTTOM: medium none; TEXT-ALIGN: left; BORDER-LEFT: medium none; PADDING-BOTTOM: 0cm; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 7.5pt; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BACKGROUND: white; BORDER-TOP: medium none; BORDER-RIGHT: medium none; PADDING-TOP: 0cm; mso-pagination: widow-orphan; mso-outline-level: 2; mso-border-bottom-alt: solid #DEDFE1 .75pt; mso-padding-alt: 0cm 0cm 5.0pt 0cm" class=MsoNormal align=left><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><B><SPAN style="FONT-FAMILY: 宋体; LETTER-SPACING: 0.4pt; FONT-SIZE: 13.5pt; mso-bidi-font-size: 11.0pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt" lang=EN-US>Nutch has two main parts</SPAN></B></SPAN></SPAN></SPAN></P></DIV>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The two parts are the crawler and the searcher. The Crawler fetches pages from the web and builds an index of them. The Searcher uses that index to look up the user's query keywords and produce search results. The index is the only interface between the two, so apart from the index they are only loosely coupled.</FONT></SPAN></SPAN></SPAN></SPAN></P>
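<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The idea that the index is the only thing shared between the two parts can be sketched roughly as follows. This is a toy in-memory inverted index, not Nutch's real index format; the function names build_index and search and the example.com pages are assumptions made up for illustration.</FONT></SPAN></P>

```python
from collections import defaultdict


def build_index(pages):
    """Crawler side: map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return dict(index)


def search(index, query):
    """Searcher side: return URLs that contain every query term.

    Only the index is consulted; the searcher never touches the fetched pages.
    """
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits = hits.intersection(index.get(term, set()))
    return sorted(hits)


# Hypothetical fetched pages: URL mapped to extracted text.
pages = {
    "http://example.com/a": "hadoop sorts large data",
    "http://example.com/b": "nutch crawls the web",
    "http://example.com/c": "nutch runs on hadoop",
}
index = build_index(pages)
results = search(index, "nutch hadoop")
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Because search takes only the index as input, the index could in principle be built on one machine and queried on another, which is the kind of decoupling the paragraph above describes.</FONT></SPAN></P>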
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The Crawler and the Searcher are kept as separate as possible mainly so that they can be deployed on different hardware, for example on two separate hosts, which can improve performance.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<DIV style="BORDER-BOTTOM: #dedfe1 1pt solid; BORDER-LEFT: medium none; PADDING-BOTTOM: 5pt; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BACKGROUND: white; BORDER-TOP: medium none; BORDER-RIGHT: medium none; PADDING-TOP: 0cm; mso-border-bottom-alt: solid #DEDFE1 .75pt; mso-element: para-border-div">
<FONT face=宋体><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN class=headline-content2><SPAN style="LETTER-SPACING: 0.4pt; FONT-SIZE: 13.5pt" lang=EN-US>The Crawler</SPAN></SPAN></SPAN></SPAN></SPAN></FONT></DIV>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Two aspects of the Crawler matter most: its workflow, and the format and meaning of the data files it uses. There are three kinds of data files: the web database, a set of segments, and the index. Under the crawl output directory, their physical files are stored in the webdb subfolder of the db directory, in the segments folder, and in the index folder, respectively.</FONT></SPAN></SPAN></SPAN></SPAN></P>
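<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>As a small illustration of that layout, the three locations can be written down as paths relative to the crawl output directory. The helper name crawl_layout and the directory name crawl-output are made up here; only the db/webdb, segments and index names come from the description above.</FONT></SPAN></P>

```python
from pathlib import Path


def crawl_layout(crawl_dir):
    """Return the three data locations described in the text, under crawl_dir."""
    root = Path(crawl_dir)
    return {
        "webdb": root / "db" / "webdb",   # link structure (WebDB)
        "segments": root / "segments",    # one subfolder per segment
        "index": root / "index",          # the index used by the Searcher
    }


# "crawl-output" is a hypothetical crawl output directory.
layout = crawl_layout("crawl-output")
for name, path in layout.items():
    print(name, path.as_posix())
```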
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The web database, also called WebDB, stores the link structure among the pages the crawler has fetched. It is used only by the Crawler and plays no part in the Searcher's work. WebDB stores two kinds of entities: pages and links. A Page entity represents an actual web page by describing its characteristics. Because there are many pages to describe, WebDB indexes the Page entities in two ways: by the page's URL and by the MD5 of the page's content. The characteristics a Page entity records include the number of links in the page, fetch-related information such as when the page was fetched, and an importance score for the page. Similarly, a Link entity describes the link between two Page entities. WebDB thus forms a link graph of the fetched pages, in which Page entities are the nodes and Link entities are the edges.</FONT></SPAN></SPAN></SPAN></SPAN></P>
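<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>A minimal sketch of this graph view, assuming a simple in-memory model: pages are nodes, links are directed edges, and pages can be looked up either by URL or by the MD5 of their content. The class and field names (Page, Link, WebGraph, num_outlinks and so on) are illustrative assumptions, not Nutch's actual classes or on-disk format.</FONT></SPAN></P>

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Node: features describing one fetched web page."""
    url: str
    content_md5: str
    num_outlinks: int
    fetch_time: float
    score: float


@dataclass(frozen=True)
class Link:
    """Directed edge between two pages, identified by their URLs."""
    from_url: str
    to_url: str


class WebGraph:
    """Toy stand-in for WebDB with two page indexes: by URL and by content MD5."""

    def __init__(self):
        self.by_url = {}   # URL mapped to its Page
        self.by_md5 = {}   # content MD5 mapped to a list of Pages with that content
        self.links = []    # all Link edges

    def add_page(self, url, content, fetch_time, score=1.0, num_outlinks=0):
        md5 = hashlib.md5(content.encode("utf-8")).hexdigest()
        page = Page(url, md5, num_outlinks, fetch_time, score)
        self.by_url[url] = page
        self.by_md5.setdefault(md5, []).append(page)
        return page

    def add_link(self, from_url, to_url):
        self.links.append(Link(from_url, to_url))

    def outlinks(self, url):
        """Return the URLs that the given page links to."""
        return [link.to_url for link in self.links if link.from_url == url]


graph = WebGraph()
home = graph.add_page("http://example.com/", "home page", fetch_time=0.0, num_outlinks=1)
about = graph.add_page("http://example.com/about", "about page", fetch_time=1.0)
graph.add_link("http://example.com/", "http://example.com/about")
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Keeping a second lookup by content MD5 makes it cheap to notice when two different URLs serve identical content, which is one plausible reason for indexing pages both ways.</FONT></SPAN></P>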
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>A single crawl produces many segments. Each segment stores the pages the Crawler fetched in one fetch cycle, together with the index of those pages. While crawling, the Crawler uses the link relationships in WebDB and a crawl policy to generate the fetchlist needed for each fetch cycle; the Fetcher then fetches and indexes the pages at the URLs in the fetchlist and stores them in the segment. Segments have a limited lifetime: once the Crawler re-fetches those pages, the segment produced by the earlier fetch becomes obsolete. In storage, segment folders are named after their creation time, which makes it easy to delete obsolete </FONT></SPAN><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT 
face=Calibri>segments</FONT></SPAN></SPAN></SPAN></SPAN><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 宋体; LETTER-SPACING: 0.4pt; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-hansi-font-family: Calibri; mso-hansi-theme-font: minor-latin">以节省存储空间。</SPAN><SPAN style="LETTER-SPACING: 0.4pt"><FONT face=Calibri> <SPAN lang=EN-US></SPAN></FONT></SPAN></SPAN></SPAN></SPAN></P>
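<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>Because segment directories carry their creation time in the name, stale ones can be picked out from the names alone. The following is a minimal sketch, not Nutch's own cleanup code: it assumes names of the form yyyyMMddHHmmss, and it uses a fixed maximum age as a stand-in for "these pages have since been re-fetched". The function name and parameters are hypothetical.</P>

```python
from datetime import datetime, timedelta

# Assumed naming scheme for segment directories, e.g. 20111026093000.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def expired_segments(names, now, max_age):
    """Return the segment names whose creation time is older than max_age."""
    cutoff = now - max_age
    expired = []
    for name in names:
        try:
            created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        except ValueError:
            # Not a timestamp-named directory; leave it alone.
            continue
        if cutoff > created:
            expired.append(name)
    return sorted(expired)
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>The function only reports candidates; actually deleting the directories is left to the caller so that the selection can be reviewed first.</P>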
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The index covers all pages the Crawler has fetched; it is obtained by merging the indexes of all the individual segments. Nutch indexes with Lucene, so Lucene's interfaces for operating on an index work equally well on a Nutch index. Note, however, that a Lucene segment is not the same thing as a Nutch segment: a Lucene segment is a part of the index, whereas a Nutch segment only holds the content and index of one portion of the pages in the WebDB, and the final index generated from these segments no longer depends on them.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<DIV style="BORDER-BOTTOM: #dedfe1 1pt solid; BORDER-LEFT: medium none; PADDING-BOTTOM: 5pt; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BACKGROUND: white; BORDER-TOP: medium none; BORDER-RIGHT: medium none; PADDING-TOP: 0cm; mso-border-bottom-alt: solid #DEDFE1 .75pt; mso-element: para-border-div">
<SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><A name=5></A><SPAN class=headline-content2><SPAN style="LETTER-SPACING: 0.4pt; FONT-SIZE: 13.5pt" lang=EN-US>Crawler Workflow</SPAN></SPAN></SPAN></SPAN></SPAN></DIV>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Having looked at the files involved in the Crawler's work, we now turn to the Crawler's fetch workflow and the role these files play in it. The Crawler works as follows. First, it generates from the WebDB a set of URLs of pages to fetch, called the Fetchlist. Next, the download threads (Fetchers) fetch the pages listed in the Fetchlist; if there are many download threads, many Fetchlists are generated, one per Fetcher. The Crawler then updates the WebDB with the fetched pages and generates a new Fetchlist from the updated WebDB, containing URLs that have not yet been fetched or were newly discovered, and the next fetch cycle begins. This loop can be called the “generate/fetch/update” cycle.</FONT></SPAN></SPAN></SPAN></SPAN></P>
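<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>The generate/fetch/update cycle can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions: the WebDB is modeled as a dictionary mapping each URL to a "fetched" flag, the Web is a toy link graph, and a single Fetchlist is produced per round. None of the function names below come from Nutch.</P>

```python
def generate(webdb):
    """Build the fetchlist: every known URL that has not been fetched yet."""
    return sorted(url for url, fetched in webdb.items() if not fetched)


def fetch(fetchlist, web):
    """Stand-in for the Fetcher: look up each URL's outlinks in a toy graph."""
    return {url: web.get(url, []) for url in fetchlist}


def update(webdb, fetched_pages):
    """Mark fetched URLs as done and record newly discovered outlinks."""
    for url, outlinks in fetched_pages.items():
        webdb[url] = True
        for link in outlinks:
            # Newly discovered URLs enter the WebDB as "not yet fetched".
            webdb.setdefault(link, False)


def crawl(seeds, web, depth):
    """Run at most `depth` generate/fetch/update rounds from the seed URLs."""
    webdb = {url: False for url in seeds}
    rounds = []
    for _ in range(depth):
        fetchlist = generate(webdb)
        if not fetchlist:
            break  # nothing left to fetch
        rounds.append(fetchlist)
        update(webdb, fetch(fetchlist, web))
    return rounds, webdb
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>Each entry of the returned list is the Fetchlist of one round, so the way the crawl frontier expands from the seeds can be read off directly.</P>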
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>URLs that point to Web resources on the same host are usually assigned to the same Fetchlist, which prevents too many Fetchers from fetching from one host at the same time and overloading it. Nutch also obeys the Robots Exclusion Protocol, so a site can control the Crawler's fetching with its own robots.txt.</FONT></SPAN></SPAN></SPAN></SPAN></P>
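<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>Both politeness rules can be sketched with the Python standard library. The partitioning below hashes the host name to choose a Fetchlist, which is one assumed way to keep all URLs of a host together and not necessarily the exact scheme Nutch uses. The robots.txt check relies on urllib.robotparser; the agent name "example-crawler" and both function names are hypothetical.</P>

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser
from zlib import crc32


def partition_by_host(urls, num_fetchers):
    """Split URLs into num_fetchers fetchlists, keeping each host in one list."""
    fetchlists = [[] for _ in range(num_fetchers)]
    for url in urls:
        host = urlsplit(url).hostname or ""
        # crc32 is stable across runs, unlike Python's built-in str hash.
        index = crc32(host.encode("utf-8")) % num_fetchers
        fetchlists[index].append(url)
    return fetchlists


def allowed(url, robots_txt, agent="example-crawler"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal>Since every URL of a host lands in exactly one Fetchlist and each Fetcher works through one Fetchlist, at most one Fetcher contacts a given host during a round.</P>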
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>In Nutch, a crawl is implemented as a series of sub-operations, and Nutch provides a subcommand for each so that it can be invoked on its own. The sub-operations and their commands are listed below, with the command in parentheses.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>1. Create a new WebDB (admin db -create).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>2. Write the seed URLs for the crawl into the WebDB (inject).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>3. Generate a fetchlist from the WebDB and write it to the corresponding segment (generate).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>4. Fetch the pages at the URLs in the fetchlist (fetch).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>5. Update the WebDB with the fetched pages (updatedb).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>6. Repeat steps 3-5 until the preset crawl depth is reached.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>7. Update the segments with the page scores and links obtained from the WebDB (updatesegs).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>8. Index the fetched pages (index).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>9. Discard pages with duplicate content and duplicate URLs from the indexes (dedup).</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>10. Merge the indexes in the segments into the final index used for searching (merge).</FONT></SPAN></SPAN></SPAN></SPAN></P>
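<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The control flow of steps 1-6 can be sketched as a small in-memory simulation. This is only a toy model under stated assumptions: the link graph TOY_WEB, the function crawl, and the dictionary standing in for the WebDB are all hypothetical and are not Nutch APIs.</FONT></SPAN></P>
<PRE>
```python
# Toy model of the generate/fetch/update loop (steps 1-6).
# TOY_WEB and crawl() are hypothetical names for illustration, not Nutch APIs.

# A made-up link graph: each URL maps to the outlinks found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": [],
    "http://d.example/": ["http://a.example/"],
}


def crawl(seed_urls, depth):
    webdb = {}                           # step 1: maps URL to a "fetched" flag
    for url in seed_urls:                # step 2: inject the seed URLs
        webdb.setdefault(url, False)
    segments = []
    for _ in range(depth):               # step 6: repeat steps 3-5 up to depth
        # step 3: the fetchlist is every known URL that is not fetched yet
        fetchlist = sorted(u for u, done in webdb.items() if not done)
        if not fetchlist:
            break
        # step 4: "fetch" each page; the segment stores what was retrieved
        segment = {url: TOY_WEB.get(url, []) for url in fetchlist}
        segments.append(segment)
        # step 5: mark pages as fetched and add newly discovered links
        for url, outlinks in segment.items():
            webdb[url] = True
            for link in outlinks:
                webdb.setdefault(link, False)
    return webdb, segments


webdb, segments = crawl(["http://a.example/"], depth=2)
print(len(segments), sorted(webdb))
```
</PRE>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Each pass through the loop produces one segment, so the crawl depth bounds the number of segments that the later indexing steps have to process.</FONT></SPAN></P>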
<DIV style="BORDER-BOTTOM: #dedfe1 1pt solid; BORDER-LEFT: medium none; PADDING-BOTTOM: 5pt; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BACKGROUND: white; BORDER-TOP: medium none; BORDER-RIGHT: medium none; PADDING-TOP: 0cm; mso-border-bottom-alt: solid #DEDFE1 .75pt; mso-element: para-border-div">
<SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><A name=6></A><SPAN class=headline-content2><SPAN style="LETTER-SPACING: 0.4pt; FONT-SIZE: 13.5pt" lang=EN-US><FONT face=宋体>Detailed Crawler Workflow</FONT></SPAN></SPAN></SPAN></SPAN></SPAN></DIV>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>After a WebDB has been created (step 1), the "generate/fetch/update" cycle (steps 3-6) is started from a set of seed URLs. Once this cycle has finished completely, the Crawler builds indexes from the segments generated during the crawl (steps 7-10). Each segment is indexed independently (step 8) before duplicate URLs are removed (step 9). Finally, the independent segment indexes are merged into a single final index (step 10).</FONT></SPAN></SPAN></SPAN></SPAN></P>
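<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The "index each segment independently, then merge" idea of steps 8 and 10 can be illustrated with a plain inverted index. This is a minimal sketch, assuming each segment is just a mapping from URL to page text; the sample data and the functions index_segment and merge_indexes are hypothetical and do not reflect how Nutch or Lucene store indexes.</FONT></SPAN></P>
<PRE>
```python
from collections import defaultdict

# Hypothetical segments: each maps a URL to the page text fetched in one round.
segments = [
    {"http://a.example/": "hadoop sorts large data"},
    {"http://b.example/": "nutch crawls pages",
     "http://c.example/": "hadoop runs nutch"},
]


def index_segment(segment):
    """Step 8: build an inverted index (term to set of URLs) for one segment."""
    index = defaultdict(set)
    for url, text in segment.items():
        for term in text.split():
            index[term].add(url)
    return index


def merge_indexes(indexes):
    """Step 10: union the per-segment postings into one final index."""
    merged = defaultdict(set)
    for index in indexes:
        for term, urls in index.items():
            merged[term] |= urls
    return merged


# Each segment is indexed on its own; only the merge sees all of them.
final_index = merge_indexes(index_segment(s) for s in segments)
print(sorted(final_index["hadoop"]))
```
</PRE>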
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>One detail deserves attention. The Dedup operation removes duplicate URLs from the segment indexes. WebDB itself does not allow duplicate URLs, so why is this cleanup still needed? The reason is re-crawling. Suppose you crawled a set of pages a month ago and crawl them again a month later to refresh them. Until the old segment is deleted it remains in effect, so the same URL appears in both the old and the new segment, and duplicates have to be removed across the two.</FONT></SPAN></SPAN></SPAN></SPAN></P>
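<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The cross-segment deduplication described above can be sketched roughly as follows. This is an illustration only, not Nutch's actual code: the IndexEntry type, its fields, the dedup_by_url function and the rule "the most recently fetched copy wins" are all assumptions made for the example.</FONT></SPAN></P>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One indexed page. All fields are hypothetical, for illustration only."""
    url: str
    segment: str     # name of the segment the entry came from
    fetch_time: int  # assumed fetch timestamp; larger means newer


def dedup_by_url(entries):
    """Keep, for every URL, only the entry with the latest fetch time."""
    newest = {}
    for entry in entries:
        kept = newest.get(entry.url)
        # Assumption: a newer fetch replaces an older one for the same URL.
        if kept is None or entry.fetch_time > kept.fetch_time:
            newest[entry.url] = entry
    # Sort by URL so the result is deterministic.
    return sorted(newest.values(), key=lambda e: e.url)


if __name__ == "__main__":
    old_segment = [
        IndexEntry("http://a.example/", "seg-old", 1),
        IndexEntry("http://b.example/", "seg-old", 1),
    ]
    new_segment = [IndexEntry("http://a.example/", "seg-new", 2)]
    for entry in dedup_by_url(old_segment + new_segment):
        print(entry.url, entry.segment)
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>In this sketch the page that was crawled twice survives only in its newer copy, while a page present only in the old segment is kept.</FONT></SPAN></P>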
<P style="mso-line-height-alt: 15.6pt"><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆">4. <SPAN lang=EN-US>Nutch</SPAN> and <SPAN lang=EN-US>Lucene </SPAN></SPAN><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>Nutch is built on Lucene. Lucene provides Nutch with its text indexing and search API.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>A common question is: should I use Lucene or Nutch?</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>The simplest answer: if you do not need to crawl data, use Lucene.</FONT></SPAN></SPAN></SPAN></SPAN></P>
<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>A typical case: you already have a data source and need to provide a search page for it. Here the best approach is to read the data directly from the database and build the index with the Lucene API.</FONT></SPAN></SPAN></SPAN></SPAN></P>
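<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>To make the "read rows from a database, then index them" workflow concrete, here is a minimal sketch. It does not use the Lucene API: a toy in-memory inverted index stands in for Lucene, and the articles table, its columns and the functions load_rows, build_index and search are hypothetical names chosen for the example.</FONT></SPAN></P>

```python
import sqlite3
from collections import defaultdict


def load_rows():
    """Read (id, body) rows from a hypothetical in-memory 'articles' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO articles (id, body) VALUES (?, ?)",
        [
            (1, "Hadoop sorts large data sets"),
            (2, "Lucene indexes text"),
            (3, "Nutch crawls and indexes web pages"),
        ],
    )
    rows = conn.execute("SELECT id, body FROM articles").fetchall()
    conn.close()
    return rows


def build_index(rows):
    """Toy inverted index: lowercase token -> set of row ids containing it."""
    index = defaultdict(set)
    for doc_id, text in rows:
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of rows that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    idx = build_index(load_rows())
    print(sorted(search(idx, "indexes")))
```

<P style="LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white" class=MsoNormal><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>No crawler is involved at any point, which is why a plain indexing library is enough when the data already sits in a database.</FONT></SPAN></P>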
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-pagination: widow-orphan; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-outline-level: 2" class=MsoNormal align=left><SPAN style="mso-bookmark: OLE_LINK1"><SPAN style="mso-bookmark: OLE_LINK2"><SPAN style="mso-bookmark: OLE_LINK3"><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US><FONT face=Calibri>If you have no local data source, or your data is scattered across many sources, use Nutch.</FONT></SPAN></SPAN></SPAN></SPAN></P></DIV> |
|