[Hadoop&HBase] A Large-Scale Data Sorting Algorithm Based on Hadoop --- Han Xuhong (韩旭红) Group, First Report

Posted on 2011-12-23 02:39
<DIV>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋体>A Large-Scale Data Sorting Algorithm Based on <SPAN lang=EN-US>Hadoop</SPAN></FONT></SPAN></B></P>
<P style="TEXT-ALIGN: center" align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt"><FONT size=5 face=宋体>(First Report)</FONT></SPAN></B></P>
<P><FONT face=宋体><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 22pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-SIZE: 15pt" lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><FONT size=4>-------2011.9.11</FONT></SPAN></B></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN></SPAN>Group members:</FONT></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN>Group leader: Han Xuhong (韩旭红)<SPAN lang=EN-US> 1091000161</SPAN></FONT></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;&nbsp;&nbsp;&nbsp;</SPAN></SPAN>Members: Li Wei (李巍)<SPAN lang=EN-US> 1091000167&nbsp;&nbsp; </SPAN></FONT></FONT><FONT size=3><FONT face=宋体>Li Yue (李越)<SPAN lang=EN-US> 1091000169</SPAN></FONT></FONT><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp;</SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN></SPAN>Yan Yue (闫悦)<SPAN lang=EN-US> 1091000178</SPAN></FONT></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆" lang=EN-US><SPAN style="mso-list: Ignore">I.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Introduction</SPAN></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋体><a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" target="_blank"><IMG style="WIDTH: 273px; HEIGHT: 156px" border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835170vdlC.gif" width=373 height=207></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑体; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">1</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT></FONT></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><FONT face=Cambria><SPAN lang=EN-US>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></FONT></FONT><FONT face=宋体><SPAN style="FONT-SIZE: 10.5pt" lang=EN-US>Hadoop is a <A href="http://baike.baidu.com/view/991489.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>distributed system</SPAN></A> infrastructure developed by the Apache Software Foundation. Users can write distributed programs without knowing the low-level details of distribution, and make full use of a cluster's power for high-speed computation and storage. Hadoop implements a <A href="http://baike.baidu.com/view/771589.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>distributed file system</SPAN></A>, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It also provides high-throughput access to <A href="http://baike.baidu.com/view/330120.htm" target=_blank><SPAN style="COLOR: windowtext; FONT-SIZE: 12pt; TEXT-DECORATION: none; text-underline: none" lang=EN-US>application</SPAN></A> data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.</SPAN></FONT></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆" lang=EN-US><SPAN style="mso-list: Ignore">II.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>Hadoop</SPAN><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold"> Architecture<SPAN lang=EN-US></SPAN></SPAN></P>
<P style="PAGE-BREAK-AFTER: avoid; TEXT-ALIGN: left; TEXT-INDENT: 0cm; MARGIN: 0cm 0cm 0pt 53.25pt; mso-pagination: widow-orphan; mso-char-indent-count: 0" class=MsoListParagraph align=left><SPAN style="mso-font-kerning: 0pt; mso-no-proof: yes" lang=EN-US><SPAN style="FONT-SIZE: 22pt"><FONT face=宋体>&nbsp;&nbsp;&nbsp; <a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835558fFHw.jpg"></A></FONT></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑体; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">2</SPAN></SPAN><SPAN lang=EN-US> Hadoop</SPAN></FONT><SPAN style="FONT-FAMILY: 黑体; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin"> architecture</SPAN><SPAN style="FONT-FAMILY: 宋体; FONT-SIZE: 12pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt" lang=EN-US></SPAN></FONT></P>
<P style="MARGIN-LEFT: 53.25pt"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold" lang=EN-US>&nbsp;</SPAN></P>
<P style="TEXT-ALIGN: left; LINE-HEIGHT: 18pt; MARGIN: 0cm 0cm 0pt; BACKGROUND: white; mso-pagination: widow-orphan" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; LETTER-SPACING: 0.4pt; FONT-SIZE: 12pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">  <SPAN lang=EN-US><SPAN style="FONT-FAMILY: 宋体; FONT-SIZE: 12pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"> </SPAN></SPAN></SPAN><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"><SPAN style="FONT-FAMILY: 宋体; FONT-SIZE: 12pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt" lang=EN-US>Hadoop </SPAN><SPAN style="FONT-FAMILY: 宋体; FONT-SIZE: 12pt; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">is made up of many components. At the bottom is <SPAN lang=EN-US>HDFS</SPAN>, which stores files across all the storage nodes in the <SPAN lang=EN-US>Hadoop</SPAN> cluster. The layer above <SPAN lang=EN-US>HDFS</SPAN> is the <SPAN lang=EN-US>MapReduce</SPAN> engine, which consists of <SPAN lang=EN-US>JobTrackers</SPAN> and <SPAN lang=EN-US>TaskTrackers</SPAN>.</SPAN><SPAN style="LETTER-SPACING: 0.4pt" lang=EN-US></SPAN></P>
<P style="TEXT-INDENT: -53.25pt; MARGIN-LEFT: 53.25pt; mso-list: l0 level1 lfo1"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold; mso-bidi-font-family: 幼圆" lang=EN-US><SPAN style="mso-list: Ignore">III.<SPAN style="FONT: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp; </SPAN></SPAN></SPAN><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">Distributed Computing Model<SPAN lang=EN-US></SPAN></SPAN></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"></SPAN></SPAN></FONT></FONT>&nbsp;</P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋体><SPAN lang=EN-US><SPAN style="mso-spacerun: yes"></SPAN></SPAN>A Hadoop cluster typically consists of dozens, or even hundreds or thousands, of low-cost computers. Every job we run has to be distributed across these machines, executed, have its intermediate data sorted, and finally be aggregated. Along the way the framework also handles maintenance and failure conditions such as node discovery, task retries, and replacement of failed nodes.</FONT></FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3><FONT face=宋体>In this sense, Hadoop is a computing model: a distributed computing model.<STRONG><SPAN style="FONT-FAMILY: 宋体; FONT-WEIGHT: normal; mso-bidi-font-family: 宋体" lang=EN-US></SPAN></STRONG></FONT></FONT></P>
<P><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">IV. Large-Scale Data Sorting Algorithm on <SPAN lang=EN-US>Hadoop</SPAN><SPAN lang=EN-US></SPAN></SPAN></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>The most intuitive way to sort a large amount of data with Hadoop is to feed the entire contents of the input files to map, have map pass every record through unchanged to a single reduce, let Hadoop's own shuffle mechanism sort all the data, and then have reduce write it straight out.</FONT></P>
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>However, this is no different from sorting on a single machine: with only one reduce, all the data ends up on one node, and none of the benefits of multi-machine distributed computing are used. This approach is therefore not viable.</FONT></P>
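<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>To exploit Hadoop's divide-and-conquer computing model, we can borrow the idea behind quicksort. Let us briefly recall it first. The basic step of quicksort is to pick one element from the data as the pivot, then place the elements greater than the pivot on one side and the elements less than the pivot on the other.</FONT></P>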
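<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>As a reminder of that step, here is a minimal Python sketch of partitioning around a pivot and of a quicksort built on it. The function names and the choice of the middle element as the pivot are assumptions made for illustration; they are not taken from the report.</FONT></P>

```python
def partition_around_pivot(values, pivot):
    """Split values into those less than, equal to, and greater than pivot."""
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return smaller, equal, larger


def quicksort(values):
    """Recursive quicksort built on the partition step above."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]  # middle element: an arbitrary illustrative choice
    smaller, equal, larger = partition_around_pivot(values, pivot)
    return quicksort(smaller) + equal + quicksort(larger)
```

<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>The point to carry forward is that once the data is split around a pivot, the two sides can be sorted independently of each other.</FONT></P>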
<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>Now suppose we have N pivots (we will call them rulers here). They divide all the data into N+1 parts. We hand these N+1 parts to the reduces, one part per reduce, and Hadoop sorts each part automatically, producing N+1 internally sorted files. Concatenating these N+1 files end to end, in interval order, yields a single sorted file, and we are done.</FONT></P>
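<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>The idea of N rulers producing N+1 parts can be sketched in a few lines of single-process Python. This is only an illustration: the sample data, the ruler values and the helper names are made up, and sorted() stands in for the sorting that Hadoop performs for each reduce.</FONT></P>

```python
from bisect import bisect_right


def split_by_rulers(records, rulers):
    """Distribute records into len(rulers) + 1 parts using sorted rulers."""
    parts = [[] for _ in range(len(rulers) + 1)]
    for record in records:
        # bisect_right counts the rulers <= record, which is the part index
        parts[bisect_right(rulers, record)].append(record)
    return parts


def sort_and_concatenate(parts):
    """Sort each part independently, then join the parts in interval order."""
    result = []
    for part in parts:
        result.extend(sorted(part))
    return result


data = [42, 7, 19, 88, 3, 56, 71, 25, 64, 10]
rulers = [20, 60]  # N = 2 rulers give N + 1 = 3 parts
parts = split_by_rulers(data, rulers)
merged = sort_and_concatenate(parts)
```

<P style="TEXT-INDENT: 24pt; mso-char-indent-count: 2.0"><FONT size=3 face=宋体>Because every record in part i is no larger than any record in part i+1, sorting the parts separately and joining them in order gives the same result as sorting everything at once.</FONT></P>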
<P><FONT size=3 face=宋体>From this we can summarize the steps for sorting a large amount of data with Hadoop:</FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>1</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sample the data to be sorted;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>2</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Sort the sample and derive the rulers from it;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>3</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Map determines, for each input record, which two rulers it falls between, and sends the record to the reduce whose ID corresponds to that interval;</FONT></FONT></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>4</SPAN>)<SPAN lang=EN-US>&nbsp; </SPAN>Reduce outputs the data it receives directly.</FONT></FONT></P>
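<P><FONT size=3 face=宋体>The four steps above can be simulated end to end in one Python process. This is a sketch under stated assumptions, not the group's Hadoop job: the function names, the sample size, the fixed random seed and the number of reducers are all hypothetical, and sorted() stands in for the shuffle-phase sort that Hadoop performs before each reduce.</FONT></P>

```python
import random
from bisect import bisect_right


def sample_records(records, sample_size, seed=0):
    """Step 1: draw a random sample of the records to be sorted."""
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))


def make_rulers(sample, num_reducers):
    """Step 2: sort the sample and pick num_reducers - 1 evenly spaced rulers."""
    ordered = sorted(sample)
    if not ordered:
        return []
    return [ordered[len(ordered) * i // num_reducers] for i in range(1, num_reducers)]


def map_phase(records, rulers):
    """Step 3: tag each record with the ID of the interval it falls into."""
    for record in records:
        yield bisect_right(rulers, record), record


def run_sort(records, num_reducers=4, sample_size=100):
    """Run all four steps and return the concatenated reducer outputs."""
    rulers = make_rulers(sample_records(records, sample_size), num_reducers)
    buckets = {reducer_id: [] for reducer_id in range(num_reducers)}
    for reducer_id, record in map_phase(records, rulers):
        buckets[reducer_id].append(record)
    # sorted() stands in for the framework's shuffle sort;
    # step 4: each reducer then emits its records unchanged.
    outputs = [sorted(buckets[reducer_id]) for reducer_id in range(num_reducers)]
    return [record for output in outputs for record in output]
```

<P><FONT size=3 face=宋体>Calling run_sort on a list of comparable records returns them in sorted order, whatever rulers the sample happens to produce; the rulers only affect how evenly the work is spread over the reducers.</FONT></P>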
<P><FONT size=3 face=宋体>Here we use sorting a set of URLs as an example:</FONT></P>
<P style="PAGE-BREAK-AFTER: avoid"><SPAN lang=EN-US><FONT size=3 face=宋体>&nbsp;</FONT></P>
<P style="PAGE-BREAK-AFTER: avoid" align=center><SPAN style="COLOR: blue; TEXT-DECORATION: none; mso-no-proof: yes; text-underline: none"><SPAN style="FONT-SIZE: 22pt"><FONT face=宋体><a href="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg" target="_blank"><IMG border=0 src="http://blog.chinaunix.net/attachment/201109/12/24677087_1315835600685U.jpg"></A></FONT></SPAN></SPAN></SPAN></P>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt" class=MsoCaption align=center><FONT size=2><SPAN style="FONT-FAMILY: 黑体; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin">Figure</SPAN><FONT face=Cambria> <SPAN lang=EN-US><SPAN style="mso-no-proof: yes">3</SPAN></SPAN><SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN><SPAN style="mso-spacerun: yes">&nbsp;</SPAN>URL</SPAN></FONT><SPAN style="FONT-FAMILY: 黑体; mso-ascii-font-family: Cambria; mso-ascii-theme-font: major-latin; mso-hansi-font-family: Cambria; mso-hansi-theme-font: major-latin"> sorting</SPAN></FONT></P>
<P><FONT size=3 face=宋体>One small issue remains: how do we send a record to the reduce with a specific ID? Hadoop provides several partitioning algorithms. They use the key of each map output record to decide which reduce the record should go to (the sorting on the reduce side also depends on the key). So to send a record to a particular reduce, we only need to emit a suitable key together with the record (in the example above, the reduce ID + URL), and the record will end up where it belongs.</FONT></P>
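<P><FONT size=3 face=宋体>The routing described above can be mimicked in Python as follows. In Hadoop itself this role is played by a Partitioner subclass whose getPartition method maps a key to a reduce index; the sketch below only mirrors that idea and is not Hadoop code. The example URLs, the ruler strings and the function names are hypothetical.</FONT></P>

```python
from bisect import bisect_right


def make_key(url, rulers):
    """Build the map output key: reducer ID first, then the URL itself."""
    reducer_id = bisect_right(rulers, url)
    return (reducer_id, url)


def get_partition(key, num_reducers):
    """Route a record using the reducer ID carried in its key."""
    reducer_id, _url = key
    return reducer_id % num_reducers


urls = [
    "http://c.example/x",
    "http://a.example/y",
    "http://m.example/z",
    "http://t.example/w",
]
rulers = ["http://h", "http://p"]  # hypothetical rulers giving 3 intervals
keys = [make_key(url, rulers) for url in urls]
assignments = {key[1]: get_partition(key, 3) for key in keys}
```

<P><FONT size=3 face=宋体>Because the reducer ID comes first in the key and the intervals are ordered, sorting the composite keys gives the same order as sorting the URLs themselves.</FONT></P>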
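<P><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">V. Notes<SPAN lang=EN-US></SPAN></SPAN></P>
<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>1) </SPAN>The rulers should divide the data as evenly as possible, so that each reduce receives a similar amount of work. This matches the emphasis that many quicksort variants place on pivot selection.</FONT></FONT></P>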
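<P><FONT size=3 face=宋体>To see why evenness matters, the following sketch compares the part sizes produced by rulers taken at evenly spaced positions of a sorted sample with those produced by badly chosen rulers. The data set, the every-tenth-record sample and the helper names are assumptions made for illustration.</FONT></P>

```python
from bisect import bisect_right
from collections import Counter


def part_sizes(records, rulers):
    """Count how many records fall into each of the len(rulers) + 1 parts."""
    counts = Counter(bisect_right(rulers, r) for r in records)
    return [counts.get(i, 0) for i in range(len(rulers) + 1)]


def quantile_rulers(sample, num_parts):
    """Pick rulers at evenly spaced positions of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_parts] for i in range(1, num_parts)]


records = list(range(1000))
sample = records[::10]  # hypothetical sample: every tenth record
even = part_sizes(records, quantile_rulers(sample, 4))
skewed = part_sizes(records, [10, 20, 30])  # badly chosen rulers
```

<P><FONT size=3 face=宋体>With the evenly spaced rulers each of the four parts holds 250 records, while the badly chosen rulers leave 970 of the 1000 records in the last part, so one reduce would do almost all of the sorting.</FONT></P>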
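<P><FONT size=3><FONT face=宋体><SPAN lang=EN-US>2) </SPAN>HDFS is a file system whose read and write performance are highly asymmetric. We should take advantage of its strong read performance as much as possible and reduce reliance on file writes and shuffle operations. For example, when the way the data is processed depends on statistics of that data, splitting the statistics and the processing into two rounds of map-reduce is much faster than merging the statistics and processing the data inside a single reduce.</FONT></FONT></P>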
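<P><FONT size=3 face=宋体>The two-round structure can be sketched as follows. This shows only the shape of the computation, not its performance: the statistic (a mean), the processing step (subtracting the mean) and the function names are hypothetical choices made for illustration.</FONT></P>

```python
def round_one_statistics(splits):
    """First job: each map emits (count, total) for its split; reduce merges them."""
    partials = [(len(split), sum(split)) for split in splits]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count


def round_two_process(splits, mean):
    """Second job: every map reads the small statistic and processes its own split."""
    return [[value - mean for value in split] for split in splits]


splits = [[1, 2, 3], [4, 5], [6]]  # stand-ins for input splits
mean = round_one_statistics(splits)
centered = round_two_process(splits, mean)
```

<P><FONT size=3 face=宋体>Only the small per-split summaries pass through the reduce in the first round; in the second round the bulk data is read again and processed by the maps without being funnelled through a single reduce.</FONT></P>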
<P><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt; mso-bidi-font-weight: bold">VI. Summary</SPAN><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 幼圆; FONT-SIZE: 18pt" lang=EN-US></SPAN></B></P>
<P style="TEXT-INDENT: 21pt; mso-char-indent-count: 2.0"><SPAN class=apple-style-span><SPAN style="FONT-FAMILY: 'Simsun','serif'; COLOR: black; FONT-SIZE: 10.5pt" lang=EN-US>Hadoop is in effect a data-driven computing model. By combining MapReduce with HDFS, it runs tasks on the compute nodes where the data is stored, making full use of the nodes' storage and compute resources while greatly reducing the cost of transferring data over the network.</SPAN></SPAN></P></DIV>