[Hadoop&HBase] Hadoop-Based Large-Scale Data Sorting Algorithms - Wan Hu's Group - Second Report

Posted 2011-12-23 02:39
<DIV>
<P style="TEXT-ALIGN: center; TEXT-INDENT: 49.2pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=center><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; FONT-SIZE: 14pt; mso-font-kerning: 0pt" lang=EN-US>A Closer Look at Cloud Computing</SPAN></B></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B><SPAN style="FONT-FAMILY: 宋体; mso-hansi-font-family: Calibri; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-hansi-theme-font: minor-latin; mso-bidi-font-size: 10.5pt">Group leader:</SPAN></B><SPAN style="FONT-FAMILY: 宋体; mso-hansi-font-family: Calibri; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-hansi-theme-font: minor-latin; mso-bidi-font-size: 10.5pt"> Wan Hu (万虎)</SPAN><SPAN style="mso-bidi-font-size: 10.5pt" lang=EN-US><BR></SPAN><B><SPAN style="FONT-FAMILY: 宋体; mso-hansi-font-family: Calibri; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-hansi-theme-font: minor-latin; mso-bidi-font-size: 10.5pt">Members:</SPAN></B><SPAN style="FONT-FAMILY: 宋体; mso-hansi-font-family: Calibri; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-hansi-theme-font: minor-latin; mso-bidi-font-size: 10.5pt"> Wan Hu (万虎), Niu Qingya (牛庆亚), Song Simeng (宋思梦), Wen Tao (文滔), Hu Haishen (胡海砷)</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 12pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt" lang=EN-US></SPAN>&nbsp;</P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 12pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt">I.<SPAN lang=EN-US><SPAN style="mso-spacerun: yes">&nbsp; </SPAN></SPAN>Service Layers of the Cloud Computing Architecture</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 12pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt"><SPAN lang=EN-US></SPAN></SPAN>&nbsp;</P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The first Internet revolution established the three-tier (or n-tier) model as the general architecture, but the use of virtualization in the cloud has created a new set of layers: applications, services, and infrastructure. These layers do more than encapsulate on-demand resources; they also define a new application development model. Each layer of abstraction also opens up many business opportunities for defining services that are offered on a pay-per-use basis.</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt" lang=EN-US>1. Software as a Service </SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'SunSans-Demi','sans-serif'; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: SunSans-Demi; mso-font-kerning: 0pt; mso-fareast-font-family: 黑体" lang=EN-US>(SaaS)</SPAN></B></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>SaaS is the top layer. It features a complete application offered as an on-demand service through multitenancy, which means that a single instance of the software runs on the provider's infrastructure and serves multiple client organizations.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The best-known SaaS example is Salesforce.com, but there are now many others, including Google Apps, which offers basic business services such as e-mail. Salesforce.com's multitenant application predates the definition of cloud computing by several years. Like many other cloud computing providers, Salesforce.com now operates at more than one cloud layer: its Force.com release is a companion application development environment, that is, a platform as a service.</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt" lang=EN-US>2. Platform as a Service </SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'SunSans-Demi','sans-serif'; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: SunSans-Demi; mso-font-kerning: 0pt; mso-fareast-font-family: 黑体" lang=EN-US>(PaaS)</SPAN></B></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The middle layer, PaaS, encapsulates a development environment abstraction and packages a payload of services. The archetypal payload is a Xen image (part of Amazon Web Services) containing a basic Web stack: for example, a Linux distribution, a Web server, and a programming environment such as Perl or Ruby. PaaS offerings can cover every phase of software development and testing, or they can specialize in one area, such as content management. Commercial examples include Google App Engine, which serves applications on Google's infrastructure. PaaS services like these can offer a great deal of flexibility, but they may be constrained by the capabilities the provider makes available.</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 黑体; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt" lang=EN-US>3. Infrastructure as a Service </SPAN></B><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'SunSans-Demi','sans-serif'; COLOR: #231f20; FONT-SIZE: 11pt; mso-bidi-font-family: SunSans-Demi; mso-font-kerning: 0pt; mso-fareast-font-family: 黑体" lang=EN-US>(IaaS)</SPAN></B></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>Infrastructure as a service (IaaS) sits at the lowest layer. It is a means of delivering basic storage and compute capabilities as standardized services over the network. Servers, storage systems, switches, routers, and other systems work together (through virtualization technology, for example) to handle specific types of workloads, ranging from batch processing to server and storage expansion during peak loads.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The best-known commercial example is Amazon Web Services (AWS), whose EC2 and S3 services provide basic compute and storage, respectively. Another example is Joyent, whose main product is a line of virtualized servers that provide a highly scalable, on-demand infrastructure for running Web sites, including rich Web applications written in Ruby on Rails, PHP, Python, and Java.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US></SPAN>&nbsp;</P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 22pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 黑体; COLOR: #414142; FONT-SIZE: 11pt; mso-bidi-font-family: 黑体; mso-font-kerning: 0pt" lang=EN-US></SPAN>&nbsp;</P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; FONT-SIZE: 12pt; mso-font-kerning: 0pt" lang=EN-US>II.<SPAN style="mso-spacerun: yes">&nbsp; </SPAN>Hadoop: One Implementation of Cloud Computing</SPAN></B></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><B style="mso-bidi-font-weight: normal"><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; FONT-SIZE: 12pt; mso-font-kerning: 0pt" lang=EN-US></SPAN></B>&nbsp;</P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 26.25pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>Hadoop is an important implementation of cloud computing, so this section introduces it briefly.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 26.25pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The two core designs of the Hadoop framework are MapReduce and HDFS. The idea behind MapReduce became widely known through a paper published by Google. In one sentence, MapReduce is "the decomposition of a task and the aggregation of its results." HDFS stands for Hadoop Distributed File System; it provides the underlying storage support for distributed computation.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 36.75pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The name MapReduce largely explains itself. It combines two verbs: Map (spread out) breaks one job into many tasks, and Reduce gathers the results of those tasks into the final result. The idea is not new; it can already be seen in multithreaded and multitasking designs.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 36.75pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>In everyday life as in programming, a piece of work can usually be split into several tasks, and the relationship between tasks falls into one of two kinds. Unrelated tasks can run in parallel. Tasks that depend on one another must run in a fixed order and cannot be processed in parallel. The critical-path analysis taught in university courses is simply a search for the task breakdown and schedule that takes the least time.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 36.75pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>In a distributed system, a cluster of machines can be treated as a pool of hardware resources. Splitting the parallelizable tasks and handing each one to an idle machine greatly improves computing efficiency. Because the tasks do not depend on any particular machine, the design also scales well as the cluster grows. Once the tasks have been processed, their results must be combined again, and that is the job of Reduce.</SPAN></P>
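<P style="TEXT-ALIGN: left; TEXT-INDENT: 36.75pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The division of labor between independent map tasks and a final reduce step can be sketched in a few lines of Python. This is an illustrative word count, not Hadoop code: the names map_task, group_by_key, reduce_task, and word_count are made up for the example, and a local thread pool stands in for the machines of a cluster.</SPAN></P>

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map: turn one independent piece of the input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def group_by_key(mapped_outputs):
    """Collect the values emitted for the same key across all map tasks."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: summarize every value collected for one key."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Map tasks do not depend on each other, so they may run in parallel.
    # Assumption: a thread pool plays the role of the idle cluster machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, chunks))
    grouped = group_by_key(mapped)
    # Reduce gathers the partial results into the final answer.
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop map reduce", "map tasks run in parallel"]
    print(word_count(sample))
```

<P style="TEXT-ALIGN: left; TEXT-INDENT: 36.75pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 3.5; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>Calling word_count(["a b a", "b c"]) counts a and b twice each and c once. The map tasks share no state, so adding workers changes only how fast the pairs are produced, not the result.</SPAN></P>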
<DIV>&nbsp;[Figure]</DIV>
<P style="TEXT-ALIGN: center; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=center><B><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>MapReduce structure diagram</SPAN></B></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 宋体; COLOR: #333333; mso-bidi-font-family: 宋体; mso-hansi-font-family: Tahoma; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US></SPAN>&nbsp;</P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>The figure above shows the rough structure of MapReduce. Before Map, the input data may go through a Split step so that the tasks can run in parallel efficiently. After Map there is a Shuffle step, which does much to make Reduce more efficient and to ease the pressure of data transfer. The details of these parts are covered later.</SPAN></P>
<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>HDFS is the storage foundation for distributed computation, and it shares many traits with other distributed file systems. The basic characteristics of a distributed file system are:</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>1. The entire cluster has a single namespace.</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>2. Data consistency. The file system suits a write-once, read-many model, and a client cannot see a file until it has been created successfully.</SPAN></P>
<P style="TEXT-ALIGN: left; MARGIN: 0cm 0cm 0pt; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>3. A file is split into multiple blocks, and each block is assigned to data nodes for storage. Depending on the configuration, blocks are also replicated to keep the data safe.</SPAN></P></DIV>
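<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>Point 3 can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the helper names split_into_blocks and place_replicas are hypothetical, the block size is tiny so that the result is easy to read, and replicas are placed round-robin, whereas HDFS chooses data nodes with its own placement policy.</SPAN></P>

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks; the last one may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign every block to `replication` distinct data nodes.

    Assumption: plain round-robin placement, chosen only for illustration.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive offsets modulo the node count never repeat a node
        # as long as replication <= len(data_nodes).
        placement[block_id] = [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    print(blocks)
    print(place_replicas(len(blocks), nodes, replication=2))
```

<P style="TEXT-ALIGN: left; TEXT-INDENT: 21pt; MARGIN: 0cm 0cm 0pt; mso-char-indent-count: 2.0; mso-layout-grid-align: none" class=MsoNormal align=left><SPAN style="FONT-FAMILY: 'Tahoma','sans-serif'; COLOR: #333333; mso-font-kerning: 0pt; mso-bidi-font-size: 10.5pt" lang=EN-US>Because every block lives on more than one node, losing a single data node leaves at least one copy of each block readable, which is the safety property the list describes.</SPAN></P>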