Chinaunix

[Hadoop&HBase] About Hadoop's backup tasks

Posted on 2011-12-23 02:32
<div><span class="Apple-style-span" style="font-family: 'Lucida Grande', 'Lucida Sans Unicode', Verdana, sans-serif; font-size: 12px; line-height: 24px; background-color: rgb(247, 243, 237); "><a href="http://dongxicheng.org/mapreduce/hadoop-schedulers/" target="_blank">http://dongxicheng.org/mapreduce/hadoop-schedulers/</a></span></div><span class="Apple-style-span" style="font-family: 'Lucida Grande', 'Lucida Sans Unicode', Verdana, sans-serif; font-size: 12px; line-height: 24px; background-color: rgb(247, 243, 237); ">The existing Hadoop schedulers have a significant flaw, mainly in the algorithm that detects straggler tasks: if a task's progress falls 20% behind that of other tasks of the same type, the task is treated as a straggler (such tasks determine the job's completion time, so their execution time should be shortened as much as possible), and a backup task (speculative task) is launched for it. If the cluster is heterogeneous, the execution time of the same task can vary widely, even on the same node, so a heterogeneous cluster can easily produce a large number of backup tasks.</span><div><font class="Apple-style-span" face="'Lucida Grande', 'Lucida Sans Unicode', Verdana, sans-serif"><span class="Apple-style-span" style="font-size: 12px; line-height: 24px;"><a href="http://developer.yahoo.com/hadoop/tutorial/module4.html" target="_blank">http://developer.yahoo.com/hadoop/tutorial/module4.html</a><br></span></font><span class="Apple-style-span" style="color: rgb(51, 51, 51); font-family: arial, helvetica, clean, sans-serif; font-size: 13px; line-height: 16px; background-color: rgb(255, 255, 255); "><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 1em; line-height: 1.49em; "><b>Speculative execution:</b>&nbsp;One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example, if one node has a slow disk controller, then it may be reading its input at only 10% of the speed of all the other nodes. 
So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 1em; line-height: 1.49em; ">By forcing tasks to run in isolation from one another, individual tasks do not know&nbsp;<i>where</i>&nbsp;their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed&nbsp;<i>multiple times in parallel</i>, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as&nbsp;<i>speculative execution</i>. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully first.</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 1em; line-height: 1.49em; ">Speculative execution is enabled by default. 
You can disable speculative execution for the mappers and reducers by setting the&nbsp;mapred.map.tasks.speculative.execution&nbsp;and&nbsp;mapred.reduce.tasks.speculative.execution&nbsp;JobConf options to&nbsp;false, respectively.</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 1em; line-height: 1.49em; ">To see how many backup tasks a job launched, check :50030/jobtracker.jsp; for the most part, the killed tasks are the backup tasks.</p><p style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 1em; line-height: 1.49em; "><a href="http://blog.chinaunix.net/attachment/201110/21/1838361_1319176214fDDq.png" target="_blank"><img src="http://blog.chinaunix.net/attachment/201110/21/1838361_1319176214fDDq.png" border="0"></a></p></span></div>
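
To make the 20% straggler rule quoted at the top of the post concrete, here is a minimal Java sketch. It is not Hadoop's scheduler code: the class and method names (StragglerSketch, findStragglers) are hypothetical, and it assumes that "20% behind" means a task's progress score (0.0 to 1.0) trails the mean progress of same-type tasks by more than 0.20.

```java
import java.util.ArrayList;
import java.util.List;

public class StragglerSketch {

    /**
     * Returns the indices of tasks whose progress (0.0 to 1.0) trails the
     * mean progress of all tasks in the array by more than `gap`.
     * Assumption: every entry describes a task of the same type
     * (all maps or all reduces) belonging to one job.
     */
    public static List<Integer> findStragglers(double[] progress, double gap) {
        List<Integer> stragglers = new ArrayList<>();
        if (progress.length == 0) {
            return stragglers;
        }
        double sum = 0.0;
        for (double p : progress) {
            sum += p;
        }
        double mean = sum / progress.length;
        for (int i = 0; i < progress.length; i++) {
            // A task is a candidate for a backup copy when it lags the mean by more than the gap.
            if (mean - progress[i] > gap) {
                stragglers.add(i);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Hypothetical progress scores for four map tasks; task 2 is on a slow node.
        double[] mapProgress = {0.90, 0.95, 0.30, 0.85};
        System.out.println(findStragglers(mapProgress, 0.20)); // prints [2]
    }
}
```

With a fixed gap like this, any node that is merely slower than its peers pushes its tasks past the threshold, which is consistent with the post's point that heterogeneous clusters tend to launch many backup tasks.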