[Hadoop&HBase] Getting Started with Hadoop Programming

Posted on 2011-12-19 13:54
<div>1. Running a simple word count program</div>
<div>First, prepare two text files by running the following commands in a shell:</div>
<div>echo "hello hadoop word count"&gt;/tmp/test_file1.txt</div>
<div>echo "hello hadoop,I'm a vegetable bird"&gt;/tmp/test_file2.txt</div>
<div>Copy the two files into DFS with the following commands:</div>
<div>bin/hadoop dfs -mkdir test-in&nbsp;&nbsp;&nbsp;&nbsp; (create the directory test-in)</div>
<div>bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in&nbsp;&nbsp;&nbsp; (copy both files into test-in)</div>
<div>bin/hadoop dfs -ls test-in&nbsp;&nbsp;&nbsp;&nbsp; (check that the copy succeeded) The listing looks like this:</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>Found 2 items</li>
<li>-rw-r--r-- 1 hadoop supergroup 24 2011-01-21 18:40 /user/hadoop/test-in/test_file1.txt</li>
<li>-rw-r--r-- 1 hadoop supergroup 34 2011-01-21 18:40 /user/hadoop/test-in/test_file2.txt</li></ol></div></div>
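<div>The sizes in the listing (24 and 34 bytes) can be cross-checked locally. This is a minimal sketch, assuming that echo appends exactly one trailing newline and that the text is plain ASCII; the helper name echo_size is hypothetical.</div>

```python
def echo_size(text: str) -> int:
    """Bytes written by `echo "text" > file`: the text plus one newline."""
    return len((text + "\n").encode("utf-8"))


# The two strings are the ones written to /tmp by the echo commands above.
sizes = {
    "test_file1.txt": echo_size("hello hadoop word count"),
    "test_file2.txt": echo_size("hello hadoop,I'm a vegetable bird"),
}
print(sizes)
```

<div>If the sizes reported by -ls differ from these values, the files were probably created with different content or line endings.</div>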
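<div>Note: test-in here is a directory in HDFS, not on the local disk. A relative path is resolved against the user's HDFS home directory, so its absolute path is "hdfs://localhost:9000/user/hadoop/test-in".</div>
<div>Run the example with the following command:</div>
<div>bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount test-in test-out&nbsp; (writes the results to test-out) The console shows:</div>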
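<div>The path resolution described in the note can be sketched as follows. This is a simplified model, not Hadoop's own code: the function qualify is hypothetical, and the default user "hadoop" and the filesystem URI "hdfs://localhost:9000" are assumptions taken from the listing and the note above.</div>

```python
import posixpath


def qualify(path: str, user: str = "hadoop",
            default_fs: str = "hdfs://localhost:9000") -> str:
    """Simplified model: resolve a relative HDFS path against /user/<user>."""
    if not path.startswith("/"):
        # Relative paths are interpreted relative to the user's home directory.
        path = posixpath.join("/user", user, path)
    return default_fs + posixpath.normpath(path)


print(qualify("test-in"))
print(qualify("/tmp/data"))
```

<div>An absolute path such as /tmp/data is left unchanged apart from the filesystem prefix.</div>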
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>11/01/21 18:50:16 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000</li>
<li>11/01/21 18:50:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id</li>
<li>11/01/21 18:50:17 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.</li>
<li>11/01/21 18:50:17 INFO input.FileInputFormat: Total input paths to process : 2</li>
<li>11/01/21 18:50:17 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps</li>
<li>11/01/21 18:50:17 INFO mapreduce.JobSubmitter: number of splits:2</li>
<li>11/01/21 18:50:18 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null</li>
<li>11/01/21 18:50:18 INFO mapreduce.Job: Running job: job_201101211705_0001</li>
<li>11/01/21 18:50:19 INFO mapreduce.Job: map 0% reduce 0%</li>
<li>11/01/21 18:50:35 INFO mapreduce.Job: map 100% reduce 0%</li>
<li>11/01/21 18:50:44 INFO mapreduce.Job: map 100% reduce 100%</li>
<li>11/01/21 18:50:47 INFO mapreduce.Job: Job complete: job_201101211705_0001</li>
<li>11/01/21 18:50:47 INFO mapreduce.Job: Counters: 33</li>
<li>FileInputFormatCounters</li>
<li>BYTES_READ=58</li>
<li>FileSystemCounters</li>
<li>FILE_BYTES_READ=118</li>
<li>FILE_BYTES_WRITTEN=306</li>
<li>HDFS_BYTES_READ=300</li>
<li>HDFS_BYTES_WRITTEN=68</li>
<li>Shuffle Errors</li>
<li>BAD_ID=0</li>
<li>CONNECTION=0</li>
<li>IO_ERROR=0</li>
<li>WRONG_LENGTH=0</li>
<li>WRONG_MAP=0</li>
<li>WRONG_REDUCE=0</li>
<li>Job Counters </li>
<li>Data-local map tasks=2</li>
<li>Total time spent by all maps waiting after reserving slots (ms)=0</li>
<li>Total time spent by all reduces waiting after reserving slots (ms)=0</li>
<li>SLOTS_MILLIS_MAPS=22290</li>
<li>SLOTS_MILLIS_REDUCES=6539</li>
<li>Launched map tasks=2</li>
<li>Launched reduce tasks=1</li>
<li>Map-Reduce Framework</li>
<li>Combine input records=9</li>
<li>Combine output records=9</li>
<li>Failed Shuffles=0</li>
<li>GC time elapsed (ms)=642</li>
<li>Map input records=2</li>
<li>Map output bytes=94</li>
<li>Map output records=9</li>
<li>Merged Map outputs=2</li>
<li>Reduce input groups=8</li>
<li>Reduce input records=9</li>
<li>Reduce output records=8</li>
<li>Reduce shuffle bytes=124</li>
<li>Shuffled Maps =2</li>
<li>Spilled Records=18</li>
<li>SPLIT_RAW_BYTES=242</li></ol></div></div>
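<div>Several of the counters above can be reproduced by hand from the two input files. This is a rough sketch, assuming that the example splits lines on whitespace and that each map output record is stored as a one-byte length prefix, the UTF-8 bytes of the word, and a four-byte integer count; that record layout is an assumption that happens to match the reported value, not something the log states.</div>

```python
# Contents of the two input files, including the newline added by echo.
files = {
    "test_file1.txt": "hello hadoop word count\n",
    "test_file2.txt": "hello hadoop,I'm a vegetable bird\n",
}

# BYTES_READ: total size of the input files.
bytes_read = sum(len(text.encode("utf-8")) for text in files.values())

# Map input records: one record per input line.
map_input_records = sum(len(text.splitlines()) for text in files.values())

# Map output records: one (word, 1) pair per whitespace-separated token.
tokens = [tok for text in files.values() for tok in text.split()]
map_output_records = len(tokens)

# Map output bytes under the assumed layout: 1 + len(word) + 4 per record.
map_output_bytes = sum(1 + len(tok.encode("utf-8")) + 4 for tok in tokens)

# Combine output records: distinct words per map task (one task per file).
combine_output_records = sum(len(set(text.split())) for text in files.values())

# Reduce input groups: distinct words across both files.
reduce_input_groups = len(set(tokens))

print(bytes_read, map_input_records, map_output_records,
      map_output_bytes, combine_output_records, reduce_input_groups)
```

<div>The combiner does not shrink anything here because no word repeats within a single file; "hello" only repeats across files, which is why 9 reduce input records collapse into 8 groups.</div>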
<div>Check the job output:</div>
<div>bin/hadoop dfs -ls test-out&nbsp;&nbsp; shows:</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>Found 2 items</li>
<li>-rw-r--r-- 1 hadoop supergroup 0 2011-01-21 18:50 /user/hadoop/test-out/_SUCCESS</li>
<li>-rw-r--r-- 1 hadoop supergroup 68 2011-01-21 18:50 /user/hadoop/test-out/part-r-00000</li></ol></div></div>
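<div>The file names in this listing follow a fixed pattern. This is a minimal sketch, assuming the part-r-NNNNN naming convention (one file per reduce task, with a zero-padded task index); the helper part_file_name is hypothetical and not part of Hadoop.</div>

```python
def part_file_name(task_index: int, kind: str = "r") -> str:
    """Build an output file name such as part-r-00000.

    kind is "r" for reduce output; "m" would denote map-only output.
    """
    return f"part-{kind}-{task_index:05d}"


# The job above launched a single reduce task, so one part file is expected.
print([part_file_name(i) for i in range(1)])
```

<div>The zero-byte _SUCCESS file is a marker written when the job completes successfully; it holds no data.</div>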
<div>&nbsp;View the final counts by running:</div>
<div>bin/hadoop dfs -cat&nbsp; test-out/part-r-00000&nbsp;&nbsp;&nbsp;&nbsp; This prints the result: the number of times each word occurs in the input files</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>a 1</li>
<li>bird 1</li>
<li>count 1</li>
<li>hadoop 1</li>
<li>hadoop,I'm 1</li>
<li>hello 2</li>
<li>vegetable 1</li>
<li>word 1</li></ol></div></div>
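<div>The result above can be reproduced locally without a cluster. This is a minimal single-process sketch of the word count logic, not the code of the bundled example: it assumes whitespace-only tokenization (which is why "hadoop,I'm" is counted as one word, separate from "hadoop"), keys sorted in byte order, and tab-separated output lines.</div>

```python
from collections import Counter

# The two input lines written by the echo commands.
lines = [
    "hello hadoop word count",
    "hello hadoop,I'm a vegetable bird",
]


def word_count(lines):
    """Count whitespace-separated tokens and return (word, count) pairs sorted by word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # "map": emit each token; "reduce": sum per token
    return sorted(counts.items())


result = word_count(lines)

# Serialize one "word<TAB>count" line per key, as in the part file.
output = "".join(f"{word}\t{n}\n" for word, n in result)
print(output, end="")
print(len(output.encode("utf-8")), "bytes")
```

<div>Under these assumptions the serialized output is 68 bytes long, which agrees with the size of part-r-00000 in the test-out listing and with HDFS_BYTES_WRITTEN=68 in the counters.</div>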
<div>References:</div>
<div>&nbsp;</div>
<div><a href="http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/</a></div>
<div><a href="https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/</a></div>
<div><a href="https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/</a></div>
<div><a href="http://itstarting.javaeye.com/blog/520985" target="_blank">http://itstarting.javaeye.com/blog/520985</a></div>