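Getting Started with Hadoop Programming
<div>1. Running a simple word count program</div><div>First, prepare two text files by running the following commands on the command line:</div>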
<div>echo "hello hadoop word count">/tmp/test_file1.txt</div>
<div>echo "hello hadoop,I'm a vegetable bird">/tmp/test_file2.txt</div>
<div>Copy the two files into HDFS by running:</div>
<div>bin/hadoop dfs -mkdir test-in (creates the directory test-in)</div>
<div>bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in (copies both files into test-in)</div>
<div>bin/hadoop dfs -ls test-in (checks that the copy succeeded) The following listing is displayed:</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>Found 2 items</li>
<li>-rw-r--r-- 1 hadoop supergroup 24 2011-01-21 18:40 /user/hadoop/test-in/test_file1.txt</li>
<li>-rw-r--r-- 1 hadoop supergroup 34 2011-01-21 18:40 /user/hadoop/test-in/test_file2.txt</li></ol></div></div>
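<div>Note: test-in here is a directory in HDFS, not on the local file system. Because it is given as a relative path, it resolves against the user's HDFS home directory, so its absolute path is “hdfs://localhost:9000/user/hadoop/test-in”.</div>
<div>Run the example with the following command:</div>
<div>bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount test-in test-out (writes the results to test-out) The screen shows:</div>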
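<div>The path resolution described in the note can be sketched in a few lines of Python. This is only an illustration, not Hadoop's own code: the function name resolve_hdfs_path is hypothetical, and the user name (hadoop) and file system URI (hdfs://localhost:9000) are simply the values that appear in the note and the listing.</div>

```python
import posixpath


def resolve_hdfs_path(path, user="hadoop", default_fs="hdfs://localhost:9000"):
    """Hypothetical helper: qualify an HDFS path as the note describes.

    Relative paths are resolved against the user's home directory
    (/user/<user>); absolute paths are kept; fully qualified URIs pass through.
    """
    if path.startswith("hdfs://"):
        return path  # already fully qualified
    if not path.startswith("/"):
        # Relative path: prepend the assumed home directory /user/<user>.
        path = posixpath.join("/user", user, path)
    return default_fs + posixpath.normpath(path)


print(resolve_hdfs_path("test-in"))
print(resolve_hdfs_path("/tmp/data"))
```

<div>Under these assumptions, test-in and /user/hadoop/test-in name the same directory, which is why the listing shows the files under /user/hadoop/test-in.</div>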
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>11/01/21 18:50:16 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000</li>
<li>11/01/21 18:50:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id</li>
<li>11/01/21 18:50:17 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.</li>
<li>11/01/21 18:50:17 INFO input.FileInputFormat: Total input paths to process : 2</li>
<li>11/01/21 18:50:17 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps</li>
<li>11/01/21 18:50:17 INFO mapreduce.JobSubmitter: number of splits:2</li>
<li>11/01/21 18:50:18 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null</li>
<li>11/01/21 18:50:18 INFO mapreduce.Job: Running job: job_201101211705_0001</li>
<li>11/01/21 18:50:19 INFO mapreduce.Job: map 0% reduce 0%</li>
<li>11/01/21 18:50:35 INFO mapreduce.Job: map 100% reduce 0%</li>
<li>11/01/21 18:50:44 INFO mapreduce.Job: map 100% reduce 100%</li>
<li>11/01/21 18:50:47 INFO mapreduce.Job: Job complete: job_201101211705_0001</li>
<li>11/01/21 18:50:47 INFO mapreduce.Job: Counters: 33</li>
<li>FileInputFormatCounters</li>
<li>BYTES_READ=58</li>
<li>FileSystemCounters</li>
<li>FILE_BYTES_READ=118</li>
<li>FILE_BYTES_WRITTEN=306</li>
<li>HDFS_BYTES_READ=300</li>
<li>HDFS_BYTES_WRITTEN=68</li>
<li>Shuffle Errors</li>
<li>BAD_ID=0</li>
<li>CONNECTION=0</li>
<li>IO_ERROR=0</li>
<li>WRONG_LENGTH=0</li>
<li>WRONG_MAP=0</li>
<li>WRONG_REDUCE=0</li>
<li>Job Counters </li>
<li>Data-local map tasks=2</li>
<li>Total time spent by all maps waiting after reserving slots (ms)=0</li>
<li>Total time spent by all reduces waiting after reserving slots (ms)=0</li>
<li>SLOTS_MILLIS_MAPS=22290</li>
<li>SLOTS_MILLIS_REDUCES=6539</li>
<li>Launched map tasks=2</li>
<li>Launched reduce tasks=1</li>
<li>Map-Reduce Framework</li>
<li>Combine input records=9</li>
<li>Combine output records=9</li>
<li>Failed Shuffles=0</li>
<li>GC time elapsed (ms)=642</li>
<li>Map input records=2</li>
<li>Map output bytes=94</li>
<li>Map output records=9</li>
<li>Merged Map outputs=2</li>
<li>Reduce input groups=8</li>
<li>Reduce input records=9</li>
<li>Reduce output records=8</li>
<li>Reduce shuffle bytes=124</li>
<li>Shuffled Maps =2</li>
<li>Spilled Records=18</li>
<li>SPLIT_RAW_BYTES=242</li></ol></div></div>
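<div>Several of the counters in this log can be reproduced by hand. The following Python sketch simulates the map, combine and reduce steps locally on the two input lines; it is not the Hadoop implementation. It assumes one map task per file (the log reports 2 splits), whitespace tokenization, and a serialized record size of one length byte plus the key bytes plus a 4-byte integer. All function names are hypothetical.</div>

```python
from collections import Counter

# Contents of the two input files, including the newline that echo appends.
FILES = [
    "hello hadoop word count\n",
    "hello hadoop,I'm a vegetable bird\n",
]


def map_phase(text):
    """Emit one (word, 1) pair per whitespace-separated token."""
    return [(word, 1) for line in text.splitlines() for word in line.split()]


def sum_by_word(pairs):
    """Sum the counts per word; used for both the combiner and the reducer."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


def simulate(files):
    counters = Counter()
    shuffled = []
    for text in files:  # assumption: one map task per input file
        counters["BYTES_READ"] += len(text.encode("utf-8"))
        counters["Map input records"] += len(text.splitlines())
        pairs = map_phase(text)
        counters["Map output records"] += len(pairs)
        # Assumed record size: 1 length byte + key bytes + 4-byte int value.
        counters["Map output bytes"] += sum(
            1 + len(word.encode("utf-8")) + 4 for word, _ in pairs
        )
        combined = sum_by_word(pairs)  # combiner runs per map task
        counters["Combine input records"] += len(pairs)
        counters["Combine output records"] += len(combined)
        shuffled.extend(combined)
    counters["Reduce input records"] = len(shuffled)
    result = sum_by_word(shuffled)
    counters["Reduce input groups"] = len(result)
    counters["Reduce output records"] = len(result)
    return dict(counters), result


counters, result = simulate(FILES)
for name, value in counters.items():
    print(f"{name}={value}")
```

<div>With these assumptions the sketch yields BYTES_READ=58, Map output records=9, Map output bytes=94, Combine output records=9 and Reduce input groups=8, matching the log. The combiner does not shrink the data here because no word repeats within a single file; "hello" is only merged in the reducer.</div>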
<div>Check the job output:</div>
<div>bin/hadoop dfs -ls test-out displays:</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>Found 2 items</li>
<li>-rw-r--r-- 1 hadoop supergroup 0 2011-01-21 18:50 /user/hadoop/test-out/_SUCCESS</li>
<li>-rw-r--r-- 1 hadoop supergroup 68 2011-01-21 18:50 /user/hadoop/test-out/part-r-00000</li></ol></div></div>
<div>View the final counts by running:</div>
<div>bin/hadoop dfs -cat test-out/part-r-00000 prints the result: the number of times each word occurs in the input files</div>
<div>
<div class="codeText" id="codeText">
<ol class="dp-css" style="margin: 0px 1px 0px 0px; padding: 5px 0px;">
<li>a 1</li>
<li>bird 1</li>
<li>count 1</li>
<li>hadoop 1</li>
<li>hadoop,I'm 1</li>
<li>hello 2</li>
<li>vegetable 1</li>
<li>word 1</li></ol></div></div>
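<div>The result lists "hadoop" and "hadoop,I'm" separately because the words are split on whitespace only, so the comma does not separate tokens. The Python sketch below illustrates this locally; it is not the example's source code. count_words is a hypothetical alternative tokenizer added for comparison, and the tab-separated output layout is an assumption that happens to agree with the 68-byte size of part-r-00000.</div>

```python
import re
from collections import Counter

LINES = ["hello hadoop word count", "hello hadoop,I'm a vegetable bird"]


def count_whitespace(lines):
    """Count tokens split on whitespace only, as in the result above."""
    return Counter(tok for line in lines for tok in line.split())


def count_words(lines):
    """Hypothetical alternative: runs of letters/apostrophes are words."""
    return Counter(
        tok for line in lines for tok in re.findall(r"[A-Za-z']+", line)
    )


def format_output(counts):
    """Render sorted 'word<TAB>count' lines (assumed output layout)."""
    return "".join(f"{word}\t{n}\n" for word, n in sorted(counts.items()))


output = format_output(count_whitespace(LINES))
print(output, end="")
print(len(output.encode("utf-8")), "bytes")
print("hadoop with punctuation-aware split:", count_words(LINES)["hadoop"])
```

<div>Splitting on whitespace gives the eight keys shown above, while the punctuation-aware variant would count "hadoop" twice. Which behavior is wanted depends on the data, so the tokenizer is the first thing to adapt when reusing the example.</div>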
<div>References:</div>
<div><a href="http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/</a></div>
<div><a href="https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/</a></div>
<div><a href="https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/" target="_blank">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/</a></div>
<div><a href="http://itstarting.javaeye.com/blog/520985" target="_blank">http://itstarting.javaeye.com/blog/520985</a></div>