
Chinaunix

MapReduce Code Design and Analysis -- Shen Yan (Summary of the Second Report)

Posted on 2011-12-23 02:39
MapReduce program design

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/* The org.apache.hadoop.mapreduce.lib.* packages replace org.apache.hadoop.mapred.*.
   This change makes the code more convenient to modify and noticeably shorter
   than it used to be.

   The old API:

   public static class MapClass extends MapReduceBase
       implements Mapper<K1, V1, K2, V2> {
     public void map(K1 key, V1 value,
         OutputCollector<K2, V2> output,
         Reporter reporter) throws IOException { }
   }
   public static class Reduce extends MapReduceBase
       implements Reducer<K2, V2, K3, V3> {
     public void reduce(K2 key, Iterator<V2> values,
         OutputCollector<K3, V3> output,
         Reporter reporter) throws IOException { }
   }

   The new API:

   public static class MapClass extends Mapper<K1, V1, K2, V2> {
     public void map(K1 key, V1 value, Context context)
         throws IOException, InterruptedException { }
   }
   public static class Reduce extends Reducer<K2, V2, K3, V3> {
     public void reduce(K2 key, Iterable<V2> values, Context context)
         throws IOException, InterruptedException { }
   }
*/

public class tt extends Configured implements Tool {

    public static class MapClass
            extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // split() puts the comma-separated fields of the line into the citation array.
            String[] citation = value.toString().split(",");
            // The new API's context.write() replaces the old collect() calls;
            // key and value are swapped here, so the cited patent becomes the key.
            context.write(new Text(citation[1]), new Text(citation[0]));
        }
    }

    // The first two type parameters are the input types, the last two the output types.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String csv = "";
            // Text is a text type similar to String, but it differs from String in
            // how it handles encoding; it is Hadoop's own wrapper class, tied to
            // Hadoop's in-memory serialization.
            for (Text val : values) {
                if (csv.length() > 0) csv += ",";
                csv += val.toString();
            }
            context.write(key, new Text(csv));
        }
    }

    // run() is invoked by the Hadoop framework (via ToolRunner).
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "tt"); // Job replaces the old JobClient.
        job.setJarByClass(tt.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // If the key/value classes are not set here, the framework throws an
        // exception; also remember that the old and new APIs must not be mixed.
        job.setOutputValueClass(Text.class);
        // Return the job status rather than calling System.exit() inside run();
        // main() already exits with this value via ToolRunner.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner spares us the tedious configuration details.
        int res = ToolRunner.run(new Configuration(), new tt(), args);
        System.exit(res);
    }
}
```

The code above runs in Eclipse. The input is the file cite75_99.txt from Hadoop in Action, formatted like this:

```
[root@asus input]# head -n 5 cite75_99.txt
"CITING","CITED"
3858241,956203
3858241,1324234
3858241,3398406
3858241,3557384
```

At first my example failed with:

    org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

After modifying the program as above to call the new API, the key type could effectively be set to Text. The six job.set*() lines (mapper, reducer, input/output format, output key/value class) are mandatory: for Text to take effect, it must be set consistently for map, reduce, and the job configuration at the same time. My email is shenyanxxxy@qq.com; anyone interested in Hadoop is welcome to contact me so we can discuss it together.
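3858241,3557384">
Stripped of the Hadoop types, the map step is just a split-and-swap over each CSV line. The following is a minimal plain-Java sketch of that logic (the `swap` helper and `SwapDemo` class are illustrative names, not part of the original program), which can be tried on the sample records above:

```java
public class SwapDemo {
    // Swap the two CSV fields of one input line:
    // "CITING,CITED" becomes key = CITED, value = CITING.
    static String[] swap(String line) {
        String[] citation = line.split(",");
        return new String[] { citation[1], citation[0] };
    }

    public static void main(String[] args) {
        // Sample records from cite75_99.txt (header line excluded).
        String[] lines = { "3858241,956203", "3858241,1324234" };
        for (String line : lines) {
            String[] kv = swap(line);
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Note that the real job also feeds the header line `"CITING","CITED"` through this logic, since TextInputFormat does not skip it.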
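3858241,3557384">
The reduce step likewise boils down to comma-joining all values that arrive under one key. Here is the same accumulation restated in plain Java with a StringBuilder, which is more idiomatic than repeated `+=` on a String (the `join` helper and `JoinDemo` class are illustrative names, not part of the original program):

```java
import java.util.Arrays;

public class JoinDemo {
    // Comma-join all values sharing a key, as the tt reducer
    // does with its csv accumulator.
    static String join(Iterable<String> values) {
        StringBuilder csv = new StringBuilder();
        for (String v : values) {
            if (csv.length() > 0) csv.append(',');
            csv.append(v);
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // All patents citing patent 956203, joined into one output line.
        System.out.println(join(Arrays.asList("3858241", "3858242")));
    }
}
```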