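I've just started learning this. Simulating map and reduce with pipes on Linux works without problems.
The mapper doesn't need an input file; it opens the files it needs directly inside the script. Is that allowed?
Could someone help me work out whether the error is in the mapper itself, or whether the job fails because I didn't copy the required files to HDFS before running it?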
#!/usr/bin/env python
# usage: "train" or "submit"
import sys

id2indx = {}     # maps an item or user id to its index
tot_num = 0      # next free index
indx_list = []   # (id, index) pairs in insertion order

#print 'build index for items'
iprof = open('item.txt')
for line in iprof:
    iid = int(line.split()[0])
    if iid not in id2indx:
        id2indx[iid] = tot_num
        indx_list.append((iid, tot_num))
        print('%s\t%d' % (id2indx[iid], iid))
        tot_num += 1
#for (k, v) in self.indx_list:
    #print '------- %d -> %d\n' % (k, v)
iprof.close()

#print 'build index for users'
uprof = open('user_profile.txt')
for line in uprof:
    uid = int(line.split()[0])
    if uid not in id2indx:
        id2indx[uid] = tot_num
        indx_list.append((uid, tot_num))
        tot_num += 1
        print('%s\t%d' % (id2indx[uid], uid))
uprof.close()
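A small check along these lines might help tell the two possible causes apart. It is only a sketch: it assumes the two files are expected in the task's current working directory (as the `open` calls above imply), and `missing_files` is a hypothetical helper name, not part of the original script. Messages go to stderr so they do not mix with the mapper's tab-separated output on stdout.

```python
import os
import sys


def missing_files(names, directory='.'):
    """Return the entries of `names` that are not regular files in `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


# File names are taken from the script above; the directory is an assumption.
missing = missing_files(['item.txt', 'user_profile.txt'])
if missing:
    # Report on stderr so stdout stays clean for the key/value output.
    sys.stderr.write('missing side files: %s\n' % ', '.join(missing))
```

If the message appears when the script runs as a task but not when it runs locally, the files were probably not made available to the task, rather than the mapper logic being wrong.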
Every submission fails. Since the job doesn't need real input or output files, I just created an empty file named hello.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar -mapper ./python/map.py -reducer ./python/reduce.py
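Since Hadoop Streaming passes the job's input to the mapper on standard input, one alternative would be to supply the records as the job input instead of opening files inside the script. The following is only a sketch of that idea, not a tested fix: `build_index` is a hypothetical name, and it assumes every non-blank line starts with an integer id, as the lines of item.txt and user_profile.txt do in the script above.

```python
import sys


def build_index(lines, out=sys.stdout):
    """Assign consecutive indices to the first-column ids found in `lines`.

    Writes one "index<TAB>id" line per newly seen id and returns the mapping.
    """
    id2indx = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = int(fields[0])
        if key not in id2indx:
            id2indx[key] = len(id2indx)  # next free index
            out.write('%d\t%d\n' % (id2indx[key], key))
    return id2indx

# In a streaming mapper this would be called as: build_index(sys.stdin)
```

Note that if the job runs more than one map task, each task would number its ids independently starting from 0, so this only mirrors the original behaviour when a single mapper sees all the records.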