Skipping an introduction to Hadoop, this post goes straight to the installation steps; by following them you can clone this setup and stand up an instance of your own.

Role list:
namenode & jobtracker    192.168.237.13
datanode & tasktracker   192.168.237.74
datanode & tasktracker   192.168.239.128

# useradd hadoop
Download hadoop-0.20.2.tar.gz from http://mirror.bjtu.edu.cn/apache/hadoop/core/hadoop-0.20.2/
# mkdir /data/hadoop
# tar -zxvf hadoop-0.20.2.tar.gz
# chown -R hadoop:hadoop hadoop-0.20.2 hadoop

Set up passwordless SSH login:
# ./ssh_nopasswd.sh client && ./ssh_nopasswd.sh server    (adjust the user and paths as needed)
Attachment: ssh_nopasswd.zip (http://blog.chinaunix.net/attachment/attach/22/27/07/73222707737ff3e4021253d530696d46f60467d238.zip)

----------------------------
Edit the following four configuration files on one machine, then copy them to the other machines so you do not have to edit them again on each host.

Configuration files:

core-site.xml configures the basic NameNode/JobTracker information. The key properties:
fs.default.name: URI of the NameNode
mapred.job.tracker: JobTracker IP and port
hadoop.tmp.dir: base for Hadoop's temporary directories
dfs.name.dir: where the NameNode stores the name table
dfs.data.dir: where a DataNode stores its data blocks
dfs.replication: number of block replicas
(In 0.20.x the dfs.* properties conventionally live in hdfs-site.xml and the mapred.* ones in mapred-site.xml; core-site.xml is loaded by every daemon, so setting them here also works.)

PS: my /etc/hosts contains the following entries:
192.168.237.13 hadoop-237-13.pconline.ctc hadoop-237-13
192.168.237.74 hadoop-237-74.pconline.ctc hadoop-237-74
192.168.239.128 hadoop-239-128.pconline.ctc hadoop-239-128

Example:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-237-13:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/filesystem/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/filesystem/data</value>
  <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

mapred-site.xml configures the MapReduce details; the description of each property tells you what it controls.

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job. Ignored when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.userlog.retain.hours</name>
  <value>2</value>
  <description>The maximum time, in hours, for which the user-logs are to be retained.</description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx700M -server</value>
</property>

<property>
  <name>mapred.map.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per map task. In other words, framework will try to execute a map task these many number of times before giving up on it.</description>
</property>

<property>
  <name>mapred.reduce.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per reduce task. In other words, framework will try to execute a reduce task these many number of times before giving up on it.</description>
</property>

<property>
  <name>mapred.max.tracker.failures</name>
  <value>800</value>
  <description>The number of task-failures on a tasktracker of a given job after which new tasks of that job aren't assigned to it.</description>
</property>

<property>
  <name>mapred.task.timeout</name>
  <value>60000000</value>
  <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
</property>

masters (secondary namenode): for this test it simply runs on the namenode host itself, so the file contains:
192.168.237.13

slaves contains:
192.168.237.74
192.168.239.128

Because JAVA_HOME is not set in my machines' environment, I set it in hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.6.0_22

----------------------------
# cd /data/hadoop && su hadoop
$ bin/hadoop namenode -format
$ bin/start-all.sh
$ bin/hadoop dfsadmin -report

If the setup succeeded, the report looks like this:
Configured Capacity: 107981234176 (100.57 GB)
Present Capacity: 101694681088 (94.71 GB)
DFS Remaining: 101694607360 (94.71 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 192.168.239.128:50010
Decommission Status : Normal
Configured Capacity: 53558603776 (49.88 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143274496 (2.93 GB)
DFS Remaining: 50415292416 (46.95 GB)
DFS Used%: 0%
DFS Remaining%: 94.13%
Last contact: Fri Aug 05 12:19:33 CST 2011

Name: 192.168.237.74:50010
Decommission Status : Normal
Configured Capacity: 54422630400 (50.69 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143278592 (2.93 GB)
DFS Remaining: 51279314944 (47.76 GB)
DFS Used%: 0%
DFS Remaining%: 94.22%
Last contact: Fri Aug 05 12:19:33 CST 2011

Some errors hit during installation:
1. /data/hadoop had not been chown'ed to the hadoop user, so Hadoop reported a permission error.
2. I had manually created the tmp/data directories under /data/hadoop beforehand, which produced:
2011-08-05 09:40:34,559 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://hadoop-237-13:9000/data/hadoop/tmp/mapred/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /data/hadoop/tmp/mapred/system. Name node is in safe mode.

If you run into errors, check the logs under $HADOOP_HOME/logs for details.

Reference:
http://hadoop.apache.org/common/docs/r0.21.0/cluster_setup.html
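The contents of the attached ssh_nopasswd.sh are not shown above, so here is a hedged sketch, using stock OpenSSH tools, of what a passwordless-SSH setup for this cluster amounts to (assumption: the script does roughly this; the hostnames come from the role list, and `ssh-copy-id` is the standard OpenSSH helper):

```shell
# Sketch of a passwordless-SSH setup (assumption: roughly what ssh_nopasswd.sh
# automates). Run as the hadoop user on the namenode.
setup_nopasswd_ssh() {
    # Create an RSA key pair with an empty passphrase if none exists yet.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
    # Append our public key to authorized_keys on every host passed in
    # (including the namenode itself, since start-all.sh also ssh'es locally).
    for host in "$@"; do
        ssh-copy-id "hadoop@$host"
    done
}

# Usage: setup_nopasswd_ssh hadoop-237-13 hadoop-237-74 hadoop-239-128
```

After running it, `ssh hadoop@hadoop-237-74` from the namenode should log in without a password prompt, which is what start-all.sh needs.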
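The `dfsadmin -report` check above can also be scripted, e.g. for a cron-based health check. Below is a small hypothetical helper (not part of the original setup) that pulls the live-datanode count out of the report; on this cluster the expected value is 2:

```shell
# Hypothetical helper: extract the number of available datanodes from the
# output of `bin/hadoop dfsadmin -report`, i.e. from the line
#   "Datanodes available: 2 (2 total, 0 dead)"
live_datanodes() {
    awk -F'[(:]' '/^Datanodes available/ { gsub(/ /, "", $2); print $2 }'
}

# Example health check: alert when fewer than 2 datanodes report in.
# [ "$(bin/hadoop dfsadmin -report | live_datanodes)" -ge 2 ] || echo "datanode down"
```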
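The SafeModeException in error 2 is usually transient: the NameNode starts up in safe mode and leaves it on its own once enough block reports arrive. Hadoop 0.20 can block until then with the built-in `bin/hadoop dfsadmin -safemode wait`; as a more general alternative, a small retry helper (a sketch, not from the original post; the grep pattern assumes `-safemode get` prints "Safe mode is OFF") can wait it out before submitting jobs:

```shell
# Hypothetical helper: retry a command until it succeeds or the attempt
# budget runs out, sleeping one second between attempts.
wait_for() {
    tries=$1; shift
    i=0
    until "$@"; do
        i=$((i + 1))
        [ "$i" -ge "$tries" ] && return 1
        sleep 1
    done
}

# Example: block until the namenode reports safe mode OFF (at most 60 tries).
# wait_for 60 sh -c 'bin/hadoop dfsadmin -safemode get | grep -q OFF'
```

Waiting is safer than forcing `-safemode leave`, which can let jobs start before all blocks have been reported.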