Last edited by wang290 on 2013-08-19 22:19
Software downloads
Oracle Big Data Connectors (ODCH) download:
http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html
Java SE download:
http://www.oracle.com/technetwork/java/javase/downloads/jdk6u38-downloads-1877406.html
Oracle 11g download:
Oracle Enterprise Linux download (an Oracle account is required):
https://edelivery.oracle.com/linux
Installing the Oracle server: either a virtual machine or a physical machine works.
The OS install is basically just clicking Next all the way through.
Then install Oracle 11g itself; this took several days of fiddling.
I finally got it installed by following this article:
http://wenku.baidu.com/view/fc804254be23482fb4da4c63.html
In my opinion the best reference for this step is Oracle's official installation guide:
http://docs.oracle.com/cd/E11882_01/install.112/e24326/toc.htm
Install Hadoop
http://blog.csdn.net/lichangzai/article/details/8206834
Problem 1:
After setting up passwordless SSH login, it still prompted for a password.
Fix:
chmod 700 ~/.ssh
chmod 700 /home/userName
chmod 600 ~/.ssh/authorized_keys
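The same permission fix can be wrapped in a small reusable check. This is only a sketch (it assumes GNU coreutils `stat -c %a`, and `check_mode` is a hypothetical helper, not part of OpenSSH or any tool used in this post):

```shell
# Hypothetical helper: report a path whose mode differs from what
# passwordless SSH login requires (sshd's StrictModes rejects lax modes).
check_mode() {  # usage: check_mode PATH EXPECTED_OCTAL
  local actual
  actual=$(stat -c %a "$1") || return 1   # GNU stat: octal permission bits
  if [ "$actual" != "$2" ]; then
    echo "$1: $actual (want $2)"
    return 1
  fi
}

# Typical use, mirroring the chmod lines above:
# check_mode ~/.ssh 700
# check_mode ~/.ssh/authorized_keys 600
```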
Problem 2:
Agent admitted failure to sign using the key
Fix:
Use the ssh-add command to load the private key into the agent (the file may not be named id_rsa):
# ssh-add ~/.ssh/id_rsa
Hadoop/Oracle machine layout:

| Hostname | IP | OS version | Oracle node | Hadoop node | Hadoop processes |
| --- | --- | --- | --- | --- | --- |
| Master | 192.168.74.153 | OEL6.4 | Oracle11.2.0.2 | master | namenode, jobtracker |
| Node1 | 192.168.2.147 | Ubuntu12.04 | none | slave | datanode, tasktracker |
| Node2 | 192.168.2.148 | Ubuntu12.04 | none | slave | datanode, tasktracker |
Main environment variables:

| Variable | Value |
| --- | --- |
| ORACLE_HOME | /oracle/11g |
| ORACLE_SID | orcl |
| HADOOP_HOME | /home/hadoop/hadoop-1.2 |
| HADOOP_CONF_DIR | $HADOOP_HOME/conf |
| OSCH_HOME | /opt/odch/orahdfs-2.2.0 |
| ORAHDFS_JAR | $OSCH_HOME/jlib/orahdfs.jar |
| HDFS_BIN_PATH | /opt/odch/orahdfs-2.2.0 |
| HADOOP_CLASSPATH | $OSCH_HOME/jlib/* |

Configure everything according to the table above.
Install ODCH and set the environment variables
Create an odch directory under /opt, copy the archive there, and unpack it:
/opt/odch
Unpack:
unzip orahdfs-2.2.0.zip
Then add the main Hadoop/Oracle/ODCH environment variables.
Go to the hadoop user's home directory and edit .bash_profile with vi.
My configuration is as follows (for reference):
export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=$ORACLE_BASE/11g
export ORACLE_SID=orcl
export PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/usr/X11R6/lib64/
export CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.2.0
PATH=$PATH:$HOME/bin:/home/hadoop/hadoop/jdk1.7.0_21/bin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export OSCH_HOME=/opt/odch/orahdfs-2.2.0
export ORAHDFS_JAR=$OSCH_HOME/jlib/orahdfs.jar
export HDFS_BIN_PATH=$OSCH_HOME/bin
export JAVA_HOME=/home/hadoop/hadoop/jdk1.7.0_21
export PATH
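Before going further, it may help to confirm the variables the connector relies on are actually set in the environment. This is a sketch of my own, not part of ODCH; `check_odch_env` is a hypothetical helper, and it only sees exported variables (which is what a .bash_profile like the one above produces):

```shell
# Hypothetical sanity check: print any required variable that is missing
# from the exported environment before running any OSCH command.
check_odch_env() {
  local missing=0 v
  for v in ORACLE_HOME ORACLE_SID HADOOP_HOME OSCH_HOME ORAHDFS_JAR; do
    # printenv prints the exported value, or nothing if the variable is unset
    if [ -z "$(printenv "$v")" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
```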
Change the hadoop user's groups:
Note: Hadoop was installed under the hadoop OS user and Oracle under the oracle OS user. To simplify operations and give the hadoop user permission to work with the Oracle database, add the hadoop user to the same groups as the oracle user.
--Check the current group memberships
[hadoop@master ~]$ id hadoop
uid=501(hadoop) gid=503(hadoop) groups=503(hadoop)
[hadoop@master ~]$ id oracle
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)
--Modify the group memberships
vi /etc/group
--Check the groups again
[hadoop@master ~]$ id hadoop
uid=501(hadoop) gid=503(hadoop) groups=503(hadoop),501(oinstall),502(dba)
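If you script this instead of eyeballing `id` output, a membership check can be done by parsing the group file directly. A small sketch (the `user_in_group` helper is hypothetical; note it only sees supplementary members listed in the group file, not a user's primary group from /etc/passwd):

```shell
# Hypothetical helper: succeed if USER appears in GROUP's member list
# in an /etc/group-style file (name:passwd:gid:member1,member2,...).
user_in_group() {  # usage: user_in_group USER GROUP [GROUP_FILE]
  local user=$1 group=$2 file=${3:-/etc/group}
  awk -F: -v u="$user" -v g="$group" '
    $1 == g { n = split($4, m, ","); for (i = 1; i <= n; i++) if (m[i] == u) found = 1 }
    END { exit(found ? 0 : 1) }' "$file"
}

# e.g. user_in_group hadoop oinstall && echo "hadoop is in oinstall"
```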
Then reboot the machine.
Start Oracle and test the connection
Start the database:
[hadoop@master Desktop]$ sqlplus sys/oracle as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Sat Aug 17 11:06:08 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
//Start the database
SQL> startup;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             973080568 bytes
Database Buffers          671088640 bytes
Redo Buffers                7135232 bytes
Database mounted.
Database opened.
//Show the database name
SQL> show parameter db_name;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_name                              string      orcl
Then start the Oracle listener:
[hadoop@master Desktop]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-AUG-2013 11:21:33
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Starting /opt/oracle/11g/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /opt/oracle/11g/network/admin/listener.ora
Log messages written to /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-AUG-2013 11:21:33
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/11g/network/admin/listener.ora
Listener Log File         /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))
The listener supports no services
The command completed successfully
//Check the status
[hadoop@master Desktop]$ lsnrctl stat
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-AUG-2013 11:21:53
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                17-AUG-2013 11:21:33
Uptime                    0 days 0 hr. 0 min. 19 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/11g/network/admin/listener.ora
Listener Log File         /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))
The listener supports no services
The command completed successfully
Configure hdfs_stream and create the table
Go to /opt/odch/orahdfs-2.2.0/bin, open the hdfs_stream file with vi, and add the following:
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.2.0
export OSCH_HOME=/opt/odch/orahdfs-2.2.0
export PATH=/usr/bin:/bin:$HADOOP_HOME/bin
Make sure the oracle user has read/write access to the $ODCH_LOG_DIR/logs directory.
The Oracle user needs to create log/bad files in the {$ODCH_LOG_DIR} directory, so it must be able to read and write there.
--Change the directory permissions, then test creating and removing a file
[root@gc opt]# chmod -R 777 odch/
[root@gc opt]# su - oracle
[oracle@gc ~]$ cd /opt/odch/orahdfs-2.2.0/log/
[oracle@gc log]$ touch ora_access_test
[oracle@gc log]$ rm ora_access_test
Configure the OS directories and the database Directory objects
--Create the OS directories
[root@gc ~]# mkdir -p /opt/odch/orahdfs-2.2.0/logs
[root@gc ~]# mkdir -p /opt/odch/orahdfs-2.2.0/extdir
[root@gc ~]# chmod 777 /opt/odch/orahdfs-2.2.0/logs
[root@gc ~]# chmod 777 /opt/odch/orahdfs-2.2.0/extdir
--Create the Oracle Directory objects
sqlplus "/as sysdba"
SQL> create or replace directory ODCH_LOG_DIR as '/opt/odch/orahdfs-2.2.0/logs';
Directory created.
SQL> grant read, write on directory ODCH_LOG_DIR to SCOTT;
Grant succeeded.
SQL> create or replace directory ODCH_DATA_DIR as '/opt/odch/orahdfs-2.2.0/extdir';
Directory created.
SQL> grant read, write on directory ODCH_DATA_DIR to SCOTT;
Grant succeeded.
SQL> create or replace directory HDFS_BIN_PATH as '/opt/odch/orahdfs-2.2.0/bin';
Directory created.
SQL> grant execute on directory HDFS_BIN_PATH to SCOTT;
Grant succeeded.
SQL> grant read, write on directory HDFS_BIN_PATH to SCOTT;
Grant succeeded.
--Directory object notes
HDFS_BIN_PATH: the directory containing the hdfs_stream script.
ODCH_DATA_DIR: the directory holding the "location files". A location file is a configuration file that records the HDFS path/file name and the file's encoding format.
ODCH_LOG_DIR: the directory where Oracle writes the external table's log/bad files.
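Since an Oracle DIRECTORY object is just a pointer to an OS path, a quick OS-level check can catch typos or permission mistakes before the database ever touches them. A sketch of my own (`check_dirs` is a hypothetical helper; the paths are the ones created above, so adjust for your layout):

```shell
# Hypothetical helper: report any path that is missing or not writable
# by the current user, before mapping it to an Oracle DIRECTORY object.
check_dirs() {
  local d rc=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "not a directory: $d"; rc=1
    elif [ ! -w "$d" ]; then
      echo "not writable: $d"; rc=1
    fi
  done
  return $rc
}

# e.g. check_dirs /opt/odch/orahdfs-2.2.0/logs /opt/odch/orahdfs-2.2.0/extdir
```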
Create the Oracle external table:
conn scott/tiger
At login this raised: ORA-28000: the account is locked
ALTER USER scott ACCOUNT UNLOCK;
Once connected, create the table:
CREATE TABLE odch_ext_table
( ID       NUMBER
, OWNER    VARCHAR2(128)
, NAME     VARCHAR2(128)
, MODIFIED DATE
, Val      NUMBER
)
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY ODCH_DATA_DIR
  ACCESS PARAMETERS
  ( records delimited by newline
    preprocessor HDFS_BIN_PATH:hdfs_stream
    badfile ODCH_LOG_DIR:'odch_ext_table%a_%p.bad'
    logfile ODCH_LOG_DIR:'odch_ext_table%a_%p.log'
    fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
    missing field values are null
    ( ID DECIMAL EXTERNAL,
      OWNER CHAR(200),
      NAME CHAR(200),
      MODIFIED CHAR DATE_FORMAT DATE MASK "YYYY-MM-DD HH24:MI:SS",
      Val DECIMAL EXTERNAL
    )
  )
  LOCATION ('Tmpdata.csv')
) PARALLEL REJECT LIMIT UNLIMITED;
Prepare the sample file
--Sample file contents (Tmpdata.csv, attached):
1,SYS,ORA$BASE,15-AUG-09,11
2,SYS,DUAL,15-AUG-09,116
3,PUBLIC,DUAL,15-AUG-09,33
4,PUBLIC,MAP_OBJECT,15-AUG-09,55
How the file was produced:
Tmpdata.csv was generated from all_objects; the SQL is: select rownum,owner,object_name,created,data_object_id from all_objects
-- Create the stored procedure
CREATE OR REPLACE PROCEDURE SQL_TO_CSV
(
  P_QUERY    IN VARCHAR2,  -- the SQL query to export
  P_DIR      IN VARCHAR2,  -- target directory for the exported file
  P_FILENAME IN VARCHAR2   -- CSV file name
)
IS
  L_OUTPUT       UTL_FILE.FILE_TYPE;
  L_THECURSOR    INTEGER DEFAULT DBMS_SQL.OPEN_CURSOR;
  L_COLUMNVALUE  VARCHAR2(4000);
  L_STATUS       INTEGER;
  L_COLCNT       NUMBER := 0;
  L_SEPARATOR    VARCHAR2(1);
  L_DESCTBL      DBMS_SQL.DESC_TAB;
  P_MAX_LINESIZE NUMBER := 32000;
BEGIN
  --OPEN FILE
  L_OUTPUT := UTL_FILE.FOPEN(P_DIR, P_FILENAME, 'W', P_MAX_LINESIZE);
  --DEFINE DATE FORMAT
  EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD HH24:MI:SS''';
  --OPEN CURSOR
  DBMS_SQL.PARSE(L_THECURSOR, P_QUERY, DBMS_SQL.NATIVE);
  DBMS_SQL.DESCRIBE_COLUMNS(L_THECURSOR, L_COLCNT, L_DESCTBL);
  --DUMP TABLE COLUMN NAMES
  FOR I IN 1 .. L_COLCNT LOOP
    UTL_FILE.PUT(L_OUTPUT, L_SEPARATOR || '"' || L_DESCTBL(I).COL_NAME || '"');
    DBMS_SQL.DEFINE_COLUMN(L_THECURSOR, I, L_COLUMNVALUE, 4000);
    L_SEPARATOR := ',';
  END LOOP;
  UTL_FILE.NEW_LINE(L_OUTPUT);
  --EXECUTE THE QUERY STATEMENT
  L_STATUS := DBMS_SQL.EXECUTE(L_THECURSOR);
  --DUMP TABLE COLUMN VALUES
  WHILE (DBMS_SQL.FETCH_ROWS(L_THECURSOR) > 0) LOOP
    L_SEPARATOR := '';
    FOR I IN 1 .. L_COLCNT LOOP
      DBMS_SQL.COLUMN_VALUE(L_THECURSOR, I, L_COLUMNVALUE);
      UTL_FILE.PUT(L_OUTPUT,
                   L_SEPARATOR || '"' ||
                   TRIM(BOTH ' ' FROM REPLACE(L_COLUMNVALUE, '"', '""')) || '"');
      L_SEPARATOR := ',';
    END LOOP;
    UTL_FILE.NEW_LINE(L_OUTPUT);
  END LOOP;
  --CLOSE CURSOR
  DBMS_SQL.CLOSE_CURSOR(L_THECURSOR);
  --CLOSE FILE
  UTL_FILE.FCLOSE(L_OUTPUT);
EXCEPTION
  WHEN OTHERS THEN
    RAISE;
END;
/

-- Create the directory for the output file
CREATE OR REPLACE DIRECTORY MYDIR AS 'C:\';

-- Run it
begin
  sql_to_csv('select * from au_jxs ', 'MYDIR', 'EXAMPLE.CSV');
end;
/
Adjust the paths and file names to suit your environment.
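Before loading, it may be worth confirming the generated file matches the 5-field layout the external table expects, so malformed lines fail fast instead of landing in the bad file. This is a naive sketch of my own (`validate_csv` is hypothetical, and a plain field count ignores quoted fields that contain commas, which OPTIONALLY ENCLOSED BY '"' would allow):

```shell
# Hypothetical check: report any line whose comma-separated field count
# is not exactly 5 (ID,OWNER,NAME,MODIFIED,VAL).
validate_csv() {  # usage: validate_csv FILE
  awk -F, 'NF != 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
           END { exit(bad ? 1 : 0) }' "$1"
}

# e.g. validate_csv Tmpdata.csv && echo "format looks OK"
```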
Start Hadoop
[hadoop@master ~]$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-namenode-master.out
node1: starting datanode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-datanode-node1.out
node2: starting datanode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-datanode-node2.out
master: starting secondarynamenode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-jobtracker-master.out
node1: starting tasktracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-tasktracker-node1.out
node2: starting tasktracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-tasktracker-node2.out
--First create a directory in HDFS, then put Tmpdata.csv into it
[hadoop@master ~]$ hadoop dfs -mkdir odch
Warning: $HADOOP_HOME is deprecated.
[hadoop@master ~]$ hadoop dfs -put Tmpdata.csv odch
Warning: $HADOOP_HOME is deprecated.
[hadoop@master ~]$ hadoop dfs -ls odch
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r--   2 hadoop supergroup        113 2013-08-18 21:41 /user/hadoop/odch/Tmpdata.csv
Generate the location files
--Run the following commands
export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*"
[hadoop@master ~]$ hadoop jar \
> ${ORAHDFS_JAR} oracle.hadoop.hdfs.exttab.ExternalTable \
> -D oracle.hadoop.hdfs.exttab.tableName=odch_ext_table \
> -D oracle.hadoop.hdfs.exttab.datasetPaths=odch \
> -D oracle.hadoop.hdfs.exttab.datasetRegex=Tmpdata.csv \
> -D oracle.hadoop.hdfs.exttab.connection.url="jdbc:oracle:thin:@//192.168.74.153:1521/orcl" \
> -D oracle.hadoop.hdfs.exttab.connection.user=SCOTT \
> -publish
Warning: $HADOOP_HOME is deprecated.
DEPRECATED: The class oracle.hadoop.hdfs.exttab.ExternalTable is deprecated.
It is replaced by oracle.hadoop.exttab.ExternalTable.
Oracle SQL Connector for HDFS Release 2.2.0 - Production
Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
The publish command succeeded.
ALTER TABLE "SCOTT"."ODCH_EXT_TABLE"
LOCATION
(
  'osch-20130818094259-1319-1'
);
The following location files were created.
osch-20130818094259-1319-1 contains 1 URI, 113 bytes
     113 hdfs://master:9000/user/hadoop/odch/Tmpdata.csv
The following location files were deleted.
Parameter notes:
ExternalTable: the hadoop ExternalTable command-line tool
-D: passes a configuration parameter
tableName: the external table name
datasetPaths: the source data path (in HDFS)
datasetRegex: the source data file pattern
connection.url: the Oracle database connection string
connection.user: the database user, scott
The command then prompts for the database password: oracle
Update the table's location as the output instructs:
ALTER TABLE "SCOTT"."ODCH_EXT_TABLE" LOCATION('osch-20130818094259-1319-1');
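If you re-run the publish step often, the invocation can be assembled from variables so the connection details live in one place. A sketch only (`build_publish_cmd` is a hypothetical wrapper of my own; it just prints the command rather than running it, since the real run needs a live Hadoop cluster and database):

```shell
# Hypothetical wrapper: print the OSCH publish command for the given
# table, HDFS path, file pattern, JDBC URL, and database user.
build_publish_cmd() {  # usage: build_publish_cmd TABLE PATHS REGEX URL USER
  printf '%s ' hadoop jar "$ORAHDFS_JAR" oracle.hadoop.hdfs.exttab.ExternalTable \
    -D "oracle.hadoop.hdfs.exttab.tableName=$1" \
    -D "oracle.hadoop.hdfs.exttab.datasetPaths=$2" \
    -D "oracle.hadoop.hdfs.exttab.datasetRegex=$3" \
    -D "oracle.hadoop.hdfs.exttab.connection.url=$4" \
    -D "oracle.hadoop.hdfs.exttab.connection.user=$5" \
    -publish
  echo
}
```

For example, `eval "$(build_publish_cmd ...)"` would run it, though printing first makes it easy to review the -D options before committing.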
Inspect the location file and the external table
Go into the extdir directory:
[hadoop@master ~]$ cd /opt/odch/orahdfs-2.2.0/extdir/
[hadoop@master extdir]$ ls
osch-20130818083220-7675-1 osch-20130818084052-4686-1 osch-20130818085941-2623-1 osch-20130818094259-1319-1
[hadoop@master extdir]$ more osch-20130818094259-1319-1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
  <header>
    <version>1.0</version>
    <fileName>osch-20130818094259-1319-1</fileName>
    <createDate>2013-08-18T21:15:40</createDate>
    <publishDate>2013-08-18T09:42:59</publishDate>
    <productName>Oracle SQL Connector for HDFS Release 2.2.0 - Production</productName>
    <productVersion>2.2.0</productVersion>
  </header>
  <uri_list>
    <uri_list_item size="113" compressionCodec="">hdfs://master:9000/user/hadoop/odch/Tmpdata.csv</uri_list_item>
  </uri_list>
</locationFile>
As you can see, the location file now points at the data in Hadoop.
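To check what a location file points at without reading the XML by eye, the URIs can be pulled out with a one-liner. A sketch (`list_location_uris` is a hypothetical helper of my own; it uses grep/sed rather than a real XML parser, so it assumes location files shaped like the one above):

```shell
# Hypothetical helper: print every HDFS URI listed in a location file.
list_location_uris() {  # usage: list_location_uris LOCATION_FILE
  # grab each <uri_list_item ...>URI fragment, then strip the tag prefix
  grep -o '<uri_list_item[^>]*>[^<]*' "$1" | sed 's/.*>//'
}
```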
Query the external table linked to Hadoop:
As the scott user:
SQL> set line 150 pages 1000;
SQL> col owner for a10
SQL> col name for a20
SQL> select * from odch_ext_table;
        ID OWNER      NAME                 MODIFIED         VAL
---------- ---------- -------------------- --------- ----------
         1 SYS        ORA$BASE             09-AUG-15         11
         2 SYS        DUAL                 09-AUG-15        116
         3 PUBLIC     DUAL                 09-AUG-15         33
         4 PUBLIC     MAP_OBJECT           09-AUG-15         55
This is the data read from Hadoop through the external table.
http://www.cnblogs.com/skyme/p/3267789.html