[Hive] How to store Hive's metadata in Oracle 10g

Posted 2010-01-17 00:03

                               
Hive by default uses an embedded Derby database. In real-world scenarios, where multiple Hive queries are executed from multiple sessions, a database server such as MySQL, Oracle, or a Derby network server is required. This example uses Oracle. Before starting, make sure the cluster is down.
1) Create the Oracle user:

sql> create user hive identified by hive
     default tablespace hive;
sql> grant dba to hive;    -- note: granting DBA is a security risk; see below
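Granting DBA gets a test setup working quickly but is far more privilege than the metastore needs. A more restrained sketch might look like the following; the datafile path and sizes are assumptions for illustration, and since the CREATE USER above names DEFAULT TABLESPACE hive, that tablespace must exist first:

```sql
-- Hypothetical least-privilege setup (datafile path and size are assumptions)
create tablespace hive
  datafile '/u01/app/oracle/oradata/ORCL/hive01.dbf' size 200m autoextend on;

-- Instead of GRANT DBA, give the hive user only what it needs to create
-- and use the metastore tables:
grant connect, resource to hive;
alter user hive quota unlimited on hive;
```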
2) Edit hive/conf/hive-default.xml. The properties marked <!-- changed for Oracle --> are the ones modified from the defaults:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default. By setting this property to -1, Hive will automatically figure out the number of reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, 10 reducers will be used.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>Max number of reducers that will be used. If mapred.reduce.tasks is negative, Hive uses this as the max number of reducers when automatically determining the number of reducers.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>Whether Hive is running in test mode. If yes, it turns on sampling and prefixes the output table name.</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>If Hive is running in test mode, prefixes the output table by this string.</description>
</property>

<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>If Hive is running in test mode and the table is not bucketed, sampling frequency.</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>If Hive is running in test mode, don't sample the above comma-separated list of tables.</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>  <!-- changed for Oracle -->
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM.</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>  <!-- changed for Oracle -->
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value>  <!-- changed for Oracle -->
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
  <description>Class implementing the JDO persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>Detaches all objects from the session so that they can be used after the transaction is committed.</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>Reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against the metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>Password to use against the metastore database</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>Metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>  <!-- changed for Oracle: let DataNucleus create the metastore tables on first startup -->
  <description>Creates the necessary schema on startup if it doesn't exist. Set this to false after creating it once.</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>Throw an exception if metadata tables are incorrect.</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description></description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>true</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of the default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to the metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
  <description>Default file format for CREATE TABLE statements. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS ... to override.</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
  <description>Whether to use map-side aggregation in Hive GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in the data, to optimize GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.mapaggr.checkinterval</name>
  <value>100000</value>
  <description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>

<property>
  <name>hive.mapred.local.mem</name>
  <value>0</value>
  <description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
  <description>Portion of total memory to be used by the map-side group aggregation hash table</description>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
  <description>Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.</description>
</property>

<property>
  <name>hive.optimize.cp</name>
  <value>true</value>
  <description>Whether to enable the column pruner</description>
</property>

<property>
  <name>hive.optimize.ppd</name>
  <value>true</value>
  <description>Whether to enable predicate pushdown</description>
</property>

<property>
  <name>hive.optimize.pruner</name>
  <value>true</value>
  <description>Whether to enable the new partition pruner, which depends on predicate pushdown. If this is disabled, the old partition pruner, which is based on the AST, will be used.</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
  <description>How many rows in the right-most join operand Hive should buffer before emitting the join result.</description>
</property>

<property>
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>The mode in which Hive operations are being performed. In strict mode, some risky queries are not allowed to run.</description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
  <description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.</description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
  <description>Controls whether the final output of a query (to a local/HDFS file or a Hive table) is compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
  <description>Controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
  <description>The host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>The port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>${HIVE_HOME}/lib/hive-hwi.war</value>
  <description>The WAR file with the JSP content for the Hive Web Interface</description>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>Pre-execute hook for tests</description>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
  <description>Merge small files at the end of any job (map only or map-reduce)</description>
</property>

<property>
  <name>hive.heartbeat.interval</name>
  <value>1000</value>
  <description>Send a heartbeat after this interval; used by mapjoin and filter operators</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.script.auto.progress</name>
  <value>false</value>
  <description>Whether the Hive Transform/Map/Reduce clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent infinite loops in the scripts from being killed by the TaskTracker.</description>
</property>

</configuration>
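Rather than editing hive-default.xml in place, Hive also reads hive/conf/hive-site.xml, whose values override the defaults; keeping only the Oracle-specific settings there is easier to maintain across upgrades. A minimal override file for the setup above might look like this:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.driver.OracleDriver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
</configuration>
```

Remember to set datanucleus.autoCreateSchema back to false once the schema has been created.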
3) Download the Oracle JDBC driver (ojdbc14.jar):
http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/htdocs/jdbc_10201.html
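The downloaded jar must end up on Hive's classpath, normally by copying it into Hive's lib directory. The paths below are assumptions for a typical layout; adjust them to your installation:

```shell
# Assumed locations; adjust HIVE_HOME and the download path as needed.
HIVE_HOME=/usr/local/hive
cp ~/ojdbc14.jar "$HIVE_HOME/lib/"
ls "$HIVE_HOME/lib/" | grep ojdbc   # confirm the jar is in place
```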
  
4) Test it:
[root@boss ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>
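The same check can be run non-interactively with hive -e, which is handy for scripting; this assumes the hive binary is on the PATH:

```shell
hive -e 'show tables'
hive -e 'create table test(id bigint)'
hive -e 'describe test'
```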
5) Log in to Oracle and check whether the metastore tables were auto-created:
[oracle@boss ~]$ sqlplus hive/hive@192.168.1.101:1521/spgjmega
SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL>  show user;
USER is "HIVE"
SQL> select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES                 TABLE
A                              TABLE
DBS                            TABLE
SEQUENCE_TABLE                 TABLE
SERDES                         TABLE
TBLS                           TABLE
SDS                            TABLE
PARTITION_KEYS                 TABLE
COLUMNS                        TABLE
BUCKETING_COLS                 TABLE
SD_PARAMS                      TABLE
SORT_COLS                      TABLE
SERDE_PARAMS                   TABLE
TABLE_PARAMS                   TABLE
14 rows selected.
SQL> set line 1000
SQL> r
  1* select * from TBLS
    TBL_ID CREATE_TIME  DB_ID LAST_ACCESS_TIME OWNER  RETENTION  SD_ID TBL_NAME
---------- ----------- ------ ---------------- ------ ---------- ------ --------
         1  1263655926      1                0 root            0      1 test
SQL>
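The column metadata for the new table can also be checked. In the schema DataNucleus generated here, TBLS links to COLUMNS through the SD_ID of the table's storage descriptor; this is a sketch against the table list shown above, and the layout may differ between Hive versions:

```sql
-- Column names are taken from the tables listed above; verify against
-- your own schema (e.g. with DESC TBLS) before relying on this.
select t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
  from TBLS t, COLUMNS c
 where t.SD_ID = c.SD_ID;
```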
http://code.google.com/p/hacdb/wiki/HiveTry
http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx
http://www.jpox.org/docs/1_0/rdbms.html
http://bbs.hadoopor.com/thread-219-1-1.html
http://wiki.apache.org/hadoop/HiveDerbyServerMode
This article is from the ChinaUnix blog; the original post is at: http://blog.chinaunix.net/u/32515/showart_2151068.html