[Hive] How to store Hive's metadata in Oracle 10g

Posted 2010-01-17 00:03

                               
Hive by default uses an embedded Derby database. In real-world scenarios, where multiple Hive queries are executed from multiple sessions, a database server such as MySQL, Oracle, or a Derby network server is required. This example uses Oracle. Before starting, make sure the cluster is down.
1) Create the Oracle user:

sql> create user hive identified by hive
     default tablespace hive;
sql> grant dba to hive;    -- note: granting DBA is a security risk; see below
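Granting DBA gets a test setup working quickly but is far more privilege than the metastore needs. A more restrained sketch might look like the following; the datafile path and sizes are assumptions for illustration, and since the CREATE USER above names DEFAULT TABLESPACE hive, that tablespace must exist first:

```sql
-- Hypothetical least-privilege setup (datafile path and size are assumptions)
create tablespace hive
  datafile '/u01/app/oracle/oradata/ORCL/hive01.dbf' size 200m autoextend on;

-- Instead of GRANT DBA, give the hive user only what it needs to create
-- and use the metastore tables:
grant connect, resource to hive;
alter user hive quota unlimited on hive;
```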
2) Edit hive/conf/hive-default.xml. The properties marked <!-- changed for Oracle --> are the ones modified from the defaults:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default. By setting this property to -1, Hive will automatically figure out the number of reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, 10 reducers will be used.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>Max number of reducers that will be used. If mapred.reduce.tasks is negative, Hive uses this as the max number of reducers when automatically determining the number of reducers.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>Whether Hive is running in test mode. If yes, it turns on sampling and prefixes the output table name.</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>If Hive is running in test mode, prefixes the output table by this string.</description>
</property>

<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>If Hive is running in test mode and the table is not bucketed, sampling frequency.</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>If Hive is running in test mode, don't sample the above comma-separated list of tables.</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>  <!-- changed for Oracle -->
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM.</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>  <!-- changed for Oracle -->
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value>  <!-- changed for Oracle -->
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
  <description>Class implementing the JDO persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>Detaches all objects from the session so that they can be used after the transaction is committed.</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>Reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against the metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>Password to use against the metastore database</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>Validates existing schema against code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>Metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>  <!-- changed for Oracle: let DataNucleus create the metastore tables on first startup -->
  <description>Creates the necessary schema on startup if it doesn't exist. Set this to false after creating it once.</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>Throw an exception if metadata tables are incorrect.</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description></description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>true</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of the default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to the metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
  <description>Default file format for CREATE TABLE statements. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS ... to override.</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
  <description>Whether to use map-side aggregation in Hive GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in the data, to optimize GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.mapaggr.checkinterval</name>
  <value>100000</value>
  <description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>

<property>
  <name>hive.mapred.local.mem</name>
  <value>0</value>
  <description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
  <description>Portion of total memory to be used by the map-side group aggregation hash table</description>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
  <description>Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.</description>
</property>

<property>
  <name>hive.optimize.cp</name>
  <value>true</value>
  <description>Whether to enable the column pruner</description>
</property>

<property>
  <name>hive.optimize.ppd</name>
  <value>true</value>
  <description>Whether to enable predicate pushdown</description>
</property>

<property>
  <name>hive.optimize.pruner</name>
  <value>true</value>
  <description>Whether to enable the new partition pruner, which depends on predicate pushdown. If this is disabled, the old partition pruner, which is based on the AST, will be used.</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
  <description>How many rows in the right-most join operand Hive should buffer before emitting the join result.</description>
</property>

<property>
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>The mode in which Hive operations are being performed. In strict mode, some risky queries are not allowed to run.</description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
  <description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.</description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
  <description>Controls whether the final output of a query (to a local/HDFS file or a Hive table) is compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
  <description>Controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*</description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
  <description>The host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>The port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>${HIVE_HOME}/lib/hive-hwi.war</value>
  <description>The WAR file with the JSP content for the Hive Web Interface</description>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>Pre-execute hook for tests</description>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
  <description>Merge small files at the end of any job (map only or map-reduce)</description>
</property>

<property>
  <name>hive.heartbeat.interval</name>
  <value>1000</value>
  <description>Send a heartbeat after this interval; used by mapjoin and filter operators</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.script.auto.progress</name>
  <value>false</value>
  <description>Whether the Hive Transform/Map/Reduce clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent infinite loops in the scripts from being killed by the TaskTracker.</description>
</property>

</configuration>
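Rather than editing hive-default.xml in place, Hive also reads hive/conf/hive-site.xml, whose values override the defaults; keeping only the Oracle-specific settings there is easier to maintain across upgrades. A minimal override file for the setup above might look like this:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.driver.OracleDriver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
</configuration>
```

Remember to set datanucleus.autoCreateSchema back to false once the schema has been created.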
3) Download the Oracle JDBC driver (ojdbc14.jar):
http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/htdocs/jdbc_10201.html
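The downloaded jar must end up on Hive's classpath, normally by copying it into Hive's lib directory. The paths below are assumptions for a typical layout; adjust them to your installation:

```shell
# Assumed locations; adjust HIVE_HOME and the download path as needed.
HIVE_HOME=/usr/local/hive
cp ~/ojdbc14.jar "$HIVE_HOME/lib/"
ls "$HIVE_HOME/lib/" | grep ojdbc   # confirm the jar is in place
```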
  
4) Test it:
[root@boss ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>
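The same check can be run non-interactively with hive -e, which is handy for scripting; this assumes the hive binary is on the PATH:

```shell
hive -e 'show tables'
hive -e 'create table test(id bigint)'
hive -e 'describe test'
```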
5) Log in to Oracle and check whether the metastore tables were auto-created:
[oracle@boss ~]$ sqlplus hive/hive@192.168.1.101:1521/spgjmega
SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL>  show user;
USER is "HIVE"
SQL> select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES                 TABLE
A                              TABLE
DBS                            TABLE
SEQUENCE_TABLE                 TABLE
SERDES                         TABLE
TBLS                           TABLE
SDS                            TABLE
PARTITION_KEYS                 TABLE
COLUMNS                        TABLE
BUCKETING_COLS                 TABLE
SD_PARAMS                      TABLE
SORT_COLS                      TABLE
SERDE_PARAMS                   TABLE
TABLE_PARAMS                   TABLE
14 rows selected.
SQL> set line 1000
SQL> r
  1* select * from TBLS
    TBL_ID CREATE_TIME  DB_ID LAST_ACCESS_TIME OWNER  RETENTION  SD_ID TBL_NAME
---------- ----------- ------ ---------------- ------ ---------- ------ --------
         1  1263655926      1                0 root            0      1 test
SQL>
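The column metadata for the new table can also be checked. In the schema DataNucleus generated here, TBLS links to COLUMNS through the SD_ID of the table's storage descriptor; this is a sketch against the table list shown above, and the layout may differ between Hive versions:

```sql
-- Column names are taken from the tables listed above; verify against
-- your own schema (e.g. with DESC TBLS) before relying on this.
select t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
  from TBLS t, COLUMNS c
 where t.SD_ID = c.SD_ID;
```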
http://code.google.com/p/hacdb/wiki/HiveTry
http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx
http://www.jpox.org/docs/1_0/rdbms.html
http://bbs.hadoopor.com/thread-219-1-1.html
http://wiki.apache.org/hadoop/HiveDerbyServerMode
This article is from the ChinaUnix blog; the original post is at: http://blog.chinaunix.net/u/32515/showart_2151068.html