sqoop避免输入密码自动增量job脚本介绍
【摘要】 将关系型数据库ORACLE的数据导入到HDFS中,可以通过Sqoop、OGG来实现,相比较ORACLE GOLDENGATE,Sqoop不仅不需要复杂的安装配置,而且传输效率很高,同时也能实现增量数据同步。
说明:本测试hadoop是单节点伪分布式环境,是基于之前写的两篇文章对应的环境操作.
本文档将在以上两个文章的基础上操作,是对第二篇文章环境的一个简单使用测试,使用过程中出现的错误亦可以验证暴
一、介绍
将关系型数据库ORACLE的数据导入到HDFS中,可以通过Sqoop、OGG来实现,相比较ORACLE GOLDENGATE,Sqoop不仅不需要复杂的安装配置,而且传输效率很高,同时也能实现增量数据同步。
说明:本测试hadoop是单节点伪分布式环境,是基于之前写的两篇文章对应的环境操作.
本文档将在以上两个文章的基础上操作,是对第二篇文章环境的一个简单使用测试,使用过程中出现的错误亦可以验证暴漏第二篇文章安装的问题出现的错误,至于sqoop增量同步到hive请看本人在这篇文章之后写的测试文档:
[Oracle通过sqoop增量同步数据到hive]
二、环境配置
三、实验过程
1、Oracle源端创建测试用表并初始化
--scott用户下创建此表
create table ora_hive(
empno number primary key,
ename varchar2(30),
hiredate date
);
--简单初始化出近1000天数据
insert into ora_hive
select level, dbms_random.string('u', 20), sysdate - level
from dual
connect by level <= 1000;
commit;
--现在表中存在2019,2018,2017,2016四年的数据
2、hive创建目标表
切换到hive目录:
[root@hadoop ~]# cd /hadoop/hive/
[root@hadoop hive]# cd bin
[root@hadoop bin]# ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/imp
l/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/
org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/
hive-log4j2.properties Async: trueHive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider
using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.hive> show databases;
OK
default
sbux
Time taken: 1.508 seconds, Fetched: 2 row(s)
hive> create database oracle;
OK
Time taken: 1.729 seconds
hive> show databases;
OK
default
oracle
sbux
Time taken: 0.026 seconds, Fetched: 3 row(s)
hive> use oracle;
OK
Time taken: 0.094 seconds
hive> create table ora_hive(
> empno int,
> ename string,
> hiredate date
> );
OK
Time taken: 0.744 seconds
hive> show tables;
3、导入数据:
[root@hadoop bin]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table ORA_HIVE -m 1 --hive-import --hive-database oracle
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:585)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.logging.log4j.core.jmx.Server.register(Server.java:380)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:138)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:507)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:249)
at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:239)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:157)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:130)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:100)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:187)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:154)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:90)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:82)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:65)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:702)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/12 14:38:47 INFO SessionState:
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/12 14:38:47 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:47 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:47 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf/_tmp_space.db
19/03/12 14:38:47 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:47 INFO session.SessionState: Updating thread name to 0a864287-81f1-47f6-80fb-8ea3cf0b2faf main
19/03/12 14:38:47 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:47 INFO ql.Driver: Compiling command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM
E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:38:44' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:38:50 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/12 14:38:50 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/12 14:38:50 INFO hive.metastore: Connected to metastore.
19/03/12 14:38:50 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/12 14:38:50 INFO parse.CalcitePlanner: Creating table oracle.ORA_HIVE position=27
19/03/12 14:38:51 INFO ql.Driver: Semantic Analysis Completed
19/03/12 14:38:51 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/12 14:38:51 INFO ql.Driver: Completed compiling command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09); Time taken: 3.385 seconds
19/03/12 14:38:51 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/12 14:38:51 INFO ql.Driver: Executing command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM
E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:38:44' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:38:51 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=0a864287-81f1-47f6-80fb-8ea3cf0b2fa
f, clientType=HIVECLI]19/03/12 14:38:51 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/12 14:38:51 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.
hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 14:38:51 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/12 14:38:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/12 14:38:51 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/12 14:38:51 INFO hive.metastore: Connected to metastore.
19/03/12 14:38:51 INFO ql.Driver: Completed executing command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09); Time taken: 0.134 seconds
OK
19/03/12 14:38:51 INFO ql.Driver: OK
Time taken: 3.534 seconds
19/03/12 14:38:51 INFO CliDriver: Time taken: 3.534 seconds
19/03/12 14:38:51 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:51 INFO session.SessionState: Resetting thread name to main
19/03/12 14:38:51 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:51 INFO session.SessionState: Updating thread name to 0a864287-81f1-47f6-80fb-8ea3cf0b2faf main
19/03/12 14:38:51 INFO ql.Driver: Compiling command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE`
19/03/12 14:38:51 INFO ql.Driver: Semantic Analysis Completed
19/03/12 14:38:51 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/12 14:38:51 INFO ql.Driver: Completed compiling command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927); Time taken: 0.451 seconds
19/03/12 14:38:51 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/12 14:38:51 INFO ql.Driver: Executing command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE`
19/03/12 14:38:51 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/12 14:38:51 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.ora_hive
19/03/12 14:38:51 INFO exec.Task: Loading data to table oracle.ora_hive from hdfs://192.168.1.66:9000/user/root/ORA_HIVE
19/03/12 14:38:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/12 14:38:51 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/12 14:38:51 INFO hive.metastore: Connected to metastore.
19/03/12 14:38:51 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/12 14:38:52 ERROR exec.TaskRunner: Error in executeTask
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.<clinit>(StatsSetupConst.java:165)
at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:300)
at org.apache.hadoop.hive.common.StatsSetupConst.clearColumnStatsState(StatsSetupConst.java:261)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:2032)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:360)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databi
nd/ObjectReader;19/03/12 14:38:52 ERROR ql.Driver: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/C
lass;)Lcom/fasterxml/jackson/databind/ObjectReader;19/03/12 14:38:52 INFO ql.Driver: Completed executing command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927); Time taken: 0.497 seconds
19/03/12 14:38:52 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:52 INFO session.SessionState: Resetting thread name to main
19/03/12 14:38:52 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf
19/03/12 14:38:52 INFO session.SessionState: Deleted directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf on fs with scheme hdfs
19/03/12 14:38:52 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf on fs with scheme file
19/03/12 14:38:52 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/12 14:38:52 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=-101
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
发现报错,先看第一个报错内容:
2019-03-12 06:38:47,077 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
这个报错内容是Oracle jdbc驱动连接数据库时报的错,解决办法为:
修改$JAVA_HOME/jre/lib/security/java.policy文件,添加一行:
permission javax.management.MBeanTrustPermission "register";
来看第二个错误:
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
这个报错原因是hive+sqoop的jackson版本不一致导致的问题,解决办法为:
将
HIVE_HOME/lib/jackson*.jar 拷贝替换至 $SQOOP_HOME/lib 目录中。
[root@hadoop bin]# cd /hadoop/sqoop/lib/
[root@hadoop lib]# mkdir /hadoop/bak
[root@hadoop lib]# mv jackson*.jar /hadoop/bak/
[root@hadoop lib]# cp $HIVE_HOME/lib/jackson*.jar .
这时候先删除一次已经创建的ORA_HIVE再执行一次导入:
[root@hadoop ~]# hadoop fs -rmr ORA_HIVE
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/12 14:54:21 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minut
es, Emptier interval = 0 minutes.Deleted ORA_HIVE
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table ORA_HIVE -m 1 --hive-import --hive-database oracle
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
这次导入成功了,去hive验证一下:
[root@hadoop bin]# ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use oracle;
OK
Time taken: 1.111 seconds
hive> select * from ORA_HIVE;
。。。。。。。
997 XAOOEHXITLWEFBZFCNAB NULL
998 IZVWHVGTEJHCJWJZTDXK NULL
999 YMBFLJTTWPENEBXEWVIJ NULL
1000 QFKDIGEYFWQBZBGTJPPD NULL
Time taken: 2.092 seconds, Fetched: 2000 row(s)
已经导入进来了
【版权声明】本文为华为云社区用户原创内容,未经允许不得转载,如需转载请自行联系原作者进行授权。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)