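Notes on Adapting Gobblin to GaussDB in an Open-Source Project
[Abstract] Adapting Gobblin to GaussDB
1 Background
Apache Gobblin is an open-source distributed big-data integration framework that LinkedIn contributed to the Apache Software Foundation. It aims to simplify common big-data integration tasks, such as extraction, replication, organization, and lifecycle management in both streaming and batch data ecosystems. Gobblin can handle large volumes of data from many kinds of sources (databases, REST APIs, FTP/SFTP servers, file systems, and so on) and extract, transform, and load (ETL) that data into the Hadoop ecosystem. It takes care of the tasks that every ETL pipeline shares, such as job scheduling, task partitioning, error handling, state management, data quality checking, and publishing, and so offers a one-stop solution. The main goal of this adaptation is to make Apache Gobblin support the Huawei Cloud GaussDB database, so that Gobblin users on Huawei Cloud can connect to GaussDB smoothly.
2 Building Gobblin
Because no official installation package is provided, Gobblin has to be compiled manually.
2.1 Download the latest source package from the official website
2.2 Extract the archive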
[openuser@ecs-hw00 data]$ tar -zxvf apache-gobblin-sources-0.17.0.tgz -C /opt/apps/
[openuser@ecs-hw00 data]$ cd /opt/apps/gobblin-sources-0.17.0/
[openuser@ecs-hw00 gobblin-sources-0.17.0]$ ll
total 243864
-rw-r--r-- 1 openuser openuser 157071 Jun 14 2023 CHANGELOG.md
-rw-r--r-- 1 openuser openuser 754 Feb 3 2022 HEADER
-rw-r--r-- 1 openuser openuser 15210 Feb 3 2022 LICENSE
-rw-r--r-- 1 openuser openuser 168 Feb 3 2022 NOTICE
-rw-r--r-- 1 openuser openuser 5260 Mar 16 2023 README.md
drwxr-xr-x 2 openuser openuser 4096 Jun 13 2023 bin
drwxrwxr-x 92 openuser openuser 4096 Nov 24 11:40 build
-rw-r--r-- 1 openuser openuser 8704 Nov 24 11:35 build.gradle
drwxr-xr-x 5 openuser openuser 4096 Nov 23 17:41 buildSrc
drwxr-xr-x 11 openuser openuser 4096 Feb 3 2022 conf
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 config
-rw-r--r-- 1 openuser openuser 2038 Jun 14 2023 defaultEnvironment.gradle
drwxr-xr-x 2 openuser openuser 4096 Feb 3 2022 dev
drwxr-xr-x 3 openuser openuser 4096 Jun 13 2023 gobblin-admin
drwxr-xr-x 2 openuser openuser 4096 Feb 3 2022 gobblin-all
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-api
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-audit
drwxr-xr-x 3 openuser openuser 4096 Mar 16 2023 gobblin-aws
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-binary-management
drwxr-xr-x 3 openuser openuser 4096 Jun 13 2023 gobblin-cluster
drwxr-xr-x 4 openuser openuser 4096 Jun 13 2023 gobblin-compaction
drwxr-xr-x 3 openuser openuser 4096 Jun 14 2023 gobblin-completeness
drwxr-xr-x 4 openuser openuser 4096 Feb 3 2022 gobblin-config-management
drwxr-xr-x 3 openuser openuser 4096 Mar 16 2023 gobblin-core
drwxr-xr-x 3 openuser openuser 4096 Dec 17 2022 gobblin-core-base
drwxr-xr-x 4 openuser openuser 4096 Jun 13 2023 gobblin-data-management
drwxr-xr-x 2 openuser openuser 4096 Dec 17 2022 gobblin-distribution
drwxr-xr-x 4 openuser openuser 4096 Feb 3 2022 gobblin-docker
drwxr-xr-x 15 openuser openuser 4096 Feb 3 2022 gobblin-docs
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-example
-rw-r--r-- 1 openuser openuser 1187 Feb 3 2022 gobblin-flavored-build.gradle
drwxr-xr-x 4 openuser openuser 4096 Mar 10 2022 gobblin-hive-registration
drwxr-xr-x 3 openuser openuser 4096 Jun 14 2023 gobblin-iceberg
drwxr-xr-x 2 openuser openuser 4096 Feb 3 2022 gobblin-integration-test-log-dir
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-kubernetes
drwxr-xr-x 3 openuser openuser 4096 Dec 17 2022 gobblin-metastore
drwxr-xr-x 4 openuser openuser 4096 Feb 3 2022 gobblin-metrics-libs
drwxr-xr-x 34 openuser openuser 4096 Feb 3 2022 gobblin-modules
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-oozie
drwxr-xr-x 5 openuser openuser 4096 Feb 3 2022 gobblin-rest-service
drwxr-xr-x 5 openuser openuser 4096 Jun 14 2023 gobblin-restli
drwxr-xr-x 3 openuser openuser 4096 Jun 13 2023 gobblin-runtime
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-runtime-hadoop
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-salesforce
drwxr-xr-x 3 openuser openuser 4096 Dec 17 2022 gobblin-service
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-test
drwxr-xr-x 4 openuser openuser 4096 Feb 3 2022 gobblin-test-harness
drwxr-xr-x 4 openuser openuser 4096 Sep 15 2022 gobblin-test-utils
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 gobblin-tunnel
drwxr-xr-x 3 openuser openuser 4096 Jun 13 2023 gobblin-utility
drwxr-xr-x 3 openuser openuser 4096 Jun 14 2023 gobblin-yarn
drwxr-xr-x 5 openuser openuser 4096 Jun 14 2023 gradle
-rw-r--r-- 1 openuser openuser 1432 Jun 14 2023 gradle.properties
-rw-r--r-- 1 openuser openuser 1432 Nov 24 11:38 gradle.properties.release
-rwxr-xr-x 1 openuser openuser 5829 Feb 3 2022 gradlew
-rwxr-xr-x 1 openuser openuser 3240 Feb 3 2022 gradlew.bat
drwxr-xr-x 3 openuser openuser 4096 Feb 3 2022 ligradle
drwxr-xr-x 2 openuser openuser 4096 Feb 3 2022 lumos_value_audit
drwxr-xr-x 2 openuser openuser 4096 Jun 14 2023 maven-nexus
-rw-r--r-- 1 openuser openuser 5561 Feb 3 2022 mkdocs.yml
-rwxr-xr-x 1 openuser openuser 2784 Feb 3 2022 query_github_issues.py
-rw-r--r-- 1 openuser openuser 18 Feb 3 2022 readthedocs.yml
-rw-r--r-- 1 openuser openuser 3404 Mar 10 2022 settings.gradle
2.3 Modify the dependencies
[openuser@ecs-hw00 gobblin-sources-0.17.0]$ vim build.gradle
dependencies {
    classpath 'org.apache.ant:ant:1.9.4'
    classpath 'gradle.plugin.org.inferred:gradle-processors:1.1.2'
    classpath 'io.spring.gradle:dependency-management-plugin:1.0.11.RELEASE'
    classpath 'me.champeau.gradle:jmh-gradle-plugin:0.4.8'
    classpath "gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.14.0"
    classpath 'org.jfrog.buildinfo:build-info-extractor-gradle:4.23.4'
}
2.4 Build environment
The build requires JDK 1.8 or later and Maven.
[openuser@ecs-hw00 ~]$ ll /opt/apps/
drwxr-xr-x 9 openuser openuser 4096 Nov 24 19:08 gobblin-dist
drwxr-xr-x 50 openuser openuser 4096 Dec 16 14:17 gobblin-sources-0.17.0
drwxr-xr-x 7 openuser openuser 4096 Apr 2 2019 jdk1.8.0_212
drwxrwxr-x 6 openuser openuser 4096 Nov 23 17:21 maven-3.9.4
[openuser@ecs-hw00 ~]$ cat /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/apps/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# MAVEN_HOME
export M2_HOME=/opt/apps/maven-3.9.4
export PATH=$M2_HOME/bin:$PATH
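Before starting the build, it can help to confirm that the JDK on the PATH really meets the 1.8 requirement. The following is a minimal sketch, not part of the original setup; the function names java_version_ok and detect_java_version are made up for illustration.

```shell
#!/bin/sh
# Returns 0 when a Java version string denotes 1.8 or newer.
# Handles the legacy "1.8.0_212" scheme and the newer "11.0.2" / "17" scheme.
java_version_ok() {
    version="$1"
    major="${version%%.*}"              # text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric: reject
    esac
    if [ "$major" -eq 1 ]; then
        rest="${version#1.}"            # e.g. "8.0_212"
        minor="${rest%%.*}"             # e.g. "8"
        case "$minor" in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ "$minor" -ge 8 ]
    else
        [ "$major" -ge 9 ]
    fi
}

# Prints the version string reported by the java binary on the PATH.
# "java -version" writes to stderr, hence the redirection.
detect_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Example usage (run manually once java is installed):
#   java_version_ok "$(detect_java_version)" || echo "JDK 1.8+ required" >&2
```

With the environment shown above, detect_java_version would be expected to report the 1.8.0_212 JDK, which the check accepts.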
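2.5 Download the Gradle wrapper
Note: the downloaded jar must end up in the gradle/wrapper directory. When the command below is run from the source root, its -P option saves the file there directly.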
wget --no-check-certificate -P gradle/wrapper https://github.com/apache/gobblin/raw/0.17.0/gradle/wrapper/gradle-wrapper.jar
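A failed or interrupted download can leave the wrapper jar missing or empty, and gradlew will not start without it. As a small sketch (the function name wrapper_ready is hypothetical), the check below confirms that the jar exists and is non-empty under a given source root:

```shell
#!/bin/sh
# Returns 0 if gradle/wrapper/gradle-wrapper.jar exists under the given
# source root and is not empty ("-s" tests for a file of size > 0).
wrapper_ready() {
    [ -s "$1/gradle/wrapper/gradle-wrapper.jar" ]
}

# Example usage (run manually from anywhere):
#   wrapper_ready /opt/apps/gobblin-sources-0.17.0 \
#       || echo "gradle-wrapper.jar is missing or empty" >&2
```

This only checks presence and size; it does not verify that the file is a valid jar.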
2.6 Download the dependency jar
# Download
wget https://repo.gradle.org/ui/native/libs-releases/org/gradle/api/plugins/gradle-nexus-plugin/0.7.1/gradle-nexus-plugin-0.7.1.jar
# Install into the local Maven repository
mvn install:install-file -Dfile=./gradle-nexus-plugin-0.7.1.jar \
-DgroupId=org.gradle.api.plugins \
-DartifactId=gradle-nexus-plugin \
-Dversion=0.7.1 \
-Dpackaging=jar
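To confirm that the install step worked, one can look for the jar in the local Maven repository, which stores artifacts under groupId (dots turned into directories), artifactId, and version. The sketch below computes that path; the function name m2_artifact_path is made up for illustration, and the default of ~/.m2/repository is an assumption that holds only if settings.xml does not relocate the repository.

```shell
#!/bin/sh
# Prints the path a jar occupies in a Maven repository with the standard layout:
#   <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
# The optional 4th argument overrides the repository root.
m2_artifact_path() {
    group_id="$1"
    artifact_id="$2"
    version="$3"
    repo="${4:-$HOME/.m2/repository}"
    group_path=$(printf '%s' "$group_id" | tr '.' '/')
    printf '%s/%s/%s/%s/%s-%s.jar\n' \
        "$repo" "$group_path" "$artifact_id" "$version" "$artifact_id" "$version"
}

# Example usage (run manually after mvn install:install-file):
#   ls -l "$(m2_artifact_path org.gradle.api.plugins gradle-nexus-plugin 0.7.1)"
```

The same helper applies to any other jar that has to be installed by hand during the build.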
2.7 Build the distribution
[openuser@ecs-hw00 gobblin-sources-0.17.0]$ ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain
2.8 Obtain the distribution package apache-gobblin-incubating-bin-0.17.0.tar.gz
[openuser@ecs-hw00 ~]$ ll /opt/apps/gobblin-sources-0.17.0/
total 243864
...
-rw-rw-r-- 1 openuser openuser 249263991 Nov 24 11:39 apache-gobblin-incubating-bin-0.17.0.tar.gz
...
3 Deployment
3.1 Extract Gobblin
[openuser@ecs-hw00 gobblin-sources-0.17.0]$ tar -zxvf apache-gobblin-incubating-bin-0.17.0.tar.gz -C /opt/apps/
3.2 Create the job directory
[openuser@ecs-hw00 gobblin-dist]$ mkdir -p /opt/apps/gobblin-dist/workdir
3.3 Environment variables required by Gobblin
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_WORK_DIR=/opt/apps/gobblin-dist/workdir
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_FWDIR=/opt/apps/gobblin-dist
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_LOG_DIR=/opt/apps/gobblin-dist/logs
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_JOB_CONFIG_DIR=/opt/apps/gobblin-dist/conf
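The exports above last only for the current shell session. As a sketch of one way to make them repeatable, the helper below derives all four variables from a single install directory and prints them as export lines; the function name gobblin_env is hypothetical, and the directory layout simply mirrors the paths used above.

```shell
#!/bin/sh
# Prints export lines for the four Gobblin variables, derived from one
# install directory. A trailing slash on the argument is stripped.
gobblin_env() {
    dist="${1%/}"
    printf 'export GOBBLIN_FWDIR="%s"\n' "$dist"
    printf 'export GOBBLIN_WORK_DIR="%s/workdir"\n' "$dist"
    printf 'export GOBBLIN_LOG_DIR="%s/logs"\n' "$dist"
    printf 'export GOBBLIN_JOB_CONFIG_DIR="%s/conf"\n' "$dist"
}

# Example usage (run manually):
#   eval "$(gobblin_env /opt/apps/gobblin-dist)"        # current shell only
#   gobblin_env /opt/apps/gobblin-dist >> ~/.bashrc     # future login shells
```

Appending to a profile script such as the my_env.sh file shown earlier would work the same way, given write permission to /etc/profile.d.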
4 Job example
4.1 Create the MySQL source table and the GaussDB target table
4.2 Write the job configuration file
[openuser@ecs-hw00 gobblin-dist]$ vim workdir/mysql-to-gaussdb.pull
job.name=MySQLToGaussDBJob
job.group=MySQLToGaussDBGroup
job.description=Extract data from MySQL and load into GaussDB.
job.lock.enabled=false
gobblin.scheduler.mode=standalone
# Source
source.class=gobblin.source.extractor.extract.jdbc.JdbcExtractor
extract.namespace=gobblin.extract.jdbc
extract.table.type=SNAPSHOT_ONLY
# MySQL
source.connection.driver=com.mysql.cj.jdbc.Driver
source.connection.url=jdbc:mysql://host:3306/db
source.connection.user=
source.connection.password=
source.table=
# Writer (GaussDB, connected through the PostgreSQL JDBC driver)
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
writer.output.format=jdbc
writer.connection.driver=org.postgresql.Driver
writer.connection.url=jdbc:postgresql://host:8000/db_all_userbase?currentSchema=public&useUnicode=true&characterEncoding=UTF-8
writer.connection.user=
writer.connection.password=
writer.output.database=
writer.output.table=
writer.batch.size=5000
source.max.number.of.partitions=10
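The sample above leaves several values blank (user names, passwords, table names), and these have to be filled in before the job is submitted. The following sketch lists every key in a job file whose value is still empty; the function name empty_keys is made up for illustration, and the parsing assumes simple key=value lines with # comments, as in the file above.

```shell
#!/bin/sh
# Prints every key in a .pull/.properties file whose value is empty,
# one key per line. Comment lines and blank lines are ignored.
empty_keys() {
    awk '
        /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
        {
            pos = index($0, "=")
            if (pos == 0) next                       # not a key=value line
            key = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)              # everything after the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (value == "") print key
        }
    ' "$1"
}

# Example usage (run manually):
#   empty_keys /opt/apps/gobblin-dist/workdir/mysql-to-gaussdb.pull
```

Splitting on the first "=" only keeps JDBC URLs intact, since their query strings contain further "=" characters.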
4.3 Submit the job
[openuser@ecs-hw00 gobblin-dist]$ bin/gobblin.sh cli run --job-conf-file /opt/apps/gobblin-dist/workdir/mysql-to-gaussdb.pull --conf-dir /opt/apps/gobblin-dist/conf/standalone -jobName MySQLToGaussDBJob
4.4 Check the GaussDB target table; the data has been written
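One simple way to check the result is to compare the row count of the source table with that of the target table. The sketch below covers only the comparison step; the function name counts_match is hypothetical, and how the two counts are obtained (client tools, credentials, table names) is left as an assumption in the usage comment.

```shell
#!/bin/sh
# Succeeds only if both arguments are non-empty, purely numeric after
# whitespace is removed, and equal. Client tools often pad their output
# with spaces or newlines, hence the stripping.
counts_match() {
    a=$(printf '%s' "$1" | tr -d '[:space:]')
    b=$(printf '%s' "$2" | tr -d '[:space:]')
    case "$a$b" in
        ''|*[!0-9]*) return 1 ;;                     # nothing, or non-digits present
    esac
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" -eq "$b" ]
}

# Example usage (run manually; HOST, USER, and db.some_table are placeholders):
#   src=$(mysql -h HOST -u USER -p -N -e 'SELECT COUNT(*) FROM db.some_table')
#   dst=...   # the same COUNT(*) query, run with a GaussDB client
#   counts_match "$src" "$dst" && echo "row counts match"
```

Matching counts do not prove that the contents are identical, but a mismatch is a quick signal that the job did not complete as intended.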
5 Lessons learned
5.1 If a jar turns out to be missing during the build, download it manually and install it into the local Maven repository.
5.2 Gradle repository configuration
repositories {
    maven {
        url "https://plugins.gradle.org/m2/"
    }
    maven { url 'https://repo.jfrog.org/artifactory/libs-release' }
    maven { url 'https://repo.jfrog.org/artifactory/libs-snapshot' }
    maven { url 'https://repository.apache.org/content/groups/snapshots/' }
    mavenLocal()
}