Lessons Learned from Adapting Gobblin to GaussDB in Open-Source Development

guochengyi, posted on 2024/12/16 15:19:29
[Abstract] Adapting Gobblin to GaussDB

1 Background

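      Apache Gobblin is an open-source distributed big data integration framework that LinkedIn contributed to the Apache Software Foundation. It aims to simplify common tasks in big data integration, such as extraction, replication, organization, and lifecycle management across both streaming and batch data ecosystems. Gobblin can handle large volumes of data from a variety of sources (for example databases, REST APIs, FTP/SFTP servers, and file systems) and extract, transform, and load (ETL) that data into the Hadoop ecosystem. It takes care of the tasks common to ETL pipelines, such as job scheduling, task partitioning, error handling, state management, and data quality checking and publishing, and so provides a one-stop solution. The main goal of this adaptation is to have Apache Gobblin support the Huawei Cloud GaussDB database, so that Gobblin users running on Huawei Cloud can connect to GaussDB smoothly.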

2 Building Gobblin

      Because the project does not ship a prebuilt installation package, Gobblin has to be built from source.

2.1 Download the latest source package from the official website

2.2 Extract the archive

[openuser@ecs-hw00 data]$ tar -zxvf apache-gobblin-sources-0.17.0.tgz -C /opt/apps/

[openuser@ecs-hw00 gobblin-sources-0.17.0]$ cd /opt/apps/gobblin-sources-0.17.0/
[openuser@ecs-hw00 gobblin-sources-0.17.0]$ ll
total 243864
-rw-r--r--  1 openuser openuser    157071 Jun 14  2023 CHANGELOG.md
-rw-r--r--  1 openuser openuser       754 Feb  3  2022 HEADER
-rw-r--r--  1 openuser openuser     15210 Feb  3  2022 LICENSE
-rw-r--r--  1 openuser openuser       168 Feb  3  2022 NOTICE
-rw-r--r--  1 openuser openuser      5260 Mar 16  2023 README.md
drwxr-xr-x  2 openuser openuser      4096 Jun 13  2023 bin
drwxrwxr-x 92 openuser openuser      4096 Nov 24 11:40 build
-rw-r--r--  1 openuser openuser      8704 Nov 24 11:35 build.gradle
drwxr-xr-x  5 openuser openuser      4096 Nov 23 17:41 buildSrc
drwxr-xr-x 11 openuser openuser      4096 Feb  3  2022 conf
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 config
-rw-r--r--  1 openuser openuser      2038 Jun 14  2023 defaultEnvironment.gradle
drwxr-xr-x  2 openuser openuser      4096 Feb  3  2022 dev
drwxr-xr-x  3 openuser openuser      4096 Jun 13  2023 gobblin-admin
drwxr-xr-x  2 openuser openuser      4096 Feb  3  2022 gobblin-all
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-api
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-audit
drwxr-xr-x  3 openuser openuser      4096 Mar 16  2023 gobblin-aws
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-binary-management
drwxr-xr-x  3 openuser openuser      4096 Jun 13  2023 gobblin-cluster
drwxr-xr-x  4 openuser openuser      4096 Jun 13  2023 gobblin-compaction
drwxr-xr-x  3 openuser openuser      4096 Jun 14  2023 gobblin-completeness
drwxr-xr-x  4 openuser openuser      4096 Feb  3  2022 gobblin-config-management
drwxr-xr-x  3 openuser openuser      4096 Mar 16  2023 gobblin-core
drwxr-xr-x  3 openuser openuser      4096 Dec 17  2022 gobblin-core-base
drwxr-xr-x  4 openuser openuser      4096 Jun 13  2023 gobblin-data-management
drwxr-xr-x  2 openuser openuser      4096 Dec 17  2022 gobblin-distribution
drwxr-xr-x  4 openuser openuser      4096 Feb  3  2022 gobblin-docker
drwxr-xr-x 15 openuser openuser      4096 Feb  3  2022 gobblin-docs
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-example
-rw-r--r--  1 openuser openuser      1187 Feb  3  2022 gobblin-flavored-build.gradle
drwxr-xr-x  4 openuser openuser      4096 Mar 10  2022 gobblin-hive-registration
drwxr-xr-x  3 openuser openuser      4096 Jun 14  2023 gobblin-iceberg
drwxr-xr-x  2 openuser openuser      4096 Feb  3  2022 gobblin-integration-test-log-dir
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-kubernetes
drwxr-xr-x  3 openuser openuser      4096 Dec 17  2022 gobblin-metastore
drwxr-xr-x  4 openuser openuser      4096 Feb  3  2022 gobblin-metrics-libs
drwxr-xr-x 34 openuser openuser      4096 Feb  3  2022 gobblin-modules
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-oozie
drwxr-xr-x  5 openuser openuser      4096 Feb  3  2022 gobblin-rest-service
drwxr-xr-x  5 openuser openuser      4096 Jun 14  2023 gobblin-restli
drwxr-xr-x  3 openuser openuser      4096 Jun 13  2023 gobblin-runtime
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-runtime-hadoop
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-salesforce
drwxr-xr-x  3 openuser openuser      4096 Dec 17  2022 gobblin-service
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-test
drwxr-xr-x  4 openuser openuser      4096 Feb  3  2022 gobblin-test-harness
drwxr-xr-x  4 openuser openuser      4096 Sep 15  2022 gobblin-test-utils
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 gobblin-tunnel
drwxr-xr-x  3 openuser openuser      4096 Jun 13  2023 gobblin-utility
drwxr-xr-x  3 openuser openuser      4096 Jun 14  2023 gobblin-yarn
drwxr-xr-x  5 openuser openuser      4096 Jun 14  2023 gradle
-rw-r--r--  1 openuser openuser      1432 Jun 14  2023 gradle.properties
-rw-r--r--  1 openuser openuser      1432 Nov 24 11:38 gradle.properties.release
-rwxr-xr-x  1 openuser openuser      5829 Feb  3  2022 gradlew
-rwxr-xr-x  1 openuser openuser      3240 Feb  3  2022 gradlew.bat
drwxr-xr-x  3 openuser openuser      4096 Feb  3  2022 ligradle
drwxr-xr-x  2 openuser openuser      4096 Feb  3  2022 lumos_value_audit
drwxr-xr-x  2 openuser openuser      4096 Jun 14  2023 maven-nexus
-rw-r--r--  1 openuser openuser      5561 Feb  3  2022 mkdocs.yml
-rwxr-xr-x  1 openuser openuser      2784 Feb  3  2022 query_github_issues.py
-rw-r--r--  1 openuser openuser        18 Feb  3  2022 readthedocs.yml
-rw-r--r--  1 openuser openuser      3404 Mar 10  2022 settings.gradle

2.3 Modify the dependencies

[openuser@ecs-hw00 gobblin-sources-0.17.0]$ vim build.gradle

  dependencies {
    classpath 'org.apache.ant:ant:1.9.4'
    classpath 'gradle.plugin.org.inferred:gradle-processors:1.1.2'
    classpath 'io.spring.gradle:dependency-management-plugin:1.0.11.RELEASE'
    classpath 'me.champeau.gradle:jmh-gradle-plugin:0.4.8'
    classpath "gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.14.0"
    classpath 'org.jfrog.buildinfo:build-info-extractor-gradle:4.23.4'
  }

2.4 Build prerequisites

      JDK 1.8 or later and Maven are required.

[openuser@ecs-hw00 ~]$ ll /opt/apps/
drwxr-xr-x  9 openuser openuser 4096 Nov 24 19:08 gobblin-dist
drwxr-xr-x 50 openuser openuser 4096 Dec 16 14:17 gobblin-sources-0.17.0
drwxr-xr-x  7 openuser openuser 4096 Apr  2  2019 jdk1.8.0_212
drwxrwxr-x  6 openuser openuser 4096 Nov 23 17:21 maven-3.9.4

[openuser@ecs-hw00 ~]$ cat /etc/profile.d/my_env.sh 
#JAVA_HOME
export JAVA_HOME=/opt/apps/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# MAVEN_HOME
export M2_HOME=/opt/apps/maven-3.9.4
export PATH=$M2_HOME/bin:$PATH
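
Checking the toolchain before starting the build can save a failed Gradle run. The sketch below is only an illustration: the function names `java_major` and `check_build_env` are made up for this example, and it assumes that `java -version` prints the version in double quotes on its first line, as JDK 1.8.0_212 does.

```shell
# Print the Java major version for a version string such as "1.8.0_212" or "11.0.2".
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.0_212 -> 8.0_212
  esac
  echo "${v%%[!0-9]*}"    # keep only the leading digits
}

# Report whether java and mvn are on PATH and whether the JDK is 1.8 or newer.
check_build_env() {
  local raw major
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH"; return 1; }
  command -v mvn  >/dev/null 2>&1 || { echo "mvn not found on PATH"; return 1; }
  # `java -version` writes to stderr, e.g.: java version "1.8.0_212"
  raw="$(java -version 2>&1 | head -n 1 | cut -d '"' -f 2)"
  major="$(java_major "$raw")"
  [ "${major:-0}" -ge 8 ] || { echo "JDK $raw is older than 1.8"; return 1; }
  echo "OK: JDK $raw, $(mvn -v 2>/dev/null | head -n 1)"
}
```

After sourcing /etc/profile.d/my_env.sh, running `check_build_env` should print a single OK line if both tools are picked up from /opt/apps.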

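2.5 Download the Gradle wrapper

       Note: the downloaded jar must end up in the gradle/wrapper directory. The -P option in the command below saves it there directly when run from the source root.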

wget --no-check-certificate -P gradle/wrapper https://github.com/apache/gobblin/raw/0.17.0/gradle/wrapper/gradle-wrapper.jar
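
If the download is blocked or redirected, wget may save something other than the jar under the jar's name, and the wrapper will then not start. As a rough check, relying only on the fact that a jar is a ZIP archive and therefore begins with the bytes `PK`, something like the following could be run from the source root. `looks_like_jar` is a hypothetical helper name used for this sketch.

```shell
# Succeed only if the file exists and starts with the ZIP magic bytes "PK".
# An HTML error page saved under the jar's name would fail this check.
looks_like_jar() {
  [ -f "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

jar="gradle/wrapper/gradle-wrapper.jar"
if looks_like_jar "$jar"; then
  echo "wrapper jar looks valid: $jar"
else
  echo "wrapper jar is missing or is not a jar: $jar"
fi
```

This does not verify that the jar is the right version, only that the download produced an archive rather than an error page.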

2.6 Download the dependency package

# Download
wget https://repo.gradle.org/ui/native/libs-releases/org/gradle/api/plugins/gradle-nexus-plugin/0.7.1/gradle-nexus-plugin-0.7.1.jar

# Install into the local Maven repository
mvn install:install-file -Dfile=./gradle-nexus-plugin-0.7.1.jar \
    -DgroupId=org.gradle.api.plugins \
    -DartifactId=gradle-nexus-plugin \
    -Dversion=0.7.1 \
    -Dpackaging=jar
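
To confirm that the install step worked, it helps to know where Maven puts the file: the local repository uses the layout groupId (dots turned into slashes) / artifactId / version / artifactId-version.jar. The sketch below assumes the default repository location ~/.m2/repository. `m2_jar_path` and the `M2_REPO` override are hypothetical names introduced only for this example.

```shell
# Print the path where `mvn install:install-file` places a jar.
# Assumption: the local repository is ~/.m2/repository unless M2_REPO
# (a variable invented for this sketch) points somewhere else.
m2_jar_path() {
  local group="$1" artifact="$2" version="$3"
  local repo="${M2_REPO:-$HOME/.m2/repository}"
  echo "$repo/$(echo "$group" | tr '.' '/')/$artifact/$version/$artifact-$version.jar"
}

path="$(m2_jar_path org.gradle.api.plugins gradle-nexus-plugin 0.7.1)"
if [ -f "$path" ]; then
  echo "installed: $path"
else
  echo "not installed yet: $path"
fi
```

The Gradle build can only pick this jar up if `mavenLocal()` is among its repositories.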

2.7 Build the distribution

[openuser@ecs-hw00 gobblin-sources-0.17.0]$ ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain

2.8 The resulting distribution: apache-gobblin-incubating-bin-0.17.0.tar.gz

[openuser@ecs-hw00 ~]$ ll /opt/apps/gobblin-sources-0.17.0/
total 243864
-rw-rw-r--  1 openuser openuser 249263991 Nov 24 11:39 apache-gobblin-incubating-bin-0.17.0.tar.gz
(all other entries are identical to the listing in 2.2 and are omitted here)
[openuser@ecs-hw00 ~]$

3 Deployment

3.1 Extract Gobblin

[openuser@ecs-hw00 gobblin-sources-0.17.0]$ tar -zxvf apache-gobblin-incubating-bin-0.17.0.tar.gz -C /opt/apps/

3.2 Create the job working directory

[openuser@ecs-hw00 gobblin-dist]$ mkdir -p /opt/apps/gobblin-dist/workdir

3.3 Environment variables required by Gobblin

[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_WORK_DIR=/opt/apps/gobblin-dist/workdir
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_FWDIR=/opt/apps/gobblin-dist
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_LOG_DIR=/opt/apps/gobblin-dist/logs
[openuser@ecs-hw00 gobblin-dist]$ export GOBBLIN_JOB_CONFIG_DIR=/opt/apps/gobblin-dist/conf
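
Only the work directory is created explicitly in the steps above, so it may be worth confirming that all four variables are set and point at existing directories before running a job. A minimal sketch, with `check_gobblin_env` as a hypothetical helper name:

```shell
# Report every Gobblin variable that is unset or points to a missing directory.
# Returns 0 only when all four variables are usable.
check_gobblin_env() {
  local name dir rc=0
  for name in GOBBLIN_WORK_DIR GOBBLIN_FWDIR GOBBLIN_LOG_DIR GOBBLIN_JOB_CONFIG_DIR; do
    eval "dir=\${$name:-}"          # read the variable whose name is in $name
    if [ -z "$dir" ]; then
      echo "$name is not set"; rc=1
    elif [ ! -d "$dir" ]; then
      echo "$name points to a missing directory: $dir"; rc=1
    fi
  done
  return "$rc"
}

if check_gobblin_env; then
  echo "Gobblin environment looks complete"
fi
```

Because the exports are typed into an interactive shell, they are lost when the session ends. Adding them to a profile script such as the my_env.sh used earlier would make them persistent.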

4 Example job

4.1 Create the MySQL source table and the GaussDB target table

4.2 Write the job configuration file

[openuser@ecs-hw00 gobblin-dist]$ vim workdir/mysql-to-gaussdb.pull

job.name=MySQLToGaussDBJob
job.group=MySQLToGaussDBGroup
job.description=Extract data from MySQL and load into GaussDB.

job.lock.enabled=false
gobblin.scheduler.mode=standalone

# Source 
source.class=gobblin.source.extractor.extract.jdbc.JdbcExtractor
extract.namespace=gobblin.extract.jdbc
extract.table.type=SNAPSHOT_ONLY

# MySQL 
source.connection.driver=com.mysql.cj.jdbc.Driver
source.connection.url=jdbc:mysql://host:3306/db
source.connection.user=
source.connection.password=
source.table=

# GaussDB target, accessed through the PostgreSQL JDBC driver
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
writer.output.format=jdbc
writer.connection.driver=org.postgresql.Driver
writer.connection.url=jdbc:postgresql://host:8000/db_all_userbase?currentSchema=public&useUnicode=true&characterEncoding=UTF-8
writer.connection.user=
writer.connection.password=
writer.output.database=
writer.output.table=

writer.batch.size=5000
source.max.number.of.partitions=10
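
The template above leaves the user, password, table, and database values blank. A small pre-flight check can list which keys are still empty before the job is submitted. This is only a sketch: `empty_keys` is a hypothetical helper, and it treats the file as simple `key=value` lines with `#` comments.

```shell
# Print every key in a key=value file whose value is empty,
# skipping blank lines and lines that start with "#".
empty_keys() {
  awk -F '=' '
    /^[ \t]*(#|$)/ { next }                    # comment or blank line
    {
      key = $1
      val = substr($0, index($0, "=") + 1)     # everything after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", val)         # trim surrounding whitespace
      if (index($0, "=") > 0 && val == "") print key
    }
  ' "$1"
}

conf="workdir/mysql-to-gaussdb.pull"
if [ -f "$conf" ]; then
  empty_keys "$conf"
fi
```

Any key it prints, such as `source.connection.user`, most likely still needs a value for your environment.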

4.3 Submit the job

[openuser@ecs-hw00 gobblin-dist]$ bin/gobblin.sh cli run --job-conf-file /opt/apps/gobblin-dist/workdir/mysql-to-gaussdb.pull  --conf-dir /opt/apps/gobblin-dist/conf/standalone -jobName MySQLToGaussDBJob

4.4 Check the GaussDB target table; the data has been written

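5 Lessons learned

5.1 If a jar turns out to be missing during the build, download it manually and install it into the local Maven repository.

5.2 Gradle repository configuration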

 repositories {
    maven {
      url "https://plugins.gradle.org/m2/"
    }
    maven { url 'https://repo.jfrog.org/artifactory/libs-release' }
    maven { url 'https://repo.jfrog.org/artifactory/libs-snapshot' }
    maven { url 'https://repository.apache.org/content/groups/snapshots/' }
    mavenLocal() 
  }

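[Copyright notice] This article is original content by a Huawei Cloud community user. When reposting, you must cite the source (Huawei Cloud community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that you suspect is plagiarized, you are welcome to report it by email with supporting evidence; once verified, the community will immediately remove the infringing content. Report email: cloudbbs@huaweicloud.com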