Step by Step: Building and Using Apache Impala on Huawei Cloud
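[Abstract] This is a condensed guide to building and using Apache Impala. Like the previous post on Apache Kudu, every step has been tried out by the author on the Huawei Cloud platform. I hope it helps anyone interested in these two components of the big data ecosystem :)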
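1 Preface
Yesterday I shared how to build and use Apache Kudu on Huawei Cloud. Today I continue with Apache Impala and walk you step by step through building a local Impala cluster from source. The build also preloads the TPC-DS and TPC-H test datasets at 1GB scale, after which we can run some familiar interactive SQL queries.
Impala depends on many components: starting the cluster also starts HDFS, KMS, YARN, Hive, HBase, Kudu, Ranger and Impala itself. This is probably one of the main reasons Impala scares people off.
Note: once again, everything below can be done with just Ctrl+C & Ctrl+V :)
2 Preparation
Before starting, I recommend buying an ECS (Elastic Cloud Server) on Huawei Cloud. To make the later steps go smoothly, the server should meet the following requirements: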
CPU architecture: x86 computing
Flavor: c6.2xlarge.4 (faster compilation and more memory)
Image: public image, CentOS 8.0 64bit
System disk: High I/O, 100GB
EIP: billed by traffic (faster downloads)
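Before going further, it may be worth checking that the server really has the resources listed above. The sketch below is my own addition, not one of the original steps: `preflight` is a hypothetical helper that reads the CPU count with `nproc`, total memory from `/proc/meminfo` and free disk space with `df`, and fails if any value is below the thresholds you pass in. The example thresholds assume that c6.2xlarge.4 provides 8 vCPUs and 32 GiB of memory; check the flavor details in the console.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not from the original article).
# Usage: preflight MIN_CPUS MIN_MEM_GIB MIN_FREE_DISK_GIB [PATH]
preflight() {
  local min_cpus="$1" min_mem_gib="$2" min_disk_gib="$3" path="${4:-/}"
  local cpus mem_kib mem_gib disk_gib status=0

  cpus=$(nproc)
  # MemTotal in /proc/meminfo is reported in KiB.
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_gib=$(( mem_kib / 1024 / 1024 ))
  # df -P -k: POSIX layout in 1 KiB blocks; field 4 of line 2 is "Available".
  disk_gib=$(df -P -k "$path" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

  echo "cpus=${cpus} mem_gib=${mem_gib} free_disk_gib=${disk_gib}"
  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "need at least ${min_cpus} CPUs" >&2; status=1
  fi
  if [ "$mem_gib" -lt "$min_mem_gib" ]; then
    echo "need at least ${min_mem_gib} GiB of memory" >&2; status=1
  fi
  if [ "$disk_gib" -lt "$min_disk_gib" ]; then
    echo "need at least ${min_disk_gib} GiB free on ${path}" >&2; status=1
  fi
  return "$status"
}
```

For example, run `preflight 8 30 80` before cloning anything. MemTotal is always a little below the nominal memory size because the kernel reserves some of it, which is why the sketch uses 30 rather than 32; the 80 GiB disk threshold is likewise an assumed figure for what remains of the 100GB system disk after the OS is installed.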
3 Operating system
Install packages
[root@ecs-impala ~]# yum install -y git ant maven.noarch python2.x86_64 python2-devel.x86_64 redhat-rpm-config postgresql postgresql-server lzo-devel 'cyrus-sasl*' krb5-devel.x86_64 krb5-server.x86_64 autoconf automake libtool flex rsync gcc-c++.x86_64 openssl-devel.x86_64
Point `python` at python2
[root@ecs-impala ~]# cd /usr/bin
[root@ecs-impala bin]# ln -s python2.7 python
[root@ecs-impala bin]# ls -lrt python*
lrwxrwxrwx 1 root root 16 Nov 17 2019 python2-config -> python2.7-config
-rwxr-xr-x 1 root root 1846 Nov 17 2019 python2.7-config
lrwxrwxrwx 1 root root 9 Nov 17 2019 python2 -> python2.7
-rwxr-xr-x 1 root root 10760 Nov 17 2019 python2.7
lrwxrwxrwx 1 root root 32 Nov 21 2019 python3.6m -> /usr/libexec/platform-python3.6m
lrwxrwxrwx 1 root root 31 Nov 21 2019 python3.6 -> /usr/libexec/platform-python3.6
lrwxrwxrwx 1 root root 25 Feb 12 10:34 python3 -> /etc/alternatives/python3
lrwxrwxrwx 1 root root 9 Jun 9 19:03 python -> python2.7
Passwordless SSH login
[root@ecs-impala ~]# ssh-keygen -t rsa
[root@ecs-impala ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
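As written, `ssh-keygen` prompts for a file name and passphrase, and running the `cat >>` line twice appends a duplicate key. The helper below is a hypothetical sketch, not from the original: it appends a public key to `authorized_keys` only if that exact line is not already there, and sets the conventional permissions (700 on the directory, 600 on the file).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append_key_once PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Appends the public key only if that exact line is not already present.
append_key_once() {
  local pubkey="$1" authfile="$2"
  mkdir -p "$(dirname "$authfile")"
  chmod 700 "$(dirname "$authfile")"
  touch "$authfile"
  chmod 600 "$authfile"
  # -q quiet, -x whole-line match, -F fixed strings, -f read patterns from PUBKEY_FILE
  if ! grep -qxF -f "$pubkey" "$authfile"; then
    cat "$pubkey" >> "$authfile"
  fi
}
```

Typical use: generate the key without prompts via `ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa`, call `append_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`, then confirm that `ssh localhost true` returns without asking for a password.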
Create the HDFS directory:
[root@ecs-impala ~]# mkdir -p /var/lib/hadoop-hdfs
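Initialize the Hive Metastore database
Here PostgreSQL is used as the example. Edit the configuration file to change the three `peer` and `ident` entries below to `trust`, then create the user and grant privileges: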
[root@ecs-impala ~]# service postgresql initdb
[root@ecs-impala ~]# vim /var/lib/pgsql/data/pg_hba.conf
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
[root@ecs-impala ~]# service postgresql restart
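Editing `pg_hba.conf` by hand in vim is easy to get wrong. As a hypothetical alternative, the `sed` sketch below switches the authentication method to `trust` only on `local`/`host` lines whose database column is `all`. Assuming the default CentOS 8 `pg_hba.conf`, those are exactly the three lines shown above, and the `replication` lines are left untouched. A `.bak` copy of the original file is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_pg_trust PG_HBA_FILE
# On "local"/"host" lines whose database column is "all", replace a trailing
# "peer" or "ident" auth method with "trust". Writes a backup to PG_HBA_FILE.bak.
set_pg_trust() {
  local conf="$1"
  sed -i.bak -E \
    '/^[[:space:]]*(local|host)[[:space:]]+all[[:space:]]/ s/(peer|ident)[[:space:]]*$/trust/' \
    "$conf"
}
```

Usage: `set_pg_trust /var/lib/pgsql/data/pg_hba.conf && service postgresql restart`, replacing the vim step.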
[root@ecs-impala ~]# sudo -iu postgres
[postgres@ecs-impala ~]$ psql
psql (10.6)
Type "help" for help.
postgres=# CREATE ROLE hiveuser LOGIN PASSWORD 'password';
CREATE ROLE
postgres=# ALTER ROLE hiveuser WITH CREATEDB;
ALTER ROLE
postgres=# \q
[postgres@ecs-impala ~]$ exit
[root@ecs-impala ~]# useradd hiveuser
[root@ecs-impala ~]# sudo -iu hiveuser
[hiveuser@ecs-impala ~]$ psql -dpostgres
psql (10.6)
Type "help" for help.
postgres=> create database "HMS_root_impala_cdp" owner hiveuser;
CREATE DATABASE
postgres=> grant all privileges on database "HMS_root_impala_cdp" to hiveuser;
GRANT
postgres=> \q
[hiveuser@ecs-impala ~]$ exit
logout
[root@ecs-impala ~]#
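The two interactive `psql` sessions above can also be scripted. The hypothetical helper below just prints the same four statements for a given role, password and database name, so they can be piped into `psql` as the `postgres` superuser in one go. The values are interpolated without escaping, so keep to simple names like the ones used here. It is also not idempotent: a second run fails because the role already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hms_bootstrap_sql ROLE PASSWORD DATABASE
# Prints the SQL from the interactive sessions above, one statement per line.
hms_bootstrap_sql() {
  local role="$1" password="$2" db="$3"
  cat <<EOF
CREATE ROLE ${role} LOGIN PASSWORD '${password}';
ALTER ROLE ${role} WITH CREATEDB;
CREATE DATABASE "${db}" OWNER ${role};
GRANT ALL PRIVILEGES ON DATABASE "${db}" TO ${role};
EOF
}
```

Usage: `hms_bootstrap_sql hiveuser password HMS_root_impala_cdp | sudo -iu postgres psql -v ON_ERROR_STOP=1`. Because the superuser creates the database with `OWNER hiveuser`, the end state should match the step-by-step version.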
4 Build the hadoop-lzo library
[root@ecs-impala ~]# git clone https://github.com/cloudera/hadoop-lzo.git
[root@ecs-impala ~]# cd ~/hadoop-lzo
[root@ecs-impala hadoop-lzo]# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el8_1.x86_64
[root@ecs-impala hadoop-lzo]# ant package
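The `JAVA_HOME` above hard-codes one particular OpenJDK build (1.8.0.252), which will not exist on a machine that installed a newer package. A hypothetical alternative, assuming a JDK with `javac` is on the `PATH`, is to derive it from wherever `javac` really lives: resolve the symlink chain (on CentOS it goes through `/etc/alternatives`) and strip the trailing `bin/javac`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: java_home_from JAVAC_PATH
# Resolves all symlinks in JAVAC_PATH and prints the directory two levels up,
# i.e. .../jdk/bin/javac -> .../jdk
java_home_from() {
  local javac_path
  javac_path=$(readlink -f "$1") || return 1
  dirname "$(dirname "$javac_path")"
}
```

Usage: `export JAVA_HOME=$(java_home_from "$(command -v javac)")` before `ant package`.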
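5 Build Impala from source
Building Impala and loading the test data take a long time, generally measured in hours, so be patient. The build may also fail partway through and need several attempts -_-||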
[root@ecs-impala ~]# git clone https://github.com/apache/impala.git
[root@ecs-impala ~]# cd impala
[root@ecs-impala impala]# ./buildall.sh -noclean -testdata -format_metastore
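Since the build may need several attempts, a small retry wrapper can save some babysitting. This is a hypothetical helper, not part of Impala's scripts: it reruns a command up to a given number of times and stops at the first success.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
retry() {
  local max="$1" attempt=1
  shift
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max} failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

Usage: `retry 3 ./buildall.sh -noclean -testdata -format_metastore`. Whether rerunning with exactly the same flags is always appropriate depends on where the previous attempt failed; the sketch makes no attempt to detect that.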
6 Testing and verification
Once the build and test data loading above have finished, you can happily start running SQL:
[root@ecs-impala impala]# source bin/impala-config.sh
...
[root@ecs-impala impala]# impala-shell.sh
Starting Impala Shell with no authentication using Python 2.7.16
Opened TCP connection to localhost.localdomain:21000
Connected to localhost.localdomain:21000
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build f4f7fb53a48f114f520737af7be2433a5afd03d4)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v4.0.0-SNAPSHOT (f4f7fb5) built on Wed Jun 10 14:32:22 CST 2020)
You can run a single query from the command line using the '-q' option.
***********************************************************************************
[localhost.localdomain:21000] default>
[localhost.localdomain:21000] default> show databases;
...
[localhost.localdomain:21000] default>
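The banner above mentions the `-q` option for running a single query from the command line, which allows a quick scripted sanity check. The helper below is a hypothetical sketch: it reads `show databases` output in `-B` (tab-delimited) form on stdin and succeeds only if every named database is listed. The names `tpch` and `tpcds` are an assumption about what the test data load creates; compare against your own `show databases;` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: has_databases NAME...
# Reads tab-delimited `show databases` output on stdin; fails if any NAME
# is not present in the first column.
has_databases() {
  local names db
  names=$(cut -f1)
  for db in "$@"; do
    if ! printf '%s\n' "$names" | grep -qxF "$db"; then
      echo "missing database: $db" >&2
      return 1
    fi
  done
}
```

Usage: `impala-shell.sh -B -q 'show databases;' | has_databases tpch tpcds && echo OK`.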
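[Statement] This content comes from a blogger in the Huawei Cloud Developer Community and does not represent the views or positions of Huawei Cloud or the Huawei Cloud Developer Community. Reprints must credit the source (Huawei Cloud Community), the article link, the author and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community suspected of plagiarism, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com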