大数据物流项目:主题及指标开发之即席查询引擎Impala(分布式内存计算)(十一.五)
09-[掌握]-即席查询之Impala 服务启动及CLI使用
物流项目中,Impala分析引擎使用CM进行安装(impala就是Cloudera开源框架),注意事项:
- 1)、安装目录:
/opt/cloudera/parcels/CDH/lib/impala
- 2)、配置文件:
/etc/impala/conf/
- 3)、日志文件:
/val/log
- step1、启动HDFS分布式系统,实际项目中HDFS肯定HA高可用集群
- step2、启动Hive MetaStore服务
- step3、启动Impala框架服务
- Impalad服务、State Stored服务和Catalogd服务
当上述所有服务启动完成以后,可以打开Impala框架中各个服务提供WEB UI界面
- Impalad 服务界面:http://node2.itcast.cn:25000/
- 2)、Impala StateStored 服务:http://node2.itcast.cn:25010/
-
3)、Impala Catalogd元数据服务:http://node2.itcast.cn:25020/
Impala分析引擎提供客户端给用户,方便编写SQL语句,进行查询分析数据
impala提供命令行:
impala-shell
[root@node2 ~]# impala-shell --help
Usage: impala_shell.py [options]
Options:
-h, --help show this help message and exit
-i IMPALAD, --impalad=IMPALAD
<host:port> of impalad to connect to
[default: node2.itcast.cn:21000]
-b KERBEROS_HOST_FQDN, --kerberos_host_fqdn=KERBEROS_HOST_FQDN
If set, overrides the expected hostname of the
Impalad's kerberos service principal. impala-shell
will check that the server's principal matches this
hostname. This may be used when impalad is configured
to be accessed via a load-balancer, but it is desired
for impala-shell to talk to a specific impalad
directly. [default: none]
-q QUERY, --query=QUERY
Execute a query without the shell [default: none]
-f QUERY_FILE, --query_file=QUERY_FILE
Execute the queries in the query file, delimited by ;.
If the argument to -f is "-", then queries are read
from stdin and terminated with ctrl-d. [default: none]
-k, --kerberos Connect to a kerberized impalad [default: False]
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
If set, query results are written to the given file.
Results from multiple semicolon-terminated queries
will be appended to the same file [default: none]
-B, --delimited Output rows in delimited mode [default: False]
--print_header Print column names in delimited mode when pretty-
printed. [default: False]
--output_delimiter=OUTPUT_DELIMITER
Field delimiter to use for output in delimited mode
[default: \t]
-s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
Service name of a kerberized impalad [default: impala]
-V, --verbose Verbose output [default: True]
-p, --show_profiles Always display query profiles after execution
[default: False]
--quiet Disable verbose output [default: False]
-v, --version Print version information [default: False]
-c, --ignore_query_failure
Continue on query failure [default: False]
-d DEFAULT_DB, --database=DEFAULT_DB
Issues a use database command on startup
[default: none]
-l, --ldap Use LDAP to authenticate with Impala. Impala must be
configured to allow LDAP authentication.
[default: False]
-u USER, --user=USER User to authenticate with. [default: root]
--ssl Connect to Impala via SSL-secured connection
[default: False]
--ca_cert=CA_CERT Full path to certificate file used to authenticate
Impala's SSL certificate. May either be a copy of
Impala's certificate (for self-signed certs) or the
certificate of a trusted third-party CA. If not set,
but SSL is enabled, the shell will NOT verify Impala's
server certificate [default: none]
--config_file=CONFIG_FILE
Specify the configuration file to load options. The
following sections are used: [impala],
[impala.query_options]. Section names are case
sensitive. Specifying this option within a config file
will have no effect. Only specify this as an option in
the commandline. [default: /root/.impalarc]
--history_file=HISTORY_FILE
The file in which to store shell history. This may
also be configured using the IMPALA_HISTFILE
environment variable. [default: ~/.impalahistory]
--live_summary Print a query summary every 1s while the query is
running. [default: False]
--live_progress Print a query progress every 1s while the query is
running. [default: False]
--auth_creds_ok_in_clear
If set, LDAP authentication may be used with an
insecure connection to Impala. WARNING: Authentication
credentials will therefore be sent unencrypted, and
may be vulnerable to attack. [default: none]
--ldap_password_cmd=LDAP_PASSWORD_CMD
Shell command to run to retrieve the LDAP password
[default: none]
--var=KEYVAL Defines a variable to be used within the Impala
session. Can be used multiple times to set different
variables. It must follow the pattern "KEY=VALUE", KEY
starts with an alphabetic character and contains
alphanumeric characters or underscores. [default:
none]
-Q QUERY_OPTIONS, --query_option=QUERY_OPTIONS
Sets the default for a query option. Can be used
multiple times to set different query options. It must
follow the pattern "KEY=VALUE", KEY must be a valid
query option. Valid query options can be listed by
command 'set'. [default: none]
-t CLIENT_CONNECT_TIMEOUT_MS, --client_connect_timeout_ms=CLIENT_CONNECT_TIMEOUT_MS
Timeout in milliseconds after which impala-shell will
time out if it fails to connect to Impala server. Set
to 0 to disable any timeout. [default: 60000]
发现impala-shell命令行执行python脚本:
[root@node2 ~]# impala-shell -i node2.itcast.cn:21000
Starting Impala Shell without Kerberos authentication
Opened TCP connection to node2.itcast.cn:21000
Connected to node2.itcast.cn:21000
Server version: impalad version 3.2.0-cdh6.2.1 RELEASE (build 525e372410dd2ce206e2ad0f21f57cae7380c0cb)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.2.0-cdh6.2.1 (525e372) built on Wed Sep 11 01:30:44 PDT 2019)
You can change the Impala daemon that you're connected to by using the CONNECT
command.To see how Impala will plan to run your query without actually executing
it, use the EXPLAIN command. You can change the level of detail in the EXPLAIN
output by setting the EXPLAIN_LEVEL query option.
***********************************************************************************
[node2.itcast.cn:21000] default> show databases ;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
| logistics | ??????? |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.52s
[node2.itcast.cn:21000] default> use logistics ;
Query: use logistics
[node2.itcast.cn:21000] logistics>
[node2.itcast.cn:21000] logistics> show tables ;
Query: show tables
+--------------------------------+
| name |
+--------------------------------+
| tbl_address |
| tbl_areas |
| tbl_charge_standard |
| tbl_codes |
| tbl_collect_package |
| tbl_company |
| tbl_company_dot_map |
| tbl_company_transport_route_ma |
| tbl_company_warehouse_map |
| tbl_consumer_address_map |
| tbl_consumer_sender_info |
| tbl_courier |
| tbl_customer |
| tbl_customer_detail |
| tbl_deliver_package |
| tbl_deliver_region |
| tbl_delivery_record |
| tbl_department |
| tbl_dot |
| tbl_dot_transport_tool |
| tbl_dot_transport_tool_detail |
| tbl_driver |
| tbl_emp |
| tbl_emp_info_map |
| tbl_example1 |
| tbl_express_bill |
| tbl_express_bill_detail |
| tbl_express_package |
| tbl_fixed_area |
| tbl_goods_rack |
| tbl_job |
| tbl_out_warehouse |
| tbl_out_warehouse_detail |
| tbl_pkg |
| tbl_postal_standard |
| tbl_push_warehouse |
| tbl_push_warehouse_detail |
| tbl_route |
| tbl_service_evaluation |
| tbl_store_grid |
| tbl_transport_tool |
| tbl_vehicle_monitor |
| tbl_warehouse |
| tbl_warehouse_emp |
| tbl_warehouse_rack_map |
| tbl_warehouse_receipt |
| tbl_warehouse_receipt_detail |
| tbl_warehouse_send_vehicle |
| tbl_warehouse_transport_detail |
| tbl_warehouse_transport_tool |
| tbl_warehouse_vehicle_map |
| tbl_waybill |
| tbl_waybill_detail |
| tbl_waybill_line |
| tbl_waybill_state_record |
| tbl_work_time |
| test |
+--------------------------------+
Fetched 57 row(s) in 0.03s
[node2.itcast.cn:21000] logistics> select * from tbl_express_bill limit 10 ;
Query: select * from tbl_express_bill limit 10
Query submitted at: 2021-03-21 17:21:28 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=ed4ee096548edaa5:6380ed8300000000
10-[理解]-即席查询之使用Impala操作Kudu
Impala与Kudu集成,集成时,需要注意:启动相关服务组件和在Impala中创建表映射到Kudu表中。
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-c4iljpwb-1641192152644)(/img/1616318681670.png)]
首先使用impala-shell连接impalad服务,创建数据库:
[root@node2 ~]# impala-shell -i node2.itcast.cn:21000
Starting Impala Shell without Kerberos authentication
Opened TCP connection to node2.itcast.cn:21000
Connected to node2.itcast.cn:21000
Server version: impalad version 3.2.0-cdh6.2.1 RELEASE (build 525e372410dd2ce206e2ad0f21f57cae7380c0cb)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.2.0-cdh6.2.1 (525e372) built on Wed Sep 11 01:30:44 PDT 2019)
The '-B' command line flag turns off pretty-printing for query results. Use this
flag to remove formatting from results you want to save for later, or to benchmark
Impala.
***********************************************************************************
[node2.itcast.cn:21000] default> create database db_kudu ;
Query: create database db_kudu
+----------------------------+
| summary |
+----------------------------+
| Database has been created. |
+----------------------------+
Fetched 1 row(s) in 2.60s
[node2.itcast.cn:21000] default> use db_kudu ;
Query: use db_kudu
[node2.itcast.cn:21000] db_kudu>
接下来,Impala与Kudu表集成:
- 1)、方式一、管理表集成
- 相当Hive中管理表,删除表时,元数据和数据都被删除
- impala中表删除,kudu中对应表也被删除
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-mvRHbLTa-1641192152645)(img/1613900209137.png)]
[node2.itcast.cn:21000] db_kudu> CREATE TABLE `my_first_table`
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> )
> PARTITION BY HASH PARTITIONS 16
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.num_tablet_replicas' = '1'
> );
Query: CREATE TABLE `my_first_table`
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1'
)
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 5.73s
[node2.itcast.cn:21000] db_kudu> show tables ;
Query: show tables
+----------------+
| name |
+----------------+
| my_first_table |
+----------------+
Fetched 1 row(s) in 0.04s
[node2.itcast.cn:21000] db_kudu> drop table if exists my_first_table;
Query: drop table if exists my_first_table
+-------------------------+
| summary |
+-------------------------+
| Table has been dropped. |
+-------------------------+
Fetched 1 row(s) in 7.10s
[node2.itcast.cn:21000] db_kudu> show tables ;
Query: show tables
Fetched 0 row(s) in 0.07s
- 2)、方式二、外部表集成
- 相当Hive中外部表,删除表示,仅仅删除元数据
- impala中表删除,kudu中表没有被删除
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-oyZxEt1E-1641192152645)(img/1613900499712.png)]
[node2.itcast.cn:21000] db_kudu> CREATE EXTERNAL TABLE `tbl_emp` STORED AS KUDU
> TBLPROPERTIES(
> 'kudu.table_name' = 'tbl_emp',
> 'kudu.master_addresses' = 'node2.itcast.cn:7051') ;
Query: CREATE EXTERNAL TABLE `tbl_emp` STORED AS KUDU
TBLPROPERTIES(
'kudu.table_name' = 'tbl_emp',
'kudu.master_addresses' = 'node2.itcast.cn:7051')
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 0.55s
[node2.itcast.cn:21000] db_kudu> show tables ;
Query: show tables
+---------+
| name |
+---------+
| tbl_emp |
+---------+
Fetched 1 row(s) in 0.01s
[node2.itcast.cn:21000] db_kudu>
[node2.itcast.cn:21000] db_kudu> select * from tbl_emp limit 10 ;
Query: select * from tbl_emp limit 10
Query submitted at: 2021-03-21 17:31:49 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=bc42a1cbb7be5cb6:aeee39e400000000
+----+---------------------+---------------------+-----------+---------+-----------+--------+-------+---------------------+
| id | cdt | empbirathday | empgender | empname | empnumber | remark | state | udt |
+----+---------------------+---------------------+-----------+---------+-----------+--------+-------+---------------------+
| 1 | 2016-03-11 03:07:36 | 1975-03-30 04:48:53 | 1 | 肥项禹 | 98537 | NULL | 1 | 2016-03-11 03:07:37 |
| 8 | 2016-03-11 03:07:36 | 2006-03-12 06:38:08 | 1 | 昔又香 | 87686 | NULL | 1 | 2016-03-11 03:07:37 |
| 10 | 2016-03-11 03:07:36 | 2004-11-11 15:00:22 | 1 | 哀以松 | 18487 | NULL | 1 | 2016-03-11 03:07:37 |
| 11 | 2016-03-11 03:07:36 | 2009-06-22 08:07:23 | 1 | 聊浦和 | 24470 | NULL | 1 | 2016-03-11 03:07:37 |
| 14 | 2016-03-11 03:07:36 | 2020-01-15 06:42:09 | 1 | 洋冬萱 | 25579 | NULL | 1 | 2016-03-11 03:07:37 |
| 16 | 2016-03-11 03:07:36 | 2001-04-08 00:53:17 | 1 | 席慧云 | 64030 | NULL | 1 | 2016-03-11 03:07:37 |
| 17 | 2016-03-11 03:07:36 | 2017-09-19 19:59:52 | 1 | 干月灵 | 46621 | NULL | 1 | 2016-03-11 03:07:37 |
| 19 | 2016-03-11 03:07:36 | 2010-12-21 16:35:40 | 1 | 钮幻桃 | 35884 | NULL | 1 | 2016-03-11 03:07:37 |
| 27 | 2016-03-11 03:07:36 | 2008-12-06 02:37:57 | 1 | 才雪容 | 76262 | NULL | 1 | 2016-03-11 03:07:37 |
| 29 | 2016-03-11 03:07:36 | 2009-12-27 04:27:34 | 1 | 禹涵桃 | 63525 | NULL | 1 | 2016-03-11 03:07:37 |
+----+---------------------+---------------------+-----------+---------+-----------+--------+-------+---------------------+
Fetched 10 row(s) in 26.04s
[node2.itcast.cn:21000] db_kudu> drop table tbl_emp ;
Query: drop table tbl_emp
+-------------------------+
| summary |
+-------------------------+
| Table has been dropped. |
+-------------------------+
Fetched 1 row(s) in 7.22s
[node2.itcast.cn:21000] db_kudu> show tables ;
Query: show tables
Fetched 0 row(s) in 0.02s
Impala DML操作,包含数据插入、数据更新和数据删除以及数据查询分析。
CREATE TABLE `my_first_table`
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1'
);
INSERT INTO my_first_table VALUES (50, "zhangsan");
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5HlXBUyT-1641192152646)(/img/1616319412797.png)]
[node2.itcast.cn:21000] db_kudu> INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
Query: INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")
Query submitted at: 2021-03-21 17:37:07 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=974337f26ef055a2:3509d1b900000000
Modified 3 row(s), 0 row error(s) in 0.26s
[node2.itcast.cn:21000] db_kudu>
[node2.itcast.cn:21000] db_kudu> select * from my_first_table ;
Query: select * from my_first_table
Query submitted at: 2021-03-21 17:37:09 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=6a4608a12ee92237:9110732f00000000
+----+----------+
| id | name |
+----+----------+
| 50 | zhangsan |
| 2 | jane |
| 3 | jim |
| 1 | john |
+----+----------+
Fetched 4 row(s) in 0.27s
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-etuNkpLa-1641192152646)(/img/1616319469934.png)]
Impala与Kudu集成时,插入数据方式:
1、insert单条记录
2、insert多条记录
3、子查询插入
更新数据:update语句
[node2.itcast.cn:21000] db_kudu>
[node2.itcast.cn:21000] db_kudu> UPDATE my_first_table SET name="xiaowang" where id =1 ;
Query: UPDATE my_first_table SET name="xiaowang" where id =1
Query submitted at: 2021-03-21 17:38:11 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=314c908c835d71e3:bb0e700c00000000
Modified 1 row(s), 0 row error(s) in 0.53s
[node2.itcast.cn:21000] db_kudu>
[node2.itcast.cn:21000] db_kudu> select * from my_first_table ;
Query: select * from my_first_table
Query submitted at: 2021-03-21 17:38:14 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=c949987ff8a003f6:db504d1a00000000
+----+----------+
| id | name |
+----+----------+
| 50 | zhangsan |
| 2 | jane |
| 3 | jim |
| 1 | xiaowang |
+----+----------+
Fetched 4 row(s) in 0.15s
删除数据:delete语句
[node2.itcast.cn:21000] db_kudu> delete from my_first_table where id =2;
Query: delete from my_first_table where id =2
Query submitted at: 2021-03-21 17:39:35 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=5b48f7834e5ca05c:72db5f8f00000000
Modified 1 row(s), 0 row error(s) in 0.42s
[node2.itcast.cn:21000] db_kudu>
[node2.itcast.cn:21000] db_kudu> select * from my_first_table ;
Query: select * from my_first_table
Query submitted at: 2021-03-21 17:39:37 (Coordinator: http://node2.itcast.cn:25000)
Query progress can be monitored at: http://node2.itcast.cn:25000/query_plan?query_id=d04152da32a15d13:3397d4fa00000000
+----+----------+
| id | name |
+----+----------+
| 50 | zhangsan |
| 3 | jim |
| 1 | xiaowang |
+----+----------+
Fetched 3 row(s) in 0.13s
11-[理解]-即席查询之Hue集成Impala
Hue与Impala集成,为客户提供可视化WEB UI界面,编写SQL语句分析数据,企业中常用方式。
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-V7sBIkhK-1641192152646)(/img/1616319782217.png)]
Hue安装采用CM进行安装,Hue框架也是Cloudera公司开源,页面启动Hue服务。
==Hue安装就可以配置与impala集成,以及其他框架集成,启动时比较慢。==
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-NxPnXZTx-1641192152646)(/img/1616319892724.png)]
当Hue服务启动以后,可以访问地址:http://node2.itcast.cn:8888/、http://node2.itcast.cn:8889/,
admin/admin
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5ZCrkoZr-1641192152647)(/img/1616320101950.png)]
登录进入Hue界面以后,可以选择底层分析引擎:要么是impala(内存)、要么是hive
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-EmzyDVod-1641192152647)(/img/1616320741553.png)]
当选择impala引擎以后:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-oNds4HCY-1641192152647)(/img/1616320830250.png)]
点击查询按钮【向右箭头】即可。
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-RBtHDc9z-1641192152648)(/img/1616320928054.png)]
可以对编写SQL查给一个名称和描述,下次可以直接 运行SQL即可。
Hue配置与大数据框架集成时,页面地址:http://node2.itcast.cn:7180/cmf/services/29/config
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-rqQkhYS8-1641192152648)(/img/1616321021806.png)]
12-[理解]-即席查询之JDBC Client 连接
使用Impala分析数据时,可以使用如下三种方式:Hue界面、impala-shell命令行和JDBC/ODBC连接(BI工具)
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-z3KzOmOt-1641192152648)(/img/1614041952644.png)]
可以通过
JDBC
方式连接Impalad服务,端口号:21050
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-BG45m2Ay-1641192152648)(/img/1616376602336.png)]
使用数据库Client工具:DBeave连接Impala,需要JDBC驱动:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-2ceFtPEG-1641192152649)(/img/1616376670241.png)]
在DBeave中创建Impala连接:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-tYZNfSz8-1641192152649)(/img/1616376773788.png)]
使用DBeave连接Impalad服务以后,进行查询分析数据
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-6KcPlDMb-1641192152649)(/img/1616376882974.png)]
- 点赞
- 收藏
- 关注作者
评论(0)