hive查询常用命令

举报
没头脑 发表于 2022/06/17 21:40:27 2022/06/17
【摘要】 行mapreduce任务hadoop jar /opt/data/hadoop-mapreduce-examples-2.8.3.jar wordcount /user/ict01/item.log /user/ict01/output 目录下的jar包 jar包的主类首先初始化:source /opt/...

行mapreduce任务

hadoop jar /opt/data/hadoop-mapreduce-examples-2.8.3.jar wordcount /user/ict01/item.log /user/ict01/output
                           目录下的jar包                   jar包的主类
首先初始化:source /opt/client/bigdata_env

从本地上传文件到hdfs中/ict01下
hadoop dfs -put /data/employee.txt /ict01


在hdfs中查看前5行的文件内容
hadoop dfs -cat /ict01/employee.txt | head 5

hive中创建表
create table employee(
user_id int,
usename string,
dept_id int)
row formate delimited fields teminated by ','
stored as textfile;


查询hive中5条数据
select * from department limit 5

select e.username,d.dept_name,s.salarys 
from cx_table_employee e 
join cx_table_department d 
on e.dept_id = d.dept_id 
join cx_table_salary s 
on e.user_id = s.userid


从本地源中加载数据到hive表中
load data inpath '/ict01/department.txt' into table department.txt;

创建外部表employee
create external table employee(id int,name string ,subject string,score  float)  row format delimited fields terminated by ',' stored as textfile ;


create table employee(
user_id int,
username string,
dept_id int)
row format delimited fields terminated by ','  
stored as textfile ;

create table department(
dept_id int,
dept_name string) 
row format delimited fields terminated by ','  
stored as textfile ;

select * from department  limit 5;
select * from  employee  limit 5;

多张表进行内连接操作时,只有所有表中与on条件中相匹配的数据才会显示。例如下面的SQL实现了每个员工所在的部门,employee表和dept表连接,on条件是dept_id,只有dept_id一样的数据才会匹配并显示出来。
执行语句:
select e.username,e.dept_id,d.dept_name,d.dept_id
from employee e join department d 
on e.dept_id = d.dept_id
where e.dept_id = 1 ;

where e.dept_id' = 1 and 'd.dept_id' = 1;

select * from cx_stu02 where gender ='male' limit 2;

select e.username,e.dept_id,d.dept_name,d.dept_id
from employee e join department d 
on e.dept_id = d.dept_id
limit 5 ;



可以对两张以上的表进行连接操作,下面的SQL语句查询员工的名字、部门名字及其的薪水:
执行语句:
select e.username,d.dept_name,s.salarys 
from cx_table_employee e 
join cx_table_department d 
on e.dept_id = d.dept_id 
join cx_table_salary s 
on e.user_id = s.userid;


操作命令

数据预处理:剔除数据中任意字段为空值

INSERT OVERWRITE TABLE result01
	select * from salary
	where userid is not null and dept_id is not null and salarys is not null

剔除identity字段数值在0-100以外的值

INSERT OVERWRITE TABLE result
	SELECT * FROM hittable
	WHERE identity between 0 and 100 ;

查看总数

select count(*) from item;

查询提供啤酒或葡萄酒的第一项产品(Item01)的总行数

select count(*) as num from item where item='Beer' or item01 = 'Wine';

查询第一次购买产品中最畅销的产品排名(item01)

select item01,count(item01) as num from item 
group by item01 
order by num desc;

查询牛奶出现在每一行购买供品的概率

select b.num/a.num as rate from(
select count(*) num from item) a,
(select count(*) num from item
where item01=='Milk' 
or item02=='Milk' 
or item02=='Milk'
or item03=='Milk'
or item04=='Milk') b ;

Query the probability that Milk appears in every line purchased offerings, take a screenshot, and save it as 4.4.1

收集有关出售产品数量的统计数据,并对产品进行分类。分类规则如下

item total sort
select b.item,b.total,case
when total<800 then 'cold'
when total>=800 and total <1000 then 'warm'
else 'hot'
end as sort from(
select item,sum(num) total from(
select item01 as item,count(item01)as num from item group by item01
union all
select item02 as item,count(item02) as num from item group by item02
union all
select item03 as item,count(item03) as num from item group by item03
union all
select item04 as item,count(item04) as num from item group by item04) a group by item) b;

在Hive查询期间,将前面的查询结果保存为结果表,在结果表中查询数据

create table result as select b.item,b.total,case
when total<800 then 'cold'
when total>=800 and total <1000 then 'warm'
else 'hot'
end as sort from(
select item,sum(num) total from (
select item01 as item,count(item01) as num from item group by item01
union all
select item02 as item,count(item02) as num from item group by item02
union all
select item03 as item,count(item03) as num from item group by item03
union all
select item04 as item,count(item04) as num from item group by item04) a group by item) b;

任务1:相似度检测
查询result表中,碱基对(aliLen)长度介于29800-29900之间的个数,截图并保存为3-1-alilen。查询result表中,按照匹配程度(identity)进行降序排序,显示sav,identity,bitScore这三个字段的前十行数据,截图并保存为3-2-bitscore。
任务2:分组统计
分组统计result表中 gapOpens 的次数并进行降序排序,显示字段包含gapOpens及对应的次数( gapNum),显示前5行数据,截图并保存为3-3-top5。

select * from result where aliLen between 29800 and 29900
select count(*) from result as gapNum group by gapOpens order by gapNum desc limit 5

查询result表中,统计qstart=sstart 且 qend=send的数据个数以及qstart!=sstart或qend!=send的数据个数,截图并保存结果为3-4-start。
查询结果输出形式如下:

equal num
yes ***
no ***
select yes as equal ,count(1) as num from result where qstart=sstart and qend=send union all
select no as equal ,count(1) as num  from result where qstart!=sstart or qend!=send

查询result 表中,分组统计匹配度在0%-60%,60%-70%,70%-80%,80%-90%,90%-100%,并按照等级个数进行升序排序,其中0%-60%为E等级,60%-70%为D等级,70%-80%为C等级,80%-90%为B等级,90%-100%为A等级,截图并保存为3-5-group。
查询结果输出形式如下(以下根据实际情况显示):

identity total
A **
B **
C **
D **
E **
select identity,count(1) as total  from (

select  case when total>=90 and total<100 then ‘A’
when total>=80 and total<90 then ‘B’
when total>=70 and total<80 then ‘C’
when total>=60 and total<90 then ‘D’
else ‘E’
end as identity from result )
group by identity ,order by identity
【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。