hive查询常用命令
行mapreduce任务
hadoop jar /opt/data/hadoop-mapreduce-examples-2.8.3.jar wordcount /user/ict01/item.log /user/ict01/output
目录下的jar包 jar包的主类
首先初始化:source /opt/client/bigdata_env
从本地上传文件到hdfs中/ict01下
hadoop dfs -put /data/employee.txt /ict01
在hdfs中查看前5行的文件内容
hadoop dfs -cat /ict01/employee.txt | head 5
hive中创建表
create table employee(
user_id int,
usename string,
dept_id int)
row formate delimited fields teminated by ','
stored as textfile;
查询hive中5条数据
select * from department limit 5
select e.username,d.dept_name,s.salarys
from cx_table_employee e
join cx_table_department d
on e.dept_id = d.dept_id
join cx_table_salary s
on e.user_id = s.userid
从本地源中加载数据到hive表中
load data inpath '/ict01/department.txt' into table department.txt;
创建外部表employee
create external table employee(id int,name string ,subject string,score float) row format delimited fields terminated by ',' stored as textfile ;
create table employee(
user_id int,
username string,
dept_id int)
row format delimited fields terminated by ','
stored as textfile ;
create table department(
dept_id int,
dept_name string)
row format delimited fields terminated by ','
stored as textfile ;
select * from department limit 5;
select * from employee limit 5;
多张表进行内连接操作时,只有所有表中与on条件中相匹配的数据才会显示。例如下面的SQL实现了每个员工所在的部门,employee表和dept表连接,on条件是dept_id,只有dept_id一样的数据才会匹配并显示出来。
执行语句:
select e.username,e.dept_id,d.dept_name,d.dept_id
from employee e join department d
on e.dept_id = d.dept_id
where e.dept_id = 1 ;
where e.dept_id' = 1 and 'd.dept_id' = 1;
select * from cx_stu02 where gender ='male' limit 2;
select e.username,e.dept_id,d.dept_name,d.dept_id
from employee e join department d
on e.dept_id = d.dept_id
limit 5 ;
可以对两张以上的表进行连接操作,下面的SQL语句查询员工的名字、部门名字及其的薪水:
执行语句:
select e.username,d.dept_name,s.salarys
from cx_table_employee e
join cx_table_department d
on e.dept_id = d.dept_id
join cx_table_salary s
on e.user_id = s.userid;
操作命令
数据预处理:剔除数据中任意字段为空值
INSERT OVERWRITE TABLE result01
select * from salary
where userid is not null and dept_id is not null and salarys is not null
剔除identity字段数值在0-100以外的值
INSERT OVERWRITE TABLE result
SELECT * FROM hittable
WHERE identity between 0 and 100 ;
查看总数
select count(*) from item;
查询提供啤酒或葡萄酒的第一项产品(Item01)的总行数
select count(*) as num from item where item='Beer' or item01 = 'Wine';
查询第一次购买产品中最畅销的产品排名(item01)
select item01,count(item01) as num from item
group by item01
order by num desc;
查询牛奶出现在每一行购买供品的概率
select b.num/a.num as rate from(
select count(*) num from item) a,
(select count(*) num from item
where item01=='Milk'
or item02=='Milk'
or item02=='Milk'
or item03=='Milk'
or item04=='Milk') b ;
Query the probability that Milk appears in every line purchased offerings, take a screenshot, and save it as 4.4.1
收集有关出售产品数量的统计数据,并对产品进行分类。分类规则如下
item | total | sort |
---|---|---|
select b.item,b.total,case
when total<800 then 'cold'
when total>=800 and total <1000 then 'warm'
else 'hot'
end as sort from(
select item,sum(num) total from(
select item01 as item,count(item01)as num from item group by item01
union all
select item02 as item,count(item02) as num from item group by item02
union all
select item03 as item,count(item03) as num from item group by item03
union all
select item04 as item,count(item04) as num from item group by item04) a group by item) b;
在Hive查询期间,将前面的查询结果保存为结果表,在结果表中查询数据
create table result as select b.item,b.total,case
when total<800 then 'cold'
when total>=800 and total <1000 then 'warm'
else 'hot'
end as sort from(
select item,sum(num) total from (
select item01 as item,count(item01) as num from item group by item01
union all
select item02 as item,count(item02) as num from item group by item02
union all
select item03 as item,count(item03) as num from item group by item03
union all
select item04 as item,count(item04) as num from item group by item04) a group by item) b;
任务1:相似度检测
查询result表中,碱基对(aliLen)长度介于29800-29900之间的个数,截图并保存为3-1-alilen。查询result表中,按照匹配程度(identity)进行降序排序,显示sav,identity,bitScore这三个字段的前十行数据,截图并保存为3-2-bitscore。
任务2:分组统计
分组统计result表中 gapOpens 的次数并进行降序排序,显示字段包含gapOpens及对应的次数( gapNum),显示前5行数据,截图并保存为3-3-top5。
select * from result where aliLen between 29800 and 29900
select count(*) from result as gapNum group by gapOpens order by gapNum desc limit 5
查询result表中,统计qstart=sstart 且 qend=send的数据个数以及qstart!=sstart或qend!=send的数据个数,截图并保存结果为3-4-start。
查询结果输出形式如下:
equal | num |
---|---|
yes | *** |
no | *** |
select yes as equal ,count(1) as num from result where qstart=sstart and qend=send union all
select no as equal ,count(1) as num from result where qstart!=sstart or qend!=send
查询result 表中,分组统计匹配度在0%-60%,60%-70%,70%-80%,80%-90%,90%-100%,并按照等级个数进行升序排序,其中0%-60%为E等级,60%-70%为D等级,70%-80%为C等级,80%-90%为B等级,90%-100%为A等级,截图并保存为3-5-group。
查询结果输出形式如下(以下根据实际情况显示):
identity | total |
---|---|
A | ** |
B | ** |
C | ** |
D | ** |
E | ** |
select identity,count(1) as total from (
select case when total>=90 and total<100 then ‘A’
when total>=80 and total<90 then ‘B’
when total>=70 and total<80 then ‘C’
when total>=60 and total<90 then ‘D’
else ‘E’
end as identity from result )
group by identity ,order by identity
- 点赞
- 收藏
- 关注作者
评论(0)