Hive分析窗口函数(一) SUM,AVG,MIN,MAX

举报
bigdata张凯翔 发表于 2021/03/26 00:13:23 2021/03/26
【摘要】 Hive分析窗口函数(一) SUM,AVG,MIN,MAX 数据准备 建表语句: create table itzkx_t1( cookieid string, createtime string, --day pv int ) row format delimited fields terminated by ','; 加载数据: load data local i...

Hive分析窗口函数(一) SUM,AVG,MIN,MAX
数据准备
建表语句:

create table itzkx_t1(
cookieid string,
createtime string,   --day 
pv int
) row format delimited 
fields terminated by ',';

加载数据:
load data local inpath '/export/hivedata/itzkx_t1.dat' into table itzkx_t1;

cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid  | itzkx_t1.createtime  | itzkx_t1.pv  |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+

开启智能本地模式
SET hive.exec.mode.local.auto=true; SUM(结果和ORDER BY相关,默认为升序)

pv1: 分组内从起点到当前行的pv累积
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime) as pv1 
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv1  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 1 |
| cookie1   | 2018-04-11  | 5   | 6 |
| cookie1   | 2018-04-12  | 7   | 13   |
| cookie1   | 2018-04-13  | 3   | 16   |
| cookie1   | 2018-04-14  | 2   | 18   |
| cookie1   | 2018-04-15  | 4   | 22   |
| cookie1   | 2018-04-16  | 4   | 26   |
+-----------+-------------+-----+------+--+
7 rows selected (2.949 seconds)

///当前行和前面的行累加
//pv2: 同pv1

select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv2  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 1 |
| cookie1   | 2018-04-11  | 5   | 6 |
| cookie1   | 2018-04-12  | 7   | 13   |
| cookie1   | 2018-04-13  | 3   | 16   |
| cookie1   | 2018-04-14  | 2   | 18   |
| cookie1   | 2018-04-15  | 4   | 22   |
| cookie1   | 2018-04-16  | 4   | 26   |
+-----------+-------------+-----+------+--+

//pv3: 分组内(cookie1)所有的pv累加

select cookieid,createtime,pv,
sum(pv) over(partition by cookieid) as pv3
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv3  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-16  | 4   | 26   |
| cookie1   | 2018-04-15  | 4   | 26   |
| cookie1   | 2018-04-14  | 2   | 26   |
| cookie1   | 2018-04-13  | 3   | 26   |
| cookie1   | 2018-04-12  | 7   | 26   |
| cookie1   | 2018-04-11  | 5   | 26   |
| cookie1   | 2018-04-10  | 1   | 26   |
+-----------+-------------+-----+------+--+

//pv4: 分组内当前行+往前3行

select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv4  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 1 |
| cookie1   | 2018-04-11  | 5   | 6 |
| cookie1   | 2018-04-12  | 7   | 13   |
| cookie1   | 2018-04-13  | 3   | 16   |
| cookie1   | 2018-04-14  | 2   | 17   |
| cookie1   | 2018-04-15  | 4   | 16   |
| cookie1   | 2018-04-16  | 4   | 13   |
+-----------+-------------+-----+------+--+

//计算往前三行,往后一行

select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv5  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 6 |
| cookie1   | 2018-04-11  | 5   | 13   |
| cookie1   | 2018-04-12  | 7   | 16   |
| cookie1   | 2018-04-13  | 3   | 18   |
| cookie1   | 2018-04-14  | 2   | 21   |
| cookie1   | 2018-04-15  | 4   | 20   |
| cookie1   | 2018-04-16  | 4   | 13   |
+-----------+-------------+-----+------+--+

//当前行到后面的终点所有pv求和

select cookieid,createtime,pv, sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv6  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 26   |
| cookie1   | 2018-04-11  | 5   | 25   |
| cookie1   | 2018-04-12  | 7   | 20   |
| cookie1   | 2018-04-13  | 3   | 13   |
| cookie1   | 2018-04-14  | 2   | 10   |
| cookie1   | 2018-04-15  | 4   | 8 |
| cookie1   | 2018-04-16  | 4   | 4 |
+-----------+-------------+-----+------+--+ 
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid  | itzkx_t1.createtime  | itzkx_t1.pv  |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+

,如,11号=10号+11号, 12号=10号+11号+12号,
13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,
14号=14号+15号+16号=2+4+4=10
如果不指定rows between,默认为从起点到当前行;
如果不指定order by,则将分组内所有值累加;
关键是理解rows between含义,也叫做window子句:
preceding:往前
following:往后
current row:当前行
unbounded:起点(边界)
unbounded preceding 表示从前面的起点
unbounded following:表示到后面的终点
AVG,MIN,MAX,和SUM用法一样
求平均值

select cookieid,createtime,pv,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid  | itzkx_t1.createtime  | itzkxt1.pv  |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+
7 rows selected (0.159 seconds)
从当前行的数和前面起始行的数这个范围累加求和再求平均数
+-----------+-------------+-----+---------------------+--+
| cookieid  | createtime  | pv  | pv2 |
+-----------+-------------+-----+---------------------+--+
| cookie1   | 2018-04-10  | 1   | 1.0 |
| cookie1   | 2018-04-11  | 5   | 3.0 |
| cookie1   | 2018-04-12  | 7   | 4.333333333333333   |
| cookie1   | 2018-04-13  | 3   | 4.0 |
| cookie1   | 2018-04-14  | 2   | 3.6 |
| cookie1   | 2018-04-15  | 4   | 3.6666666666666665  |
| cookie1   | 2018-04-16  | 4   | 3.7142857142857144  |
+-----------+-------------+-----+---------------------+--+
从当前行和前面的所有相比选最大值
select cookieid,createtime,pv,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv2  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 1 |
| cookie1   | 2018-04-11  | 5   | 5 |
| cookie1   | 2018-04-12  | 7   | 7 |
| cookie1   | 2018-04-13  | 3   | 7 |
| cookie1   | 2018-04-14  | 2   | 7 |
| cookie1   | 2018-04-15  | 4   | 7 |
| cookie1   | 2018-04-16  | 4   | 7 |
+-----------+-------------+-----+------+--+

从当前行和前面的所有相比求最小值

select cookieid,createtime,pv,
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid  | createtime  | pv  | pv2  |
+-----------+-------------+-----+------+--+
| cookie1   | 2018-04-10  | 1   | 1 |
| cookie1   | 2018-04-11  | 5   | 1 |
| cookie1   | 2018-04-12  | 7   | 1 |
| cookie1   | 2018-04-13  | 3   | 1 |
| cookie1   | 2018-04-14  | 2   | 1 |
| cookie1   | 2018-04-15  | 4   | 1 |
| cookie1   | 2018-04-16  | 4   | 1 |
+-----------+-------------+-----+------+--+

文章来源: www.jianshu.com,作者:百忍成金的虚竹,版权归原作者所有,如需转载,请联系作者。

原文链接:www.jianshu.com/p/e6ef2512bf62

【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。