Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive分析窗口函数(一) SUM,AVG,MIN,MAX
数据准备
建表语句:
create table itzkx_t1(
cookieid string,
createtime string, --day
pv int
) row format delimited
fields terminated by ',';
加载数据:
load data local inpath '/export/hivedata/itzkx_t1.dat' into table itzkx_t1;
cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid | itzkx_t1.createtime | itzkx_t1.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+
开启智能本地模式
SET hive.exec.mode.local.auto=true; SUM(结果和ORDER BY相关,默认为升序)
pv1: 分组内从起点到当前行的pv累积
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime) as pv1
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv1 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 18 |
| cookie1 | 2018-04-15 | 4 | 22 |
| cookie1 | 2018-04-16 | 4 | 26 |
+-----------+-------------+-----+------+--+
7 rows selected (2.949 seconds)
///当前行和前面的行累加
//pv2: 同pv1
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv2 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 18 |
| cookie1 | 2018-04-15 | 4 | 22 |
| cookie1 | 2018-04-16 | 4 | 26 |
+-----------+-------------+-----+------+--+
//pv3: 分组内(cookie1)所有的pv累加
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid) as pv3
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv3 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-16 | 4 | 26 |
| cookie1 | 2018-04-15 | 4 | 26 |
| cookie1 | 2018-04-14 | 2 | 26 |
| cookie1 | 2018-04-13 | 3 | 26 |
| cookie1 | 2018-04-12 | 7 | 26 |
| cookie1 | 2018-04-11 | 5 | 26 |
| cookie1 | 2018-04-10 | 1 | 26 |
+-----------+-------------+-----+------+--+
//pv4: 分组内当前行+往前3行
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv4 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 17 |
| cookie1 | 2018-04-15 | 4 | 16 |
| cookie1 | 2018-04-16 | 4 | 13 |
+-----------+-------------+-----+------+--+
//计算往前三行,往后一行
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv5 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 6 |
| cookie1 | 2018-04-11 | 5 | 13 |
| cookie1 | 2018-04-12 | 7 | 16 |
| cookie1 | 2018-04-13 | 3 | 18 |
| cookie1 | 2018-04-14 | 2 | 21 |
| cookie1 | 2018-04-15 | 4 | 20 |
| cookie1 | 2018-04-16 | 4 | 13 |
+-----------+-------------+-----+------+--+
//当前行到后面的终点所有pv求和
select cookieid,createtime,pv, sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv6 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 26 |
| cookie1 | 2018-04-11 | 5 | 25 |
| cookie1 | 2018-04-12 | 7 | 20 |
| cookie1 | 2018-04-13 | 3 | 13 |
| cookie1 | 2018-04-14 | 2 | 10 |
| cookie1 | 2018-04-15 | 4 | 8 |
| cookie1 | 2018-04-16 | 4 | 4 |
+-----------+-------------+-----+------+--+
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid | itzkx_t1.createtime | itzkx_t1.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+
,如,11号=10号+11号, 12号=10号+11号+12号,
13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,
14号=14号+15号+16号=2+4+4=10
如果不指定rows between,默认为从起点到当前行;
如果不指定order by,则将分组内所有值累加;
关键是理解rows between含义,也叫做window子句:
preceding:往前
following:往后
current row:当前行
unbounded:起点(边界)
unbounded preceding 表示从前面的起点
unbounded following:表示到后面的终点
AVG,MIN,MAX,和SUM用法一样
求平均值
select cookieid,createtime,pv,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+---------------------+-----------------------+---------------+--+
| itzkx_t1.cookieid | itzkx_t1.createtime | itzkxt1.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2018-04-10 | 1 |
| cookie1 | 2018-04-11 | 5 |
| cookie1 | 2018-04-12 | 7 |
| cookie1 | 2018-04-13 | 3 |
| cookie1 | 2018-04-14 | 2 |
| cookie1 | 2018-04-15 | 4 |
| cookie1 | 2018-04-16 | 4 |
+---------------------+-----------------------+---------------+--+
7 rows selected (0.159 seconds)
从当前行的数和前面起始行的数这个范围累加求和再求平均数
+-----------+-------------+-----+---------------------+--+
| cookieid | createtime | pv | pv2 |
+-----------+-------------+-----+---------------------+--+
| cookie1 | 2018-04-10 | 1 | 1.0 |
| cookie1 | 2018-04-11 | 5 | 3.0 |
| cookie1 | 2018-04-12 | 7 | 4.333333333333333 |
| cookie1 | 2018-04-13 | 3 | 4.0 |
| cookie1 | 2018-04-14 | 2 | 3.6 |
| cookie1 | 2018-04-15 | 4 | 3.6666666666666665 |
| cookie1 | 2018-04-16 | 4 | 3.7142857142857144 |
+-----------+-------------+-----+---------------------+--+
从当前行和前面的所有相比选最大值
select cookieid,createtime,pv,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv2 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 5 |
| cookie1 | 2018-04-12 | 7 | 7 |
| cookie1 | 2018-04-13 | 3 | 7 |
| cookie1 | 2018-04-14 | 2 | 7 |
| cookie1 | 2018-04-15 | 4 | 7 |
| cookie1 | 2018-04-16 | 4 | 7 |
+-----------+-------------+-----+------+--+
从当前行和前面的所有相比求最小值
select cookieid,createtime,pv,
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from itzkx_t1;
+-----------+-------------+-----+------+--+
| cookieid | createtime | pv | pv2 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 1 |
| cookie1 | 2018-04-12 | 7 | 1 |
| cookie1 | 2018-04-13 | 3 | 1 |
| cookie1 | 2018-04-14 | 2 | 1 |
| cookie1 | 2018-04-15 | 4 | 1 |
| cookie1 | 2018-04-16 | 4 | 1 |
+-----------+-------------+-----+------+--+
文章来源: www.jianshu.com,作者:百忍成金的虚竹,版权归原作者所有,如需转载,请联系作者。
原文链接:www.jianshu.com/p/e6ef2512bf62
- 点赞
- 收藏
- 关注作者
评论(0)