How to Compute the Median in Spark

Published by 数据社 on 2022/09/25 02:21:02


Python provides a ready-made median function (np.median in NumPy), so computing a median there takes very little code.

Computing the median in Python

import numpy as np

nums = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]

Mean

np.mean(nums)

Median

np.median(nums)

Hive does not provide a median function directly, but it does officially provide two UDAFs: percentile and percentile_approx.

Here is what the official documentation says:

DOUBLE percentile(BIGINT col, p): Returns the exact pth percentile of a
column in the group (does not work with floating point types). p must
be between 0 and 1. NOTE: A true percentile can only be computed for
integer values. Use PERCENTILE_APPROX if your input is non-integral.

array<double> percentile(BIGINT col, array(p1[, p2]…)): Returns the exact
percentiles p1, p2, … of a column in the group (does not work with
floating point types). p_i must be between 0 and 1. NOTE: A true
percentile can only be computed for integer values. Use
PERCENTILE_APPROX if your input is non-integral.

DOUBLE percentile_approx(DOUBLE col, p [, B]): Returns an approximate
pth percentile of a numeric column (including floating point types) in
the group. The B parameter controls approximation accuracy at the
cost of memory. Higher values yield better approximations, and the
default is 10,000. When the number of distinct values in col is
smaller than B, this gives an exact percentile value.

array<double> percentile_approx(DOUBLE col, array(p1[, p2]…) [, B]): Same as
above, but accepts and returns an array of percentile values instead
of a single one.

Note this sentence in the official documentation: NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.

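In other words, an exact median can only be computed with percentile (with p = 0.5), and its input must be an integer type. The value returned by percentile_approx, which accepts floating-point input, is not the true median but an approximate one. In checks against large volumes of data, this approximate median sometimes differed considerably from the true median.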
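To make "the exact pth percentile" concrete, here is a minimal Python sketch. It assumes linear interpolation between the two nearest ranks, which is also the default rule of NumPy's np.percentile; the article does not state which rule Hive applies internally, and the function name exact_percentile is hypothetical.

```python
import math
import statistics


def exact_percentile(values, p):
    """Return the exact p-th percentile (0 <= p <= 1) of integer values.

    Assumption: linear interpolation between the two nearest ranks.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = p * (len(ordered) - 1)  # fractional rank in the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


data = [7, 1, 5, 3, 9, 11]
print(exact_percentile(data, 0.5))  # the median is the percentile at p = 0.5
print(statistics.median(data))      # standard-library median for comparison
```

With p = 0.5 the function returns the same value as statistics.median, which is why percentile(col, 0.5) serves as the median in Hive.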

So how can we compute the median of data that contains decimals?

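Convert the decimals to integers first (for example, multiply by 10000), compute the median of the integers, and then divide the result by the same factor.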
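The scaling trick can be sketched in plain Python. This is only an illustration of the idea, not Hive or Spark code; the helper name median_via_scaling is hypothetical, and the factor 10000 assumes that four decimal places are enough for the data.

```python
import statistics

SCALE = 10000  # assumed factor: keeps four decimal places


def median_via_scaling(values, scale=SCALE):
    """Compute the median of decimal values on scaled integers."""
    # Turn each decimal into an integer, e.g. 1.1 -> 11000.
    scaled = [round(v * scale) for v in values]
    # Take the exact median of the integers, then undo the scaling.
    return statistics.median(scaled) / scale


nums = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]
print(median_via_scaling(nums))
```

Any digits beyond the chosen number of decimal places are rounded away, so the factor should match the precision of the data.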

The median is computed the same way in Spark SQL, so give it a try!
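As a starting point, the following sketch builds such a query as a string. The table name orders, the column name price, the alias median_value, and the helper build_median_query are all hypothetical, and the scale of 10000 is the example factor from the text above.

```python
def build_median_query(table, column, scale=10000):
    """Build a SQL statement for the exact median of a decimal column.

    table, column and scale are illustrative placeholders; the names are
    inserted as-is, so pass trusted identifiers only.
    """
    # Scale the decimals up and cast to BIGINT so percentile() sees integers.
    scaled = f"CAST(ROUND({column} * {scale}) AS BIGINT)"
    # p = 0.5 selects the median; dividing by scale restores the decimals.
    return (
        f"SELECT percentile({scaled}, 0.5) / {scale} AS median_value "
        f"FROM {table}"
    )


query = build_median_query("orders", "price")
print(query)
# With an existing SparkSession named `spark` (an assumption), the query
# could be run as: spark.sql(query).show()
```

Building the statement separately makes it easy to inspect before submitting it to Hive or Spark SQL.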

Source: dataclub.blog.csdn.net. Author: 数据社. Copyright belongs to the original author; to repost, please contact the author.

Original link: dataclub.blog.csdn.net/article/details/106425075
