11 - Common pandas Operations -- Data Analysis

Posted by brucexiaogui on 2021/12/30 01:30:40

Common pandas operations

     In [3]: 
   

 
            import pandas as pd 
            data = pd.DataFrame({'group':['a','a','a','b','b','b','c','c','c'], 
                               'data':[4,3,2,1,12,3,4,5,7]}) 
            data 
           

       Out[3]: 
     

	group	data
0	a	4
1	a	3
2	a	2
3	b	1
4	b	12
5	b	3
6	c	4
7	c	5
8	c	7

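Sorting data

sort_values(by=['group','data']) names the columns to sort by, in priority order. The ascending parameter takes one boolean per column: True sorts that column in ascending order, False in descending order.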

     In [5]: 
   

 
            data.sort_values(by=['group','data'],ascending=[False,True],inplace=True) 
            data

       Out[5]: 
     

	group	data
6	c	4
7	c	5
8	c	7
3	b	1
5	b	3
4	b	12
2	a	2
1	a	3
0	a	4

     In [7]: 
   

 
            data = pd.DataFrame({'k1':['one']*3+['two']*4, 
                               'k2':[3,2,1,3,3,4,4]}) 
            data

       Out[7]: 
     

	k1	k2
0	one	3
1	one	2
2	one	1
3	two	3
4	two	3
5	two	4
6	two	4

     In [8]: 
   

 
            data.sort_values(by='k2')

       Out[8]: 
     

	k1	k2
2	one	1
1	one	2
0	one	3
3	two	3
4	two	3
5	two	4
6	two	4
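As a small sketch of the sorting calls above, the following rebuilds the first example frame and sorts it without inplace=True, so the original frame is left untouched. The variable names sorted_data and renumbered are illustrative only and do not appear in the original notebook.

```python
import pandas as pd

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                     'data': [4, 3, 2, 1, 12, 3, 4, 5, 7]})

# Without inplace=True, sort_values returns a new DataFrame
# and leaves `data` in its original order.
sorted_data = data.sort_values(by=['group', 'data'], ascending=[False, True])

# The original index labels travel with the rows. reset_index(drop=True)
# renumbers them 0..n-1 when the old labels are no longer needed.
renumbered = sorted_data.reset_index(drop=True)

print(sorted_data.index.tolist())
print(renumbered)
```

Returning a new frame instead of sorting in place makes it easier to compare the data before and after the sort.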

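drop_duplicates() removes duplicate rows

The result below shows that the two fully repeated rows were removed: the second (two, 3) at index 4 and the second (two, 4) at index 6.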

     In [9]: 
   

data.drop_duplicates()

       Out[9]: 
     

	k1	k2
0	one	3
1	one	2
2	one	1
3	two	3
5	two	4

     In [10]: 
   

 
            data.drop_duplicates('k1')

       Out[10]: 
     

	k1	k2
0	one	3
3	two	3

     In [11]: 
   

 
            data.drop_duplicates('k2')

       Out[11]: 
     

	k1	k2
0	one	3
1	one	2
2	one	1
5	two	4
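The calls above always keep the first occurrence of each duplicate. As a hedged sketch of the other options, the example below uses duplicated() to see which rows would be dropped, and the keep parameter to keep the last occurrence or none at all. The variable names are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                     'k2': [3, 2, 1, 3, 3, 4, 4]})

# duplicated() marks every row that repeats an earlier row.
dup_mask = data.duplicated()

# keep='last' keeps the last occurrence of each k2 value instead of the first.
keep_last = data.drop_duplicates(subset='k2', keep='last')

# keep=False drops every row that has a duplicate anywhere in the frame.
no_repeats = data.drop_duplicates(keep=False)

print(dup_mask.tolist())
print(keep_last.index.tolist())
print(no_repeats.index.tolist())
```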

     In [21]: 
   

 
            data = pd.DataFrame({'food':['A1','A2','B1','B2','B3','C1','C2'],'data':[1,2,3,4,5,6,7]}) 
            data

       Out[21]: 
     

	food	data
0	A1	1
1	A2	2
2	B1	3
3	B2	4
4	B3	5
5	C1	6
6	C2	7

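apply() applies a function along an axis of the DataFrame. With axis='columns' the function receives each row as a Series, so either an existing function or a custom one can be used to compute a new column from the row's values.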

     In [22]: 
   

 
            def food_map(series): 
                if series['food'] == 'A1': 
                    return 'A' 
                elif series['food'] == 'A2': 
                    return 'A' 
                elif series['food'] == 'B1': 
                    return 'B' 
                elif series['food'] == 'B2': 
                    return 'B' 
                elif series['food'] == 'B3': 
                    return 'B' 
                elif series['food'] == 'C1': 
                    return 'C' 
                elif series['food'] == 'C2': 
                    return 'C' 
            data['food_map'] = data.apply(food_map,axis = 'columns') 
            data 
           

       Out[22]: 
     

	food	data	food_map
0	A1	1	A
1	A2	2	A
2	B1	3	B
3	B2	4	B
4	B3	5	B
5	C1	6	C
6	C2	7	C

     In [23]: 
   

 
            food2Upper = { 
                'A1':'A', 
                'A2':'A', 
                'B1':'B', 
                'B2':'B', 
                'B3':'B', 
                'C1':'C', 
                'C2':'C' 
            } 
            data['upper'] = data['food'].map(food2Upper) 
            data 
           

       Out[23]: 
     

	food	data	food_map	upper
0	A1	1	A	A
1	A2	2	A	A
2	B1	3	B	B
3	B2	4	B	B
4	B3	5	B	B
5	C1	6	C	C
6	C2	7	C	C
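Because every category here is simply the first character of the food code, the same column can also be derived without a hand-written function or dictionary. The sketch below assumes that naming pattern holds for all rows, and also shows that map() yields NaN for any key missing from the dictionary. The column name first_char and the variable partial are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'food': ['A1', 'A2', 'B1', 'B2', 'B3', 'C1', 'C2'],
                     'data': [1, 2, 3, 4, 5, 6, 7]})

# Vectorized string slicing: take the first character of every food code.
data['first_char'] = data['food'].str[0]

# map() with an incomplete dictionary: keys that are not listed become NaN.
partial = data['food'].map({'A1': 'A', 'A2': 'A'})

print(data)
print(partial)
```

The dictionary form remains the better fit when the categories cannot be computed from the value itself.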

     In [31]: 
   

 
            import numpy as np 
            df = pd.DataFrame({'data1':np.random.randn(5), 
                              'data2':np.random.randn(5)}) 
            df2 = df.assign(ratio = df['data1']/df['data2'])
            df2


       Out[31]: 


	data1	data2	ratio
0	-0.174477	-1.529914	0.114044
1	-1.010451	0.548382	-1.842603
2	-0.455647	0.825538	-0.551939
3	-0.183391	-0.474148	0.386779
4	1.002723	-0.270130	-3.711996

     In [29]: 


            df2.drop('ratio',axis='columns',inplace=True)
           

     In [30]: 
   

df2

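       Out[30]: 


	data1	data2
0	0.653051	0.276822
1	-2.282018	-0.070788
2	-0.256115	-0.163071
3	-0.316628	-1.660694
4	0.438870	1.090236

The values in Out[30] do not match Out[31]: judging by the cell numbers, In [29] and In [30] were executed before In [31] regenerated the random data.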
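The add-then-drop sequence above can be sketched in a reproducible form. This version assumes a seeded NumPy generator (not used in the original) so that repeated runs give the same numbers, and it avoids inplace=True so each step returns a new frame. The names rng, with_ratio and without_ratio are illustrative.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random columns repeatable between runs.
rng = np.random.default_rng(0)
df = pd.DataFrame({'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

# assign() returns a copy with the extra column; `df` itself is unchanged.
with_ratio = df.assign(ratio=lambda d: d['data1'] / d['data2'])

# drop(columns=...) is equivalent to drop(..., axis='columns').
without_ratio = with_ratio.drop(columns='ratio')

print(with_ratio.columns.tolist())
print(without_ratio.columns.tolist())
```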

Replacing data: replace()

     In [33]: 
   

 
            data = pd.Series([1,2,3,4,5,6,7,8,9]) 
            data

       Out[33]: 
     

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int64

     In [34]: 
   

 
            data.replace(9,np.nan,inplace=True)

     In [35]: 
   

data

       Out[35]: 
     

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    NaN
dtype: float64
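replace() also accepts a list of values or a dictionary of old-to-new pairs. The following is a minimal sketch of both forms on the same Series. The variable names are illustrative, and the replacement values 100 and 200 are arbitrary.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# A list replaces several values with the same replacement.
# Introducing NaN turns the integer Series into float64.
several = data.replace([8, 9], np.nan)

# A dictionary maps each old value to its own new value.
mapped = data.replace({1: 100, 2: 200})

print(several)
print(mapped)
```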

bins defines the bin edges; cut() assigns each value to one of those bins

     In [41]: 
   

 
            ages = [13,23,34,45,56,67,79] 
            bins = [10,40,80] 
            bins_res = pd.cut(ages,bins) 
            bins_res 
           

       Out[41]: 
     

[(10, 40], (10, 40], (10, 40], (40, 80], (40, 80], (40, 80], (40, 80]]
Categories (2, interval[int64]): [(10, 40] < (40, 80]]

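value_counts() counts how many values fall into each bin

     In [43]: 


            # The top-level pd.value_counts() is deprecated in recent pandas releases;
            # the Series method returns the same counts.
            pd.Series(bins_res).value_counts()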

       Out[43]: 
     

(40, 80]    4
(10, 40]    3
dtype: int64

     In [44]: 
   

 
            pd.cut(ages,[10,30,50,80])

       Out[44]: 
     

[(10, 30], (10, 30], (30, 50], (30, 50], (50, 80], (50, 80], (50, 80]]
Categories (3, interval[int64]): [(10, 30] < (30, 50] < (50, 80]]
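The intervals printed above are open on the left and closed on the right, which matters for values that sit exactly on a bin edge. As a small sketch, the example below bins the edge values themselves and compares the default with right=False. The lists edges and values are made up for illustration.

```python
import pandas as pd

edges = [10, 40, 80]
values = [10, 40, 80]

# Default: bins are (10, 40] and (40, 80].
right_closed = pd.cut(values, edges)

# right=False: bins are [10, 40) and [40, 80).
left_closed = pd.cut(values, edges, right=False)

# codes gives the bin number for each value;
# -1 marks a value that fell outside every bin (shown as NaN).
print(right_closed.codes.tolist())
print(left_closed.codes.tolist())
```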

     In [47]: 
   

 
            group_names =['Youth','Middle','Old']
            pd.cut(ages,[10,20,50,80],labels=group_names)

       Out[47]: 


[Youth, Middle, Middle, Middle, Old, Old, Old]
Categories (3, object): [Youth < Middle < Old]

     In [48]: 


            group_names =['Youth','Middle','Old']
            pd.Series(pd.cut(ages,[10,20,50,80],labels=group_names)).value_counts()

       Out[48]: 


Old       3
Middle    3
Youth     1
dtype: int64
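Labelled bins and their counts can be combined in one short pipeline. The sketch below wraps the result of cut() in a Series and also computes each group's share with normalize=True. The label spellings and the variable names groups, counts and shares are illustrative.

```python
import pandas as pd

ages = [13, 23, 34, 45, 56, 67, 79]
labels = ['Youth', 'Middle', 'Old']

# cut() on a plain list returns a Categorical; wrapping it in a Series
# gives access to the value_counts() method.
groups = pd.Series(pd.cut(ages, [10, 20, 50, 80], labels=labels))

counts = groups.value_counts()                 # absolute counts per label
shares = groups.value_counts(normalize=True)   # fraction of all values per label

print(counts)
print(shares)
```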

     In [49]: 
   

 
            df = pd.DataFrame([range(3),[0,np.nan,0],[0,0,np.nan],range(3)]) 
            df

       Out[49]: 


	0	1	2
0	0	1.0	2.0
1	0	NaN	0.0
2	0	0.0	NaN
3	0	1.0	2.0

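isnull() marks which values are missing. any() with the default axis reports, for each column, whether it contains at least one missing value; any(axis=1) reports the same for each row.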

     In [50]: 
   

df.isnull()

       Out[50]: 
     

	0	1	2
0	False	False	False
1	False	True	False
2	False	False	True
3	False	False	False

     In [51]: 
   

 
            df.isnull().any()

       Out[51]: 
     

0    False
1     True
2     True
dtype: bool

     In [53]: 
   

 
            df.isnull().any(axis=1) 
           

       Out[53]: 
     

0    False
1     True
2     True
3    False
dtype: bool
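The two axis settings are easy to mix up, so here is a compact sketch that puts them side by side on the same frame, together with a per-column count of missing values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2]])

mask = df.isnull()

per_column = mask.any()         # default axis=0: one result per column
per_row = mask.any(axis=1)      # axis=1: one result per row
missing_counts = mask.sum()     # True counts as 1, so this counts NaNs per column

print(per_column.tolist())
print(per_row.tolist())
print(missing_counts.tolist())
```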

fillna() fills missing values with the value passed as its argument.

fillna(5) replaces every missing value with 5

     In [54]: 
   

 
            df.fillna(5) 
           

       Out[54]: 


	0	1	2
0	0	1.0	2.0
1	0	5.0	0.0
2	0	0.0	5.0
3	0	1.0	2.0

     In [55]: 
   

 
            df[df.isnull().any(axis = 1)]

       Out[55]: 
     

	0	1	2
1	0	NaN	0.0
2	0	0.0	NaN
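A single fill value is not always appropriate. As a hedged sketch, fillna() can take a dictionary with a different value per column, and dropna() removes the incomplete rows instead of filling them. Filling column 1 with its mean and column 2 with 0 is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2]])

# A dictionary keyed by column label sets a separate fill value per column.
filled = df.fillna({1: df[1].mean(), 2: 0})

# dropna() discards every row that contains at least one missing value.
complete_rows = df.dropna()

print(filled)
print(complete_rows)
```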

Source: brucelong.blog.csdn.net. Author: Bruce小鬼. Copyright belongs to the original author; please contact the author before reprinting.

Original link: brucelong.blog.csdn.net/article/details/80763617
