5 - Numerical Operations - Data Analysis

Posted by brucexiaogui on 2021/12/30 00:12:17

Create a DataFrame and specify its row index labels and column names

In [3]:
import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6]],index=['a','b'],columns=['A','B','C'])
df
Out[3]:
   A  B  C
a  1  2  3
b  4  5  6

By default, sum() adds up the values in each column

In [4]:
df.sum()

Out[4]:
A    5
B    7
C    9
dtype: int64

Sum the values in each row with axis=1

In [6]:
df.sum(axis=1)
Out[6]:
a     6
b    15
dtype: int64

The axis can also be given by name: axis='columns' is equivalent to axis=1

In [7]:
df.sum(axis='columns')
Out[7]:
a     6
b    15
dtype: int64
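
As a quick check of the axis convention, the sketch below rebuilds the small frame from In [3] and shows that axis=0 and axis='index' give one total per column, while axis=1 and axis='columns' give one total per row. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['a', 'b'], columns=['A', 'B', 'C'])

# axis=0 (alias 'index') collapses the rows: one result per column.
col_totals = df.sum(axis=0)
same_col_totals = df.sum(axis='index')

# axis=1 (alias 'columns') collapses the columns: one result per row.
row_totals = df.sum(axis=1)
same_row_totals = df.sum(axis='columns')

print(col_totals.to_dict())  # {'A': 5, 'B': 7, 'C': 9}
print(row_totals.to_dict())  # {'a': 6, 'b': 15}
```

The same axis argument works for mean(), median() and most other reductions.
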

Mean of each column

In [8]:
df.mean()
Out[8]:
A    2.5
B    3.5
C    4.5
dtype: float64

Mean of each row

In [9]:
df.mean(axis=1)
Out[9]:
a    2.0
b    5.0
dtype: float64

Median of each column

In [10]:
df.median()
Out[10]:
A    2.5
B    3.5
C    4.5
dtype: float64

Bivariate statistics

  • .cov(): covariance
In [11]:
df = pd.read_csv('C:/JupyterWork/data/titanic.csv')
df.head()
Out[11]:
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500    NaN         S
In [12]:
df.cov()
Out[12]:
              PassengerId   Survived      Pclass         Age       SibSp      Parch         Fare
PassengerId  66231.000000  -0.626966   -7.561798  138.696504  -16.325843  -0.342697   161.883369
Survived        -0.626966   0.236772   -0.137703   -0.551296   -0.018954   0.032017     6.221787
Pclass          -7.561798  -0.137703    0.699015   -4.496004    0.076599   0.012429   -22.830196
Age            138.696504  -0.551296   -4.496004  211.019125   -4.163334  -2.344191    73.849030
SibSp          -16.325843  -0.018954    0.076599   -4.163334    1.216043   0.368739     8.748734
Parch           -0.342697   0.032017    0.012429   -2.344191    0.368739   0.649728     8.661052
Fare           161.883369   6.221787  -22.830196   73.849030    8.748734   8.661052  2469.436846
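
To make the entries of a covariance table less opaque, here is a minimal sketch on a made-up four-row frame (the toy data and column names are assumptions, not part of the Titanic file). It compares the pandas result with the sample covariance formula sum((x - mean(x)) * (y - mean(y))) / (n - 1). Note that Out[12] came from an older pandas that silently skipped text columns; newer releases (2.0 and later, as far as I know) raise an error on text columns instead, so the sketch selects the numeric columns explicitly.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [2.0, 4.0, 6.0, 9.0],
    'label': ['p', 'q', 'r', 's'],  # a text column, like Name or Sex
})

# Keep only numeric columns before calling cov().
cov_matrix = toy.select_dtypes(include='number').cov()
print(cov_matrix)

# Sample covariance of x and y computed by hand (divides by n - 1).
n = len(toy)
x_dev = toy['x'] - toy['x'].mean()
y_dev = toy['y'] - toy['y'].mean()
manual_cov_xy = (x_dev * y_dev).sum() / (n - 1)
print(manual_cov_xy)
```

The diagonal of the matrix holds each column's variance, which is why it is never negative.
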

  • .corr(): correlation coefficient

In [13]:
df.corr()
Out[13]:
             PassengerId   Survived     Pclass        Age      SibSp      Parch       Fare
PassengerId     1.000000  -0.005007  -0.035144   0.036847  -0.057527  -0.001652   0.012658
Survived       -0.005007   1.000000  -0.338481  -0.077221  -0.035322   0.081629   0.257307
Pclass         -0.035144  -0.338481   1.000000  -0.369226   0.083081   0.018443  -0.549500
Age             0.036847  -0.077221  -0.369226   1.000000  -0.308247  -0.189119   0.096067
SibSp          -0.057527  -0.035322   0.083081  -0.308247   1.000000   0.414838   0.159651
Parch          -0.001652   0.081629   0.018443  -0.189119   0.414838   1.000000   0.216225
Fare            0.012658   0.257307  -0.549500   0.096067   0.159651   0.216225   1.000000
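
The default (Pearson) correlation is the covariance rescaled by the two standard deviations, cov(x, y) / (std(x) * std(y)), which is why every diagonal entry is 1. The sketch below checks this on a made-up frame (the toy values are assumptions) and then repeats the arithmetic with the Survived and Pclass figures printed in Out[12].

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [2.0, 4.0, 6.0, 9.0]})

# Pearson correlation by hand: covariance divided by both standard deviations.
cov_xy = toy['x'].cov(toy['y'])
pearson_manual = cov_xy / (toy['x'].std() * toy['y'].std())

# The same value taken from the correlation matrix.
pearson_pandas = toy.corr().loc['x', 'y']
print(pearson_manual, pearson_pandas)

# Same arithmetic with cov(Survived, Pclass), var(Survived) and var(Pclass)
# as printed in Out[12].
titanic_check = -0.137703 / (0.236772 * 0.699015) ** 0.5
print(round(titanic_check, 4))  # -0.3385
```

The result agrees with the -0.338481 shown for Survived and Pclass in Out[13].
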

value_counts(): counts how many times each value occurs in the given column, sorted by count in descending order by default

In [14]:
df['Age'].value_counts()
Out[14]:
24.00    30
22.00    27
18.00    26
19.00    25
30.00    25
28.00    25
21.00    24
25.00    23
36.00    22
29.00    20
32.00    18
27.00    18
35.00    18
26.00    18
16.00    17
31.00    17
20.00    15
33.00    15
23.00    15
34.00    15
39.00    14
17.00    13
42.00    13
40.00    13
45.00    12
38.00    11
50.00    10
2.00     10
4.00     10
47.00     9
         ..
71.00     2
59.00     2
63.00     2
0.83      2
30.50     2
70.00     2
57.00     2
0.75      2
13.00     2
10.00     2
64.00     2
40.50     2
32.50     2
45.50     2
20.50     1
24.50     1
0.67      1
14.50     1
0.92      1
74.00     1
34.50     1
80.00     1
12.00     1
36.50     1
53.00     1
55.50     1
70.50     1
66.00     1
23.50     1
0.42      1
Name: Age, Length: 88, dtype: int64

value_counts(): counts how many times each value occurs in the given column; pass ascending=True to sort by count in ascending order

In [15]:
df['Age'].value_counts(ascending = True)
Out[15]:
0.42      1
23.50     1
66.00     1
70.50     1
55.50     1
53.00     1
36.50     1
12.00     1
80.00     1
34.50     1
74.00     1
0.92      1
14.50     1
0.67      1
24.50     1
20.50     1
45.50     2
32.50     2
40.50     2
64.00     2
10.00     2
13.00     2
0.75      2
57.00     2
70.00     2
30.50     2
0.83      2
63.00     2
59.00     2
71.00     2
         ..
47.00     9
4.00     10
2.00     10
50.00    10
38.00    11
45.00    12
40.00    13
42.00    13
17.00    13
39.00    14
34.00    15
23.00    15
33.00    15
20.00    15
31.00    17
16.00    17
26.00    18
35.00    18
27.00    18
32.00    18
29.00    20
36.00    22
25.00    23
21.00    24
28.00    25
30.00    25
19.00    25
18.00    26
22.00    27
24.00    30
Name: Age, Length: 88, dtype: int64

Count how many passengers travelled in first, second, and third class

In [16]:
df['Pclass'].value_counts(ascending = True)
Out[16]:
2    184
1    216
3    491
Name: Pclass, dtype: int64
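
Because value_counts() orders by count, the class labels in Out[16] come out as 2, 1, 3. The sketch below rebuilds a stand-in Pclass column from the counts shown above rather than reading titanic.csv (an assumption made so the example runs on its own). It shows two options that may be handy here: sort_index() to order by class label, and normalize=True to get proportions instead of counts.

```python
import pandas as pd

# Stand-in for df['Pclass'], reconstructed from the counts in Out[16].
pclass = pd.Series([1] * 216 + [2] * 184 + [3] * 491, name='Pclass')

# Order the result by class label instead of by count.
counts_by_class = pclass.value_counts().sort_index()
print(counts_by_class.to_dict())  # {1: 216, 2: 184, 3: 491}

# normalize=True returns each class's share of the total instead of a count.
shares = pclass.value_counts(normalize=True).sort_index()
print(shares.round(3))
```
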

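bins: groups numeric values into the given number of equal-width intervals and counts the values in each interval (5 intervals here)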

In [19]:
df['Age'].value_counts(ascending = True,bins = 5)
Out[19]:
(64.084, 80.0]       11
(48.168, 64.084]     69
(0.339, 16.336]     100
(32.252, 48.168]    188
(16.336, 32.252]    346
Name: Age, dtype: int64

count() returns the number of non-null values in the column

In [20]:
df['Age'].count()
Out[20]:
714
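
The bin counts in Out[19] add up to 714 (11 + 69 + 100 + 188 + 346), the same number count() reports, because both skip missing ages. The following sketch illustrates this on a small made-up Series (the values are assumptions chosen for illustration) and compares bins=4 with calling pd.cut directly, which the help text for value_counts describes as the underlying mechanism.

```python
import numpy as np
import pandas as pd

# Toy ages, invented for illustration; two entries are missing.
ages = pd.Series([2.0, 15.0, 22.0, 22.0, 35.0, 40.0, 58.0, 80.0, np.nan, np.nan])

# bins=4 splits the observed range (2 to 80) into 4 equal-width intervals.
binned = ages.value_counts(bins=4, sort=False)
print(binned)

# Roughly the same grouping done by hand with pd.cut.
via_cut = pd.cut(ages, bins=4, include_lowest=True).value_counts(sort=False)
print(via_cut)

print(len(ages))     # 10 rows in total
print(ages.count())  # 8 non-null values
print(binned.sum())  # 8, the NaN rows are not assigned to any bin
```
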

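help() shows the documentation for a function. It prints the text itself and returns None, so wrapping it in print() adds a trailing None to the output

In [21]: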
print(help(pd.value_counts))
Help on function value_counts in module pandas.core.algorithms:

value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)
    Compute a histogram of the counts of non-null values.
    
    Parameters
    ----------
    values : ndarray (1-d)
    sort : boolean, default True
        Sort by values
    ascending : boolean, default False
        Sort in ascending order
    normalize: boolean, default False
        If True then compute a relative histogram
    bins : integer, optional
        Rather than count values, group them into half-open bins,
        convenience for pd.cut, only works with numeric data
    dropna : boolean, default True
        Don't include counts of NaN
    
    Returns
    -------
    value_counts : Series

None

Source: brucelong.blog.csdn.net. Author: Bruce小鬼. Copyright belongs to the original author; please contact the author before reposting.

Original link: brucelong.blog.csdn.net/article/details/80720552
