Python Basics (11) | A Detailed 30,000-Word Summary of the Pandas Library (Part 2)

timerring, published 2022/10/07 09:33:57

11.3 Numerical Operations and Statistical Analysis


1. Viewing the data

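import pandas as pd
import numpy as np

# Six consecutive daily timestamps, used as the row index
dates = pd.date_range(start='2019-01-01', periods=6)
# 6x4 frame of standard-normal samples; no seed is set, so the values differ on every run
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=["A", "B", "C", "D"])
df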
                   A         B         C         D
2019-01-01 -0.854043  0.412345 -2.296051 -0.048964
2019-01-02  1.371364 -0.121454 -0.299653  1.095375
2019-01-03 -0.714591 -1.103224  0.979250  0.319455
2019-01-04 -1.397557  0.426008  0.233861 -1.651887
2019-01-05  0.434026  0.459830 -0.095444  1.220302
2019-01-06 -0.133876  0.074500 -1.028147  0.605402

(1) Viewing the first rows

df.head()    # returns the first 5 rows by default; pass n to change the count
                   A         B         C         D
2019-01-01 -0.854043  0.412345 -2.296051 -0.048964
2019-01-02  1.371364 -0.121454 -0.299653  1.095375
2019-01-03 -0.714591 -1.103224  0.979250  0.319455
2019-01-04 -1.397557  0.426008  0.233861 -1.651887
2019-01-05  0.434026  0.459830 -0.095444  1.220302

df.head(2)
                   A         B         C         D
2019-01-01 -0.854043  0.412345 -2.296051 -0.048964
2019-01-02  1.371364 -0.121454 -0.299653  1.095375

(2) Viewing the last rows

df.tail()    # returns the last 5 rows by default
                   A         B         C         D
2019-01-02  1.371364 -0.121454 -0.299653  1.095375
2019-01-03 -0.714591 -1.103224  0.979250  0.319455
2019-01-04 -1.397557  0.426008  0.233861 -1.651887
2019-01-05  0.434026  0.459830 -0.095444  1.220302
2019-01-06 -0.133876  0.074500 -1.028147  0.605402

df.tail(3) 
                   A         B         C         D
2019-01-04 -1.397557  0.426008  0.233861 -1.651887
2019-01-05  0.434026  0.459830 -0.095444  1.220302
2019-01-06 -0.133876  0.074500 -1.028147  0.605402
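
The row-selection calls above can be checked with a small, reproducible sketch. The frame name `demo`, the seed value and the use of `np.random.default_rng` are choices made for this illustration only and are not part of the original example; the negative-`n` call is an extra case beyond what the text covers.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
dates = pd.date_range(start="2019-01-01", periods=6)
demo = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

first_two = demo.head(2)      # first 2 rows
last_three = demo.tail(3)     # last 3 rows
all_but_last = demo.head(-1)  # negative n: every row except the last one

print(first_two.shape, last_three.shape, all_but_last.shape)
```

Both methods return a new DataFrame and leave `demo` itself unchanged.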

(3) Viewing summary information

df.iloc[0, 3] = np.nan    # set row 0, column D to NaN so that info() reports a missing value
df
                   A         B         C         D
2019-01-01 -0.854043  0.412345 -2.296051       NaN
2019-01-02  1.371364 -0.121454 -0.299653  1.095375
2019-01-03 -0.714591 -1.103224  0.979250  0.319455
2019-01-04 -1.397557  0.426008  0.233861 -1.651887
2019-01-05  0.434026  0.459830 -0.095444  1.220302
2019-01-06 -0.133876  0.074500 -1.028147  0.605402

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6 entries, 2019-01-01 to 2019-01-06
Freq: D
Data columns (total 4 columns):
A    6 non-null float64
B    6 non-null float64
C    6 non-null float64
D    5 non-null float64
dtypes: float64(4)
memory usage: 240.0 bytes
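
The non-null counts reported by `info()` can also be obtained programmatically. The following is a minimal sketch, assuming a stand-in frame named `demo` filled with ones (not the random data from the text), with the same cell set to NaN. The exact layout of the `info()` printout varies between pandas versions, so the sketch relies on `count()` and `isna()` instead of parsing that text.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2019-01-01", periods=6)
# Stand-in data: all ones, so the only thing of interest is the missing cell.
demo = pd.DataFrame(np.ones((6, 4)), index=dates, columns=list("ABCD"))
demo.iloc[0, 3] = np.nan       # same position as in the text: first row, column D

non_null = demo.count()        # number of non-null values per column
missing = demo.isna().sum()    # number of missing values per column

print(non_null.to_dict())
print(missing.to_dict())
```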

2. NumPy universal functions also work with Pandas

(1) Vectorized operations

x = pd.DataFrame(np.arange(4).reshape(1, 4))
x
   0  1  2  3
0  0  1  2  3

x+5
   0  1  2  3
0  5  6  7  8

np.exp(x)
     0         1         2          3
0  1.0  2.718282  7.389056  20.085537

y = pd.DataFrame(np.arange(4,8).reshape(1, 4))
y
   0  1  2  3
0  4  5  6  7

x*y
   0  1   2   3
0  0  5  12  21
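
One detail worth seeing alongside the examples above: a NumPy ufunc applied to a DataFrame returns a DataFrame with the same labels, and binary operations between two frames match cells by row and column label, not by position. The sketch below illustrates this; the frame `z` with a partly overlapping column label is hypothetical and was added only for this illustration.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(4).reshape(1, 4))
y = pd.DataFrame(np.arange(4, 8).reshape(1, 4))

squared = np.square(x)   # ufunc result is still a DataFrame with x's labels
product = x * y          # element-wise product, matched by row and column label

# Hypothetical frame whose column labels overlap with x only at label 0.
z = pd.DataFrame([[10, 20]], columns=[0, 9])
partial = x + z          # labels present in only one operand yield NaN

print(squared)
print(product)
print(partial)
```

Where the labels do not overlap, the result holds NaN, which is why `partial` comes back as a float frame.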

(2) Matrix operations

np.random.seed(42)
x = pd.DataFrame(np.random.randint(10, size=(30, 30)))
x
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 6 3 7 4 6 9 2 6 7 4 ... 4 0 9 5 8 0 9 2 6 3
1 8 2 4 2 6 4 8 6 1 3 ... 2 0 3 1 7 3 1 5 5 9
2 3 5 1 9 1 9 3 7 6 8 ... 6 8 7 0 7 7 2 0 7 2
3 2 0 4 9 6 9 8 6 8 7 ... 0 2 4 2 0 4 9 6 6 8
4 9 9 2 6 0 3 3 4 6 6 ... 9 6 8 6 0 0 8 8 3 8
5 2 6 5 7 8 4 0 2 9 7 ... 2 0 4 0 7 0 0 1 1 5
6 6 4 0 0 2 1 4 9 5 6 ... 5 0 8 5 2 3 3 2 9 2
7 2 3 6 3 8 0 7 6 1 7 ... 3 0 1 0 4 4 6 8 8 2
8 2 2 3 7 5 7 0 7 3 0 ... 1 1 5 2 8 3 0 3 0 4
9 3 7 7 6 2 0 0 2 5 6 ... 4 2 3 2 0 0 4 5 2 8
10 4 7 0 4 2 0 3 4 6 0 ... 5 6 1 9 1 9 0 7 0 8
11 5 6 9 6 9 2 1 8 7 9 ... 6 5 2 8 9 5 9 9 5 0
12 3 9 5 5 4 0 7 4 4 6 ... 0 7 2 9 6 9 4 9 4 6
13 8 4 0 9 9 0 1 5 8 7 ... 5 8 4 0 3 4 9 9 4 6
14 3 0 4 6 9 9 5 4 3 1 ... 6 1 0 3 7 1 2 0 0 2
15 4 2 0 0 7 9 1 2 1 2 ... 6 3 9 4 1 7 3 8 4 8
16 3 9 4 8 7 2 0 2 3 1 ... 8 0 0 3 8 5 2 0 3 8
17 2 8 6 3 2 9 4 4 2 8 ... 6 9 4 2 6 1 8 9 9 0
18 5 6 7 9 8 1 9 1 4 4 ... 3 5 2 5 6 9 9 2 6 2
19 1 9 3 7 8 6 0 2 8 0 ... 4 3 2 2 3 8 1 8 0 0
20 4 5 5 2 6 8 9 7 5 7 ... 3 5 0 8 0 4 3 2 5 1
21 2 4 8 1 9 7 1 4 6 7 ... 0 1 8 2 0 4 6 5 0 4
22 4 5 2 4 6 4 4 4 9 9 ... 1 7 6 9 9 1 5 5 2 1
23 0 5 4 8 0 6 4 4 1 2 ... 8 5 0 7 6 9 2 0 4 3
24 9 7 0 9 0 3 7 4 1 5 ... 3 7 8 2 2 1 9 2 2 4
25 4 1 9 5 4 5 0 4 8 9 ... 9 3 0 7 0 2 3 7 5 9
26 6 7 1 9 7 2 6 2 6 1 ... 0 6 5 9 8 0 3 8 3 9
27 2 8 1 3 5 1 7 7 0 2 ... 8 0 4 5 4 5 5 6 3 7
28 6 8 6 2 2 7 4 3 7 5 ... 1 7 9 2 4 5 9 5 3 2
29 3 0 3 0 0 9 5 4 3 2 ... 1 3 0 4 8 0 8 7 5 6

30 rows × 30 columns
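
As a minimal sketch of what matrix-style operations on such a frame can look like: the second frame `y` and the variable names below are assumptions introduced for illustration, not taken from the original post. It shows the transpose and three ways of writing a matrix product that should agree when the row and column labels line up.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
x = pd.DataFrame(np.random.randint(10, size=(30, 30)))
# Hypothetical second operand, drawn from the same generator.
y = pd.DataFrame(np.random.randint(10, size=(30, 30)))

xt = x.T                          # transpose: rows and columns swapped
prod_method = x.dot(y)            # matrix product; x's columns are aligned with y's index
prod_operator = x @ y             # the @ operator performs the same product
prod_numpy = x.values @ y.values  # plain NumPy arrays, labels dropped

print(xt.shape, prod_method.shape)
```

`DataFrame.dot` aligns labels before multiplying, whereas the NumPy version works purely by position; with default integer labels on both frames the results coincide.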
