- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

如何选择数据

梦笔生花发表于 2022/11/09 21:40:11 2022/11/09

【摘要】 1. Series1.1 索引Series 对象索引的工作原理和 ndarray 对象索引非常类似，不同的一点是，在对 Series 对象进行索引时，我们不但可以使用整数还可以使用 Series 对象本身的索引，举几个例子。单个元素：import pandas as pdmy_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", ...

1. Series

1.1 索引

Series 对象索引的工作原理和 ndarray 对象索引非常类似，不同的一点是，在对 Series 对象进行索引时，我们不但可以使用整数还可以使用 Series 对象本身的索引，举几个例子。单个元素：

import pandas as pd

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series[3])
print(my_series['d'])

在上面的代码中，my_series[3] 和 my_series['d'] 访问的是同一个元素，一个使用的是整数 3，一个使用的是 Series 对象本身的索引 d。

多个元素：

import pandas as pd

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series[[1, 3, 4]])
print(my_series[['b', 'd', 'e']])

切片：

import pandas as pd

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series[0:5])
print(my_series['a':'e'])

使用整数进行切片和使用 Series 对象的索引进行切片不同的地方是，使用整数切片不包含最末的那个元素，而使用对象的索引进行切片时是包含最末的那个元素的。

布尔索引：

import pandas as pd

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series[my_series > my_series.median()])

上面代码的输出结果是大于平均值的元素。

1.2 像字典一样选择数据

import pandas as pd
import numpy as np

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series['b'])
print(my_series.get('b', np.NaN))

判断 key 是否在 Series 里：

import pandas as pd
import numpy as np

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print('e' in my_series)
print('g' in my_series)

1.3 loc 或 iloc

loc 和 iloc 的区别是，loc 使用的是轴标签，iloc 使用的是整数。

单个元素：

import pandas as pd
import numpy as np

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series.iloc[1])
print(my_series.loc['b'])

多个元素：

import pandas as pd
import numpy as np

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series.iloc[[1, 3]])
print(my_series.loc[['b', 'd']])

切片：

import pandas as pd
import numpy as np

my_series = pd.Series([4, -7, 6, -5, 3, 2], index=["a", "b", "c", "d", "e", "f"])

print(my_series.iloc[1:5])
print(my_series.loc['b': 'e'])

2. DataFrame

2.1 索引

使用索引的方法来访问 DataFrame 时，得到的是一列或多列。单列的访问：

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df['Open'])

上面的单列访问我们还可以使用另外一种方法，例如：

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)

print(df.Open)

上面代码采用的方法是类似于属性的访问方法。多列的访问：

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df[['Open', 'Close']])

布尔索引：

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df[df['Open'] > 140])

上面代码中，Open 列的值大于 140 的为后面三行，所以最后得到整个 DataFrame 的后面三行。使用布尔索引，我们还可以得到整个 DataFrame 中大于 140 的元素，例如：

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df[df > 140])

上面的代码得到了整个 DataFrame 中值大于 140 的元素，不大于 140 的元素用 NaN 填充。

2.2 切片

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)

print(df[:3])
print(df[::2])
print(df[::-1])
print(df[1:2])

上面的切片都是针对行来的。

2.3 loc 或 iloc

2.3.1 单行或单列的访问

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df.iloc[0])
print(df.loc['2021-07-01'])
print(df.iloc[:, 0])
print(df.loc[:, 'Open'])

2.3.2 多行或多列的访问

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)
print(df.iloc[[0, 1, 2]])
print(df.iloc[:, [0, 1, 2]])
print(df.loc[:, ['Open', 'High', 'Low']])

2.3.3 位于行列交叉的元素

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)

print(df.loc[['2021-07-01', '2021-07-02'], ['Open', 'High']])
print(df.iloc[[0, 1], [0, 1]])

切片

import pandas as pd

d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "High": pd.Series([137, 140, 143, 144, 144, 145], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09'])
}

df = pd.DataFrame(d)

print(df.loc['2021-07-01':'2021-07-07', 'Open': 'Low'])
print(df.iloc[0:4, 0:3])

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

如何选择数据

1. Series

1.1 索引

1.2 像字典一样选择数据

1.3 loc 或 iloc

2. DataFrame

2.1 索引

2.2 切片

2.3 loc 或 iloc

2.3.1 单行或单列的访问

2.3.2 多行或多列的访问

2.3.3 位于行列交叉的元素

切片

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

如何选择数据

1. Series

1.1 索引

1.2 像字典一样选择数据

1.3 loc 或 iloc

2. DataFrame

2.1 索引

2.2 切片

2.3 loc 或 iloc

2.3.1 单行或单列的访问

2.3.2 多行或多列的访问

2.3.3 位于行列交叉的元素

切片

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品