hands-on-data-analysis 第二单元 2,3节

举报
陈沧夜 发表于 2022/06/19 09:59:02 2022/06/19
【摘要】 hands-on-data-analysis 第二单元 2,3节第二节 数据重构万事开头记得导入基本的库: # 导入基本库 import numpy as np import pandas as pd2.1.数据合并——concat横向合并官方文档:pandas.concat — pandas 1.4.2 documentation (pydata.org)对text_left_up,tex...

hands-on-data-analysis 第二单元 2,3节

第二节 数据重构

万事开头记得导入基本的库:

 # 导入基本库
 import numpy as np
 import pandas as pd

2.1.数据合并——concat横向合并

官方文档:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

text_left_up,text_right_up两张表,如果横向合并为一张表(就是列与列拼接在一起)

text_left_up

PassengerId Survived Pclass Name
0 1 0 3 Braund, Mr. Owen Harris
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 3 1 3 Heikkinen, Miss. Laina
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 5 0 3 Allen, Mr. William Henry

text_right_up:

Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 female 35.0 1.0 0.0 113803 53.1000 C123 S
4 male 35.0 0.0 0.0 373450 8.0500 NaN S


 list_up = [text_left_up,text_right_up]
 result_up = pd.concat(list_up,axis=1)
 result_up.head()

得到:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S

就好比,把带有小明的学号的表和带有小明成绩的表合在一起。

2.2.数据合并——concat纵向合并

官方文档:

pandas.concat — pandas 1.4.2 documentation (pydata.org)



将train-left-down和train-right-down横向合并为一张表,并保存这张表为result_down。然后将上边的result_up和result_down纵向合并为result。

text_left_down的数据为:

PassengerId Survived Pclass Name
0 440 0 2 Kvillner, Mr. Johan Henrik Johannesson
1 441 1 2 Hart, Mrs. Benjamin (Esther Ada Bloomfield)
2 442 0 3 Hampe, Mr. Leon
3 443 0 3 Petterson, Mr. Johan Emil
4 444 1 2 Reynaldo, Ms. Encarnacion

text_right_down数据为:

Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 31.0 0 0 C.A. 18723 10.500 NaN S
1 female 45.0 1 1 F.C.C. 13529 26.250 NaN S
2 male 20.0 0 0 345769 9.500 NaN S
3 male 25.0 1 0 347076 7.775 NaN S
4 female 28.0 0 0 230434 13.000 NaN S
 list_down=[text_left_down,text_right_down]
 result_down = pd.concat(list_down,axis=1)
 result = pd.concat([result_up,result_down])
 result.head()

合并后的表为:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S

2.3.数据合并——join

官方文档: pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

从官方文档上可以知道,join的方式比较灵活。

可以在 索引 上将 列 与其他 DataFrame 连接。 也可以通过传递一个列表,一次有效地按索引连接多个 DataFrame 对象。

参数有:

 DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

2.4. concat 与 join 比较

concat、join等的比较

Merge, join, concatenate and compare — pandas 1.4.2 documentation (pydata.org)

第三节 GroupBy 接口

官方文档:

pandas.DataFrame.groupby — pandas 1.4.2 documentation (pydata.org)

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。