Compared with Set | How to Take the Intersection, Union, Difference, and Symmetric Difference of Pandas DataFrames
1. Introduction
"""
@Author :叶庭云
@WeChat Official Account :AI庭云君
@CSDN :https://yetingyun.blog.csdn.net/
"""
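Python's set data type: a collection of distinct elements. Its members are unordered, hashable values (immutable types), the same kind of values that can be used as dictionary keys.
The pandas DataFrame: a tabular data structure that can be thought of as a two-dimensional array with labels.
The common set operations covered here are intersection, union, difference, and symmetric difference.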
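To make the hashability requirement concrete, here is a minimal sketch. The variable names are illustrative and not from the original post. It shows that a set stores each value once, accepts immutable values such as strings and tuples, rejects a mutable list, and that a frozenset can itself act as a dictionary key.

```python
# Elements of a set must be hashable (immutable) values.
langs = {"Python", "Go", "Go", "C++"}   # the duplicate "Go" is stored only once
pairs = {("1", "Python"), ("2", "Go")}  # tuples of strings are hashable too

try:
    bad = {["1", "Python"]}             # a list is mutable, hence unhashable
except TypeError as exc:
    error_message = str(exc)            # the message mentions "unhashable"

# A frozenset is an immutable set, so it can serve as a dictionary key.
lookup = {frozenset({"Go", "C++"}): "shared by both sets"}
print(len(langs), error_message, lookup[frozenset({"C++", "Go"})])
```

A DataFrame row is not hashable as such, which is one reason the DataFrame versions of these operations go through merge or drop_duplicates rather than the set operators.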
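2. Intersection
- The pandas merge function defaults to an inner join, which yields the intersection.
- A Python set takes the intersection directly with the & operator.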
import pandas as pd
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 & set2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.merge(df1, df2, on=['id','name'])
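The result: the set intersection contains Go and C++, and the merge returns the two rows with id 2 and 3 (Go and C++).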
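The inner merge above can be wrapped in a small helper. This is a sketch under stated assumptions: df_intersection is a hypothetical name, not a pandas function, and it assumes both frames share the same columns. It drops duplicate rows first, because an inner merge would otherwise repeat a row once for every matching pair.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

def df_intersection(left, right):
    """Hypothetical helper: rows present in both frames, each kept once."""
    # With no `on` argument, merge joins on all columns the two frames share.
    common = pd.merge(left.drop_duplicates(), right.drop_duplicates(), how="inner")
    return common.reset_index(drop=True)

result = df_intersection(df1, df2)
print(result)
```

Comparing on every shared column mirrors set semantics, where two elements are equal only if they match completely.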
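3. Union
- The how parameter of the pandas merge method accepts "left", "right", "inner", and "outer", and defaults to "inner". An outer join yields the union. Another approach is to concatenate the two DataFrames and then drop duplicates, keeping the first occurrence. Note that DataFrame.append was removed in pandas 2.0; pd.concat([df1, df2]) is the equivalent call.
- A Python set takes the union directly with the | operator.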
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 | set2
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.merge(df1, df2,
on=['id','name'],
how='outer')
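# Alternative: concatenate, then drop duplicates and keep the first occurrence
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df3 = pd.concat([df1, df2])
df3.drop_duplicates(subset=['id'], keep="first")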
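One caveat with subset=['id'] is that rows are compared by id only, so two rows with the same id but different names would be treated as duplicates. The sketch below is an assumption about what is usually wanted, not the original author's code: it compares whole rows, which is closer to how a set compares elements.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# Stack the two frames, then drop rows that repeat across *all* columns.
union_df = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates(keep="first")
    .reset_index(drop=True)
)
print(union_df)
```

For the sample data both variants give the same six rows, because every id maps to exactly one name.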
4. Difference
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 - set2
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set2 - set1
# df1-df2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
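# Add df2 twice so that every df2 row is duplicated and removed by keep=False
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df1 = pd.concat([df1, df2, df2])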
set_diff_df = df1.drop_duplicates(subset=df1.columns,
keep=False)
set_diff_df
# df2-df1
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
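# Add df1 twice so that every df1 row is duplicated and removed by keep=False
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df2 = pd.concat([df2, df1, df1])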
set_diff_df = df2.drop_duplicates(subset=df2.columns,
keep=False)
set_diff_df
# df1-df2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
# df2-df1
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
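The concatenate-twice trick relies on each frame having no repeated rows of its own: a row that occurs twice in df1 and never in df2 would also be removed by keep=False. As an alternative sketch, a left merge with indicator=True marks where each row came from. The helper name df_difference is hypothetical, and it assumes both frames share the same columns.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

def df_difference(left, right):
    """Hypothetical helper: rows of `left` that do not appear in `right`."""
    merged = pd.merge(
        left.drop_duplicates(),
        right.drop_duplicates(),
        how="left",
        indicator=True,  # adds a `_merge` column: left_only / right_only / both
    )
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)

diff_1_2 = df_difference(df1, df2)  # df1 - df2
diff_2_1 = df_difference(df2, df1)  # df2 - df1
print(diff_1_2)
print(diff_2_1)
```

This form also avoids tripling the data in memory, which can matter for large frames.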
5. Symmetric Difference
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
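set1 ^ set2  # symmetric difference
# Drop every row that occurs more than once; what remains is the symmetric difference
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df3 = pd.concat([df1, df2])
df3.drop_duplicates(subset=['id'], keep=False)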
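The same result can be sketched with an outer merge and indicator=True, which additionally records which frame each remaining row came from. This is an illustrative alternative rather than the original author's method, and it assumes both frames share the same columns.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# The outer merge keeps every row; `_merge` says whether a row was found in
# the left frame only, the right frame only, or both.
merged = pd.merge(
    df1.drop_duplicates(),
    df2.drop_duplicates(),
    how="outer",
    indicator=True,
)

# Rows found in exactly one frame form the symmetric difference.
sym_diff = merged[merged["_merge"] != "both"].reset_index(drop=True)
print(sym_diff)
```

Keeping the _merge column in the output makes it easy to split the result back into df1 - df2 and df2 - df1 if needed.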
Source: yetingyun.blog.csdn.net. Author: 叶庭云. Copyright belongs to the original author; please contact the author before reposting.
Original link: yetingyun.blog.csdn.net/article/details/122588761