1 - Introduction to Pandas Data: Data Analysis

Posted by brucexiaogui on 2021/12/30 00:14:34

      Pandas data analysis
      import pandas as pd
      df = pd.read_csv('C:/JupyterWork/data/titanic.csv')
      df
      PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
      0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
      1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
      2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
      3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
      4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S
      5	6	0	3	Moran, Mr. James	male	NaN	0	0	330877	8.4583	NaN	Q
      6	7	0	1	McCarthy, Mr. Timothy J	male	54.0	0	0	17463	51.8625	E46	S
      7	8	0	3	Palsson, Master. Gosta Leonard	male	2.0	3	1	349909	21.0750	NaN	S
      8	9	1	3	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	female	27.0	0	2	347742	11.1333	NaN	S
      9	10	1	2	Nasser, Mrs. Nicholas (Adele Achem)	female	14.0	1	0	237736	30.0708	NaN	C
      10	11	1	3	Sandstrom, Miss. Marguerite Rut	female	4.0	1	1	PP 9549	16.7000	G6	S
      11	12	1	1	Bonnell, Miss. Elizabeth	female	58.0	0	0	113783	26.5500	C103	S
      12	13	0	3	Saundercock, Mr. William Henry	male	20.0	0	0	A/5. 2151	8.0500	NaN	S
      13	14	0	3	Andersson, Mr. Anders Johan	male	39.0	1	5	347082	31.2750	NaN	S
      14	15	0	3	Vestrom, Miss. Hulda Amanda Adolfina	female	14.0	0	0	350406	7.8542	NaN	S
      15	16	1	2	Hewlett, Mrs. (Mary D Kingcome)	female	55.0	0	0	248706	16.0000	NaN	S
      16	17	0	3	Rice, Master. Eugene	male	2.0	4	1	382652	29.1250	NaN	Q
      17	18	1	2	Williams, Mr. Charles Eugene	male	NaN	0	0	244373	13.0000	NaN	S
      18	19	0	3	Vander Planke, Mrs. Julius (Emelia Maria Vande...	female	31.0	1	0	345763	18.0000	NaN	S
      19	20	1	3	Masselmani, Mrs. Fatima	female	NaN	0	0	2649	7.2250	NaN	C
      20	21	0	2	Fynney, Mr. Joseph J	male	35.0	0	0	239865	26.0000	NaN	S
      21	22	1	2	Beesley, Mr. Lawrence	male	34.0	0	0	248698	13.0000	D56	S
      22	23	1	3	McGowan, Miss. Anna "Annie"	female	15.0	0	0	330923	8.0292	NaN	Q
      23	24	1	1	Sloper, Mr. William Thompson	male	28.0	0	0	113788	35.5000	A6	S
      24	25	0	3	Palsson, Miss. Torborg Danira	female	8.0	3	1	349909	21.0750	NaN	S
      25	26	1	3	Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...	female	38.0	1	5	347077	31.3875	NaN	S
      26	27	0	3	Emir, Mr. Farred Chehab	male	NaN	0	0	2631	7.2250	NaN	C
      27	28	0	1	Fortune, Mr. Charles Alexander	male	19.0	3	2	19950	263.0000	C23 C25 C27	S
      28	29	1	3	O'Dwyer, Miss. Ellen "Nellie"	female	NaN	0	0	330959	7.8792	NaN	Q
      29	30	0	3	Todoroff, Mr. Lalio	male	NaN	0	0	349216	7.8958	NaN	S
      ...	...	...	...	...	...	...	...	...	...	...	...	...
      861	862	0	2	Giles, Mr. Frederick Edward	male	21.0	1	0	28134	11.5000	NaN	S
      862	863	1	1	Swift, Mrs. Frederick Joel (Margaret Welles Ba...	female	48.0	0	0	17466	25.9292	D17	S
      863	864	0	3	Sage, Miss. Dorothy Edith "Dolly"	female	NaN	8	2	CA. 2343	69.5500	NaN	S
      864	865	0	2	Gill, Mr. John William	male	24.0	0	0	233866	13.0000	NaN	S
      865	866	1	2	Bystrom, Mrs. (Karolina)	female	42.0	0	0	236852	13.0000	NaN	S
      866	867	1	2	Duran y More, Miss. Asuncion	female	27.0	1	0	SC/PARIS 2149	13.8583	NaN	C
      867	868	0	1	Roebling, Mr. Washington Augustus II	male	31.0	0	0	PC 17590	50.4958	A24	S
      868	869	0	3	van Melkebeke, Mr. Philemon	male	NaN	0	0	345777	9.5000	NaN	S
      869	870	1	3	Johnson, Master. Harold Theodor	male	4.0	1	1	347742	11.1333	NaN	S
      870	871	0	3	Balkic, Mr. Cerin	male	26.0	0	0	349248	7.8958	NaN	S
      871	872	1	1	Beckwith, Mrs. Richard Leonard (Sallie Monypeny)	female	47.0	1	1	11751	52.5542	D35	S
      872	873	0	1	Carlsson, Mr. Frans Olof	male	33.0	0	0	695	5.0000	B51 B53 B55	S
      873	874	0	3	Vander Cruyssen, Mr. Victor	male	47.0	0	0	345765	9.0000	NaN	S
      874	875	1	2	Abelson, Mrs. Samuel (Hannah Wizosky)	female	28.0	1	0	P/PP 3381	24.0000	NaN	C
      875	876	1	3	Najib, Miss. Adele Kiamie "Jane"	female	15.0	0	0	2667	7.2250	NaN	C
      876	877	0	3	Gustafsson, Mr. Alfred Ossian	male	20.0	0	0	7534	9.8458	NaN	S
      877	878	0	3	Petroff, Mr. Nedelio	male	19.0	0	0	349212	7.8958	NaN	S
      878	879	0	3	Laleff, Mr. Kristo	male	NaN	0	0	349217	7.8958	NaN	S
      879	880	1	1	Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)	female	56.0	0	1	11767	83.1583	C50	C
      880	881	1	2	Shelley, Mrs. William (Imanita Parrish Hall)	female	25.0	0	1	230433	26.0000	NaN	S
      881	882	0	3	Markun, Mr. Johann	male	33.0	0	0	349257	7.8958	NaN	S
      882	883	0	3	Dahlberg, Miss. Gerda Ulrika	female	22.0	0	0	7552	10.5167	NaN	S
      883	884	0	2	Banfield, Mr. Frederick James	male	28.0	0	0	C.A./SOTON 34068	10.5000	NaN	S
      884	885	0	3	Sutehall, Mr. Henry Jr	male	25.0	0	0	SOTON/OQ 392076	7.0500	NaN	S
      885	886	0	3	Rice, Mrs. William (Margaret Norton)	female	39.0	0	5	382652	29.1250	NaN	Q
      886	887	0	2	Montvila, Rev. Juozas	male	27.0	0	0	211536	13.0000	NaN	S
      887	888	1	1	Graham, Miss. Margaret Edith	female	19.0	0	0	112053	30.0000	B42	S
      888	889	0	3	Johnston, Miss. Catherine Helen "Carrie"	female	NaN	1	2	W./C. 6607	23.4500	NaN	S
      889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.0000	C148	C
      890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.7500	NaN	Q
      891 rows × 12 columns
      head() shows the first 5 rows by default; pass a number to control how many rows are shown
      df.head()
      PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
      0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
      1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
      2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
      3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
      4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S
      df.head(10)
      PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
      0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
      1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
      2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
      3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
      4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S
      5	6	0	3	Moran, Mr. James	male	NaN	0	0	330877	8.4583	NaN	Q
      6	7	0	1	McCarthy, Mr. Timothy J	male	54.0	0	0	17463	51.8625	E46	S
      7	8	0	3	Palsson, Master. Gosta Leonard	male	2.0	3	1	349909	21.0750	NaN	S
      8	9	1	3	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	female	27.0	0	2	347742	11.1333	NaN	S
      9	10	1	2	Nasser, Mrs. Nicholas (Adele Achem)	female	14.0	1	0	237736	30.0708	NaN	C
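The row-count argument of head() can be tried without the CSV file. The sketch below is a minimal example, assuming a small hand-built DataFrame (the name `sample` is made up here) that copies three columns of the first six rows shown above; `tail()` is included as the counterpart that reads from the end.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic file: three columns of the
# first six rows shown in the output above.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
})

first_five = sample.head()   # no argument: the first 5 rows
first_two = sample.head(2)   # explicit row count
last_two = sample.tail(2)    # tail() counts from the end instead

print(first_two)
print(last_two)
```

Both methods return a new DataFrame and leave `sample` unchanged.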
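      info() prints a summary of the DataFrame: the index range, the non-null count and dtype of each column, and the memory usage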
      df.info()
      <class 'pandas.core.frame.DataFrame'>
      RangeIndex: 891 entries, 0 to 890
      Data columns (total 12 columns):
      PassengerId 891 non-null int64
      Survived 891 non-null int64
      Pclass 891 non-null int64
      Name 891 non-null object
      Sex 891 non-null object
      Age 714 non-null float64
      SibSp 891 non-null int64
      Parch 891 non-null int64
      Ticket 891 non-null object
      Fare 891 non-null float64
      Cabin 204 non-null object
      Embarked 889 non-null object
      dtypes: float64(2), int64(5), object(5)
      memory usage: 83.6+ KB
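The non-null counts reported by info() imply the number of missing values: with 891 rows, Age has 891 - 714 = 177 missing entries, Cabin has 687 and Embarked has 2. A minimal sketch of counting these directly, assuming a small hand-built frame (`sample` is a made-up name, with values taken from the first six rows above):

```python
import pandas as pd

# Hypothetical six-row excerpt; None marks a missing value.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
    "Cabin": [None, "C85", None, "C123", None, None],
    "Embarked": ["S", "C", "S", "S", "S", "Q"],
})

non_null = sample.count()      # non-null values per column, as info() reports
missing = sample.isna().sum()  # missing values per column

print(non_null)
print(missing)
```

For every column the two counts add up to the number of rows.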
      df.index
      RangeIndex(start=0, stop=891, step=1)
      df.columns
      Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
       dtype='object')
      dtypes shows the data type of each column
      df.dtypes
      PassengerId int64
      Survived int64
      Pclass int64
      Name object
      Sex object
      Age float64
      SibSp int64
      Parch int64
      Ticket object
      Fare float64
      Cabin object
      Embarked object
      dtype: object
      values returns the data as a NumPy array
      df.values
      array([[1, 0, 3, ..., 7.25, nan, 'S'],
       [2, 1, 1, ..., 71.2833, 'C85', 'C'],
       [3, 1, 3, ..., 7.925, nan, 'S'],
       ...,
       [889, 0, 3, ..., 23.45, nan, 'S'],
       [890, 1, 1, ..., 30.0, 'C148', 'C'],
       [891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
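The `dtype=object` in the output above comes from mixing numbers and strings in one array. The sketch below illustrates this on a made-up three-row frame (`sample` is a hypothetical name) and also uses `to_numpy()`, the method form of the same conversion.

```python
import pandas as pd

# Hypothetical three-row excerpt with one text column.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Fare": [7.25, 71.2833, 7.925],
    "Embarked": ["S", "C", "S"],
})

# Numbers and strings together need a common type, which is object.
mixed = sample.values

# Only int and float columns: the common type is float64.
numeric = sample[["PassengerId", "Fare"]].to_numpy()

print(mixed.dtype)
print(numeric.dtype)
```

Selecting only the numeric columns before converting keeps the array in a numeric dtype, which matters for later NumPy arithmetic.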
      Creating a DataFrame
      data = {'country':['a1','a2','a3'],
       'population':[11,12,13]}
      df_data = pd.DataFrame(data)
      df_data
      country population
      0 a1 11
      1 a2 12
      2 a3 13
      df_data.info()
      <class 'pandas.core.frame.DataFrame'>
      RangeIndex: 3 entries, 0 to 2
      Data columns (total 2 columns):
      country 3 non-null object
      population 3 non-null int64
      dtypes: int64(1), object(1)
      memory usage: 128.0+ bytes
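A dict of columns is one of several accepted inputs. As a sketch, the same data can be written as one dict per row, and explicit row labels can be supplied through the `index` argument (the labels "x", "y", "z" are made up for illustration).

```python
import pandas as pd

# The same values as above, written as one dict per row.
records = [
    {"country": "a1", "population": 11},
    {"country": "a2", "population": 12},
    {"country": "a3", "population": 13},
]
from_records = pd.DataFrame(records)

# Dict of columns again, this time with made-up row labels
# instead of the default 0, 1, 2.
labelled = pd.DataFrame(
    {"country": ["a1", "a2", "a3"], "population": [11, 12, 13]},
    index=["x", "y", "z"],
)

print(from_records)
print(labelled)
```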
      Selecting a single column of the data
      df['Age'].head(10)
      0 22.0
      1 38.0
      2 26.0
      3 35.0
      4 35.0
      5 NaN
      6 54.0
      7 2.0
      8 27.0
      9 14.0
      Name: Age, dtype: float64
      The Series data structure: a single row or a single column taken out of a DataFrame is a Series
      df['Age'][:6]
      0 22.0
      1 38.0
      2 26.0
      3 35.0
      4 35.0
      5 NaN
      Name: Age, dtype: float64
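The row and column cases can both be checked on a small frame. This is a minimal sketch with a made-up `sample` frame; it also shows that passing a list of column names keeps the result a DataFrame.

```python
import pandas as pd

# Hypothetical three-row excerpt.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

column = sample["Age"]       # single column name: a Series
row = sample.iloc[0]         # one row by position: a Series indexed by column name
sub_frame = sample[["Age"]]  # list of names: still a DataFrame

print(type(column).__name__, type(row).__name__, type(sub_frame).__name__)
```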
      Changing the index: use the values of the Name column as the row index labels
      df = df.set_index('Name')
      df['Age'][:6]
      Name
      Braund, Mr. Owen Harris 22.0
      Cumings, Mrs. John Bradley (Florence Briggs Thayer) 38.0
      Heikkinen, Miss. Laina 26.0
      Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0
      Allen, Mr. William Henry 35.0
      Moran, Mr. James NaN
      Name: Age, dtype: float64
      Getting a value by its index label
      age = df['Age']
      age['Braund, Mr. Owen Harris']
      22.0
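The label lookup can also be done on the DataFrame itself with `.loc`, giving the row label and the column name together. A minimal sketch, assuming a made-up three-row `sample` frame built from names shown above:

```python
import pandas as pd

# Hypothetical three-row excerpt.
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Allen, Mr. William Henry",
    ],
    "Age": [22.0, 26.0, 35.0],
    "Pclass": [3, 3, 3],
})

# set_index returns a new DataFrame; sample itself keeps its Name column.
by_name = sample.set_index("Name")

# Row label and column name in a single lookup.
age_of_braund = by_name.loc["Braund, Mr. Owen Harris", "Age"]

# reset_index moves Name back into an ordinary column.
restored = by_name.reset_index()

print(age_of_braund)
```

Because `set_index` does not modify the original, the notebook above reassigns the result with `df = df.set_index('Name')`.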
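      Arithmetic on the data. Note that the outputs below are 20 higher than the original ages, not 10 (22.0 appears as 42.0), which is consistent with the notebook cell having been run twice.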
      age = age+10
      age[:6]
      Name
      Braund, Mr. Owen Harris 42.0
      Cumings, Mrs. John Bradley (Florence Briggs Thayer) 58.0
      Heikkinen, Miss. Laina 46.0
      Futrelle, Mrs. Jacques Heath (Lily May Peel) 55.0
      Allen, Mr. William Henry 55.0
      Moran, Mr. James NaN
      Name: Age, dtype: float64
      age.mean()
      49.69911764705882
      age.min()
      20.42
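Arithmetic on a Series is applied element by element, and missing values stay missing. The sketch below applies `+ 10` exactly once to the first six ages shown earlier (the names `ages` and `older` are made up) and shows that `mean()` and `min()` skip NaN by default.

```python
import pandas as pd

# First six ages from the output above; None becomes NaN.
ages = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0, None], name="Age")

older = ages + 10       # element-wise; returns a new Series

print(older.tolist())   # NaN + 10 is still NaN
print(older.mean())     # NaN is skipped by default (skipna=True)
print(older.min())
```

`ages` is left unchanged; only rebinding the name, as in `age = age+10`, makes repeated runs accumulate.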
      describe() shows summary statistics for the numeric columns
      df.describe()
      PassengerId	Survived	Pclass	Age	SibSp	Parch	Fare
      count	891.000000	891.000000	891.000000	714.000000	891.000000	891.000000	891.000000
      mean	446.000000	0.383838	2.308642	29.699118	0.523008	0.381594	32.204208
      std	257.353842	0.486592	0.836071	14.526497	1.102743	0.806057	49.693429
      min	1.000000	0.000000	1.000000	0.420000	0.000000	0.000000	0.000000
      25%	223.500000	0.000000	2.000000	20.125000	0.000000	0.000000	7.910400
      50%	446.000000	0.000000	3.000000	28.000000	0.000000	0.000000	14.454200
      75%	668.500000	1.000000	3.000000	38.000000	1.000000	0.000000	31.000000
      max	891.000000	1.000000	3.000000	80.000000	8.000000	6.000000	512.329200
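The table above lists only numeric columns, and the count row excludes missing values (714 for Age). A minimal sketch on a made-up six-row `sample` frame shows both points, along with the different statistics describe() reports for a text column.

```python
import pandas as pd

# Hypothetical six-row excerpt with one numeric and one text column.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

numeric_stats = sample.describe()      # numeric columns only by default
text_stats = sample["Sex"].describe()  # count, unique, top, freq

print(numeric_stats)
print(text_stats)
```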
 

Source: brucelong.blog.csdn.net. Author: Bruce小鬼. Copyright belongs to the original author; please contact the author before reposting.

Original link: brucelong.blog.csdn.net/article/details/80718995
