Pandas DataFrame reindex: Resetting the Row Index

千江有水千江月, posted 2020/12/29 13:23:54

Course name and link

[AI基础课程--常用框架工具] (AI Basics Course: Common Frameworks and Tools)

Environment

  • ModelArts
    • Notebook - Multi-Engine 2.0 (python3)
      • JupyterLab - Notebook - Conda-python3
        • pandas 0.22.0


Pandas DataFrame reindex: resetting the row index

import pandas as pd
import numpy as np

my_df = pd.DataFrame(data=np.arange(20).reshape(4, 5),  # 4x5 matrix
                     index=list("acef"),  # row index; b and d are missing and will be added with reindex below
                     columns=list("ABCDE"))  # column index

print("my_df\n", my_df)
'''
reindex(
    labels=None, 
    index=None, 
    columns=None,
    axis=None, 
    method=None, 
    copy=True, 
    level=None, 
    fill_value=nan, 
    limit=None, 
    tolerance=None) 
'''
# Reindex the rows.
# Newly added rows are filled with NaN.
# See the help documentation below for details.
print(my_df.reindex(list("abcdefg")))

Output:

my_df
     A   B   C   D   E
a   0   1   2   3   4
c   5   6   7   8   9
e  10  11  12  13  14
f  15  16  17  18  19
      A     B     C     D     E
a   0.0   1.0   2.0   3.0   4.0
b   NaN   NaN   NaN   NaN   NaN
c   5.0   6.0   7.0   8.0   9.0
d   NaN   NaN   NaN   NaN   NaN
e  10.0  11.0  12.0  13.0  14.0
f  15.0  16.0  17.0  18.0  19.0
g   NaN   NaN   NaN   NaN   NaN
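
Note that the reindexed frame above changed from integers to floats, because NaN cannot be stored in an integer column. The following is a minimal sketch, rebuilding the same `my_df`, of how `fill_value` can supply a value for the new rows instead, and how the `columns` keyword reindexes columns the same way. The fill value 0, the column label "Z", and the names `with_nan`, `filled`, and `cols` are illustrative assumptions only.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame(data=np.arange(20).reshape(4, 5),
                     index=list("acef"),
                     columns=list("ABCDE"))

# Default behaviour: the new rows b, d, g hold NaN,
# so every column is upcast to float64.
with_nan = my_df.reindex(list("abcdefg"))
print(with_nan.dtypes.unique())

# fill_value sets the value used for the new rows (0 is an arbitrary choice).
# With an integer fill value the columns stay integer-typed.
filled = my_df.reindex(list("abcdefg"), fill_value=0)
print(filled)

# Columns can be reindexed too; "Z" does not exist in my_df, so it is all NaN.
cols = my_df.reindex(columns=["A", "C", "Z"])
print(cols)
```

Whether 0 is a sensible placeholder depends on the data; when "missing" and "zero" mean different things, keeping NaN is usually the safer choice.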


help

help(my_df.reindex)

Help on method reindex in module pandas.core.frame:

reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None) method of pandas.core.frame.DataFrame instance
    Conform DataFrame to new index with optional filling logic, placing
    NA/NaN in locations having no value in the previous index. A new object
    is produced unless the new index is equivalent to the current one and
    copy=False
    
    Parameters
    ----------
    labels : array-like, optional
        New labels / index to conform the axis specified by 'axis' to.
    index, columns : array-like, optional (should be specified using keywords)
        New labels / index to conform to. Preferably an Index object to
        avoid duplicating data
    axis : int or str, optional
        Axis to target. Can be either the axis name ('index', 'columns')
        or number (0, 1).
    method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
        method to use for filling holes in reindexed DataFrame.
        Please note: this is only  applicable to DataFrames/Series with a
        monotonically increasing/decreasing index.
    
        * default: don't fill gaps
        * pad / ffill: propagate last valid observation forward to next
          valid
        * backfill / bfill: use next valid observation to fill gap
        * nearest: use nearest valid observations to fill gap
    
    copy : boolean, default True
        Return a new object, even if the passed indexes are the same
    level : int or name
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    fill_value : scalar, default np.NaN
        Value to use for missing values. Defaults to NaN, but can be any
        "compatible" value
    limit : int, default None
        Maximum number of consecutive elements to forward or backward fill
    tolerance : optional
        Maximum distance between original and new labels for inexact
        matches. The values of the index at the matching locations most
        satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
    
        Tolerance may be a scalar value, which applies the same tolerance
        to all values, or list-like, which applies variable tolerance per
        element. List-like includes list, tuple, array, Series, and must be
        the same size as the index and its dtype must exactly match the
        index's type.
    
        .. versionadded:: 0.17.0
        .. versionadded:: 0.21.0 (list-like tolerance)
    
    Examples
    --------
    
    ``DataFrame.reindex`` supports two calling conventions
    
    * ``(index=index_labels, columns=column_labels, ...)``
    * ``(labels, axis={'index', 'columns'}, ...)``
    
    We *highly* recommend using keyword arguments to clarify your
    intent.
    
    Create a dataframe with some fictional data.
    
    >>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
    >>> df = pd.DataFrame({
    ...      'http_status': [200,200,404,404,301],
    ...      'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    ...       index=index)
    >>> df
               http_status  response_time
    Firefox            200           0.04
    Chrome             200           0.02
    Safari             404           0.07
    IE10               404           0.08
    Konqueror          301           1.00
    
    Create a new index and reindex the dataframe. By default
    values in the new index that do not have corresponding
    records in the dataframe are assigned ``NaN``.
    
    >>> new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
    ...             'Chrome']
    >>> df.reindex(new_index)
                   http_status  response_time
    Safari               404.0           0.07
    Iceweasel              NaN            NaN
    Comodo Dragon          NaN            NaN
    IE10                 404.0           0.08
    Chrome               200.0           0.02
    
    We can fill in the missing values by passing a value to
    the keyword ``fill_value``. Because the index is not monotonically
    increasing or decreasing, we cannot use arguments to the keyword
    ``method`` to fill the ``NaN`` values.
    
    >>> df.reindex(new_index, fill_value=0)
                   http_status  response_time
    Safari                 404           0.07
    Iceweasel                0           0.00
    Comodo Dragon            0           0.00
    IE10                   404           0.08
    Chrome                 200           0.02
    
    >>> df.reindex(new_index, fill_value='missing')
                  http_status response_time
    Safari                404          0.07
    Iceweasel         missing       missing
    Comodo Dragon     missing       missing
    IE10                  404          0.08
    Chrome                200          0.02
    
    We can also reindex the columns.
    
    >>> df.reindex(columns=['http_status', 'user_agent'])
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    
    Or we can use "axis-style" keyword arguments
    
    >>> df.reindex(['http_status', 'user_agent'], axis="columns")
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    
    To further illustrate the filling functionality in
    ``reindex``, we will create a dataframe with a
    monotonically increasing index (for example, a sequence
    of dates).
    
    >>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
    >>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
    ...                    index=date_index)
    >>> df2
                prices
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    
    Suppose we decide to expand the dataframe to cover a wider
    date range.
    
    >>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
    >>> df2.reindex(date_index2)
                prices
    2009-12-29     NaN
    2009-12-30     NaN
    2009-12-31     NaN
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    2010-01-07     NaN
    
    The index entries that did not have a value in the original data frame
    (for example, '2009-12-29') are by default filled with ``NaN``.
    If desired, we can fill in the missing values using one of several
    options.
    
    For example, to backpropagate the last valid value to fill the ``NaN``
    values, pass ``bfill`` as an argument to the ``method`` keyword.
    
    >>> df2.reindex(date_index2, method='bfill')
                prices
    2009-12-29     100
    2009-12-30     100
    2009-12-31     100
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    2010-01-07     NaN
    
    Please note that the ``NaN`` value present in the original dataframe
    (at index value 2010-01-03) will not be filled by any of the
    value propagation schemes. This is because filling while reindexing
    does not look at dataframe values, but only compares the original and
    desired indexes. If you do want to fill in the ``NaN`` values present
    in the original dataframe, use the ``fillna()`` method.
    
    See the :ref:`user guide <basics.reindexing>` for more.
    
    Returns
    -------
    reindexed : DataFrame
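
The `method` and `limit` parameters, and the closing remark in the help text about `fillna()`, may be easier to follow on a small frame. Below is a rough sketch using a made-up integer index (monotonically increasing, as `method` requires). The data and the names `s_df`, `ffilled`, `limited`, and `patched` are assumptions for illustration and are not part of the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical data: labels 0, 5, 10 are monotonically increasing,
# and the row at label 5 already holds NaN.
s_df = pd.DataFrame({"val": [1.0, np.nan, 3.0]}, index=[0, 5, 10])
new_index = list(range(0, 13))  # labels 0..12

# ffill copies the row of the closest earlier *label*. It compares labels only,
# so labels 6..9 copy the NaN stored at label 5.
ffilled = s_df.reindex(new_index, method="ffill")

# limit caps how many consecutive new labels are filled from one original label.
limited = s_df.reindex(new_index, method="ffill", limit=2)
print(limited)

# NaN values that came from the original data survive reindexing;
# fillna() operates on values and replaces them.
patched = ffilled.fillna(0)
print(patched)
```

With `limit=2`, only the two labels that follow each original label are filled; later labels in the same gap stay NaN.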


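Notes

1. Thanks to the instructor for the teaching and the course materials.
2. Fellow students are welcome to share what they have learned ^_^
3. The sandbox labs, certifications, forum, and live streams contain a lot of high-quality content and are worth exploring.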


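[Copyright notice] This article is original content by a Huawei Cloud community user. When reposting, you must credit the source (Huawei Cloud community), the article link, and the author; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that you suspect is plagiarized, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com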