Finding the earliest date in each row of a pandas DataFrame and creating a new column

Problem description

I have a table with a number of date columns (some values will be NaN), and I need to find the oldest date. A row may have DATE_MODIFIED, WITHDRAWN_DATE, SOLD_DATE, STATUS_DATE, etc.

So for each row there will be a date in one or more of these fields. I want to find the oldest of those and store it in a new column in the dataframe.

Something like this: if I use just one column, e.g. DATE_MODIFIED, I get a result, but when I add a second one as below:

table['END_DATE']=min([table['DATE_MODIFIED']],[table['SOLD_DATE']])

I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

For that matter, will this construct work to find the minimum date, assuming I create proper date columns initially?

Recommended answer

Just apply the DataFrame's min method along axis=1. This takes the minimum across the columns of each row, and missing values (NaT) are skipped by default.

In [1]: import pandas as pd
In [2]: df = pd.read_csv('test.csv', parse_dates=['d1', 'd2', 'd3'])
In [3]: df.loc[2, 'd1'] = None
In [4]: df.loc[1, 'd2'] = None
In [5]: df.loc[4, 'd3'] = None
In [6]: df
Out[6]:
                   d1                  d2                  d3
0 2013-02-07 00:00:00 2013-03-08 00:00:00 2013-05-21 00:00:00
1 2013-02-07 00:00:00                 NaT 2013-05-21 00:00:00
2                 NaT 2013-03-02 00:00:00 2013-05-21 00:00:00
3 2013-02-04 00:00:00 2013-03-08 00:00:00 2013-01-04 00:00:00
4 2013-02-01 00:00:00 2013-03-06 00:00:00                 NaT
In [7]: df.min(axis=1)
Out[7]:
0   2013-02-07 00:00:00
1   2013-02-07 00:00:00
2   2013-03-02 00:00:00
3   2013-01-04 00:00:00
4   2013-02-01 00:00:00
dtype: datetime64[ns]
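
To tie this back to the question, here is a minimal sketch of storing the row-wise minimum in a new END_DATE column. The column names are taken from the question; the sample values and the date_cols list are invented for illustration. It also assumes the columns may arrive as strings, so they are converted with pd.to_datetime first.

```python
import pandas as pd

# Hypothetical sample data: column names come from the question,
# the values are made up for illustration.
table = pd.DataFrame({
    "DATE_MODIFIED": ["2013-02-07", "2013-02-07", None],
    "WITHDRAWN_DATE": ["2013-03-08", None, "2013-03-02"],
    "SOLD_DATE": ["2013-05-21", "2013-01-15", None],
})

# Only the date columns should take part in the comparison.
date_cols = ["DATE_MODIFIED", "WITHDRAWN_DATE", "SOLD_DATE"]

# Convert each column to datetime64; missing or unparseable values become NaT.
for col in date_cols:
    table[col] = pd.to_datetime(table[col], errors="coerce")

# Row-wise minimum across the date columns; NaT entries are skipped by default.
table["END_DATE"] = table[date_cols].min(axis=1)

print(table)
```

Selecting the date columns explicitly keeps any non-date columns in the table out of the comparison. The built-in min from the question fails because it tries to compare whole Series objects as single truth values, whereas DataFrame.min(axis=1) compares element by element within each row.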

