This article covers how to convert a pandas column of quarter-and-year strings to a datetime column when the column mixes two string formats.

Problem description

This is an extension of an earlier question I had:

Converting a pandas column from an array of string Quarters and Years to a datetime column

I have a dataframe like this, where the date formats are mixed.

I want to convert them to datetime objects, so 3Q '11 becomes 2011-09-30 and Q1 '20 becomes 2020-03-31.

Date    Data
3Q '11  11.12
4Q '11  15.43
1Q '12  11.8
2Q '12  17
1Q '13  19.5
2Q '13  14.62
3Q '13  14.1
4Q '13  26
1Q '14  16.4
2Q '14  13.3
3Q '14  12.3
4Q '14  21.4
1Q '15  12.6
2Q '15  11
3Q '15  9.9
4Q '15  16.1
1Q '16  10.3
Q2 '16  10
Q3 '16  9.3
Q4 '16  13.1
Q1 '17  8.9
Q2 '17  11.4
Q3 '17  10.3
Q4 '17  13.2
Q1 '18  9.1
Q2 '18  11.6
Q3 '18  9.7
Q4 '18  12.9
Q1 '19  9.9
Q2 '19  12.3
Q3 '19  11.8
Q4 '19  15.9
Q1 '20  6.9
Q2 '20  12.4
Q3 '20  13.9
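The target mapping described above sends each quarter label to the last calendar day of that quarter. A minimal standard-library sketch of that rule, where `quarter_end` is a hypothetical helper name and not part of the original question:

```python
import calendar
from datetime import date


def quarter_end(year: int, quarter: int) -> date:
    """Return the last calendar day of the given quarter (hypothetical helper)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    last_month = quarter * 3  # 3, 6, 9 or 12
    last_day = calendar.monthrange(year, last_month)[1]  # number of days in that month
    return date(year, last_month, last_day)


print(quarter_end(2011, 3))  # 2011-09-30
print(quarter_end(2020, 1))  # 2020-03-31
```

This only illustrates the expected result for a single label; the pandas code in the rest of the thread does the same thing for a whole column.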

When every row follows the same pattern (either Q followed by a digit, or a digit followed by Q), I can handle the dataframe with the following code:

# The format is decided from the first row only, so every row must match it.
if df['Date'][0].startswith('Q'):
    # "Q3 '16" -> ["Q3", "16"] -> "2016Q3"
    df['Date'] = df['Date'].str.replace(" ", "").str.split("'")
    df['Date'] = pd.to_datetime("20" + df['Date'].str[::-1].str.join('')) + pd.offsets.QuarterEnd(0)
else:
    # "3Q '11" -> ["3Q", "2011"] -> "3Q2011"
    df['Date'] = df['Date'].str.replace("'", "20").str.split(" ")
    df['Date'] = pd.to_datetime(df['Date'].str.join('')) + pd.offsets.QuarterEnd(0)

However, in this case the same dataframe contains both styles, Q3 and 3Q. How do I normalise the data before applying one of these?

Recommended answer

You can use Series.replace to put the year and quarter into a consistent order, and then apply the datetime conversion:

import pandas as pd

df = pd.DataFrame({'Date': ["3Q '11", "4Q '11", "1Q '12", "2Q '12", "1Q '13",
                            "Q2 '19", "Q3 '19", "Q4 '19", "Q1 '20"],
                   'Data': [11.12, 15.43, 11.8, 17.0, 19.5, 12.3, 11.8, 15.9, 6.9]})
print(df)
     Date   Data
0  3Q '11  11.12
1  4Q '11  15.43
2  1Q '12  11.80
3  2Q '12  17.00
4  1Q '13  19.50
5  Q2 '19  12.30
6  Q3 '19  11.80
7  Q4 '19  15.90
8  Q1 '20   6.90


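# "3Q '11" -> "2011Q3": year (group 3), then Q (group 2), then the quarter digit (group 1)
df['Date'] = df['Date'].replace(r"^(\d+)([Q])\D*(\d+)$", r'20\3\2\1', regex=True)
# "Q2 '19" -> "2019Q2": year (group 2), then the Q-prefixed quarter (group 1)
df['Date'] = df['Date'].replace(r"^([Q]\d+)\D*(\d+)$", r'20\2\1', regex=True)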


print (df)
     Date   Data
0  2011Q3  11.12
1  2011Q4  15.43
2  2012Q1  11.80
3  2012Q2  17.00
4  2013Q1  19.50
5  2019Q2  12.30
6  2019Q3  11.80
7  2019Q4  15.90
8  2020Q1   6.90

Another idea is to use indexing:

m = df['Date'].str.startswith('Q')
# Rows starting with 'Q' already read "Q3"; for the others, swap "3Q" to "Q3".
quarter = df['Date'].str[:2].where(m, df['Date'].str[1] + df['Date'].str[0])
df['Date'] = '20' + df['Date'].str[-2:] + quarter
print (df)
     Date   Data
0  2011Q3  11.12
1  2011Q4  15.43
2  2012Q1  11.80
3  2012Q2  17.00
4  2013Q1  19.50
5  2019Q2  12.30
6  2019Q3  11.80
7  2019Q4  15.90
8  2020Q1   6.90
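Both approaches above leave the Date column as strings such as 2011Q3, so one more step is needed to reach the quarter-end dates asked for. The sketch below is one possible way to do it, using `pd.PeriodIndex` instead of the `pd.to_datetime(...) + pd.offsets.QuarterEnd(0)` combination from the question; the small frame is a hand-picked subset of the normalised output, and calendar-year quarters (`freq="Q"`) are an assumption.

```python
import pandas as pd

# A few rows as they look after normalisation.
df = pd.DataFrame({"Date": ["2011Q3", "2011Q4", "2019Q2", "2020Q1"],
                   "Data": [11.12, 15.43, 12.3, 6.9]})

# Parse the labels as quarterly periods, then take the end of each period.
# end_time points at the last nanosecond of the quarter, so normalize()
# is used to reset the time part to midnight.
periods = pd.PeriodIndex(df["Date"], freq="Q")
df["Date"] = periods.end_time.normalize()

print(df["Date"].iloc[0])  # 2011-09-30 00:00:00
print(df["Date"].iloc[3])  # 2020-03-31 00:00:00
```

Going through periods makes the intent explicit (the strings name quarters, not days), and the resulting column has a regular datetime dtype that can be sorted or used as an index.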

    

