I am running Python 2.7 through the Spyder IDE on Windows 10.
I have a pandas DataFrame named df:
import random
import pandas as pd
tee = pd.date_range('2000-01-01','2005-01-01',freq = 'M')
owe = pd.date_range('2000-01-01','2005-02-29')
eye = random.sample(range(100), 60)
you = random.sample(range(100), 60)
df = pd.DataFrame({'tee': tee , 'owe':owe, 'eye': eye,'you': you})
#df.dtypes
#dtypes: tee and owe = 'datetime64[ns]' , eye and you = 'int64'
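The values in you are measured in days. For every row whose tee falls in 2004, I want to replace that row's eye with the month of owe plus you (as int64). Please let me know if I can provide more information.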
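Best answer
The comments pointed out some errors in the question's code, so I changed how the DataFrame is created and then applied the condition with loc, using Series.dt.year and Series.dt.month:
Note: after testing, change the column name eye_new to eye in the loc call!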
import random
import numpy as np
import pandas as pd
tee = pd.date_range(pd.to_datetime('2000-01-01'),pd.to_datetime('2005-01-01'),freq = 'M')
owe = pd.date_range(pd.to_datetime('2000-01-01'),pd.to_datetime('2005-02-28'))
eye = random.sample(range(100), 60)
#http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.randint.html
you = np.random.randint(0, 100 , size=1886)
print len(tee) #60
print len(owe) #1886
print len(eye) #60
print len(you) #1886
df1 = pd.DataFrame({'tee': tee , 'eye': eye})
print df1.head()
   eye        tee
0   17 2000-01-31
1   71 2000-02-29
2   56 2000-03-31
3    2 2000-04-30
4   92 2000-05-31
df2 = pd.DataFrame({'owe':owe, 'you': you})
print df2.head()
         owe  you
0 2000-01-01   78
1 2000-01-02   76
2 2000-01-03   78
3 2000-01-04   51
4 2000-01-05   66
#merge df1 and df2 on columns tee and owe
result = pd.merge(df1, df2, left_on='tee', right_on='owe')
#after testing change eye_new to eye
result.loc[result['tee'].dt.year == 2004, 'eye_new'] = result['owe'].dt.month + result['you']
print result
    eye        tee        owe  you  eye_new
0    17 2000-01-31 2000-01-31    4      NaN
1    71 2000-02-29 2000-02-29   27      NaN
2    56 2000-03-31 2000-03-31   14      NaN
3     2 2000-04-30 2000-04-30   13      NaN
4    92 2000-05-31 2000-05-31   10      NaN
5    11 2000-06-30 2000-06-30   54      NaN
6    93 2000-07-31 2000-07-31   54      NaN
...
45   49 2003-10-31 2003-10-31   50      NaN
46   79 2003-11-30 2003-11-30   93      NaN
47   68 2003-12-31 2003-12-31   20      NaN
48   96 2004-01-31 2004-01-31   49       50
49   53 2004-02-29 2004-02-29   24       26
50   83 2004-03-31 2004-03-31   56       59
51   39 2004-04-30 2004-04-30   18       22
52   40 2004-05-31 2004-05-31    4        9
53   86 2004-06-30 2004-06-30   42       48
54   48 2004-07-31 2004-07-31   76       83
55    9 2004-08-31 2004-08-31   21       29
56   89 2004-09-30 2004-09-30   24       33
57   18 2004-10-31 2004-10-31   32       42
58   13 2004-11-30 2004-11-30   58       69
59   12 2004-12-31 2004-12-31    6       18
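To make the note about eye_new and eye concrete, here is a minimal sketch of the same masked loc assignment on a tiny frame. The frame name demo and its values are made up for illustration and are not from the question. It shows that writing to a new column leaves NaN in the rows outside the mask, while writing to the existing column keeps the old values there.

```python
import pandas as pd

# Hypothetical three-row frame; values are chosen only for illustration
demo = pd.DataFrame({
    'tee': pd.to_datetime(['2003-12-31', '2004-01-31', '2004-02-29']),
    'owe': pd.to_datetime(['2003-12-31', '2004-01-31', '2004-02-29']),
    'eye': [10, 20, 30],
    'you': [5, 6, 7],
})

# Boolean mask selecting the rows whose tee falls in 2004
mask = demo['tee'].dt.year == 2004

# New column: rows outside the mask are left as NaN
demo.loc[mask, 'eye_new'] = demo['owe'].dt.month + demo['you']

# Existing column: rows outside the mask keep their previous value
demo.loc[mask, 'eye'] = demo['owe'].dt.month + demo['you']

print(demo)
```

The right-hand side is computed for every row, and loc aligns it on the index, so only the masked rows are written.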
Edit based on the comments (add you as days to owe, then take the month):
#after testing change eye_new to eye
result.loc[ result['tee'].dt.year == 2004 ,'eye_new'] = result.apply(lambda x: x['owe'] +
pd.DateOffset(days=x['you']), axis=1).dt.month
print result
    eye        tee        owe  you  eye_new
0     5 2000-01-31 2000-01-31  149      NaN
1    81 2000-02-29 2000-02-29    5      NaN
2    42 2000-03-31 2000-03-31   39      NaN
...
47   56 2003-12-31 2003-12-31   74      NaN
48   64 2004-01-31 2004-01-31   39        3
49    0 2004-02-29 2004-02-29  395        3
50   13 2004-03-31 2004-03-31  257       12
51   31 2004-04-30 2004-04-30  164       10
52   14 2004-05-31 2004-05-31  116        9
53   37 2004-06-30 2004-06-30  335        5
54   49 2004-07-31 2004-07-31  158        1
55   95 2004-08-31 2004-08-31  244        5
56   82 2004-09-30 2004-09-30  279        7
57   38 2004-10-31 2004-10-31   20       11
58   74 2004-11-30 2004-11-30   33        1
59   59 2004-12-31 2004-12-31  326       11
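As a possible alternative to the row-wise apply, the day offset can probably be added to the whole column at once with pd.to_timedelta. This is a sketch under the assumption that you holds whole days and owe is timezone-naive. The frame name demo is hypothetical, and its three rows reuse the owe and you values from rows 48, 49 and 57 of the output above.

```python
import pandas as pd

# Three rows taken from the printed result (rows 48, 49 and 57)
demo = pd.DataFrame({
    'owe': pd.to_datetime(['2004-01-31', '2004-02-29', '2004-10-31']),
    'you': [39, 395, 20],
})

# Convert the integer day counts to timedeltas and add them column-wise
shifted = demo['owe'] + pd.to_timedelta(demo['you'], unit='D')

# Month of the shifted date, computed without a Python-level loop
month_after = shifted.dt.month

print(month_after.tolist())
```

For these rows the months agree with the eye_new values printed above (3, 3 and 11). On larger frames the column-wise form usually runs faster than apply with axis=1, since it avoids calling a Python function per row.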