python - Pandas read_csv and remove daylight saving time

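I have a 312.5MB csv file containing EURUSD 1-minute OHLC data from 27 July 2003 to the present, but the timestamps are all adjusted for daylight saving time, which means I get duplicates and gaps.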

Since it's such a large file, the default date parser is too slow, so I do this:

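from datetime import datetime
import dateutil.tz
from pandas import read_csv

tizo = dateutil.tz.tzfile('/usr/share/zoneinfo/GB')

# Hand-rolled parser for fixed-width "DD?MM?YYYY HH:MM" strings;
# slicing avoids the slow generic date parsing.
def date_parse_1min(s):
    return datetime(int(s[6:10]),   # year
                    int(s[3:5]),    # month
                    int(s[0:2]),    # day
                    int(s[11:13]),  # hour
                    int(s[14:16]),  # minute
                    tzinfo=tizo)

df = read_csv("EURUSD_1m_clean_w_header.csv", index_col=0, parse_dates=True, date_parser=date_parse_1min)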
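Since the slow part is calling a Python function for every row, one vectorized alternative is to read the date column as text and convert it with `pd.to_datetime` and an explicit `format`. This is a minimal sketch, not the author's code: it assumes the timestamps look like `DD/MM/YYYY HH:MM` (the slicing above fixes only the positions, not the separators) and uses a tiny made-up sample in place of the real file. It deliberately leaves the timestamps naive so the time zone can be attached afterwards with `tz_localize`. In pandas 2.0 and later `date_parser` is deprecated, and `read_csv` accepts `date_format` instead.

```python
import io

import pandas as pd

# Made-up sample in the assumed "DD/MM/YYYY HH:MM" layout; the real file's
# column names, separators and prices may differ.
csv_text = """Date,Open,High,Low,Close
27/07/2003 00:00,1.0000,1.0002,0.9999,1.0001
27/07/2003 00:01,1.0001,1.0003,1.0000,1.0002
"""

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format lets pandas convert the whole column in one vectorized
# pass instead of calling a Python function for every row.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M")
df = df.set_index("Date")

print(df.index)  # naive DatetimeIndex, no time zone attached yet
```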

#verify that it's got the tz right:
df.index
Exception AttributeError: "'NoneType' object has no attribute 'toordinal'" in 'pandas.tslib._localize_tso' ignored
Exception AttributeError: "'NoneType' object has no attribute 'toordinal'" in 'pandas.tslib._localize_tso' ignored
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-07-26 23:00:00, ..., 2012-12-15 23:59:00]
Length: 4938660, Freq: None, Timezone: tzfile('/usr/share/zoneinfo/GB')

Not sure why that AttributeError is there.

df.index.get_duplicates()
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-10-26 01:00:00, ..., 2012-10-28 01:59:00]
Length: 600, Freq: None, Timezone: None
df1 = df.tz_convert('GMT')
df1.index.get_duplicates()
<class 'pandas.tseries.index.DatetimeIndex'>
[2003-10-26 01:00:00, ..., 2012-10-28 01:59:00]
Length: 600, Freq: None, Timezone: None
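The duplicates survive `tz_convert` because converting only changes how each instant is displayed: if both copies of the repeated autumn hour were tagged with the same UTC offset when parsed, they are already the same instant, and no conversion can pull them apart. The sketch below reproduces that with a few hypothetical timestamps, assuming both copies of 01:00 are tagged as BST. It also uses `Index.duplicated()`, since `get_duplicates()` was deprecated and later removed from pandas.

```python
import pandas as pd

# Hypothetical naive London wall-clock stamps around the 2012-10-28 fall-back;
# 01:00 occurs twice (once in BST, once in GMT).
idx = pd.DatetimeIndex([
    "2012-10-28 00:59",
    "2012-10-28 01:00",
    "2012-10-28 01:00",
    "2012-10-28 02:00",
])

# Replacement for the removed Index.get_duplicates().
dups = idx[idx.duplicated()].unique()
print(dups)

# Tag BOTH copies of 01:00 as summer time (True = DST); the flags only matter
# for the ambiguous entries. Both copies then map to the same UTC instant,
# so converting to another zone keeps them duplicated.
aware = idx.tz_localize("Europe/London", ambiguous=[False, True, True, False])
print(aware.tz_convert("UTC").duplicated().any())  # True
```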

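How do I get pandas to remove the daylight saving offset? Obviously I could work out the integer positions that need changing and fix them that way, but there must be a better way.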

Best answer

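If you take the first and last duplicated values of each year and shift the data between them by one hour, that should be the simplest way to fix it. Obviously, you have to take into account that the first data point starts in daylight saving time.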
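Instead of locating the duplicates and shifting the data by hand, pandas can do the equivalent bookkeeping itself: localize the naive wall-clock index with `tz_localize(..., ambiguous="infer")`, which uses the order of the repeated times to decide which copy is summer time, then convert to UTC. This is a swapped-in technique, not the answer's exact method, and a sketch under assumptions: the index must be naive (parsed without `tzinfo`), in chronological order, and contain both copies of each repeated hour. `Europe/London` is used because `GB` is a link to the same zone.

```python
import pandas as pd

# Hypothetical naive London wall-clock stamps spanning the 2012-10-28 fall-back:
# 01:00 and 01:01 appear twice (first in BST, then again in GMT).
local = pd.DatetimeIndex([
    "2012-10-28 00:59",
    "2012-10-28 01:00",
    "2012-10-28 01:01",
    "2012-10-28 01:00",
    "2012-10-28 01:01",
    "2012-10-28 02:00",
])

# ambiguous="infer" uses the order of the repeated wall times to decide which
# copy is BST and which is GMT; it requires the data to be in time order.
aware = local.tz_localize("Europe/London", ambiguous="infer")

# Converting to UTC removes the DST offset; tz_localize(None) drops the tz tag.
utc = aware.tz_convert("UTC").tz_localize(None)
print(utc)
```

On the question's data this would be `df.index = df.index.tz_localize("Europe/London", ambiguous="infer").tz_convert("UTC")`, assuming `df` was read with a naive index. pandas raises an error if a repeated hour cannot be inferred, for example when one of its two copies is missing from the data.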

Regarding "python - Pandas read_csv and remove daylight saving time", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14012235/