usecols with parse_dates and names


Question

I am trying to load a CSV file with OHLC data in the following format.

In [49]: !head '500008.csv'
03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
06 Jan 2000,12.60,13.30,12.27,12.34,2800500,,,,
07 Jan 2000,12.53,12.70,11.82,12.57,2763000,,,,
10 Jan 2000,13.58,13.58,13.58,13.58,13500,,,,
11 Jan 2000,14.66,14.66,13.40,13.47,1694220,,,,
12 Jan 2000,13.66,13.99,13.20,13.54,519164,,,,
13 Jan 2000,13.67,13.87,13.54,13.80,278400,,,,
14 Jan 2000,13.84,13.99,13.30,13.50,718814,,,,

I tried the following, which loads the data.

df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6), 
                            header=None, index_col=0)

But now I want to name the columns. So I tried:

df = read_csv('500008.csv', parse_dates=[0,1,2], usecols=range(6),
                            header=None, index_col=0, names='d o h l c v'.split())

But this fails, saying:

IndexError: list index out of range

Can someone point out what I am doing wrong?

Recommended answer

I don't know if it's a bug or a feature, but you have to specify names for all columns present in the file, even if you pass only a subset of those columns to usecols.

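import pandas as pd
from io import StringIO

# raw is assumed to hold the CSV text shown in the question
df = pd.read_csv(StringIO(raw),
                 parse_dates=True,
                 header=None,
                 index_col=0,
                 usecols=[0,1,2,3,4,5],
                 names='0 1 2 3 4 5 6 7 8 9'.split())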

which gives:

                1      2      3      4        5
0                                              
2000-01-03  12.85  13.11  12.74  13.11   976500
2000-01-04  13.54  13.60  12.56  13.33  2493000
2000-01-05  12.68  13.34  12.37  12.68  1680000
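
To get the descriptive names the question asks for, the same idea can be sketched as follows. This is only an illustration under some assumptions: the sample rows are copied from the question, and the placeholder names x1 to x4 for the four empty trailing columns are made up here, not part of the original answer.

```python
from io import StringIO

import pandas as pd

# First three rows of the sample file from the question (10 fields per row).
raw = """03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
05 Jan 2000,12.68,13.34,12.37,12.68,1680000,,,,
"""

wanted = "d o h l c v".split()
# Hypothetical placeholder names for the four empty trailing columns.
unused = ["x1", "x2", "x3", "x4"]

df = pd.read_csv(
    StringIO(raw),
    header=None,
    names=wanted + unused,  # one name per column present in the file
    usecols=range(6),       # keep only the six data columns
    index_col=0,            # the date column becomes the index
    parse_dates=True,       # parse the index as dates
)
print(df)
```

The frame is indexed by d and has the columns o, h, l, c and v; the placeholder names never appear in the result because usecols drops those columns.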

I figured this out by trying the edge case where you pass a full list to both names and usecols, then gradually reducing the lists to see what happens.
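
A rough sketch of that kind of probing is shown below. It is a variant rather than the exact experiment from the answer: usecols stays fixed at the six data columns while the names list shrinks, and each outcome is recorded instead of being assumed. The sample rows come from the question; which lengths succeed is likely to depend on the installed pandas version.

```python
from io import StringIO

import pandas as pd

# Two rows of the sample file from the question (10 fields per row).
raw = """03 Jan 2000,12.85,13.11,12.74,13.11,976500,,,,
04 Jan 2000,13.54,13.60,12.56,13.33,2493000,,,,
"""

all_names = [str(i) for i in range(10)]  # one name per field in the file
results = {}

# Shrink the names list from 10 entries down to 6 and record what happens.
for n in range(10, 5, -1):
    try:
        pd.read_csv(StringIO(raw), header=None,
                    usecols=range(6), names=all_names[:n])
        results[n] = "ok"
    except Exception as exc:  # the outcome may differ between pandas versions
        results[n] = type(exc).__name__

print(results)
```

The entry for 10 names corresponds to the working call in the answer; the other entries show how the installed version treats shorter lists.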

What is weird is the error message you get when you try, for instance, usecols=[1,2,3] and names=['1','2','3']:

ValueError: Passed header names mismatches usecols

That makes no sense...
