Problem description
I have 250 CSV files in a folder, and I used the following code to import them into a single dataframe:
files = "~/*.csv"
df = pd.concat([pd.read_csv(f, dtype='str') for f in glob.glob(files)], ignore_index=True)
My problem is that none of the files contains date information; the date appears only in the filename, e.g. "LSH_190207", which is 7-Feb-2019. Is there a way to include this information in the dataframe while importing the files, preferably as the index? Or at least to create a new column containing the file names, so I can later split it and format it into a date column?
Recommended answer
Yes. Assuming the list of files is:
files = glob.glob('*.csv')
#['file1_LSH_190207_something.csv', 'file2_LSH_190208_something.csv']
#[f.split("_")[2] for f in files] gives ['190207', '190208']
This creates a date column holding the date as a string:
df = pd.concat(
    [pd.read_csv(f, dtype='str').assign(date=f.split("_")[2]) for f in files],
    ignore_index=True,
)
Sample output:
A B C date
0 1 2 3 190207
1 4 5 6 190207
2 5 6 8 190208
3 9 1 3 190208
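Note that the index in f.split("_")[2] depends on the naming pattern; for a file named like LSH_190207.csv, as in the question, the date would be at a different position. A minimal sketch of a pattern-based alternative follows. It assumes the date is always a six-digit token right after an underscore, and extract_date is a hypothetical helper name, not part of pandas.

```python
import re
from pathlib import Path


def extract_date(path):
    """Return the six-digit YYMMDD token from a file name, or None if absent."""
    # Hypothetical helper: assumes the date is exactly six digits that follow
    # an underscore and are followed by "_", ".", or the end of the name.
    match = re.search(r"_(\d{6})(?=[_.]|$)", Path(path).name)
    return match.group(1) if match else None
```

It could then replace the split call inside the list comprehension, for example .assign(date=extract_date(f)).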
After this, you can convert the date to your preferred format:
pd.to_datetime(df['date'], format='%y%m%d').dt.strftime('%d-%b-%Y')
0 07-Feb-2019
1 07-Feb-2019
2 08-Feb-2019
3 08-Feb-2019
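Since the question asks for the date as the index, the string column can also be parsed and moved into the index. The following is only a sketch: the small frame is a stand-in that mirrors the sample output above, and it assumes the filename dates are in YYMMDD form.

```python
import pandas as pd

# Stand-in for the concatenated frame; values mirror the sample output above.
df = pd.DataFrame({
    "A": ["1", "4", "5", "9"],
    "B": ["2", "5", "6", "1"],
    "C": ["3", "6", "8", "3"],
    "date": ["190207", "190207", "190208", "190208"],
})

# Parse YYMMDD explicitly so pandas does not have to guess the layout.
df["date"] = pd.to_datetime(df["date"], format="%y%m%d")

# Use the parsed dates as the index and keep rows in chronological order.
df = df.set_index("date").sort_index()
```

With a DatetimeIndex in place, rows for a single day can be selected with df.loc["2019-02-07"].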