问题描述
我正在尝试读取特定目录的所有 .xlsx
文件,想法是加载每个目录文件的每个 Excel 电子表格,将它们存储为 .pandas
数据框,然后将所有报告的每个电子表格存储为字典.
I am trying to read all .xlsx
files for an specific directory, the idea is to load every excel spreadsheet for every directory files, store them as a .pandas
dataframe and then store each spreadsheet for all reports as a dictionary.
在我的尝试中,错误 BadZipFile: File is not a zip file
不断出现,
In my attempt, the error BadZipFile: File is not a zip file
keeps arising,
我错过了什么?
阅读报告并连接每个 Excel 表:
import openpyxl
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook
############### path settlement and file names ##########
path_reportes = 'Reports/xlsx_folder'
file_names = os.listdir(path_reportes)
overall_df = dict()
############## concatenate all reports ##################
for file_name in file_names:
data_file_path = os.path.join(path_reportes, file_name)
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename=data_file_path)
#sheet = workbook.active
# Save the spreadsheet
workbook.save(filename=data_file_path)
df_report_dict = pd.read_excel(data_file_path, sheet_name=None, engine='openpyxl')
for key in df_report_dict:
df_report_dict[key]['report_name'] = file_name
try:
overall_df[key] = overall_df[key].append(df_report_dict[key], ignore_index=True)
except:
overall_df[key] = df_report_dict[key]
输出下一个错误:
BadZipFile:文件不是 zip 文件
完全回溯错误:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-6-5e32988240ae> in <module>
10
11 # Start by opening the spreadsheet and selecting the main sheet
---> 12 workbook = load_workbook(filename=data_file_path)
13 #sheet = workbook.active
14
/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in load_workbook(filename, read_only, keep_vba, data_only, keep_links)
314 """
315 reader = ExcelReader(filename, read_only, keep_vba,
--> 316 data_only, keep_links)
317 reader.read()
318 return reader.wb
/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in __init__(self, fn, read_only, keep_vba, data_only, keep_links)
122 def __init__(self, fn, read_only=False, keep_vba=KEEP_VBA,
123 data_only=False, keep_links=True):
--> 124 self.archive = _validate_archive(fn)
125 self.valid_files = self.archive.namelist()
126 self.read_only = read_only
/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in _validate_archive(filename)
94 raise InvalidFileException(msg)
95
---> 96 archive = ZipFile(filename, 'r')
97 return archive
98
/usr/local/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
1129 try:
1130 if mode == 'r':
-> 1131 self._RealGetContents()
1132 elif mode in ('w', 'x'):
1133 # set the modified flag so central directory gets written
/usr/local/lib/python3.6/zipfile.py in _RealGetContents(self)
1196 raise BadZipFile("File is not a zip file")
1197 if not endrec:
-> 1198 raise BadZipFile("File is not a zip file")
1199 if self.debug > 1:
1200 print(endrec)
BadZipFile: File is not a zip file
推荐答案
我尝试复制您的实验并创建了虚拟 excel 文件,在对您的代码稍作修改后,我没有收到您的错误消息.试试下面的代码,看看它是否能解决你的问题:
I tried to replicate your experiment and created dummy excel files and after little modification to your code, I didn't get your error. try the code bellow to see if it solve your problem:
import openpyxl
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook
import os
############### path settlement and file names ##########
path_reportes = os.path.join(os.getcwd(), 'Reports', 'xlsx_folder')
file_names = os.listdir(path_reportes)
overall_df = dict()
############## concatenate all reports ##################
for file_name in file_names:
data_file_path = os.path.join(path_reportes, file_name)
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename=data_file_path)
#sheet = workbook.active
# Save the spreadsheet
workbook.save(filename=data_file_path)
df_report_dict = pd.read_excel(data_file_path, sheet_name=None, engine='openpyxl')
for key in df_report_dict:
df_report_dict[key]['report_name'] = file_name
try:
overall_df[key] = overall_df[key].append(df_report_dict[key], ignore_index=True)
except:
overall_df[key] = df_report_dict[key]
这篇关于如何使用openpyxl正确读取excel文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!