Problem description
Hi, I have a list of Windows path objects that I run an if statement on. Background: I have several CSV files, and my code checks them. If a CSV file is good, the script moves it to a directory called "archive". If it contains an error, it is moved to "error"; if it is empty, it goes to "empty".
So I have a file that has been moved to archive. I copied this file back to the base directory for the script to process. However, the if statement that is supposed to catch this duplicate does not execute; instead, the script tries to move the file to the archive directory. Because I use the Path.rename() method to move my files, I then get the following error:
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv' -> 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\archive\06_17_2020_FMGN520.csv'
These are the functions involved. Does anyone know why this is happening?
from pathlib import Path

def make_dict_of_csvprocessing_dirs():
    dir_dict = process_dirconfig_file(dirconfig_file)
    # print(dir_dict)
    dictofpdir_flist = {}  # dictionary of lists of files in different processing dirs
    csvbase_file_dir = dir_dict["base_dir"]
    csvhistory_Phandler = Path(csvbase_file_dir)
    csvbase_path_list = [file for file in csvhistory_Phandler.glob("*.*")]
    dictofpdir_flist["csvbase_path_list"] = csvbase_path_list
    archive_dir = dir_dict["archive_dir"]
    archive_Phandler = Path(archive_dir)
    archivefiles_path_set = {file for file in archive_Phandler.rglob("*.*")}
    dictofpdir_flist["archivefiles_path_set"] = archivefiles_path_set
    # return statement assumed: the caller below unpacks exactly these two values
    return dir_dict, dictofpdir_flist
The function where the error occurs:
def odf_history_from_csv_to_dbtable(db_instance):
    odfsdict = db_instance['odfs_tester_history']
    #table_row = {}
    totalresult_list = []
    dir_dict, dictofpdir_flist = make_dict_of_csvprocessing_dirs()
    print(dir_dict)
    csvbase_path_list = dictofpdir_flist["csvbase_path_list"]
    archivefiles_path_set = dictofpdir_flist["archivefiles_path_set"]
    for csv in csvbase_path_list:  # is there a faster way to compare the list of files in archive and history?
        if csv in archivefiles_path_set:
            print(csv.name + " is in archive folder already")
        else:
            csvhistoryfilelist_to_dbtable(csv, db_instance)
            df_tuple = process_csv_formatting(csv)
            df_cnum, odfscsv_df = df_tuple
            if df_cnum == 1:
                trg_path = Path(dir_dict['empty_dir'])
                csv.rename(trg_path.joinpath(csv.name))
    return totalresult_list
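When I debug in PyCharm, I get the following values. Note how the slashes in the directory listings are reversed. I wonder whether this is the problem?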
archivefiles_path_set={WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/archive/06_17_2020_FMGN520.csv')}
csv = {WindowsPath}C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv
csvbase_path_list = [WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/06_17_2020_FMGN520.csv')]
Recommended answer
Probably the fastest way to find which files to copy (if you are the only process accessing both directories):
import os

basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir, "temp")

def what_to_copy(frm_dir, to_dir):
    # names that exist in frm_dir but not in to_dir
    return set(os.listdir(frm_dir)).difference(os.listdir(to_dir))

copy_names = what_to_copy(basedir, archdir)
print(copy_names)  # you need to prepend the dirs when copying, use os.path.join
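Comparing names rather than full paths matters here because, as far as the posted code shows, the check in the question compares whole Path objects. A file in the base directory is never equal to its copy under archive, even when the file names match, so the if branch cannot trigger. The forward slashes PyCharm displays are only the repr and should not be the cause. A minimal sketch, using PureWindowsPath so it runs on any OS; the shortened directory names are hypothetical:

```python
from pathlib import PureWindowsPath

# Hypothetical paths, shortened from the ones in the question.
base_file = PureWindowsPath(r"C:\data\odfshistory\06_17_2020_FMGN520.csv")
archived = {PureWindowsPath("C:/data/odfshistory/archive/06_17_2020_FMGN520.csv")}

# The full paths differ (one contains "archive"), so membership fails.
full_path_match = base_file in archived

# Comparing only the file names finds the duplicate.
archived_names = {p.name for p in archived}
name_match = base_file.name in archived_names

# Slash direction does not matter: both spellings are the same path.
same_path = PureWindowsPath("C:/data/a.csv") == PureWindowsPath(r"C:\data\a.csv")

print(full_path_match, name_match, same_path)  # False True True
```

In the question's loop, this would correspond to building a set of p.name values from the archive once and testing csv.name against it.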
Your code seems quite complex for such a small task (lots of storing things in dicts only to pull them out again). This is how it could work:
import os

# boilerplate code to create files and make some of them already "archived"
names = [f"file_{i}.csv" for i in range(10, 60)]
basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir, "temp")
os.makedirs(basedir, exist_ok=True)
os.makedirs(archdir, exist_ok=True)

def create_files():
    for idx, fn in enumerate(names):
        # create all files in basedir
        with open(os.path.join(basedir, fn), "w") as f:
            f.write(" ")
        # every 3rd file goes into archdir as well
        if idx % 3 == 0:
            with open(os.path.join(archdir, fn), "w") as f:
                f.write(" ")

create_files()
Function to "copy" a file if it does not exist yet:
def copy_from_to_if_not_exists(frm, to):
    """'frm' full path to file, 'to' directory to copy to"""
    # norm paths so they compare equally regardless of C:/temp or C:\\temp
    frm = os.path.normpath(frm)
    to = os.path.normpath(to)
    fn = os.path.basename(frm)
    src_dir = os.path.dirname(frm)  # renamed from 'dir' to avoid shadowing the builtin
    if src_dir != to:
        if fn in os.listdir(to):
            print(fn, " -> already exists!")
        else:
            # you would copy the file instead ...
            print(fn, " -> could be copied")

# print what's in the basedir as well as the archivedir (os.walk descends subdirs)
for root, dirs, files in os.walk(basedir):
    print(root + ":", files, sep="\n")

for file in os.listdir(basedir):
    copy_from_to_if_not_exists(os.path.join(basedir, file), archdir)
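The function above only prints what it would do. A possible way to perform the move for real, as a rough sketch: move_if_absent is a hypothetical helper (not from the original code), and the demo runs in a temporary directory rather than the paths used above. It assumes no other process touches the directories between the existence check and the move.

```python
import shutil
import tempfile
from pathlib import Path

def move_if_absent(src: Path, dest_dir: Path) -> bool:
    """Move src into dest_dir unless a file with that name is already there."""
    target = dest_dir / src.name
    if target.exists():
        print(src.name, "-> already exists, left in place")
        return False
    shutil.move(str(src), str(target))
    return True

# Demo in a throwaway directory: one new file and one duplicate.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    archive = base / "archive"
    archive.mkdir()
    (base / "new.csv").write_text("a,b\n")
    (base / "dup.csv").write_text("a,b\n")
    (archive / "dup.csv").write_text("a,b\n")
    # sorted() materializes the listing before any file is moved
    results = {f.name: move_if_absent(f, archive) for f in sorted(base.glob("*.csv"))}

print(results)  # {'dup.csv': False, 'new.csv': True}
```

Checking the target first avoids the FileExistsError that Path.rename() raises on Windows when the destination already exists.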
If the read cache of your hard drive is not fast enough for you, you can cache the result of os.listdir(to), but it is probably fine as is.
Output:
c:/temp/csvs:
['file_10.csv','file_11.csv','file_12.csv','file_13.csv','file_14.csv','file_15.csv',
'file_16.csv','file_17.csv','file_18.csv','file_19.csv','file_20.csv','file_21.csv',
'file_22.csv','file_23.csv','file_24.csv','file_25.csv','file_26.csv','file_27.csv',
'file_28.csv','file_29.csv','file_30.csv','file_31.csv','file_32.csv','file_33.csv',
'file_34.csv','file_35.csv','file_36.csv','file_37.csv','file_38.csv','file_39.csv',
'file_40.csv','file_41.csv','file_42.csv','file_43.csv','file_44.csv','file_45.csv',
'file_46.csv','file_47.csv','file_48.csv','file_49.csv','file_50.csv','file_51.csv',
'file_52.csv','file_53.csv','file_54.csv','file_55.csv','file_56.csv','file_57.csv',
'file_58.csv','file_59.csv']
c:/temp/csvs\temp:
['file_10.csv','file_13.csv','file_16.csv','file_19.csv','file_22.csv','file_25.csv',
'file_28.csv','file_31.csv','file_34.csv','file_37.csv','file_40.csv','file_43.csv',
'file_46.csv','file_49.csv','file_52.csv','file_55.csv','file_58.csv']
file_10.csv -> already exists!
file_11.csv -> could be copied
file_12.csv -> could be copied
file_13.csv -> already exists!
file_14.csv -> could be copied
file_15.csv -> could be copied
file_16.csv -> already exists!
file_17.csv -> could be copied
file_18.csv -> could be copied
[...snipp...]
file_55.csv -> already exists!
file_56.csv -> could be copied
file_57.csv -> could be copied
file_58.csv -> already exists!
file_59.csv -> could be copied
See functools.lru_cache for ways to cache the results of functions, and consider putting the os.listdir(archdir) call into a function that caches its result if IO reading becomes a bottleneck (measure first, then optimize).
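A small sketch of what such caching might look like. cached_names is a hypothetical helper name, and the demo uses a temporary directory instead of archdir. The catch is that a cached listing goes stale as soon as files are added, so the cache has to be cleared after each move into the directory.

```python
import os
import tempfile
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_names(directory: str) -> frozenset:
    """Read the directory once; later calls with the same argument reuse the result."""
    return frozenset(os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "file_10.csv"), "w").close()
    first = cached_names(tmp)
    # A file added after the first call is not seen until the cache is cleared.
    open(os.path.join(tmp, "file_11.csv"), "w").close()
    stale = cached_names(tmp)
    cached_names.cache_clear()
    fresh = cached_names(tmp)

print(sorted(first), sorted(stale), sorted(fresh))
```

Returning a frozenset also makes each membership test a constant-time lookup instead of a scan through a list.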