Problem description
I'm using glob to feed file names to a loop like so:
import glob

inputcsvfiles = glob.iglob('NCCCSM*.csv')
for x in inputcsvfiles:
    csvfilename = x
    # do stuff here
The toy example that I used to prototype this script works fine with 2, 10, or even 100 input csv files, but I actually need it to loop through 10,959 files. When using that many files, the script stops working after the first iteration and fails to find the second input file.
Given that the script works absolutely fine with a "reasonable" number of entries (2-100), but not with what I need (10,959), is there a better way to handle this situation, or some sort of parameter I can set to allow for a high number of iterations?
PS - initially I was using glob.glob, but glob.iglob fares no better.
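For reference, the only difference between the two calls is eager versus lazy evaluation; neither should cap the number of matches (a minimal illustration, not part of the original post):

import glob

eager = glob.glob('NCCCSM*.csv')    # builds the full list of matches up front
lazy = glob.iglob('NCCCSM*.csv')    # yields one matching name at a time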
An expansion of the above for more context...
# typical input file looks like this: "NCCCSM20110101.csv", "NCCCSM20110102.csv", etc.
import glob

import arcpy

inputcsvfiles = glob.iglob('NCCCSM*.csv')

# loop over individual input files
for x in inputcsvfiles:
    csvfile = x
    modelname = x[0:5]
    # ArcPy
    arcpy.AddJoin_management(inputshape, "CLIMATEID", csvfile, "CLIMATEID", "KEEP_COMMON")
    # do more stuff after
The script fails at the ArcPy line, where the "csvfile" variable gets passed into the command. The error reported is that it can't find a specified CSV file (e.g., "NCCSM20110101.csv") when, in fact, the CSV is definitely in the directory. Could it be that you can't reuse a declared variable (x) multiple times, as I have above? Again, this works fine if the directory being glob'd only has 100 or so files, but if there's a whole lot (e.g., 10,959), it fails seemingly arbitrarily somewhere down the list.
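One way to narrow down a failure like this (a hypothetical debugging sketch, not part of the original question) is to confirm, just before the ArcPy call, that the name glob yielded still resolves on disk, and to log its absolute path:

import glob
import os

for csvfile in glob.iglob('NCCCSM*.csv'):
    # Verify the file glob just yielded actually exists before handing
    # it to ArcPy; logging the absolute path exposes a bad working directory.
    if not os.path.isfile(csvfile):
        print('glob yielded a missing file: %r' % csvfile)
        continue
    print('processing %s' % os.path.abspath(csvfile))
    # ArcPy join would go here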
Answer
One issue that arose was not with Python per se, but rather with ArcPy and/or MS handling of CSV files (more the latter, I think). As the loop iterates, it creates a schema.ini file whereby information on each CSV file processed in the loop gets added and stored. Over time, the schema.ini gets rather large, and I believe that's when the performance issues arise.
My solution, although perhaps inelegant, was to delete the schema.ini file during each loop to avoid the issue. Doing so allowed me to process the 10k+ CSV files, although rather slowly. Truth be told, we wound up using GRASS and BASH scripting in the end.
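A minimal sketch of that workaround (assuming the schema.ini is written into the same directory as the CSV files; inputshape stands in for the layer from the question):

import glob
import os

import arcpy

for csvfile in glob.iglob('NCCCSM*.csv'):
    arcpy.AddJoin_management(inputshape, "CLIMATEID", csvfile, "CLIMATEID", "KEEP_COMMON")
    # ... rest of the per-file processing ...

    # Delete the schema.ini that the text driver writes next to the CSVs,
    # so it never grows large enough to slow the loop down.
    schema_path = os.path.join(os.path.dirname(os.path.abspath(csvfile)), 'schema.ini')
    if os.path.exists(schema_path):
        os.remove(schema_path)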