Problem description
I am working on a script that recursively goes through the subfolders of a main folder and builds a list of files of a certain type. I am having an issue with the script. It is currently set up as follows:
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt"):
            fileNamePath = str(os.path.join(root,subFolder,item))
The problem is that the subFolder variable holds a list of subfolders rather than the folder in which the item file is located. I was thinking of running a for loop over the subfolders first and joining the first part of the path, but I figured I'd double-check to see if anyone has any suggestions before doing that. Thanks for your help!
Recommended answer
You should be using the dirpath, which you call root. The dirnames are supplied so that you can prune the list if there are folders that you don't want os.walk to recurse into.
import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']
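The pruning mentioned above can be sketched as follows. This is a minimal illustration, not part of the original answer: the function name find_txt_files and the skip_dirs default are hypothetical. It relies on the documented behaviour that, with the default topdown=True, modifying dirnames in place controls which subfolders os.walk visits.

```python
import os

def find_txt_files(top, skip_dirs=(".git", "__pycache__")):
    """Collect paths of .txt files under `top`, skipping folders named in `skip_dirs`.

    `find_txt_files` and `skip_dirs` are illustrative names, not from the answer.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the list that os.walk itself holds, so the
        # removed folders are never descended into (requires topdown=True,
        # which is the default).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(".txt"):
                # dirpath is the folder that actually contains `name`.
                matches.append(os.path.join(dirpath, name))
    return matches
```

Note that the path is built from dirpath and the file name only; the dirnames list never goes into os.path.join.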
After the latest downvote, it occurred to me that glob is a better tool for selecting by extension.
import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]
There is also a generator version:
from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))
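On Python 3.5 and later, glob can also recurse on its own, which avoids combining it with os.walk. The sketch below is an addition rather than part of the original answer, and the name iter_txt_files is hypothetical. It uses glob.iglob with recursive=True, where the "**" component matches zero or more folder levels; by default that pattern does not descend into folders whose names start with a dot.

```python
import os
from glob import iglob

def iter_txt_files(top="."):
    """Lazily yield paths of *.txt files under `top` (illustrative helper name)."""
    # "**" with recursive=True matches `top` itself and every subfolder level,
    # so files directly inside `top` are included as well.
    pattern = os.path.join(top, "**", "*.txt")
    return iglob(pattern, recursive=True)
```

Because iglob returns an iterator, paths are produced one at a time; wrap the call in list() if a list is needed.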
Edit 2, for Python 3.4+:
from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))
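Note that rglob yields Path objects rather than strings, and that the pattern also matches folders whose names happen to end in .txt. If plain string paths to regular files are wanted, one possible variation (a sketch with the hypothetical name txt_paths, not taken from the original answer) is to filter on is_file() and compare the lower-cased suffix:

```python
from pathlib import Path

def txt_paths(top="."):
    """Return sorted string paths of .txt files under `top`, ignoring case.

    `txt_paths` is an illustrative name, not part of the original answer.
    """
    return sorted(
        str(p)
        for p in Path(top).rglob("*")
        # Keep regular files only; suffix.lower() handles .TXT, .Txt, etc.
        if p.is_file() and p.suffix.lower() == ".txt"
    )
```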