如何使os.walk
遍历FTP数据库(位于远程服务器上)的目录树?现在,代码的结构方式为(提供注释):
import fnmatch, os, ftplib
def find(pattern, startdir=os.curdir): #find function taking variables for both desired file and the starting directory
for (thisDir, subsHere, filesHere) in os.walk(startdir): #each of the variables change as the directory tree is walked
for name in subsHere + filesHere: #going through all of the files and subdirectories
if fnmatch.fnmatch(name, pattern): #if the name of one of the files or subs is the same as the inputted name
fullpath = os.path.join(thisDir, name) #fullpath equals the concatenation of the directory and the name
yield fullpath #return fullpath but anew each time
def findlist(pattern, startdir = os.curdir, dosort=False):
matches = list(find(pattern, startdir)) #find with arguments pattern and startdir put into a list data structure
if dosort: matches.sort() #isn't dosort automatically False? Is this statement any different from the same thing but with a line in between
return matches
#def ftp(
#specifying where to search.
if __name__ == '__main__':
import sys
namepattern, startdir = sys.argv[1], sys.argv[2]
for name in find(namepattern, startdir): print (name)
我在想我需要定义一个新函数(即
def ftp()
)将此功能添加到上面的代码中。但是,恐怕os.walk
函数默认情况下只会在运行代码的计算机的目录树中移动。有没有一种方法可以扩展
os.walk
的功能,使其能够遍历远程目录树(通过FTP)? 最佳答案
您所需要的只是利用python的ftplib
模块。由于os.walk()
基于广度优先搜索算法,因此您需要在每次迭代时查找目录和文件名,然后从第一个目录递归继续遍历。大约2年前,我实现了this algorithm的实现,以用作FTPwalker的心脏,它是通过FTP遍历超大型目录树的最佳软件包。
from os import path as ospath
class FTPWalk:
"""
This class is contain corresponding functions for traversing the FTP
servers using BFS algorithm.
"""
def __init__(self, connection):
self.connection = connection
def listdir(self, _path):
"""
return files and directory names within a path (directory)
"""
file_list, dirs, nondirs = [], [], []
try:
self.connection.cwd(_path)
except Exception as exp:
print ("the current path is : ", self.connection.pwd(), exp.__str__(),_path)
return [], []
else:
self.connection.retrlines('LIST', lambda x: file_list.append(x.split()))
for info in file_list:
ls_type, name = info[0], info[-1]
if ls_type.startswith('d'):
dirs.append(name)
else:
nondirs.append(name)
return dirs, nondirs
def walk(self, path='/'):
"""
Walk through FTP server's directory tree, based on a BFS algorithm.
"""
dirs, nondirs = self.listdir(path)
yield path, dirs, nondirs
for name in dirs:
path = ospath.join(path, name)
yield from self.walk(path)
# In python2 use:
# for path, dirs, nondirs in self.walk(path):
# yield path, dirs, nondirs
self.connection.cwd('..')
path = ospath.dirname(path)
现在,使用此类,您可以使用
ftplib
模块简单地创建一个连接对象,并将该对象传递给FTPWalk
对象,然后循环遍历walk()
函数:In [2]: from test import FTPWalk
In [3]: import ftplib
In [4]: connection = ftplib.FTP("ftp.uniprot.org")
In [5]: connection.login()
Out[5]: '230 Login successful.'
In [6]: ftpwalk = FTPWalk(connection)
In [7]: for i in ftpwalk.walk():
print(i)
...:
('/', ['pub'], [])
('/pub', ['databases'], ['robots.txt'])
('/pub/databases', ['uniprot'], [])
('/pub/databases/uniprot', ['current_release', 'previous_releases'], ['LICENSE', 'current_release/README', 'current_release/knowledgebase/complete', 'previous_releases/', 'current_release/relnotes.txt', 'current_release/uniref'])
('/pub/databases/uniprot/current_release', ['decoy', 'knowledgebase', 'rdf', 'uniparc', 'uniref'], ['README', 'RELEASE.metalink', 'changes.html', 'news.html', 'relnotes.txt'])
...
...
...
关于python - 在FTP服务器上扩展Python的os.walk函数,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/31465199/