I'm trying to use multiprocessing in my code to get better performance.
However, I get the following error:
Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
I tried it another way and got this error:
TypeError: cannot serialize '_io.TextIOWrapper' object
My code looks like this:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)
I think the error is raised because Pool is not being used in the main function. Is my guess right? And how can I modify my code to fix the error?
Best answer
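The problem is that the Book instance has an instance variable (namelist) that can't be pickled. Because you call pool.map on an instance method, the whole instance has to be picklable so that it can be sent to the child process. Book.namelist holds open file objects (_io.BufferedReader or _io.TextIOWrapper, depending on how the files were opened), and those can't be pickled. You can fix this in a couple of ways. Judging by the example code, it looks like you could simply make format_char a top-level function: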
from multiprocessing import Pool

def format_char(char):
    char = char + "a"
    return char

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            # A module-level function pickles by name, so the Book instance is never sent.
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
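
The underlying failure can be reproduced without a pool at all. The following is a minimal sketch, not part of the original code: Holder is a hypothetical stand-in for Book, and can_pickle is a helper invented for this illustration. It checks three things: whether an open file pickles, whether a bound method of an object holding that file pickles, and whether the same bound method pickles once the object holds only strings.

```python
import os
import pickle
import tempfile

def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

class Holder(object):
    """Hypothetical stand-in for Book: it only stores what it is given."""
    def __init__(self, namelist):
        self.namelist = namelist

    def method(self, x):
        return x

# Use a throwaway file so the example does not depend on the current directory.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as f:
    file_ok = can_pickle(f)                      # the open file itself
    method_ok = can_pickle(Holder([f]).method)   # bound method drags the instance along
plain_ok = can_pickle(Holder(["just", "strings"]).method)
os.remove(path)

print(file_ok, method_ok, plain_ok)
```

When run as a script, the first two checks should fail and the third should succeed, which is why moving format_char out of the class avoids the error: a top-level function does not carry the instance with it.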
However, if in reality you need format_char to be an instance method, you can use __getstate__ / __setstate__ to make Book picklable, by removing the namelist attribute from the instance's state before it is pickled:

from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
This works fine as long as you don't need to access namelist in the child process.
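
To see what the child process actually receives with the __getstate__ / __setstate__ approach, the round trip can be simulated with pickle directly. This is only a sketch under assumptions: the title attribute is hypothetical and added purely so that there is some state left to survive, and os.devnull stands in for a real input file.

```python
import os
import pickle

class Book(object):
    def __init__(self, arg):
        self.namelist = arg
        self.title = "demo"  # hypothetical extra attribute, for illustration only

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable file objects.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        # Restore whatever state was pickled; namelist is not part of it.
        self.__dict__.update(state)

with open(os.devnull) as f:
    book = Book([f])
    # Roughly what multiprocessing does when it sends the bound method's instance.
    clone = pickle.loads(pickle.dumps(book))

print(hasattr(book, 'namelist'), hasattr(clone, 'namelist'), clone.title)
```

The original object keeps its namelist, while the unpickled copy has no such attribute at all, so any method that touches self.namelist in a worker would raise AttributeError.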