我正在使用Python pdftables从pdf获取表数据,我按照git中给出的说明进行操作

https://github.com/drj11/pdftables

但是当我运行代码时

filepath = 'tests.pdf'
fileobj = open(filepath,'rb')
from pdftables.pdf_document import PDFDocument
doc = PDFDocument.from_fileobj(fileobj)


我得到这样的错误

       File "<stdin>", line 1, in <module>
       File "pdftables/pdf_document.py", line 53, in from_fileobj
       raise NotImplementedError


任何人都可以帮助我解决这个问题

最佳答案

如果查看实现from_fileobj功能的file,您会看到以下注释:

# TODO(pwaller): For now, put fh into a temporary file and call
# .from_path. Future: when we have a working stream input function for
# poppler, use that.


如果我理解正确,您应该改用from_path函数,因为尚未实现from_fileobj。使用您当前的代码,这很容易:

filepath = 'tests.pdf'
from pdftables.pdf_document import PDFDocument
doc = PDFDocument.from_path(filepath)

07-24 16:17