Problem description
I am using Camelot to read complete PDFs and extract about 112 attributes from each one.
I use table areas (the table_areas argument) to extract the attributes:
import camelot

test_variable = camelot.read_pdf(filename, flavor='stream',
                                 table_areas=['38,340,50,328'])
The issue is that the table area for the same attribute is not constant across documents. In another document I sometimes find the same attribute a few pixels away along the x or y axis, for example:
test_variable = camelot.read_pdf(filename, flavor='stream',
                                 table_areas=['38,350,50,338'])
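You can measure the drift between the two areas above with a small helper. The sketch below is hypothetical and not part of Camelot. It assumes Camelot's 'x1,y1,x2,y2' area format, meaning the top-left corner followed by the bottom-right corner in PDF coordinates, where y grows upward. It computes the smallest box that encloses every observed area.

```python
from __future__ import annotations


def parse_area(area: str) -> tuple[float, float, float, float]:
    """Parse an 'x1,y1,x2,y2' string (assumed: top-left x, top-left y,
    bottom-right x, bottom-right y); surrounding spaces are tolerated."""
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    return x1, y1, x2, y2


def union_area(areas: list[str]) -> str:
    """Return the smallest 'x1,y1,x2,y2' box enclosing all given areas.

    Hypothetical helper: assumes PDF coordinates with the origin at the
    bottom-left, so the top edge (y1) is larger than the bottom edge (y2).
    """
    boxes = [parse_area(a) for a in areas]
    x1 = min(b[0] for b in boxes)  # leftmost left edge
    y1 = max(b[1] for b in boxes)  # highest top edge
    x2 = max(b[2] for b in boxes)  # rightmost right edge
    y2 = min(b[3] for b in boxes)  # lowest bottom edge
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


print(union_area(['38,340,50,328', '38,350,50,338']))  # 38,350,50,328
```

With the two areas from the question, the enclosing box spans y from 328 to 350. The attribute's position differs only vertically between these two documents.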
Is there a way to reliably extract the same attribute from the same area, regardless of which document is being processed?
Recommended answer
The table_regions option (introduced in Camelot 0.7) may help here.
https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions
"When table_regions is specified, Camelot will only analyze the specified regions to look for tables."
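Instead of pinning an exact table area, you can define a larger region with table_regions that covers every position where the attribute has appeared. Camelot will then search for the table within that region.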
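One way to derive such a larger region is to pad an observed table area by a margin. The pad_region helper below is hypothetical and is not a Camelot function. It assumes the same 'x1,y1,x2,y2' format and a margin in PDF units. It clamps only the lower bounds at 0, because the page size is not known here.

```python
def pad_region(area: str, margin: float) -> str:
    """Grow an 'x1,y1,x2,y2' area outward by `margin` on every side.

    Hypothetical helper. Assumes (x1, y1) is the top-left corner and
    (x2, y2) the bottom-right corner in PDF coordinates (y grows upward).
    Left and bottom edges are clamped at 0; right and top edges are not
    clamped because the page width/height is unknown to this function.
    """
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    left = max(0.0, x1 - margin)    # move left edge further left
    top = y1 + margin               # move top edge further up
    right = x2 + margin             # move right edge further right
    bottom = max(0.0, y2 - margin)  # move bottom edge further down
    return f"{left:g},{top:g},{right:g},{bottom:g}"


print(pad_region('38,340,50,328', 20))  # 18,360,70,308
```

The padded string can then be passed to Camelot, for example `camelot.read_pdf(filename, flavor='stream', table_regions=[pad_region('38,340,50,328', 20)])`. A margin that is too large may pull neighbouring text into the detected table, so it is worth tuning the value against a few sample documents.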