Problem description
Python version: 3
Input: a PDF file containing a purchase order.
Input example: http://gem.compaq.com/gemstore/sites/downloads/SLED_PO_Template.pdf
Note: This is an empty sample purchase order. The actual format may vary, and real PDFs will usually be filled in rather than empty.
Desired output: each key name and its value, extracted from the PDF.
Sample output:
PO number: its value in the PDF (same for the other keys)
Question: How can I extract the key names and their corresponding values from a given PDF file?
What I have tried:
tabula-py, pdfminer2, pdftotext, OCR, pdf2json.
The main challenge I am facing is associating each key with its true value.
Recommended answer
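One possible way to approach the key-to-value association, offered as a minimal sketch rather than a definitive solution: work from word boxes (text plus page coordinates) instead of plain text, and pair each known label with the words to its right on the same line, falling back to the words directly below it. The sketch assumes the words arrive as dicts with `text`, `x0`, `x1` and `top` keys, which is the shape pdfplumber's `page.extract_words()` produces (pdfplumber is not among the tools listed in the question). It also assumes the label names are known in advance. The function names, the tolerances `y_tol` and `x_slack`, and the sample values are all hypothetical and would need tuning for a real purchase order.

```python
def group_into_lines(words, y_tol=3.0):
    """Cluster word boxes into text lines by their vertical position."""
    lines = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(lines[-1][0]["top"] - w["top"]) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [sorted(line, key=lambda w: w["x0"]) for line in lines]


def find_label(line, label_tokens):
    """Return (start, end) word indices of the label in this line, or None."""
    texts = [w["text"].rstrip(":").lower() for w in line]
    n = len(label_tokens)
    for i in range(len(texts) - n + 1):
        if texts[i:i + n] == label_tokens:
            return i, i + n
    return None


def extract_fields(words, labels, y_tol=3.0, x_slack=5.0):
    """Map each label found on the page to the text right of it or below it."""
    lines = group_into_lines(words, y_tol)

    # Locate every label first, so a value can stop where the next label starts.
    hits = []
    for li, line in enumerate(lines):
        for label in labels:
            span = find_label(line, label.lower().split())
            if span is not None:
                hits.append((li, span[0], span[1], label))
    label_cells = {(li, i) for li, s, e, _ in hits for i in range(s, e)}

    fields = {}
    for li, start, end, label in hits:
        line = lines[li]
        later = [s for l2, s, _, _ in hits if l2 == li and s >= end]
        stop = min(later) if later else len(line)
        value = line[end:stop]
        if not value and li + 1 < len(lines):
            # Nothing to the right: take non-label words below that overlap
            # the label horizontally (with a little slack).
            left = line[start]["x0"] - x_slack
            right = line[end - 1]["x1"] + x_slack
            value = [w for i, w in enumerate(lines[li + 1])
                     if (li + 1, i) not in label_cells
                     and w["x0"] < right and w["x1"] > left]
        fields[label] = " ".join(w["text"] for w in value) or None
    return fields


def word(text, x0, x1, top):
    """Build one word box; coordinates here are made-up sample data."""
    return {"text": text, "x0": x0, "x1": x1, "top": top}


sample_words = [
    word("PO", 50, 65, 100), word("Number:", 68, 110, 100),
    word("12345", 120, 150, 100),
    word("Date:", 300, 330, 100), word("2020-01-15", 335, 390, 100),
    word("Vendor", 50, 85, 130), word("Name", 88, 115, 130),
    word("Acme", 50, 80, 145), word("Corp", 83, 105, 145),
]
print(extract_fields(sample_words, ["PO Number", "Date", "Vendor Name"]))
# {'PO Number': '12345', 'Date': '2020-01-15', 'Vendor Name': 'Acme Corp'}
```

On a real file the word list could come from something like `pdfplumber.open(path).pages[0].extract_words()`. A label with nothing beside or below it, as in the empty template, maps to `None`. The sketch does not handle overlapping labels (for example "Number" and "PO Number"), values that wrap across several lines, or scanned PDFs, which would need OCR first.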