Problem Description
I am using vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION
to extract some dense text in a pdf document. Here is my code:
from google.cloud import vision

vision_client = vision.ImageAnnotatorClient()

def extract_text(bucket, filename, mimetype):
    """OCR with PDF/TIFF as source files on GCS."""
    print('Looking for text in PDF {}'.format(filename))
    # Detect text
    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)
    # Extract text from the source bucket
    gcs_source_uri = 'gs://{}/{}'.format(bucket, filename)
    gcs_source = vision.types.GcsSource(uri=gcs_source_uri)
    input_config = vision.types.InputConfig(
        gcs_source=gcs_source, mime_type=mimetype)
    request = vision.types.AnnotateFileRequest(
        features=[feature], input_config=input_config)
    print('Waiting for the OCR operation to finish.')
    ocr_response = vision_client.batch_annotate_files(requests=[request])
    print('OCR completed.')
In the response, I am expecting ocr_response.responses[1...n].pages[1...n].blocks[1...n].bounding_box to contain a filled-in list of vertices, but this list is empty. Instead, there is a normalized_vertices list, whose coordinates are normalized to values between 0 and 1. Why is that? Why is the vertices structure empty? I am following this article, and the author there uses vertices, but I don't understand why I don't get them. To convert them to the non-normalized form, I multiply each normalized vertex by the height and width, but the result is awful: the boxes are badly positioned.
Recommended Answer
To convert a NormalizedVertex to a Vertex, multiply the x field of the NormalizedVertex by the page width to get the x field of the Vertex, and multiply its y field by the page height to get the y field of the Vertex.
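A minimal sketch of that conversion (the NormalizedVertex namedtuple here is a stand-in for the protobuf type; in a real response the width and height would come from the page object of the OCR response):

```python
from collections import namedtuple

# Stand-in for the protobuf NormalizedVertex type, for illustration only.
NormalizedVertex = namedtuple('NormalizedVertex', ['x', 'y'])

def denormalize(normalized_vertices, page_width, page_height):
    """Scale NormalizedVertex coords in [0, 1] back to absolute coordinates."""
    return [(round(v.x * page_width), round(v.y * page_height))
            for v in normalized_vertices]

# Example: a box on a 612 x 792 point PDF page (US Letter).
box = [NormalizedVertex(0.1, 0.1), NormalizedVertex(0.5, 0.1),
       NormalizedVertex(0.5, 0.2), NormalizedVertex(0.1, 0.2)]
print(denormalize(box, 612, 792))  # → [(61, 79), (306, 79), (306, 158), (61, 158)]
```

If the converted boxes come out misplaced, one thing to check is that the width and height you multiply by are the ones reported for each page in the OCR response itself, not the dimensions of a separately rendered image of the page.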
The reason you get NormalizedVertex while the author of the Medium article gets Vertex is that the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION models have been upgraded to newer versions since May 15, 2020, and the Medium article was written on Dec 25, 2018.
To get results from the legacy model, you must specify "builtin/legacy_20190601" in the model field of the Feature object.
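A minimal sketch of such a Feature, written in the dict form that the Python client also accepts in place of vision.types.Feature (field names mirror the Vision API's Feature message):

```python
# Feature requesting DOCUMENT_TEXT_DETECTION from the legacy model.
# The dict form mirrors the JSON Feature message and can be passed to
# batch_annotate_files where vision.types.Feature was used above.
feature = {
    'type': 'DOCUMENT_TEXT_DETECTION',
    'model': 'builtin/legacy_20190601',  # legacy model id from the answer above
}
print(feature['model'])  # → builtin/legacy_20190601
```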
However, Google's documentation mentions that after November 15, 2020 the old models will no longer be offered.