Problem description
I'm trying to write a Python script that checks a Google document for various on-page SEO metrics.
The Google Docs API has a good sample showing how to extract all the text from a Google document. However, it returns only plain text with no formatting.
To perform my checks I need to split out the H1, the H2-H4 headings, bold text and so on. After two hours of experimenting and searching the API docs and the web, I can't figure out how to edit the following loop to get (for example) all the HEADING_2 elements.
def read_strucutural_elements(elements):
    text = ''
    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')
            for elem in elements:
                text += read_paragraph_element(elem)
        elif 'table' in value:
            # The text in table cells is in nested Structural Elements,
            # and tables may be nested.
            table = value.get('table')
            for row in table.get('tableRows'):
                cells = row.get('tableCells')
                for cell in cells:
                    text += read_strucutural_elements(cell.get('content'))
        elif 'tableOfContents' in value:
            # The text in the TOC is also in a Structural Element.
            toc = value.get('tableOfContents')
            text += read_strucutural_elements(toc.get('content'))
    return text
Any help appreciated. Thanks.
Recommended answer
I believe your goal and your current situation are as follows.
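- You want to retrieve the text of paragraphs whose paragraph style is HEADING_2.
- You want to achieve this using googleapis for Python.
- You want to achieve this using the script in your question.
- You have already been able to get values from the Google document using the Docs API.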
- In this case, I think the text needs to be retrieved only when the value of namedStyleType is HEADING_2.
When this point is reflected in your script, it becomes as follows.
From:
    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')
To:
    for value in elements:
        if 'paragraph' in value and value['paragraph']['paragraphStyle']['namedStyleType'] == 'HEADING_2':  # Modified
            elements = value.get('paragraph').get('elements')
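If you need more than one heading level, the same namedStyleType check can be generalized. The following is only a minimal sketch, not part of the Google sample: the helper names paragraph_text and texts_by_named_style are hypothetical, and sample_content is a hand-written stand-in shaped like document['body']['content'], not real API output. It groups paragraph text by style and descends into tables the way the script in the question does.

```python
from collections import defaultdict


def paragraph_text(paragraph):
    """Join the content of every textRun in one paragraph (hypothetical helper)."""
    parts = []
    for elem in paragraph.get('elements', []):
        text_run = elem.get('textRun')
        if text_run:
            parts.append(text_run.get('content', ''))
    return ''.join(parts).strip()


def texts_by_named_style(elements, found=None):
    """Group paragraph text by namedStyleType, descending into table cells."""
    if found is None:
        found = defaultdict(list)
    for value in elements:
        if 'paragraph' in value:
            paragraph = value['paragraph']
            # .get() avoids a KeyError if a paragraph carries no style info.
            style = paragraph.get('paragraphStyle', {}).get('namedStyleType')
            text = paragraph_text(paragraph)
            if style and text:
                found[style].append(text)
        elif 'table' in value:
            for row in value['table'].get('tableRows', []):
                for cell in row.get('tableCells', []):
                    texts_by_named_style(cell.get('content', []), found)
    return found


def make_paragraph(style, content):
    """Build a sample structural element (assumed shape, for illustration only)."""
    return {'paragraph': {
        'paragraphStyle': {'namedStyleType': style},
        'elements': [{'textRun': {'content': content}}],
    }}


# Hand-written sample data, not real API output.
sample_content = [
    make_paragraph('HEADING_1', 'Page title\n'),
    make_paragraph('NORMAL_TEXT', 'Intro paragraph.\n'),
    make_paragraph('HEADING_2', 'First section\n'),
    {'table': {'tableRows': [{'tableCells': [
        {'content': [make_paragraph('HEADING_2', 'Inside a table\n')]},
    ]}]}},
]

result = texts_by_named_style(sample_content)
print(result['HEADING_2'])
```

With a real document you would pass document.get('body').get('content') in place of sample_content, then read result['HEADING_1'], result['HEADING_2'] and so on for each check.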
Reference:
- NamedStyleType