Problem description
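```python
import pdfplumber

lines = []
total_check = 0
# `file` is the path (or file object) of the PDF being read
with pdfplumber.open(file) as pdf:
    pages = pdf.pages
    for page in pdf.pages:
        text = page.extract_text()
        # print every extracted line of the page
        for line in text.split('\n'):
            print(line)
```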
Output data:

```
Totaalbedrag excl. btw € 25,00
```
When I try to retrieve the VAT amount from `data`:

```python
import re

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(data).group(0)
```

Output: `AttributeError: 'NoneType' object has no attribute 'group'`
But when I search the pasted text directly:

```python
KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(r'excl. btw € 25,00').group(0)
```

Output: `'excl. btw € 25,00'`
How is it possible that the search finds € 25,00 when I paste the printed output in as a literal string, but not when I pass the `data` variable?
Please help me!
In most cases, when a pattern contains a literal space and fails to match, the cause is invisible characters or non-breaking spaces in the input.
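One way to confirm this is to list every character in the extracted text that is not plain printable ASCII, together with its Unicode name. The sketch below uses the standard `unicodedata` module; the `data` string is a hypothetical example of what a PDF extractor might return (it is not the asker's actual text), and `hidden_chars` is a hypothetical helper written for illustration. Printing `repr(data)` is a quicker check, since it shows most invisible characters as escapes such as `\xa0` or `\u200b`.

```python
import re
import unicodedata

# Hypothetical extracted line: it prints like "excl. btw € 25,00", but has a
# zero-width space (U+200B) before "btw" and a non-breaking space (U+00A0) after it.
data = "Totaalbedrag excl. \u200bbtw\u00a0€ 25,00"

pattern = re.compile(r'(excl. btw .+)')  # the pattern from the question
print(data)                  # looks normal on screen
print(repr(data))            # escapes reveal the hidden characters
print(pattern.search(data))  # no match, because the spaces are not all U+0020


def hidden_chars(text):
    """Return (index, code point, Unicode name) for each char outside printable ASCII."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if not (" " <= ch <= "~")
    ]


for item in hidden_chars(data):
    print(item)
```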
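When the input contains non-breaking spaces (`\xA0`), you can simply replace the literal spaces with `\s` to match any whitespace (in Python 3 `str` patterns, `\s` matches Unicode whitespace, including `\xA0`), or with `[ \xA0]` to match either kind of space.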
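A minimal sketch of both options, assuming a hypothetical extracted line in which the only unusual character is a non-breaking space between "btw" and "€". The period is escaped as `\.` so it matches only a literal dot.

```python
import re

# Hypothetical extracted line: U+00A0 (non-breaking space) after "btw"
data = "Totaalbedrag excl. btw\u00a0€ 25,00"

literal = re.compile(r'excl\. btw .+')             # literal spaces only
any_ws = re.compile(r'excl\.\sbtw\s.+')            # \s: any Unicode whitespace
either = re.compile(r'excl\.[ \xA0]btw[ \xA0].+')  # regular or non-breaking space

print(literal.search(data))          # None
print(any_ws.search(data).group(0))
print(either.search(data).group(0))
```

Note that the matched text still contains the non-breaking space, so it may need normalizing before it is parsed as an amount.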
In this case there appears to be a combination of spaces and some invisible characters, so you can use `\W` to match any non-word character instead of a literal space:

```python
r'excl\.\W+btw\W.+'
```
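The sketch below illustrates why `\W` helps, using a hypothetical line that mixes a zero-width space (U+200B) with a non-breaking space. A zero-width space is not whitespace, so a `\s`-based pattern still fails, while `\W` (any character that is not a letter, digit, or underscore) covers both characters.

```python
import re

# Hypothetical extracted line: regular space + zero-width space (U+200B)
# before "btw", non-breaking space (U+00A0) after it
data = "Totaalbedrag excl. \u200bbtw\u00a0€ 25,00"

ws_only = re.compile(r'excl\.\s+btw\s.+')   # \s does not match U+200B
non_word = re.compile(r'excl\.\W+btw\W.+')  # the pattern suggested above

print(ws_only.search(data))  # None
m = non_word.search(data)
print(m.group(0))
```

The trade-off is that `\W` is looser than a space: it would also accept punctuation between the words, which is usually acceptable for a fixed label like "excl. btw".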