Problem description
my_string = " Name Last_Name Place"
my_string_another = "Aman Raparia India"
I have the two strings shown above; they are not CSV output. At present, I read the first string and convert it to a list like this:
my_string = my_string.strip("\r\n")
my_string = my_string.split(" ")
my_string[:] = [elem for elem in my_string if elem != ""]
to get
my_string = ['Name', 'Last_Name', 'Place']
Similarly, I do the same for my_string_another to produce another list:
my_another_string = ["Aman", "Raparia", "India"]
Hence I can easily create a dict object from the two lists.
The problem occurs when my_string_another is missing one of the fields, for example:
my_string_another = "Aman India"
When I use the same logic to convert my_string_another to a list, it produces
my_string_another = ["Aman", "India"]
As a result, when I map the two lists together, "India" is mapped to Last_Name instead of Place.
Is there a way to get the output in the following format?
my_another_string = ["Aman", "NA", "India"]
That way, when I map the two strings together, the fields would match up properly.
Recommended answer
You could use the re module:
>>> import re
>>> my_string = " Name Last_Name Place"
>>> my_string_another = "Aman Raparia India"
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string).groups()
('Name', 'Last_Name', 'Place')
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', 'Raparia', 'India')
>>> my_string_another = "Aman     India"  # the gap left by the missing field must span two or more whitespace characters
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', '', 'India')
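This roughly means: capture three groups of non-whitespace characters, where the middle one is optional (\S* also matches an empty string). Note that the pattern requires whitespace on both sides of the middle group, so a line with a missing field only matches if at least two whitespace characters separate the remaining fields; otherwise re.search returns None.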
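To make the pieces of the pattern easier to read, here is a minimal sketch of the same regular expression written with re.VERBOSE, plus a guard for lines that do not match. The names FIELDS and split_fields are hypothetical and only serve this illustration.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so that
# each part can carry a comment. FIELDS is a hypothetical name.
FIELDS = re.compile(r"""
    (\S+)   # first field: one or more non-whitespace characters
    \s+     # gap after the first field
    (\S*)   # middle field: zero or more characters, so it may be empty
    \s+     # gap before the last field
    (\S+)   # last field: one or more non-whitespace characters
""", re.VERBOSE)


def split_fields(line):
    """Hypothetical helper: return the three captured groups, or None if no match."""
    match = FIELDS.search(line)
    if match is None:
        return None
    return match.groups()


print(split_fields("Aman Raparia India"))
print(split_fields("Aman     India"))
print(split_fields("Aman India"))  # a single space leaves no room for an empty middle field
```

Checking for None before calling .groups() avoids an AttributeError on lines that do not fit the three-field layout.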
You can then use a list comprehension to replace the empty string with NA:
>>> m = re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
>>> m = [i if i else 'NA' for i in m]
>>> m
['Aman', 'NA', 'India']
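Putting the steps together, the following sketch builds the dict the question asks for. It assumes every line has exactly three whitespace-separated columns and that a missing middle value leaves a gap of at least two whitespace characters; PATTERN and to_record are hypothetical names introduced for this example.

```python
import re

# Hypothetical names for this sketch: PATTERN, to_record.
PATTERN = re.compile(r'(\S+)\s+(\S*)\s+(\S+)')


def to_record(header_line, value_line, placeholder='NA'):
    """Map the header fields to the value fields, filling an empty value with placeholder."""
    header = PATTERN.search(header_line)
    values = PATTERN.search(value_line)
    if header is None or values is None:
        raise ValueError('line does not match the three-field layout')
    # Replace an empty capture (the missing middle field) with the placeholder.
    cleaned = [v if v else placeholder for v in values.groups()]
    return dict(zip(header.groups(), cleaned))


record = to_record(" Name Last_Name Place", "Aman     India")
print(record)  # {'Name': 'Aman', 'Last_Name': 'NA', 'Place': 'India'}
```

Because the pattern is hard-coded for three columns, input with a different number of fields would need a different approach, for example slicing by fixed column positions if the data is fixed-width.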