(Python) Identify the missing characters and replace them with NA


Problem description

my_string = "        Name         Last_Name              Place"
my_string_another = "Aman         Raparia                India"

I have the two strings shown above; they are not CSV output. At present, I read the first string and convert it to a list like this:

my_string = my_string.strip("\r\n")
my_string = my_string.split(" ")
my_string[:] = [elem for elem in my_string if elem != ""]

which gives:

my_string = ['Name', 'Last_Name', 'Place']

Similarly, I do the same for my_string_another to produce another list:

my_another_string = ["Aman", "Raparia", "India"]

Hence I can easily create a dict object.
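
As a minimal sketch of that step, assuming both lines contain all three fields: `str.split()` called with no argument splits on any run of whitespace and discards empty strings, so it yields the same lists as the strip, split and filter steps shown earlier. The names `headers`, `values` and `record` are illustrative and do not come from the original post.

```python
my_string = "        Name         Last_Name              Place"
my_string_another = "Aman         Raparia                India"

# split() with no argument splits on runs of whitespace and drops empty strings
headers = my_string.split()
values = my_string_another.split()

# Pair each header with the value at the same position
record = dict(zip(headers, values))
print(record)  # {'Name': 'Aman', 'Last_Name': 'Raparia', 'Place': 'India'}
```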

The problem occurs when my_string_another is missing one of the fields, for example:

my_string_another = "Aman                             India"

When I use the same logic to convert my_string_another to a list, it produces:

my_string_another = ["Aman", "India"]

As a result, when I map the two lists together, "India" is mapped to Last_Name instead of Place.
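
The mismatch can be reproduced with a short sketch, assuming the mapping is built with `dict(zip(...))` (the post does not say how the dict is actually created). The names `headers`, `values` and `record` are illustrative.

```python
headers = ["Name", "Last_Name", "Place"]
values = "Aman                             India".split()

# zip() pairs items by position and stops at the shorter sequence,
# so "India" lands under "Last_Name" and "Place" is dropped entirely
record = dict(zip(headers, values))
print(record)  # {'Name': 'Aman', 'Last_Name': 'India'}
```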

Is there a way to get the output in the following format?

my_another_string = ["Aman", "NA", "India"]

That way, when I map the two strings together, the fields are matched properly.

Recommended answer

You could use the re module:

>>> import re
>>> my_string = "        Name         Last_Name              Place"
>>> my_string_another = "Aman         Raparia                India"
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string).groups()
('Name', 'Last_Name', 'Place')
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', 'Raparia', 'India')
>>> my_string_another = "Aman                             India"
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', '', 'India')

This roughly means: capture three groups of non-whitespace characters. The middle one is optional, because \S* can also match an empty string.

You can then use a list comprehension to replace the empty string with NA:

>>> m = re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
>>> m = [i if i else 'NA' for i in m]
>>> m
['Aman', 'NA', 'India']
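
The two steps can be combined into one small helper. This is only a sketch under the same assumptions as the answer: each line has exactly three columns, values contain no spaces, and only the middle column can be missing. The names `ROW_PATTERN`, `parse_row` and `placeholder` are hypothetical and not part of the original answer.

```python
import re

# Three whitespace-separated fields; only the middle one may be empty.
ROW_PATTERN = re.compile(r'(\S+)\s+(\S*)\s+(\S+)')

def parse_row(line, placeholder="NA"):
    """Return the three fields of line, using placeholder for an empty middle field."""
    match = ROW_PATTERN.search(line)
    if match is None:
        # Fewer than two fields, or two fields separated by a single space
        raise ValueError(f"could not find fields in line: {line!r}")
    return [field if field else placeholder for field in match.groups()]

headers = parse_row("        Name         Last_Name              Place")
values = parse_row("Aman                             India")
record = dict(zip(headers, values))
print(record)  # {'Name': 'Aman', 'Last_Name': 'NA', 'Place': 'India'}
```

Note that the pattern needs at least two whitespace characters between the outer fields to recognise an empty middle field, and it cannot detect a missing first or last column.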

