问题描述
我有一个名为 file01.csv
的.csv文件,其中包含许多记录。有些记录是必需的,有些则不是。我发现所需的记录有一个字符串变量Mi,但它不存在于不必要的记录中。所以,我想根据字符串值Mi在每个记录的字段中选择所需的记录。
I have a .csv file named file01.csv
that contains many records. Some records are required and some are not. I find that the required records has a string variable "Mi", but it is not exist into the unnecessary records. So, I want to select the required records based on string value "Mi" in the field for every records.
最后,我要从包含值Mi的字段中删除每个记录的后续字段。
Finally I want to delete the subsequent fields of each record from the field that contains value "Mi". Any suggestion and advice is appreciated.
可选:
-
另外,我要删除第一列。
In addition, I want to delete the first column.
将列BB拆分为两列,名为a_id和c_id。用_(下划线)分隔值,左侧将转到a_id,右侧转到c_id。
Split column BB into two column named as a_id, and c_id. Separate the value by _ (underscore) and left side will go to a_id, and right side will go to c_id.
我的 fileO.csv
如下:
AA BB CC DD EE FF GG
1 1_1.csv (=0 =10" 27" =57 "Mi"
0.97 0.9 0.8 NaN 0.9 od 0.2
2 1_3.csv (=0 =10" 27" "Mi" 0.5
0.97 0.5 0.8 NaN 0.9 od 0.4
3 1_6.csv (=0 =10" "Mi" =53 cnt
0.97 0.9 0.8 NaN 0.9 od 0.6
4 2_6.csv No Bi 000 000 000 000
5 2_8.csv No Bi 000 000 000 000
6 6_9.csv less 000 000 000 000
7 7_9.csv s(=0 =26" =46" "Mi" 121
我的预期结果文件(outFile.csv) / p>
My Expected results files (outFile.csv):
a_id b_id CC DD EE FF GG
1 1 0 10 27 57
1 3 0 10 27
1 6 0 10
7 9 0 26 46
推荐答案
以下方法应该可以使用Python csv
模块正常工作:
import csv
import re
import string
output_header = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']
sanitise_table = string.maketrans("","")
nodigits_table = sanitise_table.translate(sanitise_table, string.digits)
def find_mi(row):
for index, col in enumerate(row):
if col.find('Mi') != -1:
return index
return -1
def sanitise_cell(cell):
return cell.translate(sanitise_table, nodigits_table) # Keep digits
f_input = open('fileO.csv', 'rb')
f_output = open('outFile.csv', 'wb')
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
input_header = next(f_input)
csv_output.writerow(output_header)
for row in csv_input:
#print '%2d %s' % (len(row), row)
if len(row) >= 2:
bb = re.match(r'(\d+)__(\d+).0\.csv', row[1])
mi = find_mi(row)
if bb and mi != -1:
row[:] = row[:mi] + [''] * (len(row) - mi)
row[:] = [sanitise_cell(col) for col in row]
row[0] = bb.group(1)
row[1] = bb.group(2)
csv_output.writerow(row)
f_input.close()
f_output.close()
outFile.csv
将包含以下内容:
a_id,b_id,CC,DD,EE,FF,GG
1,1,0,10,27,57,
1,3,0,10,27,,
1,6,0,10,,,
7,9,0,26,46,,
使用Python 2.6.6测试
Tested using Python 2.6.6
这篇关于基于特定的索引字符串值选择记录,然后通过python删除后续字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!