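Basically, I'm trying to add together the values in a count column for all rows of a CSV file that share the same value in an item column. I then need to sort the results in ascending alphabetical order by the item column's value. For example: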
Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100
I need to add the values 123 and 100 together to get a single new row for Diabetes Mellitus.
It should look like this:
Diabetes Mellitus, 223
Here is my code so far:
import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]
    countindex = sys.argv[3]
    itemcolumn = 0
    countcolumn = 0
    firstrow = True
    dictionary = {}
    for row in reader:
        if firstrow == True:
            firstrow = False
            itemcolumn = row.index(itemindex)
            countcolumn = row.index(countindex)
        else:
            if itemcolumn in dictionary:
                # Add the item at the row's count column (converted to an int) to the
                # preexisting entry with that item column.
            else:
                # Create a new entry in the dictionary.
    print(itemindex + "," + countindex)
    for key, value in sorted(dictionary):
        print(key + "," + value)
The commented-out parts are where I'm stuck.
Best answer
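Use collections.defaultdict, a specialized dictionary class that supplies a default value for missing keys (0 for defaultdict(int)), which makes this kind of accumulation easy: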
import collections
import csv
import os
import sys

try:
    filename = sys.argv[1]
    itemindex = int(sys.argv[2])
    countindex = int(sys.argv[3])
except (IndexError, ValueError):
    # Missing arguments, or column indexes that are not integers.
    print('Error:\n  Usage: {} <file name> <item index> <count index>'.format(
        os.path.basename(sys.argv[0])))
    sys.exit(-1)

with open(filename, 'r', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    next(reader)  # Skip first row.
    counter = collections.defaultdict(int)
    for row in reader:
        disease, deaths = row[itemindex], int(row[countindex])
        counter[disease] += deaths

for key, value in sorted(counter.items()):
    print('{}, {}'.format(key, value))
Example usage:
python3 script_name.py diseases.csv 0 1
Sample output:
Diabetes Mellitus, 223
Influenza and Pneumonia, 325
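The question's script was given column names on the command line rather than numeric indexes. As a rough sketch of how the same defaultdict approach could look when columns are selected by header name: the function name sum_by_column and its parameters are made up for illustration, and the sample data is simply the example from the question.

```python
import collections
import csv
import io


def sum_by_column(csv_file, item_name, count_name):
    """Sum the count_name values per distinct item_name value, sorted by item."""
    reader = csv.reader(csv_file, skipinitialspace=True)
    header = next(reader)  # The first row holds the column names.
    item_col = header.index(item_name)    # Raises ValueError if the name is missing.
    count_col = header.index(count_name)

    totals = collections.defaultdict(int)  # Missing keys start at 0.
    for row in reader:
        if not row:  # Skip blank lines.
            continue
        totals[row[item_col]] += int(row[count_col])
    return sorted(totals.items())


# Sample data copied from the question's example (held in memory, not a real file).
sample = io.StringIO(
    "Leading Cause, Deaths\n"
    "Diabetes Mellitus, 123\n"
    "Influenza and Pneumonia, 325\n"
    "Diabetes Mellitus, 100\n"
)
for cause, deaths in sum_by_column(sample, "Leading Cause", "Deaths"):
    print("{}, {}".format(cause, deaths))
```

Looking the columns up with header.index keeps the script independent of column order, at the cost of requiring the names to match the header exactly once leading spaces are stripped.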