python - How do I extract data from a column in a CSV file whose rows have a variable number of values?

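Basically, I am trying to add together the values in the count column of a CSV file for all rows that share the same name in the item column. I then need to sort the results in ascending alphabetical order by the item column value. For example: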

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100

I need to add the values 123 and 100 together to get a new row for Diabetes Mellitus.

It should look like this:
Diabetes Mellitus, 223

Here is my code so far:

import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]
    countindex = sys.argv[3]
    itemcolumn = 0
    countcolumn = 0
    firstrow = True
    dictionary = {}

    for row in reader:
       if firstrow == True:
          firstrow = False
          itemcolumn = row.index(itemindex)
          countcolumn = row.index(countindex)
       else:
           if itemcolumn in dictionary:
               # Add the item at the row's count column (converted to an int) to the
               # prexisting entry with that item column.
           else:
               #create a new entry in the dictionary

       print(itemindex + "," + countindex)

for key, value in sorted(dictionary)
    print(key + "," + value)

The commented parts are where I am stuck.

Best answer

Use collections.defaultdict, a specialized dictionary class that makes this kind of operation easy:

import collections
import csv
import os
import sys

try:
    filename = sys.argv[1]
    itemindex = int(sys.argv[2])
    countindex = int(sys.argv[3])
except (IndexError, ValueError):  # Missing argument or non-integer index.
    print('Error:\n  Usage: {} <file name> <item index> <count index>'.format(
            os.path.basename(sys.argv[0])))
    sys.exit(-1)

with open(filename, 'r', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    next(reader)  # Skip first row.

    counter = collections.defaultdict(int)
    for row in reader:
        disease, deaths = row[itemindex], int(row[countindex])
        counter[disease] += deaths

for key, value in sorted(counter.items()):
    print('{}, {}'.format(key, value))
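
As a side note on why the loop above needs no if/else: collections.defaultdict(int) supplies 0 for any key it has not seen yet, so `counter[disease] += deaths` works on the first occurrence too. The sketch below compares it with a plain dict that spells out the two branches left as comments in the question. The helper names `sum_with_dict` and `sum_with_defaultdict` are made up for illustration, and the rows are the sample data from the question.

```python
import collections

# Sample (item, count) pairs taken from the question.
rows = [
    ("Diabetes Mellitus", 123),
    ("Influenza and Pneumonia", 325),
    ("Diabetes Mellitus", 100),
]


def sum_with_dict(pairs):
    """Plain dict: handle the 'seen before' and 'new key' cases explicitly."""
    totals = {}
    for item, count in pairs:
        if item in totals:
            totals[item] += count   # Add to the existing entry.
        else:
            totals[item] = count    # Create a new entry.
    return totals


def sum_with_defaultdict(pairs):
    """defaultdict(int): missing keys start at 0, so no branching is needed."""
    totals = collections.defaultdict(int)
    for item, count in pairs:
        totals[item] += count
    return dict(totals)


# Both approaches should produce the same totals.
print(sum_with_dict(rows) == sum_with_defaultdict(rows))
```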

Example usage:

python3 script_name.py diseases.csv 0 1

Sample output:

Diabetes Mellitus, 223
Influenza and Pneumonia, 325
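
The code in the question passes column names (such as Leading Cause and Deaths) on the command line, whereas the script above expects numeric indexes. A minimal sketch of the name-based variant follows, assuming the first row is a header that contains both names. The function name `sum_by_header` and the in-memory sample are illustrative and not part of the original answer.

```python
import collections
import csv
import io


def sum_by_header(lines, item_name, count_name):
    """Sum the count column per item, locating both columns by header name."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)                 # First row holds the column names.
    item_col = header.index(item_name)    # Raises ValueError if the name is absent.
    count_col = header.index(count_name)

    totals = collections.defaultdict(int)
    for row in reader:
        if not row:                       # Skip blank lines.
            continue
        totals[row[item_col]] += int(row[count_col])
    return sorted(totals.items())         # Ascending alphabetical order by item.


# In-memory stand-in for the CSV file shown in the question.
sample = io.StringIO(
    "Leading Cause, Deaths\n"
    "Diabetes Mellitus, 123\n"
    "Influenza and Pneumonia, 325\n"
    "Diabetes Mellitus, 100\n"
)
for item, total in sum_by_header(sample, "Leading Cause", "Deaths"):
    print("{}, {}".format(item, total))
```

To read a real file instead, an open file object (opened with newline='' as in the answer) can be passed in place of the StringIO sample.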

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/49265474/