Python: calculating totals and percentages in dictionaries

I'm definitely doing something wrong, and my brain is melting.
I have this data:

queryset = [
{'source_id': '1', 'gender_id': 'female', 'total': 12928604, 'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
{'source_id': '1', 'gender_id': 'male', 'total': 15238856, 'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
{'source_id': '1', 'gender_id': 'null', 'total': 6, 'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
{'source_id': '2', 'gender_id': 'female', 'total': 23546499, 'percentage': {'neutral': [15140308, 64.3], 'positive': [5372964, 22.82], 'negative': [3033227, 12.88]}},
{'source_id': '2', 'gender_id': 'male', 'total': 15349754, 'percentage': {'neutral': [10137025, 66.04], 'positive': [2413350, 15.72], 'negative': [2799379, 18.24]}},
{'source_id': '2', 'gender_id': 'null', 'total': 3422, 'percentage': {'neutral': [2464, 72.0], 'positive': [437, 12.77], 'negative': [521, 15.23]}},
{'source_id': '3', 'gender_id': 'female', 'total': 29417761, 'percentage': {'neutral': [18944384, 64.4], 'positive': [7181996, 24.41], 'negative': [3291381, 11.19]}},
{'source_id': '3', 'gender_id': 'male', 'total': 27200788, 'percentage': {'neutral': [17827887, 65.54], 'positive': [4179990, 15.37], 'negative': [5192911, 19.09]}},
{'source_id': '3', 'gender_id': 'null', 'total': 32909, 'percentage': {'neutral': [22682, 68.92], 'positive': [4005, 12.17], 'negative': [6222, 18.91]}}
]


The output I want is:

    [{'source_id': '1',
      'total': 28167466,  # sum of the male, female and null totals for source_id 1
      'percentage': {'neutral': [18326541, 65.06],  # summed count, then its share of the total in %
                     'positive': [5622860, 19.96],
                     'negative': [4218065, 14.97]}},
     ...]  # and the same for every other source
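
To make the arithmetic behind those numbers concrete, here is a small sketch for source_id 1 only. The figures are copied from the data above. The variable names are just for illustration.

```python
# Worked arithmetic for source_id '1', using figures from the data above.
totals = [12928604, 15238856, 6]    # 'total' for female, male, null
neutrals = [8284384, 10042152, 5]   # neutral counts for the same three rows

total = sum(totals)
neutral = sum(neutrals)
# share of the summed neutral count in the summed total, in percent
neutral_pct = round(neutral / total * 100, 2)

print(total, neutral, neutral_pct)  # 28167466 18326541 65.06
```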


This is what I tried, but it doesn't work. I have three if statements like this one, one for each of the three IDs:

for i in queryset:
    if i['source_id'] == '1':
        output['percentage'] = {
            'neutral': [sum(i['percentage']['neutral'][0] for i in queryset if i['source_id'] == '1'),
                        round(output['negative'] / output['2_total'] * 100, 2)],

            'positive': [sum(i['percentage']['positive'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['positive'] / output['2_total'] * 100, 2)],

            'negative': [sum(i['percentage']['negative'][0] for i in queryset if i['source_id'] == '2'),
                         round(output['negative'] / output['2_total'] * 100, 2)]}

Best answer

OK, if I understand correctly, this is what you want:

unique_ids = {item.get('source_id') for item in queryset}  # unique source ids

output = []

for id_ in unique_ids:
    # only grab items that match the current source id
    to_agg = list(filter(lambda x: x.get('source_id') == id_, queryset))

    # sum the total field for this source id
    total = sum((item.get('total') for item in to_agg))

    # aggregate the data for neutral/positive/negative
    percents = [item.get('percentage') for item in to_agg]
    negatives = sum((item.get('negative')[0] for item in percents))
    positives = sum((item.get('positive')[0] for item in percents))
    neutrals = sum((item.get('neutral')[0] for item in percents))

    # construct the final dictionary
    d = {'source_id': id_,
         'total': total,
         'percentage': {'neutral': [neutrals, round(neutrals / total * 100, 2)],
                        'positive': [positives, round(positives / total * 100, 2)],
                        'negative': [negatives, round(negatives / total * 100, 2)]}}

    output.append(d)

# sets are unordered, so sort the result by source id for display
sorted(output, key=lambda x: x.get('source_id'))

[{'percentage': {'negative': [4218065, 14.97],
   'neutral': [18326541, 65.06],
   'positive': [5622860, 19.96]},
  'source_id': '1',
  'total': 28167466},
 {'percentage': {'negative': [5833127, 15.0],
   'neutral': [25279797, 64.99],
   'positive': [7786751, 20.02]},
  'source_id': '2',
  'total': 38899675},
 {'percentage': {'negative': [8490514, 14.99],
   'neutral': [36794953, 64.95],
   'positive': [11365991, 20.06]},
  'source_id': '3',
  'total': 56651458}]


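Edit: keep in mind that I haven't optimized this answer. It filters the whole queryset once per source id, so if your queryset is large it may not be as fast as you need.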
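
One possible way around the repeated filtering is to accumulate all the sums in a single pass over the data. The following is only a sketch under that assumption, not a tuned implementation. The names `aggregate_by_source`, `SENTIMENTS` and `sample` are hypothetical and introduced here for illustration, and the guard against a zero total is an addition that the answer above does not have.

```python
from collections import defaultdict

SENTIMENTS = ('neutral', 'positive', 'negative')

def aggregate_by_source(rows):
    """Sum totals and sentiment counts per source_id in one pass over rows."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: dict.fromkeys(SENTIMENTS, 0))

    for row in rows:
        sid = row['source_id']
        totals[sid] += row['total']
        for name in SENTIMENTS:
            # index 0 holds the raw count, index 1 the per-row percentage
            counts[sid][name] += row['percentage'][name][0]

    result = []
    for sid in sorted(totals):  # string sort, fine for ids '1', '2', '3'
        total = totals[sid]
        result.append({
            'source_id': sid,
            'total': total,
            'percentage': {
                # assumption: report 0.0 instead of dividing by a zero total
                name: [counts[sid][name],
                       round(counts[sid][name] / total * 100, 2) if total else 0.0]
                for name in SENTIMENTS
            },
        })
    return result


# The three rows for source_id '1', copied from the question's data.
sample = [
    {'source_id': '1', 'gender_id': 'female', 'total': 12928604,
     'percentage': {'neutral': [8284384, 64.08], 'positive': [3146438, 24.34], 'negative': [1497782, 11.59]}},
    {'source_id': '1', 'gender_id': 'male', 'total': 15238856,
     'percentage': {'neutral': [10042152, 65.9], 'positive': [2476421, 16.25], 'negative': [2720283, 17.85]}},
    {'source_id': '1', 'gender_id': 'null', 'total': 6,
     'percentage': {'neutral': [5, 83.33], 'positive': [1, 16.67], 'negative': [0, 0.0]}},
]

print(aggregate_by_source(sample))
```

Called with the full queryset instead of `sample`, it should return one dictionary per source id, already sorted by id.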

Source: a similar question on Stack Overflow, "python - Calculating totals and percentages in dictionaries": https://stackoverflow.com/questions/55201023/
