Problem description
I have 60 records with the columns "skillsList" (a list of skills) and "IdNo". I want to find out how many IdNos have a skill in common.
How can I do this in Python? I don't know how to count a particular list item. I would appreciate any help.
```
>>> a = open(r"C:\Users\abc\Desktop\Book2.csv")
>>> a1 = a.read()
>>> type(a1)
<type 'str'>
```
Here is some of the text when I print a1:
```
>>> a1
'IdNo, skillsList\n1,"u\'Training\', u\'E-Learning\', u\'PowerPoint\', u\'Teaching\', u\'Accounting\', u\'Team Management\', u\'Team Building\', u\'Microsoft Excel\', u\'Microsoft Office\', u\'Financial Accounting\', u\'Microsoft Word\', u\'Customer Service\'"\n2,"u\'Telecommunications\', u\'Data Center\', u\'ISO 27001\', u\'Management\', u\'BS25999\', u\'Technology\', u\'Information Technology...\', u\'Certified PMP\\xae\', u\'Certified BS25999 Lead...\'"\n3,"u\'Market Research\', u\'Segmentation\', u\'Marketing Strategy\', u\'Consumer Behavior\', u\'Experience Working with...\'"
```
Thanks
Recommended answer
You can build an inverted index of skills: a dictionary where each key is a skill name and its value is the set of IdNos that have that skill. That way you can also find out which IdNos have a given set of skills.
The code would look something like this:
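```python
import ast
import csv

skills = {}  # skill name -> set of IdNo values that have it
with open('filename.txt') as f:
    # csv handles the double-quoted skillsList field, which itself contains commas
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the "IdNo, skillsList" header row
    for row in reader:
        if not row:
            continue  # skip blank lines
        idNo = row[0].strip()
        # The field looks like: u'Training', u'E-Learning', ...
        # so parse it as the items of a Python list literal
        skill_list = ast.literal_eval('[' + row[1] + ']')
        for skill in skill_list:
            skill = skill.strip()
            if skill in skills:
                skills[skill].add(idNo)
            else:
                skills[skill] = set([idNo])
```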
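To get the counts the question asks about, take `len()` of each set in the index. The following is a minimal sketch, assuming Python 3 (where `ast.literal_eval` accepts the `u'...'` prefix) and a file in the format shown in the question. The `SAMPLE` rows and the `build_index` helper are hypothetical, and `collections.defaultdict(set)` stands in for the answer's if/else.

```python
import ast
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same format as the question's Book2.csv
SAMPLE = '''IdNo, skillsList
1,"u'Training', u'PowerPoint', u'Teaching'"
2,"u'Training', u'Management'"
3,"u'PowerPoint', u'Training', u'Market Research'"
'''

def build_index(f):
    """Map each skill name to the set of IdNo values that list it."""
    skills = defaultdict(set)
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        id_no = row[0].strip()
        for skill in ast.literal_eval('[' + row[1] + ']'):
            skills[skill.strip()].add(id_no)
    return skills

skills = build_index(io.StringIO(SAMPLE))

# How many IdNos have each skill
counts = {skill: len(ids) for skill, ids in skills.items()}
# Skills that more than one IdNo has in common
shared = {skill: ids for skill, ids in skills.items() if len(ids) > 1}

# Most widely shared skills first, ties broken alphabetically
for skill, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(skill, n)
```

With a real file, pass the open file object to `build_index` instead of the `io.StringIO` wrapper.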
Now you have a skills dictionary that would look like this:
```python
skills = {
    'Training': {'1', '2', '3'},
    'Powerpoint': {'1', '3', '4'},
    'E-learning': {'9', '10', '11'},
    # ...
}
```
Now you can see that 1, 3 and 4 have Powerpoint as a skill. If you want to know which IdNos have both the 'Training' and 'Powerpoint' skills, you can do:
```python
skills['Powerpoint'].intersection(skills['Training'])
```
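To check more than two skills at once, or skills that may not appear in the index at all, the intersection can be generalised. This is a sketch assuming the index maps skill names to sets of IdNo strings; `ids_with_all` is a hypothetical helper name, and the small `skills` dict simply mirrors the example above.

```python
# Hypothetical index with the same shape as the `skills` dictionary above
skills = {
    'Training': {'1', '2', '3'},
    'Powerpoint': {'1', '3', '4'},
    'E-learning': {'9', '10', '11'},
}

def ids_with_all(index, wanted):
    """Return the IdNos that have every skill in `wanted`.

    A skill missing from the index contributes an empty set,
    so the result is empty instead of raising KeyError.
    """
    sets = [index.get(skill, set()) for skill in wanted]
    if not sets:
        return set()
    return set.intersection(*sets)

both = ids_with_all(skills, ['Powerpoint', 'Training'])
print(sorted(both))  # ['1', '3']
```

`len(both)` then gives how many IdNos have all of those skills in common.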
And if you want to know which IdNos have either the 'Training' or the 'Powerpoint' skill, you can do:
```python
skills['Powerpoint'].union(skills['Training'])
```
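The union generalises the same way. A sketch under the same assumptions as before; `ids_with_any` is a hypothetical helper name, and unknown skills are simply ignored.

```python
# Hypothetical index with the same shape as the `skills` dictionary above
skills = {
    'Training': {'1', '2', '3'},
    'Powerpoint': {'1', '3', '4'},
    'E-learning': {'9', '10', '11'},
}

def ids_with_any(index, wanted):
    """Return the IdNos that have at least one skill in `wanted`.

    Skills missing from the index are ignored.
    """
    return set().union(*(index.get(skill, set()) for skill in wanted))

either = ids_with_any(skills, ['Powerpoint', 'Training'])
print(len(either))  # 4
```

For exactly two sets, the operators `skills['Powerpoint'] | skills['Training']` (union) and `skills['Powerpoint'] & skills['Training']` (intersection) are equivalent shorter forms.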