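I'm trying to scrape the Swedish members of parliament with Beautiful Soup. When I run the scraper, I get "ValueError: too many values to unpack (expected 3)".
The script outputs a CSV, but with only five names. The sixth person in the list is named Alm Ericson, Janine (MP). I guess the problem is that she has two last names, Alm Ericson, and the code only expects three values: last name, first name and party.
How should I code the field separators so that it also works for double last names?
On the page, the names are written as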
Last_name, first_name (party)
Code:
import urllib.request
import bs4 as bs
import csv
source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")
data = []
for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    data.append(cleanednames) #fields are appended to list rather printing
with open("riksdagsledamoter.csv", "w") as stream:
    fieldnames = ["Last_Name","First_Name","Party"]
    var = csv.DictWriter(stream, fieldnames=fieldnames)
    var.writeheader()
    for item in data:
        last_name, First_name, party = item.split() #splitting data in 3 fields
        last_name = last_name.replace(",","") #removing ',' from last name
        party = party.replace("(","").replace(")","") #removing "()" from party
        var.writerow({"Last_Name": last_name,"First_Name": First_name, "Party": party}) #writing to csv row
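The failure can be reproduced without fetching the page. This is a minimal sketch that uses the name quoted in the question as a stand-in for one `span.text` value; it shows that splitting on whitespace yields four pieces for a double last name, so unpacking into three variables fails.

```python
# Stand-in for one scraped entry (the name quoted in the question).
sample = "Alm Ericson, Janine (MP)"

# str.split() with no arguments splits on every run of whitespace.
parts = sample.split()
print(len(parts), parts)

try:
    # Unpacking needs exactly three items, but there are four here.
    last_name, first_name, party = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```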
Best answer
Here is a simple regular expression that should solve it:
import re
print(re.match(r"(.*), (.*) \((.*)\)", 'Alm Ericson, Janine (MP)').groups())
Inspired by Corentin's answer.
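One way the regular expression might be plugged into the CSV-writing loop is sketched below. It assumes every entry follows the "Last_name, first_name (party)" layout described in the question. The helper name `parse_member` and the second sample entry are made up for illustration, and an in-memory buffer stands in for the output file so the sketch runs without the network.

```python
import csv
import io
import re

# Same pattern as in the answer, written as a raw string.
NAME_PATTERN = re.compile(r"(.*), (.*) \((.*)\)")

def parse_member(text):
    """Return (last_name, first_name, party), or None if the text does not fit."""
    match = NAME_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return match.groups()

# The first entry is from the question; the second is a made-up placeholder.
samples = ["Alm Ericson, Janine (MP)", "Doe, Jane (X)"]

buffer = io.StringIO()  # stands in for open("riksdagsledamoter.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["Last_Name", "First_Name", "Party"])
for item in samples:
    parsed = parse_member(item)
    if parsed is not None:  # skip entries that do not match the expected layout
        writer.writerow(parsed)

print(buffer.getvalue())
```

Skipping non-matching entries instead of raising is a design choice for this sketch; logging them may be preferable when the page layout is not fully known.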
Regarding "python - scraping a list of names with a varying number of last names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53206352/