This article covers how to handle an error when writing data to CSV that is caused by an ASCII encoding error in Python.
Problem description
import requests
from bs4 import BeautifulSoup
import csv
from urlparse import urljoin
import urllib2

base_url = 'http://www.baseball-reference.com'
data = requests.get("http://www.baseball-reference.com/teams/BAL/2014-schedule-scores.shtml")
soup = BeautifulSoup(data.content)
outfile = open("./Balpbp.csv", "wb")
writer = csv.writer(outfile)

url = []
for link in soup.find_all('a'):
    if not link.has_attr('href'):
        continue
    if link.get_text() != 'boxscore':
        continue
    url.append(base_url + link['href'])

for list in url:
    response = requests.get(list)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'id': 'play_by_play'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    writer.writerows(list_of_rows)
Here is the error message:
Traceback (most recent call last):
  File "try.py", line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 57: ordinal not in range(128)
When I write the data to a CSV file, some of the cells contain \x... characters, which prevents the data from being written. How can I change the data to remove these characters, or otherwise work around this issue?
Recommended answer
You cannot use unicode strings with the csv module in Python 2; you need to encode the strings:
text = cell.text.replace(' ', '').encode("utf-8")
Output after encoding:
Top of the 1st, Red Sox Batting, Tied 0-0, Orioles' Chris Tillman facing 1-2-3
"
t1,0-0,0,---,"7,(2-2) CBBFFFX",O,BOS,D. Nava,C. Tillman,2%,52%,Groundout: P-1B (P's Right)
t1,0-0,1,---,"4,(1-2) BCFX",,BOS,D. Pedroia,C. Tillman,-2%,50%,Single to RF (Line Drive to Short RF)
t1,0-0,1,1--,"5,(1-2) CFBFT",O,BOS,D. Ortiz,C. Tillman,3%,52%,Strikeout Swinging
t1,0-0,2,1--,"4,(0-2) C1CFS",O,BOS,M. Napoli,C. Tillman,2%,55%,Strikeout Swinging
,,,,,,,,,"0 runs, 1 hit, 0 errors, 1 LOB. Red Sox 0, Orioles 0."
"Bottom of the 1st, Orioles Batting, Tied 0-0, Red Sox' Jon Lester facing 1-2-3
"
b1,0-0,0,---,"4,(1-2) CBFX",O,BAL,N. Markakis,J. Lester,-2%,52%,Groundout: 3B-1B (Weak 3B)
b1,0-0,1,---,"6,(3-2) BBFFBX",,BAL,J. Hardy,J. Lester,2%,55%,Single to LF (Line Drive)
b1,0-0,1,1--,"4,(1-2) FBSX",O,BAL,A. Jones,J. Lester,-3%,52%,Popfly: SS (Deep SS)
b1,0-0,2,1--,"5,(1-2) FFBFS",O,BAL,C. Davis,J. Lester,-2%,50%,Strikeout Swinging
....................................
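The question also asks how to remove the offending characters. The character named in the traceback, u'\xa0', is a non-breaking space (what &nbsp; becomes once the HTML is parsed), so one option is to replace it with a regular space before encoding. The following is a minimal sketch and not part of the original answer; the helper name clean_cell and the sample string are hypothetical.

```python
# -*- coding: utf-8 -*-

def clean_cell(text):
    """Hypothetical helper: replace non-breaking spaces and return UTF-8 bytes.

    `text` is expected to be a unicode string, such as cell.text from BeautifulSoup.
    """
    # u"\xa0" is the non-breaking space that the ASCII codec cannot encode.
    normalized = text.replace(u"\xa0", u" ")
    # Encode so the Python 2 csv writer receives a byte string.
    return normalized.encode("utf-8")


# Made-up sample cell containing a non-breaking space and a non-ASCII letter.
sample = u"D.\xa0Nava, caf\xe9"

# Without cleaning, encoding to ASCII raises the same kind of error as in the traceback.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

cleaned = clean_cell(sample)
```

In the scraping loop, this would stand in for the existing assignment, for example text = clean_cell(cell.text). On Python 3 the csv module accepts text strings directly, so the encode step should not be needed there, provided the output file is opened in text mode with an explicit encoding.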