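So far my code downloads the data and prints it to the screen, but how do I get that printed material into an SQL database? If I wanted to put the data into a CSV file, Python (on a good day) seems to create the file automatically. With MySQL, I assume I have to create a database first to receive the data. My question is how to skip the CSV step entirely and insert the scraped data directly into the database.
I have already downloaded the PyMySQL library. Any suggestions?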
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from bs4 import BeautifulSoup

# Fetch the chart page and print every artist
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html, "html.parser")
nameList = bsObj.findAll("div", {"class": "artist"})
for name in nameList:
    print(name.get_text())

# Fetch the same page again and print every title
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html, "html.parser")
nameList = bsObj.findAll("div", {"class": "title"})
for name in nameList:
    print(name.get_text())
Best answer
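There are a few things to address here.
The PyMySQL docs are pretty good at getting you up and running.
Before you put this data into a database, though, you need to grab the artists and song titles in a way that keeps them associated with each other. Right now you get separate lists of artists and songs, with no way to relate them. To fix this, iterate over the elements with the title-artist class instead.
I would do it like this: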
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

# Grab title-artist classes and store in recordList
bsObj = BeautifulSoup(html, "html.parser")
recordList = bsObj.findAll("div", {"class": "title-artist"})

# Now iterate over recordList to grab title and artist
for record in recordList:
    title = record.find("div", {"class": "title"}).get_text().strip()
    artist = record.find("div", {"class": "artist"}).get_text().strip()
    print(artist + ': ' + title)
This prints the artist and title on each iteration of the loop over recordList.
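To illustrate why the pairing matters, here is a minimal sketch that uses placeholder strings rather than real chart data. It assumes a hypothetical case where one entry has no artist element, so the two separately scraped lists differ in length. `zip` then pairs items purely by position, and every pair after the gap is wrong.

```python
# Placeholder values, not real chart data.
# Suppose the page has three entries, but the second one has no artist element.
titles = ["Song 1", "Song 2", "Song 3"]
artists = ["Artist 1", "Artist 3"]  # "Artist 2" is missing from the page

# Pairing two independent lists by position silently misaligns them.
misaligned = list(zip(artists, titles))

# Reading both fields from the same container keeps each pair intact.
records = [
    {"title": "Song 1", "artist": "Artist 1"},
    {"title": "Song 2", "artist": None},
    {"title": "Song 3", "artist": "Artist 3"},
]
paired = [(r["artist"], r["title"]) for r in records]

print(misaligned)
print(paired)
```

In the misaligned list, "Artist 3" ends up next to "Song 2", which is the kind of error that per-container extraction avoids.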
To insert these values into a MySQL database, I created a table named artist_song with the following:
CREATE TABLE `artist_song` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `artist` varchar(255) COLLATE utf8_bin NOT NULL,
  `song` varchar(255) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
AUTO_INCREMENT=1;
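The question also asks whether the database has to exist first. With MySQL it does, but it can be created from Python as well. The following is a rough sketch, not part of the original answer: `create_schema` and `main` are hypothetical helper names, the credentials are placeholders, and it assumes the MySQL user is allowed to create databases.

```python
CREATE_DATABASE = "CREATE DATABASE IF NOT EXISTS `top_40`"
USE_DATABASE = "USE `top_40`"
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS `artist_song` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `artist` varchar(255) COLLATE utf8_bin NOT NULL,
    `song` varchar(255) COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
"""


def create_schema(connection):
    """Create the database and table if they do not exist yet (hypothetical helper)."""
    with connection.cursor() as cursor:
        for statement in (CREATE_DATABASE, USE_DATABASE, CREATE_TABLE):
            cursor.execute(statement)
    # DDL statements commit implicitly in MySQL; the explicit commit is harmless.
    connection.commit()


def main():
    # Imported here so the helper above can be read and tested without a server.
    import pymysql

    # Placeholder credentials; no database is selected yet because it may not exist.
    connection = pymysql.connect(host="localhost", user="root", password="password")
    try:
        create_schema(connection)
    finally:
        connection.close()
```

Calling `main()` once before the first insert should be enough, since both statements use `IF NOT EXISTS` and are safe to repeat.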
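The following isn't the cleanest way to go about it, but the idea is sound. We open a connection to the MySQL database (I called mine top_40) and insert one artist/title pair for each iteration of the loop over recordList: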
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

# Grab title-artist classes and store in recordList
bsObj = BeautifulSoup(html, "html.parser")
recordList = bsObj.findAll("div", {"class": "title-artist"})

# Create a pymysql cursor and iterate over each title-artist record.
# This will create an INSERT statement for each artist/title pair, then commit
# the transaction after reaching the end of the list. pymysql does not
# have autocommit enabled by default. After committing it will close
# the database connection.

# Create database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='password',
                             database='top_40',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        # Parameterized query: pymysql escapes the values passed to execute()
        sql = "INSERT INTO `artist_song` (`artist`, `song`) VALUES (%s, %s)"
        for record in recordList:
            title = record.find("div", {"class": "title"}).get_text().strip()
            artist = record.find("div", {"class": "artist"}).get_text().strip()
            cursor.execute(sql, (artist, title))
    # Commit once, after every pair has been inserted
    connection.commit()
finally:
    connection.close()
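If the chart has many rows, the per-row `execute` calls can be collapsed into one `executemany` call. The sketch below is an illustration under stated assumptions rather than the answer's method: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in so that it runs without a MySQL server, and the rows are placeholder values. PyMySQL cursors also provide `executemany`, but with `%s` placeholders instead of sqlite3's `?`.

```python
import sqlite3

# Placeholder (artist, song) pairs standing in for the scraped records.
pairs = [("Artist 1", "Song 1"), ("Artist 2", "Song 2"), ("Artist 3", "Song 3")]

# In-memory SQLite database used as a stand-in for the MySQL top_40 database.
connection = sqlite3.connect(":memory:")
try:
    cursor = connection.cursor()
    cursor.execute(
        "CREATE TABLE artist_song ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "artist TEXT NOT NULL, "
        "song TEXT NOT NULL)"
    )
    # One call inserts every pair; sqlite3 uses ? where PyMySQL uses %s.
    cursor.executemany("INSERT INTO artist_song (artist, song) VALUES (?, ?)", pairs)
    connection.commit()

    # Read the rows back to confirm what was stored.
    cursor.execute("SELECT artist, song FROM artist_song ORDER BY id")
    stored = cursor.fetchall()
finally:
    connection.close()

print(stored)
```

With PyMySQL the same shape would apply: build the list of `(artist, title)` tuples inside the scraping loop, then pass it to `cursor.executemany` before committing.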
Edit:
Per my comment, I think it is clearer to iterate over the table rows:
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html, "html.parser")

# Walk the table rows and skip any row that has no position span
rows = bsObj.findAll('tr')
for row in rows:
    if row.find('span', {'class': 'position'}):
        position = row.find('span', {'class': 'position'}).get_text().strip()
        artist = row.find('div', {'class': 'artist'}).get_text().strip()
        track = row.find('div', {'class': 'title'}).get_text().strip()
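One caveat with the loops above: BeautifulSoup's `find` returns `None` when nothing matches, so chaining `.get_text()` raises `AttributeError` on a row that lacks an artist or title element. A small guard avoids that. This is only a sketch: `text_or_none` is a hypothetical helper name, and `StubTag` exists purely so the example runs without fetching a page.

```python
def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    if tag is None:
        return None
    return tag.get_text().strip()


class StubTag:
    """Minimal stand-in for a BeautifulSoup tag, used only for this demonstration."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


found = text_or_none(StubTag("  Placeholder Artist \n"))
missing = text_or_none(None)
print(found, missing)
```

In the row loop this would read `artist = text_or_none(row.find('div', {'class': 'artist'}))`. Because the `artist` and `song` columns are declared NOT NULL, rows where either value comes back as `None` would need to be skipped before inserting.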