Problem Description
The code so far downloads and prints to the screen, but how do I get that printed material into a SQL database? If I wanted to get the data into CSV files, it seems that Python (on a good day) creates the file automatically. Obviously, with transferring into MySQL, I assume that I would have to create a database beforehand in order to receive the data. My question is how I would get the data from the scrape into the database, omitting the CSV step altogether. In anticipation I have already downloaded the PyMySQL library. Any suggestions much appreciated. looknow
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html, "html.parser")

# Print each artist name
nameList = bsObj.findAll("div", {"class": "artist"})
for name in nameList:
    print(name.get_text())

# Print each song title
nameList = bsObj.findAll("div", {"class": "title"})
for name in nameList:
    print(name.get_text())
Recommended Answer
So there are a couple things to address here.
The docs on PyMySQL are pretty good at getting you up and running.
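A quick way to confirm that PyMySQL is installed and can reach your server is a minimal smoke test (my sketch, not from the docs; host, user, and password are placeholder assumptions for a local server):

import pymysql

# Placeholder credentials for a local MySQL server; adjust to your setup
connection = pymysql.connect(host='localhost', user='root', password='password')
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT VERSION()")
        print(cursor.fetchone())  # e.g. a one-element tuple with the version string
finally:
    connection.close()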
Before you can put these things into a database though, you need to grab them in a way that the artist and song name are associated with each other. Right now you are getting a separate list of artists and songs, with no way to associate them. You will want to iterate over the title-artist class to do this.
I would do it like this:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

# Grab title-artist classes and iterate
bsObj = BeautifulSoup(html, "html.parser")
recordList = bsObj.findAll("div", {"class": "title-artist"})

# Now iterate over recordList to grab title and artist
for record in recordList:
    title = record.find("div", {"class": "title"}).get_text().strip()
    artist = record.find("div", {"class": "artist"}).get_text().strip()
    print(artist + ': ' + title)
This will print the title and artist for each iteration of the recordList loop.
To insert these values into a MySQL DB, I created a table called artist_song with the following:
CREATE TABLE `artist_song` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `artist` varchar(255) COLLATE utf8_bin NOT NULL,
    `song` varchar(255) COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;
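Since the question asks about creating the database beforehand: the top_40 database has to exist before the script can connect to it. Here is a minimal sketch of doing that from Python rather than the mysql client, assuming a local server and root credentials (both assumptions; adjust to your setup):

import pymysql

# Connect without selecting a database so the CREATE DATABASE can run.
# Host, user, and password are assumptions for a local test server.
connection = pymysql.connect(host='localhost', user='root', password='password')
try:
    with connection.cursor() as cursor:
        # DDL statements are committed implicitly by MySQL
        cursor.execute("CREATE DATABASE IF NOT EXISTS top_40 CHARACTER SET utf8")
finally:
    connection.close()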
This isn't the cleanest way to go about this, but the idea is sound. We want to open a connection to the MySQL DB (I have called my DB top_40), and insert an artist/title pair for each iteration of the recordList loop:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

# Grab title-artist classes and store in recordList
bsObj = BeautifulSoup(html, "html.parser")
recordList = bsObj.findAll("div", {"class": "title-artist"})

# Create a pymysql cursor and iterate over each title-artist record.
# This will create an INSERT statement for each artist/title pair, then
# commit the transaction after reaching the end of the list. pymysql does
# not have autocommit enabled by default. After committing it will close
# the database connection.

# Create database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='password',
                             db='top_40',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        for record in recordList:
            title = record.find("div", {"class": "title"}).get_text().strip()
            artist = record.find("div", {"class": "artist"}).get_text().strip()
            sql = "INSERT INTO `artist_song` (`artist`, `song`) VALUES (%s, %s)"
            cursor.execute(sql, (artist, title))
    connection.commit()
finally:
    connection.close()
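As a variation (my addition, not part of the original answer), pymysql's executemany can send the whole batch in one call instead of looping over execute. A sketch, assuming the same recordList and connection settings as above:

# Collect (artist, title) pairs first, then insert them in one batch
records = []
for record in recordList:
    title = record.find("div", {"class": "title"}).get_text().strip()
    artist = record.find("div", {"class": "artist"}).get_text().strip()
    records.append((artist, title))

connection = pymysql.connect(host='localhost', user='root', password='password',
                             db='top_40', charset='utf8mb4')
try:
    with connection.cursor() as cursor:
        # Runs the INSERT once per (artist, song) tuple in records
        sql = "INSERT INTO `artist_song` (`artist`, `song`) VALUES (%s, %s)"
        cursor.executemany(sql, records)
    connection.commit()
finally:
    connection.close()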
Per my comment, I think it is clearer to iterate over the table rows instead:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pymysql.cursors

# Webpage connection
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

bsObj = BeautifulSoup(html, "html.parser")
rows = bsObj.findAll('tr')
for row in rows:
    # Only chart rows contain a position span; this skips header rows
    if row.find('span', {'class': 'position'}):
        position = row.find('span', {'class': 'position'}).get_text().strip()
        artist = row.find('div', {'class': 'artist'}).get_text().strip()
        track = row.find('div', {'class': 'title'}).get_text().strip()
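To wire this row-based version to the database as well (my extension, not part of the original answer), the table would need a position column, e.g. ALTER TABLE artist_song ADD COLUMN position INT. The rest follows the same pattern as the earlier insert loop, with the same assumed connection settings:

connection = pymysql.connect(host='localhost',
                             user='root',
                             password='password',
                             db='top_40',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
try:
    with connection.cursor() as cursor:
        for row in rows:
            if row.find('span', {'class': 'position'}):
                position = row.find('span', {'class': 'position'}).get_text().strip()
                artist = row.find('div', {'class': 'artist'}).get_text().strip()
                track = row.find('div', {'class': 'title'}).get_text().strip()
                # The position column is an assumed addition to artist_song
                sql = ("INSERT INTO `artist_song` (`position`, `artist`, `song`) "
                       "VALUES (%s, %s, %s)")
                cursor.execute(sql, (position, artist, track))
    connection.commit()
finally:
    connection.close()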