Preface
Hi everyone, this is 魔王 ❤ ~!
Key topics:
- The basic web-scraping workflow
- Using requests
- Capturing dynamic data from network traffic
Development environment:
- Interpreter: Python 3.8
- Editor: PyCharm 2022.3
- requests >>> pip install requests
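Installing third-party modules:
Press Win + R, type cmd, then run the install command pip install <module name>. (If the download is slow, you can switch to a mirror hosted in China.)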
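After installing, it can be useful to confirm that the module is visible to the interpreter you actually run. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'requests' is the only third-party module this article needs.
    print("requests installed:", is_installed("requests"))
```

If this prints False even though pip reported success, pip probably installed into a different Python than the one PyCharm is using.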
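Basic workflow:
I. Analysis
1. Find the data source
- Is the data dynamic or static?
- Capture the request in the Network panel of the browser's developer tools
2. Work out how the whole example will be implemented
- Request the URL and get the response data
- Extract the fields we need
- Save them (single page)
- Multi-page collection: analyze how the link changes from page to page, work out the pagination rule, then collect every page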
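The pagination rule can be sketched as follows. In the request captured for this article, only the pageNo query parameter changes between pages, so the URL for any page can be built from a fixed set of parameters. This is a rough sketch rather than the article's own code: BASE_URL keeps the host masked as ***** exactly as the article does, and the helper names build_params and build_url are made up for illustration.

```python
from urllib.parse import urlencode

# The host is masked in the article; it is left masked here as well.
BASE_URL = "http://*****/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice"


def build_params(page, page_size=30):
    """Query parameters for one page; only pageNo varies between pages."""
    return {
        "name": "ssq",
        "issueCount": "",
        "issueStart": "",
        "issueEnd": "",
        "dayStart": "",
        "dayEnd": "",
        "pageNo": page,
        "pageSize": page_size,
        "week": "",
        "systemType": "PC",
    }


def build_url(page):
    """Full URL for a given page number."""
    return f"{BASE_URL}?{urlencode(build_params(page))}"


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_url(page))
```

With requests you would normally skip the manual string building and pass the dictionary directly, for example requests.get(BASE_URL, params=build_params(page), headers=headers).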
II. Implementation
- Send the request
- Get the data
- Parse the data
- Save the data
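To make the last two steps concrete before the full script, here is a small offline sketch of "parse" and "save". The record in sample_json is made up for illustration; only its field names (result, date, red, blue, poolmoney, content) are taken from the script in this article. In the real run, json_data would come from response.json() after requests.get(...).

```python
import csv
import io

# Made-up sample shaped like the JSON that the article's script reads.
sample_json = {
    "result": [
        {
            "date": "2023-01-01",
            "red": "01,02,03,04,05,06",
            "blue": "07",
            "poolmoney": "1000",
            "content": "sample",
        }
    ]
}

FIELDS = ["date", "red", "blue", "poolmoney", "content"]


def parse(json_data):
    """Step 3: pull the wanted fields out of every record."""
    return [[record[name] for name in FIELDS] for record in json_data["result"]]


def save(rows, fileobj):
    """Step 4: write a header line and the rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(FIELDS)
    writer.writerows(rows)


rows = parse(sample_json)
buffer = io.StringIO()  # stands in for a real file opened with newline=''
save(rows, buffer)
```

Keeping parsing and saving in separate functions makes each step easy to check without touching the network.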
Code
import requests  # third-party library, must be installed separately
import csv

# Request headers copied from the browser's captured request
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Cookie': 'HMF_CI=1b17efcb79bb1c54b0972d1e27d1af031f8912351c906f5874e3ee7ad1ca9563806c6b7e37f7dc287b3165e3422da231f587a0c6a2923dea32cb0e422e6553046a; 21_vq=4',
    'Host': 'www.cwl.gov.cn',
    'Pragma': 'no-cache',
    'Referer': 'http://*****/ygkj/wqkjgg/ssq/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
f = open('双色球.csv', mode='a', newline='', encoding='utf-8')
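csv_writer = csv.writer(f)
# Header row; the order must match the data rows written in the loop below
csv_writer.writerow(["Date", "Red balls", "Blue ball", "Prize pool", "Winning details", "1st prize winners", "1st prize amount", "2nd prize winners", "2nd prize amount", "3rd prize winners", "3rd prize amount", "4th prize winners", "4th prize amount", "5th prize winners", "5th prize amount", "6th prize winners", "6th prize amount"])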
for page in range(1, 54):
    print(f"Fetching page {page}")
    url = f'http://*****/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice?name=ssq&issueCount=&issueStart=&issueEnd=&dayStart=&dayEnd=&pageNo={page}&pageSize=30&week=&systemType=PC'
    # A timeout keeps the script from hanging forever on a stalled connection
    response = requests.get(url=url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error status
    json_data = response.json()
    # The draw records are stored under the 'result' key
    result = json_data['result']
    for res in result:
        reds = res['red']
        blue = res['blue']
        date = res['date']
        poolmoney = res['poolmoney']
        content = res['content']
        prizegrades = res['prizegrades']
        # Default to empty strings in case a prize level is missing from the record
        one_prize, one_price, two_prize, two_price, three_prize, three_price, four_prize, four_price, five_prize, five_price, six_prize, six_price = "", "", "", "", "", "", "", "", "", "", "", ""
        for prizegrad in prizegrades:
            # 'type' is the prize level; typenum is the winner count, typemoney the prize amount
            if prizegrad['type'] == 1:
                one_prize = prizegrad['typenum']
                one_price = prizegrad['typemoney']
            elif prizegrad['type'] == 2:
                two_prize = prizegrad['typenum']
                two_price = prizegrad['typemoney']
            elif prizegrad['type'] == 3:
                three_prize = prizegrad['typenum']
                three_price = prizegrad['typemoney']
            elif prizegrad['type'] == 4:
                four_prize = prizegrad['typenum']
                four_price = prizegrad['typemoney']
            elif prizegrad['type'] == 5:
                five_prize = prizegrad['typenum']
                five_price = prizegrad['typemoney']
            elif prizegrad['type'] == 6:
                six_prize = prizegrad['typenum']
                six_price = prizegrad['typemoney']
        print(date, reds, blue, poolmoney, content, one_prize, one_price, two_prize, two_price, three_prize, three_price, four_prize, four_price, five_prize, five_price, six_prize, six_price)
        # Save as one table row, in this column order:
        # date, red balls, blue ball, prize pool, winning details, then winner count and amount for prize levels 1 to 6
        csv_writer.writerow([date, reds, blue, poolmoney, content, one_prize, one_price, two_prize, two_price, three_prize, three_price, four_prize, four_price, five_prize, five_price, six_prize, six_price])
# Close the file so that all buffered rows are flushed to disk
f.close()
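The chain of type checks in the loop above could be condensed with a small helper. The following is only a sketch of one alternative, not the article's method: flatten_prizes is a hypothetical name, and it assumes each entry in prizegrades is a dictionary with the type, typenum and typemoney keys that the script already reads.

```python
def flatten_prizes(prizegrades, levels=range(1, 7)):
    """Return [winners, amount] pairs for prize levels 1 to 6 as one flat list.

    Levels missing from prizegrades are filled with empty strings,
    matching the defaults used in the script.
    """
    by_type = {grade["type"]: grade for grade in prizegrades}
    row = []
    for level in levels:
        grade = by_type.get(level, {})
        row.append(grade.get("typenum", ""))
        row.append(grade.get("typemoney", ""))
    return row


# Made-up example input for illustration only
example = [
    {"type": 1, "typenum": "5", "typemoney": "100"},
    {"type": 3, "typenum": "9", "typemoney": "3000"},
]
flat = flatten_prizes(example)
```

Inside the loop, the row could then be written as csv_writer.writerow([date, reds, blue, poolmoney, content] + flatten_prizes(prizegrades)).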
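Closing
Thanks for reading my article~ This flight ends here 🛬
I hope this article was helpful 🎉 and that you learned a little something~
Even the stars in hiding 🍥 are working hard to shine, so keep going too (let's work hard together).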