对页面信息进行简单抓取: 需要注意的问题 :
auth-sign 和 auth 都是有一定的时效性,还有url原url是https这里要改为http请求。
这参数的问题需要通过mitmdump去获取请求的具体参数并将之取出,不用手动去截获分析http请求和响应,写好请求和相应的处理逻辑,通过python实现二次操作。 后期通过appium模拟人为操作去滑动请求刷新界面,得到相应再做处理。
import requests def main():
headers = {
"charset":"utf-8",
"Accept-Encoding":"gzip",
"referer":"https://servicewechat.com/wxffc08ac7df482a27/117/page-frame.html",
"authorization":"5bda7657a4ce660001f7eed8",
"auth":"eyJoYXNoIjoibWQ0IiwiYWxnIjoiSFMyNTYiLCJ0eXAiOiJKV1QifQ.eyJzaWQiOiI0M2RkNGY2YS01NTk1LTRjNGEtYTkyMi05ODEzNjdiMTlmMTEiLCJleHBpcmUiOjE1NDExMzAyNjJ9.9AC8VBcXiBG48vHa-LLgVEWOnloTdQvNWzYAyvqGnMA",
"content-type":"application/json",
"auth-sign":"c475525b214bb5d9ae431ac029cb9b50",
"User-Agent":"Mozilla/5.0 (Linux; Android 7.1.2; MI 5X Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/64.0.3282.137 Mobile Safari/537.36 MicroMessenger/6.7.3.1360(0x26070336) NetType/WIFI Language/zh_CN Process/appbrand2",
"Host":"www.xiaohongshu.com",
"Connection":"Keep-Alive",
}
# url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=&sid=session.1540996623416187718"
url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=1541067389.9550&sid=session.1540996623416187718" datas = requests.get(url= url, headers=headers ).json()
data = datas['data']
# print(data)
for i in data:
print(i)
# print(i['title'])
# print(i['share_link'])
title = '标题: ' + i['mini_program_info']['share_title']
print(title)
link_url = '链接: ' + i['share_link']
print(link_url)
b_picture = '封面图片: '+ i['mini_program_info']['thumb']
print(b_picture)
type = '类型: ' + i['type']
print(type)
level = '级别: ' + str(i['level'])
print(level)
h_picture = '用户头像: ' + i['user']['images']
print(h_picture)
username = '用户名: ' + i['user']['nickname']
print(username)
user_id = 'userid: ' + i['user']['userid']
print(user_id)
zan = '喜欢点心: ' + str(i['likes'])
print(zan) # 以追加的方式及打开一个文件,文件指针放在文件结尾,追加读写!
with open('text', 'a', encoding='utf-8')as f:
f.write('\n'.join([title,link_url,b_picture,type,level,h_picture,username,user_id,zan]))
f.write('\n' + '=' * 100 + '\n')
if __name__ == "__main__":
main()
保存本地
字段信息:
标题: 王者荣耀——貂蝉~仲夏夜之梦 游戏角色貂蝉皮肤印象妆容 主色
链接: https://www.xiaohongshu.com/discovery/item/5bc0b2bf910cf646cc1087aa
封面图片: http://ci.xiaohongshu.com/161f03cb-0cf6-355f-b178-712a928a7720?imageView2/2/w/540/format/jpg
类型: normal
级别: 4
用户头像: https://img.xiaohongshu.com/avatar/5bb1047b0fd0590001997f83.jpg@80w_80h_90q_1e_1c_1x.jpg
用户名: zanleo
userid: 582c5f8982ec393b5ec866ba
喜欢点心: 233
====================================================================================================
标题:
05-11 10:56