s10day112

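Review:
Part 1: Web scraping
1. What is your understanding of the HTTP protocol?
Specification:
1. The format of the data an HTTP request sends and receives
GET /index/ HTTP/1.1\r\nHost: xxx.com\r\n\r\n
POST /index/ HTTP/1.1\r\nHost: xxx.com\r\n\r\nuser=xxx
2. Short-lived connection (stateless)
The connection is closed after one request and one response
3. Built on top of TCP
import socket
sk = socket.socket()
sk.connect(('xxx.com', 80))
sk.send(b'GET /index/ HTTP/1.1\r\nHost: xxx.com\r\n\r\n')
What are the common request headers?
host
content-type
user-agent
cookie
referer: the address of the previous request
What are the common request methods?
GET
POST
DELETE
PUT
PATCH
OPTIONS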
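The request format described above can be made concrete with a small sketch. The helper name `build_request` is hypothetical (it is not part of any library), and the `Content-Length` header is an addition that the notes do not show but that a server generally needs in order to read a POST body.

```python
def build_request(method, path, host, body=b""):
    """Serialize a minimal HTTP/1.1 request into raw bytes (illustrative helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Assumption: tell the server how many body bytes follow the headers.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines).encode("ascii")
    # A blank line (\r\n\r\n) separates the headers from the body.
    return head + b"\r\n\r\n" + body


get_req = build_request("GET", "/index/", "xxx.com")
post_req = build_request("POST", "/index/", "xxx.com", b"user=xxx")
print(get_req)
print(post_req)
```

Bytes built this way are what would be passed to `sk.send(...)` on a connected TCP socket.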
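2. requests
Used to send requests that imitate a browser
Parameters:
- url
- headers
- data
- cookies
Response:
- content
- text
- encoding='gbk'
- headers
- cookies.get_dict()
3. bs
Used to parse strings of HTML
Methods and attributes:
- find
- find_all
- attrs
- get
- text
4. Patterns
- Autohome (汽车之家)
- Chouti news (抽屉新闻): send a user-agent
- Logging in to Chouti: keep the cookie from the first visit and send it again when logging in
- Automatic GitHub login: fetch the csrf_token and work out which cookie has to be sent
Addendum: automatic GitHub login
Part 2: Luffy (路飞)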
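1. How is the company organized?
Development:
- 村长
- the front-end developer (前端姑娘)
- 涛
- 云 (product + development)
UI: 1
Testing: 1
Ops: 1
Business operations: 2
Sales: 3
Class supervisor: 1
Full-time teaching assistants: 2
HR/finance: shared with Oldboy (老男孩)
2. Project architecture
- Admin backend (1)
  - permissions
  - xadmin
- Mentor backend (1)
  - permissions
  - xadmin
- Main site (1+1+0.5+1)
  - restful api
  - vue.js
Currently in development: a question bank system
3. Technologies involved:
- django
- django rest framework
- vue.js
- cross-origin requests (CORS)
- redis
- Alipay payments
- video playback
  - CC视频
  - 保利
- WeChat message push
  - a verified service account
  - sending template messages
- content-type
Today's topics:
- Lagou (拉勾网)
- Douyin (抖音)
- requests
- bs4
- a first look at the scrapy framework
Details: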
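1. Lagou
- The Token and Code are embedded in the page and must be sent as custom request headers
- Redirects:
  - read the redirect target from the Location response header
  - follow it yourself
- each request must carry the code and token from the previous request
Principle:
- imitate the browser's behavior exactly
2. Scraping Douyin videos
3. The requests module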
Parameters:
url
params
headers
cookies
data
Example:
requests.post(
    url='xxx',
    data={
        'user': 'alex',
        'pwd': 'sb'
    }
)
Body sent: user=alex&pwd=sb (Chrome shows it as "Form Data")
json
Example:
requests.post(
    url='xxx',
    json={
        'user': 'alex',
        'pwd': 'sb'
    }
)
Body sent: '{"user": "alex", "pwd": "sb"}' (Chrome shows it as "Request Payload")
allow_redirects
stream
files
requests.post(
    url='xxx',
    files={
        'f1': open('readme', 'rb')
    }
)
auth
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('admin', 'admin'))
print(ret.text)
timeout
ret = requests.get('http://google.com/', timeout=1)  # one limit for both connecting and reading
ret = requests.get('http://google.com/', timeout=(5, 1))  # (connect timeout, read timeout)
proxies
proxies = {
    "http": "61.172.249.96:80",
    "https": "http://61.185.219.126:3128",
}
# proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
ret = requests.get("https://www.proxy360.cn/Proxy", proxies=proxies)
print(ret.headers)

from requests.auth import HTTPProxyAuth
auth = HTTPProxyAuth('username', 'mypassword')
r = requests.get("http://www.google.com", proxies=proxies, auth=auth)
Certificates:
cert
verify
session: manages cookies and headers automatically (not recommended)
import requests

session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxooxxoo",
        'oneMonth': ""
    }
)
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)
4. bs4
Reference examples: https://www.cnblogs.com/wupeiqi/articles/6283017.html
Preview:
1. Install scrapy
https://www.cnblogs.com/wupeiqi/articles/6229292.html
a. Download Twisted
http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
b. Install wheel
pip3 install wheel
c. Install Twisted
pip3 install Twisted-18.7.0-cp36-cp36m-win_amd64.whl
d. Install pywin32
pip3 install pywin32
e. Install scrapy
pip3 install scrapy

Today's course notes (teacher)

Reference blog posts for requests and bs4

import requests
from bs4 import BeautifulSoup

# Step 1: load the login page to get the CSRF token and the initial cookies
r1 = requests.get(
    url='https://github.com/login'
)
s1 = BeautifulSoup(r1.text, 'html.parser')
token = s1.find(name='input', attrs={'name': 'authenticity_token'}).get('value')
r1_cookie_dict = r1.cookies.get_dict()

# Step 2: submit the login form with the token and the cookies from step 1
r2 = requests.post(
    url='https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': '[email protected]',
        'password': 'sdfasdfasdfasdf'
    },
    cookies=r1_cookie_dict
)
r2_cookie_dict = r2.cookies.get_dict()

# Step 3: request a page that requires login, using the cookies from step 2
r3 = requests.get(
    url='https://github.com/settings/emails',
    cookies=r2_cookie_dict
)
print(r3.text)

GitHub login
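The token lookup in the script above relies on BeautifulSoup. As a rough sketch of the same idea using only the standard library, the class below (a hypothetical `HiddenInputParser`, swapped in for `find(...).get('value')`) collects the value of every named `<input>` tag. The sample HTML is invented for illustration and is not GitHub's real markup.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            # value is None when the tag has no value attribute
            self.inputs[name] = attr_map.get("value")


# Invented sample page; a real login page would come from r1.text.
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="utf8" value="&#x2713;">
  <input type="hidden" name="authenticity_token" value="abc123==">
  <input type="text" name="login">
</form>
"""

parser = HiddenInputParser()
parser.feed(SAMPLE_HTML)
token = parser.inputs["authenticity_token"]
print(token)
```

Collecting all inputs at once also makes it easy to see which other hidden fields the form expects to be posted back.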

import re
import requests
all_cookie_dict = {}

# ##################################### Step 1: visit the login page #####################################
r1 = requests.get(
url='https://passport.lagou.com/login/login.html',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
)
token = re.findall("X_Anti_Forge_Token = '(.*)';", r1.text)[0]
code = re.findall("X_Anti_Forge_Code = '(.*)';", r1.text)[0]
r1_cookie_dict = r1.cookies.get_dict()
all_cookie_dict.update(r1_cookie_dict)

# ##################################### Step 2: log in #####################################
r2 = requests.post(
url='https://passport.lagou.com/login/login.json',
data={
'isValidate':'true',
'username':'',
'password':'',
'request_form_verifyCode':'',
'submit':''
},
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'X-Requested-With':'XMLHttpRequest',
'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
'Host':'passport.lagou.com',
'Origin':'https://passport.lagou.com',
'Referer':'https://passport.lagou.com/login/login.html',
'X-Anit-Forge-Code':code,
'X-Anit-Forge-Token':token
},
cookies=all_cookie_dict )
r2_response_json = r2.json()
r2_cookie_dict = r2.cookies.get_dict()
all_cookie_dict.update(r2_cookie_dict)

# ##################################### Step 3: grant #####################################
r3 = requests.get(
url='https://passport.lagou.com/grantServiceTicket/grant.html',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'passport.lagou.com',
},
cookies=all_cookie_dict,
allow_redirects=False )
r3_cookie_dict = r3.cookies.get_dict()
all_cookie_dict.update(r3_cookie_dict)

# ##################################### Step 4: action #####################################
r4 = requests.get(
url=r3.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r4_cookie_dict = r4.cookies.get_dict()
all_cookie_dict.update(r4_cookie_dict)

# ##################################### Step 5: obtain the authentication info #####################################
r5 = requests.get(
url=r4.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r5_cookie_dict = r5.cookies.get_dict()
all_cookie_dict.update(r5_cookie_dict)
print(r5.headers['Location'])

# ##################################### Step 6: "My invitations" page #####################################
r = requests.get(
url='https://www.lagou.com/mycenter/invitation.html',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
'Pragma':'no-cache',
},
cookies=all_cookie_dict
)
print('wupeiqi' in r.text)

lagou1.py
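Steps 3 to 5 of lagou1.py repeat one pattern: send a request with `allow_redirects=False`, merge the new cookies, then request the URL found in `Location`. A minimal sketch of that loop follows. The `fetch` callable, the `follow_redirects` name, and the fake site are all assumptions made so the example runs without a network; in the real script `fetch` would wrap `requests.get`.

```python
from urllib.parse import urljoin


def follow_redirects(fetch, url, cookies=None, max_hops=10):
    """Follow 3xx responses by hand, merging cookies from every hop.

    `fetch(url, cookies)` is a hypothetical callable that returns
    (status_code, headers, new_cookies).
    """
    jar = dict(cookies or {})
    visited = [url]
    for _ in range(max_hops):
        status, headers, new_cookies = fetch(url, jar)
        jar.update(new_cookies)
        location = headers.get("Location")
        if status not in (301, 302, 303, 307, 308) or not location:
            return url, jar, visited
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        visited.append(url)
    raise RuntimeError("too many redirects")


# Invented stand-in for a server: URL -> (status, headers, cookies set).
FAKE_SITE = {
    "https://example.test/grant": (302, {"Location": "/action"}, {"ticket": "t1"}),
    "https://example.test/action": (302, {"Location": "https://example.test/home"}, {"session": "s1"}),
    "https://example.test/home": (200, {}, {"user": "u1"}),
}


def fake_fetch(url, cookies):
    return FAKE_SITE[url]


final_url, jar, visited = follow_redirects(
    fake_fetch, "https://example.test/grant", {"init": "c0"}
)
print(final_url, jar)
```

Handling each hop by hand, rather than letting the library follow redirects, is what lets the script keep cookies that are set on intermediate responses.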

import re
import requests
all_cookie_dict = {}

# ##################################### Step 1: visit the login page #####################################
r1 = requests.get(
url='https://passport.lagou.com/login/login.html',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
)
token = re.findall("X_Anti_Forge_Token = '(.*)';", r1.text)[0]
code = re.findall("X_Anti_Forge_Code = '(.*)';", r1.text)[0]
r1_cookie_dict = r1.cookies.get_dict()
all_cookie_dict.update(r1_cookie_dict)

# ##################################### Step 2: log in #####################################
r2 = requests.post(
url='https://passport.lagou.com/login/login.json',
data={
'isValidate':'true',
'username':'',
'password':'',
'request_form_verifyCode':'',
'submit':''
},
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'X-Requested-With':'XMLHttpRequest',
'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
'Host':'passport.lagou.com',
'Origin':'https://passport.lagou.com',
'Referer':'https://passport.lagou.com/login/login.html',
'X-Anit-Forge-Code':code,
'X-Anit-Forge-Token':token
},
cookies=all_cookie_dict )
r2_response_json = r2.json()
r2_cookie_dict = r2.cookies.get_dict()
all_cookie_dict.update(r2_cookie_dict)

# ##################################### Step 3: grant #####################################
r3 = requests.get(
url='https://passport.lagou.com/grantServiceTicket/grant.html',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'passport.lagou.com',
},
cookies=all_cookie_dict,
allow_redirects=False )
r3_cookie_dict = r3.cookies.get_dict()
all_cookie_dict.update(r3_cookie_dict)

# ##################################### Step 4: action #####################################
r4 = requests.get(
url=r3.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r4_cookie_dict = r4.cookies.get_dict()
all_cookie_dict.update(r4_cookie_dict)

# ##################################### Step 5: obtain the authentication info #####################################
r5 = requests.get(
url=r4.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r5_cookie_dict = r5.cookies.get_dict()
all_cookie_dict.update(r5_cookie_dict)
print(r5.headers['Location'])

# ##################################### Step 6 #####################################
r6 = requests.get(
url=r5.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r6_cookie_dict = r6.cookies.get_dict()
all_cookie_dict.update(r6_cookie_dict)
print(r6.headers['Location'])

# ##################################### Step 7 #####################################
r7 = requests.get(
url=r6.headers['Location'],
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Referer':'https://passport.lagou.com/login/login.html',
'Host':'www.lagou.com',
'Upgrade-Insecure-Requests':'',
},
cookies=all_cookie_dict,
allow_redirects=False )
r7_cookie_dict = r7.cookies.get_dict()
all_cookie_dict.update(r7_cookie_dict)

# ##################################### Step 8: view the personal info #####################################
r8 = requests.get(
url='https://gate.lagou.com/v1/neirong/account/users/0/',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Host':'gate.lagou.com',
'Pragma':'no-cache',
'Referer':'https://account.lagou.com/v2/account/userinfo.html',
'X-L-REQ-HEADER':'{deviceType:1}'
},
cookies=all_cookie_dict
)
r8_response_json = r8.json()
# print(r8_response_json)
all_cookie_dict.update(r8.cookies.get_dict())

# ##################################### Step 9: update the personal info #####################################
r9 = requests.put(
url='https://gate.lagou.com/v1/neirong/account/users/0/',
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
'Host':'gate.lagou.com',
'Origin':'https://account.lagou.com',
'Referer':'https://account.lagou.com/v2/account/userinfo.html',
'X-L-REQ-HEADER':'{deviceType:1}',
'X-Anit-Forge-Code':r8_response_json.get('submitCode'),
'X-Anit-Forge-Token':r8_response_json.get('submitToken'),
'Content-Type':'application/json;charset=UTF-8',
},
json={"userName":"wupeiqi999","sex":"MALE","portrait":"images/myresume/default_headpic.png","positionName":"...","introduce":"...."},
cookies=all_cookie_dict
)
print(r9.text)

lagou2
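In lagou2 the login request passes `data=` while the profile update in step 9 passes `json=`. The sketch below reproduces, with the standard library only, the two body formats those arguments roughly correspond to. It is an approximation of the encoding, not the requests implementation itself, and the variable names are illustrative.

```python
import json
from urllib.parse import urlencode

payload = {"user": "alex", "pwd": "sb"}

# data=payload: key=value pairs joined with "&",
# sent with Content-Type: application/x-www-form-urlencoded
form_body = urlencode(payload)

# json=payload: a JSON document,
# sent with Content-Type: application/json
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

A server that expects one format will usually not parse the other, which is why the step 9 request also sets a JSON `Content-Type` header explicitly.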

Scraping Douyin

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<script>
!function (t) {
if (t.__M = t.__M || {},
!t.__M.require) {
var e, n, r = document.getElementsByTagName("head")[0], i = {}, o = {}, a = {}, u = {}, c = {}, s = {},
l = function (t, n) {
if (!(t in u)) {
u[t] = !0;
var i = document.createElement("script");
if (n) {
var o = setTimeout(n, e.timeout);
i.onerror = function () {
clearTimeout(o),
n()
}
;
var a = function () {
clearTimeout(o)
};
"onload" in i ? i.onload = a : i.onreadystatechange = function () {
("loaded" === this.readyState || "complete" === this.readyState) && a()
}
}
return i.type = "text/javascript",
i.src = t,
r.appendChild(i),
i
}
}, f = function (t, e, n) {
var r = i[t] || (i[t] = []);
r.push(e);
var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg;
o = u ? s[u].url || s[u].uri : a.url || a.uri || t,
l(o, n && function () {
n(t)
}
)
};
n = function (t, e) {
"function" != typeof e && (e = arguments[2]),
t = t.replace(/\.js$/i, ""),
o[t] = e;
var n = i[t];
if (n) {
for (var r = 0, a = n.length; a > r; r++)
n[r]();
delete i[t]
}
}
,
e = function (t) {
if (t && t.splice)
return e.async.apply(this, arguments);
t = e.alias(t);
var n = a[t];
if (n)
return n.exports;
var r = o[t];
if (!r)
throw "[ModJS] Cannot find module `" + t + "`";
n = a[t] = {
exports: {}
};
var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r;
return i && (n.exports = i),
n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", {
value: n.exports
}),
n.exports
}
,
e.async = function (n, r, i) {
function a(t) {
for (var n, r = 0, h = t.length; h > r; r++) {
var p = e.alias(t[r]);
p in o ? (n = c[p] || c[p + ".js"],
n && "deps" in n && a(n.deps)) : p in s || (s[p] = !0,
l++,
f(p, u, i),
n = c[p] || c[p + ".js"],
n && "deps" in n && a(n.deps))
}
} function u() {
if (0 === l--) {
for (var i = [], o = 0, a = n.length; a > o; o++)
i[o] = e(n[o]);
r && r.apply(t, i)
}
} "string" == typeof n && (n = [n]);
var s = {}
, l = 0;
a(n),
u()
}
,
e.resourceMap = function (t) {
var e, n;
n = t.res;
for (e in n)
n.hasOwnProperty(e) && (c[e] = n[e]);
n = t.pkg;
for (e in n)
n.hasOwnProperty(e) && (s[e] = n[e])
}
,
e.loadJs = function (t) {
l(t)
}
,
e.loadCss = function (t) {
if (t.content) {
var e = document.createElement("style");
e.type = "text/css",
e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content,
r.appendChild(e)
} else if (t.url) {
var n = document.createElement("link");
n.href = t.url,
n.rel = "stylesheet",
n.type = "text/css",
r.appendChild(n)
}
}
,
e.alias = function (t) {
return t.replace(/\.js$/i, "")
}
,
e.timeout = 5e3,
t.__M.define = n,
t.__M.require = e
}
}(this);
__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) {
Function(function (l) {
return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) {
return l[15 & e.charCodeAt(0)]
})
}("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})])
});
_bytedAcrawler = __M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");
signature = _bytedAcrawler.sign('58841646784')
console.log(signature);
</script>
</body>
</html>

douyin.html

navigator = {
userAgent:"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
};
!function (t) {
if (t.__M = t.__M || {},
!t.__M.require) {
var e, n, r = "<head> <meta charset=\"utf-8\"><title>快来加入抖音短视频,让你发现最有趣的我!</title><meta name=\"viewport\" content=\"width=device-width,initial-scale=1,user-scalable=0,minimum-scale=1,maximum-scale=1,minimal-ui,viewport-fit=cover\"><meta name=\"format-detection\" content=\"telephone=no\"><meta name=\"baidu-site-verification\" content=\"szjdG38sKy\"><meta name=\"keywords\" content=\"抖音、抖音音乐、抖音短视频、抖音官网、amemv\"><meta name=\"description\" content=\"抖音短视频-记录美好生活的视频平台\"><meta name=\"apple-mobile-web-app-capable\" content=\"yes\"><meta name=\"apple-mobile-web-app-status-bar-style\" content=\"default\"><link rel=\"apple-touch-icon-precomposed\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/image/logo/logo_launcher_v2_40f12f4.png\"><link rel=\"shortcut icon\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/image/logo/favicon_v2_7145ff0.ico\" type=\"image/x-icon\"><meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge;chrome=1\"><meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><script async=\"\" src=\"//www.google-analytics.com/analytics.js\"></script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.onload=function(){window.Adapter.reset()},window.onresize=function(){window.Adapter.reset()}}();</script> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><script>tac='i)69eo056r4s!i$1afls\"0,<8~z|\x7f@QGNCJF[\\\\^D\\\\KFYSk~^WSZhg,(lfi~ah`{md\"inb|1d<,%Dscafgd\"in,8[xtm}nLzNEGQMKAdGG^NTY\x1ckgd\"inb<b|1d<g,&TboLr{m,(\x02)!jx-2n&vr$testxg,%@tug{mn 
,%vrfkbm[!cb|'</script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.onload=function(){window.Adapter.reset()},window.onresize=function(){window.Adapter.reset()}}();</script><meta name=\"pathname\" content=\"aweme_mobile_user\"> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><meta name=\"theme-color\" content=\"#161823\"><meta name=\"pathname\" content=\"aweme_mobile_video\"><link rel=\"dns-prefetch\" href=\"//s3.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s3a.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s3b.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s0.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//s1.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//s2.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixiguavideo.com/\"><link rel=\"stylesheet\" 
href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/style/base_99078a4.css\"><style>@font-face{font-family:iconfont;src:url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot);src:url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eb9a50.woff) format('woff'),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_da2e2ef.ttf) format('truetype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_31180f7.svg#iconfont) format('svg')}.iconfont{font-family:iconfont!important;font-size:.24rem;font-style:normal;letter-spacing:-.045rem;margin-left:-.085rem}@font-face{font-family:icons;src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot);src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_87ad39c.woff) format('woff'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_5848858.ttf) format('truetype'),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_20c7f77.svg#iconfont) format('svg')}.icons{font-family:icons!important;font-size:.24rem;font-style:normal;-webkit-font-smoothing:antialiased;-webkit-text-stroke-width:.2px;-moz-osx-font-smoothing:grayscale}@font-face{font-family:Ies;src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_317064f.woff2?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff2\"),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_a07f3d4.woff?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff\"),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_4c0d8be.ttf?ba9fc668cd9544e80b6f5998cdce1672) 
format(\"truetype\"),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_1ac3f94.svg?ba9fc668cd9544e80b6f5998cdce1672#Ies) format(\"svg\")}i{line-height:1}i[class^=ies-]:before,i[class*=\" ies-\"]:before{font-family:Ies!important;font-style:normal;font-weight:400!important;font-variant:normal;text-transform:none;line-height:1;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.ies-checked:before{content:\"\\f101\"}.ies-chevron-left:before{content:\"\\f102\"}.ies-chevron-right:before{content:\"\\f103\"}.ies-clear:before{content:\"\\f104\"}.ies-close:before{content:\"\\f105\"}.ies-copy:before{content:\"\\f106\"}.ies-delete:before{content:\"\\f107\"}.ies-edit:before{content:\"\\f108\"}.ies-help-circle:before{content:\"\\f109\"}.ies-info:before{content:\"\\f10a\"}.ies-loading:before{content:\"\\f10b\"}.ies-location:before{content:\"\\f10c\"}.ies-paste:before{content:\"\\f10d\"}.ies-plus:before{content:\"\\f10e\"}.ies-query:before{content:\"\\f10f\"}.ies-remove:before{content:\"\\f110\"}.ies-search:before{content:\"\\f111\"}.ies-settings:before{content:\"\\f112\"}.ies-shopping-bag:before{content:\"\\f113\"}.ies-sort-left:before{content:\"\\f114\"}.ies-sort-right:before{content:\"\\f115\"}.ies-title-decorate-left:before{content:\"\\f116\"}.ies-title-decorate-right:before{content:\"\\f117\"}.ies-triangle-right:before{content:\"\\f118\"}.ies-triangle-top:before{content:\"\\f119\"}.ies-video:before{content:\"\\f11a\"}</style> <link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/loading/index_5108ff2.css\">\n" +
"<link rel=\"stylesheet\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/banner/index_3941ffc.css\">\n" +
"<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/common/openBrowser/index_2c31596.css\">\n" +
"<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/page/reflow_user/index_ecb0bc9.css\">\n" +
"<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/pkg/video_93fd288.css\"></head>", i = {}, o = {}, a = {}, u = {}, c = {}, s = {},
l = function (t, n) {
if (!(t in u)) {
u[t] = !0;
var i = document.createElement("script");
if (n) {
var o = setTimeout(n, e.timeout);
i.onerror = function () {
clearTimeout(o),
n()
}
;
var a = function () {
clearTimeout(o)
};
"onload" in i ? i.onload = a : i.onreadystatechange = function () {
("loaded" === this.readyState || "complete" === this.readyState) && a()
}
}
return i.type = "text/javascript",
i.src = t,
r.appendChild(i),
i
}
}, f = function (t, e, n) {
var r = i[t] || (i[t] = []);
r.push(e);
var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg;
o = u ? s[u].url || s[u].uri : a.url || a.uri || t,
l(o, n && function () {
n(t)
}
)
};
n = function (t, e) {
"function" != typeof e && (e = arguments[2]),
t = t.replace(/\.js$/i, ""),
o[t] = e;
var n = i[t];
if (n) {
for (var r = 0, a = n.length; a > r; r++)
n[r]();
delete i[t]
}
}
,
e = function (t) {
if (t && t.splice)
return e.async.apply(this, arguments);
t = e.alias(t);
var n = a[t];
if (n)
return n.exports;
var r = o[t];
if (!r)
throw "[ModJS] Cannot find module `" + t + "`";
n = a[t] = {
exports: {}
};
var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r;
return i && (n.exports = i),
n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", {
value: n.exports
}),
n.exports
}
,
e.async = function (n, r, i) {
function a(t) {
for (var n, r = 0, h = t.length; h > r; r++) {
var p = e.alias(t[r]);
p in o ? (n = c[p] || c[p + ".js"],
n && "deps" in n && a(n.deps)) : p in s || (s[p] = !0,
l++,
f(p, u, i),
n = c[p] || c[p + ".js"],
n && "deps" in n && a(n.deps))
}
} function u() {
if (0 === l--) {
for (var i = [], o = 0, a = n.length; a > o; o++)
i[o] = e(n[o]);
r && r.apply(t, i)
}
} "string" == typeof n && (n = [n]);
var s = {}
, l = 0;
a(n),
u()
}
,
e.resourceMap = function (t) {
var e, n;
n = t.res;
for (e in n)
n.hasOwnProperty(e) && (c[e] = n[e]);
n = t.pkg;
for (e in n)
n.hasOwnProperty(e) && (s[e] = n[e])
}
,
e.loadJs = function (t) {
l(t)
}
,
e.loadCss = function (t) {
if (t.content) {
var e = document.createElement("style");
e.type = "text/css",
e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content,
r.appendChild(e)
} else if (t.url) {
var n = document.createElement("link");
n.href = t.url,
n.rel = "stylesheet",
n.type = "text/css",
r.appendChild(n)
}
}
,
e.alias = function (t) {
return t.replace(/\.js$/i, "")
}
,
e.timeout = 5e3,
t.__M.define = n,
t.__M.require = e
}
}(this);
this.__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) {
Function(function (l) {
return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) {
return l[15 & e.charCodeAt(0)]
})
}("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})])
});
_bytedAcrawler = this.__M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");
signature = _bytedAcrawler.sign(process.argv[2])
console.log(signature);

s1.js
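Because s1.js reads the user ID from `process.argv[2]` and prints the signature, it is meant to be run as a child process. One way to do that from Python is sketched below. `run_signer` is a hypothetical helper; the Python one-liner is only a stand-in so the example runs without Node installed, and a real call would pass `["node", "s1.js"]` as the command.

```python
import subprocess
import sys


def run_signer(command, user_id, timeout=10):
    """Run an external signer program and return its stdout without the trailing newline."""
    result = subprocess.run(
        [*command, str(user_id)],  # arguments as a list, so no shell quoting is involved
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # do not hang forever if the child process stalls
    )
    return result.stdout.strip()


# Stand-in for `node s1.js`: a child process that echoes its first argument.
echo_cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
signature = run_signer(echo_cmd, "58841646784")
print(signature)
```

Compared with building a shell command string, passing a list and using `check=True` makes a failing signer visible instead of silently returning its error text as the signature.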

import subprocess

import requests

user_id = '58841646784'  # 6556303280

# Fetch all of this user's videos
"""
signature = _bytedAcrawler.sign('<user ID>')
douyin_falcon:node_modules/byted-acrawler/dist/runtime
"""
signature = subprocess.getoutput('node s1.js %s' % user_id)

user_video_list = []

# ############################# Fetch the user's own posts ##########################
user_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc'  # '114f1984d1917343ccfb14d94e7ce5f5'
}


def get_aweme_list(max_cursor=None):
    # Fetch one page, then recurse while the API reports more pages
    if max_cursor:
        user_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="https://www.douyin.com/aweme/v1/aweme/post/",
        params=user_video_params,
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
            'referer': 'https://www.douyin.com/share/user/58841646784',
        }
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    user_video_list.extend(aweme_list)
    if content_json.get('has_more') == 1:
        return get_aweme_list(content_json.get('max_cursor'))


get_aweme_list()

# ############################# Fetch the user's liked videos ##########################
favor_video_list = []

favor_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc'
}


def get_favor_list(max_cursor=None):
    # Same paging logic as get_aweme_list, against the favorite endpoint
    if max_cursor:
        favor_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="https://www.douyin.com/aweme/v1/aweme/favorite/",
        params=favor_video_params,
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
            'referer': 'https://www.douyin.com/share/user/58841646784',
        }
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    favor_video_list.extend(aweme_list)
    if content_json.get('has_more') == 1:
        return get_favor_list(content_json.get('max_cursor'))


get_favor_list()

# ############################# Download the videos ##########################
for item in user_video_list:
    video_id = item['video']['play_addr']['uri']
    video = requests.get(
        url='https://aweme.snssdk.com/aweme/v1/playwm/',
        params={
            'video_id': video_id
        },
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
            'referer': 'https://www.douyin.com/share/user/58841646784',
        },
        stream=True
    )
    file_name = video_id + '.mp4'
    with open(file_name, 'wb') as f:
        # Write in 1 KB chunks; the default chunk size is a single byte
        for chunk in video.iter_content(chunk_size=1024):
            f.write(chunk)

for item in favor_video_list:
    video_id = item['video']['play_addr']['uri']
    video = requests.get(
        url='https://aweme.snssdk.com/aweme/v1/playwm/',
        params={
            'video_id': video_id
        },
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
            'referer': 'https://www.douyin.com/share/user/58841646784',
        },
        stream=True
    )
    file_name = video_id + '.mp4'
    with open(file_name, 'wb') as f:
        for chunk in video.iter_content(chunk_size=1024):
            f.write(chunk)

Scraping Douyin
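The script above pages through results recursively, using the `has_more` and `max_cursor` fields of each response. The same cursor logic can also be written as a loop, which avoids deep recursion on long lists. In this sketch `collect_all`, `fetch_page`, and the fake pages are hypothetical stand-ins; only the three field names come from the script.

```python
def collect_all(fetch_page, start_cursor=0, max_pages=1000):
    """Gather items from a cursor-paginated API until has_more is no longer 1."""
    items = []
    cursor = start_cursor
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        page = fetch_page(cursor)
        items.extend(page.get("aweme_list", []))
        if page.get("has_more") != 1:
            break
        cursor = page.get("max_cursor")
    return items


# Invented pages keyed by cursor; a real fetch_page would call requests.get.
FAKE_PAGES = {
    0: {"aweme_list": ["a", "b"], "has_more": 1, "max_cursor": 100},
    100: {"aweme_list": ["c"], "has_more": 1, "max_cursor": 200},
    200: {"aweme_list": ["d"], "has_more": 0, "max_cursor": 0},
}


def fake_fetch_page(cursor):
    return FAKE_PAGES[cursor]


videos = collect_all(fake_fetch_page)
print(videos)
```

Passing the page fetcher in as a function also means the same loop could serve both the posts endpoint and the favorites endpoint.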


05-12 04:03