Problem description
Instagram used to expose open data as JSON under the endpoint https://www.instagram.com/<username>/?__a=1. This changed overnight, and the endpoint is no longer available. What is the new endpoint, or what could be an alternative to it?
Thanks in advance!
Recommended answer
The endpoint no longer exists. Facebook is restricting its APIs because of scandals. The data is still there, of course, since Instagram's frontend needs it, so the alternative for now is to scrape the page and extract the JSON data from it. Here is how I do it:
- Do an HTTP GET to https://www.instagram.com/<username>.
- Look for the script tag whose text starts with window._sharedData = . You can use regular expressions or a scraping library for this.
- The rest of the text (except for the ; at the end) is the JSON data you want.
- Parse the JSON string so you can access the data like before.
- The first element of the 'ProfilePage' key inside the 'entry_data' key corresponds exactly to the JSON returned by the old endpoint.
Here is an example using Python:
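import json
import re

import requests
from bs4 import BeautifulSoup

# Fetch the public profile page.
r = requests.get('https://www.instagram.com/github/')
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(r.content, 'html.parser')
# Find the <script> tag whose text contains the window._sharedData assignment.
scripts = soup.find_all('script', type="text/javascript", string=re.compile('window._sharedData'))
# Strip the assignment prefix and the trailing ';' to leave the raw JSON text.
stringified_json = scripts[0].string.replace('window._sharedData = ', '')[:-1]
json.loads(stringified_json)['entry_data']['ProfilePage'][0]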
Out[1]:
{u'graphql': {u'user': {u'biography': u'How people build software.',
u'blocked_by_viewer': False,
...
}
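The steps also mention regular expressions as an alternative to a scraping library. Here is a minimal sketch of that variant, assuming the page still embeds the window._sharedData assignment followed by ; and the closing script tag. The function name extract_shared_data, the pattern, and the trimmed sample HTML are illustrative assumptions, not anything Instagram documents.

```python
import json
import re

# Captures the JSON object assigned to window._sharedData.
# Assumes the object is immediately followed by ";" and the closing </script> tag.
SHARED_DATA_RE = re.compile(
    r'window\._sharedData\s*=\s*(\{.*?\});\s*</script>', re.DOTALL
)


def extract_shared_data(html):
    """Return the parsed window._sharedData object, or None if it is absent."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical, heavily trimmed stand-in for a real profile page.
sample_html = (
    '<html><body><script type="text/javascript">'
    'window._sharedData = {"entry_data": {"ProfilePage": '
    '[{"graphql": {"user": {"biography": "How people build software."}}}]}};'
    '</script></body></html>'
)

shared_data = extract_shared_data(sample_html)
profile = shared_data['entry_data']['ProfilePage'][0]
print(profile['graphql']['user']['biography'])
```

For a live page you would pass the response body (for example r.text from requests) instead of sample_html. Checking for None first is worthwhile, since the page layout can change again without notice.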