Question
I want my website to be able to pull up information about a web page when the user pastes a link into the post box, similar to Facebook.
I was wondering how sites like Google, Reddit, and Facebook are able to retrieve thumbnails, titles, and descriptions from just a URL.
Does anyone know how they do this?
Answer
The basic algorithm is rather simple: fetch the page, analyze its content, extract the text, images, title, and so on, and build a preview.

However, there are many difficulties in particular cases. Menus, banners, ads, text structure: plenty of details require very careful handling. AFAIK there is no algorithm that solves this task in 100% of cases (yes, Google's and others' algorithms aren't perfect).
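The first part of that pipeline can be sketched with nothing but the Python standard library. This is a minimal, illustrative example (the `preview_from_html` helper and the sample HTML are my own, not from any of the sites mentioned): it parses a page's `<meta property="og:*">` tags and falls back to the `<title>` element. A real implementation would also fetch the page over HTTP, handle encodings, and apply the heuristics discussed above.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collects Open Graph <meta> tags and the <title> text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.og = {}            # og:* properties, e.g. og:title, og:image
        self.title = None       # plain <title> text, used as a fallback
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property", "").startswith("og:"):
            self.og[attrs["property"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()

def preview_from_html(html):
    """Build a link-preview dict, preferring Open Graph data over fallbacks."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.og.get("og:title") or parser.title,
        "description": parser.og.get("og:description"),
        "image": parser.og.get("og:image"),
    }

sample = """<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Example Article">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/thumb.png">
</head><body></body></html>"""

print(preview_from_html(sample))
# → {'title': 'Example Article', 'description': 'A short summary.', 'image': 'https://example.com/thumb.png'}
```

Preferring `og:*` tags is what makes previews reliable for sites that publish them; the hard part, as noted above, is the fallback path for pages that don't.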
About Reddit: since it's open-sourced, you can see exactly how they do it. Here is the code you're looking for: https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py
Yandex has an API that allows you to do the same. You can find more here and here.
This concludes the question and answer on how to read Open Graph and meta tags from a web page given its URL; hopefully the answer above is helpful.