Question
I want my website to be able to pull up information about a web page when the user pastes a link into the post box, similar to Facebook.
I was wondering how sites like Google, Reddit, and Facebook are able to retrieve thumbnails, titles, and descriptions from just a URL.
Does anyone know how they do this?
Answer
The basic algorithm is rather simple: fetch the page, analyze its content, extract the text, images, title, and so on, and build a preview.

However, there are many difficulties in particular cases. Menus, banners, ads, text structure: plenty of details require very careful handling. AFAIK there is no algorithm that solves this task in 100% of cases (yes, Google's and others' algorithms aren't perfect).
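The first part of that pipeline can be sketched with nothing but the Python standard library. This is a minimal, illustrative example (the `preview_from_html` helper and the sample HTML are my own, not from any of the sites mentioned): it parses a page's `<meta property="og:*">` tags and falls back to the `<title>` element. A real implementation would also fetch the page over HTTP, handle encodings, and apply the heuristics discussed above.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collects Open Graph <meta> tags and the <title> text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.og = {}            # og:* properties, e.g. og:title, og:image
        self.title = None       # plain <title> text, used as a fallback
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property", "").startswith("og:"):
            self.og[attrs["property"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()

def preview_from_html(html):
    """Build a link-preview dict, preferring Open Graph data over fallbacks."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.og.get("og:title") or parser.title,
        "description": parser.og.get("og:description"),
        "image": parser.og.get("og:image"),
    }

sample = """<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Example Article">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/thumb.png">
</head><body></body></html>"""

print(preview_from_html(sample))
# → {'title': 'Example Article', 'description': 'A short summary.', 'image': 'https://example.com/thumb.png'}
```

Preferring `og:*` tags is what makes previews reliable for sites that publish them; the hard part, as noted above, is the fallback path for pages that don't.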
About Reddit: since it's open-sourced, you can see exactly how they do it. Here is the code you're looking for: https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py
Yandex has an API that allows you to do the same. You can find more here and here.
This concludes the question and answer on how to read Open Graph and meta tags from a web page given its URL; hopefully the answer above is helpful.