This article explains how to read Open Graph and meta tags from a web page given only its URL, which may be a useful reference for anyone facing the same problem.

Problem Description

I want my website to be able to pull up information about a web page when the user pastes a link into the post box, similar to how Facebook does it.

I was wondering how sites like Google, Reddit, and Facebook are able to retrieve thumbnails, titles, and descriptions with just a URL.

Does anyone know how they do this?

Recommended Answer

The basic algorithm is rather simple: fetch the page, analyze the content, extract the text, images, title, and anything else useful, and build a preview. However, there are a lot of difficulties in particular use cases. Menus, banners and ads, text structure - plenty of different details require very scrupulous processing. AFAIK there is no algorithm that solves this task in 100% of cases (yes, Google's and others' algorithms aren't perfect).
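As a rough illustration of those fetch/analyze/extract steps, here is a minimal Python sketch that reads Open Graph and standard meta tags with the requests and beautifulsoup4 packages. It is not Reddit's or Facebook's actual implementation; the function name build_link_preview and the fallback rules are assumptions made for this example.

```python
# Minimal sketch (not any site's real implementation) of building a link
# preview from a URL by reading Open Graph and standard meta tags.
import requests
from bs4 import BeautifulSoup


def build_link_preview(url: str) -> dict:
    """Fetch a page and extract title, description, and image for a preview."""
    response = requests.get(
        url, timeout=10, headers={"User-Agent": "preview-bot/0.1"}
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    preview = {"url": url, "title": None, "description": None, "image": None}

    # Prefer Open Graph tags (<meta property="og:...">) when the page has them.
    for prop, key in (("og:title", "title"),
                      ("og:description", "description"),
                      ("og:image", "image")):
        tag = soup.find("meta", property=prop)
        if tag and tag.get("content"):
            preview[key] = tag["content"].strip()

    # Fall back to the <title> element and the standard description meta tag.
    if not preview["title"] and soup.title and soup.title.string:
        preview["title"] = soup.title.string.strip()
    if not preview["description"]:
        tag = soup.find("meta", attrs={"name": "description"})
        if tag and tag.get("content"):
            preview["description"] = tag["content"].strip()

    return preview


if __name__ == "__main__":
    # Example usage with a placeholder URL.
    print(build_link_preview("https://example.com/"))
```

A production scraper would also have to deal with redirects, non-HTML content types, character encodings, and relative image URLs, which is where most of the "scrupulous processing" mentioned above comes in.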

About Reddit: since it's open-sourced, you can find out exactly how they do it. Here is the code you're looking for: https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py

Yandex has an API that allows you to do the same thing; you can find more details in its documentation.

