How to detect the main article tag the way Evernote Clipper does


Problem description

When I tried the Evernote Clipper extension, I noticed a very useful feature: when I click "Article", it accurately picks out the main content of the page. For example, I used Evernote Clipper on the page https://developer.chrome.com/extensions/api_index

I looked at the main article that Evernote picked out on several pages, and the article is in fact extracted from the first article tag. However, Evernote Clipper still works well on pages that don't use that kind of tag.

I wonder how Evernote Clipper does that. Is there a JS library that can detect the main tag containing the main content of a page? Could you give me some advice on how to do it?

Thank you in advance!

Solution

As far as I know, there is no universal JS library that does this. Evernote Clipper uses its own method to extract the "interesting" content from a web page. You can look at the code of the Evernote Clipper extension to try to understand the process.

On my Mac, the path to the Chrome extension is:

~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/

Here's another tool that works in much the same way: https://www.readability.com/

You can also check this thread: What algorithm does Readability use for extracting text from URLs?

Or search Google for terms like 'content extraction js lib', for example. (I found this one: https://github.com/hatena/extract-content-javascript)
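
To give a feel for what such a library might do, here is a minimal sketch of a Readability-style heuristic. It is not Evernote Clipper's actual code, and the function names (`findMainContainer`, `detectMainElement`), the 25-character threshold, and the scoring weights are all assumptions chosen for illustration. The idea is to prefer the first article tag when the page has one and otherwise let every paragraph "vote" for its parent container, with link-heavy paragraphs counting for less.

```javascript
// Pure scoring step: each paragraph record votes for its parent container.
// A record is { parent, text, linkText }; `parent` can be any object (e.g. a DOM element).
// The threshold and weights below are illustrative assumptions, not a standard.
function findMainContainer(paragraphs, minLength = 25) {
  const scores = new Map();
  for (const { parent, text, linkText } of paragraphs) {
    const trimmed = text.trim();
    if (!parent || trimmed.length < minLength) continue; // skip very short fragments

    // Longer, comma-rich text looks more like article prose.
    const commas = (trimmed.match(/,/g) || []).length;
    const lengthBonus = Math.min(3, Math.floor(trimmed.length / 100));

    // Paragraphs that are mostly link text (menus, footers) contribute little.
    const linkDensity = Math.min(1, linkText.length / trimmed.length);
    const score = (1 + commas + lengthBonus) * (1 - linkDensity);

    scores.set(parent, (scores.get(parent) || 0) + score);
  }

  // Pick the container with the highest positive score, or null if there is none.
  let best = null;
  let bestScore = 0;
  for (const [parent, score] of scores) {
    if (score > bestScore) {
      best = parent;
      bestScore = score;
    }
  }
  return best;
}

// Browser adapter: prefer the first <article>, otherwise fall back to the heuristic.
function detectMainElement(doc) {
  const article = doc.querySelector('article');
  if (article) return article;

  const paragraphs = Array.from(doc.querySelectorAll('p')).map((p) => ({
    parent: p.parentElement,
    text: p.textContent,
    linkText: Array.from(p.querySelectorAll('a'))
      .map((a) => a.textContent)
      .join(''),
  }));
  return findMainContainer(paragraphs);
}
```

In a content script or the browser console you could call `detectMainElement(document)` and inspect the returned element. Real extractors go further (class-name hints, scoring grandparents, cleaning up the result), so treat this only as a starting point.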

Hope this helps
