Problem description
When I tried the Evernote clipper extension, I noticed a very useful feature. When I click "Article", it picks out the main content of the page quite accurately. For example, see the result of using Evernote Clipper on the page https://developer.chrome.com/extensions/api_index
I looked at the main article that Evernote picked out on several pages; the article is in fact extracted from the first article tag. However, Evernote clipper still works well on pages that don't use that kind of tag.
I wonder how Evernote clipper does that. Is there any JS library that can detect the tag containing the main content of a page? Could you give me some advice on how to do it?
Thank you in advance!
As far as I know, there is no universal JS library to do that. The Evernote clipper uses its own method to extract the "interesting" content from a web page. You can read the code of the Evernote clipper to try to understand the process.
On my Mac, the path to the Chrome extension is:
~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/
Here's another tool that works in much the same way: https://www.readability.com/
You can also check this thread: What algorithm does Readability use for extracting text from URLs?
Or search on Google for terms like 'content extraction js lib'. (I found this one: https://github.com/hatena/extract-content-javascript)
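To give a rough idea of what this kind of content extraction involves, here is a minimal sketch of one possible heuristic. It is not Evernote's or Readability's actual code. It assumes that the first article tag wins when one exists (as observed in the question), and otherwise falls back to the element whose direct p children hold the most text. The names findMainContent, walk, paragraphTextLength and SKIP_TAGS are made up for this illustration, and the list of skipped tags is an assumption.

```javascript
// Illustrative heuristic only; all names here are hypothetical.
// Works on anything exposing tagName, children and textContent,
// so it accepts real DOM elements as well as plain objects.

// Containers assumed to hold navigation or chrome rather than content.
const SKIP_TAGS = new Set(["NAV", "ASIDE", "HEADER", "FOOTER", "SCRIPT", "STYLE", "FORM"]);

function tagOf(node) {
  return String(node.tagName || "").toUpperCase();
}

// Total length of the text in <p> elements that are direct children of node.
function paragraphTextLength(node) {
  let total = 0;
  for (const child of Array.from(node.children || [])) {
    if (tagOf(child) === "P") {
      total += (child.textContent || "").trim().length;
    }
  }
  return total;
}

// Depth-first traversal in document order, not descending into skipped tags.
function* walk(node) {
  yield node;
  for (const child of Array.from(node.children || [])) {
    if (!SKIP_TAGS.has(tagOf(child))) {
      yield* walk(child);
    }
  }
}

function findMainContent(root) {
  let best = null;
  let bestScore = 0;
  for (const node of walk(root)) {
    // Assumption: the first <article> in document order is the main content.
    if (tagOf(node) === "ARTICLE") {
      return node;
    }
    const score = paragraphTextLength(node);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  // Fall back to the root when no element has any paragraph text.
  return best || root;
}
```

In a content script you could call findMainContent(document.body) and highlight the returned element. Real extractors add more signals (link density, class and id names, scores propagated to parent elements), so treat this only as a starting point.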
Hope this helps