Problem description
I need to extract the top domain of a URL. I found the Public Suffix List at http://publicsuffix.org/index.html, and there is a Java implementation in Guava at http://guava-libraries.googlecode.com, but I could not find any example that extracts the domain name.
For example:
example.google.com
should return google.com
and bing.bing.bing.com
should return bing.com
Can anyone show me, with an example, how to implement this using the library?
Recommended answer
It looks to me like InternetDomainName.topPrivateDomain() does exactly what you want. Guava maintains a list of public suffixes (based on Mozilla's list at publicsuffix.org) that it uses to determine which part of a host is the public suffix. The top private domain is the public suffix plus its first child.
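Here is a simple example:
import com.google.common.collect.ImmutableList;
import com.google.common.net.InternetDomainName;
import java.net.URI;
import java.net.URISyntaxException;

public class Test {
    public static void main(String[] args) throws URISyntaxException {
        ImmutableList<String> urls = ImmutableList.of(
            "http://example.google.com", "http://google.com",
            "http://bing.bing.bing.com", "http://www.amazon.co.jp/");
        for (String url : urls) {
            System.out.println(url + " -> " + getTopPrivateDomain(url));
        }
    }

    // Parses the URL, takes its host, and asks Guava for the top private domain.
    private static String getTopPrivateDomain(String url) throws URISyntaxException {
        String host = new URI(url).getHost();
        InternetDomainName domainName = InternetDomainName.from(host);
        // Older Guava releases exposed the name via name(); current ones use toString().
        return domainName.topPrivateDomain().toString();
    }
}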
Running this code prints:
http://example.google.com -> google.com
http://google.com -> google.com
http://bing.bing.bing.com -> bing.com
http://www.amazon.co.jp/ -> amazon.co.jp
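To make the rule "public suffix plus its first child" concrete, here is a rough sketch that does not use Guava at all. It is not how Guava implements topPrivateDomain(). The class name TopPrivateDomainSketch and the three-entry suffix set are made up for illustration, and the real Public Suffix List also has wildcard and exception rules that this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TopPrivateDomainSketch {
    // Hypothetical, hard-coded stand-in for the real public suffix list.
    private static final Set<String> PUBLIC_SUFFIXES =
            new HashSet<>(Arrays.asList("com", "jp", "co.jp"));

    static String topPrivateDomain(String host) {
        String normalized = host.toLowerCase(Locale.ROOT);
        // A host that is itself a public suffix has no private part.
        if (PUBLIC_SUFFIXES.contains(normalized)) {
            throw new IllegalArgumentException("Host is a public suffix: " + host);
        }
        String[] labels = normalized.split("\\.");
        // Try candidate suffixes from longest to shortest, so "co.jp" wins over "jp".
        // Starting at 1 guarantees one label remains to the left of the suffix.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(suffix)) {
                // Public suffix plus its first child.
                return labels[i - 1] + "." + suffix;
            }
        }
        throw new IllegalArgumentException("No known public suffix in: " + host);
    }

    public static void main(String[] args) {
        String[] hosts = {"example.google.com", "bing.bing.bing.com", "www.amazon.co.jp"};
        for (String host : hosts) {
            System.out.println(host + " -> " + topPrivateDomain(host));
        }
    }
}
```

For real use, InternetDomainName is the better choice, because it ships with the full suffix list and handles the rule types this sketch leaves out.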