Question
I'm trying to write (or just find an existing) PHP method that can take a link and extract the domain from its URL. The trick is that it needs to hold up under strange-looking domains like:
www.champa.kku.ac.th
Looking at this one with my own eyes, I still guessed wrong: I thought the domain would be kku.ac.th, but that gives a DNS error when visited.
So does anyone know of a good way to reliably extract the domain from URLs like these:
http://site.com/hello.php
http://site.com.uk/hello.php
http://subdomain.site.com/hello.php
http://subdomain.site.com.uk/hello.php
http://www.champa.kku.ac.th/hello.php // and even the one I couldn't tell
Recommended answer
PHP has the parse_url() function, which will help you do the basic splitting into protocol, host, port, and so on.
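As a minimal sketch of that basic splitting, the snippet below runs parse_url() on one of the URLs from the question. The variable names are only illustrative, and the `??` operator assumes PHP 7 or later.

```php
<?php
// Example URL taken from the question.
$url = 'http://www.champa.kku.ac.th/hello.php';

// parse_url() returns an associative array; only the components that are
// actually present in the URL appear as keys.
$parts = parse_url($url);

$scheme = $parts['scheme'] ?? null; // "http"
$host   = $parts['host'] ?? null;   // "www.champa.kku.ac.th"
$path   = $parts['path'] ?? null;   // "/hello.php"
$port   = $parts['port'] ?? null;   // null, this URL has no explicit port

// A single component can also be requested directly.
$hostOnly = parse_url($url, PHP_URL_HOST);
```

Note that this only yields the full host name; it does not say which part of the host is the domain.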
As for extracting the "right" domain in uncertain cases, this is extremely hard to tell, because "two-part TLDs" are sometimes set up by the TLD authority (e.g. in the UK) and sometimes run by private enterprises (e.g. .uk.com). I think you won't get around maintaining a list of endings that have two parts, such as:
- .co.uk
- .ac.uk
- .ac.th
Those endings would then be treated like TLDs (top-level domains), swallowing the second part.
This is the only way of reliably telling apart "two-part TLDs" like .co.uk, as in server1.ibm.co.uk (where the two-part .co.uk needs to be removed to determine the domain itself), from regular sub-domains like server1.ibm.com (where only .com needs to be removed).
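The list-based approach could be sketched roughly as follows. The function name extract_domain() and the three-entry list are hypothetical and only for illustration; a real list would need many more entries, and any two-part ending missing from it will be handled wrongly.

```php
<?php
/**
 * Sketch: return the domain of a URL, treating the endings in $twoPartTlds
 * as if they were single TLDs. Returns null if no host can be parsed.
 * (Hypothetical helper, not a built-in PHP function.)
 */
function extract_domain(string $url, array $twoPartTlds): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    // Normalise case, drop a trailing dot, and split the host into labels.
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count  = count($labels);
    if ($count < 2) {
        return implode('.', $labels);
    }

    // If the last two labels form a known two-part ending, keep three
    // labels (name + two-part ending); otherwise keep two (name + TLD).
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep    = in_array($lastTwo, $twoPartTlds, true) ? 3 : 2;

    return implode('.', array_slice($labels, -$keep));
}

// Assumed, deliberately incomplete list based on the endings named above.
$twoPartTlds = ['co.uk', 'ac.uk', 'ac.th'];

echo extract_domain('http://www.champa.kku.ac.th/hello.php', $twoPartTlds), "\n"; // kku.ac.th
echo extract_domain('http://server1.ibm.co.uk/', $twoPartTlds), "\n";             // ibm.co.uk
echo extract_domain('http://server1.ibm.com/', $twoPartTlds), "\n";               // ibm.com
```

The result is only as good as the list: the check looks at nothing but the last two labels, so it cannot recognise an ending it was never told about.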
A good starting point for a list of many important "two-part TLDs" is the domain search at speednames.com (select "all" in countries). A more complete list can be found as part of the Ruby domainatrix library.