Problem description
Hi, I am trying to extract the root domain from a URL string in Google Sheets. I know how to get the domain, and I have a formula that removes www.,
but I now realize it does not strip subdomain prefixes such as mysite in mysite.site.com;
mysite is not stripped from the domain name.
Question: How can I retrieve the domain.com
root domain, where the domain string contains alphanumeric characters, then one dot, then alphanumeric characters (and nothing more)?
Formula so far in Google Sheets:
=REGEXREPLACE(REGEXREPLACE(D3923;"(http(s)?://)?(www.)?";"");"/.*";"")
Maybe this can be simplified...
Test cases
https://www.domain.com/ => domain.com
https://domain.com/ => domain.com
http://www.domain.nl/ => domain.com
http://domain.de/ => domain.com
http://www.domain.co.uk/ => domain.co.uk
http://domain.co.au/ => domain.co.au
sub.domain.org/ => sub.domain.com
sub.domain.org => sub.domain.com
domain.com => domain.com
http://www.domain.nl?par=1 => domain.com
https://www.domain.nl/test/?par=1 => domain.com
http2://sub2.startpagina.nl/test/?par=1 => domain.com
Recommended answer
Currently using:
=trim(REGEXEXTRACT(REGEXREPLACE(REGEXREPLACE(A2;"https?://";"");"^(w{3}\.)?";"")&"/";"([^/?]+)"))
It seems to work fine.
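For checking the formula's behavior outside the spreadsheet, its steps can be mirrored in Python. This is only a sketch: the function names extract_host and last_two_labels are hypothetical, and the dot after www is escaped here so that it matches only a literal dot. The second function is a naive attempt at the "label, one dot, label" root domain asked about in the question; it is an assumption of this sketch, not part of the formula, and it gives the wrong result for suffixes such as co.uk.

```python
import re


def extract_host(url: str) -> str:
    """Mirror the formula: drop the scheme, drop a leading 'www.',
    then keep everything before the first '/' or '?'."""
    s = re.sub(r"https?://", "", url)      # inner REGEXREPLACE
    s = re.sub(r"^(w{3}\.)?", "", s)       # outer REGEXREPLACE, dot escaped
    m = re.search(r"([^/?]+)", s + "/")    # REGEXEXTRACT on the string plus "/"
    return m.group(1).strip() if m else ""


def last_two_labels(host: str) -> str:
    """Naive root domain: the last two dot-separated labels.
    Returns the host unchanged if it has no such ending."""
    m = re.search(r"[A-Za-z0-9-]+\.[A-Za-z0-9-]+$", host)
    return m.group(0) if m else host


if __name__ == "__main__":
    for url in ("https://www.domain.com/",
                "http://www.domain.nl?par=1",
                "sub.domain.org/",
                "http://www.domain.co.uk/"):
        host = extract_host(url)
        print(url, "=>", host, "=>", last_two_labels(host))
```

Note that the host step keeps subdomains (sub.domain.org stays sub.domain.org), and a scheme other than http or https is not removed. Handling suffixes like co.uk or co.au correctly would need a list of known public suffixes rather than a single pattern.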
Updated: 7-7-2016
(Thanks to everyone for the help!)