Shell script: how to extract a string using regular expressions

Question

I am new to shell scripts. I want to send an HTTP request using curl and then extract a string from the response using regular expressions. For example, how can I extract a domain name from an HTTP response? (The example is for learning purposes only.)

#!/bin/bash
# -s hides curl's progress meter; grep keeps the lines that mention the domain
name=$(curl -s google.com | grep "www\..*com")
echo "domain name is"
echo "$name"

Recommended answer

Use:

re="http://([^/]+)/"
if [[ $name =~ $re ]]; then echo ${BASH_REMATCH[1]}; fi
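
To try the snippet without a network call, the following minimal sketch wraps it in a function and feeds it a hard-coded line. The function name extract_host and the sample input are assumptions made for illustration; the sample only stands in for whatever the curl and grep pipeline actually returns.

```shell
#!/bin/bash
# extract_host is a hypothetical helper name, not part of the original answer.
# It prints the host part of the first http:// URL found in its argument
# and returns non-zero when no such URL is present.
extract_host() {
    local re="http://([^/]+)/"
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Assumed sample input standing in for the output of the curl | grep pipeline.
name='<A HREF="http://www.google.com/">here</A>.'
extract_host "$name"                           # prints www.google.com
extract_host "no url here" || echo "no match"  # prints no match
```

Returning a non-zero status on failure lets the caller tell "no URL found" apart from an empty result.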

Edit - OP asked for explanation of syntax. Regular expression syntax is a large topic which I can't explain in full here, but I will attempt to explain enough to understand the example.

re="http://([^/]+)/"

This is the regular expression stored in a bash variable, re - i.e. what you want your input string to match, and hopefully extract a substring. Breaking it down:

http:// is just a string. The input string must contain this substring for the regular expression to match.
[] Square brackets normally say "match any one character within the brackets", so c[ao]t would match both "cat" and "cot". A ^ character at the start of the brackets changes this to "match any character except those within the square brackets". So in this case [^/] will match any character apart from "/".
The square bracket expression will only match one character. Adding a + to the end of it says "match 1 or more of the preceding sub-expression". So [^/]+ matches one or more characters, none of which is "/".
Putting () parentheses around a subexpression says that you want to save whatever matched that subexpression for later processing. If the language you are using supports this, it will provide some mechanism to retrieve these submatches. For bash, it is the BASH_REMATCH array.
Finally we do an exact match on "/" to make sure we match all the way to the end of the fully qualified domain name and the "/" that follows it.
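
The pieces described above can be checked one at a time. This is a small sketch; the helper name check and the test strings are made up for illustration.

```shell
#!/bin/bash
# check is a hypothetical helper: it prints "match" or "no match"
# for a string ($1) tested against a regular expression ($2).
check() {
    local re=$2
    if [[ $1 =~ $re ]]; then
        echo "match"
    else
        echo "no match"
    fi
}

check "cat" "c[ao]t"                               # match: a is inside the brackets
check "cut" "c[ao]t"                               # no match: u is not inside the brackets
check "http://www.google.com/" "http://([^/]+)/"   # match
check "http:///" "http://([^/]+)/"                 # no match: + needs at least one non-slash character
```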

Next, we have to test the input string against the regular expression to see if it matches. We can use a bash conditional to do that:

if [[ $name =~ $re ]]; then
    echo ${BASH_REMATCH[1]}
fi

In bash, [[ ]] specifies an extended conditional test, which may contain the =~ regular expression operator. In this case we test whether the input string $name matches the regular expression $re. If it does match, then due to the construction of the regular expression, we are guaranteed to have a submatch (from the parentheses ()), and we can access it using the BASH_REMATCH array:

Element 0 of this array, ${BASH_REMATCH[0]}, will be the entire string matched by the regular expression. Element 1, ${BASH_REMATCH[1]}, holds the text captured by the first pair of parentheses, which in this example is the domain name.

Note that instead of setting the $re variable on a separate line and referring to this variable in the condition, you can put the regular expression directly into the condition. However, in bash 3.2 the rules were changed regarding whether quotes around such literal regular expressions are required or not. Putting the regular expression in a separate variable is a straightforward way around this, so that the condition works as expected in all bash versions that support the =~ match operator.
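
As a rough illustration of both points, the sketch below prints elements 0 and 1 of BASH_REMATCH and then repeats the test with the variable quoted. The sample URL is an assumption, and the quoted case assumes bash 3.2 or later with default compatibility settings, where a quoted right-hand side is compared as a literal string rather than as a regular expression.

```shell
#!/bin/bash
re="http://([^/]+)/"
url="http://www.google.com/search"   # assumed sample input

if [[ $url =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # http://www.google.com/
    echo "submatch 1:  ${BASH_REMATCH[1]}"   # www.google.com
fi

# Quoting the right-hand side makes bash (3.2 and later) treat it as plain text,
# so this looks for the literal characters "http://([^/]+)/" inside $url.
if [[ $url =~ "$re" ]]; then
    echo "quoted: match"
else
    echo "quoted: no match"   # this branch runs for the sample input
fi
```

Leaving $re unquoted in the condition is what keeps it a regular expression.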