我正在使用curl来获取网页并将其存储在python中的变量中。

var = '<body><img src=\"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg\" style=\"display: none;\"/><div class=\"wrapper\">'


我只想要来自字符串的链接,例如:

"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg"


我尝试通过将正则表达式的开头定义为“(https | http)并将结尾定义为”来与正则表达式进行匹配:

x = re.findall(r'"(https|http)*"$', var)
print(x)


但是我没有得到输出。请帮助我,在此先感谢。

>>>[]

最佳答案

@Manoj,您还可以使用src方法多次检索split()属性的值,如下所示。

»使用lambda函数(1行语句)

var = '<body><img src=\"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg\" style=\"display: none;\"/><div class=\"wrapper\">'

get_url = lambda html: html.split('=')[1].split('\"')[1]
print(get_url(var))


»输出

https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg


让我们在多条语句中扩展上述方法,以了解实际的直接过程。

var = '<body><img src=\"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg\" style=\"display: none;\"/><div class=\"wrapper\">'
print(var, "\n")
# <body><img src="https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg" style="display: none;"/><div class="wrapper">

parts1 = var.split("=")
print(parts1, "\n")
# ['<body><img src', '"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg" style', '"display: none;"/><div class', '"wrapper">']

parts2 = parts1[1].split('\"')
print(parts2, "\n")
# ['', 'https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg', ' style']

print(parts2[1])
# https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg


»输出

E:\Users\Rishikesh\Python3\Practice\Temp>python GetUrls.py
<body><img src="https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg" style="display: none;"/><div class="wrapper">

['<body><img src', '"https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg" style', '"display: none;"/><div class', '"wrapper">']

['', 'https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg', ' style']



https://cdn.neow.in/news/images/uploaded/2018/06/Gaming sloop, preview group, hardware _mediump.jpg

关于python - 如何从python中的字符串获取url,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/50887460/

10-10 00:58
查看更多