I am trying to build an array of all the image files on a Google Images results page.
I want a regex that extracts everything after "imgurl=" and before "&", as in this HTML:
<a href="http://www.google.com/imgres?imgurl=http://www.trendytree.com/old-world- christmas/images/20031chapel20031-silent-night-chapel.jpg&imgrefurl=http://www.trendytree.com/old-world-christmas/silent-night-chapel-20031-christmas-ornament-old-world-christmas.html&usg=__YJdf3xc4ydSfLQa9tYnAzavKHYQ=&h=400&w=400&sz=58&hl=en&start=19&zoom=1&tbnid=ajDcsGGs0tgE9M:&tbnh=124&tbnw=124&ei=qagfUbXmHKfv0QHI3oG4CQ&itbs=1&sa=X&ved=0CE4QrQMwEg"><img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"></a><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>
I think a regex can do this, but I have not found a way to search my parsed document with one.
Best answer
str = '<a href="http://www.google.com/imgres?imgurl=http://www.trendytree.com/old-world- christmas/images/20031chapel20031-silent-night-chapel.jpg&imgrefurl=http://www.trendytree.com/old-world-christmas/silent-night-chapel-20031-christmas-ornament-old-world-christmas.html&usg=__YJdf3xc4ydSfLQa9tYnAzavKHYQ=&h=400&w=400&sz=58&hl=en&start=19&zoom=1&tbnid=ajDcsGGs0tgE9M:&tbnh=124&tbnw=124&ei=qagfUbXmHKfv0QHI3oG4CQ&itbs=1&sa=X&ved=0CE4QrQMwEg"><img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"></a><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>'
# Take the text after the first "imgurl=", then cut it off at the next "&".
str.split('imgurl=')[1].split('&')[0]
#=> "http://www.trendytree.com/old-world- christmas/images/20031chapel20031-silent-night-chapel.jpg"
Is this what you are looking for?
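To collect every image URL on the page into an array, rather than only the first one, `String#scan` with a capture group is one option. This is a minimal sketch and not part of the original answer: the helper name `extract_image_urls` and the `example.com` sample markup are made up for illustration, and it assumes each value ends at the next `&` or double quote.

```ruby
# Hypothetical helper (not from the original answer): returns every
# imgurl value found in an HTML string, in document order.
def extract_image_urls(html)
  # Capture everything after "imgurl=" up to the next "&" or double quote.
  # scan returns one single-element array per match, so flatten the result.
  html.scan(/imgurl=([^&"]+)/).flatten
end

# Shortened, made-up sample markup with two result links.
sample_html = <<~HTML
  <a href="http://www.google.com/imgres?imgurl=http://example.com/a.jpg&imgrefurl=http://example.com/a.html&h=400">one</a>
  <a href="http://www.google.com/imgres?imgurl=http://example.com/b.png&imgrefurl=http://example.com/b.html&h=200">two</a>
HTML

image_urls = extract_image_urls(sample_html)
```

The same helper should work on the full page source as a string. If the page has already been parsed with Nokogiri, one approach is to serialize it back to a string first, or to run the regex over each link's `href` attribute.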
Regarding "ruby - How to parse Google Images URLs with Ruby and Nokogiri?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14912392/