I want to understand how Scrapy works when I extract content from a web page.
I'm practicing on this page: https://banco.santander.cl/beneficios?segmento=s-personas
Right now I want to get the following URLs:
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=shopping
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=sabores
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=decoracion
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=salud-y-belleza
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=panoramas
https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=otros
This is where they sit in the HTML code:
<!-- ==================================== HEADER BENEFICIOS ======================================================== -->
<div class="transition navFixed">
<div id="webviewHeaderBeneficios" class="headerBeneficios">
<div class="container">
<div class="row" style="position: relative;">
<a href="" class="nextMenu hidden-md hidden-lg" target="_self"><i class="fa fa-angle-right" aria-hidden="true"></i></a>
<div class="col-xs-12 col-sm-12 col-md-8 beneficiosNav">
<a href="" class="overlayHeader" ng-show="search.length > 0" ng-click="search = null" target="_self"></a>
<ul class="nav nav-pills nav-justified">
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios" class="icon-home transition ev-pers-home" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><g fill="" fill-rule="evenodd"><path d="M31.46 12.3V6.9c0-1-.8-1.8-1.78-1.8s-1.77.8-1.77 1.8v1.86l-3.5-3.5c-1.72-1.72-4.72-1.72-6.45 0L6.2 17.02c-.68.7-.68 1.83 0 2.52.7.7 1.83.7 2.52 0L20.46 7.78c.4-.38 1.06-.38 1.44 0l11.74 11.74c.35.35.8.52 1.26.52.45 0 .9-.17 1.26-.52.7-.7.7-1.82 0-2.5l-4.7-4.7z"/><path d="M20.4 11.12L10.1 21.45c-.16.16-.25.38-.25.6v7.54c0 1.76 1.43 3.2 3.2 3.2h5.1v-7.93h5.78v7.92h5.13c1.76 0 3.2-1.44 3.2-3.2v-7.54c0-.23-.1-.45-.26-.6L21.64 11.1c-.34-.34-.9-.34-1.23 0z"/></g></svg><br><p>Inicio</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=shopping" class="icon-shopping transition ev-pers-shopping" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><g fill="" fill-rule="evenodd"><path d="M26 12.76v-3.5C26 5.82 23.4 3 20.2 3c-3.2 0-5.82 2.8-5.82 6.27v3.5c0 .42.32.77.72.77.4 0 .73-.35.73-.78v-3.5c0-2.6 1.96-4.7 4.36-4.7 2.4 0 4.36 2.1 4.36 4.7v3.5c0 .43.32.78.72.78.4 0 .72-.35.72-.78"/><path d="M29.83 10.88c-.04-1-.83-1.16-.83-1.16h-1.62v2.1l-.02 1.02c0 1.18-.95 2.13-2.13 2.13s-2.13-.95-2.13-2.13c0-.37.08-1.02.08-1.02v-2.1h-5.95l-.03 2.1s.04.65.04 1.02c0 1.18-.95 2.13-2.13 2.13-1.16 0-2.12-.95-2.12-2.13 0-.37.03-1.02.03-1.02v-2.1h-1.26s-.98 0-1.17.98C10.4 11.57 9 29.7 9 30.2c0 1.8 1.94 1.92 1.94 1.92s17.77.12 18.45 0c.66-.12 1.9-.98 1.8-2.52-.4-5.46-1.34-17.94-1.37-18.72z"/></g></svg><br><p>Shopping</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=sabores" class="icon-sabores transition ev-pers-sabores" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><g fill="" fill-rule="evenodd"><path d="M34.97 5.65c0-1-.34-1.82-1-2.47-.22-.22-.57-.24-.82-.05-.1.1-2.14 2.1-25.95 25.73-.7.7-.7 1.87 0 2.57.35.35.8.54 1.3.54s.95-.2 1.3-.53l7.76-7.7c3-2.98 4.3-3.4 5-3.44 1.35-.1 2.5-.6 3.36-1.44 3.9-3.86 9.06-9.42 9.05-13.2zM10.97 13.86c.84.82 1.98 1.3 3.3 1.4.72.05 1.6.5 2.74 1.4.13.08.26.12.4.12.15 0 .3-.06.43-.18l1.95-1.93c.2-.22.23-.56.04-.8-1.27-1.6-1.42-2.43-1.4-2.9 0-1.08-.45-2.15-1.3-2.98l-.15-.15-5.5-4.6c-.26-.23-.64-.2-.86.06-.2.26-.17.64.08.86l5 4.2-1.7 1.66L8.94 5c-.24-.24-.62-.24-.86 0-.24.23-.24.62 0 .85l5.06 5.02-1.7 1.7L7.1 8.03c-.22-.24-.6-.25-.85-.02-.25.24-.26.62-.03.87l4.74 4.98zM34.83 28.96l-8.4-8.32c-.2-.2-.5-.24-.75-.1-.67.42-1.45.7-2.3.85-.23.03-.42.2-.48.4-.07.22-.02.46.15.62l9.18 9.1c.35.35.8.54 1.3.54s.95-.2 1.3-.53c.34-.34.53-.8.53-1.28 0-.5-.2-.94-.53-1.3z"/></g></svg><br><p>Sabores</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=decoracion" class="icon-deco transition ev-pers-decoracion" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><path d="M9.05 19.38l3.78-15.1c.2-.74.86-1.28 1.6-1.28h11.2c.77 0 1.42.52 1.6 1.27l3.8 15.1c.25 1.07-.55 2.08-1.63 2.08h-8.2V26c0 2.23 1.58 4.1 3.67 4.52.56.1.95.62.95 1.2 0 .65-.54 1.2-1.2 1.2h-9.07c-.67 0-1.2-.55-1.2-1.2 0-.58.38-1.07.94-1.2 2.08-.43 3.63-2.3 3.63-4.5v-4.57h-8.2c-1.14 0-1.94-1-1.68-2.07" fill="" fill-rule="evenodd"/></svg><br><p>Decoración</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=salud-y-belleza" class="icon-salud transition ev-pers-salud-y-belleza" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><g fill="" fill-rule="evenodd"><path d="M18.3 10.9V7.8h-6.97v3.1c-1.92.33-3.35 1.07-4.22 2.22-.9 1.17-1.2 2.72-.95 4.62 1.7 12.44 3 13.92 3.4 14.4 1.07 1.23 3.33 1.42 4.62 1.42h1.3c1.28 0 3.55-.2 4.6-1.42.43-.48 1.7-1.96 3.4-14.4.27-1.9-.05-3.45-.94-4.62-.88-1.15-2.3-1.9-4.22-2.22zm3.48 6.6c-1.48 10.76-2.6 13.03-3 13.5-.43.5-1.73.82-3.32.82h-1.3c-1.57 0-2.87-.32-3.3-.82-.4-.47-1.52-2.74-3-13.5-.2-1.42 0-2.54.6-3.32.6-.78 1.67-1.3 3.16-1.57.83-.14 1.43-.86 1.43-1.7V9.55h3.53v1.35c0 .84.6 1.56 1.43 1.7 1.5.27 2.56.8 3.17 1.58.6.78.8 1.9.6 3.32z"/><path d="M15.15 14.04h-.66c-1.74 0-5.5.24-4.97 4.7.6 4.9 1.35 9.56 2.15 10.6.6.8 2.04.9 2.75.9h.77c.72 0 2.16-.1 2.75-.9.8-1.04 1.57-5.7 2.16-10.6.55-4.46-3.22-4.7-4.95-4.7zM33.1 16.3h-1.2V8.44c.73-3.93.8-3.95.8-4.46 0-.54-.42-.97-.95-.97-.23 0-.44.1-.6.23-.17-.14-.4-.23-.62-.23-.23 0-.44.1-.6.23-.18-.14-.4-.23-.62-.23-.52 0-.95.43-.95.97 0 .5.08.53.8 4.45v7.9h-1.18c-.63 0-1.14.5-1.14 1.14V32.4c0 .65.5 1.16 1.14 1.16h5.12c.62 0 1.13-.5 1.13-1.15V17.47c0-.63-.5-1.15-1.14-1.15z"/></g></svg><br><p>Salud y Belleza</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=panoramas" class="icon-panoramas transition ev-pers-panoramas" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><path d="M38.2 14.54L36.53 12l-.24.17c-1.42.87-3.32.44-4.23-.96-.9-1.4-.5-3.23.93-4.1l.26-.14-1.65-2.53c-.47-.73-1.46-.95-2.2-.5l-6.96 4.25c.57.13 1 .62 1 1.23 0 .7-.6 1.26-1.3 1.25-.7 0-1.24-.58-1.24-1.28 0-.1.04-.2.07-.3l-17.5 10.7c-.76.47-.98 1.43-.5 2.16l1.66 2.55c.1-.1.23-.18.35-.26 1.43-.87 3.32-.44 4.23.96.92 1.4.5 3.24-.92 4.1-.12.08-.26.15-.4.2l1.67 2.55c.48.73 1.47.96 2.2.5L29.33 21.8c-.4-.2-.7-.63-.7-1.13 0-.7.6-1.26 1.3-1.25.68 0 1.23.57 1.23 1.27l6.55-4.02c.75-.45.97-1.4.5-2.14m-12.84-2.3c0 .7-.58 1.25-1.28 1.24-.7 0-1.25-.6-1.25-1.28 0-.7.6-1.26 1.3-1.25.68 0 1.24.58 1.23 1.28m1.93 2.82c0 .7-.6 1.26-1.3 1.25-.7 0-1.25-.58-1.24-1.28 0-.7.6-1.26 1.3-1.25.68 0 1.24.58 1.23 1.28m1.92 2.83c0 .7-.58 1.25-1.28 1.25-.7 0-1.26-.6-1.25-1.3 0-.68.58-1.24 1.28-1.23.7 0 1.25.58 1.24 1.28" fill="" fill-rule="evenodd"/></svg><br><p>Panoramas</p></span></a></li>
<li role="presentation" class=""><a href="https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=otros" class="icon-otros transition ev-pers-otros" target="_self"><span><svg width="41" height="36" viewBox="0 0 41 36" xmlns="http://www.w3.org/2000/svg"><title>41x36</title><path d="M33.48 10.4c-.2-.33-.63-.44-1-.26l-1.45.78c-.36.18-.48.57-.28.9.12.2.4.3.63.3.12 0 .24-.03.36-.06l1.46-.8c.36-.16.48-.55.28-.87zm-6.9 10.7c.97-1.33 1.85-3 1.85-5.1 0-2.1-.94-4-2.42-5.4-1.47-1.4-3.5-2.25-5.6-2.25h-.2c-2.12 0-4.14.86-5.62 2.25-1.48 1.4-2.42 3.3-2.42 5.4 0 2.1.88 3.77 1.85 5.1.97 1.38 2 2.27 2.32 3.25.2.76.22 1.34.3 1.65.1.3 0 .25.4.42.34.12 1.85.23 3.27.22 1.42 0 2.93-.1 3.28-.22.38-.17.3-.13.4-.42.07-.3.07-.9.3-1.65.3-.98 1.34-1.87 2.3-3.24zm1.6-14.84c.2-.32.1-.7-.26-.9-.36-.17-.8-.06-1 .26l-.86 1.32c-.2.32-.08.7.28.88.1.04.23.08.35.08.23 0 .5-.1.62-.32l.87-1.32zM21 5.73V4.2c0-.35-.3-.64-.7-.64-.4 0-.72.3-.72.64v1.53c-.04.35.28.64.7.64.4 0 .72-.28.72-.64zm-6.47 1.2l-.87-1.3c-.2-.33-.63-.44-1-.26-.34.18-.46.57-.26.9l.87 1.3c.1.22.4.33.63.33.12 0 .24-.04.35-.07.36-.18.48-.57.28-.9zm-4.7 4.88c.2-.3.08-.7-.27-.88l-1.47-.78c-.36-.18-.8-.07-1 .25-.2.3-.07.7.28.88l1.47.78c.1.07.23.07.35.07.24 0 .5-.1.63-.32zm6.8 16.83c0 .37.32.67.7.67h5.94c.37 0 .67-.3.67-.67 0-.37-.3-.67-.67-.67h-5.95c-.37 0-.68.3-.68.67zm0 2.82c0 .38.32.67.7.67h2.15c.13 0 .22.06.25.1.14.13.44.23.8.23.32 0 .6-.1.75-.22.04-.03.13-.1.28-.1h1.7c.38 0 .68-.3.68-.68 0-.37-.3-.67-.67-.67h-5.95c-.37 0-.68.3-.68.67z" fill="" fill-rule="evenodd"/></svg><br><p>Otros</p></span></a></li>
</ul>
</div>
<div class="col-xs-12 col-sm-12 col-md-4 inputSearch" ng-class="mapClass" style="position: relative;">
<div class="searchIcon" ng-hide="search.length > 0">
<i class="fa fa-search" aria-hidden="true"></i>
</div>
<div class="searchIcon" ng-show="search.length > 0">
<a href="" ng-click="search = null" target="_self"><i class="fa fa-times" aria-hidden="true"></i></a>
</div>
<input class="ev-pers-search" type="text" ng-model="search" ng-model-options="{ debounce: 450 }" ng-change="searchChanged(search);" placeholder="Buscar restaurantes, cafés..." maxlength="35">
</div>
</div>
</div>
</div>
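The URLs I want sit under the a, li, and ul tags. As I understand it, if I run
response.css("ul li a::attr(href)")
I should find those URLs under these tags. Instead I get a very long list of URLs that does not include the ones I want. So what am I actually getting when I use
response.css("ul li a::attr(href)")
and how does Scrapy interpret that selector? What is the correct way to get what I want?

Best answer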
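You get all those other links because the footer has the same structure (links inside lists), so your initial selector matches those links too.
You need a more specific selector. Note that the links you want are inside a div with the class beneficiosNav. This gives you all the links in that section: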
response.css(".beneficiosNav ul li a::attr(href)")
Edit: this is the output I get when I open the page in the Scrapy shell:
scrapy shell "https://banco.santander.cl/beneficios?segmento=s-personas"
>>> response.css(".beneficiosNav ul li a::attr(href)").getall()
The result is:
['https://banco.santander.cl/beneficios',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=shopping',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=sabores',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=decoracion',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=salud-y-belleza',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=panoramas',
'https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=otros']
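The list above still contains the "Inicio" link (https://banco.santander.cl/beneficios), which is not one of the category URLs asked for. One possible way to drop it, sketched here with the standard library only, is to keep the URLs whose query string has a categoria parameter. The helper name is_category_link is made up for this example.

```python
from urllib.parse import urlparse, parse_qs

# The hrefs returned by the .beneficiosNav selector above.
links = [
    "https://banco.santander.cl/beneficios",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=shopping",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=sabores",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=decoracion",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=salud-y-belleza",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=panoramas",
    "https://banco.santander.cl/beneficios/todos?segmento=s-personas&categoria=otros",
]


def is_category_link(url):
    """Return True when the URL's query string carries a `categoria` parameter."""
    return "categoria" in parse_qs(urlparse(url).query)


# Keep only the category pages; the home link has no query string and is dropped.
category_links = [url for url in links if is_category_link(url)]

for url in category_links:
    print(url)
```

Inside a spider the same filter could be applied before following each link; whether to filter in Python or in the selector itself is a matter of taste.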
Source: a similar question on Stack Overflow, "html - Following tags in HTML with Scrapy": https://stackoverflow.com/questions/54911861/