I'm new to Python and Scrapy. Below is sample code showing the problem I ran into while collecting a dataset from Amazon product review pages.

from scrapy.selector import HtmlXPathSelector
from amazoncrawler.items import AmazoncrawlerItem
import scrapy

class startcrawler(scrapy.Spider):
    name = "amazone"
    allowed_domains = ["www.amazon.co.uk"]

    start_urls = [
        "http://www.amazon.co.uk/product-reviews/B005KP74BI",
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = AmazoncrawlerItem()
        reviewText = hxs.xpath('//table[@id="productReviews"]/*/*/*/*/div/div' and '//div[@class="reviewText"]/text()').extract()
        ratings = hxs.xpath('//table[@id="productReviews"]/*/*/*/*/div/div' and '//span[contains(@class, "s_star")]/span/text()').extract()

        for text in reviewText:
            item['comment'] = text
            yield item
        for rating in ratings:
            item['rating'] = rating
            yield item


The output, exported as a CSV file:

comment,rating
And they do last quite some time too.,
"Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose.",
Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.,
Nearly didnt buy these based on two bad reviews - glad I ignored them. Its the Genuine thing with 4 batteries in the pack sold by amazon themselves.,
"They say you only get what you pay for and I am a firm believer of that and certainly in this case it is without doubt, the price of these batteries however in the high street is quite extortionate, hence this is very good value from Amazon. These batteries outlast normal batteries by at least  5-7 times as I have proved to myself several times as I use batteries for my business to power test meters and I can confirm that if you put a run of the mill relatively cheap battery in some of my meters you will be lucky to get 3 days to a week out of them, that is depending on the use of the meter.",
"I still use cheap batteries but only for the likes of wall clocks and the like that do not have a high power drain and they last a reasonable length of time, sometimes up to 2 years. A classic example of how long a cheap battery last is for example my Gillette Fusion ProGlide powered razor, a cheap battery last about a week, but a Duracell lasts at least 5-6 weeks, as I say you only get what you pay for, highly rated batteries and at this price you cannot loose.",
great value for money and its why my wee town is loosing money as their selling one for the same price.,
Great Value for Duracell  batteries. I need new ones for our 4 smoke alarms in our house. We normal go for cheap ones from pound shops but they don't last more then a week. When I came across these on Amazon at this price I brought them straight away. They came as describe no problems with them all in our smoke alarms and all tested and work that's what I brought them for to do and they do the job. Ignore the negative comments previous to stop you buying. There is no problems with these batteries,
"Put these into my smoke alarms, worked fine for 18 months before the alarms started the usual chirping at 3am to let you know the battery was dying. They were replaced, but the old ones still had enough power to run one of our baby's toys more a few more months.",
Good price and good shelf life too.,
"Bought 2 packs of these batteries in March 2014 to use in PIR sensors for a wireless alarm. Batteries in the sensors generally needed to be changed annually. These batteries lasted barely 5 months, very disappointing.",
"Arrived smartly Thanks and as stated fresh cells 2016 expiry, good for my smoke and CO2 alarms, postman had to ring bell as square box shape did not fit through letter box.",
"I purchased these because I needed one for a smoke alarm - but I knew it wouldn't be long before I needed others because all my alarms were purchased at the same time. Sure enough 5 weeks later I had to change another one. When the alarm instantly gave the ""low battery"" beeps I took it out and tested it - it was well down in the ""weak"" section. Was this a factory fault? or do employees swap their flat batteries for a new one in the box? There is no seal on the box to alert anyone to such a fiddle.",
"They're batteries. They fit well the bastard smoke detectors when they start bleeping bleeping away. They still won't shut up with the new batteries, but that's the bastard smoke detector's fault, and not the battery, which works fine.",
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.7 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars


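My first problem is that the crawler extracts 3 ratings that are not part of the table with ID "productReviews" and reports them as the first 3 review ratings. This happens consistently when I crawl other products too. I could disregard them, but it would be nice to know how to solve this.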

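Second, I want to merge all the paragraphs of a review into a single field, with the corresponding rating separated from it by the delimiter: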

comment,rating
"And they do last quite some time too.
Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose.
Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.",4.0 out of 5 stars

Best answer

Iterate over the reviews in the table, instantiate a new item inside the loop for each review, and yield it:

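def parse(self, response):
    # One selector per review block inside the reviews table.
    reviews = response.xpath('//table[@id="productReviews"]//td/div')
    for review in reviews:
        # Create a fresh item per review so comment and rating stay paired.
        item = AmazoncrawlerItem()
        # The leading "." keeps each query relative to the current review.
        item['comment'] = ' '.join(review.xpath('.//div[@class="reviewText"]/text()').extract())
        item['rating'] = review.xpath('.//span[contains(@class, "s_star")]/span/text()').extract()[0]
        yield item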
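
A side note on why the selectors in the question were not limited to the table: in Python, `and` between two non-empty strings evaluates to the second string, so only the second XPath ever reached `xpath()`. The sketch below is plain Python with no Scrapy involved; the variable names are made up for illustration. Because the surviving path starts with `//`, it matches star spans anywhere on the page, which would plausibly explain the three extra ratings (an inference from the code, not something verified against the live page).

```python
# The two XPath strings from the ratings line in the question's parse() method.
table_scope = '//table[@id="productReviews"]/*/*/*/*/div/div'
rating_path = '//span[contains(@class, "s_star")]/span/text()'

# `and` returns its second operand when the first one is truthy, and any
# non-empty string is truthy, so the table scope is silently discarded.
effective_query = table_scope and rating_path

print(effective_query == rating_path)   # the table scope is gone
print("productReviews" in effective_query)
```

Scoping has to happen in the selectors themselves, as in the loop above, where each relative `.//` query runs against a single review.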


Output:

{
    'comment': u"And they do last quite some time too. Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose. Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.",
    'rating': u'4.0 out of 5 stars'
}
...
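
The desired CSV in the question keeps each paragraph on its own line inside one quoted field, whereas `' '.join(...)` above puts them on a single line. As a rough sketch of how that shape could be produced, the standard-library `csv` module quotes a field automatically when it contains a newline. The `paragraphs` list and the `to_csv` helper are hypothetical stand-ins for illustration, not part of the spider; the sample text is taken from the output above.

```python
import csv
import io

# Hypothetical stand-in for the list returned by
# review.xpath('.//div[@class="reviewText"]/text()').extract()
paragraphs = [
    "And they do last quite some time too.",
    "Not a lot to say about a pair of 9v batteries, but I've not had "
    "any problems with Duracell for this purpose.",
]
rating = "4.0 out of 5 stars"


def to_csv(rows):
    """Serialize a list of {'comment', 'rating'} dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["comment", "rating"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Joining with "\n" instead of " " keeps one paragraph per line; the csv
# writer wraps the multi-line comment in quotes so it stays a single field.
row = {"comment": "\n".join(paragraphs), "rating": rating}
csv_text = to_csv([row])
print(csv_text)
```

Reading `csv_text` back with `csv.DictReader` yields the same single row, so the embedded newlines do not split the review across records.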

Regarding "python - How to map multiple attributes from multiple sections of a website to a Scrapy item?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28924377/
