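I want to get all the products from this Dutch webshop account (similar to eBay / Amazon) and then add them to this WordPress webshop using WooCommerce. I started with web development about 2 to 3 weeks ago and know the basics of HTML, CSS, JavaScript, Node.js and Express. I think I know roughly what to do, namely: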


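Loop through all the products on every page.
Grab the title, description, category, price and photos.
Store that information in an array of product objects.
Get access to the WooCommerce API.
Loop through all the products and add them to WooCommerce.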


My questions are:


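Is this possible?
Can I do it with the languages I already know?
What approach would you use? (For example, how would you scrape the HTML? Is there a simpler way than the steps I described? Would you do this with code or with some automation software, etc.)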


This is a big project for me, so any help (on how to get started) is welcome!

Best answer

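Your steps are right, and yes, this is possible. As you already know, you can scrape the data with Node.js. I personally prefer Python for scraping, but you can do it in Node.js too, which has HTML parsers and so on. A few suggestions: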


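Use a parser to parse the HTML, which makes it easier to reach the elements that hold the data.
Use a proper data format to store the data, for example JSON, XML or CSV.
If fetching the data takes a long time, fetch and save it first and parse it afterwards. Otherwise, if any part of the parsing code is wrong, you may lose everything you have downloaded.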
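
The third suggestion (download first, parse later) could look roughly like the sketch below. It is only an illustration: the helper names `download_pages` and `iter_saved_pages`, the `page-0001.html` file naming and the `last_page` parameter are assumptions, not part of the original answer. The URL pattern is taken from the `?p=1` endpoint used in the answer's code, and the number of pages is left to the caller because the source does not state it.

```python
from pathlib import Path
from typing import Callable, Iterator

# URL pattern based on the endpoint used in the answer ("?p=1" is page 1).
BASE_URL = "http://johndevisser.marktplaza.nl/?p={page}"


def download_pages(fetch: Callable[[str], bytes], directory: Path, last_page: int) -> None:
    """Download pages 1..last_page and store the raw HTML, one file per page.

    `fetch` is any function that takes a URL and returns the response body,
    for example `lambda url: requests.get(url).content`.
    """
    directory.mkdir(parents=True, exist_ok=True)
    for page in range(1, last_page + 1):
        target = directory / f"page-{page:04d}.html"
        if target.exists():
            continue  # already saved by an earlier run, so skip the request
        target.write_bytes(fetch(BASE_URL.format(page=page)))


def iter_saved_pages(directory: Path) -> Iterator[bytes]:
    """Yield the stored HTML in page order so it can be parsed offline."""
    for path in sorted(directory.glob("page-*.html")):
        yield path.read_bytes()
```

With `requests` installed, a call such as `download_pages(lambda url: requests.get(url).content, Path("raw"), last_page=3)` would save the first three pages. Each item from `iter_saved_pages(Path("raw"))` can then be passed to the parser as often as needed, so a bug in the parsing code never forces a second download.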


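Below is code I wrote to get the data from the site you linked. It is in Python, but I added comments so that you can follow how the data is extracted and write it in another language. You could also use split to cut these parts out of the HTML, without using a parser at all.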
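
To illustrate the split approach mentioned above, here is a minimal sketch. The helper `between` and the `sample` markup are made up for the example and are not the real page source. Splitting on literal strings breaks as soon as the markup changes, which is why a parser is usually the safer choice.

```python
def between(text: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end` marker.

    Raises IndexError if `start` does not occur in `text`.
    """
    return text.split(start, 1)[1].split(end, 1)[0]


# Hypothetical snippet, only loosely shaped like a product link.
sample = '<a href="/item-1.html" title="Example Book"><img src="/item-1.jpg"></a>'

title = between(sample, 'title="', '"')
link = between(sample, 'href="', '"')
image = between(sample, 'src="', '"')
```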

Example:

import requests, json
from bs4 import BeautifulSoup
from pprint import pprint

endpoint = "http://johndevisser.marktplaza.nl/?p=1"

# Send a get request to page to get the html.
data = requests.get(endpoint).content

# Parse the html via BeautifulSoup, naming the parser explicitly to avoid the "no parser specified" warning.
page = BeautifulSoup(data, "html.parser")

# Find 'div' elements whose 'itemscope' attributes are 'itemscope'
products = page.find_all("div", {"itemscope": "itemscope"})[1:]

# Create an empty array to store prepared data.
finalProductList = []

# Iterate over the products.
for i in products:
    # Create a dictionary object to store data properly.
    productData = {}
    # Get the title attribute from 'a' element on the current product.
    productData["title"] = i.find("a").get("title")
    # Get the href attribute from 'a' element on the current product because the real source can be useful in the future.
    productData["origin"] = i.find("a").get("href")
    # Get the image url from 'img' elements to download images.
    productData["imageURL"] = i.find("img").get("src")
    # This may look complicated, but it only finds the 'span' element whose 'class' attribute is 'subtext' and takes
    # its inner text, splits it on the non-breaking space (\xa0) into ['€', '15,00'], keeps the second part, which is
    # the price, and replaces the comma with a dot so the value can be parsed as a float.
    productData["price"] = float(i.find("span", {"class": "subtext"}).get_text().split(u"\xa0")[1].replace(",", "."))
    # Append the data to final data array.
    finalProductList.append(productData)

# Get json representation of dictionary.
print(json.dumps(finalProductList))


Output:

[
  {
    "title": "Sieb Posthuma  -  Mannetje Jas  (Hardcover/Gebonden) Kinderjury",
    "origin": "http://www.marktplaza.nl/boeken/kinderboeken/sieb-posthuma-mannetje-jas-hardcover-gebonden-kinderjury-92409632.html",
    "imageURL": "http://www.marktplaza.nl/M92409632/1/sieb-posthuma-mannetje-jas-hardcover-gebonden-kinderjury-92409632.jpg",
    "price": 12.5
  },
  {
    "title": "Estefhan Meijer  -  United Wraps    Wraps Uit De Hele Wereld",
    "origin": "http://www.marktplaza.nl/boeken/kookboeken/estefhan-meijer-united-wraps-wraps-uit-de-hele-wereld-92390218.html",
    "imageURL": "http://www.marktplaza.nl/M92390218/1/estefhan-meijer-united-wraps-wraps-uit-de-hele-wereld-92390218.jpg",
    "price": 15
  },
  {
    "title": "Daphne Deckers  -  De Verschrikkelijke Ijstaart  (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/kookboeken/daphne-deckers-de-verschrikkelijke-ijstaart-hardcover-gebonden-92390182.html",
    "imageURL": "http://www.marktplaza.nl/M92390182/1/daphne-deckers-de-verschrikkelijke-ijstaart-hardcover-gebonden-92390182.jpg",
    "price": 10
  },
  {
    "title": "Adelene Fletcher  -   Bomen Aquarelleren Van A Tot Z",
    "origin": "http://www.marktplaza.nl/boeken/hobby-techniek/adelene-fletcher-bomen-aquarelleren-van-a-tot-z-92390124.html",
    "imageURL": "http://www.marktplaza.nl/M92390124/1/adelene-fletcher-bomen-aquarelleren-van-a-tot-z-92390124.jpg",
    "price": 12.5
  },
  {
    "title": "Razorlight ‎– America  (2 Track CDSingle)",
    "origin": "http://www.marktplaza.nl/cd-vinyl/singles/razorlight-america-2-track-cdsingle-92390118.html",
    "imageURL": "http://www.marktplaza.nl/M92390118/1/razorlight-america-2-track-cdsingle-92390118.jpg",
    "price": 5
  },
  {
    "title": "Twarres ‎– Children  (2 Track CDSingle)",
    "origin": "http://www.marktplaza.nl/cd-vinyl/singles/twarres-children-2-track-cdsingle-92390078.html",
    "imageURL": "http://www.marktplaza.nl/M92390078/1/twarres-children-2-track-cdsingle-92390078.jpg",
    "price": 5
  },
  {
    "title": "Tower Of Power ‎– The Very Best Of Tower Of Power - The Warner Years  (CD)",
    "origin": "http://www.marktplaza.nl/cd-vinyl/pop/tower-of-power-the-very-best-of-tower-of-power-the-warner-years-cd-92389836.html",
    "imageURL": "http://www.marktplaza.nl/M92389836/1/tower-of-power-the-very-best-of-tower-of-power-the-warner-years-cd-92389836.jpg",
    "price": 10
  },
  {
    "title": "Red Hot Chili Peppers ‎– Dani California  (2 Track CDSingle)",
    "origin": "http://www.marktplaza.nl/cd-vinyl/singles/red-hot-chili-peppers-dani-california-2-track-cdsingle-92389742.html",
    "imageURL": "http://www.marktplaza.nl/M92389742/1/red-hot-chili-peppers-dani-california-2-track-cdsingle-92389742.jpg",
    "price": 5
  },
  {
    "title": "Seth Godin  -  Icarus Deception  (Engelstalig)",
    "origin": "http://www.marktplaza.nl/boeken/management-en-economie/seth-godin-icarus-deception-engelstalig-92389542.html",
    "imageURL": "http://www.marktplaza.nl/M92389542/1/seth-godin-icarus-deception-engelstalig-92389542.jpg",
    "price": 12.5
  },
  {
    "title": "Rob Gifford  -  De Chinese Weg",
    "origin": "http://www.marktplaza.nl/boeken/reizen/rob-gifford-de-chinese-weg-92389500.html",
    "imageURL": "http://www.marktplaza.nl/M92389500/1/rob-gifford-de-chinese-weg-92389500.jpg",
    "price": 12.5
  },
  {
    "title": "Bart Leeuwenburgh  -   Darwin In Domineesland",
    "origin": "http://www.marktplaza.nl/boeken/informatief/bart-leeuwenburgh-darwin-in-domineesland-92386128.html",
    "imageURL": "http://www.marktplaza.nl/M92386128/1/bart-leeuwenburgh-darwin-in-domineesland-92386128.jpg",
    "price": 12.5
  },
  {
    "title": "Per Olov Enquist  -  Het Record  (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/romans/per-olov-enquist-het-record-hardcover-gebonden-92386080.html",
    "imageURL": "http://www.marktplaza.nl/M92386080/1/per-olov-enquist-het-record-hardcover-gebonden-92386080.jpg",
    "price": 10
  },
  {
    "title": "Fred Vargas - Uit De Dood Herrezen (Hardcover/Gebonden) blauw/groene achtergrond",
    "origin": "http://www.marktplaza.nl/boeken/romans/fred-vargas-uit-de-dood-herrezen-hardcover-gebonden-blauw-groene-achtergrond-92385368.html",
    "imageURL": "http://www.marktplaza.nl/M92385368/1/fred-vargas-uit-de-dood-herrezen-hardcover-gebonden-blauw-groene-achtergrond-92385368.jpg",
    "price": 12.5
  },
  {
    "title": "Fred Vargas  -   De Omgekeerde Man   (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/romans/fred-vargas-de-omgekeerde-man-hardcover-gebonden-92385304.html",
    "imageURL": "http://www.marktplaza.nl/M92385304/1/fred-vargas-de-omgekeerde-man-hardcover-gebonden-92385304.jpg",
    "price": 15
  },
  {
    "title": "David Sandes  -  Sergei Bubka's Wondermethode  (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/romans/david-sandes-sergei-bubkas-wondermethode-hardcover-gebonden-92385090.html",
    "imageURL": "http://www.marktplaza.nl/M92385090/1/david-sandes-sergei-bubkas-wondermethode-hardcover-gebonden-92385090.jpg",
    "price": 10
  },
  {
    "title": "Sjoerd Kuyper  -  Sjaantje Doet Alsof  (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/kinderboeken/sjoerd-kuyper-sjaantje-doet-alsof-hardcover-gebonden-92384948.html",
    "imageURL": "http://www.marktplaza.nl/M92384948/1/sjoerd-kuyper-sjaantje-doet-alsof-hardcover-gebonden-92384948.jpg",
    "price": 10
  },
  {
    "title": "Het Piratenschip     Klap Open En Bekijk  (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/kinderboeken/het-piratenschip-klap-open-en-bekijk-hardcover-gebonden-92371996.html",
    "imageURL": "http://www.marktplaza.nl/M92371996/1/het-piratenschip-klap-open-en-bekijk-hardcover-gebonden-92371996.jpg",
    "price": 12.5
  },
  {
    "title": "John Topsell  -  Draken Trainen En Verzorgen (Hardcover/Gebonden)",
    "origin": "http://www.marktplaza.nl/boeken/kinderboeken/john-topsell-draken-trainen-en-verzorgen-hardcover-gebonden-92371928.html",
    "imageURL": "http://www.marktplaza.nl/M92371928/1/john-topsell-draken-trainen-en-verzorgen-hardcover-gebonden-92371928.jpg",
    "price": 15
  }
]
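
For the last two steps in the question (getting API access and adding the products), a rough sketch using the WooCommerce REST API might look like this. It assumes the v3 endpoint `/wp-json/wc/v3/products`, a consumer key and secret generated in the WooCommerce settings, and a store served over HTTPS so that basic authentication can be used. The function names, the `store_url` argument and the choice of `"simple"` as product type are illustrative. Description and category are left out because the scraping code above does not collect them.

```python
from typing import Any, Dict, List


def to_woocommerce_product(scraped: Dict[str, Any]) -> Dict[str, Any]:
    """Map one scraped record (as printed above) to a 'create product' payload."""
    return {
        "name": scraped["title"],
        "type": "simple",
        # WooCommerce takes prices as strings, e.g. "12.50".
        "regular_price": f"{scraped['price']:.2f}",
        # WooCommerce downloads the image from this URL into the media library.
        "images": [{"src": scraped["imageURL"]}],
    }


def upload_products(
    store_url: str,
    consumer_key: str,
    consumer_secret: str,
    scraped_products: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """POST every product to the store and return the created products."""
    # Imported here so the mapping helper can be used without this dependency.
    import requests

    endpoint = f"{store_url.rstrip('/')}/wp-json/wc/v3/products"
    created = []
    for scraped in scraped_products:
        response = requests.post(
            endpoint,
            auth=(consumer_key, consumer_secret),  # HTTP basic auth, HTTPS only
            json=to_woocommerce_product(scraped),
            timeout=30,
        )
        response.raise_for_status()  # stop on the first rejected product
        created.append(response.json())
    return created
```

It is probably worth trying this with one or two products on a test store first, since every successful request creates a real product.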

Source: the Stack Overflow question "javascript - Can you automatically add products from an online store to WooCommerce?": https://stackoverflow.com/questions/59102431/
