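I want to get all the products from this Dutch webshop account (similar to eBay / Amazon) and then add them to this WordPress webshop using WooCommerce. I started web development about 2 to 3 weeks ago and know the basics of HTML, CSS, JavaScript, Node.js and Express. I think I roughly know what to do, namely:
Loop through all the products on every page.
Grab the title, description, category, price and photos.
Store that information in an array of product objects.
Get access to the WooCommerce API.
Loop through all the products and add them to WooCommerce.
My questions are:
Is this possible?
Can I do it with the languages I know?
What approach would you use? (For example, how would you scrape the HTML? Is there an easier way than the steps I described? Would you do this with code or with some automation software, etc.)
This is a big project for me, so any help (on how to get started) is welcome!
Best answer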
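You are right about the steps, and yes, it is possible. As you already know, you can scrape the data with Node.js. For scraping I personally prefer Python, but you can handle it in Node.js, which has HTML parsers and so on. A few suggestions:
Parse the HTML with a parser so that you can reach the elements that hold the data more easily.
Store the data in a proper structured format, for example JSON, XML, CSV ...
If fetching the data takes a long time, fetch it first and parse it afterwards: if fetching and parsing run together and some part of the parsing fails, you can lose everything you have fetched.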
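The "fetch first, parse later" suggestion can be sketched roughly as follows. This is only an illustration, not part of the original answer: the names `fetch_html`, `save_raw_pages` and the `page-N.html` file naming are hypothetical, and the sketch assumes that the shop's listing pages follow the `?p=1`, `?p=2`, ... pattern. It uses the standard library's `urllib` so that it has no third-party dependencies.

```python
from pathlib import Path
from typing import Callable, List
import urllib.request

# Assumption: listing pages are addressed as ?p=1, ?p=2, ...
BASE_URL = "http://johndevisser.marktplaza.nl/?p={page}"


def fetch_html(url: str) -> bytes:
    """Download one page and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_raw_pages(last_page: int, out_dir: Path,
                   fetch: Callable[[str], bytes] = fetch_html) -> List[Path]:
    """Save pages 1..last_page as raw HTML files and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in range(1, last_page + 1):
        target = out_dir / f"page-{page}.html"
        # Skip pages already on disk so an interrupted run can resume
        # without downloading everything again.
        if not target.exists():
            target.write_bytes(fetch(BASE_URL.format(page=page)))
        saved.append(target)
    return saved
```

Calling something like `save_raw_pages(3, Path("raw_pages"))` would download the first three listing pages once. The parsing code can then be run, and rerun after fixing bugs, against the saved files instead of the live site.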
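Here is the code I wrote to get the data from the website you linked. It is written in Python, but I have added comments so that you can follow how the data is retrieved and rewrite it in another language. You can also cut these parts out of the HTML with split, without using a parser at all.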
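To illustrate the split approach, here is a minimal sketch. The markup in `SAMPLE_HTML` is made up for the illustration and only loosely modelled on a product listing, and the helper `between` is a hypothetical name; the real page will need different markers.

```python
# Hypothetical markup; the real page structure will differ.
SAMPLE_HTML = (
    '<div itemscope="itemscope">'
    '<a href="/boeken/example-1.html" title="Example Book">'
    '<img src="/M1/1/example-1.jpg"></a>'
    '<span class="subtext">\u20ac\xa012,50</span>'
    '</div>'
)


def between(text: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end`."""
    return text.split(start, 1)[1].split(end, 1)[0]


title = between(SAMPLE_HTML, 'title="', '"')
price_text = between(SAMPLE_HTML, '<span class="subtext">', "</span>")
# The price text is the euro sign, a non-breaking space, then "12,50":
# take the part after the non-breaking space and swap the decimal comma
# for a dot before converting to float.
price = float(price_text.split("\xa0")[1].replace(",", "."))

print(title, price)
```

Splitting on fixed markers is quick to write but breaks as soon as the markup changes, for example if the attributes are reordered, which is why a parser is usually the more robust choice.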
Example (parser-based, using BeautifulSoup):
import json

import requests
from bs4 import BeautifulSoup

endpoint = "http://johndevisser.marktplaza.nl/?p=1"
# Send a GET request to the page to get the HTML.
data = requests.get(endpoint).content
# Parse the HTML with BeautifulSoup, naming the parser explicitly.
page = BeautifulSoup(data, "html.parser")
# Find the 'div' elements whose 'itemscope' attribute is 'itemscope'; the first match is skipped.
products = page.find_all("div", {"itemscope": "itemscope"})[1:]
# Create an empty list to store the prepared data.
finalProductList = []
# Iterate over the products.
for i in products:
    # Create a dictionary to hold this product's data.
    productData = {}
    # Get the 'title' attribute of the 'a' element in the current product.
    productData["title"] = i.find("a").get("title")
    # Get the 'href' attribute of the same 'a' element; the original source URL can be useful later.
    productData["origin"] = i.find("a").get("href")
    # Get the image URL from the 'img' element so the image can be downloaded.
    productData["imageURL"] = i.find("img").get("src")
    # This may look complicated, but it only finds the 'span' element whose 'class' is 'subtext',
    # takes its inner text, splits it on the non-breaking space ('\xa0') into ['€', '15,00'],
    # keeps the second part (the price) and replaces the comma with a dot so it parses as a float.
    productData["price"] = float(i.find("span", {"class": "subtext"}).get_text().split("\xa0")[1].replace(",", "."))
    # Append the data to the final list.
    finalProductList.append(productData)
# Print the JSON representation of the list.
print(json.dumps(finalProductList))
Output:
[
{
"title": "Sieb Posthuma - Mannetje Jas (Hardcover/Gebonden) Kinderjury",
"origin": "http://www.marktplaza.nl/boeken/kinderboeken/sieb-posthuma-mannetje-jas-hardcover-gebonden-kinderjury-92409632.html",
"imageURL": "http://www.marktplaza.nl/M92409632/1/sieb-posthuma-mannetje-jas-hardcover-gebonden-kinderjury-92409632.jpg",
"price": 12.5
},
{
"title": "Estefhan Meijer - United Wraps Wraps Uit De Hele Wereld",
"origin": "http://www.marktplaza.nl/boeken/kookboeken/estefhan-meijer-united-wraps-wraps-uit-de-hele-wereld-92390218.html",
"imageURL": "http://www.marktplaza.nl/M92390218/1/estefhan-meijer-united-wraps-wraps-uit-de-hele-wereld-92390218.jpg",
"price": 15
},
{
"title": "Daphne Deckers - De Verschrikkelijke Ijstaart (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/kookboeken/daphne-deckers-de-verschrikkelijke-ijstaart-hardcover-gebonden-92390182.html",
"imageURL": "http://www.marktplaza.nl/M92390182/1/daphne-deckers-de-verschrikkelijke-ijstaart-hardcover-gebonden-92390182.jpg",
"price": 10
},
{
"title": "Adelene Fletcher - Bomen Aquarelleren Van A Tot Z",
"origin": "http://www.marktplaza.nl/boeken/hobby-techniek/adelene-fletcher-bomen-aquarelleren-van-a-tot-z-92390124.html",
"imageURL": "http://www.marktplaza.nl/M92390124/1/adelene-fletcher-bomen-aquarelleren-van-a-tot-z-92390124.jpg",
"price": 12.5
},
{
"title": "Razorlight – America (2 Track CDSingle)",
"origin": "http://www.marktplaza.nl/cd-vinyl/singles/razorlight-america-2-track-cdsingle-92390118.html",
"imageURL": "http://www.marktplaza.nl/M92390118/1/razorlight-america-2-track-cdsingle-92390118.jpg",
"price": 5
},
{
"title": "Twarres – Children (2 Track CDSingle)",
"origin": "http://www.marktplaza.nl/cd-vinyl/singles/twarres-children-2-track-cdsingle-92390078.html",
"imageURL": "http://www.marktplaza.nl/M92390078/1/twarres-children-2-track-cdsingle-92390078.jpg",
"price": 5
},
{
"title": "Tower Of Power – The Very Best Of Tower Of Power - The Warner Years (CD)",
"origin": "http://www.marktplaza.nl/cd-vinyl/pop/tower-of-power-the-very-best-of-tower-of-power-the-warner-years-cd-92389836.html",
"imageURL": "http://www.marktplaza.nl/M92389836/1/tower-of-power-the-very-best-of-tower-of-power-the-warner-years-cd-92389836.jpg",
"price": 10
},
{
"title": "Red Hot Chili Peppers – Dani California (2 Track CDSingle)",
"origin": "http://www.marktplaza.nl/cd-vinyl/singles/red-hot-chili-peppers-dani-california-2-track-cdsingle-92389742.html",
"imageURL": "http://www.marktplaza.nl/M92389742/1/red-hot-chili-peppers-dani-california-2-track-cdsingle-92389742.jpg",
"price": 5
},
{
"title": "Seth Godin - Icarus Deception (Engelstalig)",
"origin": "http://www.marktplaza.nl/boeken/management-en-economie/seth-godin-icarus-deception-engelstalig-92389542.html",
"imageURL": "http://www.marktplaza.nl/M92389542/1/seth-godin-icarus-deception-engelstalig-92389542.jpg",
"price": 12.5
},
{
"title": "Rob Gifford - De Chinese Weg",
"origin": "http://www.marktplaza.nl/boeken/reizen/rob-gifford-de-chinese-weg-92389500.html",
"imageURL": "http://www.marktplaza.nl/M92389500/1/rob-gifford-de-chinese-weg-92389500.jpg",
"price": 12.5
},
{
"title": "Bart Leeuwenburgh - Darwin In Domineesland",
"origin": "http://www.marktplaza.nl/boeken/informatief/bart-leeuwenburgh-darwin-in-domineesland-92386128.html",
"imageURL": "http://www.marktplaza.nl/M92386128/1/bart-leeuwenburgh-darwin-in-domineesland-92386128.jpg",
"price": 12.5
},
{
"title": "Per Olov Enquist - Het Record (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/romans/per-olov-enquist-het-record-hardcover-gebonden-92386080.html",
"imageURL": "http://www.marktplaza.nl/M92386080/1/per-olov-enquist-het-record-hardcover-gebonden-92386080.jpg",
"price": 10
},
{
"title": "Fred Vargas - Uit De Dood Herrezen (Hardcover/Gebonden) blauw/groene achtergrond",
"origin": "http://www.marktplaza.nl/boeken/romans/fred-vargas-uit-de-dood-herrezen-hardcover-gebonden-blauw-groene-achtergrond-92385368.html",
"imageURL": "http://www.marktplaza.nl/M92385368/1/fred-vargas-uit-de-dood-herrezen-hardcover-gebonden-blauw-groene-achtergrond-92385368.jpg",
"price": 12.5
},
{
"title": "Fred Vargas - De Omgekeerde Man (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/romans/fred-vargas-de-omgekeerde-man-hardcover-gebonden-92385304.html",
"imageURL": "http://www.marktplaza.nl/M92385304/1/fred-vargas-de-omgekeerde-man-hardcover-gebonden-92385304.jpg",
"price": 15
},
{
"title": "David Sandes - Sergei Bubka's Wondermethode (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/romans/david-sandes-sergei-bubkas-wondermethode-hardcover-gebonden-92385090.html",
"imageURL": "http://www.marktplaza.nl/M92385090/1/david-sandes-sergei-bubkas-wondermethode-hardcover-gebonden-92385090.jpg",
"price": 10
},
{
"title": "Sjoerd Kuyper - Sjaantje Doet Alsof (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/kinderboeken/sjoerd-kuyper-sjaantje-doet-alsof-hardcover-gebonden-92384948.html",
"imageURL": "http://www.marktplaza.nl/M92384948/1/sjoerd-kuyper-sjaantje-doet-alsof-hardcover-gebonden-92384948.jpg",
"price": 10
},
{
"title": "Het Piratenschip Klap Open En Bekijk (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/kinderboeken/het-piratenschip-klap-open-en-bekijk-hardcover-gebonden-92371996.html",
"imageURL": "http://www.marktplaza.nl/M92371996/1/het-piratenschip-klap-open-en-bekijk-hardcover-gebonden-92371996.jpg",
"price": 12.5
},
{
"title": "John Topsell - Draken Trainen En Verzorgen (Hardcover/Gebonden)",
"origin": "http://www.marktplaza.nl/boeken/kinderboeken/john-topsell-draken-trainen-en-verzorgen-hardcover-gebonden-92371928.html",
"imageURL": "http://www.marktplaza.nl/M92371928/1/john-topsell-draken-trainen-en-verzorgen-hardcover-gebonden-92371928.jpg",
"price": 15
}
]
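For the last two steps in the question (getting access to the WooCommerce API and adding each product), a rough sketch could look like the following. It is not part of the original answer. It assumes the WordPress site has the WooCommerce REST API enabled, is served over HTTPS, and that a consumer key and secret have been generated for it; `store_url`, `key`, `secret` and the function names are placeholders. Only a few product fields are mapped, and categories and descriptions are left out because the scraped data above does not contain them.

```python
import base64
import json
import urllib.request


def to_woocommerce_payload(product: dict) -> dict:
    """Map one scraped product dict to a WooCommerce product payload."""
    return {
        "name": product["title"],
        "type": "simple",
        # WooCommerce expects prices as strings.
        "regular_price": f'{product["price"]:.2f}',
        "images": [{"src": product["imageURL"]}],
    }


def create_product(store_url: str, key: str, secret: str, product: dict) -> dict:
    """POST one product to the store and return the decoded JSON response."""
    # Basic authentication with the consumer key and secret (HTTPS only).
    token = base64.b64encode(f"{key}:{secret}".encode()).decode()
    request = urllib.request.Request(
        f"{store_url.rstrip('/')}/wp-json/wc/v3/products",
        data=json.dumps(to_woocommerce_payload(product)).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# One entry from the scraped output, used to preview the payload offline.
sample = {
    "title": "Rob Gifford - De Chinese Weg",
    "origin": "http://www.marktplaza.nl/boeken/reizen/rob-gifford-de-chinese-weg-92389500.html",
    "imageURL": "http://www.marktplaza.nl/M92389500/1/rob-gifford-de-chinese-weg-92389500.jpg",
    "price": 12.5,
}
print(json.dumps(to_woocommerce_payload(sample), indent=2))
```

Looping over the scraped list and calling `create_product` for each entry would cover the final step. It is probably worth trying this against a test store first, since the API may reject or alter fields depending on the store's configuration.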
Regarding "javascript - Can you automatically add products from an online shop to WooCommerce?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59102431/