I am trying to scrape data and push it to Elasticsearch. Here is the code:
import re
import time
import requests
from bs4 import BeautifulSoup
from elasticsearch import Elasticsearch
es_client = Elasticsearch(['http://localhost:9200'])
#drop_index = es_client.indices.create(index='blog-sysadmins', ignore=400)
create_index = es_client.indices.delete(index='blog-sysadmins', ignore=[400, 404])
def urlparser(title, url):
    # scrape title
    p = {}
    post = title
    page = requests.get(post).content
    soup = BeautifulSoup(page, 'lxml')
    title_name = soup.title.string
    # scrape tags
    tag_names = []
    desc = soup.findAll(attrs={"property":"article:tag"})
    for x in range(len(desc)):
        tag_names.append(desc[x-1]['content'].encode('utf-8'))
    print (tag_names)
    # payload for elasticsearch
    doc = {
        'date': time.strftime("%Y-%m-%d"),
        'title': title_name,
        'tags': tag_names,
        'url': url
    }
    # ingest payload into elasticsearch
    res = es_client.index(index="blog-sysadmins", doc_type="docs", body=doc)
    time.sleep(0.5)
sitemap_feed = 'https://sysadmins.co.za/sitemap-posts.xml'
page = requests.get(sitemap_feed)
sitemap_index = BeautifulSoup(page.content, 'html.parser')
urlss = [element.text for element in sitemap_index.findAll('loc')]
urls = urlss[0:2]
print ('urls',urls)
for x in urls:
    urlparser(x, x)
My error: SerializationError: ({'date': '2020-07-04', 'title': 'Persistent Storage with OpenEBS on Kubernetes', 'tags': [b'Cassandra', b'Kubernetes', b'Civo', b'Storage'], 'url': 'http://sysadmins.co.za/persistent-storage-with-openebs-on-kubernetes/'}, TypeError("Unable to serialize b'Cassandra' (type: <class 'bytes'>)",))
Best answer
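A JSON serialization error occurs when you try to serialize data whose type is not one of the primitive data types of JavaScript (the language JSON was derived from). This is a JSON error, not an Elasticsearch error. The only rule of the JSON format is that it accepts just those data types (the original answer linked to further reading on this). In your case, the tags field contains values of type bytes, as the error stack shows: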
TypeError("Unable to serialize b'Cassandra' (type: <class 'bytes'>)
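The same failure can be reproduced without Elasticsearch, using only Python's standard json module. This is a minimal sketch, not the client's actual code path: the Elasticsearch client uses its own serializer, so the message wording differs, but the cause is the same. The two payloads are trimmed copies of the document from the error message, and can_serialize is a hypothetical helper introduced only for illustration.

```python
import json

# Trimmed copy of the failing payload: the tags are bytes objects.
doc_with_bytes = {
    "title": "Persistent Storage with OpenEBS on Kubernetes",
    "tags": [b"Cassandra", b"Kubernetes"],
}

# Same payload with plain str tags.
doc_with_str = {
    "title": "Persistent Storage with OpenEBS on Kubernetes",
    "tags": ["Cassandra", "Kubernetes"],
}


def can_serialize(obj):
    """Hypothetical helper: report whether json.dumps accepts obj."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        # json.dumps raises TypeError for types JSON has no representation for.
        return False


print(can_serialize(doc_with_bytes))
print(can_serialize(doc_with_str))
```

JSON has strings, numbers, booleans, null, arrays, and objects, so a Python bytes value has no direct counterpart and must be turned into a str first.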
To solve your problem, you just need to convert the tag content to a string. So simply change this line: tag_names.append(desc[x-1]['content'].encode('utf-8'))
to: tag_names.append(str(desc[x-1]['content']))
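A rough sketch of the fix in isolation, assuming Python 3. BeautifulSoup attribute values are already str there, so dropping .encode('utf-8') is what actually removes the bytes; str() is then a harmless no-op. Note that str() applied to a real bytes value does not decode it (it yields text like "b'Cassandra'"), so the hypothetical helper to_text below decodes bytes explicitly. The raw_tags list is a stand-in for the 'content' attributes the scraper would return.

```python
import json


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Stand-in for the 'content' attribute values; one is bytes on purpose.
raw_tags = ["Cassandra", b"Kubernetes", "Civo", "Storage"]

# A list comprehension replaces the index-based loop from the question.
tag_names = [to_text(tag) for tag in raw_tags]

doc = {
    "title": "Persistent Storage with OpenEBS on Kubernetes",
    "tags": tag_names,
}

# Serializes cleanly now that every tag is a str.
payload = json.dumps(doc)
print(payload)
```

In the original function, the equivalent change would be building tag_names from desc with the same comprehension before calling es_client.index.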
Regarding python - SerializationError when scraping data and pushing it to Elasticsearch, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62729052/