1. Git repository
https://github.com/onesuper/pandasticsearch
2. Establish a connection
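from pandasticsearch import DataFrame

# Credentials are bytes because the patched client.py (step 3) formats them with b'%s:%s'
username = b'xxxx'
password = b'xxxx'

df = DataFrame.from_es(url='http://IP:9200',  # Elasticsearch host; replace IP, keep the http:// scheme
                       index='xxxx',           # index name
                       username=username,
                       password=password,
                       doc_type='xxxx',        # document type
                       compat=5)               # Elasticsearch major version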
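To see what these parameters amount to on the wire, here is a sketch that builds (but does not send) an equivalent raw search request with the standard library. It assumes the Elasticsearch 5.x path layout `/{index}/{doc_type}/_search` and HTTP Basic authentication; `build_search_request` and the host, index, and type values are hypothetical placeholders, not part of pandasticsearch's API.

```python
import base64
import json
import urllib.request


def build_search_request(url, index, doc_type, username, password, size=10):
    """Hypothetical helper: build an ES 5.x-style search request without sending it."""
    # Elasticsearch 5.x addresses a type-scoped search as /{index}/{type}/_search
    endpoint = '%s/%s/%s/_search' % (url.rstrip('/'), index, doc_type)
    body = json.dumps({'size': size, 'query': {'match_all': {}}}).encode('utf-8')
    req = urllib.request.Request(endpoint, data=body,
                                 headers={'Content-Type': 'application/json'})
    # HTTP Basic auth: base64 of b"user:password", decoded to str for the header
    creds = base64.b64encode(b'%s:%s' % (username, password)).decode('ascii')
    req.add_header('Authorization', 'Basic %s' % creds)
    return req


req = build_search_request('http://localhost:9200', 'my_index', 'my_type',
                           b'user', b'pass', size=5)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/my_index/my_type/_search
print(req.get_header('Authorization'))  # Basic dXNlcjpwYXNz
```

Note that `urllib.request` rejects a URL without a scheme, which is why the host above includes `http://`.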
[Note] In testing, Python 3 hits an encoding error:
TypeError: a bytes-like object is required, not 'str'
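The error is easy to reproduce outside pandasticsearch: in Python 3, `base64.b64encode` accepts only bytes-like input, so passing the `str` produced by `'%s:%s' % (...)` fails. A minimal sketch with placeholder credentials:

```python
import base64

username, password = 'user', 'pass'  # placeholder credentials

error = None
try:
    # What the unpatched client effectively does: base64-encode a str
    base64.b64encode('%s:%s' % (username, password))
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# Encoding the str to bytes first avoids the error
token = base64.b64encode(('%s:%s' % (username, password)).encode('utf-8'))
print(token)  # b'dXNlcjpwYXNz'
```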
3. Patch the source code
In ~/anaconda3/lib/python3.7/site-packages/pandasticsearch/client.py (adjust the path to your own environment), change lines 59-61:
if username is not None and password is not None:
    base64creds = base64.b64encode('%s:%s' % (username,password))
    req.add_header("Authorization", "Basic %s" % base64creds)
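to:
if username is not None and password is not None:
    # b64encode needs bytes, so username/password must be bytes here;
    # the base64 result is decoded back to str for the header
    base64creds = bytes.decode(base64.b64encode(b'%s:%s' % (username, password)))
    req.add_header("Authorization", "Basic %s" % base64creds)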
4. Query data in bulk
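limit() fetches the first 200,000 documents, and to_pandas() converts them into a pandas DataFrame:
# Fetch up to 200,000 documents and load them into pandas
pd_df = df.limit(200000).to_pandas()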
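As an illustration of the shape of the data (not of pandasticsearch's internals), the sketch below flattens a hand-written, hypothetical Elasticsearch search response into one row per document, using each hit's `_source` fields as columns. The field names and values are made up; with pandas installed, `pandas.DataFrame(rows)` would build a frame from the same rows.

```python
# Hypothetical, hand-written search response in the standard Elasticsearch shape
response = {
    'hits': {
        'total': 3,
        'hits': [
            {'_id': '1', '_source': {'name': 'a', 'value': 10}},
            {'_id': '2', '_source': {'name': 'b', 'value': 20}},
            {'_id': '3', '_source': {'name': 'c'}},  # document missing a field
        ],
    }
}


def hits_to_rows(resp):
    """Flatten each hit's _source into a dict row, keeping the document id."""
    rows = []
    for hit in resp['hits']['hits']:
        row = {'_id': hit['_id']}
        row.update(hit['_source'])
        rows.append(row)
    return rows


def column_union(rows):
    """Collect every column name in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


rows = hits_to_rows(response)
columns = column_union(rows)
# Missing fields become None, much like NaN in a DataFrame
table = [[row.get(col) for col in columns] for row in rows]
print(columns)  # ['_id', 'name', 'value']
print(table)    # [['1', 'a', 10], ['2', 'b', 20], ['3', 'c', None]]
```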