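I want to use a script that first converts the Enron email dataset, which is in mx format, into a JSON document. The script should then use the stream2es utility to import that JSON into elasticsearch automatically. This is where I run into a problem. When I start the script, everything runs fine except the stream2es utility. Specifically, I get stream2es: command not found.
I have a folder containing the script, the Enron email folder, and stream2es. I granted permissions to stream2es, so I think I have everything needed for the script to run properly.
Here is the script: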
#!/bin/sh
#
# Loading enron data into elasticsearch
#
# Prerequisites:
# make sure that stream2es utility is present in the path
# install beautifulsoup4 and lxml:
# sudo easy_install beautifulsoup4
# sudo easy_install lxml
#
# The mailboxes__jsonify_mbox.py and mailboxes__convert_enron_inbox_to_mbox.py are modified
# versions of https://github.com/ptwobrussell/Mining-the-Social-Web/tree/master/python_code
#
#if [ ! -d enron_mail_20110402 ]; then
# echo "Downloading enron file"
# curl -O -L http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz
# tar -xzf enron_mail_20110402.tgz
#fi
if [ ! -f enron.mbox.json ]; then
    echo "Converting enron emails to mbox format"
    python mailboxes__convert_enron_inbox_to_mbox.py allen-p > enron.mbox # allen-p is one of the folders within Enron dataset
    echo "Converting enron emails to json format"
    python mailboxes__jsonify_mbox.py enron.mbox > enron.mbox.json
    rm enron.mbox
fi
echo "Indexing enron emails"
es_host="http://localhost:9200"
curl -XDELETE "$es_host/enron"
curl -XPUT "$es_host/enron" -d '{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 5,
    "index.refresh_interval": -1
  },
  "mappings": {
    "email": {
      "properties": {
        "Bcc": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Cc": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Content-Transfer-Encoding": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Content-Type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Date": {
          "type": "date",
          "format": "EEE, dd MMM YYYY HH:mm:ss Z"
        },
        "From": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Message-ID": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Mime-Version": {
          "type": "string",
          "index": "not_analyzed"
        },
        "Subject": {
          "type": "string"
        },
        "To": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-FileName": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-Folder": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-From": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-Origin": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-To": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-bcc": {
          "type": "string",
          "index": "not_analyzed"
        },
        "X-cc": {
          "type": "string",
          "index": "not_analyzed"
        },
        "bytes": {
          "type": "long"
        },
        "offset": {
          "type": "long"
        },
        "parts": {
          "dynamic": "true",
          "properties": {
            "content": {
              "type": "string"
            },
            "contentType": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'
stream2es stdin --target $es_host/enron/email < enron.mbox.json
Can anyone help me solve the stream2es: command not found problem? Thank you.

Best answer
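command not found means that the shell cannot find the stream2es command. When a command is given as a bare name, the shell only searches the directories listed in $PATH; it does not look in the current folder. You have two options: call ./stream2es (that is, invoke the stream2es script located in the same folder), or move stream2es into a folder that is on $PATH.

Source: json - stream2es: command not found, a similar question on Stack Overflow: https://stackoverflow.com/questions/43922494/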
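The lookup behaviour behind both options can be reproduced without stream2es or elasticsearch. The following is a minimal sketch, assuming a POSIX shell and a writable temporary directory; stream2es_demo is a hypothetical stand-in executable created only for this illustration, not part of stream2es itself.

```shell
#!/bin/sh
# Hypothetical stand-in: create a tiny executable in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '#!/bin/sh\necho "stand-in ran"\n' > stream2es_demo
chmod +x stream2es_demo

# Bare name: the shell searches only the directories in $PATH,
# so a file in the current directory is not found.
if command -v stream2es_demo >/dev/null 2>&1; then
    bare_lookup="found"
else
    bare_lookup="not found"
fi

# Option 1: give an explicit path relative to the current directory.
option1=$(./stream2es_demo)

# Option 2: put the containing directory on $PATH, then use the bare name.
PATH="$demo_dir:$PATH"
export PATH
option2=$(stream2es_demo)

echo "bare lookup: $bare_lookup"
echo "option 1:    $option1"
echo "option 2:    $option2"
```

Applied to the script in the question, option 1 would mean writing the last line as ./stream2es stdin --target $es_host/enron/email < enron.mbox.json, which works only when the script is started from the folder that contains stream2es.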