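I want to use a script that first converts the Enron email dataset from mx format into JSON documents. The script should then automatically import that JSON into Elasticsearch using the stream2es utility. This is where I run into a problem: when I start the script, everything runs fine except the stream2es utility. Specifically, I get: stream2es: command not found
I have a folder that contains the script, the Enron email folder, and stream2es. I set the permissions on stream2es, so I think I have everything the script needs to run correctly.
Here is the script: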

#!/bin/sh
#
# Loading enron data into elasticsearch
#
# Prerequisites:
# make sure that stream2es utility is present in the path
# install beautifulsoup4 and lxml:
#    sudo pip install beautifulsoup4 lxml
#
# The mailboxes__jsonify_mbox.py and mailboxes__convert_enron_inbox_to_mbox.py are modified
# versions of https://github.com/ptwobrussell/Mining-the-Social-Web/tree/master/python_code
#
#if [ ! -d enron_mail_20110402 ]; then
#    echo "Downloading enron file"
#    curl -O -L http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz
#    tar -xzf enron_mail_20110402.tgz
#fi
if [ ! -f enron.mbox.json ]; then
    echo "Converting enron emails to mbox format"
    python mailboxes__convert_enron_inbox_to_mbox.py allen-p > enron.mbox       # allen-p is one of the folders within Enron dataset
    echo "Converting enron emails to json format"
    python mailboxes__jsonify_mbox.py enron.mbox > enron.mbox.json
    rm enron.mbox
fi
echo "Indexing enron emails"
es_host="http://localhost:9200"
curl -XDELETE "$es_host/enron"
curl -XPUT "$es_host/enron" -H 'Content-Type: application/json' -d '{
    "settings": {
        "index.number_of_replicas": 0,
        "index.number_of_shards": 5,
        "index.refresh_interval": -1
    },
    "mappings": {
        "email": {
            "properties": {
                "Bcc": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Cc": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Content-Transfer-Encoding": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Content-Type": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Date": {
                    "type" : "date",
                    "format" : "EEE, dd MMM yyyy HH:mm:ss Z"
                },
                "From": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Message-ID": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Mime-Version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "Subject": {
                    "type": "string"
                },
                "To": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-FileName": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-Folder": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-From": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-Origin": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-To": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-bcc": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "X-cc": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "bytes": {
                    "type": "long"
                },
                "offset": {
                    "type": "long"
                },
                "parts": {
                    "dynamic": "true",
                    "properties": {
                        "content": {
                            "type": "string"
                        },
                        "contentType": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}'

stream2es stdin --target $es_host/enron/email < enron.mbox.json
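One more detail in the script, relevant once stream2es does run: the index is created with "index.refresh_interval": -1 and that setting is never restored, so freshly indexed emails may not be visible to searches until a refresh happens. Below is a hedged sketch of a follow-up step, assuming the same es_host and enron index as the script. The function name finish_import, the ES_HOST override, and the "1s" interval are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Assumes the same Elasticsearch host and "enron" index as the script above.
# ES_HOST is a hypothetical override; it defaults to the script's value.
es_host="${ES_HOST:-http://localhost:9200}"

finish_import() {
    # refresh_interval was set to -1 when the index was created, so indexed
    # documents are not searchable until an explicit refresh.
    curl -XPOST "$es_host/enron/_refresh"
    # Restore periodic refreshing now that the bulk import is finished
    # ("1s" is an assumed value).
    curl -XPUT "$es_host/enron/_settings" -H 'Content-Type: application/json' \
        -d '{"index": {"refresh_interval": "1s"}}'
    # Count the indexed emails as a quick sanity check.
    curl "$es_host/enron/_count?pretty"
}
```

Calling finish_import right after the stream2es line would make the imported emails searchable and report how many documents the enron index holds.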

Can anyone help me solve this stream2es: command not found problem? Thank you.

Best answer

command not found means the shell cannot find the stream2es command. You have two options:
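A quick way to confirm this diagnosis is the POSIX builtin command -v, which prints where the shell resolves a name through $PATH and fails if it cannot. The current directory is normally not part of $PATH, and changing a file's permissions does not change that, which is why a stream2es file sitting next to the script is not found by its bare name. The helper name check_cmd below is hypothetical.

```shell
#!/bin/sh
# check_cmd (hypothetical helper): report whether the shell can resolve a
# command name via $PATH. "command -v" is POSIX and exits non-zero if the
# name cannot be resolved.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found in PATH: $PATH"
        return 1
    fi
}

check_cmd stream2es || echo "Try ./stream2es or move stream2es into a directory on PATH"
```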

  • your script needs to call ./stream2es (that is, invoke the stream2es script located in the same folder), or
  • you need to move stream2es into a folder that is on your $PATH.

Regarding "json - stream2es: command not found", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43922494/
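Both options can be sketched in shell, assuming stream2es sits in the same folder as the script, as the question describes. SCRIPT_DIR, STREAM2ES, install_stream2es, and the $HOME/bin directory are illustrative assumptions, not part of the original script or the answer.

```shell
#!/bin/sh
# Option 1: call stream2es by an explicit path instead of relying on $PATH.
# Resolving the script's own directory keeps this working even when the
# script is started from a different working directory.
SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
STREAM2ES="$SCRIPT_DIR/stream2es"

# Option 2 (alternative, hypothetical helper): copy stream2es into a
# directory on $PATH. $HOME/bin is an assumption; any PATH directory works.
install_stream2es() {
    mkdir -p "$HOME/bin"
    cp "$SCRIPT_DIR/stream2es" "$HOME/bin/"
    chmod +x "$HOME/bin/stream2es"
    # Only affects the current shell; add this line to ~/.profile to persist it.
    export PATH="$HOME/bin:$PATH"
}

es_host="http://localhost:9200"
if [ -x "$STREAM2ES" ]; then
    "$STREAM2ES" stdin --target "$es_host/enron/email" < enron.mbox.json
else
    echo "stream2es not found or not executable at $STREAM2ES" >&2
fi
```

Resolving SCRIPT_DIR from $0, rather than writing a bare ./stream2es, matters because ./ is relative to the current working directory, not to the location of the script.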

    10-11 00:24