Problem description

I am new to Linux and Apache Pig. I am following this tutorial to learn Pig: http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm

This is a basic word-count example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, which is linked on the site.

I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop and Java 6.

When I downloaded the Wordcount package it took me to a ".tar.gz" file. I am unfamiliar with this type and wasn't sure how to extract it. It contains the files 'input.txt', 'wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save wordcount.pig, so I decided to type the commands line by line in the shell.

I ran Pig in local mode as follows:

    pig -x local

and then I copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:

    A = load '/home/me/Desktop/input.txt';
    B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
    C = group B by word;
    D = foreach C generate COUNT(B), group;
    dump D;

This generates the following errors:

    ...
    Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

My questions:

1. Should I be saving 'input.txt' and the original 'wordcount.pig' script to some special folder inside the directory pig-0.11.1? That is, should I create a folder called word inside pig-0.11.1, put 'wordcount.pig' and 'input.txt' there, and then type "wordcount.pig" at the grunt> prompt? In general, if I have data in, say, 'dat.txt' and a script, say, 'program.pig', where should I save them in order to run 'program.pig' from the grunt shell? I think they should both go in pig-0.11.1, so that I can do $ pig -x local wordcount.pig, but I am not sure.

2. Why am I not able to run the script line by line as I tried to? I have specified the location of the file 'input.txt' in the load statement. So why does it not just run the commands line by line and dump the contents of D to my screen?

3. When I try to run Pig in mapreduce mode using $ pig, it gives this error:

    retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

Answer

This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop. Have you installed it? If you have installed it, have you started it up according to its docs, that is, have you run the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but it still needs a running Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.
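
On the ".tar.gz" and "where do I save the script" parts of the question, here is a minimal shell sketch, not a definitive recipe. It unpacks an archive into a working directory of your choice and then runs the script from the directory that holds it. The function names unpack_and_find and run_pig_local are made up for this illustration, and the sketch assumes that pig is on your PATH. As far as I know, Pig does not require scripts or data to live inside the pig-0.11.1 directory; any directory you can read should do.

```shell
# Unpack a .tar.gz archive into a target directory and print the directory
# that contains the first file with the wanted name.
# (unpack_and_find is a hypothetical helper, not part of Pig or Hadoop.)
unpack_and_find() {
    archive=$1
    target=$2
    wanted=$3
    mkdir -p "$target" || return 1
    # -x extract, -z decompress gzip, -f archive file, -C directory to unpack into
    tar -xzf "$archive" -C "$target" || return 1
    found=$(find "$target" -type f -name "$wanted" | head -n 1)
    [ -n "$found" ] || return 1
    dirname "$found"
}

# Run a Pig script in local mode from the directory it lives in.
# (run_pig_local is also hypothetical; it only wraps "pig -x local <script>".)
run_pig_local() {
    script_dir=$1
    script=$2
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig is not on PATH" >&2
        return 127
    fi
    # Subshell so the caller's working directory is left unchanged.
    (cd "$script_dir" && pig -x local "$script")
}
```

A possible invocation, assuming the download is called Wordcount.tar.gz (the real file name may differ) and you want it unpacked under ~/pig-work:

    dir=$(unpack_and_find Wordcount.tar.gz "$HOME/pig-work" wordcount.pig) && run_pig_local "$dir" wordcount.pig

Running from the script's own directory is a deliberate choice here: it keeps the script and its data together, and the absolute path used in the load statement above continues to work either way.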