Pig beginner's example [unexpected error]

Problem description


I am new to Linux and Apache Pig. I am following this tutorial to learn Pig: http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm

This is a basic word counting example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, linked on the site.

I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop, and Java 6.

When I downloaded the Wordcount package, it took me to a "tar.gz" file. I am unfamiliar with this file type and wasn't sure how to extract it. It contains the files 'input.txt', 'wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save 'wordcount.pig', so I decided to just type in the commands line by line in the shell.
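For reference, a .tar.gz file is a gzip-compressed tar archive, and the standard tar tool can list and unpack it. The sketch below is only an illustration: it first builds a small stand-in archive so that it runs on its own, and the names Wordcount.tar.gz, the Wordcount folder and the file contents are assumptions, not the tutorial's actual package.

```shell
# Build a stand-in archive so the example is self-contained.
# Wordcount.tar.gz, its folder layout and the file contents are assumed
# for illustration; they are not taken from the tutorial's real package.
workdir=$(mktemp -d)
mkdir "$workdir/Wordcount"
printf 'hello world hello\n' > "$workdir/Wordcount/input.txt"
printf -- '-- placeholder script\n' > "$workdir/Wordcount/wordcount.pig"
tar -czf "$workdir/Wordcount.tar.gz" -C "$workdir" Wordcount
rm -r "$workdir/Wordcount"

# List the archive contents without extracting
# (t = list, z = gzip, f = archive file name follows).
tar -tzf "$workdir/Wordcount.tar.gz"

# Extract into a directory of your choice
# (x = extract, -C = directory to extract into).
mkdir "$workdir/extracted"
tar -xzf "$workdir/Wordcount.tar.gz" -C "$workdir/extracted"
ls "$workdir/extracted/Wordcount"
```

With a real download, only the tar -tzf and tar -xzf lines are needed, pointed at the downloaded file.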

I ran Pig in local mode as follows: pig -x local

and then I just copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:

A = load '/home/me/Desktop/input.txt';

B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;

C = group B by word;

D = foreach C generate COUNT(B), group;

dump D;

This generates the following errors:

...

Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

My questions:

1. Should I be saving 'input.txt' and the original 'wordcount.pig' script to some special folder inside the directory pig-0.11.1? That is, should I create a folder called word inside pig-0.11.1, put 'wordcount.pig' and 'input.txt' there, and then type "wordcount.pig" at the grunt> prompt? In general, if I have data in, say, 'dat.txt' and a script, say, 'program.pig', where should I save them so that I can run 'program.pig' from the grunt shell? I think they should both go in pig-0.11.1, so I can do $ pig -x local wordcount.pig, but I am not sure.

2. Why am I not able to run the script line by line as I tried to? I have specified the location of the file 'input.txt' in the load statement. So why does it not just run the commands line by line and dump the contents of D to my screen?

3. When I try to run Pig in mapreduce mode using $ pig, it gives this error:

retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

Solution

This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop -- have you installed it? If you have installed it, have you started it up according to its docs -- have you run the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but it still needs a running Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.
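A quick sanity check along the lines of the questions above might look like the following sketch. It only reports whether java, hadoop and pig can be found on PATH and, if the JDK's jps tool is available, lists the running Java processes so you can look for entries such as NameNode and DataNode. The function name check_tools is made up for illustration, and the check assumes the tools are meant to be reachable via PATH, which may not match how you unpacked them.

```shell
# check_tools is a hypothetical helper name, not part of Pig or Hadoop.
# It prints one line per tool saying whether the shell can find it on PATH.
check_tools() {
  for tool in java hadoop pig; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: not found on PATH"
    fi
  done
}

check_tools

# jps ships with the JDK and lists running Java processes by name.
# If the Hadoop daemons were started, they should appear in this list.
if command -v jps >/dev/null 2>&1; then
  jps
else
  echo "jps not available; cannot list Java processes"
fi
```

If a tool is reported as not found, it may still be installed but simply not on PATH; in that case run it by its full path, as with bin/start-all.sh inside the Hadoop directory.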

