java - PigUnit: Unable to open iterator

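I am working through the PigUnit test example from the Apache Pig documentation page. I am trying to build the example as a Maven project in Eclipse. I have added the Pig and PigUnit dependencies to pom.xml and tried both versions 0.14 and 0.15.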

This is the PigUnit test code taken from the Apache Pig page (I wrapped it in a class):

  @Test
  public void testTop2Queries() {
    String[] args = {
        "n=2",
        };

    PigTest test = new PigTest("top_queries.pig", args);

    String[] input = {
        "yahoo",
        "yahoo",
        "yahoo",
        "twitter",
        "facebook",
        "facebook",
        "linkedin",
    };

    String[] output = {
        "(yahoo,3)",
        "(facebook,2)",
    };

    test.assertOutput("data", input, "queries_limit", output);
  }

And the Pig script, also copied:

data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = ORDER queries_count BY total DESC, query;
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';

However, when I run it with Run As > JUnit Test, I get this result:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    ...[truncated]
Caused by: java.io.IOException: Couldn't retrieve job.
    at org.apache.pig.PigServer.store(PigServer.java:999)
    at org.apache.pig.PigServer.openIterator(PigServer.java:910)
    ... 28 more

This is the console output I get:

STORE queries_limit INTO 'output';
--> none
data: {query: chararray}
data = LOAD 'input' AS (query:CHARARRAY);
--> data = LOAD 'file:/tmp/temp-820202225/tmp-1722948946' USING PigStorage('\t') AS (
    query: chararray
);
STORE queries_limit INTO 'output';
--> none

It looks like the Pig script is trying to load data for 'input' from the local file system instead of using the Java String[] variable input.

Can anyone help?

Best answer

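Before getting to the solution, I want to comment on the fact that the Pig script loads from local disk. When PigUnit overrides a statement and supplies mock data, it writes that data to a file on local disk and loads the file. That is why you see a file being loaded. If you look at the file, you should see the data you entered in the String array input.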
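To make the mechanism described above concrete, here is a minimal sketch using only the JDK. It is not PigUnit's actual implementation; the class name `AliasOverrideSketch` and both helper methods are hypothetical, made up for illustration. It writes the mock rows to a temporary file and builds a replacement LOAD statement of the same shape as the one in the console output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration only; PigUnit's real code lives in PigTest.java.
class AliasOverrideSketch {

    /** Writes the mock rows to a temporary file, one row per line. */
    public static Path writeMockInput(String[] rows) throws IOException {
        Path file = Files.createTempFile("pigunit-sketch-", ".txt");
        Files.write(file, Arrays.asList(rows), StandardCharsets.UTF_8);
        return file;
    }

    /** Builds a LOAD statement that reads the temporary file instead of the original source. */
    public static String overrideLoad(String alias, Path file, String schema) {
        return alias + " = LOAD '" + file.toUri()
                + "' USING PigStorage('\\t') AS (" + schema + ");";
    }

    public static void main(String[] args) throws IOException {
        String[] input = {"yahoo", "yahoo", "twitter"};
        Path file = writeMockInput(input);
        // Prints a statement resembling the rewritten LOAD line in the console output.
        System.out.println(overrideLoad("data", file, "query: chararray"));
        Files.delete(file);
    }
}
```

The point is that a LOAD from a temp file is expected behavior when an alias is overridden, so that line in the log is not itself the cause of ERROR 1066.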

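For anyone still looking for a solution, here is what worked for me. The solution is based on Pig 0.15 and Hadoop 2.7.1. As far as I can tell, you have to specify the required Pig artifacts explicitly.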

    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pigunit</artifactId>
        <version>${pig.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pig</artifactId>
        <version>${pig.version}</version>
        <classifier>h2</classifier>
        <!-- NOTE: It is very important to have this classifier. Unit tests will
        break if this doesn't exist. This gets the pig jars for Hadoop v2. -->
    </dependency>
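If it is unclear which Pig jar the test classpath actually resolves, a small JDK-only check along these lines can help. This is a sketch, not part of Pig or PigUnit; the class name `PigJarCheck` and its `locate` method are hypothetical. It reports where a class was loaded from without initializing it.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Pig or PigUnit.
class PigJarCheck {

    /** Returns where a class was loaded from, or a short note if that cannot be determined. */
    public static String locate(String className) {
        try {
            // initialize=false: only find the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, PigJarCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? "(no code source reported)" : source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Maven names classified artifacts artifactId-version-classifier.jar,
        // so with the h2 classifier the path should end in pig-<version>-h2.jar.
        System.out.println(locate("org.apache.pig.PigServer"));
    }
}
```

Running it from the same Maven module (for example from a throwaway test) shows whether the h2 jar or the default one was picked up.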

Here are some very useful classes from the Pig GitHub repository.

PigTest implementation (useful as API documentation):
https://github.com/apache/pig/blob/trunk/test/org/apache/pig/pigunit/PigTest.java

PigUnit examples:
https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/pigunit/TestPigTest.java

A similar question about java - PigUnit: Unable to open iterator can be found on Stack Overflow: https://stackoverflow.com/questions/30930721/