I am trying to run a mapper/reducer job with Hadoop on Windows 10, and I get the error below. I have looked everywhere but cannot find a solution. The failure is basically `java.io.IOException: The pipe has been ended`. Here is what I have tried:

  • Added the shebang line
  • Changed -mapper mapper.py to -mapper "python <full path to mapper>"
  • Checked the mapper and reducer code, and all the configuration files (the XML files)
  • Put the mapper and reducer in the same folder as the input file in hadoop fs

    None of the above worked.
    2020-10-21 01:52:36,861 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
    2020-10-21 01:52:37,002 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
    2020-10-21 01:52:37,002 INFO impl.MetricsSystemImpl: JobTracker metrics system started
    2020-10-21 01:52:37,028 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
    2020-10-21 01:52:37,964 INFO mapred.FileInputFormat: Total input files to process : 1
    2020-10-21 01:52:38,209 INFO mapreduce.JobSubmitter: number of splits:1
    2020-10-21 01:52:38,386 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local105213688_0001
    2020-10-21 01:52:38,387 INFO mapreduce.JobSubmitter: Executing with tokens: []
    2020-10-21 01:52:38,557 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    2020-10-21 01:52:38,561 INFO mapreduce.Job: Running job: job_local105213688_0001
    2020-10-21 01:52:38,561 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    2020-10-21 01:52:38,566 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
    2020-10-21 01:52:38,579 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
    2020-10-21 01:52:38,579 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    2020-10-21 01:52:38,689 INFO mapred.LocalJobRunner: Waiting for map tasks
    2020-10-21 01:52:38,694 INFO mapred.LocalJobRunner: Starting task: attempt_local105213688_0001_m_000000_0
    2020-10-21 01:52:38,732 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
    2020-10-21 01:52:38,733 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    2020-10-21 01:52:38,745 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    2020-10-21 01:52:38,801 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6b965617
    2020-10-21 01:52:38,820 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/inp/inputfile.txt:0+6488666
    2020-10-21 01:52:38,883 INFO mapred.MapTask: numReduceTasks: 1
    2020-10-21 01:52:39,001 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    2020-10-21 01:52:39,001 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    2020-10-21 01:52:39,003 INFO mapred.MapTask: soft limit at 83886080
    2020-10-21 01:52:39,004 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    2020-10-21 01:52:39,005 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    2020-10-21 01:52:39,012 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    2020-10-21 01:52:39,115 INFO streaming.PipeMapRed: PipeMapRed exec [D:\Wireshark\uninstall.exe, D:\ll\sem5\Cloud_Computing\assignment\lab6-7\lab7\mapper.py]
    2020-10-21 01:52:39,127 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
    2020-10-21 01:52:39,129 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
    2020-10-21 01:52:39,130 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    2020-10-21 01:52:39,131 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    2020-10-21 01:52:39,133 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    2020-10-21 01:52:39,137 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
    2020-10-21 01:52:39,138 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
    2020-10-21 01:52:39,140 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    2020-10-21 01:52:39,141 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
    2020-10-21 01:52:39,143 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    2020-10-21 01:52:39,144 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    2020-10-21 01:52:39,147 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    2020-10-21 01:52:39,470 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,471 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,474 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,481 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,584 INFO mapreduce.Job: Job job_local105213688_0001 running in uber mode : false
    2020-10-21 01:52:39,586 INFO mapreduce.Job:  map 0% reduce 0%
    2020-10-21 01:52:40,423 INFO streaming.PipeMapRed: MRErrorThread done
    2020-10-21 01:52:40,425 INFO streaming.PipeMapRed: R/W/S=1299/0/0 in:1299=1299/1 [rec/s] out:0=0/1 [rec/s]
    minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
    USER=null
    HADOOP_USER=null
    last tool output: |null|
    
    java.io.IOException: The pipe has been ended
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
           at java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,431 WARN streaming.PipeMapRed: {}
    java.io.IOException: The pipe is being closed
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
           at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
           at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:532)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:120)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,433 INFO streaming.PipeMapRed: mapRedFinished
    2020-10-21 01:52:40,435 WARN streaming.PipeMapRed: {}
    java.io.IOException: The pipe is being closed
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
           at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
           at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:532)
           at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,438 INFO streaming.PipeMapRed: mapRedFinished
    2020-10-21 01:52:40,445 INFO mapred.LocalJobRunner: map task executor complete.
    2020-10-21 01:52:40,506 WARN mapred.LocalJobRunner: job_local105213688_0001
    java.lang.Exception: java.io.IOException: The pipe has been ended
           at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
           at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
    Caused by: java.io.IOException: The pipe has been ended
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
           at java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,596 INFO mapreduce.Job: Job job_local105213688_0001 failed with state FAILED due to: NA
    2020-10-21 01:52:40,625 INFO mapreduce.Job: Counters: 0
    2020-10-21 01:52:40,625 ERROR streaming.StreamJob: Job not successful!
    Streaming Command Failed!
    
    Input file
    Over the years, there have been many releases of PowerShell. Initially, Windows PowerShell was built on the .NET Framework and only worked on Windows systems. With the current release, PowerShell uses .NET Core 3.1 as its runtime. PowerShell runs on Windows, macOS, and Linux platforms.
    
    My Python code
    Mapper
    #!/usr/bin/python
    
    import sys
    d={}
    for line in sys.stdin:
        line= line.strip()
        words=line.split()
        for word in words:
    
    Reducer
    import sys
    cur_word= None
    cur_cnt=0
    
    for line in sys.stdin:
        line=line.strip().split(',')
        # print(line)
        word, count = line[0], int(line[1])
    
        if cur_word==None:
            cur_word=word
            cur_cnt=count
        elif cur_word==word:
            cur_cnt+=count
        else:
            print(cur_word,',', cur_cnt)
            cur_word=word
            cur_cnt=count
    print(cur_word, ",", cur_cnt)
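
    To check the record format the reducer expects, the map → sort → reduce pipeline can be simulated in plain Python, outside Hadoop. This is only a sketch: `map_words` and `reduce_records` are illustrative helpers, not part of the job, and the comma-separated `word,1` record format is an assumption taken from the reducer's `split(',')`.

```python
from itertools import groupby

def map_words(text):
    # Emit one "word,1" record per word, comma-separated to match
    # the reducer's line.strip().split(',').
    return [w + ",1" for line in text.splitlines() for w in line.split()]

def reduce_records(records):
    # Records must arrive sorted by key, as Hadoop's shuffle guarantees.
    totals = []
    for word, group in groupby(records, key=lambda r: r.split(",")[0]):
        totals.append(word + "," + str(sum(int(r.split(",")[1]) for r in group)))
    return totals

print(reduce_records(sorted(map_words("to be or not to be"))))
```

    If this local simulation produces the expected counts but the streaming job still dies, the problem is in how Hadoop launches the scripts rather than in the map/reduce logic itself.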
    
    
    I have done everything I can think of, but the problem persists.

    Best Answer

    Your mapper appears to be incomplete: the inner for loop has no body, so the `print word` (Python 2) / `print(word)` (Python 3) statement that should emit each word is missing. Python rejects an empty loop body with an IndentationError, so the mapper process exits before consuming its input, and Hadoop's writes to the dead process's stdin surface as `java.io.IOException: The pipe has been ended`. (The PipeMapRed exec line in your log, which launches `D:\Wireshark\uninstall.exe` alongside `mapper.py`, also suggests the `.py` file association on this machine is broken, another common cause of this error on Windows; invoking the script as `python <full path to mapper>` avoids relying on that association.)
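
    A completed mapper along those lines might look like the sketch below. It assumes, from the reducer's `split(',')`, that records should be emitted as comma-separated `word,1` pairs; adjust the separator if your job uses a different one.

```python
#!/usr/bin/python
import sys

def map_line(line):
    # One "word,1" record per word, comma-separated to match the reducer.
    return [word + ",1" for word in line.strip().split()]

if __name__ == "__main__":
    for line in sys.stdin:
        for record in map_line(line):
            print(record)
```

    You can smoke-test the pair locally on Windows before submitting the job, e.g. `type input.txt | python mapper.py | sort | python reducer.py`.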

    Regarding java - Accessing Hadoop from Python — java.io.IOException: The pipe has been ended, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64453009/
