How does Hadoop read input files?

Problem Description

I have a CSV file to analyze with Hadoop MapReduce. Will Hadoop parse it line by line? If so, I want to split each line on commas to get the fields I need to analyze. Or is there a better way to parse the CSV and feed it into Hadoop? The file is 10 GB and comma-delimited, and I want to use Java with Hadoop. Does the Text-type parameter "value" in the map() method below contain each line as parsed in by MapReduce? This is the part I'm most confused about.

This is my code:

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    try {
        // "value" holds one full line of the CSV file
        String[] tokens = value.toString().split(",");

        String crimeType = tokens[5].trim();
        int year = Integer.parseInt(tokens[17].trim());

        // context.write() requires Hadoop Writable types,
        // not a raw String and int
        context.write(new Text(crimeType), new IntWritable(year));

    } catch (Exception e) {
        // skip malformed lines
    }
}
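
For this snippet to compile, it has to live inside a Mapper subclass whose declared output types match what context.write() emits. A minimal sketch of that enclosing class, using the hypothetical name CrimeMapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical class name. The four generic parameters declare the
// input key/value types (byte offset, line of text) and the output
// key/value types (crime type, year).
public class CrimeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // map() body as shown above
    }
}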


Recommended Answer

Yes, by default Hadoop uses TextInputFormat, which feeds the mapper one line at a time from the input file. The key passed to the mapper is the byte offset of the line just read. Be careful with CSV files, though: a single column/field may contain a line break. You might want to look for a CSV-aware input reader like this one: https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java
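
If you keep that default, no input format needs to be configured at all. Below is a minimal driver sketch under that assumption; the class names CrimeDriver and CrimeMapper, the job name, and the command-line argument handling are illustrative, not part of the original question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrimeDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "crime-by-year");
        job.setJarByClass(CrimeDriver.class);
        job.setMapperClass(CrimeMapper.class);   // hypothetical mapper from above
        job.setOutputKeyClass(Text.class);       // matches context.write(Text, IntWritable)
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(0);                // map-only for illustration; add a reducer as needed
        // No setInputFormatClass() call: TextInputFormat is the default,
        // so the mapper receives (byte offset, line) pairs automatically.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}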
