Problem Description
The last few days I've been experimenting with Hadoop. I'm running Hadoop in pseudo-distributed mode on Ubuntu 12.10 and have successfully executed some standard MapReduce jobs.
Next I wanted to start experimenting with HBase. I installed HBase and played around a bit in the shell. That all went fine, so I wanted to experiment with HBase through a simple Java program. I wanted to import the output of one of the previous MapReduce jobs and load it into an HBase table. I wrote a Mapper that should produce HFileOutputFormat files, which should be easy to read into an HBase table.
Now, whenever I run the program (using: hadoop jar [compiled jar]), I get a ClassNotFoundException. The program seems unable to resolve com.google.common.primitives.Longs. Of course, I thought it was just a missing dependency, but the JAR (Google's Guava) is there.
I've tried a lot of different things but can't seem to find a solution.
I attached the Exception that occurs and the most important classes. I would truly appreciate it if someone could help me out or give me some advice on where to look.
Kind regards,
Pieterjan
ERROR
12/12/13 09:02:54 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/13 09:03:00 INFO mapred.JobClient: Running job: job_201212130304_0020
12/12/13 09:03:01 INFO mapred.JobClient: map 0% reduce 0%
12/12/13 09:04:07 INFO mapred.JobClient: map 100% reduce 0%
12/12/13 09:04:51 INFO mapred.JobClient: Task Id : attempt_201212130304_0020_r_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1554)
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1536)
    at java.util.TreeMap.compare(TreeMap.java:1188)
    at java.util.TreeMap.put(TreeMap.java:531)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:63)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:40)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
JAVA
Mapper:
public class TestHBaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //Tab delimiter: \t, whitespace delimiter: \\s+
        String[] s = value.toString().split("\t");
        Put put = new Put(s[0].getBytes());
        put.add("amount".getBytes(), "value".getBytes(), value.getBytes());
        context.write(new ImmutableBytesWritable(Bytes.toBytes(s[0])), put);
    }
}
Job:
public class TestHBaseRun extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        try {
            Configuration configuration = getConf();
            Job hbasejob = new Job(configuration);
            hbasejob.setJobName("TestHBaseJob");
            hbasejob.setJarByClass(TestHBaseRun.class);
            //Specify the InputFormat and the path.
            hbasejob.setInputFormatClass(TextInputFormat.class);
            TextInputFormat.setInputPaths(hbasejob, new Path("/hadoopdir/user/data/output/test/"));
            //Set Mapper, MapperOutputKey and MapperOutputValue classes.
            hbasejob.setMapperClass(TestHBaseMapper.class);
            hbasejob.setMapOutputKeyClass(ImmutableBytesWritable.class);
            hbasejob.setMapOutputValueClass(Put.class);
            //Specify the OutputFormat and the path. If the path exists it's reinitialized.
            //In this case HFiles, which can be imported into HBase, are produced.
            hbasejob.setOutputFormatClass(HFileOutputFormat.class);
            FileSystem fs = FileSystem.get(configuration);
            Path outputpath = new Path("/hadoopdir/user/data/hbase/table/");
            fs.delete(outputpath, true);
            HFileOutputFormat.setOutputPath(hbasejob, outputpath);
            //Check if the table exists in HBase and create it if necessary.
            HBaseUtil util = new HBaseUtil(configuration);
            if (!util.exists("test")) {
                util.createTable("test", new String[]{"amount"});
            }
            //Read the existing (or thus newly created) table.
            Configuration hbaseconfiguration = HBaseConfiguration.create(configuration);
            HTable table = new HTable(hbaseconfiguration, "test");
            //Write HFiles to disk. Autoconfigures partitioner and reducer.
            HFileOutputFormat.configureIncrementalLoad(hbasejob, table);
            boolean success = hbasejob.waitForCompletion(true);
            //Load the generated files into the table.
            LoadIncrementalHFiles loader;
            loader = new LoadIncrementalHFiles(hbaseconfiguration);
            loader.doBulkLoad(outputpath, table);
            return success ? 0 : 1;
        } catch (Exception ex) {
            System.out.println("Error: " + ex.getMessage());
        }
        return 1;
    }
}
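The driver's entry point isn't shown above. For reference, a minimal sketch of how a Tool like this is typically launched via hadoop jar (the class name TestHBaseDriver is hypothetical, not part of the original code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class TestHBaseDriver {
    public static void main(String[] args) throws Exception {
        //ToolRunner parses the generic Hadoop options (-D, -libjars, -files, ...)
        //before handing the remaining arguments to TestHBaseRun.run().
        int exitCode = ToolRunner.run(new Configuration(), new TestHBaseRun(), args);
        System.exit(exitCode);
    }
}

Note that with ToolRunner in place, extra jars such as Guava can also be shipped to the tasks at launch time via the generic -libjars option.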
Answer
You got a ClassNotFoundException, which means that the required .jar containing com.google.common.primitives.Longs cannot be found.
There are several ways to solve this issue:
1. Add the path to the required .jar to HADOOP_CLASSPATH. To do so, open /etc/hbase/hbase-env.sh and add:
export HADOOP_CLASSPATH="<jar_files>:$HADOOP_CLASSPATH"
2. Create a folder /lib in your root project folder and copy your .jar into that folder. Then create a package (.jar) for your project; the result will be a fat jar containing all the jars included in /lib.
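A third option worth noting (a sketch, not part of the original answer): HBase ships a helper that copies its own dependency jars, Guava included, into the job's distributed cache, so the reduce tasks can resolve them without touching hbase-env.sh. This assumes HBase 0.92+ on the client classpath; the wrapper class name DependencyJars is hypothetical.

import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class DependencyJars {
    //Call this right after creating the Job, before submitting it.
    static void shipHBaseDependencies(Job job) throws IOException {
        //Copies the jars HBase depends on (Guava among them) into the
        //distributed cache and adds them to the task classpath, so the
        //reducer can load com.google.common.primitives.Longs.
        TableMapReduceUtil.addDependencyJars(job);
    }
}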