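1 - Why compile Hadoop yourself
Most individual users install Apache Hadoop (other distributions, such as CDH, also exist).
The binary packages on the Apache website were built on specific machines and are not compatible with every environment. The native libraries in particular (used for compression, C program support, and so on) are subject to different constraints on different platforms.
2 - Preparing the build environment
1) Host system: macOS Big Sur 11.0.1.
Make sure the machine can reach the internet. On a Linux system, you also need to turn off the firewall and SELinux: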
service iptables stop
chkconfig iptables off
# Disable SELinux
vim /etc/selinux/config
# Comment out: SELINUX=enforcing
# Add: SELINUX=disabled
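On Linux, the same edit to /etc/selinux/config can be scripted instead of made by hand in vim. This is a minimal sketch, assuming the file has an uncommented SELINUX= line; the function name set_selinux_disabled is made up for illustration, and the new setting only applies after a reboot:

```shell
# Hypothetical helper: rewrite the SELINUX= line of a config file.
# The result is built in a temporary file first, then written back with
# cat so that the original file keeps its permissions and ownership.
set_selinux_disabled() {
    config_file="$1"
    tmp_file="${config_file}.tmp.$$"
    # Only lines starting with "SELINUX=" change; SELINUXTYPE= is untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$config_file" > "$tmp_file" &&
        cat "$tmp_file" > "$config_file" &&
        rm -f "$tmp_file"
}

# Usage (as root): set_selinux_disabled /etc/selinux/config
```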
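2) Configure the JDK environment variables; the version used here is 1.8.0_162.
On Linux, you usually need to uninstall the Java environment bundled with the system first:
# List the installed versions:
rpm -qa | grep java
# Uninstall them: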
rpm -e java-1.6.0-openjdk-1.6.0.41-1.13.13.1.el6_8.x86_64 java-1.7.0-openjdk-1.7.0.131-2.6.9.0.el6_8.x86_64
3) Install Maven; the version used here is 3.5.2.
To speed up dependency downloads, you can add the Aliyun Maven mirror:
<mirror>
<id>alimaven</id>
<name>aliyun maven repo</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
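The mirror element has to sit inside the mirrors section of Maven's settings.xml (usually ~/.m2/settings.xml). The sketch below writes a minimal file with exactly the mirror shown above; the function name write_mirror_settings is hypothetical, and the helper deliberately refuses to touch an existing file, which may already hold other settings:

```shell
# Hypothetical helper: write a minimal settings.xml containing the mirror.
# Refuses to overwrite an existing file.
write_mirror_settings() {
    settings_file="$1"
    if [ -e "$settings_file" ]; then
        echo "refusing to overwrite $settings_file" >&2
        return 1
    fi
    mkdir -p "$(dirname "$settings_file")"
    cat > "$settings_file" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven repo</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}

# Usage: write_mirror_settings "$HOME/.m2/settings.xml"
```

If a settings.xml already exists, merge the mirror element into its mirrors section by hand instead.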
4) Environment variables for the software above:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:.
export PATH=$JAVA_HOME/bin:$PATH:.
export MAVEN_HOME=/usr/local/apache-maven-3.5.2
export PATH=$PATH:$MAVEN_HOME/bin
export HADOOP_HOME=/Users/healchow/bigdata/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
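After reloading the shell profile, it is worth confirming that the tools actually resolve on the PATH before starting a long build. A small sketch (the function name missing_tools is made up for illustration):

```shell
# Hypothetical helper: print each named tool that is not on the PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage: missing_tools java mvn || echo "fix PATH before continuing"
```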
3 - Installing the dependency libraries
1) Install gcc, cmake, and the GNU build tools:
brew install gcc cmake autoconf automake libtool
2) Install the compression libraries gzip, bzip2, zlib, and snappy:
brew install gzip bzip2 zlib
Install snappy 1.1.4 manually. In my tests, other versions broke the build (see section 5.3).
# Download and extract:
wget https://github.com/google/snappy/archive/1.1.4.tar.gz
tar -zxf 1.1.4.tar.gz
cd snappy-1.1.4
# Set an explicit install prefix so that brew can link it (without --prefix, it installs under /usr/local)
./autogen.sh
./configure --prefix=/usr/local/Cellar/snappy/1.1.4
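# Build and install into the prefix above:
make && make install
# Link it into /usr/local so that it can be found:
brew link snappy
3) Install the openssl dependency and configure its environment variables: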
brew install openssl
Add these environment variables to ~/.bash_profile:
export OPENSSL_ROOT_DIR="/usr/local/opt/openssl@1.1"
export OPENSSL_INCLUDE_DIR="$OPENSSL_ROOT_DIR/include"
export PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"
# After saving, apply the variables to the current shell:
source ~/.bash_profile
4) Install protobuf 2.5.0 manually:
Release page: https://github.com/protocolbuffers/protobuf/releases/tag/v2.5.0. Download the source, extract it, then build and install:
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -zxf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
# Set an explicit install prefix so that brew can link it (without --prefix, it installs under /usr/local)
./configure --prefix=/usr/local/Cellar/protobuf/2.5.0
# Build and install into the prefix above:
make && make install
# Link it into /usr/local so that it can be found:
brew link protobuf
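Since this guide pins protobuf to 2.5.0, a quick check that the protoc on the PATH really is that version can save a failed build later. protoc --version prints a line of the form "libprotoc 2.5.0"; the helper name protoc_version_is below is made up for illustration:

```shell
# Hypothetical helper: succeed only if the given `protoc --version` output
# reports exactly the expected version (for example "libprotoc 2.5.0").
protoc_version_is() {
    expected="$1"
    version_output="$2"
    [ "$version_output" = "libprotoc $expected" ]
}

# Usage: protoc_version_is 2.5.0 "$(protoc --version)" || echo "wrong protoc"
```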
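5) Optionally, install isa-l:
First install nasm: brew install nasm (note that the configure command below passes AS=yasm, so you may need yasm installed as well)
Then download the source package (https://github.com/intel/isa-l/releases), and build and install it: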
cd isa-l-2.28.0
# Generate the configure script:
autoreconf --install --symlink -f
./configure --prefix=/usr/local/Cellar/isa-l --libdir=/usr/local/Cellar/isa-l/lib AS=yasm --target=darwin
# Build and install:
make && make install
# Create symlinks:
cd /usr/local/lib
ln -s /usr/local/Cellar/isa-l/lib/libisal.2.dylib libisal.2.dylib
ln -s /usr/local/Cellar/isa-l/lib/libisal.a libisal.a
ln -s /usr/local/Cellar/isa-l/lib/libisal.dylib libisal.dylib
ln -s /usr/local/Cellar/isa-l/lib/libisal.la libisal.la
cd /usr/local/lib/pkgconfig
ln -s /usr/local/Cellar/isa-l/lib/pkgconfig/libisal.pc libisal.pc
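The five ln -s commands above follow one pattern, so they can be written as a loop. A sketch, with a made-up helper name (link_matching) that skips names already present in the target directory so it can be run more than once:

```shell
# Hypothetical helper: symlink every file matching a pattern from one
# directory into another, skipping names that already exist at the target.
link_matching() {
    src_dir="$1"
    dest_dir="$2"
    pattern="$3"
    # $pattern is intentionally unquoted so that the shell expands the glob.
    for src in "$src_dir"/$pattern; do
        [ -e "$src" ] || continue          # the pattern matched nothing
        dest="$dest_dir/$(basename "$src")"
        if [ ! -e "$dest" ] && [ ! -L "$dest" ]; then
            ln -s "$src" "$dest"
        fi
    done
}

# Usage:
#   link_matching /usr/local/Cellar/isa-l/lib /usr/local/lib 'libisal*'
#   link_matching /usr/local/Cellar/isa-l/lib/pkgconfig /usr/local/lib/pkgconfig 'libisal.pc'
```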
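4 - Compiling the Hadoop source
Download the Apache Hadoop source package; this article uses version 3.2.1 (https://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/).
After downloading, extract it into ${HOME}/bigdata/.
Then build the source with the following command:
cd ${HOME}/bigdata/hadoop-3.2.1-src
# To build with snappy compression support, openssl.prefix must be specified; otherwise the build picks up the openssl bundled with macOS and fails:
# -e prints full stack traces for errors, and -X turns on Maven debug logging: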
mvn clean package -DskipTests -Pdist,native -Dmaven.javadoc.skip -Dtar \
-Drequire.bzip2 -Dbzip2.prefix=/usr/local/Cellar/bzip2/1.0.8 \
-Drequire.openssl -Dopenssl.prefix=/usr/local/Cellar/openssl@1.1/1.1.1k \
-Drequire.snappy -Dsnappy.lib=/usr/local/Cellar/snappy/1.1.4/lib \
-Drequire.isal -Disal.prefix=/usr/local/Cellar/isa-l -Disal.lib=/usr/local/Cellar/isa-l/lib \
-e -X
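With -e -X the build prints a very large amount of output, so it helps to save it to a file and search that file afterwards. The sketch below pulls out the first CMake error from a saved log; the helper name first_cmake_error and the file name build.log are made up for illustration:

```shell
# Hypothetical helper: print the first "CMake Error" line of a saved build
# log, prefixed with its line number. Fails if the log contains none.
first_cmake_error() {
    log_file="$1"
    # The trailing `grep .` makes the pipeline fail when nothing was found.
    grep -n 'CMake Error' "$log_file" | head -n 1 | grep .
}

# Usage (append the same options as in the build command above):
#   mvn clean package -DskipTests -Pdist,native -e -X 2>&1 | tee build.log
#   first_cmake_error build.log
```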
5 - Problems encountered and how I solved them
5.1 The hadoop-common module fails to compile
[WARNING] CMake Warning (dev) at CMakeLists.txt:47 (find_package):
[WARNING] Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
[WARNING] Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy
[WARNING] command to set the policy and suppress this warning.
[WARNING]
[WARNING] Environment variable ZLIB_ROOT is set to:
[WARNING]
[WARNING] /usr/local/Cellar/zlib/1.2.11/
[WARNING]
[WARNING] For compatibility, CMake is ignoring the variable.
[WARNING] This warning is for project developers. Use -Wno-dev to suppress it.
[WARNING]
[WARNING] CMake Error at /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
[WARNING] Could NOT find ZLIB (missing: ZLIB_LIBRARY) (found version "1.2.11")
[WARNING] Call Stack (most recent call first):
[WARNING] /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
[WARNING] /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindZLIB.cmake:120 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
[WARNING] CMakeLists.txt:47 (find_package)
The error says that ZLIB_LIBRARY cannot be found and that ZLIB_ROOT was ignored. Here are my environment variables:
export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11
export ZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.11/lib
export ZLIB_INCLUDE_DIR=/usr/local/Cellar/zlib/1.2.11/include
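After some searching, I learned what <PackageName>_ROOT variables do in CMake 3.12 and later: under policy CMP0074, find_package() uses them as search prefixes, but while the policy is not set they are ignored, which is exactly what the warning above reports.
See also this analysis (https://github.com/MarkDana/Compile-Hadoop2.2.0-on-MacOS), which looks at CMake's own FindZLIB module: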
cd /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules
vim FindZLIB.cmake
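So setting ZLIB_ROOT alone is enough. For the variable to take effect, CMake's CMP0074 policy must be enabled in CMakeLists.txt.
Edit the CMake configuration of the failing project and enable the new behavior:
vim hadoop-common-project/hadoop-common/src/CMakeLists.txt
# Add this line after cmake_minimum_required(VERSION 3.1 FATAL_ERROR):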
cmake_policy(SET CMP0074 NEW)
Finally, only this line needs to remain among the environment variables:
export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11
With that, the error went away.
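The CMakeLists.txt change from this section can also be applied with a script, which is convenient when the source tree is re-extracted. This is a sketch only: the helper name add_cmp0074 is made up, and it assumes the cmake_minimum_required(...) call starts at the beginning of a line, as in the comment above:

```shell
# Hypothetical helper: print a CMakeLists.txt with the CMP0074 policy line
# added directly after the first cmake_minimum_required(...) call.
# Input that already sets the policy is printed unchanged.
add_cmp0074() {
    cmake_file="$1"
    if grep -q 'cmake_policy(SET CMP0074 NEW)' "$cmake_file"; then
        cat "$cmake_file"
        return 0
    fi
    awk '
        { print }
        !done && /^[[:space:]]*cmake_minimum_required[[:space:]]*\(/ {
            print "cmake_policy(SET CMP0074 NEW)"
            done = 1
        }
    ' "$cmake_file"
}

# Usage:
#   f=hadoop-common-project/hadoop-common/src/CMakeLists.txt
#   add_cmp0074 "$f" > "$f.new" && mv "$f.new" "$f"
```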
5.2 The hadoop-common module still fails
[WARNING] CMake Warning (dev) in CMakeLists.txt:
[WARNING] No project() command is present. The top-level CMakeLists.txt file must
[WARNING] contain a literal, direct call to the project() command. Add a line of
[WARNING] code such as
[WARNING]
[WARNING] project(ProjectName)
[WARNING]
[WARNING] near the top of the file, but after cmake_minimum_required().
[WARNING]
[WARNING] CMake is pretending there is a "project(Project)" command on the first
[WARNING] line.
[WARNING] This warning is for project developers. Use -Wno-dev to suppress it.
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:68 (message):
[WARNING] Required bzip2 library and/or header files could not be found.
[WARNING]
[WARNING]
[WARNING] -- Configuring incomplete, errors occurred!
The bzip2 library or its header files cannot be found... yet my bzip2 environment variables were all set:
export BZIP2_ROOT=/usr/local/Cellar/bzip2/1.0.8
export BZIP2_INCLUDE_DIR=/usr/local/Cellar/bzip2/1.0.8/include
export BZIP2_LIBRARY=/usr/local/Cellar/bzip2/1.0.8
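None of the other environment variables or LDFLAGS/CPPFLAGS settings I tried had any effect.
After a lot of searching, my impression was that the bzip2 check simply cannot pass on macOS. So I changed the check in CMakeLists.txt to skip it:
# Change the line below to test REQUIRE_BZIP2 directly, which is TRUE in this build
# if(BZIP2_INCLUDE_DIR AND BZIP2_LIBRARIES)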
if(REQUIRE_BZIP2)
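Before patching the check away, it may be worth confirming that the header and library really exist under the prefix passed to the build, because the patch only hides the detection failure. The sketch below does not fix the CMake error; it only verifies the files on disk. The helper name check_bzip2_prefix is made up, while bzlib.h and libbz2 are the standard bzip2 file names:

```shell
# Hypothetical helper: report whether a prefix contains the bzip2 header and
# a libbz2 library file. Prints what is missing; fails if anything is.
check_bzip2_prefix() {
    prefix="$1"
    status=0
    if [ ! -f "$prefix/include/bzlib.h" ]; then
        echo "missing header: $prefix/include/bzlib.h"
        status=1
    fi
    found_lib=0
    for lib in "$prefix"/lib/libbz2.*; do
        if [ -e "$lib" ]; then
            found_lib=1
        fi
    done
    if [ "$found_lib" -eq 0 ]; then
        echo "missing library: $prefix/lib/libbz2.*"
        status=1
    fi
    return "$status"
}

# Usage: check_bzip2_prefix /usr/local/Cellar/bzip2/1.0.8
```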
5.3 The MapReduce NativeTask module fails to compile
[WARNING] 2 warnings and 12 errors generated.
[WARNING] make[2]: *** [CMakeFiles/nttest.dir/main/native/test/TestCompressions.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/nttest.dir/all] Error 2
[WARNING] make: *** [all] Error 2
......
[INFO] Apache Hadoop MapReduce NativeTask ................. FAILURE [ 21.506 s]
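After searching, I learned that the snappy installed by brew is the latest version, 1.1.9, which is built with C++11, whereas the Hadoop 3.2.1 build does not support C++11.
Along the way I also tried installing snappy 1.1.5, but the build still failed: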
[WARNING] CMake Error at CMakeLists.txt:96 (message):
[WARNING] Required snappy library could not be found.
[WARNING] SNAPPY_LIBRARY=SNAPPY_LIBRARY-NOTFOUND, SNAPPY_INCLUDE_DIR=,
[WARNING] CUSTOM_SNAPPY_INCLUDE_DIR=, CUSTOM_SNAPPY_PREFIX=, CUSTOM_SNAPPY_INCLUDE=
I tried adding environment variables, but they had no effect:
export SNAPPY_LIBRARY=/usr/local/Cellar/snappy/1.1.5
export SNAPPY_INCLUDE_DIR=/usr/local/Cellar/snappy/1.1.5/include
# The same error is still reported:
Required snappy library could not be found.
[WARNING] SNAPPY_LIBRARY=SNAPPY_LIBRARY-NOTFOUND, SNAPPY_INCLUDE_DIR=,
[WARNING] CUSTOM_SNAPPY_INCLUDE_DIR=, CUSTOM_SNAPPY_PREFIX=, CUSTOM_SNAPPY_INCLUDE=
So I installed snappy 1.1.4, as described in section 3, tested again, and the build finally succeeded ✌️
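Because the snappy version matters so much here, it can help to check which version's headers the compiler will actually see. The sketch below reads the version from a snappy header; it assumes the header (snappy-stubs-public.h in the releases I have seen) contains "#define SNAPPY_MAJOR n" style lines, and both the helper name and the header path in the usage line are assumptions:

```shell
# Hypothetical helper: read MAJOR.MINOR.PATCHLEVEL from a snappy header.
# Assumes the header has "#define SNAPPY_MAJOR <n>" style lines; fails if
# any of the three defines is absent.
snappy_header_version() {
    header="$1"
    awk '
        $1 == "#define" && $2 == "SNAPPY_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "SNAPPY_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "SNAPPY_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "" || minor == "" || patch == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}

# Usage: snappy_header_version /usr/local/include/snappy-stubs-public.h
```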
6 - Build succeeded: testing and verification
The build command is given in section 4 above. Here is the proof of the successful build ✌️
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 1.893 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 4.338 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 1.560 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 2.337 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.359 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 1.777 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 3.786 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 0.951 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 6.846 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 1.994 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [ 54.530 s]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 3.630 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 5.173 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.118 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [ 27.638 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [ 31.633 s]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [02:38 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 4.768 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 1.722 s]
[INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [ 5.303 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.042 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [ 0.054 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [ 6.738 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [ 9.302 s]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [ 2.945 s]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [ 0.133 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [ 8.103 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 40.942 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [ 1.310 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [ 2.386 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [ 1.992 s]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [ 12.021 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [ 1.714 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [ 2.445 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [ 1.740 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [ 1.592 s]
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [ 0.061 s]
[INFO] Apache Hadoop YARN TimelineService HBase Common .... SUCCESS [ 2.382 s]
[INFO] Apache Hadoop YARN TimelineService HBase Client .... SUCCESS [ 2.167 s]
[INFO] Apache Hadoop YARN TimelineService HBase Servers ... SUCCESS [ 0.124 s]
[INFO] Apache Hadoop YARN TimelineService HBase Server 1.2 SUCCESS [ 2.625 s]
[INFO] Apache Hadoop YARN TimelineService HBase tests ..... SUCCESS [ 3.917 s]
[INFO] Apache Hadoop YARN Router .......................... SUCCESS [ 1.785 s]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [ 0.119 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [ 1.679 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [ 1.112 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [ 0.196 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [ 5.185 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [ 2.387 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [ 1.852 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [ 3.299 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [ 1.948 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [ 3.972 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 1.252 s]
[INFO] Apache Hadoop YARN Services ........................ SUCCESS [ 0.040 s]
[INFO] Apache Hadoop YARN Services Core ................... SUCCESS [ 2.626 s]
[INFO] Apache Hadoop YARN Services API .................... SUCCESS [ 1.434 s]
[INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [ 0.980 s]
[INFO] Yet Another Learning Platform ...................... SUCCESS [ 1.346 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [ 0.044 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [ 0.069 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [ 9.978 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [ 0.671 s]
[INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [ 39.343 s]
[INFO] Apache Hadoop MapReduce Uploader ................... SUCCESS [ 0.862 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 1.086 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [ 4.303 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 0.906 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 1.362 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 0.496 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [ 0.599 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 1.424 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 0.968 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 0.466 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 0.543 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 4.609 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 0.960 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 3.611 s]
[INFO] Apache Hadoop Kafka Library support ................ SUCCESS [ 0.838 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 2.333 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 0.449 s]
[INFO] Apache Hadoop Client Aggregator .................... SUCCESS [ 3.429 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 2.420 s]
[INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [ 1.536 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 0.592 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 10.087 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.048 s]
[INFO] Apache Hadoop Client API ........................... SUCCESS [01:24 min]
[INFO] Apache Hadoop Client Runtime ....................... SUCCESS [01:05 min]
[INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [ 0.262 s]
[INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [02:00 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [ 0.174 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 0.151 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 22.983 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [ 0.053 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 0.607 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [ 0.054 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:07 min
[INFO] Finished at: 2021-06-30T00:10:40+08:00
[INFO] Final Memory: 406M/2110M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "dev" could not be activated because it does not exist.
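The compiled distribution is in this directory:
${source}/hadoop-dist/target/hadoop-3.2.1
# The native libraries we need are here:
${source}/hadoop-dist/target/hadoop-3.2.1/lib/native
Copy the files under native into the installation directory of your existing Hadoop cluster, then check its native library support (for example with hadoop checknative -a):
No more annoying WARN messages, and zlib, snappy, and the other compression features are all available ✌️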
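The copy step can be wrapped in a small function so it is easy to repeat after a rebuild. A sketch, with a made-up helper name (install_native_libs) and with SRC standing in for wherever the source tree was extracted:

```shell
# Hypothetical helper: copy the contents of a freshly built lib/native
# directory into the lib/native directory of an existing Hadoop installation.
install_native_libs() {
    built_native="$1"      # the lib/native directory produced by the build
    hadoop_home="$2"       # the installation to update, e.g. $HADOOP_HOME
    if [ ! -d "$built_native" ]; then
        echo "no such directory: $built_native" >&2
        return 1
    fi
    mkdir -p "$hadoop_home/lib/native" &&
        cp -R "$built_native"/. "$hadoop_home/lib/native/"
}

# Usage:
#   install_native_libs "$SRC/hadoop-dist/target/hadoop-3.2.1/lib/native" "$HADOOP_HOME"
#   hadoop checknative -a
```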
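7 - Lessons learned
1) Build on CentOS if you can. On macOS, most of the native libraries fail to build out of the box, and the build gets stuck at the CMake step.
2) The additional environment variables I used are:
# To build Hadoop natively, ZLIB_ROOT must be set, and CMake's CMP0074 policy must be enabled in CMakeLists.txt: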
export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11
# export ZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.11/lib
# export ZLIB_INCLUDE_DIR=/usr/local/Cellar/zlib/1.2.11/include
export OPENSSL_ROOT_DIR="/usr/local/opt/openssl@1.1"
export OPENSSL_INCLUDE_DIR="$OPENSSL_ROOT_DIR/include"
export PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"