Hadoop 3.4.0 Installation and WordCount Example
1. Download Hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
The download proceeds as follows:
--2024-10-17 10:13:48-- https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
Resolving archive.apache.org (archive.apache.org)... 65.108.204.189, 2a01:4f9:1a:a084::2
Connecting to archive.apache.org (archive.apache.org)|65.108.204.189|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 965537117 (921M) [application/x-gzip]
Saving to: ‘hadoop-3.4.0.tar.gz’
hadoop-3.4.0.tar.gz 100%[===================>] 920.81M 1.00MB/s in 13m 44s
2024-10-17 10:27:33 (1.12 MB/s) - ‘hadoop-3.4.0.tar.gz’ saved [965537117/965537117]
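Since the archive is close to 1 GB, it may be worth checking its integrity before unpacking. The sketch below is one way to do that and is not part of the original walkthrough. The helper name verify_sha512 is hypothetical, and it assumes the expected digest is copied by hand from the checksum file that Apache normally publishes next to the tarball.

```shell
#!/usr/bin/env bash
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    local file="$1" expected actual
    # Published digests may be upper- or lowercase; normalise to lowercase.
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # sha512sum prints "<digest>  <file>"; keep only the digest.
    actual="$(sha512sum "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}
```

Usage would look like verify_sha512 hadoop-3.4.0.tar.gz "<digest>" && echo OK, where <digest> is a placeholder for the published value.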
2. Extract Hadoop
tar -xzf hadoop-3.4.0.tar.gz
3. Configure environment variables
echo "export HADOOP_HOME=$PWD/hadoop-3.4.0" | sudo tee /etc/profile.d/hadoop.sh
echo 'export PATH=$PATH:$HADOOP_HOME/bin' | sudo tee -a /etc/profile.d/hadoop.sh
Edit .bashrc, then reload it so the changes take effect:
nano .bashrc
source .bashrc
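A point worth checking in the profile script is quoting. Inside double quotes the shell expands $PATH and $HADOOP_HOME when the line is written, not when the script is later sourced. The sketch below illustrates the distinction. The helper name write_hadoop_profile is hypothetical, and it writes to whatever target path it is given, so it can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# write_hadoop_profile TARGET HADOOP_DIR
# Hypothetical helper: writes a two-line profile script to TARGET.
write_hadoop_profile() {
    local target="$1" hadoop_dir="$2"
    {
        # The install directory is known now, so it is expanded at write time.
        printf 'export HADOOP_HOME="%s"\n' "$hadoop_dir"
        # Single quotes keep $PATH and $HADOOP_HOME literal, so they are
        # expanded each time the profile script is sourced.
        printf '%s\n' 'export PATH=$PATH:$HADOOP_HOME/bin'
    } > "$target"
}
```

For example, write_hadoop_profile /tmp/hadoop.sh "$PWD/hadoop-3.4.0" produces a file that can be inspected and then copied to /etc/profile.d/ with sudo.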
4. View the HDFS filesystem
hdfs dfs -ls /
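Output (no cluster is configured at this point, so fs.defaultFS keeps its default of file:/// and the command lists the root of the local filesystem):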
Found 24 items
drwxr-xr-x - root root 40960 2024-10-17 09:40 /bin
drwxr-xr-x - root root 4096 2022-04-18 18:28 /boot
drwxr-xr-x - root root 3540 2024-10-17 08:42 /dev
...
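To confirm which filesystem hdfs dfs is actually talking to, the configured default can be queried with hdfs getconf. The wrapper below is a small sketch. The function name show_default_fs is hypothetical, and it only adds a check that hdfs is on the PATH before calling it.

```shell
#!/usr/bin/env bash
# show_default_fs
# Hypothetical helper: prints the value of fs.defaultFS, or fails with a
# message if the hdfs command cannot be found.
show_default_fs() {
    if command -v hdfs >/dev/null 2>&1; then
        # A file:/// value means paths resolve on the local filesystem.
        hdfs getconf -confKey fs.defaultFS
    else
        echo "hdfs not found on PATH; check HADOOP_HOME and PATH" >&2
        return 1
    fi
}
```

A file:/// result would be consistent with the listing above, which shows local directories such as /bin and /boot.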
5. Run the WordCount example
Create the input directory and files:
mkdir wc-in
echo "bla bla" > wc-in/a.txt
echo "bla wa wa" > wc-in/b.txt
Run the WordCount job:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount wc-in wc-out
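MapReduce refuses to start a job whose output path already exists, so running the command a second time fails until wc-out is removed. The sketch below wraps the same command with an early check. The function name run_wordcount is hypothetical, and the check assumes the local-filesystem setup used in this walkthrough (on a real HDFS the test would be hdfs dfs -test -e instead).

```shell
#!/usr/bin/env bash
# run_wordcount IN_DIR OUT_DIR
# Hypothetical wrapper: refuses to run if OUT_DIR already exists locally.
run_wordcount() {
    local in_dir="$1" out_dir="$2"
    if [ -e "$out_dir" ]; then
        echo "output path '$out_dir' already exists; remove it or choose another" >&2
        return 1
    fi
    # Same job as in the walkthrough, with the paths passed as arguments.
    hadoop jar "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar" \
        wordcount "$in_dir" "$out_dir"
}
```

Calling run_wordcount wc-in wc-out would then behave like the command above on a first run and stop early with a short message on a repeat run.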
6. View the results
View the local output:
ls wc-out/*
cat wc-out/*
Output:
bla 3
wa 2
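As a sanity check, the same counts can be reproduced with standard Unix tools. This is a rough sketch and not how Hadoop computes them. The function name local_wordcount is hypothetical, and it assumes words are separated by whitespace, which matches the small input files used here.

```shell
#!/usr/bin/env bash
# local_wordcount IN_DIR
# Hypothetical helper: counts whitespace-separated words across all files in
# IN_DIR and prints "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    local in_dir="$1"
    cat "$in_dir"/* \
        | tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running local_wordcount wc-in on the two files created earlier should give bla 3 and wa 2, matching the job output.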
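View the output through the HDFS client (with the default local filesystem, wc-out resolves to the same local directory):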
hdfs dfs -cat wc-out/*
Output:
bla 3
wa 2