NTFY 超得屁(°∀°)ノ

Hadoop 3.4.0 Installation and WordCount Example

1. Download Hadoop

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz

The download output looks like this:

--2024-10-17 10:13:48--  https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
Resolving archive.apache.org (archive.apache.org)... 65.108.204.189, 2a01:4f9:1a:a084::2
Connecting to archive.apache.org (archive.apache.org)|65.108.204.189|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 965537117 (921M) [application/x-gzip]
Saving to: ‘hadoop-3.4.0.tar.gz’
hadoop-3.4.0.tar.gz                                         100%[========================================================================================================================================>] 920.81M  1.00MB/s    in 13m 44s
2024-10-17 10:27:33 (1.12 MB/s) - ‘hadoop-3.4.0.tar.gz’ saved [965537117/965537117]
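
A quick sanity check before extracting is to compare the size of the downloaded file with the length wget reported (965537117 bytes in the log above). The sketch below is only an illustration; check_size is a hypothetical helper name, not part of wget or Hadoop.

```shell
# check_size FILE EXPECTED_BYTES
# Hypothetical helper: succeeds only if FILE is exactly EXPECTED_BYTES long.
check_size() {
    file=$1
    expected=$2
    # wc -c prints the byte count; tr strips padding that some platforms add
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file is $actual bytes"
    else
        echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
        return 1
    fi
}
```

For this download the call would be: check_size hadoop-3.4.0.tar.gz 965537117. A matching size only rules out a truncated download; comparing against the checksum published alongside the release is a stronger check.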

2. Extract Hadoop

tar -xzf hadoop-3.4.0.tar.gz

3. Configure environment variables

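# sudo does not apply to the shell's ">" redirection, so write the file through "sudo tee" instead
echo "export HADOOP_HOME=$PWD/hadoop-3.4.0" | sudo tee /etc/profile.d/hadoop.sh > /dev/null
# single quotes keep $PATH and $HADOOP_HOME unexpanded until the profile script is sourced
echo 'export PATH=$PATH:$HADOOP_HOME/bin' | sudo tee -a /etc/profile.d/hadoop.sh > /dev/null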

Update the .bashrc file and reload it so the changes take effect:

nano ~/.bashrc
source ~/.bashrc
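
Before moving on, it can help to confirm that the variables are actually visible in the current shell. The following is a minimal sketch; check_hadoop_env is a hypothetical helper name, and it only checks that HADOOP_HOME points at a directory containing an executable bin/hadoop.

```shell
# check_hadoop_env
# Hypothetical helper: verifies HADOOP_HOME is set and contains bin/hadoop.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable at $HADOOP_HOME/bin/hadoop" >&2
        return 1
    fi
    echo "HADOOP_HOME=$HADOOP_HOME"
}
```

A typical use would be: check_hadoop_env && hadoop version. Hadoop also needs a Java runtime, so a failure of "hadoop version" after this check passes usually points at the Java setup instead.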

4. Inspect the HDFS file system

hdfs dfs -ls /

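The output is shown below. Because no cluster has been configured, Hadoop runs in its default local (standalone) mode, where fs.defaultFS is file:///, so the listing shows the local root file system rather than a real HDFS namespace: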

Found 24 items
drwxr-xr-x   - root root      40960 2024-10-17 09:40 /bin
drwxr-xr-x   - root root       4096 2022-04-18 18:28 /boot
drwxr-xr-x   - root root       3540 2024-10-17 08:42 /dev
...
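
To see which file system the hdfs client is actually talking to, the effective value of fs.defaultFS can be printed with "hdfs getconf". The wrapper below is a small sketch; show_default_fs is a hypothetical name, and on an unconfigured installation the value is expected to be file:///.

```shell
# show_default_fs
# Hypothetical helper: prints the effective fs.defaultFS setting.
show_default_fs() {
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs not found on PATH" >&2
        return 1
    fi
    # getconf -confKey prints the effective value of a single configuration key
    hdfs getconf -confKey fs.defaultFS
}
```

A value starting with hdfs:// would instead mean the client is pointed at a NameNode, and "hdfs dfs -ls /" would list that cluster's root.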

5. Run the WordCount example

Create the input directory and files:

mkdir wc-in
echo "bla bla" > wc-in/a.txt
echo "bla wa wa" > wc-in/b.txt

Run the WordCount job:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount wc-in wc-out
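
MapReduce refuses to start a job whose output directory already exists, so running the command a second time fails until wc-out is removed. One way to make reruns painless is sketched below; run_wordcount is a hypothetical wrapper name, and it assumes HADOOP_HOME is set as in step 3.

```shell
# run_wordcount INPUT_DIR OUTPUT_DIR
# Hypothetical wrapper: clears any previous output, then runs the example job.
run_wordcount() {
    in_dir=$1
    out_dir=$2
    jar="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar"
    # -r removes recursively; -f suppresses the error if the path does not exist yet
    hdfs dfs -rm -r -f "$out_dir"
    hadoop jar "$jar" wordcount "$in_dir" "$out_dir"
}
```

With the directories from this walkthrough the call would be: run_wordcount wc-in wc-out. Note that this deletes the previous results, so copy them elsewhere first if they are still needed.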

6. View the results

View the local output:

ls wc-out/*
cat wc-out/*

Output:

bla     3
wa      2
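
For an input this small, the counts can be cross-checked without Hadoop using standard Unix tools. The sketch below splits each line on whitespace and tallies the words; local_wordcount is a hypothetical helper name, and it is only meant as a sanity check, not a replacement for the job.

```shell
# local_wordcount DIR
# Hypothetical helper: counts whitespace-separated words across all files in DIR
# and prints "word<TAB>count" lines sorted by word.
local_wordcount() {
    cat "$1"/* |
        awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
             END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
        LC_ALL=C sort
}
```

Running local_wordcount wc-in on the two files created in step 5 should give the same two lines as the job output: bla with 3 and wa with 2.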

View the output through the HDFS client (in this local-mode setup it reads the same wc-out directory):

hdfs dfs -cat wc-out/*

Output:

bla     3
wa      2
10-24 10:53