[Spark] 00 - Install Hadoop & Spark
You may choose to install Spark, YARN, Hive, etc. one by one. Here, however, we introduce how to install and configure a big-data environment automatically, and explain why CDH exists.
Resources
PySpark installation on CDH: https://blog.csdn.net/weixin_43215250/article/details/89186733
Background
Cluster
Ubuntu 18.04
node00, 192.168.56.1
CentOS 7.7
node01, 192.168.56.100
node02, 192.168.56.110
node03, 192.168.56.120
Packages
(base) [hadoop@node01 soft]$ ll
total
-rwxrwxrwx hadoop hadoop Nov : Anaconda3-2019.07-Linux-x86_64.sh
-rw-r--r-- hadoop hadoop Nov : CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel
-rwxrwxr-x hadoop hadoop Nov : CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
-rw-r--r-- hadoop hadoop Nov : CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1
-rw-r--r-- hadoop hadoop Nov : cloudera-manager-centos7-cm5.14.2_x86_64.tar.gz
-rw-rw-r-- hadoop hadoop Nov : jdk-8u202-linux-x64.tar.gz
-rw-r--r-- hadoop hadoop Nov : jdk-8u211-linux-x64.rpm
drwxr-xr-x hadoop hadoop Nov : kafka (Kafka parcel packages)
-rw-r--r-- hadoop hadoop Nov : manifest.json
-rw-r--r-- hadoop hadoop Nov : maxwell-1.22.1.tar.gz
-rw-r--r-- root root Apr mysql57-community-release-el7-10.noarch.rpm
-rw-r--r-- hadoop hadoop Nov : mysql-connector-java.jar
-rw-rw-r-- hadoop hadoop Nov : zookeeper-3.4.5-cdh5.14.2.tar.gz
Data Pipeline
1. Motivating the Problem
QPS: Queries Per Second
When a single table grows to 4-5 million rows, performance degrades noticeably, so we turn to "splitting databases and tables" (sharding). But doesn't that mean splitting into far too many tables?
- Secondary indexes: only effective within a single shard.
- Database-splitting strategy: cross-shard JOINs no longer work; neither do COUNT, ORDER BY, or GROUP BY.
- Scaling strategy: further growth requires another round of horizontal splitting and data migration.
Read/Write Splitting
Ref: What is database read/write splitting?
Ref: MySQL master-slave replication for read/write splitting
Most Internet workloads are read-heavy and write-light, so reads become the database bottleneck first. If we want to scale read performance linearly, and to improve write performance by eliminating read/write lock contention, we can adopt a "group architecture" (read/write splitting).
In one sentence: read/write splitting addresses the database's read-performance bottleneck.
It also introduces some problems:
- Connection pools must be split into read pools and write pools, which increases development effort;
- For high availability, the read pool must support automatic failover;
- Master-slave consistency must be considered.
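To make the first two problems concrete, here is a minimal Python sketch of a read/write router with failover. The pool names (`node02-primary`, `node02-replica1`, ...) are hypothetical labels for this illustration; a real implementation would route actual connection-pool handles.

```python
import random

class ReadWriteRouter:
    """Route writes to the primary and reads to a replica pool.

    Hypothetical sketch: 'primary' and 'replicas' stand in for real
    connection pools; here they are just string labels.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def route(self, sql):
        # Statements that modify data must go to the primary.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
            return self.primary
        # Reads can be spread across the replicas.
        return random.choice(self.replicas)

    def fail_over(self, dead_replica):
        # For high availability, a dead replica is dropped from the pool.
        self.replicas = [r for r in self.replicas if r != dead_replica]


router = ReadWriteRouter("node02-primary", ["node02-replica1", "node02-replica2"])
print(router.route("UPDATE users SET name='a' WHERE id=1"))  # node02-primary
```

Note the third problem (master-slave consistency) is not solved by routing alone: a read immediately after a write may still see stale data on a replica.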
Caching Strategy
When choosing between a cache and read/write splitting, consider the cache first.
- A cache costs far less to operate than a read replica;
- Caching is easier to develop: most reads hit the cache first and fall through to the database only on a miss.
- Of course, if a cache is already in place and reads are still the bottleneck, then choose the read/write-splitting architecture. In short, read/write splitting can be seen as the fallback for when caching alone cannot cope.
- Caching is not without drawbacks either: high availability is mandatory, because if the cache goes down, all traffic converges on the database at once and the database will certainly go down with it.
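The cache-first read path described above is the classic cache-aside pattern. A minimal sketch follows; the dict standing in for the database is hypothetical, and a real deployment would put Redis (installed later in this document) in front of MySQL.

```python
class CacheAside:
    """Cache-aside read path: try the cache first, fall through to the
    database on a miss, then populate the cache. The 'db' here is a plain
    dict standing in for the real database (hypothetical)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.db_reads = 0  # count how often we actually hit the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1           # cache miss: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for next time
        return value


store = CacheAside({"user:1": "alice"})
print(store.get("user:1"), store.db_reads)  # alice 1 (first read hits the DB)
print(store.get("user:1"), store.db_reads)  # alice 1 (second read served from cache)
```

This also shows the availability risk: if `self.cache` were to vanish, every `get` would fall through to the database at once.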
Horizontal Sharding
In practice, the common bottleneck is data volume. An orders table, for example, only ever grows, and historical data must be retained, so it easily becomes a performance bottleneck.
Horizontal sharding is another common database architecture: an algorithm partitions the data across multiple databases. Each database in a horizontally sharded cluster is called a "shard". The shards hold disjoint data, and the union of all shards forms the complete dataset.
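A minimal sketch of hash-based shard routing, assuming 4 shards; the shard count and the key choice (`order_id`) are illustrative, not prescribed by any particular system.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(order_id):
    """Map an order id to one of NUM_SHARDS disjoint shards.

    Hash-based routing: the shards partition the key space, so every id
    lands on exactly one shard, and the union of the shards is all data.
    md5 (rather than Python's hash()) keeps the mapping stable across runs.
    """
    digest = hashlib.md5(str(order_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every id maps deterministically to exactly one shard.
placement = {oid: shard_for(oid) for oid in range(10)}
```

The scaling-strategy problem mentioned earlier shows up here: changing `NUM_SHARDS` remaps most keys, which is why resharding requires data migration.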
2. The "Details" of Data Engineering
Why binlog matters
Data analysis must not impact the primary database.
primary DB --> binlog files --> Maxwell --> Kafka cluster --> HBase master
Detaching the data-analysis workload in this way relieves the "read/write pressure" that analysis would otherwise put on the primary.
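The pipeline above ships row changes as JSON. Below is a hedged sketch of parsing a Maxwell-style change event and building an HBase-style row key. The top-level fields (`database`, `table`, `type`, `data`) follow Maxwell's documented output format, but the sample payload and the routing helper are hypothetical.

```python
import json

# A Maxwell-style change event (hypothetical sample payload; real Maxwell
# emits one JSON object per row change with these top-level fields).
event_json = '''{
  "database": "shop", "table": "orders", "type": "insert",
  "ts": 1574800000, "data": {"id": 1, "amount": 99.5}
}'''

def route_event(raw):
    """Decide how a binlog change event maps into the downstream store,
    keeping analysis traffic off the primary database."""
    event = json.loads(raw)
    # e.g. key HBase rows by table name plus the row's primary key
    row_key = f"{event['table']}:{event['data']['id']}"
    return event["type"], row_key

print(route_event(event_json))  # ('insert', 'orders:1')
```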
HBase usage issues
When Spark integrates with HBase, it performs a full table scan by default, which puts pressure on memory.
Kafka usage issues
Kafka offset management: (1) a message may end up processed more than once; (2) every message is consumed at least once (at-least-once semantics).
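At-least-once delivery plus idempotent processing can be sketched in a few lines: commit the offset only after processing, and deduplicate by message id so redelivered messages are not processed twice. This is a toy in-memory model of the semantics, not the Kafka client API.

```python
def consume(messages, processed_ids, results):
    """At-least-once consumption: the offset is committed only AFTER
    processing, so a crash between processing and commit redelivers the
    message. Deduplicating by message id makes processing idempotent."""
    committed_offset = 0
    for offset, (msg_id, payload) in enumerate(messages):
        if msg_id not in processed_ids:   # skip duplicates on redelivery
            results.append(payload)
            processed_ids.add(msg_id)
        committed_offset = offset + 1     # commit after processing
    return committed_offset

# Simulated redelivery: message id 2 arrives twice, as it would after a
# crash between processing and offset commit.
msgs = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
out = []
consume(msgs, set(), out)
print(out)  # ['a', 'b', 'c'] - the duplicate is suppressed
```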
Front-end/Back-end Separation
The REST architectural style.
Goto: Understanding the true REST architectural style
Cloudera Cluster Setup
1. Cloudera Manager
Server
--> Management Service
--> Database
Agent
...
2. Preparation
Run the following on all three machines.

Firewall:
systemctl stop firewalld
systemctl disable firewalld

Disable the SELinux security subsystem:
vim /etc/selinux/config
SELINUX=disabled

Configure the time zone (Asia/Shanghai):
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Synchronize the time. If the step below fails, first set the clock manually with something like: date -s '2019-11-11 11:55:55'
On all three machines, run the following to periodically sync the time against a cloud NTP server:
yum -y install ntpdate
crontab -e
*/1 * * * * /usr/sbin/ntpdate time1.aliyun.com
On Ubuntu, create a root user and set up passwordless SSH login.
(base) hadoop@unsw-ThinkPad-T490:/kkb/soft$ su root
Password:
su: Authentication failure
(base) hadoop@unsw-ThinkPad-T490:/soft$ sudo passwd root
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
(base) hadoop@unsw-ThinkPad-T490:/soft$ su
Password:
root@unsw-ThinkPad-T490:/soft#
Prepare the Cloudera installation package, the JDK package, and the dependencies below:
yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb
3. Configure the MySQL scm User
Install the MySQL database on node02 only, i.e. on a different node from the CM server.
[root@node02 ~]# yum -y install mysql57-community-release-el7-10.noarch.rpm
[root@node02 ~]# yum -y install mysql-community-server
# the mariadb database can no longer be found; it has been replaced.
[root@node02 ~]# rpm -qa|grep mariadb
You have new mail in /var/spool/mail/root
[Service] Configure the MySQL database service.
[root@node02 hadoop]# systemctl start mysqld.service
[root@node02 hadoop]# systemctl status mysqld.service
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2019-11-24 10:00:18 AEDT; 2h 58min ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Main PID: 1311 (mysqld)
CGroup: /system.slice/mysqld.service
└─1311 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Nov 24 10:00:17 node02.kaikeba.com systemd[1]: Starting MySQL Server...
Nov 24 10:00:18 node02.kaikeba.com systemd[1]: Started MySQL Server.
Allow the root user remote access to the database.
[root@node02 ~]# mysql -u root -p
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| maxwell |
| mysql |
| #mysql50#mysql-bin |
| performance_schema |
| scm |
| sys |
+--------------------+
7 rows in set (0.09 sec)

mysql> update mysql.user set Grant_priv='Y',Super_priv='Y' where user = 'root' and host = '%';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> quit
[root@node02 ~]# systemctl restart mysqld.service
[root@node01 ~]# cp mysql-connector-java.jar /opt/cm-5.14.2/share/cmf/lib/
Manually create this directory on all nodes.
[root@node01 ~]# mkdir /opt/cm-5.14.2/run/cloudera-scm-agent
Create the cloudera-scm user on all nodes:
useradd --system --home=/opt/cm-5.14.2/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Remotely create the scm user. If the database already contains an scm database, drop it first, then create as follows.
[root@node01 ~]# /opt/cm-5.14.2/share/cmf/schema/scm_prepare_database.sh mysql -h node02 -uroot -p'<pwd>' --scm-host node01 scm scm '<pwd>'
On all nodes, point the agent at the Cloudera master:
[root@node01 ~]# vi /opt/cm-5.14.2/etc/cloudera-scm-agent/config.ini
[General]
# change this to node01
server_host=node01
On all nodes, grant cloudera-scm ownership:
[root@node01 ~]# chown -R cloudera-scm:cloudera-scm /opt/cloudera
[root@node01 ~]# chown -R cloudera-scm:cloudera-scm /opt/cm-5.14.2
4. Start the Cloudera Web UI
[Service] Start the server on node01.
[root@node01 ~]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
Wait until port 7180 is listening.
[root@node01 opt]# ps -ef | grep scm-server
root 20411 1 77 15:17 pts/0 00:02:46 /kkb/install/jdk1.8.0_202/bin/java -cp .:lib/*:/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar -server -Dlog4j.configuration=file:/opt/cm-5.14.2/etc/cloudera-scm-server/log4j.properties -Dfile.encoding=UTF-8 -Dcmf.root.logger=INFO,LOGFILE -Dcmf.log.dir=/opt/cm-5.14.2/log/cloudera-scm-server -Dcmf.log.file=cloudera-scm-server.log -Dcmf.jetty.threshhold=WARN -Dcmf.schema.dir=/opt/cm-5.14.2/share/cmf/schema -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dpython.home=/opt/cm-5.14.2/share/cmf/python -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError -Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:OnOutOfMemoryError=kill -9 %p com.cloudera.server.cmf.Main
root 21270 9383 0 15:20 pts/0 00:00:00 grep --color=auto scm-server
[root@node01 opt]# netstat -anpl | grep 20411
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 20411/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.110:55332 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47974 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.100:42990 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47230 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47228 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47972 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47976 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47234 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:7182 192.168.56.120:53120 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47232 192.168.56.110:3306 ESTABLISHED 20411/java
tcp 0 0 192.168.56.100:47226 192.168.56.110:3306 ESTABLISHED 20411/java
unix 2 [ ] STREAM CONNECTED 81363 20411/java
unix 2 [ ] STREAM CONNECTED 82057 20411/java
[Service] Start the agent on every node.
[root@node01 ~]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-agent start
Starting cloudera-scm-agent: [ OK ]
Log into the management UI as admin. Common issues are listed below.
Watch the directory permissions; delete this cm_guid file and restart the agent service:
rm /opt/cm-5.14.2/lib/cloudera-scm-agent/cm_guid -rf
/opt/cm-5.14.2/etc/init.d/cloudera-scm-agent restart
If the installation is interrupted midway: reboot, disable the firewall, drop and recreate the scm database in MySQL, delete the previously distributed data, then start the server & agent again.
[root@node01 ~]# hadoop jar /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/words /test/output3
19/11/27 11:41:51 INFO client.RMProxy: Connecting to ResourceManager at node03.kaikeba.com/192.168.56.120:8032
19/11/27 11:41:52 INFO input.FileInputFormat: Total input paths to process : 1
19/11/27 11:41:52 INFO mapreduce.JobSubmitter: number of splits:1
19/11/27 11:41:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1574760305839_0002
19/11/27 11:41:52 INFO impl.YarnClientImpl: Submitted application application_1574760305839_0002
19/11/27 11:41:52 INFO mapreduce.Job: The url to track the job: http://node03.kaikeba.com:8088/proxy/application_1574760305839_0002/
19/11/27 11:41:52 INFO mapreduce.Job: Running job: job_1574760305839_0002
19/11/27 11:41:58 INFO mapreduce.Job: Job job_1574760305839_0002 running in uber mode : false
19/11/27 11:41:58 INFO mapreduce.Job: map 0% reduce 0%
19/11/27 11:42:03 INFO mapreduce.Job: map 100% reduce 0%
19/11/27 11:42:07 INFO mapreduce.Job: map 100% reduce 100%
19/11/27 11:42:08 INFO mapreduce.Job: Job job_1574760305839_0002 completed successfully
19/11/27 11:42:08 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=45
FILE: Number of bytes written=298317
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=119
HDFS: Number of bytes written=17
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2744
Total time spent by all reduces in occupied slots (ms)=2409
Total time spent by all map tasks (ms)=2744
Total time spent by all reduce tasks (ms)=2409
Total vcore-milliseconds taken by all map tasks=2744
Total vcore-milliseconds taken by all reduce tasks=2409
Total megabyte-milliseconds taken by all map tasks=2809856
Total megabyte-milliseconds taken by all reduce tasks=2466816
Map-Reduce Framework
Map input records=1
Map output records=2
Map output bytes=21
Map output materialized bytes=41
Input split bytes=106
Combine input records=2
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=41
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=121
CPU time spent (ms)=1290
Physical memory (bytes) snapshot=622190592
Virtual memory (bytes) snapshot=5582626816
Total committed heap usage (bytes)=505413632
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=13
File Output Format Counters
Bytes Written=17
Hadoop WordCount test log
5. Installing Spark Separately
The Spark version bundled with CDH may be too old.
[root@node01 hadoop]# ls
manifest.json
SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel
SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1
SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/csd/
[root@node01 hadoop]# cp SPARK2_ON_YARN-2.2.0.cloudera1.jar /opt/cloudera/csd/
[root@node01 hadoop]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/csd/
SPARK2_ON_YARN-2.2.0.cloudera1.jar
[root@node01 hadoop]# ls /opt/cloudera/parcel-repo/
bap_manifest.json KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.sha
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.torrent
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.torrent manifest.json
[root@node01 hadoop]# mv /opt/cloudera/parcel-repo/manifest.json /opt/cloudera/parcel-repo/manifest.json.bak
[root@node01 hadoop]# ls /opt/cloudera/parcel-repo/
bap_manifest.json KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.sha
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha KAFKA-3.1.0-1.3.1.0.p0.35-el7.parcel.torrent
CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.torrent manifest.json.bak
[root@node01 hadoop]# cp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel manifest.json /opt/cloudera/parcel-repo/
[root@node01 hadoop]#
First stop the cluster, run the commands below, then start the cluster again. Afterwards, under Cluster -> Parcels you will see the newly added Spark2 parcel; distribute and activate it.
[root@node01 hadoop]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-agent restart
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
[root@node01 hadoop]# /opt/cm-5.14.2/etc/init.d/cloudera-scm-server restart
Stopping cloudera-scm-server: [ OK ]
Starting cloudera-scm-server: [ OK ]
Ref: Detailed steps for offline installation of Spark 2.2.0 on CDH 5.14.4
Further configuration is still required, since Spark2 depends on Hadoop.
# copy the configuration files
cp /opt/cloudera/parcels/CDH/etc/spark/conf.dist/* /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/
# edit the spark-env.sh file
vim /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/spark-env.sh
Add to /opt/cloudera/parcels/SPARK2/etc/spark2/conf.dist/spark-env.sh:
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$(hadoop classpath)"
HADOOP_CONF_DIR=/etc/hadoop/conf
Test whether the Spark installation succeeded.
[root@node01 ~]# spark2-submit --deploy-mode client --conf spark.ui.port=4041 --class org.apache.spark.examples.SparkPi /opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/examples/jars/spark-examples_2.11-2.1.0.cloudera1.jar 10
19/11/27 13:56:58 INFO spark.SparkContext: Running Spark version 2.1.0.cloudera1
19/11/27 13:56:58 INFO spark.SecurityManager: Changing view acls to: root,hdfs
19/11/27 13:56:58 INFO spark.SecurityManager: Changing modify acls to: root,hdfs
19/11/27 13:56:58 INFO spark.SecurityManager: Changing view acls groups to:
19/11/27 13:56:58 INFO spark.SecurityManager: Changing modify acls groups to:
19/11/27 13:56:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, hdfs); groups with view permissions: Set(); users with modify permissions: Set(root, hdfs); groups with modify permissions: Set()
19/11/27 13:56:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 42324.
19/11/27 13:56:59 INFO spark.SparkEnv: Registering MapOutputTracker
19/11/27 13:56:59 INFO spark.SparkEnv: Registering BlockManagerMaster
19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/11/27 13:56:59 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-006c0a44-497e-4cc6-b3bd-91c133c9fb83
19/11/27 13:56:59 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
19/11/27 13:56:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/11/27 13:56:59 INFO util.log: Logging initialized @2260ms
19/11/27 13:56:59 INFO server.Server: jetty-9.2.z-SNAPSHOT
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@791cbf87{/jobs,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/jobs/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/jobs/job,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/stages,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/stage,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/stages/pool,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a486d78{/stages/pool/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/storage,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/storage/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/storage/rdd,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/environment,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/environment/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/executors,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/executors/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/executors/threadDump,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/static,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50687efb{/api,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@517bd097{/jobs/job/kill,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@142eef62{/stages/stage/kill,null,AVAILABLE}
19/11/27 13:56:59 INFO server.ServerConnector: Started ServerConnector@e6516e{HTTP/1.1}{0.0.0.0:4041}
19/11/27 13:56:59 INFO server.Server: Started @2384ms
19/11/27 13:56:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
19/11/27 13:56:59 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.100:4041
19/11/27 13:56:59 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/examples/jars/spark-examples_2.11-2.1.0.cloudera1.jar at spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar with timestamp 1574823419519
19/11/27 13:56:59 INFO executor.Executor: Starting executor ID driver on host localhost
19/11/27 13:56:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33556.
19/11/27 13:56:59 INFO netty.NettyBlockTransferService: Server created on 192.168.56.100:33556
19/11/27 13:56:59 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/11/27 13:56:59 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.100, 33556, None)
19/11/27 13:56:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.56.100:33556 with 366.3 MB RAM, BlockManagerId(driver, 192.168.56.100, 33556, None)
19/11/27 13:56:59 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.100, 33556, None)
19/11/27 13:56:59 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.100, 33556, None)
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24e8de5c{/metrics/json,null,AVAILABLE}
19/11/27 13:56:59 INFO internal.SharedState: Warehouse path is 'file:/root/spark-warehouse/'.
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f6bcf87{/SQL,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78c7f9b3{/SQL/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e93f3d5{/SQL/execution,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a26928a{/SQL/execution/json,null,AVAILABLE}
19/11/27 13:56:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73eb8672{/static/sql,null,AVAILABLE}
19/11/27 13:57:00 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Missing parents: List()
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/11/27 13:57:00 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
19/11/27 13:57:00 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1172.0 B, free 366.3 MB)
19/11/27 13:57:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.100:33556 (size: 1172.0 B, free: 366.3 MB)
19/11/27 13:57:00 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
19/11/27 13:57:00 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
19/11/27 13:57:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
19/11/27 13:57:01 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
19/11/27 13:57:01 INFO executor.Executor: Fetching spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar with timestamp 1574823419519
19/11/27 13:57:01 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
19/11/27 13:57:01 INFO client.TransportClientFactory: Successfully created connection to /192.168.56.100:42324 after 57 ms (0 ms spent in bootstraps)
19/11/27 13:57:01 INFO util.Utils: Fetching spark://192.168.56.100:42324/jars/spark-examples_2.11-2.1.0.cloudera1.jar to /tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76/userFiles-bf97f19c-518c-4bb9-aefe-2b67ab152467/fetchFileTemp69449570899073492.tmp
19/11/27 13:57:01 INFO executor.Executor: Adding file:/tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76/userFiles-bf97f19c-518c-4bb9-aefe-2b67ab152467/spark-examples_2.11-2.1.0.cloudera1.jar to class loader
19/11/27 13:57:01 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Running task 4.0 in stage 0.0 (TID 4)
19/11/27 13:57:01 INFO executor.Executor: Finished task 4.0 in stage 0.0 (TID 4). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO executor.Executor: Running task 5.0 in stage 0.0 (TID 5)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Running task 7.0 in stage 0.0 (TID 7)
19/11/27 13:57:01 INFO executor.Executor: Finished task 5.0 in stage 0.0 (TID 5). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO executor.Executor: Running task 6.0 in stage 0.0 (TID 6)
19/11/27 13:57:01 INFO executor.Executor: Finished task 6.0 in stage 0.0 (TID 6). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO executor.Executor: Finished task 7.0 in stage 0.0 (TID 7). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 629 ms on localhost (executor driver) (1/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 6036 bytes)
19/11/27 13:57:01 INFO executor.Executor: Running task 8.0 in stage 0.0 (TID 8)
19/11/27 13:57:01 INFO executor.Executor: Running task 9.0 in stage 0.0 (TID 9)
19/11/27 13:57:01 INFO executor.Executor: Finished task 9.0 in stage 0.0 (TID 9). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 759 ms on localhost (executor driver) (2/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 129 ms on localhost (executor driver) (3/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 105 ms on localhost (executor driver) (4/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 104 ms on localhost (executor driver) (5/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 52 ms on localhost (executor driver) (6/10)
19/11/27 13:57:01 INFO executor.Executor: Finished task 8.0 in stage 0.0 (TID 8). 1041 bytes result sent to driver
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 721 ms on localhost (executor driver) (7/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 149 ms on localhost (executor driver) (8/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 168 ms on localhost (executor driver) (9/10)
19/11/27 13:57:01 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 59 ms on localhost (executor driver) (10/10)
19/11/27 13:57:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.801 s
19/11/27 13:57:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/11/27 13:57:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.254717 s
Pi is roughly 3.1406751406751408
19/11/27 13:57:01 INFO server.ServerConnector: Stopped ServerConnector@e6516e{HTTP/1.1}{0.0.0.0:4041}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@142eef62{/stages/stage/kill,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@517bd097{/jobs/job/kill,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50687efb{/api,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/static,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/executors/threadDump,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5471388b{/executors/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/executors,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/environment/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/environment,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/storage/rdd,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/storage/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/storage,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/stages/pool/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/stages/pool,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/stage,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/stages,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/jobs/job,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/jobs/json,null,UNAVAILABLE}
19/11/27 13:57:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@791cbf87{/jobs,null,UNAVAILABLE}
19/11/27 13:57:01 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.56.100:4041
19/11/27 13:57:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/11/27 13:57:01 INFO memory.MemoryStore: MemoryStore cleared
19/11/27 13:57:01 INFO storage.BlockManager: BlockManager stopped
19/11/27 13:57:02 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/11/27 13:57:02 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/11/27 13:57:02 INFO spark.SparkContext: Successfully stopped SparkContext
19/11/27 13:57:02 INFO util.ShutdownHookManager: Shutdown hook called
19/11/27 13:57:02 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-185ac05b-8163-4178-b316-b1081fa7cf76
Spark test log
6. Kafka Manager GUI
Download: https://github.com/yahoo/kafka-manager/archive/1.3.1.6.zip
Goto: Installing and configuring kafka-manager
After it starts, log in at http://192.168.56.100:8080/, set the cluster name and Cluster Zookeeper Hosts, and save. Kafka can then be configured through the GUI.
[root@node01 kafka-manager-1.3.1.6]# ls
bin conf lib README.md share
[root@node01 kafka-manager-1.3.1.6]# nohup bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=8080 &
[1] 4115
[root@node01 kafka-manager-1.3.1.6]# nohup: ignoring input and appending output to ‘nohup.out’
[root@node01 kafka-manager-1.3.1.6]#
[root@node01 kafka-manager-1.3.1.6]# netstat -ano|grep 8080
tcp6 0 0 :::8080 :::* LISTEN off (0.00/0/0)
7. Distributed Cache: Redis
Ref: Among distributed caches, why does Redis come out ahead?
Ref: Why must a distributed system have Redis?
Compile first, then copy the build output into /usr/local:
make
make test
mkdir -p /usr/local/redis/bin
mkdir -p /usr/local/redis/etc
cd ./src
cp redis-cli redis-server mkreleasehdr.sh redis-check-aof redis-check-dump redis-benchmark /usr/local/redis/bin
cp ../redis.conf /usr/local/redis/etc
Edit the redis.conf file:
Change the daemonize option from no to yes so Redis can run in the background.
Comment out bind 127.0.0.1 and replace it with bind 0.0.0.0; change protected-mode yes to protected-mode no (protected-mode only exists since version 3.2).
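The three edits above can be expressed as a small string-rewriting sketch (a hypothetical helper; on the real machine you would simply edit the file under /usr/local/redis/etc directly):

```python
def patch_redis_conf(text):
    """Apply the three redis.conf edits described above:
    daemonize yes, bind 0.0.0.0, protected-mode no."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("daemonize"):
            line = "daemonize yes"        # run Redis in the background
        elif stripped.startswith("bind"):
            line = "bind 0.0.0.0"         # listen on all interfaces
        elif stripped.startswith("protected-mode"):
            line = "protected-mode no"    # allow remote connections
        lines.append(line)
    return "\n".join(lines)

sample = "daemonize no\nbind 127.0.0.1\nprotected-mode yes"
print(patch_redis_conf(sample))
```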
[root@node01 bin]#
[root@node01 bin]#
[root@node01 bin]# pwd
/usr/local/redis/bin
[root@node01 bin]# ./redis-server ../etc/redis.conf
31444:C 27 Nov 10:48:10.708 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
31444:C 27 Nov 10:48:10.708 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=31444, just started
31444:C 27 Nov 10:48:10.708 # Configuration loaded
If you still cannot connect, your server's security-group settings most likely have not opened port 6379; also stop the firewall:
systemctl stop firewalld.service
systemctl stop iptables.service
End.