Run Modes: Standalone Mode Installation, Part 1 - 百战程序员 (official), online IT education and training

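Standalone is the resource scheduling framework that ships with Spark, and it supports distributed deployment. A 3-node Standalone cluster is recommended here: node1 runs the master, and all three nodes (node1, node2, and node3) run a worker. Give each virtual machine at least 2 GB of memory and 2 cores so that Spark can run reliably on Standalone later. Configure node1 first, then sync the result to node2 and node3. The steps for setting up the Standalone cluster are as follows, starting in Spark's conf directory on node1: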


[root@node1 ~]# cd $SPARK_HOME/conf
[root@node1 conf]# pwd
/opt/spark-3.2.1/conf

Rename workers.template to workers:


[root@node1 conf]# mv workers.template workers

Configure the workers file:


[root@node1 conf]# vim workers
# Press Shift+G to jump to the end of the file, delete localhost, and append:
node1
node2
node3
# Type :wq to save and exit
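
The same edit can be made without opening an editor. The sketch below is not part of the original steps: `write_workers` is a hypothetical helper name, and the hostnames are the ones used in this tutorial. Because it overwrites the file, the default localhost entry disappears as well.

```shell
# Hypothetical helper: overwrite a workers file with one hostname per line.
# Usage: write_workers <path-to-workers-file> <host>...
write_workers() {
  workers_file="$1"
  shift
  # printf reuses the format string for every remaining argument.
  printf '%s\n' "$@" > "$workers_file"
}

# Example for this cluster (commented out so nothing is overwritten by accident):
# write_workers "$SPARK_HOME/conf/workers" node1 node2 node3
```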

Rename spark-env.sh.template to spark-env.sh:


[root@node1 conf]# mv spark-env.sh.template spark-env.sh

Edit the spark-env.sh file:


[root@node1 conf]# vim spark-env.sh
# Append the following at the bottom of the file
# Java installation directory
JAVA_HOME=/usr/java/default
# Hadoop configuration directory, used to read files on HDFS and to run on a YARN cluster
HADOOP_CONF_DIR=/opt/hadoop-3.1.3/etc/hadoop
YARN_CONF_DIR=/opt/hadoop-3.1.3/etc/hadoop
# Host that the master runs on
export SPARK_MASTER_HOST=node1
# Master communication port
export SPARK_MASTER_PORT=7077
# Master web UI port
SPARK_MASTER_WEBUI_PORT=8080

# Number of CPU cores available to each worker
SPARK_WORKER_CORES=1
# Memory available to each worker
SPARK_WORKER_MEMORY=1g
# Worker communication port
SPARK_WORKER_PORT=7078
# Worker web UI port
SPARK_WORKER_WEBUI_PORT=8081

# History server settings; application event logs are kept in the /sparklogs directory on HDFS
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://mycluster/sparklogs/
-Dspark.history.retainedApplications=30
-Dspark.history.fs.cleaner.enabled=true"

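spark.history.ui.port: the history server web UI is served on port 18080.

spark.history.fs.logDirectory: the path where the history server looks for application event logs.

spark.history.retainedApplications: the number of applications whose history is kept in memory. When this limit is exceeded, the oldest application data is removed from memory. This is the number of applications held in memory, not the number of applications shown on the page.

spark.history.fs.cleaner.enabled=true: allows the history server to delete old history logs.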
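
As a quick sanity check on the spark-env.sh settings above, the address that workers and applications use to reach the master can be derived from the file, since a Standalone master URL has the form spark://host:port. This is only a sketch: `master_url` is a hypothetical helper, not a Spark command, and it assumes the file is plain shell as shown.

```shell
# Hypothetical helper: print the Standalone master URL implied by a spark-env.sh file.
# Usage: master_url <path-to-spark-env.sh>   (pass a path that contains a slash)
master_url() {
  (
    # Source inside a subshell so the caller's environment is left untouched.
    . "$1" || exit 1
    # 7077 is used as a fallback when SPARK_MASTER_PORT is not set in the file.
    printf 'spark://%s:%s\n' "$SPARK_MASTER_HOST" "${SPARK_MASTER_PORT:-7077}"
  )
}

# Example:
# master_url "$SPARK_HOME/conf/spark-env.sh"
```

With the values from this tutorial the helper prints spark://node1:7077.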

Rename spark-defaults.conf.template to spark-defaults.conf:


[root@node1 conf]# mv spark-defaults.conf.template spark-defaults.conf

Configure spark-defaults.conf:


[root@node1 conf]# vim spark-defaults.conf
# Append the following
# Enable Spark event logging
spark.eventLog.enabled true
# Directory where Spark event logs are written
spark.eventLog.dir hdfs://mycluster/sparklogs/
# Whether to compress Spark event logs
spark.eventLog.compress true
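
The directory that applications write event logs to should be the same location that spark.history.fs.logDirectory points to in spark-env.sh, otherwise the history server will not find the logs. A small sketch for reading a value back from the file follows; `get_conf` is a hypothetical helper and assumes the whitespace-separated `key value` layout used above.

```shell
# Hypothetical helper: print the value of one property from spark-defaults.conf.
# Assumes "key value" lines separated by whitespace, as in the file above.
# Usage: get_conf <property-name> <path-to-spark-defaults.conf>
get_conf() {
  # Comment lines start with "#", so their first field never equals the key.
  awk -v key="$1" '$1 == key { print $2 }' "$2"
}

# Example:
# get_conf spark.eventLog.dir "$SPARK_HOME/conf/spark-defaults.conf"
```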

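Start the Hadoop cluster with startha.sh (skip this step if it is already running).
Then create the directory on HDFS that stores application run history, and change its permissions: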


[root@node1 conf]# hdfs dfs -mkdir /sparklogs
[root@node1 conf]# hdfs dfs -chmod 777 /sparklogs
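
Running mkdir a second time fails if the directory already exists. Below is a minimal sketch of a variant that can be rerun safely, assuming the hdfs command is on the PATH; `ensure_hdfs_dir` is a hypothetical helper name. Mode 777 simply mirrors the tutorial and is more permissive than most shared clusters would want.

```shell
# Hypothetical helper: create an HDFS directory only if it is missing, then set its mode.
# Usage: ensure_hdfs_dir <hdfs-path> <mode>
ensure_hdfs_dir() {
  hdfs_dir="$1"
  hdfs_mode="$2"
  # "hdfs dfs -test -d" exits with 0 when the path exists and is a directory.
  if ! hdfs dfs -test -d "$hdfs_dir"; then
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
  fi
  hdfs dfs -chmod "$hdfs_mode" "$hdfs_dir"
}

# Example (values from this tutorial):
# ensure_hdfs_dir /sparklogs 777
```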

Rename log4j.properties.template to log4j.properties:


[root@node1 conf]# mv log4j.properties.template log4j.properties

Configure the log4j.properties file so that the root logger writes WARN and above to the console:


# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{MM/dd HH:mm:ss} %p %c{1}: %m%n

Distribute the Spark installation directory to node2 and node3:


[root@node1 conf]# cd /opt
[root@node1 opt]# scp -r spark-3.2.1/ node2:/opt/
[root@node1 opt]# scp -r spark-3.2.1/ node3:/opt/
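
With more than a couple of workers, the copy is easier to repeat as a loop. A minimal sketch, assuming passwordless SSH from node1 to the other nodes is already set up and that /opt/ is the target directory on every host, as above; `distribute` is a hypothetical helper name.

```shell
# Hypothetical helper: copy a directory to /opt/ on each listed host.
# Usage: distribute <source-dir> <host>...
distribute() {
  src_dir="$1"
  shift
  for host in "$@"; do
    # Stop at the first host that fails so the problem is easy to spot.
    scp -r "$src_dir" "$host:/opt/" || return 1
  done
}

# Example, run from /opt on node1:
# distribute spark-3.2.1/ node2 node3
```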

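Configure the environment variables on node2 and node3 and make them take effect (shown for node2; repeat the same steps on node3):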


[root@node2 ~]# vim /etc/profile
# Spark environment variables
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark-3.2.1
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@node2 ~]# source /etc/profile
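
After sourcing the profile, it can be worth confirming that both Spark directories really ended up on the PATH. The check below is a sketch only; `path_has` is a hypothetical helper that tests for an exact entry in a colon-separated list.

```shell
# Hypothetical helper: succeed if a colon-separated list contains the exact entry.
# Usage: path_has <path-list> <entry>
path_has() {
  # Wrapping both sides in colons makes the match exact, not a substring match.
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, run on node2 or node3:
# path_has "$PATH" "$SPARK_HOME/bin" && path_has "$PATH" "$SPARK_HOME/sbin" && echo "PATH ok"
```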
