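Using HDFS High-Availability HAAdmin
Overview
The HDFS high-availability (HA) feature removes the NameNode as a single point of failure. It runs two redundant NameNodes in an active/passive configuration with a hot standby, so the cluster can recover quickly from a machine crash and can also fail over gracefully during planned maintenance.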
Usage
This command invokes the org.apache.hadoop.hdfs.tools.DFSHAAdmin class.
hdfs haadmin -transitionToActive <serviceId> [--forceactive]
hdfs haadmin -transitionToStandby <serviceId>
hdfs haadmin -transitionToObserver <serviceId>
hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
hdfs haadmin -getServiceState <serviceId>
hdfs haadmin -getAllServiceState
hdfs haadmin -checkHealth <serviceId>
hdfs haadmin -help <command>
checkHealth
Checks the health of a NameNode. It works like a heartbeat probe that determines whether the service is running normally.
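For monitoring, the exit status of the command is usually more convenient than its log output. The Hadoop documentation describes -checkHealth as returning 0 when the NameNode is healthy and non-zero otherwise. The following is a minimal sketch built on that behavior; the function name check_nn_health and the HDFS_BIN override variable are assumptions made for illustration and are not part of Hadoop.

```shell
# Sketch: report NameNode health from the exit status of `hdfs haadmin -checkHealth`.
# HDFS_BIN is a hypothetical override hook; it defaults to `hdfs` on PATH.

check_nn_health() {
  # $1: NameNode service ID, for example nn1
  local service_id="$1"
  # Discard the log output and rely only on the exit status.
  if "${HDFS_BIN:-hdfs}" haadmin -checkHealth "$service_id" >/dev/null 2>&1; then
    echo "$service_id healthy"
  else
    echo "$service_id unhealthy"
    return 1
  fi
}
```

A cron job or monitoring agent could call check_nn_health nn1 and raise an alert whenever the function returns a non-zero status.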
When the nn1 service is down:
# Run the health check
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -checkHealth nn1
2023-03-11 09:06:16,517 INFO ipc.Client: Retrying connect to server: hadoop-1/192.168.1.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From hadoop-client.local/192.168.1.100 to hadoop-1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
When the nn2 service is healthy:
# Run the health check; no output means the NameNode is healthy
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -checkHealth nn2
Checking NameNode state
Service state of all NameNodes
Run hdfs haadmin -getAllServiceState to return the HA state of every NameNode.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 standby
hadoop-3:8020 active
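Scripts often need only the address of the NameNode in a particular state. The sketch below filters the output shown above, assuming each line has the form "<host>:<port> <state>". The function name namenodes_in_state is hypothetical.

```shell
# Sketch: list NameNodes in a given HA state from `hdfs haadmin -getAllServiceState` output.
# Assumes each input line looks like "<host>:<port> <state>".

namenodes_in_state() {
  # $1: wanted state (for example active or standby); the command output is read from stdin.
  awk -v want="$1" '$2 == want { print $1 }'
}
```

It is meant to be used in a pipeline, for example: bin/hdfs haadmin -getAllServiceState | namenodes_in_state active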
Service state of a specific NameNode
Run hdfs haadmin -getServiceState <serviceId>; it returns active or standby.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getServiceState nn1
standby
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getServiceState nn2
active
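Because -getServiceState prints a single word, it lends itself to polling, for example to wait until a NameNode has reached the expected state after a role change. The following is a rough sketch; the function name wait_for_state, the HDFS_BIN override variable, and the default retry count and interval are all assumptions chosen for illustration.

```shell
# Sketch: poll `hdfs haadmin -getServiceState` until a NameNode reports the expected state.
# HDFS_BIN, the default of 10 attempts, and the 2-second interval are illustrative assumptions.

wait_for_state() {
  # $1: service ID, $2: expected state, $3: max attempts, $4: seconds between attempts
  local service_id="$1" expected="$2" attempts="${3:-10}" interval="${4:-2}"
  local i=0 state=""
  while [ "$i" -lt "$attempts" ]; do
    # A failed call (for example, NameNode unreachable) is treated as an unknown state.
    state="$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$service_id" 2>/dev/null)" || state=""
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$service_id did not reach state $expected (last: ${state:-unknown})" >&2
  return 1
}
```

For instance, wait_for_state nn2 active 30 1 would check once per second for up to 30 attempts.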
failover
Switches the active and standby roles of the NameNodes. This is generally the recommended way to switch roles.
Run hdfs haadmin -failover <serviceId of current active> <serviceId of new active> to switch the roles.
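Since the argument order matters (current active first, new active second), a wrapper can look up the current active NameNode before issuing the command. The sketch below is one possible approach, not an official tool: failover_to and HDFS_BIN are hypothetical names, and the caller is assumed to pass in the service IDs of all NameNodes.

```shell
# Sketch: fail over to a target NameNode only if it is not already active.
# Assumptions: HDFS_BIN is a hypothetical override hook; service IDs are supplied by the caller.

failover_to() {
  # $1: service ID that should become active; remaining arguments: all NameNode service IDs.
  local target="$1" id state
  shift
  for id in "$@"; do
    state="$("${HDFS_BIN:-hdfs}" haadmin -getServiceState "$id" 2>/dev/null)" || state=""
    if [ "$state" = "active" ]; then
      if [ "$id" = "$target" ]; then
        echo "$target is already active"
        return 0
      fi
      # -failover takes the current active first, then the new active.
      "${HDFS_BIN:-hdfs}" haadmin -failover "$id" "$target"
      return
    fi
  done
  echo "no active NameNode found among: $*" >&2
  return 1
}
```

With two NameNodes named nn1 and nn2, calling failover_to nn2 nn1 nn2 would run the failover only when nn1 is the one currently active.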
For example, if nn1 is currently the Active NameNode and you want nn2 to become the new Active NameNode, run hdfs haadmin -failover nn1 nn2. If nn2 is already the Active NameNode, it remains active after the command runs. The session below starts with nn2 active and fails over in both directions.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 standby
hadoop-3:8020 active
# Make nn1 the active NameNode
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -failover nn2 nn1
Failover to NameNode at /192.168.1.1:8020 successful
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 active
hadoop-3:8020 standby
# Make nn2 the active NameNode
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -failover nn1 nn2
Failover to NameNode at /192.168.1.3:8020 successful
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 standby
hadoop-3:8020 active
transitionToActive
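Transitions the given NameNode to active. Unlike failover, it performs no fencing, which is the key difference between the two commands.
Once automatic failover is enabled (dfs.ha.automatic-failover.enabled=true), manual transitions are refused. To make a NameNode active with transitionToActive, you must pass the --forcemanual flag.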
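Before attempting a manual transition, it can help to check how the cluster is configured. The sketch below reads the property with hdfs getconf -confKey. The function names and the HDFS_BIN variable are hypothetical, and the sketch assumes the property is set under its plain name; a cluster that sets it with a nameservice suffix would need that key queried instead.

```shell
# Sketch: check whether automatic failover is enabled before a manual transition.
# HDFS_BIN is a hypothetical override hook; the plain (unsuffixed) property name is assumed.

auto_failover_enabled() {
  local value
  value="$("${HDFS_BIN:-hdfs}" getconf -confKey dfs.ha.automatic-failover.enabled 2>/dev/null)" || value=""
  [ "$value" = "true" ]
}

describe_manual_transition() {
  if auto_failover_enabled; then
    echo "automatic failover is enabled: -transitionToActive is refused unless --forcemanual is given"
  else
    echo "automatic failover is disabled: -transitionToActive can be run without --forcemanual"
  fi
}
```

Running describe_manual_transition on a client node prints which of the two situations applies.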
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at /192.168.1.3:8020
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the --forcemanual flag.
At this point nn1 is standby and nn2 is active:
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 standby
hadoop-3:8020 active
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -transitionToActive --forcemanual nn1
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering n or hitting ^C now.
Are you sure you want to continue? (Y or N) y
2023-03-11 10:05:09,570 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at /192.168.1.1:8020
transitionToActive: Node nn2 is already active
Usage: haadmin [-ns <nameserviceId>] [-transitionToActive [--forceactive] <serviceId>]
The message reports that nn2 is already active, so the transition has no effect.
While the active node is healthy, running hdfs haadmin -transitionToActive has no effect on either NameNode.
Next, try transitioning the active NameNode to standby.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 standby
hadoop-3:8020 active
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -transitionToStandby --forcemanual nn2
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering n or hitting ^C now.
Are you sure you want to continue? (Y or N) y
2023-03-11 10:09:40,129 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at /192.168.1.3:8020
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs haadmin -getAllServiceState
hadoop-1:8020 active
hadoop-3:8020 standby
The output shows that nn2 is now standby, so the transition took effect.
While the active node is healthy, running hdfs haadmin -transitionToStandby can transition the active NameNode to standby.