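Database Management, Issue 244: Troubleshooting a Switchover That Would Not Run (20240928)

Author: 胖头鱼的鱼缸 (尹海文)
Oracle ACE Pro: Database (Oracle and MySQL), PostgreSQL ACE Partner
10 years of experience in the database industry, currently working mainly in database services
Holds OCM 11g/12c/19c, MySQL 8.0 OCP, Exadata, CDP and other certifications
墨天轮 MVP and annual 墨力之星, ITPUB certified expert, member of the 专家百人团, OCM instructor, technical advisor to the PolarDB open-source community, external technical advisor to HaloDB, member of the OceanBase observer group, technical advisor to the 青学会 MOP technical community (a mutual-aid study group for young database engineers)
Known in the community as "the Director", "the Security Guard" and "the biggest enemy of domestic databases"; a not-so-famous "social terrorist" (rather than socially anxious)
WeChat official account: 胖头鱼的鱼缸; CSDN: 胖头鱼的鱼缸 (尹海文); 墨天轮: 胖头鱼的鱼缸; ITPUB: yhw1809.
Any repost other than an authorized one that credits the source is "illegal" plagiarism.

Before the Mid-Autumn Festival I ran a database switchover drill and found that the switchover could not be performed. In this issue, follow the Director step by step in finding and fixing the problem. Thanks also to the SR for their support.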
1 The problem
Running a switchover in DGMGRL produced the following error:
    DGMGRL> switchover to dbdg;
    Performing switchover NOW, please wait...
    Error: ORA-16775: target standby database in broker operation has potential data loss

    Failed.
    Unable to switchover, primary database is still "dbaas"

2 Troubleshooting and resolution
2.1 Problem 1
The first problem showed up when running show configuration in DGMGRL:
    DGMGRL> show configuration

    Configuration - dg

      Protection Mode: MaxPerformance
      Members:
      dbaas - Primary database
        dbdg  - Physical standby database
          Error: ORA-16664: unable to receive the result from a member

    Fast-Start Failover:  Disabled

    Configuration Status:
    ERROR   (status updated 34 seconds ago)

show configuration verbose, however, reported a normal status, and both the primary and the standby also looked healthy:
    DGMGRL> show configuration verbose

    Configuration - dg

      Protection Mode: MaxPerformance
      Members:
      dbaas - Primary database
        dbdg  - Physical standby database

      Properties:
        FastStartFailoverThreshold      = '30'
        OperationTimeout                = '30'
        TraceLevel                      = 'SUPPORT'
        FastStartFailoverLagLimit       = '30'
        CommunicationTimeout            = '180'
        ObserverReconnect               = '0'
        FastStartFailoverAutoReinstate  = 'TRUE'
        FastStartFailoverPmyShutdown    = 'TRUE'
        BystandersFollowRoleChange      = 'ALL'
        ObserverOverride                = 'FALSE'
        ExternalDestination1            = ''
        ExternalDestination2            = ''
        PrimaryLostWriteAction          = 'CONTINUE'
        ConfigurationWideServiceName    = 'dbaas_CFG'

    Fast-Start Failover:  Disabled

    Configuration Status:
    SUCCESS

    DGMGRL> show database dbaas

    Database - dbaas

      Role:               PRIMARY
      Intended State:     TRANSPORT-ON
      Instance(s):
        dbaas1
        dbaas2

    Database Status:
    SUCCESS

    DGMGRL> show database dbdg

    Database - dbdg

      Role:               PHYSICAL STANDBY
      Intended State:     APPLY-ON
      Transport Lag:      0 seconds (computed 0 seconds ago)
      Apply Lag:          0 seconds (computed 0 seconds ago)
      Average Apply Rate: 61.30 MByte/s
      Real Time Query:    ON
      Instance(s):
        dbdg1
        dbdg2 (apply instance)
        dbdg3
        dbdg4

    Database Status:
    SUCCESS

The cause turned out to be that the database itself was configured with db_domain, while the tnsnames entries and the static listener entries did not include the domain (that is, the domain suffix, such as xxx.com). I then adjusted the listeners and tnsnames accordingly.

Listener changes, taking primary instance 1 and listener_scan1 as the example:
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (GLOBAL_DBNAME = dbaas.scmcc.com) #domain added
          (ORACLE_HOME = /u01/app/oracle/product/19.0.0.0/dbhome_1)
          (SID_NAME = dbaas1)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = dbaas_DGMGRL.scmcc.com) #domain added
          (ORACLE_HOME = /u01/app/oracle/product/19.0.0.0/dbhome_1)
          (SID_NAME = dbaas1)
        )
      )

    SID_LIST_LISTENER_SCAN1 =
      (SID_LIST =
        (SID_DESC =
          (GLOBAL_DBNAME = dbaas.scmcc.com) #domain added
          (ORACLE_HOME = /u01/app/oracle/product/19.0.0.0/dbhome_1)
          (SID_NAME = dbaas1)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = dbaas_DGMGRL.scmcc.com) #domain added
          (ORACLE_HOME = /u01/app/oracle/product/19.0.0.0/dbhome_1)
          (SID_NAME = dbaas1)
        )
      )

After the change, reload all listeners.

tnsnames changes:
    DBAAS =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = primary-scan)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = dbaas.xxx.com) #domain added
        )
      )

    DBDG =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = standby-scan)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = dbdg.xxx.com) #domain added
        )
      )

Once this was done, show configuration returned to normal:
    DGMGRL> show configuration

    Configuration - dg

      Protection Mode: MaxPerformance
      Members:
      dbaas - Primary database
        dbdg  - Physical standby database

    Fast-Start Failover:  Disabled

    Configuration Status:
    SUCCESS   (status updated 59 seconds ago)

2.2 Problem 2
Next I retried the switchover and got the same error. Validating the standby revealed some problems:
    DGMGRL> validate database verbose dbdg

      Database Role:     Physical standby database
      Primary Database:  dbaas

      Ready for Switchover:  No                        #---Here
      Ready for Failover:    Yes (Primary Running)

      Flashback Database Status:
        dbaas:  Off
        dbdg :  Off

      Capacity Information:
        Database  Instances  Threads
        dbaas     2          4
        dbdg      4          4

      Managed by Clusterware:
        dbaas:  YES
        dbdg :  YES

      Temporary Tablespace File Information:
        dbaas TEMP Files:  56
        dbdg TEMP Files:   67

      Data file Online Move in Progress:
        dbaas:  No
        dbdg:   No

      Standby Apply-Related Information:
        Apply State:  Running
        Apply Lag:    0 seconds (computed 0 seconds ago)
        Apply Delay:  0 minutes

      Transport-Related Information:
        Transport On:      Yes
        Gap Status:        Gap                         #---Here
        Transport Lag:     0 seconds (computed 0 seconds ago)
        Transport Status:  Success

      Log Files Cleared:
        dbaas Standby Redo Log Files:  Cleared
        dbdg Online Redo Log Files:    Cleared
        dbdg Standby Redo Log Files:   Available

      Current Log File Groups Configuration:
        Thread #  Online Redo Log Groups  Standby Redo Log Groups  Status
                  (dbaas)                 (dbdg)
        2         4                       7                        Sufficient SRLs
        1         4                       7                        Sufficient SRLs
        3         4                       7                        Sufficient SRLs
        4         4                       7                        Sufficient SRLs

      Future Log File Groups Configuration:
        Thread #  Online Redo Log Groups  Standby Redo Log Groups  Status
                  (dbdg)                  (dbaas)
        2         4                       7                        Sufficient SRLs
        1         4                       7                        Sufficient SRLs
        3         4                       7                        Sufficient SRLs
        4         4                       7                        Sufficient SRLs

      Current Configuration Log File Sizes:
        Thread #  Smallest Online Redo  Smallest Standby Redo
                  Log File Size         Log File Size
                  (dbaas)               (dbdg)
        2         2048 MBytes           2048 MBytes
        1         2048 MBytes           2048 MBytes
        3         2048 MBytes           2048 MBytes
        4         2048 MBytes           2048 MBytes

      Future Configuration Log File Sizes:
        Thread #  Smallest Online Redo  Smallest Standby Redo
                  Log File Size         Log File Size
                  (dbdg)                (dbaas)
        2         2048 MBytes           2048 MBytes
        1         2048 MBytes           2048 MBytes
        3         2048 MBytes           2048 MBytes
        4         2048 MBytes           2048 MBytes

      Apply-Related Property Settings:
        Property        dbaas Value  dbdg Value
        DelayMins       0            0
        ApplyParallel   AUTO         AUTO
        ApplyInstances  0            0

      Transport-Related Property Settings:
        Property         dbaas Value  dbdg Value
        LogShipping      ON           ON
        LogXptMode       sync         sync
        Dependency       <empty>      <empty>
        DelayMins        0            0
        Binding          optional     optional
        MaxFailure       0            0
        ReopenSecs       300          300
        NetTimeout       30           30
        RedoCompression  DISABLE      DISABLE

The output still says that switchover is not possible and that a gap exists, yet querying the databases showed no gap. The queries were as follows (results omitted):
    -- primary database
    set markup HTML on
    spool /tmp/primary_info.html
    ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

    -- highest sequence generated per thread in the current incarnation
    select thread#, max(sequence#) "Last Primary Seq Generated"
    from gv$archived_log val, gv$database vdb
    where val.resetlogs_change# = vdb.resetlogs_change#
    group by thread# order by 1;

    -- status and errors of each configured archive destination
    SELECT thread#, dest_id, gvad.status, error, fail_sequence
    FROM gv$archive_dest gvad, gv$instance gvi
    WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL
    ORDER BY thread#, dest_id;

    select * from gv$dataguard_stats;

    -- difference between last archived and last applied sequence per thread
    SELECT a.thread#, b.last_seq, a.applied_seq, a.last_app_timestamp,
           b.last_seq - a.applied_seq ARC_DIFF
    FROM (SELECT thread#, MAX(sequence#) applied_seq, MAX(next_time) last_app_timestamp
          FROM gv$archived_log WHERE applied = 'YES' GROUP BY thread#) a,
         (SELECT thread#, MAX(sequence#) last_seq
          FROM gv$archived_log GROUP BY thread#) b
    WHERE a.thread# = b.thread#;

    spool off
    set markup HTML off

    -- standby database
    set markup HTML on
    spool /tmp/standby_info.html
    ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

    select process, thread#, sequence#, status from gv$managed_standby;

    select * from v$dataguard_stats;

    -- standby redo log usage
    select a.thread#
          ,a.sequence#
          ,a.group# grp
          ,a.bytes/1024/1024 Size_MB
          ,a.status
          ,a.archived
          ,a.first_change# "First SCN Number"
          ,to_char(FIRST_TIME,'DD-Mon-RR HH24:MI:SS') "First SCN Time"
          ,to_char(LAST_TIME,'DD-Mon-RR HH24:MI:SS') "Last SCN Time"
    from v$standby_log a order by 1,2,3,4;

    -- highest sequence received per thread
    select thread#, max(sequence#) "Last Standby Seq Received"
    from v$archived_log val, v$database vdb
    where val.resetlogs_change# = vdb.resetlogs_change#
    group by thread# order by 1;

    -- highest sequence applied per thread
    select thread#, max(sequence#) "Last Standby Seq Applied"
    from v$archived_log val, v$database vdb
    where val.resetlogs_change# = vdb.resetlogs_change#
    and val.applied in ('YES','IN-MEMORY')
    group by thread# order by 1;

    spool off
    set markup HTML off

Checking the threads on the primary:
    SQL> SELECT thread#, instance, status FROM v$thread;

       THREAD# INSTANCE             STATUS
    ---------- -------------------- ------
             1 dbaas1               OPEN
             2 dbaas2               OPEN
             3 UNNAMED_INSTANCE_3   CLOSED
             4 UNNAMED_INSTANCE_4   CLOSED

At this point, support suggested the following operation:
    ALTER DATABASE DISABLE THREAD 3;
    ALTER DATABASE DISABLE THREAD 4;

After letting the system run for a while, I validated the standby again:
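    DGMGRL> validate database verbose dbdg
    ...
    Ready for Switchover:  Yes
    ...
    Gap Status:            No Gap
    ...

The standby now reports that it is ready for switchover and that there is no gap. I have not yet retried the switchover itself, but it should go through now.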
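
The before/after comparison of the validate output can be checked mechanically. The following is a minimal Python sketch, assuming the text of `validate database verbose` has already been captured into a string by some other means; the function names `parse_validate` and `is_switchover_safe` are hypothetical helpers, not part of DGMGRL or any Oracle tool. It only looks at the two fields this article relies on, "Ready for Switchover" and "Gap Status".

```python
import re


def parse_validate(output: str) -> dict:
    """Pull the two fields of interest out of captured VALIDATE DATABASE text.

    Hypothetical helper: a missing field is returned as None.
    """
    fields = {}
    for key in ("Ready for Switchover", "Gap Status"):
        match = re.search(rf"{re.escape(key)}:\s*(.+)", output)
        fields[key] = match.group(1).strip() if match else None
    return fields


def is_switchover_safe(output: str) -> bool:
    """True only when the standby reports 'Yes' and 'No Gap' exactly."""
    fields = parse_validate(output)
    return (
        fields["Ready for Switchover"] == "Yes"
        and fields["Gap Status"] == "No Gap"
    )


# Trimmed samples modelled on the two validate runs in this article.
before = """
  Ready for Switchover:  No
  Gap Status:            Gap
"""
after = """
  Ready for Switchover:  Yes
  Gap Status:            No Gap
"""

print(is_switchover_safe(before))
print(is_switchover_safe(after))
```

The comparison is deliberately strict: any value other than an exact "Yes" and "No Gap" (including a missing field) is treated as not ready, which errs on the side of not attempting the switchover.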
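3 Analysis
This is the first time I have run into this problem. It was most likely a metadata issue triggered by the domain configuration in the first place. As for the thread problem, the primary and the standby have different numbers of nodes, but other databases with a similar setup have never shown it, so I suspect it is still a knock-on effect of the faulty domain configuration.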
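
Since the root cause here was service names that lacked the db_domain suffix, a quick text check can catch the same mismatch elsewhere. The sketch below is an illustration in Python under stated assumptions: the tnsnames.ora content is passed in as a string, the db_domain value is supplied by the reader (for example, copied from the database parameter), and `service_names` and `missing_domain` are hypothetical helper names. It uses a simple regular expression, not a full Oracle Net parser.

```python
import re


def service_names(tns_text: str) -> list[str]:
    """Return every SERVICE_NAME value found in tnsnames.ora-style text."""
    return re.findall(
        r"SERVICE_NAME\s*=\s*([^\s)]+)", tns_text, flags=re.IGNORECASE
    )


def missing_domain(tns_text: str, db_domain: str) -> list[str]:
    """List service names that do not end with '.<db_domain>'.

    An empty db_domain means there is nothing to check, so return [].
    """
    if not db_domain:
        return []
    suffix = "." + db_domain.lower()
    return [
        name
        for name in service_names(tns_text)
        if not name.lower().endswith(suffix)
    ]


# Sample text: the first entry lacks the domain, the second carries it.
sample = """
DBAAS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primary-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = dbaas)))
DBDG =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = dbdg.xxx.com)))
"""

print(missing_domain(sample, "xxx.com"))
```

The same idea could be applied to the GLOBAL_DBNAME values in listener.ora by changing the keyword in the regular expression; that variation is left out to keep the sketch short.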
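4 Summary
This issue dealt with an ADG that could not switch over, a problem that traced back to a misconfiguration made at the very beginning.
As usual: now you know what I wrote.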