Oracle 11gR2 硬件导致重新添加节点
一、环境描述:
这是一套五年前部署的双节点单柜11g RAC,当时操作系统盘是一块164g的单盘,没有做RAID。
OS: EnterPrise 5.5 x86_x64
GI : Grid Infrastructure 11.2.0.3x86_x64
DB: Oracle Database EnterPrise 11.2.0.3x86_64
二、问题现象:
在12月24日的时候用户发现节点一主机名为oradb1数据库服务器操作系统键盘鼠标无响应,随即强行重启了数据库服务器。重启后发现服务器无法正常进入操作系统提示需要fsck –A 或 fsck –f 修复文件系统。实际上在两个月之前也遇到了此问题,当时就采取了fsck命令进行修复。两个月后的现在再次出现相同的问题。
三、问题分析:
从问题现象来看可以判断是节点一因为硬盘坏道(硬件故障)导致的操作体系不可用。
四、问题处理思路:
更换故障节点oradb1服务器硬盘并重装系统,重装GI软件和DB软件,基本步骤如下:
1、 重新更换故障节点oradb1服务器两块硬盘并作RAID1,避免操作系统硬盘单点故障。
2、 重新安装故障节点oradb1操作系统,配置网络信息,环境变量,认存储。
3、 删除现存节点oradb2中的故障节点oradb1集群信息,更新ocr和voting。
4、 在现存节点中删除故障节点oradb1实例信息。
5、 配置新节点和现存节点的互信。
6、 部署新节点集群软件,将新节点添加至现存节点中。
7、 部署新节点的数据库软件,将新节点添加至现存节点中。
8、 检查节点状态,测试新节点和老节点的可用性,切换测试。
五、操作步骤:
5.1重新更换oradb1服务器硬盘重做RAID:略
5.2更换故障服务器oradb1硬盘重装系统:略
5.3在现存节点oradb2中删除故障节点oradb1的集群信息:
登陆到现存节点二:
- prudentwoos-MacBook-Pro:/ prudentwoo$ ssh root@172.27.90.159
- The authenticity of host '172.27.90.159 (172.27.90.159)' can't be established.
- RSA key fingerprint is SHA256:M6s9wKplA6iMgmtDwb2bVAMFPRrMh7b5KmahvwF1BNo.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added '172.27.90.159' (RSA) to the list of known hosts.
- root@172.27.90.159's password:
- Last login: Thu Dec 22 22:03:10 2016
六、确认节点状态:
6.1实例状态:
- [root@oradb2 bin]# ./srvctl status database -d tjtvdb
- 实例 tjtvdb1 没有在 oradb1 节点上运行
- 实例 tjtvdb2 正在节点 oradb2 上运行
6.2 节点信息状态:
- [root@oradb2 bin]# ./srvctl status nodeapps
- VIP oradb1-vip 已启用
- VIP oradb1-vip 正在节点上运行: oradb2
- VIP oradb2-vip 已启用
- VIP oradb2-vip 正在节点上运行: oradb2
- 网络已启用
- 网络未在节点上运行: oradb1
- 网络正在节点上运行: oradb2
- GSD 已禁用
- GSD 没有运行的节点: oradb1
- GSD 没有运行的节点: oradb2
- ONS 已启用
- ONS 守护程序未在节点上运行:oradb1
- ONS 守护程序正在节点上运行:oradb2
6.3 CRS状态
- [root@oradb2 bin]# ./crsctl check crs
- CRS-4638: Oracle High Availability Services is online
- CRS-4537: Cluster Ready Services is online
- CRS-4529: Cluster Synchronization Services is online
- CRS-4533: Event Manager is online
6.4 unpin 故障节点:
- [root@oradb2 bin]# ./olsnodes -t -s
- oradb1 Inactive Unpinned
- oradb2 Active Unpinned
6.5 查看实例活动状态:
- [root@oradb2 bin]# su - oracle -c "sqlplus / as sysdba"
-
- SQL*Plus: Release 11.2.0.3.0 Production on Sat Dec 24 17:50:00 2016
- Copyright (c) 1982, 2011, Oracle. All rights reserved.
-
- ???:
- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
- With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
- Data Mining and Real Application Testing options
-
- SQL> col instance format a15
- SQL> select thread#,status,instance from v$thread
-
- THREAD# STATUS INSTANCE
- ---------- ------ ---------------
- 1 CLOSED tjtvdb1
- 2 OPEN tjtvdb2
-
- SQL>
-
- SQL> select instance_name,status from gv$instance;
-
- INSTANCE_NAME STATUS
- ---------------- ------------
- tjtvdb2 OPEN
-
- SQL>
七、正式开始删除故障节点一:
7.1 unpin故障节点:
- [root@oradb2 bin]# ./crsctl unpin css -n oradb1
7.2 删除数据库
[Silent]
- [oracle@db02 bin]$ dbca -silent -deleteInstance -nodeList oradb1 -gdbName TJTVDB - instanceName labdb11 -sysDBAUserName sys -sysDBAPassword oracle
[DBCA]
这一步一定要选择实例管理后再选择删除实例,千瓦不能选择删除数据库,否则就废了。
至此,实例已经删除完毕。
7.3 查看删除后的实例信息:
- SQL> col instance format a15
- SQL> select thread#,status,instance from v$thread
-
- THREAD# STATUS INSTANCE
- ---------- ------ ---------------
- 2 OPEN tjtvdb2
-
- SQL> select instance_name,status from gv$instance;
-
- INSTANCE_NAME STATUS
- ---------------- ------------
- tjtvdb2 OPEN
7.4 查看数据库情况:
- [root@oradb2 bin]# ./srvctl config database -d tjtvdb
- 数据库唯一名称: tjtvdb
- 数据库名: tjtvdb
- Oracle 主目录: /opt/app/oracle/product/11.2.0/dbhome_1
- Oracle 用户: oracle
- Spfile: +DATAVOL1/tjtvdb/spfiletjtvdb.ora
- 域:
- 启动选项: open
- 停止选项: immediate
- 数据库角色: PRIMARY
- 管理策略: AUTOMATIC
- 服务器池: tjtvdb
- 数据库实例: tjtvdb2
- 磁盘组: DATAVOL1,FRAVOL1
- 装载点路径:
- 服务:
- 类型: RAC
- 数据库是管理员管理的
八、停用节点一的监听并删除
- [root@oradb2 bin]# ./srvctl config listener -a
- 名称: LISTENER
- 网络: 1, 所有者: grid
- 主目录: <CRS home>
- PRCN-2037 : 无法检索 LISTENER 的 Oracle 主目录
- PRCR-1097 : 找不到资源属性: ORACLE_HOME
-
- [root@oradb2 bin]#./srvctl disable listener -l listener -n oradb01
- [root@oradb2 bin]# ./srvctl stop listener -l listener -n oradb1
- PRCC-1017 : LISTENER 已在 oradb1 上停止
- PRCR-1005 : 资源 ora.LISTENER.lsnr 已停止
九、用Oracle用户更新现存节点oradb2的集群列表:
- [oracle@oradb2 bin]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/11.2.0/dbhome_1 "CLUSTER_NODES={oradb2}"
- 正在启动 Oracle Universal Installer...
-
- 检查交换空间: 必须大于 500 MB。 实际为 2920 MB 通过
- The inventory pointer is located at /etc/oraInst.loc
- The inventory is located at /opt/app/oraInventory
- 'UpdateNodeList' 成功。
备注:
ORACLE_HOME=$ORACLE_HOME 如果不写完整路径将会导致更新失败,如果更新失败可以尝试如下操作:
--因为oradb1 已经不存在了,所以手工编辑下oradb2的oraInventory删除节点一的信息:
- [oracle@oradb2 ~]$ cat /opt/app/oraInventory/ContentsXML/inventory.xml
- <?xml version="1.0" standalone="yes" ?>
- <!-- Copyright (c) 1999, 2011, Oracle. All rights reserved. -->
- <!-- Do not modify the contents of this file by hand. -->
- <INVENTORY>
- <VERSION_INFO>
- <SAVED_WITH>11.2.0.3.0</SAVED_WITH>
- <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
- </VERSION_INFO>
- <HOME_LIST>
- <HOME NAME="Ora11g_gridinfrahome1" LOC="/opt/app/11.2.0/grid" TYPE="O" IDX="1" CRS="true">
- <NODE_LIST>
- <NODE NAME="oradb1"/>
- <NODE NAME="oradb2"/>
- </NODE_LIST>
- </HOME>
- <HOME NAME="OraDb11g_home1" LOC="/opt/app/oracle/product/11.2.0/dbhome_1" TYPE="O" IDX="2">
- <NODE_LIST>
- <NODE NAME="oradb1"/>
- <NODE NAME="oradb2"/>
- </NODE_LIST>
- </HOME>
- </HOME_LIST>
- <COMPOSITEHOME_LIST>
- </COMPOSITEHOME_LIST>
- </INVENTORY>
-
-
- 删除如上所有节点一的信息,修改为如下:
- [oracle@oradb2 ~]$ cat /opt/app/oraInventory/ContentsXML/inventory.xml
- <?xml version="1.0" standalone="yes" ?>
- <!-- Copyright (c) 1999, 2011, Oracle. All rights reserved. -->
- <!-- Do not modify the contents of this file by hand. -->
- <INVENTORY>
- <VERSION_INFO>
- <SAVED_WITH>11.2.0.3.0</SAVED_WITH>
- <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
- </VERSION_INFO>
- <HOME_LIST>
- <HOME NAME="Ora11g_gridinfrahome1" LOC="/opt/app/11.2.0/grid" TYPE="O" IDX="1" CRS="true">
- <NODE_LIST>
- <NODE NAME="oradb2"/>
- </NODE_LIST>
- </HOME>
- <HOME NAME="OraDb11g_home1" LOC="/opt/app/oracle/product/11.2.0/dbhome_1" TYPE="O" IDX="2">
- <NODE_LIST>
- <NODE NAME="oradb2"/>
- </NODE_LIST>
- </HOME>
- </HOME_LIST>
- <COMPOSITEHOME_LIST>
- </COMPOSITEHOME_LIST>
- </INVENTORY>
九、删除故障节点VIP:
#查看vip信息:
- [root@oradb2 bin]# ./crsctl status res -t |grep oradb1
- ora.oradb1.vip
- [root@oradb2 bin]# ./crs_stat -t |grep oradb1
- ora.oradb1.vip ora....t1.type ONLINE ONLINE oradb2
#停止vip:
- [root@oradb2 bin]# ./crs_stop -f ora.oradb1.vip
- Attempting to stop `ora.oradb1.vip` on member `oradb2`
- Stop of `ora.oradb1.vip` on member `oradb2` succeeded.
- [root@oradb2 bin]# ./crs_stat -t |grep oradb1
- ora.oradb1.vip ora....t1.type OFFLINE OFFLINE
#删除vip
- [root@oradb2 bin]# ./srvctl remove vip -i ora.oradb1.vip -f
- PRKO-2313 : VIP ora.oradb1.vip 不存在。
- [root@oradb2 bin]# ./crsctl delete resource ora.oradb1.vip -f
- [root@oradb2 bin]# ./crs_stat -t |grep oradb1
十、删除故障节点:
#查看节点信息
- [root@oradb2 bin]# ./olsnodes -t -s
- oradb1 Inactive Unpinned
- oradb2 Active Unpinned
#删除故障节点
- [root@oradb2 bin]# ./crsctl delete node -n oradb1
- CRS-4661: Node oradb1 successfully deleted.
#查看删除后的故障节点信息
- [root@oradb2 bin]# ./olsnodes -t -s
- oradb2 Active Unpinned
十一、验证节点是否已被删除:
#cluvfy验证
- [grid@oradb2 ~]$ cluvfy stage -post nodedel -n oradb1
-
- 执行 删除节点 的后期检查
- 正在检查 CRS 完整性...
- 集群件版本一致性测试已通过
- CRS 完整性检查已通过
- 删除节点检查通过
- 删除节点 的后期检查成功。
#crs_stat 验证
- [grid@oradb2 ~]$ crs_stat -t
- Name Type Target State Host
- ------------------------------------------------------------
- ora.DATACRS.dg ora....up.type ONLINE ONLINE oradb2
- ora....VOL1.dg ora....up.type ONLINE ONLINE oradb2
- ora.FRAVOL1.dg ora....up.type ONLINE ONLINE oradb2
- ora....ER.lsnr ora....er.type ONLINE ONLINE oradb2
- ora....N1.lsnr ora....er.type ONLINE ONLINE oradb2
- ora.asm ora.asm.type ONLINE ONLINE oradb2
- ora.cvu ora.cvu.type ONLINE ONLINE oradb2
- ora.gsd ora.gsd.type OFFLINE OFFLINE
- ora....network ora....rk.type ONLINE ONLINE oradb2
- ora.oc4j ora.oc4j.type ONLINE ONLINE oradb2
- ora.ons ora.ons.type ONLINE ONLINE oradb2
- ora....SM2.asm application ONLINE ONLINE oradb2
- ora....B2.lsnr application ONLINE ONLINE oradb2
- ora.oradb2.gsd application OFFLINE OFFLINE
- ora.oradb2.ons application ONLINE ONLINE oradb2
- ora.oradb2.vip ora....t1.type ONLINE ONLINE oradb2
- ora....ry.acfs ora....fs.type ONLINE ONLINE oradb2
- ora.scan1.vip ora....ip.type ONLINE ONLINE oradb2
- ora.tjtvdb.db ora....se.type ONLINE ONLINE oradb2
- [grid@oradb2 ~]$ crsctl status res -t
- --------------------------------------------------------------------------------
- NAME TARGET STATE SERVER STATE_DETAILS
- --------------------------------------------------------------------------------
- Local Resources
- --------------------------------------------------------------------------------
- ora.DATACRS.dg
- ONLINE ONLINE oradb2
- ora.DATAVOL1.dg
- ONLINE ONLINE oradb2
- ora.FRAVOL1.dg
- ONLINE ONLINE oradb2
- ora.LISTENER.lsnr
- ONLINE ONLINE oradb2
- ora.asm
- ONLINE ONLINE oradb2 Started
- ora.gsd
- OFFLINE OFFLINE oradb2
- ora.net1.network
- ONLINE ONLINE oradb2
- ora.ons
- ONLINE ONLINE oradb2
- ora.registry.acfs
- ONLINE ONLINE oradb2
- --------------------------------------------------------------------------------
- Cluster Resources
- --------------------------------------------------------------------------------
- ora.LISTENER_SCAN1.lsnr
- 1 ONLINE ONLINE oradb2
- ora.cvu
- 1 ONLINE ONLINE oradb2
- ora.oc4j
- 1 ONLINE ONLINE oradb2
- ora.oradb2.vip
- 1 ONLINE ONLINE oradb2
- ora.scan1.vip
- 1 ONLINE ONLINE oradb2
- ora.tjtvdb.db
- 2 ONLINE ONLINE oradb2 Open
- SQL> select thread#,status,instance from v$thread
-
- THREAD# STATUS INSTANCE
- ---------- ------ --------------------
- 2 OPEN tjtvdb2
- [root@oradb2 bin]# ./srvctl status database -d tjtvdb
- 实例 tjtvdb2 正在节点 oradb2 上运行
-
- [root@oradb2 bin]# ./srvctl status nodeapps
- VIP oradb2-vip 已启用
- VIP oradb2-vip 正在节点上运行: oradb2
- 网络已启用
- 网络正在节点上运行: oradb2
- GSD 已禁用
- GSD 没有运行的节点: oradb2
- ONS 已启用
- ONS 守护程序正在节点上运行:oradb2
第二部分:将节点一添加至RAC
十二、在服务器节点一oradb1上创建用户及用户组:
#查看oradb2的uid和gid:
- [root@oradb2 ~]# id oracle
- uid=1101(oracle) gid=1000(oinstall) groups=1000(oinstall),1201(asmdba),1300(dba),1301(oper)
-
- [root@oradb2 ~]# id grid
- uid=1100(grid) gid=1000(oinstall) groups=1000(oinstall),1200(asmadmin),1201(asmdba),1202(asmoper)
#参照节点二的信息在节点一上创建用户和组
- [root@oradb1 ~]# groupadd -g 1000 oinstall
- [root@oradb1 ~]# groupadd -g 1300 dba
- [root@oradb1 ~]# groupadd -g 1301 oper
- [root@oradb1 ~]# groupadd -g 1200 asmadmin
- [root@oradb1 ~]# groupadd -g 1201 asmdba
- [root@oradb1 ~]# groupadd -g 1202 asmoper
- [root@oradb1 ~]# useradd -u 1100 -g oinstall -G asmadmin,asmdba,asmoper grid
- [root@oradb1 ~]# useradd -u 1101 -g oinstall -G asmdba,dba,oper oracle
- [root@oradb1 ~]# id oracle
- uid=1101(oracle) gid=1000(oinstall) groups=1000(oinstall),1300(dba),1301(oper),1201(asmdba)
-
- [root@oradb1 ~]# id grid
- uid=1100(grid) gid=1000(oinstall) groups=1000(oinstall),1200(asmadmin),1201(asmdba),1202(asmoper)
-
- [root@oradb1 ~]# passwd oracle
- Changing password for user oracle.
- New UNIX password:
- BAD PASSWORD: it is based on a dictionary word
- Retype new UNIX password:
- passwd: all authentication tokens updated successfully.
-
- [root@oradb1 ~]# passwd grid
- Changing password for user grid.
- New UNIX password:
- BAD PASSWORD: it is based on a dictionary word
- Retype new UNIX password:
- passwd: all authentication tokens updated successfully.
十三、修改oracle/grid用户.bash_profile文件并创建相应的目录:
- [root@oradb1 ~]# chown -R oracle:oinstall /home/oracle/.bash_profile
- [root@oradb1 ~]# chown -R grid:oinstall /home/oracle/.bash_profile
-
- [oracle@oradb1 ~]# vi .bash_profile
- [grid@oradb1 ~]# vi .bash_profile
-
- [root@oradb1 ~]# mkdir -p /opt/app/grid
- [root@oradb1 ~]# mkdir -p /opt/app/11.2.0/grid
- [root@oradb1 ~]# mkdir -p /opt/app/oracle/product/11.2.0/dbhome_1
-
- [root@oradb1 ~]# chown -R grid:oinstall /opt
- [root@oradb1 ~]# chown -R oracle:oinstall /opt/app/oracle
十四、#配置ssh信任关系
oracle:
- ssh-keygen -t rsa
- ssh-keygen -t dsa
- ssh oradb1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- scp ~/.ssh/authorized_keys ayu2:~/.ssh/authorized_keys
oradb1:
- [oracle@oradb1 ~]$ ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/oracle/.ssh/id_rsa):
- Created directory '/home/oracle/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/oracle/.ssh/id_rsa.
- Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
- The key fingerprint is:
- 63:f5:c5:66:ae:37:6b:ca:79:2b:2e:52:a7:1c:da:99 oracle@oradb1
-
- [oracle@oradb1 ~]$ ssh-keygen -t dsa
- Generating public/private dsa key pair.
- Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/oracle/.ssh/id_dsa.
- Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
- The key fingerprint is:
- 41:87:12:2a:3b:1f:62:ad:0b:c4:b9:b7:a6:d0:52:1b oracle@oradb1
-
- [oracle@oradb1 ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
- [oracle@oradb1 ~]$ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
oradb2:
- [oracle@oradb2 ~]$ ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/oracle/.ssh/id_rsa):
- Created directory '/home/oracle/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/oracle/.ssh/id_rsa.
- Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
- The key fingerprint is:
- 8a:33:96:c0:8f:b8:5e:0d:84:05:d9:9d:02:ce:6b:b7 oracle@oradb2
-
- [oracle@oradb2 ~]$ ssh-keygen -t dsa
- Generating public/private dsa key pair.
- Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/oracle/.ssh/id_dsa.
- Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
- The key fingerprint is:
- 1d:e3:75:60:b7:df:a4:00:1b:b7:27:03:4a:ae:c4:10 oracle@oradb2
-
- [oracle@oradb2 ~]$ scp oradb1:/home/oracle/.ssh/authorized_keys .ssh/
- The authenticity of host 'oradb1 (172.27.90.158)' can't be established.
- RSA key fingerprint is e4:98:0f:bd:d8:5e:7c:d2:d8:bf:20:05:d3:34:c8:8e.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'oradb1,172.27.90.158' (RSA) to the list of known hosts.
- oracle@oradb1's password:
- authorized_keys 100% 998 1.0KB/s 00:00
-
- [oracle@oradb2 ~]$ ll .ssh/authorized_keys
- -rw-r--r-- 1 oracle oinstall 998 12-24 21:10 .ssh/authorized_keys
-
- [oracle@oradb2 ~]$ cat .ssh/authorized_keys
- ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA7ccTcH0c8vnqec0E3BZZTkWQWv6m/lQmePMHAFyw4+d3yWNPdEBJ9u0rOu5J81dJ53IDijLxQ1ivG6M6/WXIJ42e9PY4iFHXQZ7jk38dS6dBR4mYpl9EXPYlfgEInXTBazRbVU0HhlH0AyArVuSwYORy/7Hjr9QUQlV4RyGUaBuLxnvXSxErSjBtDsLGt6/RW5/PKQiBkglxmqOMl1F6XqaS6pQElcR2LPwaBQwqF8HpwL/wPwJS26B+04jZyzepZLgvzlbT9Obcdi7WBTkAAkXhY3l3Pgbf+XS3juf5tCw2RJmWRtBMa6DM8yIbuiLx3jE54GbcnXbs1zOyAoMOMQ== oracle@oradb1
- ssh-dss AAAAB3NzaC1kc3MAAACBANLZ+
来源: