Tuesday, July 3, 2018

Unable to open device "/dev/sdab1": Device or resource busy during oracleasm disk creation.


A "Device or resource busy" error when creating ASM disks at the OS level using raw disks provisioned from NetApp storage.


Issue:- Oracle ASM disk labeling failed for one of the multipath raw devices provisioned from NetApp storage, as shown below.

All the provided LUNs were discovered on the DB servers by rescanning the SCSI hosts with the commands below.

echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host2/scan
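Each SCSI host has to be rescanned when new LUNs are presented, so on servers with many HBAs the two commands above are usually generalized into a loop. A minimal sketch; the `rescan_scsi_hosts` helper name and its optional base-path argument are my own additions (the argument only exists to make the function easy to exercise outside a real system):

```shell
# Rescan every SCSI host so newly presented LUNs become visible.
# Optional first argument overrides the sysfs base path
# (default /sys/class/scsi_host); on a live system run as root
# with no argument.
rescan_scsi_hosts() {
  local base="${1:-/sys/class/scsi_host}"
  local host
  for host in "$base"/host*/scan; do
    [ -e "$host" ] || continue      # no SCSI hosts found under $base
    echo "- - -" > "$host"          # "- - -" = wildcard channel/target/LUN
  done
}
```

After a successful rescan the new devices appear under /dev/sd*.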

Once the disks appeared at the OS level on both nodes (a 2-node RAC DB server), the actual issue was encountered.

[root@ebsdbser01 mapper]# fdisk -l /dev/sds
Disk /dev/sds: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x40402c59

   Device Boot      Start         End      Blocks   Id  System
/dev/sds1               1       26108   209712478+  83  Linux
Partition 1 does not start on physical sector boundary.

[root@ebsdbser01 mapper]# fdisk -l /dev/sdab

Disk /dev/sdab: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x40402c59

    Device Boot      Start         End      Blocks   Id  System
/dev/sdab1               1       26108   209712478+  83  Linux
Partition 1 does not start on physical sector boundary.

[root@ebsdbser01 mapper]# oracleasm createdisk FLASH01 /dev/sdab1

Unable to open device "/dev/sdab1": Device or resource busy
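The call fails because /dev/sdab is only one path of a multipath LUN and is already claimed by device-mapper. The kernel exposes which sd devices back a given dm device under /sys/block/&lt;dm&gt;/slaves. A small sketch; the `mpath_slaves` helper and its optional sysfs-root argument are my own naming, used so the lookup can be demonstrated outside a real system:

```shell
# List the underlying sd* paths that device-mapper holds for a dm device.
# Optional second argument overrides the sysfs root (default /sys),
# which also makes the function testable without real hardware.
mpath_slaves() {
  local dm="$1" sysfs="${2:-/sys}"
  ls "$sysfs/block/$dm/slaves" 2>/dev/null
}
```

For the LUN in this post, `mpath_slaves dm-14` would list sds, sdab, sdj and sdak, exactly the paths that report "busy" when addressed directly.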



Workaround:-

As a workaround, the multipath mapper device itself was partitioned with fdisk instead of one of its underlying raw device paths, and partprobe was run after partitioning. This propagated the logical partition to all the dependent paths, and the ASM disk labeling then completed successfully on both RAC DB nodes.

[root@ebsdbser01 mapper]# multipath -ll mpathe
mpathe (3600a098038303778462b49533170412f) dm-14 NETAPP,LUN C-Mode
size=200G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:200 sds  65:32  active ready running
| `- 2:0:0:200 sdab 65:176 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:200 sdj  8:144  active ready running
  `- 2:0:1:200 sdak 66:64  active ready running
 
[root@ebsdbser01 mapper]# partprobe /dev/mapper/mpathe

[root@ebsdbser01 mapper]# ls -ltr /dev/mapper/mpathe*
lrwxrwxrwx. 1 root root 8 Jul  3 15:46 /dev/mapper/mpathep1 -> ../dm-24
lrwxrwxrwx. 1 root root 8 Jul  3 15:46 /dev/mapper/mpathe -> ../dm-14

[root@ebsdbser01 mapper]# oracleasm createdisk FLASH01 /dev/mapper/mpathep1
Writing disk header: done
Instantiating disk: done

[root@ebsdbser01 mapper]# oracleasm querydisk -d FLASH01
Disk "FLASH01" is a valid ASM disk on device [251,24]

[root@ebsdbser01 mapper]# oracleasm querydisk -p FLASH01
Disk "FLASH01" is a valid ASM disk
/dev/sdj1: LABEL="FLASH01" TYPE="oracleasm"
/dev/sds1: LABEL="FLASH01" TYPE="oracleasm"
/dev/sdab1: LABEL="FLASH01" TYPE="oracleasm"
/dev/sdak1: LABEL="FLASH01" TYPE="oracleasm"
/dev/mapper/mpathep1: LABEL="FLASH01" TYPE="oracleasm"
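To repeat this workaround for the remaining LUNs, the interactive fdisk session can be scripted with non-interactive parted (my substitution, not what the post used). The sketch below defaults to a dry run that only prints the commands; the `partition_mpath` name and its dry-run convention are my own:

```shell
# Partition a multipath mapper device and re-read its partition table.
# Defaults to a dry run (echoing the commands); pass "" as the second
# argument to execute for real, as root. Uses non-interactive parted
# in place of the interactive fdisk session shown above.
partition_mpath() {
  local dev="$1" run="${2-echo}"
  $run parted -s "$dev" mklabel msdos mkpart primary 1MiB 100%
  $run partprobe "$dev"
}

partition_mpath /dev/mapper/mpathe   # dry run: prints the two commands
```

Starting the partition at 1MiB also keeps it aligned to the 4096-byte physical sectors, avoiding the "Partition 1 does not start on physical sector boundary" warning seen in the fdisk output above.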

Tuesday, June 12, 2018

Database incomplete recovery led to ORA-00600: internal error code, arguments: [13013] ... on a SYS-owned CLUSTER table.

An Oracle database hosting an SAP SLT (SAP Landscape Transformation) application was corrupted by an unexpected crash of its ASM disk groups.

The worst-corrupted disks belonged to the FRA, where the online redo logs (ORL) and archive logs were placed.



When Oracle Support confirmed that these disks were beyond recovery, a fellow database administrator performed an incomplete recovery. After the incomplete DB recovery, everyone assumed that everything was working fine.



But the SAP SLT application users started complaining, in their own words, as below.

"We are seeing lot of db oracle related error messages in the application level, which hampers almost all activity in the system. Kindly check and do the needful."
I then started investigating, and my initial checks showed that the error below had been recurring ever since the incomplete DB recovery mentioned above.

The Oracle internal error encountered was:

ORA-00600: internal error code, arguments: [13013], [5001], [344], [8485480], [0], [8485480], [17], [], [], [], [], []


Oracle documents ORA-600 [13013] as typically being caused by a corrupted index, with six optional arguments to be decoded as below. I continued my investigation by matching the arguments.


Format: ORA-600 [13013] [a] [b] [c] [d] [e] [f]


    Arg [a] Passcount -- 5001
    Arg [b] Data object number -- 344
    Arg [c] Decimal relative DBA (RDBA) of the block containing the row to be updated -- 8485480
    Arg [d] Row slot number -- 0
    Arg [e] Decimal RDBA of the block being updated (typically same as [c]) -- 8485480
    Arg [f] Code -- 17
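An RDBA packs the relative file number into its top 10 bits and the block number into the low 22 bits, so the 8485480 in Arg [c] can be decoded with plain shell arithmetic. A sketch; `rdba_decode` is my own helper name:

```shell
# Decode a decimal relative DBA into relative file# and block#:
# the top 10 bits hold the file number, the low 22 bits the block number.
rdba_decode() {
  local rdba=$1
  echo "file#=$(( rdba >> 22 )) block#=$(( rdba & ((1 << 22) - 1) ))"
}

rdba_decode 8485480   # file#=2 block#=96872
```

The same decoding is available inside the database via DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK.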

When I tried to identify the problematic object behind the corrupted indexes, I found it was not a table but a CLUSTER object in the SYS schema.


A CLUSTER object is a group of tables that share common columns and store related data in the same blocks.

  • Identifying the DB object affected by the above corruption.

SET LINESIZE 200
COL OBJECT_NAME FORMAT A30
COL OWNER FORMAT A15
SELECT OBJECT_NAME, OWNER, OBJECT_ID, OBJECT_TYPE FROM DBA_OBJECTS WHERE OBJECT_ID = 344;
OBJECT_NAME                    OWNER            OBJECT_ID OBJECT_TYPE
------------------------------ --------------- ---------- -----------------------
SMON_SCN_TO_TIME_AUX           SYS                    344 CLUSTER

The above cluster object is built on columns from a SYS table called SMON_SCN_TIME. I took a roundabout route to find the table behind this cluster, analyzing the SYS tables whose names matched the cluster name, until I found the exact table and hit the real issue while validating its structure.
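A more direct route is to ask the data dictionary which tables are stored in the cluster; DBA_TABLES exposes this through its CLUSTER_NAME column (shown here as an alternative, not the path the original investigation took):

```sql
-- Tables stored in the affected cluster; for this case the query
-- should return SYS.SMON_SCN_TIME.
SELECT OWNER, TABLE_NAME
  FROM DBA_TABLES
 WHERE CLUSTER_NAME = 'SMON_SCN_TO_TIME_AUX';
```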

  • Table affected by the above corruption


SQL> ANALYZE TABLE SYS."SMON_SCN_TIME" VALIDATE STRUCTURE CASCADE;

ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

  • The issue was resolved after rebuilding the corrupted indexes, as below.

SQL> SELECT OWNER,INDEX_NAME FROM DBA_INDEXES WHERE TABLE_NAME='SMON_SCN_TIME';

OWNER      INDEX_NAME
---------- -------------------------
SYS        SMON_SCN_TIME_TIM_IDX
SYS        SMON_SCN_TIME_SCN_IDX

SQL> ANALYZE INDEX SMON_SCN_TIME_TIM_IDX VALIDATE STRUCTURE;


Index analyzed.


SQL> ANALYZE INDEX SMON_SCN_TIME_SCN_IDX VALIDATE STRUCTURE;


Index analyzed.


SQL> ALTER INDEX SMON_SCN_TIME_TIM_IDX REBUILD ONLINE;


Index altered.


SQL> ALTER INDEX SMON_SCN_TIME_SCN_IDX REBUILD ONLINE;


Index altered.


Finally, after the corrupted indexes were rebuilt, I was able to re-analyze the problematic table without any issues.



SQL> ANALYZE TABLE SYS."SMON_SCN_TIME" VALIDATE STRUCTURE CASCADE;

Table analyzed.


After this, the Oracle internal error disappeared from the alert log, and there were no further complaints from the application users.


Happy reading...
