Saturday, September 27, 2014

Data Pump import job stuck on a space quota wait

The import job got stuck, continuously raising the two errors below.

ORA-39171: Job is experiencing a resumable wait.
ORA-01536: space quota exceeded for tablespace 'SPDEFAULT'

Oracle documents these errors as follows.

----------------------------------------------------------------------------------------------------------------------------------------
ORA-39171: Job is experiencing a resumable wait.

Cause: The Data Pump job is stalled with one or more of its sessions having a resumable wait. Resumable waits are typically caused by a non-expandable tablespace running out of space. The follow-on message describes the nature of the wait.

Action: Correct the condition causing the wait. This will typically involve adding datafiles to the tablespace that is full.

ORA-01536: space quota exceeded for tablespace "string"

Cause: The space quota for the segment owner in the tablespace has been exhausted and the operation attempted the creation of a new segment extent in the tablespace.

Action: Either drop unnecessary objects in the tablespace to reclaim space or have a privileged user increase the quota on this tablespace for the segment owner.

----------------------------------------------------------------------------------------------------------------------------------------

However, neither alert pointed me to the proper fix: I needed to grant quota to some owner/schema, but the message does not say which one, and the import brings in a lot of users.

So I dug further and looked into the database to see what exactly was happening at that moment. A user called 'SP' was waiting to create an object in the SPDEFAULT tablespace, and I found that this user had no quota on SPDEFAULT.

set pages 50000 lines 32767
col "Last SQL" format a1000
SELECT t.inst_id,s.username, s.sid, s.serial#,t.sql_id,t.sql_text "Last SQL",t.module
FROM gv$session s, gv$sqlarea t
WHERE s.sql_address =t.address AND
s.sql_hash_value =t.hash_value;

   INST_ID USERNAME                              SID    SERIAL# SQL_ID        Last SQL
---------- ------------------------------ ---------- ---------- ------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
         1 EXP$ORACLE                             44        741 bjnvhuy0bayab CREATE TABLE "SP"."SPDBOWNER" ("DBNAME" VARCHAR2(8 BYTE) NOT NULL ENABLE, "OWNERID" VARCHAR2(8 BYTE) NOT NULL ENABLE) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "SPDEFAULT"


Finally I had the schema that needed the quota. After granting it, the import progressed.

ORA-39171: Job is experiencing a resumable wait.
ORA-01536: space quota exceeded for tablespace 'SPDEFAULT'
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. . imported "SYSDMPS"."SP_GLSI_ADEX_DTL"                25.89 GB 102216301 rows
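For reference, the fix is a one-liner once the waiting owner is known. A minimal sketch, using the names from this case (UNLIMITED is one choice; a sized quota such as 500M works as well):

```sql
-- Check what quota, if any, the user currently has (no row = no quota).
SELECT tablespace_name, bytes, max_bytes
FROM   dba_ts_quotas
WHERE  username = 'SP';

-- Grant the quota so the resumable wait can continue.
ALTER USER sp QUOTA UNLIMITED ON spdefault;
```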

Wednesday, September 17, 2014

Operations on a DATABASE/TABLESPACE under begin backup / offline mode - understand them better!!


Even though RMAN has largely replaced traditional user-managed backups as the backup and recovery tool for Oracle databases, questions still come up (posed to me, or in my own mind) about how Oracle manages datafiles and data when tablespaces or the whole database are kept in BEGIN BACKUP mode. Let us explore it further.

Note: before testing these scenarios, make sure your database has a consistent backup if your data is important.

Let's put the USERS tablespace into begin backup mode:

SQL> alter tablespace users begin backup;

Tablespace altered.

Make sure the datafiles of this tablespace show ACTIVE status in V$BACKUP, which indicates that they are in active backup mode.

SQL> SELECT t.name AS "TB_NAME", d.file# as "DF#", d.name AS "DF_NAME", b.status
FROM   V$DATAFILE d, V$TABLESPACE t, V$BACKUP b
WHERE  d.TS#=t.TS#
AND    b.FILE#=d.FILE#
AND    b.STATUS='ACTIVE'; 

TB_NAME         DF#      DF_NAME                                  STATUS
------------- ---------- --------------------------------------   ------------------
USERS            4       /u01/app/oracle/oradata/SPRD/users01.dbf  ACTIVE

Can we add a new datafile to a tablespace if it is in Backup mode ? 

-- YES. However, the newly added datafile will not be in backup mode. To put it in backup mode you need to re-enable backup mode for the entire tablespace.

Example:
--------

SQL> alter tablespace users add datafile '/u01/app/oracle/oradata/SPRD/users02.dbf' size 5m;

Tablespace altered.

SQL> SELECT t.name AS "TB_NAME", d.file# as "DF#", d.name AS "DF_NAME", b.status
FROM   V$DATAFILE d, V$TABLESPACE t, V$BACKUP b
WHERE  d.TS#=t.TS#
AND    b.FILE#=d.FILE#
AND    b.STATUS='ACTIVE'; 

TB_NAME         DF#      DF_NAME                                  STATUS
------------- ---------- --------------------------------------   ------------------
USERS            4       /u01/app/oracle/oradata/SPRD/users01.dbf  ACTIVE


SQL> ALTER TABLESPACE USERS END BACKUP;

Tablespace altered.

SQL> ALTER TABLESPACE USERS BEGIN BACKUP;

Tablespace altered.


SELECT t.name AS "TB_NAME", d.file# as "DF#", d.name AS "DF_NAME", b.status
FROM   V$DATAFILE d, V$TABLESPACE t, V$BACKUP b
WHERE  d.TS#=t.TS#
AND    b.FILE#=d.FILE#
AND    b.STATUS='ACTIVE';

TB_NAME                               DF# DF_NAME                                                      STATUS
------------------------------ ---------- ------------------------------------------------------------ ------------------
USERS                                   4 /u01/app/oracle/oradata/SPRD/users01.dbf                     ACTIVE
USERS                                   5 /u01/app/oracle/oradata/SPRD/users02.dbf                     ACTIVE

Take a physical copy of the above two datafiles, which are in backup mode, for the further scenarios.

[oracle@oranode1 Manual_bkps]$ pwd
/fra/SPRD/Manual_bkps
[oracle@oranode1 Manual_bkps]$ cp /u01/app/oracle/oradata/SPRD/users01.dbf .
[oracle@oranode1 Manual_bkps]$ cp /u01/app/oracle/oradata/SPRD/users02.dbf .
[oracle@oranode1 Manual_bkps]$ ls -ltr
total 107652
-rw-r----- 1 oracle oinstall 104865792 Sep 16 22:09 users01.dbf
-rw-r----- 1 oracle oinstall   5251072 Sep 16 22:09 users02.dbf

Can we resize a datafile which is in backup mode?

Yes and NO...

Yes, if we are expanding the size of the datafile.

SQL> alter database datafile 4 resize 100m;


Database altered.

SQL> ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/SPRD/users02.dbf' RESIZE 10M;

Database altered.


SQL> select file_name,bytes/1024/1024 from dba_data_files where tablespace_name='USERS';

FILE_NAME                                     BYTES/1024/1024
--------------------------------------------  ---------------
/u01/app/oracle/oradata/SPRD/users01.dbf                  100
/u01/app/oracle/oradata/SPRD/users02.dbf                   10


No, if we are shrinking the datafile below its existing size. Below I tried to reduce its size from
100M to 99M (1 MB less); in fact this datafile is almost empty.


SQL>  ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/SPRD/users01.dbf' RESIZE 99M;
 ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/SPRD/users01.dbf' RESIZE 99M
*
ERROR at line 1:
ORA-03298: cannot shrink datafile - file 4 is under hot backup
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'

Can we drop a datafile ?

No. Here is an example.

SQL> ALTER TABLESPACE USERS DROP DATAFILE '/u01/app/oracle/oradata/SPRD/users02.dbf';
ALTER TABLESPACE USERS DROP DATAFILE '/u01/app/oracle/oradata/SPRD/users02.dbf'
*
ERROR at line 1:
ORA-01150: cannot prevent writes - file 5 has online backup set
ORA-01110: data file 5: '/u01/app/oracle/oradata/SPRD/users02.dbf'


This time I added one more datafile, put the whole tablespace into backup mode again, and copied only this newly created datafile at OS level (the two remaining datafiles were already backed up above, and our DB is in archivelog mode, so no issues).

SQL> select file_name,bytes/1024/1024 SIZE_MB from dba_data_files WHERE TABLESPACE_NAME='USERS';

FILE_NAME                                                       SIZE_MB
------------------------------------------------------------ ----------
/u01/app/oracle/oradata/SPRD/users01.dbf                            100
/u01/app/oracle/oradata/SPRD/users02.dbf                             10
/u01/app/oracle/oradata/SPRD/users03.dbf                              5

SQL> ALTER TABLESPACE USERS BEGIN BACKUP;

Tablespace altered.

[oracle@oranode1 Manual_bkps]$  cp /u01/app/oracle/oradata/SPRD/users03.dbf .

SQL> ALTER TABLESPACE USERS END BACKUP;

Tablespace altered.


Let's remove the datafiles at OS level for further investigation. All the above datafiles were removed from their original locations. (Make sure these datafiles are backed up at OS level while in backup mode, and keep the database in archivelog mode.)


[oracle@oranode1 SPRD]$ pwd

/u01/app/oracle/oradata/SPRD

[oracle@oranode1 SPRD]$ rm users*

Now the datafiles belonging to the USERS tablespace have been removed from their original locations, so let us examine how far we can still operate on the USERS tablespace.

Can we select from, or delete/update rows of, the EMP table?

select username,default_tablespace from dba_users where default_tablespace='USERS';

USERNAME                       DEFAULT_TABLESPACE
------------------------------ ------------------------------
ORACLE_OCM                     USERS
SPATIAL_CSW_ADMIN_USR          USERS
SCOTT                          USERS
APEX_PUBLIC_USER               USERS
DIP                            USERS
MDDATA                         USERS
XS$NULL                        USERS
SPATIAL_WFS_ADMIN_USR          USERS

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT' and table_name='EMP';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
EMP                            USERS

SQL> select B.FILE_NAME from dba_segments a, dba_data_files b where a.header_file=b.file_ID and segment_name='EMP';

FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/SPRD/users01.dbf

SQL> conn scott/tiger;
Connected.

SQL> show user;
USER is "SCOTT"

SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
BONUS                          TABLE
DEPT                           TABLE
EMP                            TABLE
SALGRADE                       TABLE


SQL> select * from emp;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

Yes. DML operations also work against these datafiles; however, a datafile goes offline automatically once it can no longer be accessed.

SQL> delete emp where empno=7934;

1 row deleted.

SQL> commit;

Commit complete.

SQL> update emp set EMPNO=1000 where ename='SMITH';

1 row updated.

SQL> COMMIT;

Commit complete.

You can see the above changes in this table. One row deleted and one row updated.

SQL> select * from emp;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      1000 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20

13 rows selected.

It seems all of the above is going fine; however, the alert log records errors saying that these datafiles are going offline and cannot be accessed for DML operations. The reason is that Oracle automatically takes a tablespace offline when certain errors are encountered, for example when DBWR fails in several attempts to write to one of the tablespace's datafiles.

The question, then, is how the operations keep progressing even though the datafile is offline. Oracle's answer is:

When a tablespace goes offline, Oracle does not permit any subsequent SQL statements to reference objects contained in the tablespace. Active transactions with completed statements that refer to data in a tablespace that has been taken offline  are not affected at the transaction level. Oracle saves rollback data corresponding to statements that affect data in the offline tablespace in a deferred rollback segment (in the SYSTEM tablespace). When the tablespace is brought back online, Oracle applies the rollback data to the tablespace, if needed.

==============ALERT LOG FILE===========================

ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'

ORA-01116: error in opening database file 5
ORA-01110: data file 5: '/u01/app/oracle/oradata/SPRD/users02.dbf'

ORA-01135: file 6 accessed for DML/query is offline
ORA-01110: data file 6: '/u01/app/oracle/oradata/SPRD/users03.dbf'

======================================================================


SQL> select B.FILE_NAME from dba_segments a, dba_data_files b
  2  where a.header_file=b.file_ID and segment_name='EMP';

FILE_NAME
------------------------------------------------------------
/u01/app/oracle/oradata/SPRD/users01.dbf

SQL> ! ls -ltr /u01/app/oracle/oradata/SPRD/users01.dbf
ls: /u01/app/oracle/oradata/SPRD/users01.dbf: No such file or directory



SQL> select group#,members, status from v$log;

    GROUP#    MEMBERS STATUS
---------- ---------- ----------------
         1          1 INACTIVE
         2          1 INACTIVE
         3          1 CURRENT

SQL> select group#,status,member from v$logfile;

    GROUP# STATUS  MEMBER
---------- ------- ---------------------------------------
         3         /u01/app/oracle/oradata/SPRD/redo03.log
         2         /u01/app/oracle/oradata/SPRD/redo02.log
         1         /u01/app/oracle/oradata/SPRD/redo01.log



SQL> ALTER SYSTEM DUMP LOGFILE '/u01/app/oracle/oradata/SPRD/redo03.log';

System altered.


SQL> CONN SCOTT/tiger;
Connected.


Even table creation works in this situation, which means CREATE TABLE writes nothing physically to the missing datafile, and no segment has been allocated for the new table yet (hence DBA_SEGMENTS shows nothing for it).

SQL> CREATE TABLE T1 (id number);

Table created.

SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
BONUS                          TABLE
DEPT                           TABLE
EMP                            TABLE
SALGRADE                       TABLE
T1                             TABLE

SQL> select B.FILE_NAME from dba_segments a, dba_data_files b where a.header_file=b.file_ID and segment_name='T1';

no rows selected
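A likely explanation, inferred rather than shown by the trace: 11g deferred segment creation, where CREATE TABLE allocates no extents until the first row is inserted, so the statement never has to touch the missing datafile. The DBA_TABLES.SEGMENT_CREATED column can confirm it:

```sql
-- SEGMENT_CREATED = 'NO' means no extents have been allocated yet,
-- which matches DBA_SEGMENTS returning no rows for T1.
SELECT table_name, segment_created
FROM   dba_tables
WHERE  owner = 'SCOTT'
AND    table_name = 'T1';
```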

SQL> select B.FILE_NAME from dba_segments a, dba_data_files b where a.header_file=b.file_ID and segment_name='EMP';

FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/SPRD/users01.dbf

Let us insert a value into the just-created T1 to check how it progresses; it should throw an error like the one below. In contrast, you can still insert/delete/update rows of tables (e.g. EMP) whose segments already exist in the datafile (though the datafile is missing, the changes are recorded in the redo logs).


SQL> insert into t1 values (&id);
Enter value for id: 1
old   1: insert into t1 values (&id)
new   1: insert into t1 values (1)
insert into t1 values (1)
            *
ERROR at line 1:
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

Now let us force Oracle to read the required data from the datafiles instead of memory, by flushing the shared pool and the buffer cache. This should throw an error, since it cannot read from a datafile that is missing from its original location.
                                           

SQL> alter system flush shared_pool;

System altered.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> select * from emp;
select * from emp
              *
ERROR at line 1:
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

Finally, let us try to restore the datafiles that were deleted from their original locations.

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
SPRD      READ WRITE

SQL> shutdown immediate;
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
SPRD      READ WRITE


SQL> shutdown abort;
ORACLE instance shut down.


[oracle@oranode1 Manual_bkps]$ cp user* /u01/app/oracle/oradata/SPRD/
[oracle@oranode1 Manual_bkps]$ cd /u01/app/oracle/oradata/SPRD/
[oracle@oranode1 SPRD]$ ls -ltr
total 1529216
-rw-r----- 1 oracle oinstall  20979712 Sep 16 22:01 temp01.dbf
-rw-r----- 1 oracle oinstall  52429312 Sep 16 22:28 redo02.log
-rw-r----- 1 oracle oinstall  52429312 Sep 16 23:04 redo03.log
-rw-r----- 1 oracle oinstall  52436992 Sep 16 23:17 undotbs01.dbf
-rw-r----- 1 oracle oinstall 713039872 Sep 16 23:17 system01.dbf
-rw-r----- 1 oracle oinstall 503324672 Sep 16 23:17 sysaux01.dbf
-rw-r----- 1 oracle oinstall  52429312 Sep 16 23:17 redo01.log
-rw-r----- 1 oracle oinstall   9748480 Sep 16 23:17 control02.ctl
-rw-r----- 1 oracle oinstall   9748480 Sep 16 23:17 control01.ctl
-rw-r----- 1 oracle oinstall   5251072 Sep 16 23:21 users03.dbf
-rw-r----- 1 oracle oinstall   5251072 Sep 16 23:21 users02.dbf
-rw-r----- 1 oracle oinstall 104865792 Sep 16 23:21 users01.dbf


SQL> col name format a60
SQL> set linesize 200
SQL> select NAME, CHECKPOINT_TIME, STATUS from v$datafile;

NAME                                                         CHECKPOIN STATUS
------------------------------------------------------------ --------- -------
/u01/app/oracle/oradata/SPRD/system01.dbf                    16-SEP-14 SYSTEM
/u01/app/oracle/oradata/SPRD/sysaux01.dbf                    16-SEP-14 ONLINE
/u01/app/oracle/oradata/SPRD/undotbs01.dbf                   16-SEP-14 ONLINE
/u01/app/oracle/oradata/SPRD/users01.dbf                     16-SEP-14 ONLINE
/u01/app/oracle/oradata/SPRD/users02.dbf                     16-SEP-14 ONLINE
/u01/app/oracle/oradata/SPRD/users03.dbf                     16-SEP-14 RECOVER

6 rows selected.


SQL> alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';

Session altered.

SQL> select NAME, CHECKPOINT_TIME, STATUS from v$datafile;

NAME                                                         CHECKPOINT_TIME     STATUS
------------------------------------------------------------ ------------------- -------
/u01/app/oracle/oradata/SPRD/system01.dbf                    2014-09-16 23:04:53 SYSTEM
/u01/app/oracle/oradata/SPRD/sysaux01.dbf                    2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/undotbs01.dbf                   2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users01.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users02.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users03.dbf                     2014-09-16 22:20:40 RECOVER

6 rows selected.

SQL> select name,open_mode from v$database;

NAME                                                         OPEN_MODE
------------------------------------------------------------ --------------------
SPRD                                                         MOUNTED

SQL> alter database recover datafile '/u01/app/oracle/oradata/SPRD/users03.dbf';

Database altered.

SQL> select NAME, CHECKPOINT_TIME, STATUS from v$datafile;

NAME                                                         CHECKPOINT_TIME     STATUS
------------------------------------------------------------ ------------------- -------
/u01/app/oracle/oradata/SPRD/system01.dbf                    2014-09-16 23:04:53 SYSTEM
/u01/app/oracle/oradata/SPRD/sysaux01.dbf                    2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/undotbs01.dbf                   2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users01.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users02.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users03.dbf                     2014-09-16 22:33:06 OFFLINE

6 rows selected.

SQL> alter database datafile '/u01/app/oracle/oradata/SPRD/users03.dbf' online;

Database altered.

SQL> select NAME, CHECKPOINT_TIME, STATUS from v$datafile;

NAME                                                         CHECKPOINT_TIME     STATUS
------------------------------------------------------------ ------------------- -------
/u01/app/oracle/oradata/SPRD/system01.dbf                    2014-09-16 23:04:53 SYSTEM
/u01/app/oracle/oradata/SPRD/sysaux01.dbf                    2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/undotbs01.dbf                   2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users01.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users02.dbf                     2014-09-16 23:04:53 ONLINE
/u01/app/oracle/oradata/SPRD/users03.dbf                     2014-09-16 22:33:06 ONLINE

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 4 needs media recovery
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'


COL DF# FORMAT 999
COL DF_NAME FORMAT A50
COL TBSP_NAME FORMAT A7
COL STATUS FORMAT A7
COL CHANGE# FORMAT 99999999
SELECT r.FILE# AS df#, d.NAME AS df_name, t.NAME AS tbsp_name, 
       d.STATUS, r.CHANGE#, r.TIME
FROM V$RECOVER_FILE r, V$DATAFILE d, V$TABLESPACE t
WHERE t.TS# = d.TS#
AND d.FILE# = r.FILE#;

 DF# DF_NAME                                            TBSP_NA STATUS    CHANGE# TIME
---- -------------------------------------------------- ------- ------- --------- -------------------
   1 /u01/app/oracle/oradata/SPRD/system01.dbf          SYSTEM  SYSTEM     825879 2014-09-16 23:04:53
   2 /u01/app/oracle/oradata/SPRD/sysaux01.dbf          SYSAUX  ONLINE     825879 2014-09-16 23:04:53
   3 /u01/app/oracle/oradata/SPRD/undotbs01.dbf         UNDOTBS ONLINE     825879 2014-09-16 23:04:53
   4 /u01/app/oracle/oradata/SPRD/users01.dbf           USERS   ONLINE     822715 2014-09-16 22:06:11
   5 /u01/app/oracle/oradata/SPRD/users02.dbf           USERS   ONLINE     822715 2014-09-16 22:06:11


SQL> alter database open;

ORA-01113: file 4 needs media recovery
ORA-01110: data file 4: '/u01/app/oracle/oradata/SPRD/users01.dbf'


COL DF# FORMAT 999
COL DF_NAME FORMAT A35
COL TBSP_NAME FORMAT A7
COL STATUS FORMAT A7
COL CHANGE# FORMAT 99999999
SELECT r.FILE# AS df#, d.NAME AS df_name, t.NAME AS tbsp_name, 
       d.STATUS, r.ERROR, r.CHANGE#, r.TIME
FROM V$RECOVER_FILE r, V$DATAFILE d, V$TABLESPACE t
WHERE t.TS# = d.TS#
AND d.FILE# = r.FILE#;

 DF# DF_NAME                             TBSP_NA STATUS  ERROR        CHANGE# TIME
---- ----------------------------------- ------- ------- ---------- --------- -------------------

   4 /u01/app/oracle/oradata/SPRD/users0 USERS   ONLINE                822715 2014-09-16 22:06:11
     1.dbf

   5 /u01/app/oracle/oradata/SPRD/users0 USERS   ONLINE                822715 2014-09-16 22:06:11
     2.dbf


The NULL (empty) values shown in the ERROR column indicate that these datafiles need recovery.

Let us recover datafiles 4 & 5 and open the database.


SQL> alter database recover datafile 4;

Database altered.

SQL> alter database recover datafile 5;

Database altered.

Make sure that no other datafiles need recovery...

SQL> COL DF# FORMAT 999
COL DF_NAME FORMAT A50
COL TBSP_NAME FORMAT A7
COL STATUS FORMAT A7
COL CHANGE# FORMAT 99999999
SELECT r.FILE# AS df#, d.NAME AS df_name, t.NAME AS tbsp_name, 
       d.STATUS, r.CHANGE#, r.TIME
FROM V$RECOVER_FILE r, V$DATAFILE d, V$TABLESPACE t
WHERE t.TS# = d.TS#
AND d.FILE# = r.FILE#;

no rows selected

SQL> alter database open;

Database altered.


Here is a snippet from the alert log showing what happens during datafile recovery.

======================ALERT LOG==========================
alter database recover datafile 4
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 6 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo03.log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo01.log
Media Recovery Complete (SPRD)
Completed: alter database recover datafile 4

alter database recover datafile 5
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo02.log
Media Recovery Complete (SPRD)
Completed: alter database recover datafile 4
alter database recover datafile 5
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 6 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo03.log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo01.log
Media Recovery Complete (SPRD)
Completed: alter database recover datafile 5
Tue Sep 16 23:44:55 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Started redo scan
Completed redo scan
 read 4 KB redo, 11 data blocks need recovery
Started redo application at
 Thread 1: logseq 7, block 1170
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/SPRD/redo01.log

=============================================================================================

Enjoy....... Please Comment !!














Saturday, March 8, 2014

NFS locking issue while data pump export - Linux-x86_64 Error: 37: No locks available

Yesterday, as usual, the cron job triggered a Data Pump export job against a database on a Linux server.
The export job failed immediately after starting. When I looked into the export logfile, I found errors like the following.

Export: Release 11.2.0.3.0 - Production on Sat Mar 8 05:53:37 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
;;;
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/oraexp/NTLSNDB/Data_Pump_Export_NTLSNDB_FULL_030814_0553_01.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10

At the database level I verified the dump directory, its path, and that the proper read and write privileges were granted on the directory. Everything was fine at the database end.

I suspected an NFS mount option issue at the OS level. We use an NFS mount point shared across all of the servers; it must be mounted with the proper options on each server for the database to use it. I could see this mount point was mounted with the options Oracle recommends.

Then I checked the logs at OS level and found that the issue was related to the nfslock service, which was not running on this database server. This service lets the client lock a file on the NFS mount point so it can create the file and perform write operations.

>cat messages | grep lockd
Mar  8 04:03:31 demoserver kernel: lockd: cannot monitor 10.207.80.179
Mar  8 04:03:31 demoserver kernel: lockd: failed to monitor 10.207.80.179
Mar  8 04:20:27 demoserver kernel: lockd: cannot monitor 10.207.80.179
Mar  8 04:20:27 demoserver kernel: lockd: failed to monitor 10.207.80.179

Further, I came to know that the server had been rebooted a couple of days earlier for some reason, and after the reboot the nfslock service did not start automatically, so we started it manually. Note that if the nfslock service should auto-start after a reboot, run chkconfig nfslock on. This has since been taken care of, so from now on the nfslock service starts automatically whenever the server reboots.

cat messages | grep rpc
Mar  8 07:01:43 demoserver rpc.statd[12667]: Version 1.0.9 Starting
Mar  8 07:01:49 demoserver rpc.statd[12667]: Caught signal 15, un-registering and exiting.
Mar  8 07:01:49 demoserver rpc.statd[12745]: Version 1.0.9 Starting

You can manage the nfslock service with the commands below.

service nfslock status
service nfslock start
service nfslock stop

After making sure the service had started and the client was able to lock files on the NFS file system, we re-triggered the export job, and it completed successfully.



Wednesday, March 5, 2014

Killing process in Unix

To check the running processes in Unix,

Command- ps -ef

Here we can pipe the output to "grep" to filter for a particular process,

Example-

To find out running processes for apache,

root@sunpstsrv01# ps -ef | grep http



webservd   587   584   0   Sep 01 ?           0:00 /opt/csw/apache2/sbin/httpd -k start
    root   584     1   0   Sep 01 ?           0:47 /opt/csw/apache2/sbin/httpd -k start
  nobody  1498  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
webservd   586   584   0   Sep 01 ?           0:00 /opt/csw/apache2/sbin/httpd -k start
webservd   588   584   0   Sep 01 ?           0:00 /opt/csw/apache2/sbin/httpd -k start
  nobody  8860  1494   0   Sep 02 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  1499  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  8861  1494   0   Sep 02 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  1500  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  1501  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  1502  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
  nobody  2832  1494   0   Sep 01 ?           0:00 /usr/local/apache2/bin/httpd -k start
webservd  6031   584   0   Sep 01 ?           0:00 /opt/csw/apache2/sbin/httpd -k start





To find out parent & child processes in Unix,

Command- ptree (print the process tree)



Example-

root@sunpstsrv01# ptree 8860

1494  /usr/local/apache2/bin/httpd -k start
  8860  /usr/local/apache2/bin/httpd -k start



Here in the above example we took process ID "8860" and ran the ptree command; we can see PID "1494" is the parent process of child process "8860".


Using the parent PID we can get the IDs of all its running child processes.


Example-



root@sunpstsrv01# ptree 1494

1494  /usr/local/apache2/bin/httpd -k start
  1498  /usr/local/apache2/bin/httpd -k start
  1499  /usr/local/apache2/bin/httpd -k start
  1500  /usr/local/apache2/bin/httpd -k start
  1501  /usr/local/apache2/bin/httpd -k start
  1502  /usr/local/apache2/bin/httpd -k start
  2832  /usr/local/apache2/bin/httpd -k start
  8860  /usr/local/apache2/bin/httpd -k start
  8861  /usr/local/apache2/bin/httpd -k start
  8862  /usr/local/apache2/bin/httpd -k start



Here we can see all child PIDs associated with parent process ID "1494".
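ptree is a Solaris command; on Linux, GNU ps can list the children of a given parent PID directly with --ppid. A minimal sketch, using two background sleep processes as stand-ins for the apache children:

```shell
# Spawn two background sleeps as stand-in children of this shell, then list
# the shell's children the way "ptree <parent PID>" would on Solaris.
sleep 30 & p1=$!
sleep 30 & p2=$!

# GNU ps (Linux) filters by parent PID; both sleep processes should appear.
ps -o pid=,comm= --ppid $$

kill "$p1" "$p2"    # clean up the stand-in children
```

Note that --ppid is a GNU procps option and is not available in Solaris ps.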


To kill Parent & child process,


Command- kill -9 <PID>
 

Example-



To kill apache process,



root@sunpstsrv01# kill -9 1494
 

Here we are killing the parent process running for apache.


Most of the time, killing the parent process also causes its child processes to exit (apache workers shut down when their parent goes away). Any children that survive are re-parented to init and must be killed individually.

We can confirm the result by using the "ps -ef" command.
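As a safe stand-in for killing an apache parent, the same kill-and-confirm cycle can be sketched with a disposable background process:

```shell
# Start a disposable background process (stand-in for an apache worker).
sleep 300 &
pid=$!

kill -9 "$pid"
wait "$pid" 2>/dev/null || true   # reap it so no zombie entry lingers

# kill -0 sends no signal; it only tests whether the PID still exists,
# which is a scriptable version of checking with "ps -ef".
if kill -0 "$pid" 2>/dev/null; then
  echo "process still running"
else
  echo "process killed"
fi
```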

  

Zombie process in Unix

 It is a process that has completed execution but still has an entry in the process table, allowing the process that started it to read its exit status.



When a process ends, all of the memory and resources associated with it are de-allocated so they can be used by other processes. However, the process entry in the process table remains. The parent is sent a SIGCHLD signal indicating that a child has died; the handler for this signal will typically execute the wait system call, which reads the exit status and removes the zombie.
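The reaping cycle described above can be sketched in the shell, where the wait built-in plays the role of the wait system call:

```shell
# The child exits immediately; until the parent waits on it, the kernel
# keeps its process-table entry so the exit status can still be collected.
false &                       # child process exits with status 1
pid=$!

# wait reaps the child: it returns the child's exit status and the kernel
# removes the (would-be zombie) process-table entry.
wait "$pid" && status=0 || status=$?
echo "child exited with status $status"
```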


Zombies can be identified in the output from the UNIX ps command by the presence of a “Z” in the STAT column.
 

Example-


ps -el | grep 'Z'

With a normal ps -el command you see output with the state of each process in the second column. Here are some common states:



S : sleeping

R : running

D : uninterruptible sleep (usually waiting for IO)

T : stopped (suspended) or traced

Z : zombie (defunct)

The output below is an example; we can see that dovecot-auth is the zombie.
 

[root@s324 /]# ps -el | grep 'Z'

F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD

1 Z     0  1213   589  0  75   0    -     0 funct> ?        00:00:00 dovecot-auth

Here the "Z" in the second (state) column indicates a zombie process.
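The state letters above can be observed on any live process. A small sketch: a freshly started background sleep should report state S (sleeping) in the STAT column, where a zombie would show Z instead:

```shell
# Start a background sleep and inspect its STAT value with ps.
sleep 30 &
pid=$!
sleep 1                                    # give the child a moment to settle

state=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "sleeper state: $state"               # expected to start with S

kill "$pid"
```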
 

Note that "kill -9" cannot remove a zombie: the process has already exited, and only its process-table entry remains. The entry disappears once the parent reaps the child with wait(); if the parent never does, kill or restart the application owning the parent process, so that init adopts the zombie and reaps it.

Oracle RAC node unavailable with error: Server unexpectedly closed network connection

clsc_connect: (0x251c670) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node2_))

 Early midnight I received a call from the monitoring team that one of the critical production database node is not available. As I am aware...