Shafiulla Syed's Technical Blog.: March 2014

Saturday, March 8, 2014

NFS locking issue while data pump export - Linux-x86_64 Error: 37: No locks available

Yesterday, as usual, a cron job triggered a Data Pump export against a database on a Linux server.
The export failed immediately after it started. When I looked at the export log file, I found the following errors.

Export: Release 11.2.0.3.0 - Production on Sat Mar 8 05:53:37 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
;;;
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/oraexp/NTLSNDB/Data_Pump_Export_NTLSNDB_FULL_030814_0553_01.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10

At the database level, I verified that the dump directory existed, that its path was correct, and that read and write privileges had been granted on it. Everything was fine on the database side.

I suspected a problem with the NFS mount options at the OS level. We use a shared NFS mount point across all of our servers, and it must be mounted with the proper options on each server before the database can use it. However, the mount point turned out to be mounted correctly, with the options Oracle recommends.
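As a rough sketch of that kind of check, the helpers below print the options a mount point is mounted with and test for a single option. The function names mount_opts and has_opt are made up for this illustration, and which options to require is left to you, since the post does not list the ones in use.

```shell
# Print the mount options recorded for a mount point.
# $1: mount point, e.g. /oraexp
# $2: mounts table to read (assumed default: /proc/mounts on Linux)
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Succeed if a comma-separated option list contains the given option.
# $1: option list, $2: option to look for
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on the database server:
#   opts=$(mount_opts /oraexp)
#   has_opt "$opts" hard || echo "/oraexp is not mounted hard"
```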

Next I checked the OS logs and found that the problem was with the nfslock service, which was not running on this database server. This service allows the NFS client to lock files on the NFS mount point, which is required to create the dump file and write to it.

>cat messages | grep lockd
Mar 8 04:03:31 demoserver kernel: lockd: cannot monitor 10.207.80.179
Mar 8 04:03:31 demoserver kernel: lockd: failed to monitor 10.207.80.179
Mar 8 04:20:27 demoserver kernel: lockd: cannot monitor 10.207.80.179
Mar 8 04:20:27 demoserver kernel: lockd: failed to monitor 10.207.80.179
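To get a quick count of these messages instead of reading through them, something like the following could be used. It is only a sketch: count_lockd_failures is a hypothetical helper name, and /var/log/messages is assumed as the default log location.

```shell
# Count kernel lockd monitoring failures in a syslog file.
# $1: log file (assumed default: /var/log/messages)
count_lockd_failures() {
    log=${1:-/var/log/messages}
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # treat that case as success and only fail on real errors.
    grep -cE 'lockd: (cannot|failed to) monitor' "$log" || [ $? -eq 1 ]
}

# Usage:
#   count_lockd_failures /var/log/messages
```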

I later learned that the server had been rebooted a couple of days earlier and that the nfslock service had not started automatically after the reboot, so we started it manually. For the service to start automatically at boot, it has to be enabled with chkconfig nfslock on. We have since done that, so from now on the nfslock service will start automatically whenever the server is rebooted.
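One way to confirm that the boot-time setting took effect is to inspect the output of chkconfig --list nfslock. The helper below is a small sketch with a made-up name (on_in_runlevel); it assumes the usual "service 0:off 1:off ... 6:off" line format printed by chkconfig on SysV-init systems.

```shell
# Succeed if a "chkconfig --list <service>" line, read from stdin,
# shows the service switched on in the given runlevel.
# $1: runlevel, e.g. 3
on_in_runlevel() {
    # Split the line into one token per line, then look for "<runlevel>:on".
    tr -s ' \t' '\n\n' | grep -qx "$1:on"
}

# Usage on the server:
#   chkconfig --list nfslock | on_in_runlevel 3 && echo "nfslock starts at boot"
```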

cat messages | grep rpc
Mar 8 07:01:43 demoserver rpc.statd[12667]: Version 1.0.9 Starting
Mar 8 07:01:49 demoserver rpc.statd[12667]: Caught signal 15, un-registering and exiting.
Mar 8 07:01:49 demoserver rpc.statd[12745]: Version 1.0.9 Starting

You can manage the nfslock service with the commands below.

service nfslock status
service nfslock start
service nfslock stop

After making sure that the service had started and that the client was able to lock files on the NFS file system, we re-ran the export job. It completed successfully.
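A quick way to test locking before re-running a long export is to try to lock a scratch file in the target directory. The sketch below uses the flock utility from util-linux and a hypothetical function name, can_lock_in. It assumes a Linux NFS client that forwards flock requests to the server in the same way as the locks Oracle takes, which may not hold on every kernel or mount configuration.

```shell
# Try to take an exclusive, non-blocking lock on a scratch file.
# $1: directory to test, e.g. the Data Pump export directory
# Returns 0 if the lock was granted, non-zero otherwise.
can_lock_in() {
    probe="$1/.locktest.$$"
    : > "$probe" || return 2            # could not create the scratch file
    rc=0
    flock -n "$probe" true || rc=$?     # run "true" while holding the lock
    rm -f "$probe"
    return "$rc"
}

# Usage:
#   can_lock_in /oraexp/NTLSNDB && echo "locking works"
```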

Wednesday, March 5, 2014

Killing process in Unix

To list the running processes in Unix:

Command- ps -ef

We can pipe the output through “grep” to find a particular process.

Example-

To find the running Apache processes:

root@sunpstsrv01# ps -ef | grep http
webservd 587 584 0 Sep 01 ? 0:00 /opt/csw/apache2/sbin/httpd -k start
root 584 1 0 Sep 01 ? 0:47 /opt/csw/apache2/sbin/httpd -k start
nobody 1498 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
webservd 586 584 0 Sep 01 ? 0:00 /opt/csw/apache2/sbin/httpd -k start
webservd 588 584 0 Sep 01 ? 0:00 /opt/csw/apache2/sbin/httpd -k start
nobody 8860 1494 0 Sep 02 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 1499 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 8861 1494 0 Sep 02 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 1500 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 1501 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 1502 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
nobody 2832 1494 0 Sep 01 ? 0:00 /usr/local/apache2/bin/httpd -k start
webservd 6031 584 0 Sep 01 ? 0:00 /opt/csw/apache2/sbin/httpd -k start
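A plain grep can also list its own command line in the results. A common workaround is to wrap the first letter of the search string in brackets, so that the pattern still matches “httpd” but not the text of the grep command itself. The helper names below (bracket_pattern, psgrep) are made up for this sketch, which assumes the process name is a plain string without regular expression characters.

```shell
# Turn "httpd" into the pattern "[h]ttpd".
bracket_pattern() {
    first=$(printf '%s' "$1" | cut -c1)
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '[%s]%s\n' "$first" "$rest"
}

# List processes matching a name without listing the grep itself.
psgrep() {
    ps -ef | grep -- "$(bracket_pattern "$1")"
}

# Usage:
#   psgrep httpd
```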

To find the parent and child processes of a process:

Command- ptree (prints the process tree)

Example-

root@sunpstsrv01# ptree 8860
1494 /usr/local/apache2/bin/httpd -k start
  8860 /usr/local/apache2/bin/httpd -k start

In the example above we ran ptree against process ID “8860”. The output shows that PID “1494” is the parent of the child process “8860”.

Using the parent PID, we can list the IDs of all of its running child processes.

Example-

root@sunpstsrv01# ptree 1494
1494 /usr/local/apache2/bin/httpd -k start
  1498 /usr/local/apache2/bin/httpd -k start
  1499 /usr/local/apache2/bin/httpd -k start
  1500 /usr/local/apache2/bin/httpd -k start
  1501 /usr/local/apache2/bin/httpd -k start
  1502 /usr/local/apache2/bin/httpd -k start
  2832 /usr/local/apache2/bin/httpd -k start
  8860 /usr/local/apache2/bin/httpd -k start
  8861 /usr/local/apache2/bin/httpd -k start
  8862 /usr/local/apache2/bin/httpd -k start

Here we can see all of the child PIDs associated with the parent process ID “1494”.
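Where ptree is not available, the same parent-to-child lookup can be approximated from “ps -ef” output, because its second column is the PID and its third column is the parent PID. A minimal sketch, with children_of as a made-up helper name:

```shell
# Print the PIDs whose parent is the given PID.
# Reads "ps -ef" output on stdin: column 2 is PID, column 3 is PPID.
# $1: parent PID
children_of() {
    awk -v ppid="$1" '$3 == ppid { print $2 }'
}

# Usage:
#   ps -ef | children_of 1494
```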

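To kill a parent process and its children:

Command- kill -9 <PID>

Example-

To kill the Apache processes:

root@sunpstsrv01# kill -9 1494

Here we are killing the Apache parent process.

Note that kill -9 (SIGKILL) does not by itself terminate the children: when a parent dies, its children are re-parented to init and may keep running. The Apache parent stops its children when it receives a plain kill (SIGTERM), so try that first and keep -9 as a last resort.

We can confirm which processes are still running with the “ps -ef” command.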
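Because SIGKILL gives a process no chance to clean up, a gentler sequence is to send SIGTERM, wait a little, and send SIGKILL only if the process is still there. The function below is a sketch of that idea; the name stop_process and the 10-second default are arbitrary choices for this illustration.

```shell
# Ask a process to exit with SIGTERM, wait, then fall back to SIGKILL.
# $1: PID
# $2: seconds to wait before SIGKILL (default 10)
stop_process() {
    pid=$1
    wait_s=${2:-10}
    # If the signal cannot be sent, the process is gone or is not ours.
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$wait_s" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        wait_s=$((wait_s - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # still alive: force it
    return 0
}

# Usage:
#   stop_process 1494 30
```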

Zombie process in Unix

It is a process that has completed execution but still has an entry in the process table, allowing the process that started it to read its exit status.

When a process ends, all of the memory and resources associated with it are de-allocated so they can be used by other processes. However, the process entry in the process table remains. The parent is sent a SIGCHLD signal indicating that a child has died; the handler for this signal will typically execute the wait system call, which reads the exit status and removes the zombie.

Zombies can be identified in the output of the UNIX ps command by a “Z” in the process state column (labelled S in “ps -el” output and STAT in some other formats).

Example-

ps -el | grep 'Z'

In the output of a normal ps -el command, the second column shows the state of the process. Here are some of the states:

S : sleeping

R : running

D : waiting (usually for I/O)

T : stopped (suspended) or traced

Z : zombie (defunct)

The output below is an example. We can see that dovecot-auth is the zombie.

[root@s324 /]# ps -el | grep 'Z'

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

1 Z 0 1213 589 0 75 0 - 0 funct> ? 00:00:00 dovecot-auth

Here the “Z” in the second column indicates a zombie process.

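A zombie cannot be killed with “kill -9 <zombie PID>”, because the process has already exited and only its process table entry remains. The entry is removed when the parent calls wait. If the parent never does so, we might need to terminate or restart the parent (the application that owns the process); the zombie is then re-parented to init, which reaps it.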
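Since the parent is the process that has to act, it helps to list each zombie together with its parent PID. Matching the state column exactly also avoids the false hits that grep 'Z' produces, such as the header line containing “SZ”. A minimal sketch, with zombies as a made-up helper name:

```shell
# List zombie processes as "PID PPID" pairs.
# Reads "ps -el" output on stdin: column 2 is the state, column 4 is
# the PID, and column 5 is the parent PID.
zombies() {
    awk '$2 == "Z" { printf "%s %s\n", $4, $5 }'
}

# Usage:
#   ps -el | zombies
```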
