Thursday, 22 November 2012

Performance Troubleshooting on Solaris 10

Here I would like to share a performance troubleshooting experience that I had recently. The database environment comprises Extended Oracle RAC on two SPARC M8000 machines running Solaris 10, each located in a separate data centre; the data centres are 1-2 miles apart. The Oracle Clusterware version is 11.1.0.7. There are 14 different database instances (some are 10gR2 and others are 11gR1) running on each node in the cluster, all monitored by Enterprise Manager Grid Control.

The problem was noticed when Enterprise Manager sent EM alerts with the ORA-12170 error for all the instances running on one node; the instances on the other node were fine. There was another alert with the message "Agent is unreachable, but the host is available".

The EM agent status was shown as running. There were about 20 OS processes with the command /install/app/oracle/product/agent10g/perl/bin/perl /install/app/oracle/product/. The command ps -ef | grep agent lists all such processes. Even after shutting down the agent, these processes did not disappear. All of them were then killed using the kill command, but that did not improve anything.

The top command showed load averages of 15 to 20; at normal times the load averages are in the range of 3 to 7. The machine has 2 quad-core CPUs. The top output also showed a few zombie processes; the number of zombie processes was different each time the output was refreshed, and so were their process IDs. The command ps -ef | grep defunct confirmed this.
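
To watch how the zombie count changes between refreshes, something like the following could be used. This is only a sketch: count_zombies is a made-up helper name, and it assumes the process state is in the first column, as printed by ps -eo s,pid,ppid,comm.

```shell
# Count defunct (zombie) processes in ps output read from stdin.
# Assumes the first column is the process state and that "Z" marks a zombie.
count_zombies() {
    awk '$1 == "Z" { n++ } END { print n + 0 }'
}

# Assumed usage on the affected host:
#   ps -eo s,pid,ppid,comm | count_zombies
```

Reading from stdin means the same helper works on live ps output and on a capture saved earlier for comparison.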

The database application users started complaining about connection issues with timeout errors.

The load average was not excessively high and the response time of commands like ls, ps and oraenv was not bad, but we observed a wait of 2 or 3 seconds before any of these commands returned control to the shell after showing its output, if any. Connecting to any database instance locally as sysdba was also quick (a fraction of a second), but the quit command took 3 to 5 seconds to exit.

Then I found the article with ID 1002436.1 on the My Oracle Support site. (I don't remember which search terms led me to this page in the knowledge base. I was using keywords like process exit slow on solaris; zombie process on solaris)

As described in the article, the process accounting file /var/adm/pacct had grown large; ours was 2GB. It looked like we had hit the OS-level bug described there, which caused the performance problems. The adm account, whose cron job processes this file on a regular basis, was locked, and hence the file had grown so big. Process accounting was turned off (/usr/lib/acct/turnacct off) and the performance of the server returned to normal. Later, process accounting was turned back on (/usr/lib/acct/turnacct on) and the performance remained stable.
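
A simple size check on the accounting file would have flagged this earlier. The following is a minimal sketch, not what we actually ran: file_exceeds is a hypothetical helper, the 1 GB threshold is arbitrary, and it assumes a shell whose test command handles 64-bit integers (bash or ksh93, for example).

```shell
# Hypothetical helper: succeeds when file $1 is larger than $2 bytes.
# Returns 2 when the file does not exist.
file_exceeds() {
    [ -f "$1" ] || return 2
    # wc -c pads its output with spaces on some systems, so strip them.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt "$2" ]
}

# Assumed usage (threshold of 1 GB chosen for illustration only):
#   if file_exceeds /var/adm/pacct 1073741824; then
#       echo "pacct is oversized; check that the adm cron jobs are running"
#   fi
```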

An OS reboot might have solved this, but I am glad that we got to the bottom of what caused the issue.


Thursday, 8 November 2012

Is NLS_LENGTH_SEMANTICS really dynamic?

The Oracle database instance initialization parameter NLS_LENGTH_SEMANTICS can be changed from BYTE to CHAR or vice versa dynamically. However, client sessions will not inherit the new value until the database is restarted.
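
One way to check this is to compare the instance-level and session-level values from a brand-new session after changing the parameter with ALTER SYSTEM. A minimal sketch, assuming sqlplus is on the PATH and OS authentication as sysdba works; nls_check_sql is a hypothetical helper that only prints the query.

```shell
# Print a query that shows the instance-level and session-level values
# of NLS_LENGTH_SEMANTICS side by side.
nls_check_sql() {
    cat <<'EOF'
SELECT 'instance' AS lvl, value FROM nls_instance_parameters
 WHERE parameter = 'NLS_LENGTH_SEMANTICS'
UNION ALL
SELECT 'session', value FROM nls_session_parameters
 WHERE parameter = 'NLS_LENGTH_SEMANTICS';
EOF
}

# Assumed usage, run in a fresh session after the ALTER SYSTEM:
#   nls_check_sql | sqlplus -s "/ as sysdba"
```

Note that the session value can also be set explicitly by the client, so this is best run from a session with no NLS overrides.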

Wednesday, 24 October 2012

DIA-48313 and DIA-48322 Errors while purging files using adrci

A few months back I upgraded Oracle Clusterware 11.1.0.6 to Oracle Grid Infrastructure 11.2.0.3 on Solaris SPARC machines. I invoked adrci from the new Grid Infrastructure home and tried to purge the logs and trace files in the listener's diagnostic destination directory, but encountered the following errors.

On first machine:
$adrci

ADRCI: Release 11.2.0.3.0 - Production on Tue Oct 23 10:24:54 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/misc/oracle"
adrci> show home
ADR Homes:
diag/tnslsnr/xxx/listener_xxx
adrci> purge
DIA-48313: Updates not allowed on ADR relation [INCIDENT] of Version=3

adrci> migrate schema
Schema migrated.
adrci> purge
adrci> quit

On second machine:

$adrci

ADRCI: Release 11.2.0.3.0 - Production on Tue Oct 23 10:26:03 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/misc/oracle"
adrci> purge
DIA-48322: Relation [INCIDENT] of ADR V[2] incompatible with V[2] tool
DIA-48210: Relation Not Found
DIA-48166: error with opening ADR block file because file does not exist [/misc/oracle/diag/tnslsnr/yyy/listener/metadata/INCIDENT.ams] [0]

adrci> show home
ADR Homes:
diag/tnslsnr/yyy/listener
adrci> migrate schema
Schema migrated.
adrci> purge
adrci> quit


The error is due to a mismatch in the ADR metadata version; one can use the "migrate schema" command (as shown above) to upgrade the metadata of the corresponding diagnostic destination to the level required by the invoked adrci utility.
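
The same sequence can be run non-interactively through adrci's exec option, which is handy when several ADR homes need the treatment. A minimal sketch: adrci_purge_cmds is a made-up helper that only builds the command list, and the home path is the one from the first session above.

```shell
# Build the adrci command list for one ADR home
# (path relative to the ADR base, as printed by "show home").
adrci_purge_cmds() {
    printf 'set homepath %s; migrate schema; purge' "$1"
}

# Assumed usage with the 11.2.0.3 adrci on the PATH:
#   adrci exec="$(adrci_purge_cmds diag/tnslsnr/xxx/listener_xxx)"
```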

Similarly, for errors like "DIA-48318: ADR Relation [INCIDENT] of version=4 cannot be supported", one has to downgrade the schema using the higher-version adrci (adrci> migrate schema -downgrade) and then use the lower-version adrci to get rid of the error.

Monday, 24 September 2012

Issues in installing Oracle Forms, Reports, and Discoverer 11.1.1.3.0 on 64 bit Windows 7

Recently I faced a few issues while installing Oracle Forms, Reports and Discoverer 11.1.1.3.0 on a 64-bit Windows 7 system.
I followed these steps during the installation.
1. Install Oracle WebLogic Server 10.3.3.
2. Install Oracle Forms, Reports, and Discoverer 11.1.1.2.0. During the installation choose the Install only option; don't configure it.
3. Apply Oracle Forms, Reports, and Discoverer 11.1.1.3.0 patchset.
4. Run the configuration wizard.

The configuration wizard stalled at the "Creating the domain" step.



The following is the last entry in the log file.
LOADING DLL : C:\Oracle\Middleware\as_1\install\config\StartUtil64.dll

This seems to be a known issue. Following MOS article ID 953350.1, I removed entries containing the string "(x86)" from the PATH environment variable and reinstalled the above, but the problem persisted.

Next, I removed entries containing the string "(x86)" from the CLASSPATH environment variable and reinstalled the above; this time the configuration wizard completed successfully.
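
Picking the "(x86)" entries out of a long variable by hand is error-prone, so a small filter can help. This is only a sketch under assumptions: strip_x86_entries is a hypothetical helper, and it needs a POSIX shell on the Windows machine (Cygwin or Git Bash, for example). The cleaned value still has to be set through the usual System Properties dialog.

```shell
# Drop every entry containing "(x86)" from a ';'-separated list
# such as the Windows PATH or CLASSPATH value passed as $1.
strip_x86_entries() {
    printf '%s\n' "$1" | tr ';' '\n' | grep -F -v '(x86)' | paste -s -d ';' -
}

# Assumed usage, with the variable's value pasted in as one quoted argument:
#   strip_x86_entries 'C:\Java\lib;C:\Program Files (x86)\QuickTime\QTJava.zip;.'
```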