Monday, April 24, 2006

Daily error Oracle Server 10g(rdbms) + OHS + HTML DB on Linux x86

OHS - Oracle HTTP Server (basically Oracle Managed Apache Httpd)
OPMN - Oracle Process Manager and Notification Server
ONS - Oracle Notification Services

Well, one day I discovered that I had run out of space on my server.
"That's impossible," I said to myself. Of course it was, but impossible things tend to happen, and as I discovered, OHS had been producing about 1.5GB of logs every week. The error came to my attention only because I ran out of disk space. (That part is probably my own fault, as I don't have the useful habit of reading all my server logs daily. I'd prefer something a bit more imaginative.)

This case is noteworthy as it is a standard by-the-book installation onto SLES9 with 10gR2 rdbms, OHS and HTML DB upgraded to 2.0. So this is the default installation that gives us such a situation.

The log file being filled was $ORACLE_HTTP_HOME/opmn/logs/ons.log

-rw-r--r-- 1 oracle oinstall 786M 2006-04-24 17:19 ons.log
-rw-r--r-- 1 oracle oinstall 1.5G 2006-04-08 04:00 ons.log.06-04-08_04:00:11
-rw-r--r-- 1 oracle oinstall 1.5G 2006-04-14 15:28 ons.log.06-04-14_15:28:31
-rw-r--r-- 1 oracle oinstall 1.5G 2006-04-21 02:59 ons.log.06-04-21_02:59:47
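
A periodic check for oversized files in the log directory would have caught this long before the disk filled up. Here is a minimal sketch, assuming a POSIX shell and find; the function name list_big_logs and the 500 MB threshold in the example are my own illustration, not anything Oracle ships:

```shell
# Hypothetical helper: print files under a directory that are larger than
# a given number of megabytes. Uses only POSIX find options.
list_big_logs() {
    dir=$1
    min_mb=$2
    # -size +Nc matches files larger than N bytes
    find "$dir" -type f -size +"$((min_mb * 1024 * 1024))c" -print
}

# Example (not run here): list_big_logs "$ORACLE_HTTP_HOME/opmn/logs" 500
```

Dropped into a cron job, something like this could mail a warning well before the disk hits 0B free.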


The message being written over and over, thousands upon thousands of times, was:
06/04/24 17:42:02 [4] Local connection 0,127.0.0.1,6113 missing form factor
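
To get a feel for how often a message like this repeats, the matching lines can simply be counted. A small sketch, assuming a POSIX shell and grep; count_log_lines is a made-up name for illustration:

```shell
# Hypothetical helper: count lines in a log file containing a fixed string.
count_log_lines() {
    pattern=$1
    file=$2
    # -F: literal match, -c: print the count. grep exits 1 on zero matches,
    # which is not an error for our purposes, hence "|| true".
    grep -c -F -e "$pattern" "$file" || true
}

# Example (not run here):
# count_log_lines 'missing form factor' "$ORACLE_HTTP_HOME/opmn/logs/ons.log"
```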


And "netstat -na" shows a throng of connections waiting for a better world...
tcp 0 0 127.0.0.1:6113 127.0.0.1:50201 TIME_WAIT

That line was repeated 100 or perhaps 1000 times. I didn't count, but they all apparently originate from port 6113, which is defined as the ONS service port under OHS.
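
Counting those connections is easy enough to script. The sketch below assumes the Linux "netstat -na" column layout shown above (local address in the fourth column, state in the sixth); count_time_wait is a hypothetical helper name:

```shell
# Hypothetical helper: read "netstat -na" output on stdin and count
# connections in TIME_WAIT whose local address ends in the given port.
count_time_wait() {
    awk -v port="$1" '
        # Linux netstat columns: proto recv-q send-q local foreign state
        $6 == "TIME_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Example (not run here): netstat -na | count_time_wait 6113
```

Reading from stdin keeps the function independent of netstat itself, so it can be tried against a saved listing as well.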



So what's the matter?

I have Oracle Server and OHS with HTML DB installed, and it appears that Oracle Notification Services (or one of its clients?) is trying very hard to connect to localhost, but fails because of the conflict with the database listener.

When OHS and the database are installed on the same box, the installer mistakenly configures identical ONS ports in both homes, which creates an operational conflict when both the HTTP ONS and the database listener services are running.


Solution

I unsubscribed my database listener from ONS. To do that, I added the following line to the database's listener.ora (the suffix is the name of your own listener):
SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF


Now the "netstat -na" output is much more reasonable, but I still cannot start OPMN with opmnctl.


So I went ahead and bumped each of the notification server ports by 1 in
$ORACLE_HTTP_HOME/opmn/conf/opmn.xml:



Then I ran "$ORACLE_HTTP_HOME/opmn/bin/opmnctl startall",
which responded "opmnctl: starting opmn and all managed processes..."

The ons.log now said only "06/04/25 10:23:38 [4] ONS server initiated".

So my OHS and access to HTML DB were up and running once again.

PS
When my database could no longer write to the hard disk, I couldn't even log in with "sqlplus / as sysdba", because that connection needs to write an audit file. Having 0B of free space makes that somewhat troublesome. So I deleted one of the massive 1.5GB log files and started the database again, as it had shut down because it could no longer extend one of its tablespaces.
I was very relieved to see that it didn't need any recovery, as I haven't backed up anything (it's a development database). That saved me a lot of trouble, and it showed me that even the smallest dev database should have a minimal backup at hand. So I'm off to back up some data without delay.

Ignite my anger with your delay
and punishments will come your way..
-- Ayreon "Into the Electric Castle"
