Tuesday, December 23, 2008

Fun with RAC on Windows (part 2)

Another issue during the installation of RAC on Windows was that the VIPCA failed. Both machines were pingable, so there was no obvious cause.

After some searching the following emerged: the VIPs for both nodes were all running on one node.

While trying to move the VIP to the other node, it turned out that the network was called "Team A + B" on machine one and "Team A+ B" on machine two. Apparently on Windows the VIPCA also relies on the network name, so it has to be identical on every node.

After fixing this error the VIPCA completed and the resource could be moved to the second node.
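A difference of a single space is easy to miss by eye. The following is only a small sketch (the function name is made up, and you still have to collect the names from each node yourself) that compares two names and prints them in brackets so stray spaces become visible:

```shell
# Compare the network connection name of two nodes, character for character.
# Hypothetical helper: the names are passed in by hand.
compare_network_names() {
  if [ "$1" = "$2" ]; then
    echo "match: [$1]"
  else
    # Brackets make leading, trailing and inner spaces visible.
    echo "MISMATCH: [$1] vs [$2]"
    return 1
  fi
}

# The two names from this post:
compare_network_names "Team A + B" "Team A+ B" || echo "rename the network on one node first"
```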

Followers

I added a nice gadget to the sidebar. If you are a follower of this blog just get on it.

Monday, December 22, 2008

Fun with RAC on Windows (part 1)

After having some "nice" experiences with the installation of RAC on Windows I would like to share them with you, so you might find a solution more easily.

The environment is easy, but not without some nice "features".

We have a three node RAC cluster with two machines in room 1 and the third machine in room 2 (both on the same site).

We did not run cluvfy as this very often gives errors when everything is OK.

On the SAN we will start with only one voting disk, and add two more later. The OCR will be mirrored as well. Furthermore we intend to use OCFS on some of the LUNs of the SAN.

Well, the installation went ok, the other machine was found, etc etc etc.
However the first configuration assistant failed.
Research showed some error - the usual stuff about node connectivity.

Stopped the OUI and retried from the command line.

Found the following errors in the evmd.log

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.

2008-12-17 23:33:27.430: [ EVMD][4900]32EVMD Starting

2008-12-17 23:33:27.445: [ EVMD][4900]32

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2004, Oracle. All rights reserved

2008-12-17 23:33:28.414: [ COMMCRS][1456]clsc_send_msg: (00000000032B8BA0) NS err (12571, 12560), transport (533, 57, 0)


The TNS error did look promising but it was not related to the 12571 or 12560.

The cause was that the OUI looked up the hostname of the second node as an FQDN, while the command line looked for the short hostname.
As neither name was registered correctly in the DNS (another department manages it), the OUI failed.

More to follow.

Monday, November 24, 2008

DMS aggrespy

In a standalone OC4J installation (e.g. SOA Suite) the dms can be reached with
http://host:port/dmsoc4j/AggreSpy

If access is not allowed, then $OH/Apache/Apache/conf/dms.conf contains a rule that states: Deny from all

Friday, November 21, 2008

What you see is not what you expected

Had a nice issue - which demonstrated that looks can be deceiving.

My colleague asked me to increase the size of a certain tablespace as it was full.
I added a datafile in ASM for this tablespace but put it on autoextend = ON .

He complained that it was still 100% full.

For a second I thought I had messed things up (happens to the best :-)

Then another look solved the problem:
The query - which apparently was used in a lot of Nagios installations - checked only the
datafiles which were on autoextend = no .


The following query was used in Nagios.

select d.tablespace_name "TABLESPACE",
sum(d.bytes)/1048576 "SIZE (M)",
100 -(nvl(round((FREESPCE/(sum(d.bytes)/1048576))*100),0)) "USED (%)"
FROM dba_data_files d,
( SELECT round(sum(f.bytes)/1048576,2) FREESPCE,
f.tablespace_name Tablespc
FROM dba_free_space f
GROUP BY f.tablespace_name)
WHERE d.autoextensible IN 'NO' AND d.tablespace_name = Tablespc (+)
group by d.tablespace_name,FREESPCE,d.autoextensible
order by 1 desc



So, never believe a monitoring tool - unless you have double checked the underlying query.
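For comparison, here is a rough sketch of a check that does not filter on autoextensible at all, but measures usage against the size a tablespace is allowed to grow to. This is not the query from any Nagios plugin. It assumes that MAXBYTES in dba_data_files is the growth limit of an autoextensible file, and the shell function name is made up. The function only prints the SQL so you can review it before feeding it to sqlplus.

```shell
# Print a tablespace check that includes autoextensible datafiles.
# Assumption: for AUTOEXTENSIBLE = 'YES' the file can grow up to MAXBYTES.
tablespace_usage_sql() {
  cat <<'EOF'
select d.tablespace_name "TABLESPACE",
       round(d.alloc_bytes/1048576) "SIZE (M)",
       round(d.max_bytes/1048576) "MAX SIZE (M)",
       round((d.alloc_bytes - nvl(f.free_bytes, 0)) / d.max_bytes * 100) "USED OF MAX (%)"
from ( select tablespace_name,
              sum(bytes) alloc_bytes,
              sum(case when autoextensible = 'YES'
                       then greatest(bytes, maxbytes)
                       else bytes end) max_bytes
       from dba_data_files
       group by tablespace_name ) d,
     ( select tablespace_name, sum(bytes) free_bytes
       from dba_free_space
       group by tablespace_name ) f
where d.tablespace_name = f.tablespace_name (+)
order by 1 desc;
EOF
}

# Hypothetical usage: tablespace_usage_sql | sqlplus -s "/ as sysdba"
```

With a check like this the tablespace from the story would have shown plenty of headroom after the autoextend datafile was added.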

Sunday, November 02, 2008

OC4J status URLs

http://localhost:7200/oc4j-service?cmd=Getprocs
Show the processes

http://localhost:7200/oc4j-service?cmd=p
Show the status

Saturday, November 01, 2008

Import large amount of data

For a project we needed to import a large amount of data into a RAC database.

A certain partitioned table with some 4 million rows took ages.

Grid Control showed a large value for the log buffer waits.

So I increased the FAST_START_MTTR_TARGET to reduce the number of checkpoints, which helped almost immediately.

Another possibility would have been to increase the log_buffer size (14 M to 32 M) as the ADDM suggested. Maybe I will do this as a permanent change. For now, changing the parameter on the instance where the import ran solved the problem.

RAC is beautiful

More than once you will run into an init parameter that cannot be changed while the instance is running.


SQL> alter system set sessions=500 scope=both;
alter system set sessions=500 scope=both
*
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified

Well - the beauty of RAC is of course that you can change init parameters in the spfile and bounce an instance. Your users won't notice - as the database is still available while you accomplish the change of the init parameter.

SQL> alter system set sessions=500 scope=spfile sid='ORCL4';

System altered.

SQL> shutdown immediate

SQL> startup
ORACLE instance started.

Total System Global Area 1862270976 bytes
Fixed Size 2072096 bytes
Variable Size 436208096 bytes

...

Saturday, October 18, 2008

Upcoming article on the history of Oracle on Linux

I'm busy preparing an article on the history of Oracle on Linux for the Dutch magazine We Love IT.

This article will be published in edition number 5 of this excellent Oracle & Java magazine.

If you want to take a look at We Love IT Oracle & Java Magazine
you can also read past editions on-line or register for a free paper copy.

Wednesday, September 24, 2008

OOW2008 goes Tron 3.0

One of the things that has gotten me into the IT world was the movie TRON.

So you can imagine how it feels when you become part of the movie ;-)

Check out the following photo (it is a little bit dark - but we're inside the computer).

You know you are a nerd when ....

you attend the Oracle ACE's dinner and a lot of the Oracle GODS that make this community work are around.

You are an even bigger nerd when you go to another reception and try to tell your manager that you just had dinner with the who-is-who of the Oracle world, and all your name-dropping just brings a blank look to your manager's face.

Are you experiencing the same?

Beehive

One of the big announcements this year at OOW is of course Beehive. As I did a lot in the area of the Oracle Collaboration Suite I am very curious what Beehive will bring.

I had a look at the Oracle Demogrounds at OOW and it definitely looks impressive. I will get my hands dirty as soon as I'm back home. My first impression is that Beehive is what the Collaboration Suite was always intended to be.

Tuesday, September 23, 2008

OOW2008 in full swing

Since Sunday Oracle Open World is in full swing. Sessions, hands-on labs, exhibitions, user groups, receptions, parties, and best of all: ALL OF YOU!

I meet so many people that I have heard of, people who have blogs as well, familiar faces from Oracle, lots of folks from my company and of course from other companies and Oracle customers.

Content of the sessions is usually very good, and again you get the insider information from Oracle product managers, and partners who implemented some cool pieces of Oracle software in their projects.

Hope you all enjoy the show and I also hope to see as many of my readers here in sunny San Francisco!

cu
Andreas

Sunday, September 21, 2008

Metalink is upgraded

Starting with OOW the Metalink interface was upgraded. Although this was already available for a while as a beta and as metalink3.oracle.com it has now entered mainstream.

Let's see what else will be changed during this OOW.

Thursday, August 07, 2008

RAC and ORA-1102

Hit the following problem:

After a manual install of a database the database and its instances were not registered with the CRS.

Using srvctl I registered the database and its instances in the CRS.

Unfortunately I could not start the database or all of its instances. Well, I could start one instance but none of the other three. The same thing happened when I started with another instance (start inst1 but not inst2, inst3 and inst4, then start inst4, but not inst1, inst2, inst3).

Then I tried to start one of the other instances manually. See what happened:

SQL> startup
ORACLE instance started.

Total System Global Area 838860800 bytes
Fixed Size 2074992 bytes
Variable Size 218105488 bytes
Database Buffers 612368384 bytes
Redo Buffers 6311936 bytes
ORA-01102: cannot mount database in EXCLUSIVE mode

I found out that the culprit was a nice combination of init parameters.

CLUSTER_DATABASE_INSTANCES=4
CLUSTER_DATABASE=NO

So my explanation was that the database was willing to start one instance, but none of the other three.

So how could this happen?
Well, our partner came with its own setup tool. This is a mix of shell scripts that are governed by a collection of XML files.
As this is their first RAC environment I guess that they usually set CLUSTER_DATABASE to NO while in some other part they count the number of nodes where the instances will run. Hence the logical difference in their init parameters.

As they had pfiles on all nodes the solution was very easy: correct the parameter to CLUSTER_DATABASE=YES, create an spfile and use that.
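If you want to catch this combination before starting anything, a quick look through the pfile is enough. The function below is just a sketch with a made-up name. It assumes one parameter per line in key=value form, and it treats both TRUE and YES as "cluster mode enabled".

```shell
# Warn when a pfile asks for several instances but does not enable cluster mode.
# Matching is case-insensitive; a leading "*." prefix on the parameter is ignored.
check_cluster_params() {
  awk -F= '
    { key = tolower($1); val = tolower($2)
      gsub(/[ \t\r*.]/, "", key); gsub(/[ \t\r]/, "", val) }
    key == "cluster_database"           { enabled = (val == "true" || val == "yes") }
    key == "cluster_database_instances" { count = val + 0 }
    END {
      if (count > 1 && !enabled) {
        print "INCONSISTENT: " count " instances but cluster_database is off"
        exit 1
      }
      print "ok"
    }' "$1"
}

# Hypothetical usage: check_cluster_params "$ORACLE_HOME/dbs/initinst1.ora"
```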

Sunday, July 06, 2008

AS Control in a Cluster

When you cluster two Oracle Application Servers (10.1.3) you will receive a warning that it is not recommended (supported) to have two Application Server Controls in this cluster.

This week I had the need to enable the second ASC as the deployment of an application failed if it was carried out from the remote ASC.

This is how I did this.

Go to the server.xml in j2ee/home/config and change
<application name="ascontrol" path="../../home/applications/ascontrol.ear" parent="system" start="false" />

into

<application name="ascontrol" path="../../home/applications/ascontrol.ear" parent="system" start="true" />

Friday, June 27, 2008

MRCA has problems with RAC

Had a problem with the installation of MRCA in a RAC cluster.



This can happen when you are installing the MRCA in a RAC environment and the total length of the description string is longer than 239 chars.


Found this in the logfile of the MRCA assistant:

[SQLPlusAction] Connect string: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=abcdefghijklmno01)(PORT=1582))(ADDRESS=(PROTOCOL=TCP)(HOST=abcdefghijklmno02)(PORT=1582))(ADDRESS=(PROTOCOL=TCP)(HOST=abcdefghijklmno03)(PORT=1582)))(CONNECT_DATA=(SERVICE_NAME=INFRA)))


SQL> string beginning "(DESCRIPTI..." is too long. maximum size is 239 characters.

Of course the above hostnames are fake (we have different ones, but just as long).

The strange thing is that not all parts fail. So apparently each assistant uses a different method.

Solution was to use just one or two of the hostnames in the MRCA assistant. It is a RAC cluster - so even one host would be sufficient.
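To see up front whether a descriptor will fit, you can simply count its characters. A minimal sketch, using the 239 character limit from the error message above (the function name is made up):

```shell
# Limit taken from the SQL*Plus message quoted in this post.
MAX_LEN=239

# Check the length of a connect descriptor against the limit.
check_descriptor_length() {
  len=${#1}
  if [ "$len" -gt "$MAX_LEN" ]; then
    echo "too long: $len characters (limit $MAX_LEN)"
    return 1
  fi
  echo "ok: $len characters"
}

# Hypothetical usage with a single-host descriptor:
check_descriptor_length "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=abcdefghijklmno01)(PORT=1582))(CONNECT_DATA=(SERVICE_NAME=INFRA)))"
```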

Secure Grid Control agents

I encountered another strange thing in the Grid Control.
The intention was to secure the communication between the agents on AIX and the OMS. On one machine this worked. On the other two machines the command to put the agent in secure mode succeeded, but the agents never uploaded their data.

Using

emctl status agent -secure

I found out that they thought that they were not in secure mode.

The solution was pretty simple (and another proof that Oracle has to find some unique way to resolve network addresses).


In my hosts file the hosts that did not work were described as
10.1.2.3 hostname hostname.domain.com

I changed this to be
10.1.2.3 hostname.domain.com hostname

resecured everything (OMS and agents) and then it worked.
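To find such entries on other machines, something like the following can help. It is only a sketch: the function name is made up, "fully qualified" is approximated as "contains a dot", and localhost entries are skipped.

```shell
# List hosts-file entries whose first name after the IP address is a short name.
short_name_first() {
  awk '
    /^[ \t]*#/ || NF < 2 { next }          # skip comments and blank lines
    $2 ~ /^localhost/    { next }          # localhost is fine as it is
    $2 !~ /\./           { print $1, $2 }  # short name listed before the FQDN
  ' "$1"
}

# Hypothetical usage: short_name_first /etc/hosts
```

Every line it prints is a candidate for swapping the short name and the fully qualified name.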

Monday, June 23, 2008

Hot off the press

I will be presenting at the Oracle Open World in San Francisco in September.

Session details:

Session ID: S300515
Session Title: How to Achieve 99.99 Percent Availability
Track: Database

More to come ....

Monday, June 16, 2008

Firefox - Google maps - skype

Hi folks!

I switched to Firefox RC2 and all of a sudden the Google Maps stopped.

I found out that the Skype Extension was messing it up. So if you have a similar problem (or switch to Firefox 3 from 17-JUN-2008) just try this to fix the problem.

Just wanted to let you know.

Sunday, June 08, 2008

Unpacking Oracle cpio for AIX

When extracting a cpio file with Oracle software from OTN on AIX you might encounter the following problem:

oracle@mymachine-app:/install/oracle/MRCA>cpio -idmv < ../as_ibm_aix_mrca_101203_disk1.cpio


cpio: 0511-903 Out of phase!
cpio attempting to continue...


cpio: 0511-904 skipping 642010 bytes to get back in phase!
One or more files lost and the previous file is possibly corrupt!

Segmentation fault

The solution is to use the option -idcmv
oracle@mymachine-app:/install/oracle/MRCA>cpio -idcmv < ../as_ibm_aix_mrca_101203_disk1.cpio


c
Reads and writes header information in ASCII character form. If a
cpio archive was created using the c flag, it must be extracted
with c flag.
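You can also peek at the archive before unpacking it. The sketch below reads the first six bytes and assumes that an archive with ASCII headers starts with the characters 070707 (or 070701 / 070702 for the newer header variants); anything else is reported as binary or unknown. The function name is made up.

```shell
# Report whether a cpio archive starts with an ASCII (character) header.
# Assumption: ASCII headers begin with the magic string 070707, 070701 or 070702.
cpio_header_type() {
  magic=$(dd if="$1" bs=6 count=1 2>/dev/null)
  case "$magic" in
    070707|070701|070702) echo "ascii" ;;
    *)                    echo "binary-or-unknown" ;;
  esac
}

# Hypothetical usage: cpio_header_type ../as_ibm_aix_mrca_101203_disk1.cpio
```

If it reports "ascii", extracting with the c flag is the thing to try.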

Saturday, June 07, 2008

Oracle dbca does not create the archive directory

Found out that the DBCA does not create a specific ARCH location when you do not use the default one.

When you are in the dbca and edit the init parameter




So after my nice little database was frozen I created the arch directory in the ASM and voila, everything was back to normal.

Wednesday, May 28, 2008

Default character set used in the DB

As I always forget this (and have to look for it) I will put it into my blog - maybe I will memorize it now ;-)

SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';

Sunday, May 18, 2008

Oracle Data Guard screw up

Setting up Data Guard with Maximum Protection requires the Data Guard Broker.

Having set up Data Guard a couple of times without the broker I thought "how difficult can it be?" and started.

After some preparations I started the dgmgrl in order to configure the various sites.
To my astonishment no command at all was recognized. I have to admit that I even went back to the manuals to see if it was my fault. And then - shame on me - I tried to use the help function inside the dgmgrl.

See what happens then:



WHAHAHAHA!


Ok, so either I was seriously stupid or there was an error - probably a relink issue.
A good friend of mine at Oracle PTS - Robert Pastijn - offered the needed help and pointed out that a bug was introduced on AIX in the upgrade from 10.2.0.1.0 to 10.2.0.2.0 . The patch fixed this problem.

So after all Oracle did not (yet) follow the classic Microsoft error: Error - No Error.

Wednesday, April 23, 2008

IP with VMWare

Lately I ran into a strange problem.

I switched on my Wireless Network on my laptop only to find out that every time it receives a hostname localdomain
and an IP address of 192.168.2.174 .

Of course I blamed Windows as this is often the problem.
However I found out that the problem is the DHCP Server of the VMWare Server.
It seems to hand out DHCP leases not only on the networks towards the virtual machines but also to the host itself.

The only remedy is to shut down the Windows service of the VMWare DHCP Server when requesting a new IP address.

Setup Oracle RAC - problem with runcluvfy.sh

When running the runcluvfy.sh I received the following error:

/tmp does not exist and is not writable

The reason is that during the setup of user equivalence no ssh connection to the local host itself had been made. As a result the commands in runcluvfy.sh could not be run on the local host, which produced this misleading error, as the /tmp directory did of course exist.
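A quick way to test this before running runcluvfy.sh is to try a non-interactive ssh to every node, including the one you are sitting on. This is only a sketch: the function name and the SSH_CMD variable are made up (SSH_CMD exists only so the ssh client can be swapped out), and node1/node2 are placeholders.

```shell
# Try a password-less, prompt-less ssh to each node given as an argument.
# BatchMode=yes makes ssh fail instead of asking for a password or host key confirmation.
check_user_equivalence() {
  failed=0
  for node in "$@"; do
    if ${SSH_CMD:-ssh} -o BatchMode=yes "$node" true 2>/dev/null; then
      echo "OK $node"
    else
      echo "FAILED $node"
      failed=1
    fi
  done
  return "$failed"
}

# Hypothetical usage, local host first:
# check_user_equivalence "$(hostname)" node1 node2
```

A FAILED line for the local host points at exactly the situation described above.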

Friday, April 18, 2008

Change the instancename of the Oracle Application Server

During the installation of an Oracle Application Server I mistyped the instancename. I was not so keen on re-installing everything. So I tried to change the name of the instance. Mind you - I'm talking about the name that is displayed when issuing an

opmnctl status

The solution is pretty easy: Edit the ias.properties and the opmn.xml . Change the instancename there and restart the opmn.

Sunday, March 02, 2008

User interface design @ Oracle - or how to distinguish blue on blue

Came across the following last week - and I have to admit that this is something which made my eyes hurt.

When you are in the Policy Manager of the Oracle Identity Management Suite and you want to add a group you REALLY need to know that there is a hidden link on this page. Try to find it yourself in the next picture:



Found it?

No?
Let me give you a hint. Look in the blue band and search for some blue text :-)

I'm not sure if anybody knew about this. I checked the documentation, and nothing about adding a group to such an Authorization Rule for groups is written in there.

I wonder if anybody at Oracle knows about this hidden gem.
I will log an SR with Support and also mail some people I know at Oracle to discuss this perfect piece of user interface design.

Saturday, February 23, 2008

A good old fashioned switchboard

Nowadays you see a lot of those nifty no-nonsense datacenters where everything is ordered and no room is left for human ingenuity.

However - once in a while you encounter some good old-fashioned folks, who think that intelligence rules over the chaos.

Here is a lovely picture from one of my customers.

Well - in order to protect the innocent let me say that their network is ok!