Finding the HACMP configuration instance #

The HACMP configuration on two different nodes can get out of sync, or you may want to push a config from one node to another. We had one admin make changes to a down node, then try to sync the cluster. To clean it up, we had to figure out which node had the latest config. To see which configuration instance number each of your HACMP nodes is using, you can run:

lssrc -l -s topsvcs | grep Instance

or:

odmget HACMPtopsvcs
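
Once you have noted the instance number from each node, a small helper can pick out the highest one. This is only a sketch: `newest_instance` is a hypothetical function name, the input format (one `node instance` pair per line, typed in by hand from the output above) is an assumption, and it assumes the highest instance number marks the most recent configuration.

```shell
#!/bin/sh
# Hypothetical helper: reads "node instance" pairs on stdin, one per line,
# and prints the pair with the highest instance number.
# Assumption: a higher instance number means a newer configuration.
newest_instance() {
    awk '
        # Only consider lines whose second field is a plain number.
        NF >= 2 && $2 ~ /^[0-9]+$/ {
            if (best == "" || $2 + 0 > max) { max = $2 + 0; best = $1 }
        }
        END { if (best != "") print best, max }
    '
}
```

For example, `printf 'nodeA 12\nnodeB 14\n' | newest_instance` prints `nodeB 14`. If two nodes report the same number, the first one listed is kept.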

secldapclntd Failure w/ HACMP

After upgrading to AIX 5.3 TL9 SP4, we found that secldapclntd goes into a death loop during an HACMP failover. It consumes more and more CPU until the system has no capacity left, which stalls the HACMP failover. Killing secldapclntd lets HACMP continue.

We didn’t see this behavior w/ AIX 5.3 TL8 SP3. IBM has identified a couple of issues that are probably combining to cause our problem, but they won’t be fixed in TL9… ever. IBM’s work-around is to set up pre- and post-event scripts that stop secldapclntd before the IP takeover (and release) and restart it afterward. In testing, this works pretty well, and it only takes a few seconds to stop and start secldapclntd.
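
A minimal sketch of what such a helper could look like is below. It is not IBM’s script: `ldap_event` is a hypothetical function name, calling it with `pre` or `post` is an assumed convention, and the `STOP_CMD`/`START_CMD` overrides exist only so the logic can be exercised off-cluster. The default paths are the usual AIX locations for the secldapclntd control commands, but check them on your own system.

```shell
#!/bin/sh
# Sketch of a pre/post-event helper for the secldapclntd work-around.
# Assumptions: invoked with "pre" or "post" as the first argument; the
# default command paths should be verified on your AIX level.
STOP_CMD=${STOP_CMD:-/usr/sbin/stop-secldapclntd}
START_CMD=${START_CMD:-/usr/sbin/start-secldapclntd}

ldap_event() {
    case "$1" in
        pre)  $STOP_CMD ;;    # stop the daemon before the IP takeover/release
        post) $START_CMD ;;   # start it again once the event has finished
        *)    echo "usage: ldap_event pre|post" >&2; return 2 ;;
    esac
}
```

You would call `ldap_event pre` from the pre-event script and `ldap_event post` from the post-event script attached to the IP takeover and release events.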

Testing Disk Heartbeats

To test your disk heartbeats, you can look at the output of “cllsif” or “lssrc -ls topsvcs”, or you can actively test them. IBM provides a command to do this. First, find the devices associated with the disk HB VG; I’ll assume hdisk4 on nodeA and hdisk5 on nodeB.

Enable cluster encryption

For more security, you can make your cluster use encryption for inter-node communication, with no downtime. Otherwise, operations are allowed or rejected based on IP address, hostname, and the cluster rhosts file. C-SPOC operations are also not encrypted, and one of the important ones is password changes. Possibly an even better option would be to create an IPsec VPN tunnel between nodes, but I haven’t tested that.

Lazy Update – HACMP

On a multi-node HACMP cluster without enhanced concurrent VGs, any time you add an LV to a volume group, you have to make sure the other nodes see the LV. The same procedure also fixes other VG out-of-sync issues. You can either take everything down and do an importvg on all the nodes, or you can do a “Lazy Update”: