My IBM MQ Cluster Ping Nagios script

Monitoring cluster health can be quite tricky. I have created a small Nagios script to help with the task. It uses the sample programs amqsput and amqsget, which are delivered with the IBM MQ installation.

Preparations:
In this example I’m only going to use two queue managers: QMBASE (full repository) and QMEXTERNAL (partial repository). The commands below are for runmqsc.
QMEXTERNAL

DEFINE QA(QA.CLUSTER.PING) TARGET(TOP.CLUSTER.PING) TARGTYPE(TOPIC)
DEFINE QL(QL.CLUSTER.PING) CLUSTER(EXTERNAL)

QMBASE

DEFINE TOP(TOP.CLUSTER.PING) TOPICSTR('ping') CLUSTER(EXTERNAL) 
DEFINE SUB(SUB.CLUSTER.PING) TOPIC(TOP.CLUSTER.PING) TOPICSTR('#') DEST(QL.CLUSTER.PING)

How it works:
It puts a message, using amqsput, on a queue alias that points to a cluster topic on the base queue manager.
A subscription to the topic picks up the message and puts it on the cluster queue on the external queue manager.
The message is then picked up by amqsget and checked against the timestamp that was sent. I also check the length of the amqsget output to detect more than one message.

And here is the script:

#!/bin/bash

if [ $# -lt 1 ]; then
  echo "********************"
  echo "Cluster Ping"
  echo "********************"
  echo "NOTE!"
  echo "This script requires all the MQ objects to be in place (see below)"
  echo ""
  echo "QMEXTERNAL                       QMBASE (REPOSITORY)"
  echo "QA.CLUSTER.PING            ->    TOP.CLUSTER.PING (CLUSTER)"
  echo "QL.CLUSTER.PING (CLUSTER)  <-    SUB.CLUSTER.PING(TOP.CLUSTER.PING)"
  echo ""
  echo "Usage: $0 <external queue manager>"
  echo ""
  echo "External Queue Manager: ex. QMEXTERNAL"
  echo ""
  echo "Ex. $0 QMEXTERNAL"
  exit 1
fi

# Define variables
qmanager=$1
inqueue=QA.CLUSTER.PING
outqueue=QL.CLUSTER.PING
timestamp=$(date +%s)
match=false
normal=true

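# Send the timestamp via the queue alias; the empty line tells amqsput to stop reading
printf "%s\n\n" "${timestamp}" | amqsput "${inqueue}" "${qmanager}" > /dev/null

# Read the cluster queue; amqsget ends once no more messages arrive
msg=$(amqsget "${outqueue}" "${qmanager}")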

if [[ ${msg} == *"${timestamp}"* ]]; then
  match=true
fi

if [ ${#msg} -gt 80 ]; then
  normal=false
fi

if [[ ${match} == "true" && ${normal} == "true" ]]; then
  echo "OK - Message received from cluster"
  exit 0
fi

if [[ ${match} == "true" && ${normal} == "false" ]]; then
  echo "WARNING - More than one message was received"
  exit 1
fi

if [[ ${match} == "false" ]]; then
  echo "ERROR - No message received!"
  exit 2
fi

echo "UNKNOWN - Script has not run correctly"
exit 3
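
The length test in the script is a heuristic: any output longer than 80 characters is treated as more than one message. A more direct check would be to count the messages, and a minimal sketch of that follows. It assumes that amqsget prints each message on its own line as "message <text>", which is worth verifying against your MQ version. The function name count_messages and the sample text are made up for illustration.

```shell
#!/bin/bash
# Sketch only: count_messages is a hypothetical helper, not part of the script above.
# Assumes amqsget prints one "message <...>" line per message it reads.
count_messages() {
  # grep -c prints the number of matching lines; || true keeps a zero count from failing
  printf '%s\n' "$1" | grep -c '^message <' || true
}

# Example with hand-written sample text (not real output from a queue manager)
sample=$'Sample AMQSGET0 start\nmessage <1700000000>\nmessage <1700000060>\nno more messages\nSample AMQSGET0 end'
echo "messages: $(count_messages "${sample}")"
```

In the script, a test on the count returned by count_messages for ${msg} being greater than 1 could then stand in for the 80 character limit.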

Now, this tells us that the cluster can transport messages to and from the base queue manager, but I need more than that to feel safe, so here are a few other things to look at:
* Keep track of the transmission queues to make sure they are processing messages. So far I have only worked with small clusters, so monitoring the queue depth of the SYSTEM.CLUSTER.TRANSMIT.QUEUE has been enough, but if you have more transmission queues you need to monitor them all.
* Keep track of the command queue SYSTEM.CLUSTER.COMMAND.QUEUE. This queue should also be processing messages, not just growing.
* Look for error messages in the queue manager error log (AMQERR01.LOG). Here I look for the following codes (short description in parentheses):

- AMQ9465 (Failed republishing of cluster information)
- AMQ5534E (User ID 'anyuser' authentication failed)
- AMQ5542I (The failed authentication check was caused by the queue manager)
- AMQ9202E (Remote host 'anyhost (anyip) (anyport)' not available, retry)
- AMQ9259E (Connection timed out from host 'anyhost')
- AMQ9469W (Update not received for CLUSRCVR channel 'any channel')
- AMQ9492E (The TCP/IP responder program encountered an error)
- AMQ9558E (The remote channel 'any channel' encountered a problem)
- AMQ9633E (Bad SSL certificate for channel 'any channel')
- AMQ9637E (Channel is lacking a certificate)
- AMQ9716E (Remote SSL certificate revocation status check failed for channel)
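
The queue depth and error log checks listed above can be sketched as a few small shell functions. This is a rough sketch under some assumptions: the function names and thresholds are made up for illustration, the depth parser expects the CURDEPTH(n) attribute format that runmqsc prints for DISPLAY QLOCAL, and the log path in the comments is the usual Linux default, which may differ on your system.

```shell
#!/bin/bash
# Sketch only: all function names below are hypothetical helpers.

# Extract the CURDEPTH value from runmqsc DISPLAY QLOCAL output read on stdin.
parse_curdepth() {
  sed -n 's/.*CURDEPTH(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Map a queue depth to a Nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
depth_status() {
  local depth=$1 warn=$2 crit=$3
  if ! [[ ${depth} =~ ^[0-9]+$ ]]; then
    return 3
  elif [ "${depth}" -ge "${crit}" ]; then
    return 2
  elif [ "${depth}" -ge "${warn}" ]; then
    return 1
  fi
  return 0
}

# Count the lines in an error log that mention any of the given message codes.
count_log_hits() {
  local logfile=$1
  shift
  local pattern
  # Join the remaining arguments into one alternation, e.g. AMQ9202E|AMQ9633E
  pattern=$(IFS='|'; echo "$*")
  grep -cE "${pattern}" "${logfile}" || true
}

# Possible usage on a machine with MQ installed (paths and names are assumptions):
#   depth=$(echo "DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH" | runmqsc QMEXTERNAL | parse_curdepth)
#   depth_status "${depth}" 100 1000
#   count_log_hits /var/mqm/qmgrs/QMEXTERNAL/errors/AMQERR01.LOG AMQ9202E AMQ9633E
```

A single depth reading only shows how full a queue is right now. To see whether a queue is actually being processed, compare readings from consecutive checks.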

After all of the tests above I usually feel pretty confident that we have a working environment.

Tested on IBM MQ 9 and Red Hat 7

My Play Framework Systemd script

Ubuntu deprecated Upstart, so I had to turn to Systemd for my app controls in Ubuntu 18.04. In this script I set two environment variables (HOME and LANG), change to the app directory and start the Play Framework application.

# Myapp systemd script
#
# Location:/lib/systemd/system/myapp.service
#
# Useful commands:
#
# Start Myapp: 		systemctl start myapp.service
# Stop Myapp:		systemctl stop myapp.service
# Restart Myapp:	systemctl restart myapp.service
# Show status:		systemctl status myapp.service
# Enable start on boot:	systemctl enable myapp.service
# Disable start on boot:systemctl disable myapp.service
#
# List all services running: systemctl
# Check config: systemd-analyze verify myapp.service
#
####################################################################################

[Unit]
Description=Job that runs my app daemon

[Service]
Type=forking
Environment=HOME=/opt/myapp/app
Environment=LANG=en_US.UTF-8
# Run ExecStart from the app directory so the relative paths below resolve
WorkingDirectory=/opt/myapp/app
ExecStart=/bin/bash -c 'bin/myapp -J-Xms256M -J-Xmx768m -J-server -Dhttp.port=80 -Dconfig.file=conf/application.conf -Dlogger.file=conf/application-logger.xml'

[Install]
WantedBy=multi-user.target

The arguments for the Play service are what I normally use on AWS. You might need other settings.

Tested on Ubuntu 18.04 and Play Framework 2.3

Add custom solution to IS Admin page

Since I do not do this often, I am writing the steps down here.

1. Create a flow service called “callback” in your package (or actually anywhere the IS can run it)
This flow service should contain at least one MAP with the following values:

name - description of menu item. Example: MYKPI
text - name of menu item. Example: MYKPI
url - URL to dsp page. Example: ../MYPackage
target - name of start page. Example: index.dsp

2. Create a flow service called “addSolution”
This service should contain one service from the WmRoot package: wm.server.ui:addSolution with the input: path to your “callback” service

3. Create a flow service called “removeSolution”
This service should also contain one service from the WmRoot package: wm.server.ui:removeSolution with the input: path to your “callback” service

4. Lastly we need to make sure that the solution is added to the IS Admin page every time the package gets reloaded
In your manifest.v3 file at the root of your package (example: MYPACKAGE/manifest.v3) there is a line like this:

<null name="startup_services"/>

This needs to be changed to:

<record name="startup_services" javaclass="com.wm.util.Values">
    <null name="MYPACKAGE.processing:addSolution"/>
</record>


5. Run the “addSolution” service once
The link to your solution should now be visible in the IS Admin (after a refresh)

6. Done
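
Step 5 can also be done from a shell instead of from Designer. The sketch below builds the HTTP invoke URL for a service. It assumes the IS listens on localhost port 5555 and that the services live in a folder called MYPACKAGE.processing, as in the manifest example. The helper name invoke_url is made up for illustration.

```shell
#!/bin/bash
# Sketch only: invoke_url is a hypothetical helper; host, port and credentials are placeholders.
# Turns a service name such as folder.subfolder:service into an IS invoke URL.
invoke_url() {
  local host=$1 port=$2 service=$3
  local folder=${service%%:*}   # part before the colon
  local name=${service#*:}      # part after the colon
  printf 'http://%s:%s/invoke/%s/%s\n' "${host}" "${port}" "${folder}" "${name}"
}

url=$(invoke_url localhost 5555 MYPACKAGE.processing:addSolution)
echo "${url}"
# To call it (assumes curl is installed and the user is allowed to run the service):
#   curl -u "<user>:<password>" "${url}"
```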

Tested on webMethods 9.9 on Red Hat Linux 7.5