Good monitoring and alerting is an essential, but often under-loved, part of any computing infrastructure. Even a straightforward MIIS installation has enough complexity and enough dependencies that systematic monitoring is a must.
Server Health
Obviously you will be monitoring that the server itself is actually up. I believe something a little more than a ping is required to confirm the server is alive and well, so monitor key services such as MIIS and SQL Server.
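As a rough illustration of checking services rather than just pinging the box, the sketch below shells out to the Windows `sc query` command and reads the STATE line. The service names are assumptions: "MSSQLSERVER" is the default SQL Server instance name, and "miiserver" is only a guess at the MIIS service name, so confirm both on your own server.

```python
import re
import subprocess

# Service names are assumptions, not taken from the original setup.
# "MSSQLSERVER" is the default SQL Server instance; "miiserver" is a
# guess at the MIIS service name. Check both with `sc query` first.
SERVICES = ["miiserver", "MSSQLSERVER"]

# Matches lines such as "        STATE              : 4  RUNNING"
_STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def parse_sc_state(output):
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = _STATE_RE.search(output)
    return match.group(1) if match else None


def service_state(name):
    """Return the service state, or None if it could not be determined."""
    try:
        result = subprocess.run(
            ["sc", "query", name], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # `sc` missing (not Windows) or the query hung.
        return None
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    for name in SERVICES:
        state = service_state(name)
        if state != "RUNNING":
            print(f"ALERT service {name}: state is {state}")
```

A state of None (service not found, or the query failed) is treated as an alert too, since "cannot tell" is not the same as "healthy".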
Disk space monitoring is critical as a full partition will stop all MIIS activity. The SQL log drive (which you should have on completely separate disks from your data, as per SQL best practices) can fill up alarmingly quickly and needs to be checked regularly. You should be alerted at 85-90% capacity on your Data drive, and 50% on your Log drive.
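If your monitoring package can't do per-drive thresholds, something like the following minimal sketch would cover it. The drive letters are placeholders I have made up for illustration; the 85% and 50% limits are the ones suggested above.

```python
import shutil

# Thresholds from the text: 85% on the data drive, 50% on the log drive.
# The drive letters are hypothetical placeholders; substitute your own.
THRESHOLDS = {"D:\\": 85.0, "L:\\": 50.0}


def percent_used(total, used):
    """Return used space as a percentage of total capacity."""
    return 100.0 * used / total if total else 0.0


def over_threshold(total, used, limit):
    """True when usage has reached the alert threshold."""
    return percent_used(total, used) >= limit


def check_drives(thresholds=THRESHOLDS):
    """Return (path, percent_used_or_None, limit) for each drive needing an alert."""
    alerts = []
    for path, limit in thresholds.items():
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Drive missing or unreadable: worth an alert in its own right.
            alerts.append((path, None, limit))
            continue
        if over_threshold(usage.total, usage.used, limit):
            alerts.append((path, percent_used(usage.total, usage.used), limit))
    return alerts


if __name__ == "__main__":
    for path, pct, limit in check_drives():
        detail = "unreadable" if pct is None else f"{pct:.1f}% used"
        print(f"ALERT {path}: {detail} (limit {limit}%)")
```

The much lower limit on the log drive reflects how quickly it can fill: you want the warning while there is still time to act.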
CPU and Memory are less critical as MIIS won’t stop; it will just run slower. You should, however, be collecting stats over the long term so you can assess the performance of the server.
Application Events
There’s some sort of Logging class in MIIS, but I actually never used it because I was happy with the messages in the Application Event Log. I just set a watch for particular events and that let me know when there were sync and export errors.
Scheduled Tasks
If you are running any kind of scheduled tasks around MIIS you must monitor them to make sure they are actually happening. An absolutely critical one is the clear-down of the Run History. I set a watch on the log file to verify that it runs successfully every night.
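A log-file watch of this kind boils down to two questions: was the log written recently, and does its last entry report success? Here is a minimal sketch of that check. The success marker text and the 26-hour window are assumptions for illustration; use whatever your own clear-down script actually writes.

```python
import os
import time

# Both values are assumptions for illustration, not from the original setup.
MAX_AGE_SECONDS = 26 * 60 * 60  # nightly job, with a couple of hours' slack
SUCCESS_MARKER = "completed successfully"  # hypothetical text your script logs


def last_run_ok(log_path, now=None, max_age=MAX_AGE_SECONDS, marker=SUCCESS_MARKER):
    """Return (ok, reason) describing whether the last logged run succeeded."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(log_path)
        with open(log_path, "r", encoding="utf-8", errors="replace") as fh:
            # Keep only non-blank lines so trailing newlines don't matter.
            lines = [line.strip() for line in fh if line.strip()]
    except OSError as exc:
        return False, f"cannot read log: {exc}"
    if now - mtime > max_age:
        return False, "log not updated recently; task may not have run"
    if not lines or marker.lower() not in lines[-1].lower():
        return False, "last log entry does not report success"
    return True, "ok"
```

Checking the file's age as well as its content matters: a task that silently stops running leaves yesterday's "success" line sitting in the log.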
SQL
Regular SQL maintenance tasks should be monitored, as well as any replication jobs or scheduled DTS packages. I believe this can all be done with native SQL tools, though I can’t say for sure as I’ve always left it up to the DBA!
Monitoring Software
I used SiteScope very successfully to do all the monitoring listed above, with the exception of the SQL stuff (which, as I said, was the DBA’s domain). I cannot comment on the effectiveness of any other package, but if you’re evaluating, look for something that can monitor:
- services,
- server physicals – memory, CPU, disk utilisation,
- the server event log, and
- log files.