FAQ ICS Application Monitoring
Background, Keeping Servers Running
Many businesses rely on TCP/IP server applications running continuously and unattended, often on remote hosted cloud computers. But those server applications are not always reliable, despite our best development practices. We try to catch and recover from errors, but sometimes recovery is not possible and the server either crashes or hangs.
Public servers are continually being attacked by hostile and white hat hackers looking for vulnerabilities to compromise our servers. I block over 200 hackers daily, some making thousands of requests a day.
For Windows, most server applications are written as Windows services, which include automatic recovery options so that services are repeatedly restarted if they stop unexpectedly; the computer can even be restarted.
But our applications don't always crash and stop cleanly. Sometimes the message loop on which all ICS applications depend can stop so the server becomes non-responsive, or memory may be corrupted, which prevents the application closing. There are programming techniques to try and stop this happening, a monitoring thread for instance, or deliberately stopping the application after errors so Windows restarts it. Mostly these work, but not always, so external monitoring may be necessary.
So ICS V9.3 now has a new TIcsAppMonCli client monitoring component and IcsAppMon monitoring server.
Application Independent Monitoring
The ICS Application Monitor server IcsAppMon is designed to monitor any ICS application using the TIcsAppMonCli client component. No configuration of the server is needed other than setting its listening IP address and email account details, and the server has no prior knowledge of applications that may connect to it.
The server broadcasts its availability, IP address and port to clients by three methods: Windows HKLM registry entries, a named Windows message, and optionally a UDP broadcast message. The Windows registry and named messages are only valid for applications on the same computer, while UDP messages can be broadcast to the local LAN.
When clients access the server by the broadcast IP address, they first send a HELLO request containing the required operating mode, currently 'Monitor Only', 'Non-Stop Monitor' or 'Installation'. The request provides all the information the server needs, such as application name, executable file name, Windows Service name, Windows Handle, Process ID, etc., plus any general information the client wants to add. The client then keeps the server socket connection open. Note Non-Stop and Installation modes are only accepted if the application is running on the same computer as the server, and the server has the system rights to control the application.
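The mode acceptance rule can be sketched in a few lines. This is only an illustration in Python, not the IcsAppMon implementation: the Hello record, its field names, and the accept_hello function are hypothetical, and the real packet layout is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MONITOR_ONLY = "Monitor Only"
    NON_STOP = "Non-Stop Monitor"
    INSTALLATION = "Installation"


@dataclass
class Hello:
    # Hypothetical field names; the real HELLO packet layout is not shown here.
    app_name: str
    exe_file: str
    service_name: str
    process_id: int
    mode: Mode
    client_ip: str


def accept_hello(hello: Hello, server_ips: set, can_control: bool) -> bool:
    """Return True if the requested operating mode may be granted."""
    if hello.mode is Mode.MONITOR_ONLY:
        # Monitoring alone needs no control over the application.
        return True
    # Non-Stop and Installation need a client on the same computer
    # and a server with the rights to stop and start it.
    is_local = hello.client_ip in server_ips
    return is_local and can_control
```

A remote client asking for 'Non-Stop Monitor' would be refused under this rule, while the same client asking for 'Monitor Only' would be accepted.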
Once accepted for monitoring, the client sends a small PING packet once a minute so the server knows it's running OK, with a STOP packet sent if monitoring should be stopped. If the client wants to be deliberately restarted, for instance after an unexpected exception or terminal error, it sends a special STOP packet that initiates an application restart process.
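As a rough sketch of how a server might track these packets for one client, the Python class below records the last PING time and decides what to do on STOP, disconnect, or silence. The class and method names are hypothetical, and the 180 second timeout is an assumption; the actual period the server waits before acting is not stated here.

```python
import time
from enum import Enum


class Action(Enum):
    NONE = "none"
    FORGET = "forget"    # normal stop, monitoring ends, no restart
    RESTART = "restart"  # the application should be restarted


class ClientWatch:
    """Tracks one monitored client (illustrative only)."""

    def __init__(self, timeout_secs: float = 180.0, now=time.monotonic):
        self._now = now                   # injectable clock, eases testing
        self.timeout_secs = timeout_secs  # assumed value, not from IcsAppMon
        self.last_seen = now()
        self.stopped = False

    def on_ping(self) -> Action:
        self.last_seen = self._now()
        return Action.NONE

    def on_stop(self, restart: bool) -> Action:
        # A plain STOP ends monitoring; the special STOP asks for a restart.
        self.stopped = True
        return Action.RESTART if restart else Action.FORGET

    def on_disconnect(self) -> Action:
        # A connection closed without a prior STOP is treated as a failure.
        return Action.NONE if self.stopped else Action.RESTART

    def check(self) -> Action:
        # Called periodically by the server to detect missing PINGs.
        if self.stopped:
            return Action.NONE
        overdue = self._now() - self.last_seen > self.timeout_secs
        return Action.RESTART if overdue else Action.NONE
```

Using a monotonic clock rather than wall time means a system clock change does not trigger a false restart.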
If the client connection is closed without a STOP packet or the PING packets stop for a while, the server will attempt to restart the client, by first stopping the Windows service or program, waiting for it to stop, and then starting the service or program again after a short delay. If the Windows service does not stop cleanly due to being non-responsive, the server will attempt to terminate the program by process ID to ensure it stops.
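The restart sequence described above might look something like the following Python sketch. It is not the IcsAppMon code: the restart_app function is hypothetical, the stop, kill and start operations are passed in as callables rather than calling any real Windows service API, and the wait and delay values are assumptions.

```python
import time
from typing import Callable


def restart_app(
    stop: Callable[[], None],
    is_running: Callable[[], bool],
    kill: Callable[[], None],
    start: Callable[[], None],
    stop_wait_secs: float = 30.0,   # assumed, not an IcsAppMon setting
    start_delay_secs: float = 5.0,  # assumed, not an IcsAppMon setting
    poll_secs: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Stop, wait, force-terminate if needed, then start again.

    Returns True if the forced termination was needed.
    """
    stop()  # polite request: stop the service or close the program
    waited = 0.0
    while is_running() and waited < stop_wait_secs:
        sleep(poll_secs)
        waited += poll_secs
    killed = False
    if is_running():
        # Non-responsive: terminate by process ID to ensure it stops.
        kill()
        killed = True
    sleep(start_delay_secs)  # short delay before starting again
    start()
    return killed
```

Passing the operations in as callables keeps the sequencing logic separate from the platform calls, so the same logic could drive either a Windows service or a plain program.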
When sending packets, the client may add an email message and/or comments that the server will send to the admin email address, perhaps application start-up or close down information, error information, anything useful really. The server also sends admin emails when clients start and stop monitoring.
Not implemented yet, but installation mode will be similar to restarting, except the exe file is replaced by a new version while being restarted, allowing the application to update itself with a new version. Longer term, there may be an application updating component to include downloading new versions and support to install multiple files.
The IcsAppMon server can potentially monitor an unlimited number of applications whose status will be available (soon) on a continuously refreshed web page using WebSockets. There is a USERINFO command allowing the client and server to exchange application defined information.
For restarting to work correctly, the TIcsAppMonCli client component MUST always send a STOP/No-Restart command before stopping normally, otherwise the application will be continually restarted. While the regular PING can be easily sent from a timer, this should only be done if the application is actually doing something useful. In the OverbyteIcsDDWebService sample, PING is only sent if the web server ListenAny method indicates listening sockets are working.
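The two client-side rules above, gating PING on a health check and always sending STOP before a normal close down, can be sketched as follows. This is Python for illustration only and does not reflect the TIcsAppMonCli API: the MonitorClient class, its methods, and the packet strings are all hypothetical.

```python
from typing import Callable


class MonitorClient:
    """Hypothetical stand-in for a monitoring client component."""

    def __init__(self, send: Callable[[str], None], is_healthy: Callable[[], bool]):
        self._send = send              # writes one packet to the server
        self._is_healthy = is_healthy  # e.g. "are listening sockets working?"

    def on_timer(self) -> bool:
        # Called once a minute. Skip the PING when the application is not
        # doing useful work, so the server notices and restarts it.
        if not self._is_healthy():
            return False
        self._send("PING")
        return True

    def shutdown(self) -> None:
        # Normal close down: tell the server not to restart us.
        self._send("STOP no-restart")  # packet text is made up

    def request_restart(self) -> None:
        # Deliberate restart, for instance after a terminal error.
        self._send("STOP restart")     # packet text is made up
```

The point of the health check is that a timer alone only proves the message loop is alive, not that the server is still able to accept connections.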
If the IcsAppMon server is only used for monitoring, it does not matter how it is run. But Non-Stop monitoring requires stopping and restarting applications, so administrative level access is required to start and stop Windows services, and Windows services cannot easily start GUI applications. So effectively, the server needs to be installed as a Windows service to monitor other Windows services, which is how most ICS server applications should be designed, for continual running.
This IcsAppMon server and client are still under development and testing. The final version will extract the actual server parts of this application into a new TIcsAppMonSrv server component that can be added to other applications for more complex requirements. But the IcsAppMon server is fully functional and being tested on live public servers monitoring ICS web, FTP and proxy servers.