Friday, December 13, 2019

Heartbeat/watchdog service in MS Excel

We have already covered automating Excel tasks at work -- and indeed it can be a life saver (or rather, work-life balance saver) not to have to be present at work every single day at time XX:XX no matter what only to have to push a few buttons.

However, there is an inconvenience that no automation is 100% fool-proof, especially in a corporate environment. The automation computer may be restarted (by your colleagues or your IT department who decided to push some security updates). It may lose power (because the cleaning personnel got a bit too vigorous with their brooms). It may crash, or freeze, or have a hardware failure. It is particularly nasty is the automation computer is a shared terminal that only gets used once in a while, so a user log-off due to restart may not be immediately detected. And of course once the user is logged off, no code nothing written by that user can be invoked any more, and needless to say that admin access to all computers in the corporate is strictly verboten.

One way of getting around the inconvenience would be to implement a "heartbeat" or "watchdog" service that would periodically query the availability of your automation machine and "phone home" if it goes offline.

The general idea is as follows:
  • The "server" is an Excel workbook that does nothing and resides in the user's private folder on the shared drive (most corporate environments have that), so that it's accessible from any computer in the office but only by the particular user. This worksheet contains no code, but is configured to open at startup on the automation machine.
  • The "client" is another Excel workbook that runs on the user's regular desktop (which is assumed to be operational). 
  • The client polls the server by trying to open the server workbook (assuming that if it's locked for editing, it's open on the automation machine so the latter is operational). 
  • Should the server workbook become unlocked, this means that the automation machine has logged the user off and automation won't run. The client then informs the user by sending an email. 
To determine whether a file is open on the automation machine, I use the following macro (adapted from this idea):


Function IsFileOpen(filename As String)
    Dim filenum As Integer, errnum As Integer
    On Error Resume Next   ' Turn error checking off.
    filenum = FreeFile()   ' Get a free file number.
    ' Attempt to open the file and lock it.
    Open filename For Input Lock Read As #filenum
    Close filenum          ' Close the file.
    errnum = Err           ' Save the error number that occurred.
    On Error GoTo 0        ' Turn error checking back on.
    ' Check to see which error occurred. Open for 
    IsFileOpen = (errnum = 70)
End Function

The periodic polling code may look something like this:



Sub Heartbeat()
 Dim monitor As String
 Dim span, tick, fail, restart As Integer
 With ThisWorkbook.Worksheets("Control")
  .Calculate
  monitor = .Range("monitor").Value
  span = .Range("span").Value: fail = .Range("numfail").Value: restart = .Range("autoreset").Value
  If IsFileOpen(monitor) Then
    .Range("tick").Value = 0
    .Range("retick").Value = 0
    Call LogMessage("INFO", "Heartbeat server detected")
  Else
    .Range("tick").Value = .Range("tick").Value + 1
    Call LogMessage("WARN", "Heartbeat server NOT detected")
  End If
  If .Range("tick").Value < fail Then
     Application.OnTime Now + span / 86400, "Heartbeat"
  Else
     Call LogMessage("FAIL", "Heartbeat server has failed, REPORTING FAILURE")
     Call SendEmail(.Range("ReportMail"))
     Application.OnTime Now + restart / 86400, "RestartHeartbeat"
  End If
 End With
End Sub

The idea is that the macro will restart itself using Application.OnTime if all goes well, until the server is not detected several times in a row (to guard against false positives due to intermittent network failures). In that case, a report is set via Outlook (adapted from here and possibly here) using this function:


Sub SendEmail(params As Range)
    Dim OutApp As Object, OutMail As Object, strbody As String, toaddr As String, ccaddr As String, subj As String
    strbody = "": toaddr = "": ccaddr = "": subj = "":
    Dim here As Range
    Set OutApp = CreateObject("Outlook.Application")
    Set OutMail = OutApp.CreateItem(0)
    Set here = params(1, 1)
    While Len(here.Value) > 0
       If UCase(Left(Trim(here.Value), 2)) = "TO" Then toaddr = here.Offset(0, 1).Value
       If UCase(Left(Trim(here.Value), 2)) = "CC" Then ccaddr = here.Offset(0, 1).Value
       If UCase(Left(Trim(here.Value), 2)) = "SU" Then subj = here.Offset(0, 1).Value
       If UCase(Left(Trim(here.Value), 2)) = "BO" Then strbody = here.Offset(0, 1).Value
       If UCase(Left(Trim(here.Value), 1)) = ":" Then strbody = strbody & vbNewLine & here.Offset(0, 1).Value
    Set here = here.Offset(1, 0)
    Wend
    On Error Resume Next
    With OutMail
        .to = toaddr
        .CC = ccaddr
        .BCC = ""
        .Subject = subj
        .Body = strbody
        .Send   'or use .Display
    End With
    On Error GoTo 0
    Set OutMail = Nothing: Set OutApp = Nothing
End Sub

Note that the parameters for the email are conveniently stored on the spreadsheet for easy editing, so that only the range pointing to the parameter block needs to be passed. The email can then be easily defined using something like this:


After the failure report is sent, a slightly different macro is invoked:


Sub ReStartHeartbeat()
 Dim monitor As String, span As Integer, autoreset As Integer
 With ThisWorkbook.Worksheets("Control")
  .Calculate
  monitor = .Range("monitor").Value
  span = .Range("span").Value
  autoreset = .Range("autoreset").Value
  If IsFileOpen(monitor) Then
    .Range("tick").Value = 0
    .Range("retick").Value = 0
    Call LogMessage("OK", "Heartbeat server restarted, monitoring resumed")
    Call SendEmail(.Range("ReSetupMail"))
    Application.OnTime Now + span / 86400, "Heartbeat"
  Else
    .Range("retick").Value = .Range("retick").Value + 1
    Call LogMessage("FAIL", "Heartbeat server NOT restarted, keeping on trying")
    Call SendEmail(.Range("ReSetupErrorMail"))
    Application.OnTime Now + autoreset / 86400, "ReStartHeartbeat"
  End If
 End With
End Sub

Here the idea is to keep reminding the user that the server is still down, and getting  back on track once the user has successfully corrected the problem.

The entire process is set into motion by two macros:


Sub StartHeartbeat()
 Dim monitor As String, span As Integer
 Call LogMessage("INFO", "Service started")
 With ThisWorkbook.Worksheets("Control")
  .Calculate
  monitor = .Range("monitor").Value
  span = .Range("span").Value
  If IsFileOpen(monitor) Then
    .Range("tick").Value = 0
    Call LogMessage("OK", "Heartbeat server detected, monitoring started")
    Call SendEmail(.Range("SetupMail"))
    Application.OnTime Now + span / 86400, "Heartbeat"
  Else
    Call LogMessage("WARN", "Service not started, heartbeat server not detected")
    Call SendEmail(.Range("SetupErrorMail"))
  End If
 End With
End Sub

Private Sub Workbook_Open() ' Put this in the Workbook code rather than Module1
 If ThisWorkbook.ReadOnly Then Exit Sub ' DO NOT execute if read only OR
 If Workbooks.Count > 1 Then
  Call LogMessage("WARN", "Started in non-clean session, entering setup mode. Service NOT starting. For production, run in a CLEAN session.")
  ThisWorkbook.Save
  Exit Sub ' ONLY execute in clean, dedicated session
 End If
 ThisWorkbook.Worksheets("Control").Calculate
 Application.OnTime Now + TimeValue("00:00:05"), "StartHeartbeat"
End Sub

The safeguards in place make sure that opening the client for debugging or viewing (or by accident) do not start spurious monitoring processes (.OnTime is nasty, once set it will persist untill that particular Excel session is ended, even after the workbook containing the .OnTime was closed). So the client only starts if the file is only opened in a clean, dedicated Excel session.

Finally, an auxiliary subroutine, purely aesthetic, is used to log the monitoring actions. Here goes:


Sub LogMessage(code As String, msg As String)
 Dim here As Range, c As Integer, level As Integer
 level = 255
 If UCase(Left(Trim(code), 3)) = "INF" Then level = 10
 If UCase(Left(Trim(code), 2)) = "OK" Then level = 5
 If UCase(Left(Trim(code), 3)) = "WAR" Then level = 2
 If UCase(Left(Trim(code), 3)) = "FAI" Then level = 1
 If UCase(Left(Trim(code), 3)) = "ERR" Then level = 0
 
 For c = 1 To 2
  Set here = ThisWorkbook.Worksheets(IIf(c = 1, "Log", "Errors")).Range("A1")
  If c = 2 And level >= 2 Then Exit For
  While Len(here.Value) > 0
   Set here = here.Offset(1, 0)
  Wend
  With ThisWorkbook.Worksheets("Control")
   .Calculate
   here.Value = IIf(Len(code) > 0, code, "___")
   here.Offset(0, 1).Value = Now
   here.Offset(0, 2).Value = .Range("monitor").Value
   here.Offset(0, 3).Value = .Range("span").Value
   here.Offset(0, 4).Value = .Range("numfail").Value
   here.Offset(0, 5).Value = .Range("tick").Value
   here.Offset(0, 6).Value = .Range("autoreset").Value
   here.Offset(0, 7).Value = .Range("retick").Value
   here.Offset(0, 8).Value = msg
   Set here = ThisWorkbook.Worksheets(IIf(c = 1, "Log", "Errors")).Range(here, here.Offset(0, 8))
   Select Case level
   Case 10
    here.Font.Color = RGB(200, 200, 200)
   Case 5
    here.Font.Color = RGB(0, 128, 0)
   Case 2
    here.Font.Color = RGB(255, 128, 0)
   Case 1
    here.Font.Color = RGB(128, 0, 0)
   Case 0
    here.Font.Color = RGB(255, 0, 0)
   Case Else
    here.Font.Color = RGB(255, 0, 255)
   End Select
  End With
  If c = 2 Then ThisWorkbook.Save ' only save on error
 Next c
End Sub