Friday, November 8, 2019

House hunting in the cloud

Let us continue the tradition of following a low-level language coding post with really high-level. This time, with a cloud to boot.

Anyone who was apartment and house hunting knows that it is time consuming and hard to go through individual listings. Extracting trends from a large number of them can be very time-consuming and involving a lot of manual input.

It would be really helpful to have a took that would extract the features you need from a listing feed, and visualize it in your preferred way, like so:




We can actually do this with a little bit of coding entirely in Wolfram Cloud. Let's get started with a sample Toronto MLS listing selection (the kind you will be receiving from a real estate agent), looking like so (sorry for a tall image).



So the task is (1) to web-scrape the listing web page for information, and (2) to visualize it in human digestible form.

Using this nice Wolfram Language scraping guide, we can get started by simply grabbing the table data in one line, like so:


url = "http://v3.torontomls.net/Live/Pages/Public/Link.aspx?Key=...&App=TREB";
structured = Import[url,"Data"];
mlstable = structured[[1,2]];
addresses = Transpose[mlstable][[2]];
rawprices = Transpose[mlstable][[4]];
prices = ToExpression[StringReplace[#,{"$"->"",","->""}]]&/@rawprices;


Now we have the list of street addresses and prices. To geo-locate the addresses, the easiest way is to complete each address with city and country info, ans then use Interpreter:


locations = Map[Interpreter["StreetAddress"][#<>", Mississauga ON Canada"]&, addresses];


We can then convert the list of prices into a list of colored pins where color is determined by the house price, using a slightly modified example from GeoMarker documentation:


pin[color_]:= Graphics[GraphicsGroup[{FaceForm[color],EdgeForm[Black],
   FilledCurve[{{Line[Join[{{0, 0}}, ({Cos[#1], 3 + Sin[#1]} &) /@
   Range[-((2 Pi)/20), Pi + (2 Pi)/20, Pi/20], {{0, 0}}]]}, {Line[(0.5 {Cos[#1], 6 + Sin[#1]} &) /@
   Range[0, 2 Pi, Pi/20]]}}]}]]
rcfun[sc_] :=Blend[{{0,Green},{0.5,RGBColor[0.75,0.75,0]},{1,Red}},sc]
pins = Map[pin[rcfun[(#-900000)/(1300000-900000)]]&,prices];


It only remains to convert the list of geographical coordinates and pins to GeoMarker and filter out failed address lookups (as well as missed lookups, defined as those more than say 10 miles away from the arbitrarily chosen city center), like so


markertable = MapThread[GeoMarker[#1,#2]&,{locations,pins}];
home = Interpreter["StreetAddress"]["Square One, Mississauga ON Canada"];
goodmarkers = Select[markertable,(!FailureQ[#[[1]]] && GeoDistance[#[[1]],home][[1]]<10)&];


In my example, 94 markers out of 99 remain as "good". Then, we simply plot the markers on a map using GeoGraphics:


gp = GeoGraphics[{"Mississauga",Append[goodmarkers,GeoMarker[home,pin[White]]]}];
gins=DensityPlot[(x*1000-900000)/(1300000-900000),{y,0,1},{x,900,1300},ColorFunction->rcfun,AspectRatio->5,FrameTicks->{None,Automatic},Background->RGBColor[1,1,1,0.5],FrameStyle->Directive[Thick],LabelStyle->Normal];
Show[gp,Epilog->Inset[gins,Scaled[{1,0}],Scaled[{1,0}],0.028]]






In the second example let us color code the markers using price per square foot. Note that the square footage of the houses is not in the table, so we need to parse individual listings. So we need to do a more complicated web scraping:


url = "real_estate.html"; xml = Import[url,"XMLObject"];
formitems=Cases[xml,XMLElement["span",{"class"->"formitem formfield"},x_]->x,Infinity];
sqfeet=ToExpression[Last[StringSplit[#,"-"]]]&/@((If[#[[3]]=={},"0-0",#[[3,1]]])&/@Extract[formitems,(#+{0,1})&/@Position[formitems,XMLElement["label",{},{"Apx Sqft:"}]]] )


Note several things about this code:
  • The specific criteria to supply to Cases have been determined by inspecting the page code in the browser; in this case all the information bearing fields conveniently are <span> tags with classes formitem formfield. Your particular case will be different.
  • In the last line, Extract basically retrieves "every element following a label saying Apx Sqft:". Again your case will be different, and I admit that this is not the only way to get to the right info.
  • Local HTML file is used instead of a live URL. This is a trick dome to work around asynchronous deferred loading of listings on Toronto MSL website; if live URL were used, the scraper would only retrieve some 25 listings. The HTML file is obtaied by loading the MLS link, scrolling all the way down (not too fat so that all listings have a chance to load), then saving the webpage as a complete package and loading its HTML file into Wolfram Cloud (or creating a text file in the cloud and copy-pasting, or hosting it locally and making it accessible to Wolfram Cloud)
  • The last portion, involving Last[StringSplit[...]] is needed to convert approximate designations like "1500-2000" into a number 2000.

After this, the code is familiar, except that marker filtering should now include filtering out the listings without square feet information:



pricesperfoot = Quiet[prices/cfeet];
score=Quiet[(pricesperfoot-250)/(800-250)];
scorepins=Map[pin[rcfun[#]]&,score];
smarkertable = MapThread[GeoMarker[#1,#2]&,{locations,scorepins}];
sgoodmarkers=Select[smarkertable,(!FailureQ[#[[1]]] && GeoDistance[#[[1]],home][[1]]<10 
                    && NumberQ[#[[2,1,1,1,-1,-1]]])&];
gs=GeoGraphics[{"Mississauga",Append[sgoodmarkers,GeoMarker[home,pin[White]]]}];
Show[gs,Epilog->Inset[ginss,Scaled[{1,0}],Scaled[{1,0}],0.028]]


Here's the final result:



This is only an example I spent about an hour coding, another 1-2 polishing and another 1-2 hours writing about. Your mileage may vary. By the same token you can easily visualize houses according to any score you compute (such as "price per score determined by adding the number of bedrooms and half the number of washrooms"). You can also add multidimensional visualization, where a pin's size, or border color, or shape, or all of these, would convey different information. You can use geo-location and scrape some other website to score neighborhoods and show the "best bang for the buck" according to that score. You can build a linear regression machine-learning house pricing service. If you are dragged into the boredom of house hunting, there are always ways to make some colorful fun out of it :) .

Thursday, September 26, 2019

Esoteric multiplication and an (almost) failed exercise

In the multiverse of programming languages it's hard not to mention Brainf*ck, purposely written to have a minimal set of commands needed to (theoretically) program just about any problem out there thanks to its Turing completeness. The set is actually so minimal that I can include all of it, below. For non-programmers it would be useful to imagine an infinite linear magnetic tape with a read/write head able to move along it, like so:

Command    C-equivalent     Action

   >       ++ptr;           Move the head one position to the right
   <       --ptr;           Move the head one position to the left
   +       ++*ptr;          Increment the byte of data at the head position
   -       --*ptr;          Decrement the byte of data at the head position
   .       putchar(*ptr);   Write a keyed-in byte of data at the head position
   ,       *ptr=getchar();  Read and display the byte of data at the head position

   [       while(*ptr){     If the byte at the head position is zero, 
                             rather than executing the next command, 
                             jump forward to the matching ]
   ]       }                If the byte at the head position is non-zero, 
                             rather than executing the next command, 
                             jump back to the matching [


Well I mentioned it in conversations so often that it would have been a shame not to have tried it hands on. I took on a randomly chosen problem, namely writing a program that would multiply two single-digit numbers, totally out of the zone, somewhere in between spending last half an hour of the working day, thinking during my commute and in between playing and walking with the kid(s), and some half an hour debugging after the said kids went to sleep.

I got inspired by the addition code snippet, namely, [->+<], and tried to generalize it to summing several single digits in a row. My first example, [[->+<]>], expectedly resulted in an infinite loop; however a slightly more complicated [<[->+<]>>] ended up working fine.

My next challenge was how to replicate one of the operands in m*n an arbitrary number of times, i.e. to translate the memory layout [m n] into something like [m m ... m] where m would be repeated n times. After a few unsuccessful tries a working algorithm was:

  1. Replicate one operand 1 time, then 2 times, then 3 times, ... all the way to m_max times. This is somewhat ugly but realizable using something like
    [- >>>>>>> >+>>>>>> >+>+>>>>> >+>+>+>>>> >+>+>+>+>>> >+>+>+>+>+>>
         <<<<<<<  <<<<<<<  <<<<<<<  <<<<<<< <<<<<<< <<<<<<<]

  2. Replicate the second operand, decreasing it on every iteration, so that for n, zero is reached in front of n copies of m, like so
    [[- >>>>>>> + <<<<<<<] >>>>>>>-]

  3. Apply our previously written "sum all numbers until zero" [<[->+<]>>]

Taken together and surrounded with some I/O, the final program ended up being

,------------------------------------------------> read and convert n
,------------------------------------------------> read and convert m
<[- >>>>>>> >+>>>>>> >+>+>>>>> >+>+>+>>>> >+>+>+>+>>> >+>+>+>+>+>>
     <<<<<<<  <<<<<<<  <<<<<<<  <<<<<<< <<<<<<< <<<<<<<] replicate m
<[[- >>>>>>> + <<<<<<<] >>>>>>>-]>> propagate n
[<[->+<]>>]<. sum n copies of m

Note that, for brevity, I have limited one of the operands to 5, however the expansion to an arbitrary number would be straightforward by increasing all stretches of >>>>>>> and increasing the number of replications from 5 to the desired maximum.

You can test it in a BrainF emulator for yourself that it works (screenshot).


For debugging you can use something like this visualizer, but note that it's heavily memory limited.

The real surprise was when I decided to compare my solution against the internet, dimly suspecting that there may be far superior alternatives out there. And indeed, as this StackOverflow discussion nicely points out, there is a MUCH more elegant code doing this. Here goes,
[
 >[>+>+<<-]
 >[<+>-]
 <<-]

Compared to this, my example above looks like such baaaad code :)
What has this taught me? Several things:
  • Brainf*ck can be used as a measure of one's "internal memory capacity" for solving puzzles and math problems. Mine turns out to be pretty limited. I simply cannot, at least not quickly and entirely in my mind, come up with a program that requires you to keep track of more than 3 conditions at once (like 3 loops). Well, I kinda knew this already from my school-time profound hate of learning poems by heart.
  • On the other hand, I was able to come up with a working solution, all by myself, in circumstances that were very very far from optimal.
  • Learning curve is very important. Even in Brainf*ck, there are typical nuts-and-bolts programming tricks that make life so much easier once you've mastered them. I guess that just as I found it very difficult to stop thinking in higher-level languages such as (at least) the assembly, many people who never coded in their life would find it very difficult to grasp what we almost take for granted, such as i=i+1, or for-loops, or pointer arithmetic, or polymorphism, or templates.

I hope this exercise will make it a bit easier for me to explain programming to non-programmers.


Wednesday, June 5, 2019

Excel VBA: Automatic macro execution for Windows Task Scheduler (and a few pranks to boot)

Despite all modern trends towards workflow automation, manual Excel-based workflows still abound in many areas of industry, notably finance. In our previous post we learned how to simplify and automate some of them, reducing manual copy-and-paste to a button push.

The next step of this would be further automation, so that the"button" gets pushed automatically, for example on a daily basis at a predetermined time.

One intuitive approach would be to invoke your code using an Auto Macro such as Workbook_Open, and then have Windows Task Scheduler open your workbook on schedule as required.

However, there is a catch here. Workbooks used for automated workflow are very often also used for ad-hoc manual tasks, or are simply opened to view some data, with no intention of performing any (re)calculations. Now your auto macro code would run every time the workbook is opened. If the automated workflow involves some data modification and/or email notification, you will have side effects very time you open your workbook just to see what is in there.

Worse still, if your workbook is on a shared drive, other used may stumble upon it by accident, triggering execution of the auto macros when they never expect it.

Of course, you could run two copies of your spreadsheet - one for automation and one for manual inquiries, naming the "automation" one something like "never_ever_open_this_manually.xlsm", but what happens in practice in such cases is that the two workbooks fall out of sync, and one of them is always out of date (or worse, they are both partially out of date).

A much more elegant solution would be for the auto macro to detect that the workbook was invoked by the Task Scheduler and not manually, and run the workflow only in this case.

After some probing and looking for inspiration I have come up with this solution:

Private Function IsScheduled() As Boolean
 ' Determine if the workbook was opened manually or as part of a Task Scheduler routine
 On Error GoTo Decided
 Dim Result As Boolean
 Result = True ' An invisible or programmatically started session is always assumed scheduled
  If Not Application.Visible Then GoTo Decided
  If Not Application.UserControl Then GoTo Decided
 Result = False ' A session with more that one workbooks is always assumed manual
  If Application.Workbooks.Count > 1 Then GoTo Decided
 ' otherwise assume scheduled task if the workbook name was supplied in the command line
 Dim wname As String, cmdline As String
 cmdline = UCase(Trim(GetCommandLine)): wname = UCase(Trim(ThisWorkbook.Name))
 Result = InStr(cmdline, wname) > 0
Decided: IsScheduled = Result
 On Error GoTo 0
End Function

Private Sub Workbook_Open()
 If Is Scheduled Then
  ' Run your automated workflow here
 End If
End Sub

It basically relies on the discovery that if a workbook is started by the Task Scheduler, its command line would be
C:\Program Files\OFFICE##\Excel.exe v:\path_to_workbook\workbook.xlsm
whereas for most other scenarios of workbook opening (directly from Excel or from Windows Explorer) the command line would look like
C:\Program Files\OFFICE##\Excel.exe
or
C:\Program Files\OFFICE##\Excel.exe /dde
i.e. would not contain the workbook name (instead, it passes the workbook via DDE)

To get to the command line, we need to implement the following API function (taken from here):

#If Win64 Then
    Private Declare PtrSafe Function GetCommandLineL Lib "kernel32" Alias "GetCommandLineA" () As LongPtr
    Private Declare PtrSafe Function lstrcpyL Lib "kernel32" Alias "lstrcpyA" (ByVal lpString1 As String, ByVal lpString2 As LongPtr) As Long
    Private Declare PtrSafe Function lstrlenL Lib "kernel32" Alias "lstrlenA" (ByVal lpString As LongPtr) As Long
#Else
    Private Declare Function GetCommandLineL Lib "kernel32" Alias "GetCommandLineA" () As Long
    Private Declare Function lstrcpyL Lib "kernel32" Alias "lstrcpyA" (ByVal lpString1 As String, ByVal lpString2 As Long) As Long
    Private Declare Function lstrlenL Lib "kernel32" Alias "lstrlenA" (ByVal lpString As Long) As Long
#End If

Private Function GetCommandLine() As String
GetCommandLine = "FAILED TO RETRIEVE" ' fallback value is case of error
On Error GoTo Finally ' suppress errors
   Dim strReturn As String
   #If Win64 Then
    Dim lngPtr As LongPtr
   #Else
    Dim lngPtr As Long
   #End If
   Dim StringLength As Long
   'Get the pointer to the commandline string
   lngPtr = GetCommandLineL
   'get the length of the string (not including the terminating null character):
   StringLength = lstrlenL(lngPtr)
   'initialize our string so it has enough characters including the null character:
   strReturn = String$(StringLength + 1, 0)
   'copy the string we have a pointer to into our new string:
   lstrcpyL strReturn, lngPtr
   'now strip off the null character at the end:
   GetCommandLine = Left$(strReturn, StringLength)
Finally: On Error GoTo 0
End Function

Note the compiler directives to make the code 32/64-bit aware -- this is absolutely essential if the workbook resides on a shared drive in a network with mixed 32/64-bit Excel installation. Otherwise opening the workbook might crash the entire Excel session for random unsuspecting users.


BONUS: When debugging the automation routine I have stumbled upon some wonderful tools you can use to play a practical joke on your peers. Namely you can use Application.OnKey to override the function of a commonly used key (like, a letter or an arrow) to something totally bizarre, and wrap it into Application.OnTime to activate at some (possibly random) point in the future. Put it in your auto macro, like so:

Public num As Integer

Function sov(sekunder As Double) As Double
starting_time = Timer
Do
 DoEvents
Loop Until (Timer - starting_time) >= sekunder
End Function

Sub GetDizzy()
num = num + 1
On Error Resume Next
Select Case (num Mod 5)
Case 2 'move to opposite drection
 ActiveCell.Offset(0, 1).Select 
Case 3 'shake the workbook
 dx = Round(Rnd * 10) - 5: dy = Round(Rnd * 10) - 5
 For i = 1 To 3
  ActiveWindow.SmallScroll toright:=dx, down:=dy: sov (50 / 1000)
  ActiveWindow.SmallScroll toleft:=dx, up:=dy: sov (50 / 1000)
 Next i
Case 4 'nudge the workbook
 ActiveCell.ColumnWidth = ActiveCell.ColumnWidth + Round(Rnd * 15) - 7
Case Else 'normal operation
 ActiveCell.Offset(0, -1).Select 
End Select
If num > 200 Then MsgBox "You are tired and dizzy... Why don't you call it a day and go home?"
On Error GoTo 0
End Sub

Sub Payload()
 Application.OnKey "{LEFT}", "GetDizzy"
End Sub

Private Sub Workbook_Open()
Application.OnTime Now + TimeValue("8:00:00"), "Payload"
End Sub


and watch your colleagues of choice get delighted at the emphatic computer offering them to go home at the end of the working day because they are tired and have worked long hours and it's late outside (I've borrowed the sleep function sov() from here). The code is very hard to detect because it executes totally silently and persists even after the workbook containing has long been closed; the empathy will last until Excel is closed or restarted -- but in most industry workplaces I have seen, this only happens when it crashes.

Your mileage may actually vary with the payload, but do remember not to use it for any real sabotage because this is a surefire way to get fired, if you excuse the tautology again.


Tuesday, March 5, 2019

Excel VBA: Automatic opening of external files and other neat tricks

Suppose you have a workflow in Excel that involves manual copy-and-paste of data from one spreadsheet (A) into another spreadsheet (B), and you want to automate it.

In some cases, you might make use of data sources and automatic linking between files, but there are many cases where this would be undesirable. Either the automatic update of links makes your process undesirably slow, or you need those intermediate values unchanged for reporting/auditing purposes, or the links update is just too buggy and unreliable (as are, alas, many automatic features in the MS Office suite), whatever the reason, sometimes you are better off with the "plain old-school" VBA macro approach, approximately,

  1. Open the external spreadsheet A
  2. Find the source data
  3. Copy the data into your destination spreadsheet B
To do this gracefully, one must think through several scenarios though. What if the source spreadsheet is already open - should we re-open it (possibly discarding unsaved changes), or should we reuse the already opened sheet? What do we do after the data copying is done - should we force-close the source sheet or should we keep it? 

After giving these questions some thought, I have created a short VBA code snippet, and after having re-used it about a dozen times over the course of less than one month, I am putting it here for reference. Here goes:

Set targetWB = Application.ActiveWorkbook
TargetPath = targetWB.Path & "\"
SourcePath = "..\otherfolder" 'Relative to target workbook 
SourceName = "somename.xlsx" 'Can be any calculation to determine name

alreadyOpen = False
For Each thisWB In Workbooks
  If thisWB.Name = SourceName Then
   alreadyOpen = True: Set sourceWB = thisWB
  End If
Next thisWB
On Error GoTo ERRORHANDLER ' to gracefully handle "File not found" scenarios
 If Not alreadyOpen Then Set sourceWB = Workbooks.Open(Filename:=TargetPath & SourcePath & SourceName, ReadOnly:=True)
On Error GoTo 0

' *** Data copying code goes here *** 

If Not alreadyOpen Then sourceWB.Close SaveChanges:=False
Set sourceWB = Nothing: Set targetWB = Nothing ' garbage collection

Note a few touches here: 
  • The code reuses the already opened spreadsheet if it is already open (pardon the tautology), and opens it otherwise. This makes the code both faster for debugging (no repeated open/close - the source workbook can be large!) and suitable for automated production workflows. The only downside here is that you need to separate the actual name of the spreadsheet, and a path to it; however if you only have the full path, you can easily work around it with a one-liner like so:
    SourceName = FullPathName: While InStr(SourceName,"\") > 0: SourceName = Right(SourceName,Len(SourceName)-InStr(SourceName,"\")): Wend
    using FullPathName instead of TargetPath & SourcePath & SourceName in Workbooks.Open(...).
  • The code automatically closes the source spreadsheet only if it had to open it. This way, no spurious workbooks are left open in a production workflow.
  • The files are opened read-only and closed without saving changes. This prevents unwanted queries that can otherwise pop up if the file happens to be opened by another user, or if the data copying process ends up messing up the source file (which, depending on how complicated the data processing routine is, is quite likely to happen). These queries are annoying whilst debugging and totally disruptive in a production workflow; last but not least we are preventing unwanted modification of the source file.




BONUS: Here is the routine I commonly use for the actual data copying. I find it preferable to using selection / clipboard / autofilter operations because it involves less GUI interaction and therefore is more robust (and, with Application.ScrenUpdating=True, can be faster!)
targetWS = "Analysis" : sourceWS = "Data" 'use any names 
targetStartAt = "A2": sourceStartAt = "A2" 
  'feel free to use named ranges here
  'or to have your code determine the locations based on search criteria
Set here = sourceWB.Worksheets(sourceWS).Range(sourceStartAt)
Set there = targetWB.Worksheets(targerWS).Range(targetStartAt)
While Len(here.value) > 0 
 If Not IsError(here.Value) Then
  If here.Value = "GOOD" Then ' put whatever condition to validate a source line to determine if it needs copying
   ' example of data copying, edit as your task requires
   there.Value = here.Value
   there.Offset(0,1).Value = Left(here.Offset(0,1).Value,8)
   For i = 4 to 12 
    there.Offset(0,i-2).Value = here.Offset(0,i).Value
   Next i
   Set there = there.Offset(1,0)
  End If
 End If
 Set here = here.Offset(1,0)
Wend


Wednesday, February 6, 2019

No more calculation babysitting!

Let's imagine you work with a computer on a daily basis (which, as it were, is a rather common scenario these days), and let's imagine your work includes "relatively lengthy" computational tasks -- lengthy enough that it is not productive to just sit there staring at the proverbial hourglass cursor (or meditating over the progress bar, or praying over the build output console window that it compiles without errors, or whatever it is). So...

So, after some time waiting (depending on your boredom threshold, about 30 seconds for me, I'd guess under 2 minutes for most people) you decide to multitask and switch to another task (work-related or otherwise). Before long, that other task immerses you and you realize that your lengthy computation was actually done minutes ago and you could have, and should have, resumed your workflow earlier. So in an attempt to avoid doing nothing and increase productivity, you just dumped it down the tubes.

Or consider another scenario. Your computation is now lengthy enough (say, 10-20 minutes) so that you decide to grab a coffee from the kitchen, or grab a quick bite/smoke/chat, or whatever it is. It would be cool if you could get an alert that your task has finished, so you can timely return and resume work without having to check on your workstation multiple times.

Or yet another scenario. You need to run an even lengthier calculation, perhaps a few hours on end, so you leave it running after hours. And you have several of them to run after each other, and you want to run as many as you can before the next working day. So again it would be cool if you can get an alert once your calculation finishes, rather than having to log in and check on the computation  progress repeatedly. More importantly, if your task is aborted early due to some mishap (which happened to me due to my own banana fingers more times than I'd confess), you really want to be alerted at once, rather than after what you thing the task should have taken, had it completed normally.

The first scenario can partially be mitigated by running Windows Task Manager, minimizing it, and noticing when the CPU usage drops, indicating that your task has finished. But you still need to remain vigilant and keep watching that tiny indicator, and with current multicore CPUs, the drop may be from 13% to some 2%, which is not very noticeable visually. Not to mention that the two remaining scenarios cannot be worked around in this way.

Wouldn't it be nice to have an automated monitor which can do this for you?

Yeah it sure would.

So let's see how we can do it in Windows Powershell. You can determine the CPU usage of a process using this function (loosely adapted from here):

function get-excel-CPU ($avgs=1)
{
 $result=0
 for ($i=1; $i -le $avgs; $i++)
 {
  $cpuinfo = Get-WmiObject Win32_PerfFormattedData_PerfProc_Process -filter "Name LIKE '%EXCEL%'"
  $result=$result + ($cpuinfo.PercentProcessorTime |Measure -max).Maximum
  start-sleep -m 150
 }
 $result = $result/$avgs
 $result
}

Note the following:

  • I use EXCEL as an example because I use it most often. It is trivial to modify it to work with any other program.
  • The function takes into account that there can be multiple instances of your program, and will report the CPU usage of the most CPU intensive process. Usually, this is what you want, because only one of your instances will be doing computations anyway, but you can easily fine-tune it to be more instance-specific.
  • The function measures CPU usage several ($avgs) times with a short waiting period in between, and then averages the measurement. This is done because some computations, mostly ones heavily on local or network I/O, will have wildly fluctuating CPU usage, so taking one measurement may trick the script into falsely deciding that the computation has finished. You may need to fine-tune the number of measurements and the wait time between them to reflect your specific computation pattern.

Now that we have a way to automatically determine the CPU usage, we can easily wrap it in a control loop:

$poll = 10
$homethresh = 600
$sensitivity = 5
$fullout = new-timespan -Seconds 86400
$mailout = new-timespan -Seconds $homethresh
$sw = [diagnostics.stopwatch]::StartNew()

$thiscpu = get-excel-cpu 5

if ($thiscpu -lt $sensitivity) 
{
 write-host $thiscpu, ": Excel not running, exiting."
}
else
{
 write-host "Initial CPU is ", $thiscpu
 $cnt=0
 while ($sw.elapsed -lt $fullout)
 {
  start-sleep -s $poll
  $thiscpu = get-excel-cpu 3
  $lock = is-locked
  write-host "Elapsed", $sw.elapsed, " -- CPU is ", $thiscpu, "   ",$lock
  if ($thiscpu -lt $sensitivity) {$cnt++} else {$cnt=0}
  if ($cnt -ge 2)
   {
    write-host "FINISHED!!!!"
     if ($lock -eq "UNLOCKED") {show-splash} else {phone-home}
    return
   }
 }  
 write-host "Timed out!"
}

Note the line if (...) {show-splash} else {phone-home} . There are two ways to alert you that the computation has finished. One is to show you a big splash screen, borrowed from here:

function show-splash 
{
 Add-Type -AssemblyName System.Windows.Forms
 $Form = New-Object system.Windows.Forms.Form
  $Form.Text = "Finished"
  $Form.AutoSize = $True
  $Form.AutoSizeMode = "GrowAndShrink"
  $Form.BackColor = "Lime"
  $Font = New-Object System.Drawing.Font("Arial",96,[System.Drawing.FontStyle]::Bold)
  $Form.Font = $Font
  $Label = New-Object System.Windows.Forms.Label
  $Label.Text = "Calculation finished!"
  $Label.AutoSize = $True
  $Form.Controls.Add($Label)
  $Form.Topmost=$True
  # -- this ensures your splash screen appears on top of other windows!
 $Form.ShowDialog()
}

The other is to simply send you an email that gets pushed to your smartphone or smart watch, borrowed from here (using Outlook rather than Send-MailMessage so that your IT department can safely inspect your outgoing email and won't mistake your script for a trojan):

function phone-home
{
 $Outlook = New-Object -ComObject Outlook.Application
 $Mail = $Outlook.CreateItem(0)
  $Mail.To = "youremailaddress@mailserver.com"
  $Mail.Subject = "Calculation finished"
  $Mail.Body ="Your calculation has finished. If you need to start another one, go for it."
 $Mail.Send()
}

Now, how to choose between the two? You will want the splash screen if you are sitting in front of your screen, and the email otherwise. So we need a way of discriminating between the two. Following this idea, we can use
function is-locked
{
try {
$currentuser = gwmi -Class win32_computersystem | select -ExpandProperty username
$process = get-process logonui -ea silentlycontinue
if($currentuser -and $process){"LOCKED"}else{"UNLOCKED"}
return}
#Always return LOCKED if logged in remotely
catch{"LOCKED";return} }

Finally, here is a BAT-file one-liner wrapper, called ps.bat to run your PowerShell script on systems where execution of random scripts has been disallowed by default (for good reason). We cannot override this default without admin privileges, but we can by pass it temporarily by calling

@powershell -ExecutionPolicy RemoteSigned .\%1.ps1

You can then call your PowerShell script, e.g., poll.ps1, and simply type ps poll in your command prompt to invoke it quickly.

Enjoy!

Friday, February 1, 2019

Electric cabinet lock and other small DIY

Every once in a while, we have people (such as cleaners, babysitters or contractors) who have access to our home while we are away or distracted. And I am perfectly aware that petty theft usually does not happen in these scenarios (one case is enough to ruin the perpetrator's career), we also know that from time to time, against all odds, it does happen. So we needed to come up with an idea of protecting my wife's jewelries from an opportunistic snatch.

In other words, I needed a lock on the jewelry drawer, preferably one that would be discreet, and would not involve drilling the front of the (moderately beautiful) dresser cabinet. 

In a previous edition of this scenario, I accomplished this using two magnetic child locks (like these ones, they come in a huge variety) - to open the drawer you'd need to simultaneously place magnetic keys at two unmarked, previously known spots, which makes for an excellent discrete opening mechanism. But this time the cabinet dimensions proved incompatible with the locks. To add to that, even a casual glimpse of the open drawer, sporting the big white child locks, would instantly reveal our trick. Finally, the mounting holes for magnetic locks look bad even on the inside of the cabinet. (Whoever tries to convince you that these locks would hold on an adhesive mount, has not tried it. Adhesive mounting tape can be strong, but is is invariably terrible for dynamic loads such as repeated banging from trying to open a cabinet with the lock engaged but forgotten about.)

So I was searching for a geometrically suitable lock and stumbled upon this part, which just happened to have just the right dimensions to fit between the back wall of the drawer and the back wall of the cabinet. In addition, an electromagnetic lock has the advantage of not requiring submillimeter-precision alignment between the lock and the armature/striker plate - something that would totally plague most mechanical designs (like this one). 

I already had an unused electric key switch, which could be discretely built into the back wall of the cabinet, totally out of sight yet within easy reach. The only missing piece now was how to power the lock. Using a DC power adapter seemed an easy choice, but it is very easily defeated by unplugging it from the wall. Using batteries (and placing them in the same space between the drawer and the cabinet) was more secure, but with the lock's power consumption of 100 mA, batteries would require replacement every 1-2 days. 

Therefore, an ideal trade-off would be a plugged in adapter with a battery back-up that would take over the lock if the adapter is unplugged. After giving it some thought I came up with this circuit:



The central component here is the relay K1 that connects the battery through the normally closed contacts and disconnects it when external DC power is present. The diode D1 is there to ensure that the battery is not getting charged by the adapter (non-rechargeable batteries don't like that). The diode D2 ensures that the battery is not getting discharged through the output circuitry of the adapter. The diodes D3 and D4 are flyback diodes, which prevent arcing and sparking at the key switch (or the power jack).

Finally, the resistor R1 is the current limiting resistor for the relay coil. It actually proved the most problematic component because of the need to mitigate heat dissipation in the enclosed space behind the cabinet. With some experimenting I found that the relay coil current for reliable operation was 40 mA, yielding 0.32 W of dissipated heat, so when I stupidly put a 0.125W resistor, it got pretty charred very soon. Even a 0.5W resistor was getting worryingly hot. Since I totally don't want my lock to start a house fire - that would be the exact opposite of what a "security lock" is supposed to do - I first hooked up four 800 Ohm, 0.5W  resistors in parallel (2W total). This got the resistors slightly warm to the touch but not hot. Here is how it looks like from the inside and outside:




As an upgrade, I later replaced the 2W resistor arrangement with a 5W component on a heat sink. Now operating at 6% capacity, the heat dissipation was small enough for the lock to run for days on end without getting warm. As an additional precaution, I have installed a thermal fuse designed to cut the AC power circuit should the temperature ever exceed 73 degrees C (this was the lowest value I could get off Amazon). So this is the new set-up:



The beauty of the design, aside from it being totally discrete, is its being both fail-secure and fail-safe. It is fail-secure in the short term, meaning that power outage will keep the lock running on battery power long enough for a malefactor to not want to stick around. On the other hand, the rightful owner can wait several days for the batteries to discharge, and have the cabinet open if the key gets misplaced or lost.


BONUS: Here's some more office DIY. At one of my workplaces, the desk phone used too much useful space on my desk, so I wanted it next to my desk instead. Not wanting to drill any holes in the shiny new company property, I came up with a mount out some stuff lying around in the office, namely:

  • an old cardboard small packet from a recent online order
  • some Scotch tape
  • some good supply of cable ties
  • and a jar of spare furniture bits and pieces, apparently mostly from IKEA. 
This is the "before" and "after" image. If interested, I can give you more detail.




Tuesday, November 20, 2018

Memory leak tester in Excel

In many scenarios involving complicated Excel calculations, especially those relying on extensive VBA code and/or custom-made add-ins, there is a danger that memory leaks could be created by programming errors such as careless object operations. 

Most often, such memory leaks are discovered post factum, when, left unchecked, they cause slowdowns or crashes when an Excel session runs out of memory. In such a case, correcting the error becomes a major pain. How to spot a memory leak offender in a calculation intensive spreadsheet containing thousands of custom functions? 

To this end, I have written a very simple program that basically evaluates a given formula multiple times and measures how the memory consumption increased. I have used this googled snippet to determine the memory consumption:

Declare Function GetCurrentProcessId Lib "kernel32" () As Long

Function GetMemUsage()
  ' Returns the current Excel.Application memory usage in KB
  Set objSWbemServices = GetObject("winmgmts:")
  GetMemUsage = objSWbemServices.Get( _
    "Win32_Process.Handle='" & _
    GetCurrentProcessId & "'").WorkingSetSize / 1024

  Set objSWbemServices = Nothing ' We don't want to cause memory leaks here :)
    ' We don't want to cause memory leaks here :)  
End Function

and wrote a very simple wrapper: 

Sub Measure_Leak()
 Set here = ActiveSheet.Range("A2")
 template = here.Value
 i1from = here.Offset(0, 1).Value: i1to = here.Offset(0, 2).Value
 i2from = here.Offset(0, 3).Value: i2to = here.Offset(0, 4).Value
 Set there = here.Offset(0, 5)
 
 there.Offset(0, 1).Value = GetMemUsage()
 
 For i1 = i1from To i1to
 For i2 = i2from To i2to
  working = "=" & template
  On Error Resume Next
   working = Replace(working, "$1", i1)
   working = Replace(working, "$2", i2)
  On Error GoTo 0
  there.Formula = working
 Next i2, i1
 
 there.Offset(0, 2).Value = GetMemUsage()
 Set here = Nothing
 Set there = Nothing
 ActiveSheet.Calculate
End Sub

Now I put this on an even simpler spreadsheet which looks like this:



Basically, pressing the Measure button evaluates the template expression substituting $1 and $2 with values spanning two ranges, a total of (i2t-i2f+1)*(i1t-i1f+1) times. An increased memory footprint at the end of the execution means there is a memory leak.

In the screenshot we see that a built-in Excel function does not cause any memory leaks (hurra!)

To test it, let us define a really leaky VBA function using this example:

' Put this into class module Class1
Option Explicit
Private A As New Class2
Private Str As String
Private Sub Class_Initialize()
   Set A.B = Me ' Fool garbage collector
   Str = Space(1024 * 10) ' Allocate lots of memory 
End Sub

'Put this into class module Class2
Option Explicit
Public B As Class1

with two functions that look similar but one is known to be leaky:

Function vbaNoLeak()
 Dim MyObject As Class2
 Set MyObject = New Class2
 Set MyObject = Nothing
End Function

Function vbaLeak()
 Dim MyObject As Class1
 Set MyObject = New Class1
 Set MyObject = Nothing
End Function

...and...




Note that using this method to troubleshoot parts of your VBA macro won't always work because many functions are prohibited inside VBA functions (they abort immediately, basically ensuring that VBA functions have no side effects). But it is very easy to modify the code above to be callable as a procedure from within a VBA macro.