Anyone who has been apartment or house hunting knows how tedious it is to go through individual listings one by one. Extracting trends from a large number of them is time-consuming and involves a lot of manual input.
It would be really helpful to have a tool that would extract the features you need from a listing feed and visualize them in your preferred way, like so:
We can actually do this with a little bit of coding, entirely in the Wolfram Cloud. Let's get started with a sample Toronto MLS listing selection (the kind you would receive from a real estate agent), looking like so (sorry for the tall image).
So the task is (1) to web-scrape the listing web page for information, and (2) to visualize it in a human-digestible form.
Using this nice Wolfram Language scraping guide, we can get started by simply grabbing the table data in a few lines, like so:
url = "http://v3.torontomls.net/Live/Pages/Public/Link.aspx?Key=...&App=TREB"; structured = Import[url,"Data"]; mlstable = structured[[1,2]]; addresses = Transpose[mlstable][[2]]; rawprices = Transpose[mlstable][[4]]; prices = ToExpression[StringReplace[#,{"$"->"",","->""}]]&/@rawprices;
Now we have the list of street addresses and prices. To geo-locate the addresses, the easiest way is to complete each address with city and country info, and then use Interpreter:
locations = Map[Interpreter["StreetAddress"][# <> ", Mississauga ON Canada"] &, addresses];  (* failed lookups come back as Failure objects *)
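Before moving on, it is worth checking how many lookups failed; a quick count (my own sanity check, not essential to the pipeline) looks like:
Count[locations, _?FailureQ]  (* number of addresses Interpreter could not resolve *)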
We can then convert the list of prices into a list of colored pins, where the color is determined by the house price, using a slightly modified example from the GeoMarker documentation:
(* a map-pin shape, adapted from the GeoMarker documentation *)
pin[color_] := Graphics[GraphicsGroup[{FaceForm[color], EdgeForm[Black],
    FilledCurve[{{Line[Join[{{0, 0}}, ({Cos[#1], 3 + Sin[#1]} &) /@ Range[-((2 Pi)/20), Pi + (2 Pi)/20, Pi/20], {{0, 0}}]]},
      {Line[(0.5 {Cos[#1], 6 + Sin[#1]} &) /@ Range[0, 2 Pi, Pi/20]]}}]}]];
(* green-to-red color scale on [0, 1] *)
rcfun[sc_] := Blend[{{0, Green}, {0.5, RGBColor[0.75, 0.75, 0]}, {1, Red}}, sc];
(* rescale prices from the $900k-$1.3M range to [0, 1] *)
pins = Map[pin[rcfun[(# - 900000)/(1300000 - 900000)]] &, prices];
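To sanity-check the pin shape and the color scale before plotting everything, one can preview a single marker, e.g.:
pin[rcfun[0.5]]  (* a mid-scale price should render as an olive-yellow pin *)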
It only remains to convert the lists of geographic coordinates and pins to GeoMarkers, and to filter out failed address lookups (as well as missed lookups, defined as those more than, say, 10 miles away from the arbitrarily chosen city center), like so:
markertable = MapThread[GeoMarker[#1, #2] &, {locations, pins}];
home = Interpreter["StreetAddress"]["Square One, Mississauga ON Canada"];  (* the chosen city center *)
goodmarkers = Select[markertable,
   (! FailureQ[#[[1]]] && GeoDistance[#[[1]], home][[1]] < 10) &];  (* keep successful lookups within 10 miles *)
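To see how many markers survive the filtering, one can count them (a quick check of my own, not essential to the pipeline):
Length /@ {markertable, goodmarkers}  (* markers before and after filtering *)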
In my example, 94 markers out of 99 remain as "good". Then, we simply plot the markers on a map using GeoGraphics:
gp = GeoGraphics[{"Mississauga", Append[goodmarkers, GeoMarker[home, pin[White]]]}];
(* a tall, narrow DensityPlot serves as the color-bar legend for the $900k-$1.3M scale *)
gins = DensityPlot[(x*1000 - 900000)/(1300000 - 900000), {y, 0, 1}, {x, 900, 1300},
   ColorFunction -> rcfun, AspectRatio -> 5, FrameTicks -> {None, Automatic},
   Background -> RGBColor[1, 1, 1, 0.5], FrameStyle -> Directive[Thick], LabelStyle -> Normal];
Show[gp, Epilog -> Inset[gins, Scaled[{1, 0}], Scaled[{1, 0}], 0.028]]
In the second example, let us color-code the markers by price per square foot. The square footage of the houses is not in the listings table, so we need to parse the individual listings, which calls for more involved web scraping:
url = "real_estate.html"; xml = Import[url,"XMLObject"]; formitems=Cases[xml,XMLElement["span",{"class"->"formitem formfield"},x_]->x,Infinity]; sqfeet=ToExpression[Last[StringSplit[#,"-"]]]&/@((If[#[[3]]=={},"0-0",#[[3,1]]])&/@Extract[formitems,(#+{0,1})&/@Position[formitems,XMLElement["label",{},{"Apx Sqft:"}]]] )
Note several things about this code:
- The specific criteria to supply to Cases were determined by inspecting the page source in the browser; in this case all the information-bearing fields are conveniently <span> tags with the classes formitem formfield. Your particular case will be different; a quick way to survey the candidates is sketched after this list.
- In the last line, Extract essentially retrieves "every element following a label saying Apx Sqft:". Again, your case will be different, and I admit that this is not the only way to get at the right info.
- A local HTML file is used instead of a live URL. This is a trick done to work around the asynchronous deferred loading of listings on the Toronto MLS website; if the live URL were used, the scraper would only retrieve some 25 listings. The HTML file is obtained by loading the MLS link, scrolling all the way down (not too fast, so that all listings have a chance to load), then saving the webpage as a complete package and loading its HTML file into Wolfram Cloud (or creating a text file in the cloud and copy-pasting, or hosting it locally and making it accessible to Wolfram Cloud).
- The last portion, involving Last[StringSplit[...]], is needed to convert approximate designations like "1500-2000" into the number 2000.
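Regarding the first bullet: a quick, hypothetical way to survey which classes carry the data (assuming, as on this page, that class is the only attribute on the tags of interest) is to tally them:
Cases[xml, XMLElement["span", {"class" -> c_}, _] :> c, Infinity] // Tally  (* which span classes occur, and how often *)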
After this, the code is familiar, except that the marker filtering should now also exclude listings without square-footage information:
pricesperfoot = Quiet[prices/sqfeet];  (* Quiet suppresses warnings from listings with zero square footage *)
score = Quiet[(pricesperfoot - 250)/(800 - 250)];  (* rescale $250-$800 per sq ft to [0, 1] *)
scorepins = Map[pin[rcfun[#]] &, score];
smarkertable = MapThread[GeoMarker[#1, #2] &, {locations, scorepins}];
sgoodmarkers = Select[smarkertable,
   (! FailureQ[#[[1]]] && GeoDistance[#[[1]], home][[1]] < 10 &&
     NumberQ[#[[2, 1, 1, 1, -1, -1]]]) &];  (* the NumberQ test digs into the pin graphic and drops markers whose color, and hence score, is undefined *)
gs = GeoGraphics[{"Mississauga", Append[sgoodmarkers, GeoMarker[home, pin[White]]]}];
(* the legend, rebuilt for the $250-$800 per square foot scale *)
ginss = DensityPlot[(x - 250)/(800 - 250), {y, 0, 1}, {x, 250, 800},
   ColorFunction -> rcfun, AspectRatio -> 5, FrameTicks -> {None, Automatic},
   Background -> RGBColor[1, 1, 1, 0.5], FrameStyle -> Directive[Thick], LabelStyle -> Normal];
Show[gs, Epilog -> Inset[ginss, Scaled[{1, 0}], Scaled[{1, 0}], 0.028]]
Here's the final result:
This is only an example: I spent about an hour coding it, another 1-2 hours polishing, and another 1-2 hours writing it up. Your mileage may vary. By the same token, you can easily visualize houses according to any score you compute (such as "price per score, where the score is the number of bedrooms plus half the number of washrooms"); a hypothetical sketch of this follows below. You can also add multidimensional visualization, where a pin's size, border color, shape, or all of these convey different information. You can use geo-location and scrape some other website to score neighborhoods and show the "best bang for the buck" according to that score. You can build a linear-regression machine-learning house-pricing service. If you are dragged into the boredom of house hunting, there are always ways to make some colorful fun out of it :) .
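As a hypothetical illustration of such a custom score (the column numbers and the parsing here are made up; your table will differ):
bedrooms = ToExpression /@ Transpose[mlstable][[5]];  (* hypothetical columns 5 and 6 *)
washrooms = ToExpression /@ Transpose[mlstable][[6]];
customscore = Quiet[prices/(bedrooms + washrooms/2)];  (* the "price per score" from the text *)
custompins = pin[rcfun[#]] & /@ Rescale[customscore];  (* reuse pin and rcfun from above *)
After this, the GeoMarker construction and filtering proceed exactly as before.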