Internet of Things (IoT): another one of these modern-day buzzwords. Everybody wants some IoT. But what is it that "everybody" wants? You’ve probably heard about smart electronics, such as kitchen appliances you can switch on while at work, and smart buildings that regulate temperature according to the number of people inside them.

Let’s use the smart building as an example. A common business domain is smart HVAC solutions (Heating, Ventilation and Air Conditioning).

Wouldn't it be great if you always enjoyed perfect temperature and air quality?

Let’s take our Deep office situated in central Oslo as an example.

The temperature in our office has been suboptimal. By suboptimal we mean that some of us think it’s too warm or too cold, and at times we all think so, in some or all areas of the office. Let's use this example to discuss what it would take to solve that with some ML (of course).

Let's inventory the steps. Machine learning needs data to learn: phase one is to get some data.

  • That means thermometers. Let’s pretend the 'T' in IoT stands for Thermometer; it makes perfect sense.
  • That means connectivity. 'I' stands for Internet, so you can't use simple run-of-the-mill thermometers; they need that connection.
  • That means storage for the data.
  • That means logistics: The data isn't going to put itself there. A process is needed to ask the devices for measurement data and get it to the storage, at regular time intervals for instance.
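As a rough illustration of that collection process, here is a minimal sketch in Python. Everything in it is an assumption for the sake of the example: the sensors are plain callables standing in for whatever API the real devices expose, the storage is a SQLite table named `readings`, and the five-minute interval is arbitrary.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional

# Hypothetical device reader: a callable that returns a temperature in °C.
# A real deployment would talk to the vendor's API or a message broker here.
Reader = Callable[[], float]


def create_storage(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with one table for readings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "sensor_id TEXT NOT NULL, "
        "measured_at REAL NOT NULL, "  # Unix timestamp in seconds
        "temperature_c REAL NOT NULL)"
    )
    return conn


def poll_once(
    conn: sqlite3.Connection,
    sensors: Dict[str, Reader],
    now: Optional[float] = None,
) -> int:
    """Ask every sensor for one reading and store it; return rows written."""
    timestamp = time.time() if now is None else now
    written = 0
    for sensor_id, read in sensors.items():
        try:
            value = read()
        except OSError:
            # An offline device should not stop the whole polling round.
            continue
        conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?)",
            (sensor_id, timestamp, value),
        )
        written += 1
    conn.commit()
    return written


def run(
    conn: sqlite3.Connection,
    sensors: Dict[str, Reader],
    interval_s: float = 300.0,
    rounds: int = 3,
) -> None:
    """Poll all sensors at a fixed interval for a given number of rounds."""
    for _ in range(rounds):
        poll_once(conn, sensors)
        time.sleep(interval_s)
```

Skipping an unreachable device instead of aborting the round is a deliberate choice in this sketch: battery-powered sensors drop out regularly, and a gap in one series is easier to handle later than a gap in all of them.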

Once we have the process in place, we see our data piling up. Then what?

At this point you may start to think, what is the goal of this?

Well, saving costs would be nice, but then there are the upfront costs of hardware and labor hours, plus a large uncertainty about how much you're going to save... If saving costs is all you're after, the solution is pretty trivial: turn off the heaters!

If it's comfort you're optimizing for, then you may end up spending more rather than saving. Moreover, how will the machines in ML even know what is comfortable? Is the definition of comfort just 20°C? Is that really ML? Most of us would expect something fancier, some form of interactivity where people can report how comfortable they are, like an app or webpage. So we just went from 'just optimize the thermostat' to 'we need a data platform and an app and whatnot'. The scope just grew a lot, but say we decide that we want to optimize for comfort first and cost reduction second.

We’ve decided what to optimize for. What’s next?

Given thermometer readings and feedback, you realize that in order to automate temperature regulation, you need your heating system to be operable by the algorithm (phase two). So then you run into other problems like:

  • Can you even program against your heating system?
  • Is it connected to the Internet?

Upgrading radiator valves, for instance, to meet those requirements is no trivial undertaking. You will probably need a plumber to shut down the building's heating system, drain it and install the new valves ($$$). The new IoT devices are often battery powered, so you incur maintenance costs by having to replace the batteries; battery levels should probably be monitored on a dashboard as well.

Have we even mentioned integration with the energy meter system yet? Are the valves manually adjustable? Should they even be? Does the definition of comfort include no frustration due to technology being in your way?

Well, this is getting out of hand. The smaller project would be just phase one: gather the data, and implement the automated regulation only at a later stage. That approach has a couple of upsides.

From a data science perspective it would be very nice to have data representing the baseline: what the temperatures, costs and comfort levels are before the algorithms take control.

It manages risk in the sense that upon reevaluation you might conclude the next phase isn't necessary. The goals can always be refined. Just by making this information available, our office discussions about the temperature now have additional facts. We can say with certainty that the kitchen area is chillier than the corner meeting room.
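To make the baseline idea concrete, here is a minimal sketch that averages stored readings per room. The function name `baseline_by_room` and the sample values are made up for illustration; they are not measurements from our office.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def baseline_by_room(readings: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average temperature (°C) per room from (room, temperature) pairs."""
    by_room: Dict[str, List[float]] = defaultdict(list)
    for room, temperature_c in readings:
        by_room[room].append(temperature_c)
    return {room: round(mean(temps), 2) for room, temps in by_room.items()}


# Made-up sample readings, only to show the shape of the input.
sample = [
    ("kitchen", 19.0),
    ("kitchen", 19.4),
    ("corner meeting room", 22.1),
    ("corner meeting room", 21.9),
]

baseline = baseline_by_room(sample)
for room, avg in sorted(baseline.items(), key=lambda item: item[1]):
    print(f"{room}: {avg} °C")
```

Recording the same per-room averages again after any automated regulation goes live gives a like-for-like comparison against this baseline.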

But since we are a team of data scientists and engineers, we're going to build the best, the most intelligent, the grandiosest heating system ever!

Even though the project is getting out of hand, the scope is still relatively restricted. Since ML really shines in the face of lots of (relevant) data, you could argue that more sources of relevant data are better. And more fun. It's easy to come up with a few sources:

  • Meteorological data, both forecasts and current conditions. Local light sensors to detect cloud cover or drawn window shades.
  • Positional data (of the thermometers and radiators, maybe even entire floor plans).
  • Building occupancy. In a crowded office, heat generated by people could be significant. Even if you could track occupancy, doing so comes dangerously close to GDPR territory. A few options are:
    • electronic key usage
    • count of devices connected to Wi-Fi
    • count of available parking spots
    • online calendar event details
    • (security) cameras
    • cafeteria sales
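To get a feel for how much heat the occupants might contribute, here is a back-of-the-envelope sketch. Both constants are assumptions: roughly 100 W per seated adult is a common rule of thumb, and two Wi-Fi devices per person is a guess that would need checking against a real headcount.

```python
WATTS_PER_PERSON = 100.0  # assumption: rough heat output of a seated adult
DEVICES_PER_PERSON = 2.0  # assumption: e.g. one laptop and one phone each


def estimate_occupancy(
    wifi_devices: int, devices_per_person: float = DEVICES_PER_PERSON
) -> float:
    """Estimate the number of people present from connected Wi-Fi devices."""
    if wifi_devices < 0 or devices_per_person <= 0:
        raise ValueError("counts must be non-negative and the ratio positive")
    return wifi_devices / devices_per_person


def occupant_heat_kw(
    people: float, watts_per_person: float = WATTS_PER_PERSON
) -> float:
    """Approximate heat given off by the occupants, in kilowatts."""
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * watts_per_person / 1000.0


people = estimate_occupancy(wifi_devices=80)
print(f"~{people:.0f} people, ~{occupant_heat_kw(people):.1f} kW of body heat")
```

Under these assumptions, 80 connected devices suggest about 40 people and about 4 kW of heat, which is in the range of a few electric space heaters.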

Obviously this is over the top: incorporating all of these is going to be nearly impossible, you shouldn't want to, and it (probably) won't make the ML any more effective. It's an engineer's dream, but an investor's nightmare.


And oh yeah, we haven't even gotten around to any actual ML yet. As for a conclusion: "How hard can it be...?" Famous last words.