The IoT will give rise to the IoD – the Internet of (very Big) Data. The business opportunity is big too, but what about the costs?
How Big is Big Data?
Let’s start by discussing how big Big Data really is. For example, Walmart customer transactions generate about 1 million data points per hour. Big, isn’t it?
Well, a single energy meter generates 1 million data points per day. Walmart’s 1 million data points per hour comes to 24 million per day, so about two dozen energy meters would generate as much data as 20,000 Walmart stores.
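The comparison above can be checked with a few lines of arithmetic. This is only a sketch that uses the two approximate figures quoted in the text; the variable names are made up for illustration. The exact ratio comes out at 24 meters, in line with the rough figure given above.

```python
# Approximate figures quoted in the text.
WALMART_POINTS_PER_HOUR = 1_000_000
METER_POINTS_PER_DAY = 1_000_000

# Convert Walmart's hourly rate to a daily one so both figures share a unit.
walmart_points_per_day = WALMART_POINTS_PER_HOUR * 24

# How many meters produce as many data points per day as Walmart does?
meters_to_match_walmart = walmart_points_per_day / METER_POINTS_PER_DAY

# What the meter's daily volume means as a sampling rate.
meter_points_per_second = METER_POINTS_PER_DAY / (24 * 60 * 60)

print(f"Walmart: {walmart_points_per_day:,} data points per day")
print(f"Meters needed to match Walmart: {meters_to_match_walmart:.0f}")
print(f"One meter: about {meter_points_per_second:.1f} data points per second")
```

Seen this way, 1 million data points per day is simply a meter reporting roughly a dozen readings every second.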
Big Data = Big Money?
If this sounds scary, it’s because you are thinking that big data equals big data storage – and consequently big money. And you’d be right, since this is what we have always done with data generated by human activities: store the data first, use it later. But with the data volumes generated by things, we need to rethink this approach.
Why do we need Big Data?
Let’s start by reflecting upon the reasons why we might need big data generated by things. I believe that we can trim them down to 2 main categories:
To discover something that we do not expect
To respond to something that we do expect
We will discuss how we can achieve these goals effectively, without having to invest in huge data storage infrastructures.
Let’s start with Discovery. This is mostly about finding unexpected patterns – and data aggregation makes patterns easier to visualise. Data can be aggregated in memory, so that only the aggregated data is propagated to data storage. This approach cuts the number of stored data records by 6 orders of magnitude: taking the energy meter as an example, rather than storing 1 million data points per day, this means storing just one single data record per day.
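To make the idea concrete, here is a minimal sketch of in-memory aggregation in Python. It is not taken from any particular product: the names `DailyRecord`, `aggregate_by_day` and `simulated_meter` are hypothetical, the readings are assumed to arrive in time order, and the choice of statistics to keep (count, sum, minimum, maximum) is an assumption made for illustration. Only the records yielded by `aggregate_by_day` would ever reach storage.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class DailyRecord:
    """The single aggregated record kept for one day of readings."""
    day: date
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate_by_day(
    readings: Iterable[Tuple[datetime, float]]
) -> Iterator[DailyRecord]:
    """Yield one DailyRecord per day from a time-ordered stream of readings."""
    current: Optional[DailyRecord] = None
    for timestamp, value in readings:
        day = timestamp.date()
        if current is not None and day != current.day:
            # The previous day is complete: emit its one record for storage.
            yield current
            current = None
        if current is None:
            current = DailyRecord(day, 1, value, value, value)
        else:
            # Update the running aggregate; the raw reading is then discarded.
            current.count += 1
            current.total += value
            current.minimum = min(current.minimum, value)
            current.maximum = max(current.maximum, value)
    if current is not None:
        yield current  # emit the last (possibly partial) day


def simulated_meter(
    start: datetime, points: int, step: timedelta
) -> Iterator[Tuple[datetime, float]]:
    """Hypothetical stand-in for a real meter: a repeating saw-tooth signal."""
    for i in range(points):
        yield start + i * step, 100.0 + (i % 10)


# Two days of one-minute readings (2,880 data points) become two records.
stream = simulated_meter(datetime(2024, 1, 1), points=2880, step=timedelta(minutes=1))
records = list(aggregate_by_day(stream))
for record in records:
    print(record.day, record.count, record.mean, record.minimum, record.maximum)
```

Because the function is a generator, memory use stays constant however many readings flow through it: at any moment it holds one running aggregate, never the raw data points.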
In-Memory Data Stream Aggregation
The following diagram shows an example of pattern discovery enabled by in-memory data stream aggregation.
A printer’s usage and its energy consumption have been aggregated by hour over a 12-month period. We can see that energy consumption increases only marginally when the printer is in use, and that it shows a steady profile in standby. We can also see that the printer only enters actual standby about 3 hours after it was last used.