Internet of Things Analytics Landscape


The Internet of Things (IoT) is an emerging computing paradigm that leverages data and services from the proliferating number of internet-connected devices in order to deliver novel, disruptive services in areas such as manufacturing, smart cities, healthcare and fitness. The vast majority of IoT services involve the analysis of IoT data, which is conveniently called IoT Analytics. IoT analytics enables services such as energy optimization in buildings or entire neighborhoods, predictive maintenance in factories and lifestyle tracking as part of healthcare or fitness applications. The scope of IoT analytics applications is not limited to services that produce reports, dashboards and data-driven recommendations. It also extends to applications involving real-time actuation and control (e.g., driving robots in factories), since actuating functions are in most cases data-driven.

Even though early IoT applications include IoT analytics components, the potential of IoT analytics is far from being realized. According to a recent report by the McKinsey Global Institute, less than 1% of IoT data is used in the scope of current applications, which are more focused on control and alarms than on optimization and maintenance. The report therefore concludes that IoT Analytics will be one of the main enablers of IoT’s multi-billion dollar potential in the coming years. The importance of IoT analytics will also be reinforced by the exponential increase in the amount of data produced.

IoT Data as BigData

IoT data are essentially BigData, given that they are characterized by the famous Vs (Volume, Velocity, Variety, Veracity). For example, large-scale IoT applications will leverage extreme volumes of data that stem from thousands or even millions of internet-connected devices. These devices will in principle be heterogeneous in terms of their types and will produce data in various formats and with a great variety of semantics, giving rise to the need for dealing with variety. Moreover, most IoT data stem from noisy sensors or even sensor processing components, which justifies the need to handle veracity. Finally, IoT data are typically characterized by very high ingestion rates, hence the need to cope with streaming data and high velocity.

Despite their close affiliation with BigData, IoT data have their own peculiar characteristics, which differentiate them from prominent classes of BigData such as conventional, transactional, batch datasets. Specifically, IoT data are usually streaming data and require handling by data streaming engines rather than by tools for batch processing (such as MapReduce). Several IoT applications are real-time or nearly real-time and cannot afford the latency of conventional batch processing. Likewise, IoT data are multi-modal and heterogeneous, which calls for significant effort in handling their semantics.

IoT data are also time and location dependent, which are properties that data analysts and data scientists have to consider in the scope of IoT Analytics applications. Furthermore, their inherent noise and uncertainty can be significantly more severe in cases of crowd-sourced data. Overall, BigData technologies are highly relevant to IoT Analytics, especially when customized to the needs of IoT applications and the nature of IoT datasets.
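As an illustration of the veracity concern, a rolling median is one simple way to suppress isolated spikes from a noisy sensor. This is only a sketch: the function name `rolling_median` and the window size of 3 are assumptions made for this example and do not come from any particular IoT toolkit.

```python
from collections import deque
from statistics import median
from typing import Iterable, Iterator


def rolling_median(values: Iterable[float], size: int = 3) -> Iterator[float]:
    """Yield the median of the most recent `size` readings (hypothetical helper)."""
    window = deque(maxlen=size)  # oldest reading is dropped automatically
    for value in values:
        window.append(value)
        yield median(window)


if __name__ == "__main__":
    noisy = [10, 11, 95, 12, 13]  # 95 is an isolated spike
    print(list(rolling_median(noisy)))
```

With the sample input, the spike of 95 never appears in the output, since the median of each three-reading window ignores a single outlier.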

IoT Analytics Lifecycle

A typical IoT Analytics pipeline involves the following stages:

01 Data Collection: Data collection tasks access data from the various IoT data sources, while at the same time validating them for consistency, accuracy and integrity. This stage involves interfacing to heterogeneous internet-connected devices for the purpose of accessing and validating their data streams. It may also entail access to other data sources (e.g., open data sets), which enrich the IoT data streams in line with the application context.

02 Data Unification and Consolidation: At this stage the various data streams are unified and consolidated in terms of their formats and/or semantics. This enables the unified processing and fusion of diverse data streams, as well as their consolidation in appropriate storage or in-memory data structures.

03 Data Analytics: Similar to Big Data analytics, IoT data are modeled and processed in order to extract knowledge for the business problem at hand. This is typically done by applying data mining and machine learning techniques over IoT streams.

04 Deployment and Operationalization: This is the last stage of the pipeline and involves the use of the extracted knowledge by business applications (e.g., web or mobile applications).

Even though several variations of this baseline pipeline can be implemented, the listed functions are part of the majority of non-trivial IoT analytics deployments.
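The four stages above can be sketched as a chain of small Python functions. This is a minimal illustration under stated assumptions, not a reference design: the record fields (`id`, `ts`, `temp_c`, `temp_f`), the `Reading` class and all function names are hypothetical, and the "analytics" step is reduced to a per-device mean.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, Iterator, List


@dataclass
class Reading:
    """Unified schema assumed for this sketch."""
    device_id: str
    timestamp: float
    celsius: float


def collect(raw_records: Iterable[dict]) -> Iterator[dict]:
    """Stage 01: keep only records that pass a basic integrity check."""
    for rec in raw_records:
        if "id" in rec and "ts" in rec and ("temp_c" in rec or "temp_f" in rec):
            yield rec


def unify(records: Iterable[dict]) -> Iterator[Reading]:
    """Stage 02: map heterogeneous formats onto one schema (Celsius)."""
    for rec in records:
        if "temp_c" in rec:
            celsius = float(rec["temp_c"])
        else:
            celsius = (float(rec["temp_f"]) - 32.0) * 5.0 / 9.0
        yield Reading(str(rec["id"]), float(rec["ts"]), celsius)


def analyze(readings: Iterable[Reading]) -> Dict[str, float]:
    """Stage 03: extract knowledge, here simply the mean per device."""
    by_device: Dict[str, List[float]] = {}
    for reading in readings:
        by_device.setdefault(reading.device_id, []).append(reading.celsius)
    return {dev: mean(vals) for dev, vals in by_device.items()}


def operationalize(averages: Dict[str, float], limit_c: float) -> List[str]:
    """Stage 04: hand results to an application, here a list of hot devices."""
    return sorted(dev for dev, avg in averages.items() if avg > limit_c)


def run_pipeline(raw_records: Iterable[dict], limit_c: float = 30.0) -> List[str]:
    return operationalize(analyze(unify(collect(raw_records))), limit_c)


if __name__ == "__main__":
    raw = [
        {"id": "a", "ts": 1, "temp_c": 20},
        {"id": "b", "ts": 1, "temp_f": 212},  # Fahrenheit device
        {"ts": 2, "temp_c": 50},              # missing id, rejected in stage 01
    ]
    print(run_pipeline(raw))
```

Because each stage consumes and produces an iterable, the first two stages process records one at a time, which is closer to how a streaming deployment behaves than loading a full batch first.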

IoT Analytics Architectures

From an architecture viewpoint, IoT Analytics systems can be deployed in a variety of configurations, much in the same way that Big Data systems comprise multiple elements in various configurations. However, the following building blocks are essential:

Cloud Infrastructure: IoT analytics systems are commonly deployed in public or private clouds, in order to benefit from the capacity, scalability and pay-as-you-go nature of cloud computing.

Big Data Infrastructure: Due to the large volume of IoT data, IoT Analytics architectures rely on Big Data infrastructures such as those powered by the Hadoop ecosystem. Nevertheless, due to the streaming nature of IoT data, streaming engines are preferred over batch processing tools.

Data Warehouse: Similar to Big Data analytics systems, IoT Analytics systems are likely to deploy a data warehouse in conjunction with Big Data databases. This is typically the case for data with high business value that should be accessed in a more structured way than the rest of the IoT data.

IoT Analytics & Edge Analytics

Apart from leveraging cloud computing resources, IoT Analytics systems are very commonly deployed based on an edge computing paradigm. Edge computing architectures enable data storage and filtering at the very edge of the network, prior to transferring data to the cloud. This can result in considerable energy and bandwidth savings, while at the same time moving time-critical operations closer to the physical world in order to reduce latency. Moreover, by isolating private data at the edge servers (rather than transferring them to the cloud), edge computing applications can facilitate the implementation of privacy and data protection policies, which are indispensable in the scope of IoT applications in healthcare, fitness, urban mobility and other areas.
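One common form of edge filtering is a deadband filter, which forwards a reading to the cloud only when it has changed noticeably since the last forwarded value. The sketch below is illustrative: the name `deadband_filter` and the threshold of 0.5 are assumptions chosen for this example, and a real deployment would tune the threshold per sensor.

```python
from typing import Iterable, Iterator, Optional


def deadband_filter(values: Iterable[float], threshold: float) -> Iterator[float]:
    """Forward a value only if it differs from the last forwarded one by more than `threshold`."""
    last_sent: Optional[float] = None
    for value in values:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield value  # in a real system this would be sent upstream


if __name__ == "__main__":
    readings = [20.0, 20.1, 20.2, 21.0, 21.05, 19.0]
    print(list(deadband_filter(readings, threshold=0.5)))
```

In this sample only three of the six readings leave the edge node, which is the kind of bandwidth saving the paragraph above refers to.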

Data Streaming Engines

As already outlined, IoT data mining involves the processing and analysis of streams with high ingestion rates. Hence, the deployment of streaming engines and low-latency middleware for stream processing and analysis is the norm in IoT Analytics applications.
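A core operation of such engines is windowed aggregation over an unbounded stream. The following plain-Python sketch shows a tumbling (non-overlapping) window mean without using any engine's API. It assumes events arrive as `(timestamp, value)` pairs in timestamp order; the function name `tumbling_means` is hypothetical, and production engines additionally handle late and out-of-order events, which this sketch does not.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_means(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (window_start, mean) each time a window closes.

    Assumes events are ordered by timestamp.
    """
    current: Optional[int] = None  # index of the window being filled
    total = 0.0
    count = 0
    for ts, value in events:
        idx = int(ts // window_seconds)
        if current is not None and idx != current:
            # A later window has started, so emit the finished one.
            yield current * window_seconds, total / count
            total, count = 0.0, 0
        current = idx
        total += value
        count += 1
    if count:
        # Flush the last, possibly partial, window at end of stream.
        yield current * window_seconds, total / count


if __name__ == "__main__":
    stream = [(0, 1.0), (5, 3.0), (10, 10.0), (12, 20.0), (31, 4.0)]
    for window_start, avg in tumbling_means(stream, window_seconds=10):
        print(window_start, avg)
```

Results are emitted as soon as a window closes, so memory use stays constant regardless of how long the stream runs.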

The open source community has produced several distributed real-time computation systems, which are suitable for handling IoT streams. Prominent examples include Apache Storm, Apache Spark and Apache Flink, which are commonly deployed in the scope of IoT Analytics applications. Some of these systems are compatible with Hadoop’s HDFS (Hadoop Distributed File System) and can therefore be used as part of Big Data deployments.

Apart from these open source engines, there is a host of similar distributed processing systems (including enhanced versions of these open source systems) that are part of Big Data suites offered by most major Big Data vendors.

The above technologies represent a small, yet representative part of the IoT Analytics technological landscape. In the coming years we will see a proliferation and rapid expansion of IoT Analytics technologies, which will unlock IoT’s tremendous business potential based on the Industry 4.0 paradigm.