How to plan data collection, storage and visualization in an IIoT deployment

Be sure to consider scalability and future-proofing to accommodate evolving manufacturing processes and technologies.

When it comes to an IIoT (Industrial Internet of Things) implementation in manufacturing, data collection, storage, analytics and visualization are the core backplane that drives actionable insights and enables smarter operations.

How do these components typically align in an IIoT system, and what considerations should a manufacturing engineer keep in mind when planning an implementation? It can certainly get complicated, but breaking things down into smaller parts makes it more manageable.

Data Collection

The effectiveness of data collection largely depends on sensor architecture. Depending on the equipment or process, various types of sensors (temperature, pressure, vibration, etc.) need to be deployed across critical points in the manufacturing process. Ensure sensors are selected with appropriate accuracy, environmental tolerance and response time for the specific application.

A Data Acquisition System (DAS) acts as the interface between these sensors and the IIoT platform, gathering real-time data from sensors and transmitting it to the edge or cloud infrastructure. The big decision here is whether to use edge processing (local data pre-processing) or rely on centralized data gathering at the cloud level. Edge processing offers lower latency, making it ideal for real-time tasks, and reduces bandwidth needs by processing data locally. However, it requires more upfront investment in hardware and can be harder to scale. In contrast, cloud processing handles large data volumes more easily and scales better, though it comes with higher latency and ongoing costs for bandwidth and storage. Cloud systems also need robust security measures for data transmission. A hybrid approach that combines edge and cloud processing can balance real-time processing with scalable, centralized data management, but the right mix depends on each application and the desired outcomes.
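
To make the hybrid idea concrete, here is a minimal sketch of edge pre-processing: raw samples stay local, and only a windowed summary is forwarded upstream. The read_sensor and publish_to_cloud functions are hypothetical stand-ins for a real sensor driver and cloud client (e.g., an MQTT publisher), and the window and sampling rates are illustrative.

```python
import random
import time
from statistics import mean

def read_sensor() -> float:
    # Hypothetical stand-in for a real sensor driver.
    return 20.0 + random.random()

def publish_to_cloud(payload: dict) -> None:
    # Hypothetical stand-in for a real uplink (e.g., an MQTT publish).
    print(payload)

WINDOW_SECONDS = 10   # summarize locally, send one message per window
SAMPLE_PERIOD = 0.1   # 10 Hz local sampling

def edge_loop() -> None:
    buffer: list[float] = []
    window_start = time.monotonic()
    while True:
        buffer.append(read_sensor())
        if time.monotonic() - window_start >= WINDOW_SECONDS:
            # Only the summary crosses the network; raw samples stay at the edge.
            publish_to_cloud({
                "mean": mean(buffer),
                "min": min(buffer),
                "max": max(buffer),
                "samples": len(buffer),
            })
            buffer.clear()
            window_start = time.monotonic()
        time.sleep(SAMPLE_PERIOD)
```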

The next big decision is determining the optimal sampling rate. Too high a sampling frequency can overwhelm your storage and bandwidth, while too low a rate may miss critical insights, particularly in dynamic manufacturing processes. Work with process engineers to set the sampling frequency based on process variability, and revisit it regularly to make sure the rate you settled on isn't discarding data with real diagnostic value.
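
A rough starting point is the Nyquist criterion: sample at least twice as fast as the fastest process dynamics you care about, then add a practical margin. The figures below are illustrative, not prescriptions:

```python
# Nyquist rule of thumb: sample at >= 2x the fastest dynamics of interest.
# All numbers here are illustrative.
fastest_dynamics_hz = 5.0                     # e.g., a vibration component near 5 Hz
nyquist_floor_hz = 2 * fastest_dynamics_hz    # 10 Hz theoretical minimum
practical_margin = 2.5                        # headroom for noise and jitter

recommended_hz = nyquist_floor_hz * practical_margin
print(f"Nyquist floor: {nyquist_floor_hz} Hz, recommended: {recommended_hz} Hz")
```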

If you are going to base major decisions on the insights gained through this IIoT system, you must ensure the integrity of the collected data. This means putting error checking (e.g., checksums or hashing) and redundancy mechanisms (e.g., backup data paths or local buffering) in place to handle network failures or sensor malfunctions.

A checksum is a small-sized piece of data derived from a larger set of data, typically used to verify the integrity of that data. It acts as a digital fingerprint, created by applying a mathematical algorithm to the original data. When the data is transmitted or stored, the checksum is recalculated at the destination and compared with the original checksum to ensure that the data has not been altered, corrupted or tampered with during transmission or storage.

Hashing is the process of converting input data into a fixed-size string of characters (a hash) using a mathematical algorithm. Hashes are used for verifying data integrity, securing communication and enabling fast data retrieval, with distinct inputs producing, for all practical purposes, distinct hashes.
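
Both mechanisms are available in any mainstream language. A minimal Python sketch using only the standard library, with CRC32 as a lightweight checksum and SHA-256 as a cryptographic hash:

```python
import hashlib
import zlib

payload = b'{"sensor": "temp-01", "value": 72.4}'

# CRC32 checksum: cheap to compute; catches accidental corruption in transit.
checksum = zlib.crc32(payload)

# SHA-256 hash: a cryptographic fingerprint; also detects deliberate tampering.
digest = hashlib.sha256(payload).hexdigest()

def verify(data: bytes, expected_crc: int, expected_sha: str) -> bool:
    # At the destination, recompute both values and compare.
    return (zlib.crc32(data) == expected_crc
            and hashlib.sha256(data).hexdigest() == expected_sha)

assert verify(payload, checksum, digest)             # intact data passes
assert not verify(payload + b"x", checksum, digest)  # altered data fails
```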

When planning sensor deployment, focus on critical assets and key process variables that directly impact production efficiency, quality or safety. Implementing a hierarchical sensor strategy (high-priority sensors collecting frequent data, lower-priority ones providing long-term insights) can help balance costs and data richness.

Data Storage

Here again you face a decision between local (edge) storage and a centralized cloud environment. The same pros and cons apply as in data acquisition, but your needs may be different.

Edge storage is useful for real-time, low-latency processing, especially in critical operations where immediate decision-making is necessary. It also reduces the amount of data that needs to be transmitted to the cloud.

Cloud storage is scalable and ideal for long-term storage, cross-site access and aggregation of data from multiple locations. However, the bandwidth required for real-time data streaming to the cloud can be costly, especially in large-scale manufacturing operations.

Manufacturing environments typically generate large volumes of data due to high-frequency sensors. Plan for data compression and aggregation techniques at the edge to minimize storage overhead.

Lossless compression reduces data size without any loss of information, ideal for critical data. Popular algorithms include GZIP, effective for text data; LZ4, which is fast and low-latency for real-time systems; and Zstandard (Zstd), which offers high compression ratios with quick decompression for IIoT.
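
Of those, GZIP lives in most standard libraries, while LZ4 and Zstandard typically require third-party packages (such as the lz4 and zstandard Python packages). A minimal sketch of the lossless round trip using Python's built-in gzip, on simulated sensor records:

```python
import gzip
import json
import random

# Simulated high-frequency sensor log; real logs are often more repetitive
# and compress even better.
readings = [{"t": i, "temp": round(20 + random.random(), 2)} for i in range(10_000)]
raw = json.dumps(readings).encode()

compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes "
      f"({len(compressed) / len(raw):.0%} of original)")

# Lossless: the decompressed bytes are bit-for-bit identical to the input.
assert gzip.decompress(compressed) == raw
```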

Lossy compression, on the other hand, is suitable for sensor data where some precision loss is acceptable in exchange for better compression. Wavelet compression is efficient for time-series data, and JPEG/MJPEG is often used for images or video streams, reducing size while maintaining most visual information.

Data aggregation techniques help reduce data volume by combining or filtering information before transmission. Summarization involves averaging or finding min/max values over a time period. Sliding window aggregation and time bucketing group data into time intervals, reducing granularity. Event-driven aggregation sends data only when conditions are met, while threshold-based sampling and change-detection algorithms send data only when significant changes occur. Edge-based filtering and preprocessing ensure only relevant data is transmitted, and spatial and temporal aggregation combines data from multiple sources to reduce payload size.
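
Change detection is the easiest of these to illustrate. The sketch below forwards a reading only when it moves more than a dead-band from the last transmitted value; the 0.5-unit dead-band is illustrative and would be tuned to process tolerance:

```python
# Minimal change-detection filter: forward a reading only when it differs
# from the last transmitted value by more than a dead-band.
def make_change_detector(deadband: float):
    last_sent = None
    def should_send(value: float) -> bool:
        nonlocal last_sent
        if last_sent is None or abs(value - last_sent) >= deadband:
            last_sent = value
            return True
        return False
    return should_send

should_send = make_change_detector(deadband=0.5)
stream = [20.0, 20.1, 20.2, 21.0, 21.1, 19.9, 19.9]
sent = [v for v in stream if should_send(v)]
print(sent)  # [20.0, 21.0, 19.9] -- only significant changes go out
```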

Because edge devices often operate in resource-constrained environments, deal with real-time data and must efficiently manage the communication between local systems and central servers, there are several edge-specific considerations for optimizing data management in IIoT systems. For real-time applications, techniques like streaming compression (e.g., LZ4) and windowed aggregation help minimize latency by processing data locally. Delta encoding reduces data size by only transmitting changes from previous values, minimizing redundancy. Additionally, hierarchical aggregation allows data to be aggregated at intermediate nodes, such as gateways, before being sent to the central system, further reducing the transmission load and improving overall efficiency in multi-layered edge networks. These considerations are uniquely suited to edge computing because edge devices need to be efficient, autonomous, and responsive without relying heavily on central systems or expensive bandwidth.
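
Delta encoding in particular is only a few lines. Working on scaled integer readings (e.g., tenths of a degree) keeps the round trip exact:

```python
# Delta encoding: send the first value, then only the differences.
# Slowly changing series yield mostly small deltas, which shrink payloads
# and compress well downstream.
def delta_encode(values: list[int]) -> list[int]:
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

series = [1000, 1002, 1001, 1004, 1004]   # e.g., readings in 0.1 °C units
deltas = delta_encode(series)             # [1000, 2, -1, 3, 0]
assert delta_decode(deltas) == series     # exact round trip
```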

You’ll also need a storage architecture that can scale to accommodate both current and future data growth, plus a robust redundancy and backup strategy. With critical manufacturing data, losing information to hardware failure or network issues can be costly. Redundant storage, preferably across different geographic locations (for disaster recovery), is crucial for resilience.

TIP: For time-sensitive data (e.g., real-time process control), store at the edge and use data batching for non-urgent data that can be transmitted to the cloud periodically, reducing latency and network costs.
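
A minimal store-and-forward sketch of that pattern, where upload is a hypothetical stand-in for your real cloud client and the flush interval is illustrative:

```python
import time

def upload(batch: list[dict]) -> None:
    # Hypothetical stand-in for a real cloud client.
    print(f"uploading {len(batch)} records")

class Batcher:
    """Send urgent records immediately; buffer the rest and flush on a timer."""

    def __init__(self, flush_interval_s: float = 60.0):
        self.buffer: list[dict] = []
        self.flush_interval_s = flush_interval_s
        self.last_flush = time.monotonic()

    def add(self, record: dict, urgent: bool = False) -> None:
        if urgent:
            upload([record])           # time-sensitive: send now
            return
        self.buffer.append(record)     # non-urgent: hold for the next flush
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            upload(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()
```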

Analytics

Real-time analytics is essential for immediate decision-making (shutting down a faulty machine or adjusting a process parameter), while historical analytics provides long-term insights into trends and performance (predictive maintenance, yield optimization).

To enable real-time analytics, data should undergo initial pre-processing and filtering at the edge, so that only relevant insights or alerts are passed to the cloud or central system. This reduces data transfer overhead and minimizes latency in decision-making. For long-term analysis (identifying trends, root cause analysis), use batch processing techniques to handle large datasets over time. Machine learning (ML) and AI models are increasingly integrated into IIoT systems to identify anomalies, predict failures or optimize operations based on historical data.
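
As a toy example of edge-side filtering, the sketch below flags a reading that sits more than three standard deviations from a rolling baseline, so only the resulting alerts need to travel upstream. A production system would use a tuned or learned model, but the shape of the logic is similar:

```python
from collections import deque
from statistics import mean, stdev

def make_anomaly_detector(window: int = 50, threshold: float = 3.0):
    history: deque[float] = deque(maxlen=window)
    def is_anomalous(value: float) -> bool:
        anomalous = False
        if len(history) >= 10:  # need some baseline before judging
            mu, sigma = mean(history), stdev(history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > threshold
        history.append(value)   # toy model: every reading joins the baseline
        return anomalous
    return is_anomalous
```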

IIoT analytics is more than just looking at individual sensor data; it’s about correlating data across multiple devices, sensors and even different factory lines to uncover patterns. Implement data fusion techniques where data from different sensors or sources can be combined to improve the accuracy and richness of insights.
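
A minimal data-fusion example: combining two redundant temperature sensors with an inverse-variance weighted average, so the noisier sensor counts for less. The variances are illustrative calibration figures:

```python
def fuse(readings: list[tuple[float, float]]) -> float:
    """Each tuple is (value, variance). Returns the fused estimate."""
    weights = [1.0 / var for _, var in readings]
    return sum(w * v for (v, _), w in zip(readings, weights)) / sum(weights)

# Sensor A: 72.1 °C with variance 0.04; sensor B: 72.9 °C with variance 0.25.
print(fuse([(72.1, 0.04), (72.9, 0.25)]))  # ~72.21, closer to the precise sensor
```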

Visualization

Visualization tools are essential for both operators and decision-makers to quickly assess the performance of processes and machines. These should include customizable dashboards that display real-time key performance indicators (KPIs) such as throughput, efficiency, downtime and machine health. KPIs should be linked to the specific objectives of the manufacturing process.
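
As a sketch of how such KPIs might be rolled up from machine-state records behind a dashboard, with illustrative field names and figures:

```python
# Illustrative shift log: each record is a machine state with its duration
# and the parts produced while in that state.
records = [
    {"state": "running", "minutes": 420, "parts": 1680},
    {"state": "down",    "minutes": 35,  "parts": 0},
    {"state": "running", "minutes": 25,  "parts": 95},
]

total = sum(r["minutes"] for r in records)
uptime = sum(r["minutes"] for r in records if r["state"] == "running")
parts = sum(r["parts"] for r in records)

print(f"availability: {uptime / total:.1%}")        # 92.7%
print(f"throughput: {parts / (total / 60):.1f} parts/hour")  # 221.9
```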

For process optimization and long-term planning, historical trends and patterns should be visualized clearly. This allows for root-cause analysis, identifying inefficiencies and making data-driven decisions about process improvements.

These visualizations should be tailored to different user roles. Operators need real-time alerts and immediate insights into machine performance, while managers or engineers might need access to historical data and trend analysis. Design the user interface (UI) and access controls with these distinctions in mind.

For advanced implementations, digital twins and augmented reality can be used to simulate and visualize complex data in 3D. Digital twins create a virtual replica of the manufacturing environment, allowing engineers to monitor and optimize operations without needing to be physically present.

Planning IIoT implementations

When planning IIoT in manufacturing, focus on building a scalable, resilient and secure architecture for data collection, storage, analytics and visualization. Ensure that data collection is optimized to balance cost and data richness, using both edge and cloud storage appropriately. Analytics capabilities should provide real-time decision support while enabling deep insights through predictive maintenance and long-term performance analysis. Visualization tools should cater to different user needs, ensuring clear, actionable insights through both real-time dashboards and historical data views. Keep in mind the challenges of data volume, latency, network bandwidth and data integrity as you design the IIoT system, with attention to scalability and future-proofing the infrastructure to accommodate evolving manufacturing processes and technologies.