In today's society, valuable data is both a treasure and a source of productivity, and one of the values of the Internet of Things (IoT) is that it can collect useful data and make it available to us. In this article, we first look at how IoT data is classified and then analyze its characteristics.
- IoT Data Classification
- Static data and dynamic data
Based on whether the data changes over time, IoT data can be divided into static data and dynamic data.
Static data consists mostly of label and address data; for example, most of the data generated by RFID is static. Static data is generally stored in structured, relational databases. Dynamic data is time-series data: each data point corresponds to a point in time. Dynamic data is usually stored in a time-series database.
Generally speaking, static data grows with the number of sensors and control devices, whereas dynamic data grows not only with the number of devices but also with the passage of time.
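The difference in growth can be made concrete with a small sketch. It assumes one static descriptor per device (modeled here as a hypothetical `DeviceInfo` record) and one time-stamped `Reading` per sampling interval; the field names, the 15-minute interval, and the constant reading value are illustrative choices, not part of any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DeviceInfo:
    """Static data: describes a device (e.g. its RFID tag and address)."""
    device_id: str
    tag: str
    address: str


@dataclass(frozen=True)
class Reading:
    """Dynamic data: one time-stamped measurement from a device."""
    device_id: str
    ts: datetime
    value: float


def simulate(num_devices: int, days: int, interval_minutes: int = 15):
    """Build hypothetical static and dynamic records for a fleet of devices."""
    start = datetime(2024, 1, 1)
    # One static record per device, no matter how long the system runs.
    static = [DeviceInfo(f"dev-{i}", f"tag-{i}", f"site-{i % 3}")
              for i in range(num_devices)]
    samples = days * 24 * 60 // interval_minutes
    # One dynamic record per device per sampling interval.
    dynamic = [Reading(d.device_id, start + timedelta(minutes=k * interval_minutes), 0.0)
               for d in static for k in range(samples)]
    return static, dynamic


static, dynamic = simulate(num_devices=10, days=2)
print(len(static), len(dynamic))  # 10 1920
```

Doubling `days` doubles the number of dynamic records while the static count stays at 10; doubling `num_devices` doubles both.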
- Energy, asset attribute, diagnostic and signal data
According to the original characteristics of the data, IoT data can be divided into four types: energy data, asset attribute data, diagnostic data, and signal data.
Energy data refers to data related to energy consumption, or the data required to calculate it, such as current, voltage, power factor, frequency, and harmonics. Energy data is also one of the most critical data types for the IoT, because one of the ultimate purposes of the IoT is to save energy.
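As an illustration of how energy consumption is derived from such measurements, the sketch below assumes a single-phase AC circuit with RMS voltage and current, where the power factor PF is the ratio of active to apparent power, so active power is P = V × I × PF. Energy is approximated by treating each sample's power as constant over its sampling interval. Three-phase systems need a different formula and are deliberately left out; the meter values are hypothetical.

```python
def active_power_w(voltage_v: float, current_a: float, power_factor: float) -> float:
    """Single-phase active power: P = V_rms * I_rms * PF (watts)."""
    return voltage_v * current_a * power_factor


def energy_kwh(samples, interval_s: float) -> float:
    """Approximate energy from (V, I, PF) samples taken every interval_s seconds.

    Each sample's power is assumed constant over its interval
    (a simple rectangle-rule integration).
    """
    joules = sum(active_power_w(v, i, pf) * interval_s for v, i, pf in samples)
    return joules / 3_600_000  # 1 kWh = 3.6 MJ


# One hour of 15-minute samples from a hypothetical meter.
hour = [(230.0, 10.0, 0.9)] * 4
print(round(energy_kwh(hour, 15 * 60), 2))  # 2.07
```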
Asset attribute data usually refers to hardware asset data, such as equipment specifications and parameters, equipment location, and the affiliation relationships between pieces of equipment. This type of data is primarily used for asset management.
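One common way to keep affiliation relationships is to store each asset with a reference to the asset it belongs to. The sketch below assumes that layout (the `Asset` fields and the sample plant are hypothetical) and walks the hierarchy to list everything under a given asset, for example all meters on a production line. It also assumes the hierarchy is a tree with no cycles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Asset:
    asset_id: str
    model: str
    location: str
    parent_id: Optional[str] = None  # affiliation: the asset this one belongs to


def descendants_of(assets: List[Asset], root_id: str) -> List[str]:
    """Return the ids of all assets that belong, directly or indirectly, to root_id."""
    children: Dict[Optional[str], List[str]] = {}
    for a in assets:
        children.setdefault(a.parent_id, []).append(a.asset_id)
    found, stack = [], [root_id]
    while stack:  # iterative depth-first walk; assumes no cycles
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


assets = [
    Asset("plant", "site", "Building A"),
    Asset("line-1", "line", "Building A / Hall 1", "plant"),
    Asset("line-2", "line", "Building A / Hall 2", "plant"),
    Asset("meter-a", "EM-100", "Hall 1", "line-1"),
    Asset("meter-b", "EM-100", "Hall 1", "line-1"),
    Asset("meter-c", "EM-200", "Hall 2", "line-2"),
]
print(sorted(descendants_of(assets, "line-1")))  # ['meter-a', 'meter-b']
```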
Diagnostic data is the data used to monitor the operating status of equipment while it is running. It falls into two categories: equipment operating parameters and peripheral diagnostic data for the equipment.
Signal data is currently the most widely used type of data in industry, because it is intuitive, easy to understand, and can be viewed and processed both locally and remotely at the same time.
- Characteristics of Internet of Things Data
Massive volume. A large number of sensors of various types are deployed in the IoT. Each sensor is an information source, and different types of sensors capture information with different content and formats. Besides people and servers, objects, equipment, and sensor networks are all components of the IoT, and they far outnumber the nodes of the Internet. Moreover, most sensor nodes work around the clock, so the data flow is continuous. The IoT is constantly generating enormous amounts of data.
According to a Gartner report, the number of connected devices exceeded 14.2 billion in 2019 and was expected to reach 25 billion by 2021. Such a huge number of devices generates massive amounts of data. In addition, IDC predicts that IoT devices will generate more than 90 zettabytes of data by 2025.
Taking smart meters as an example, a smart meter collects data every 15 minutes, automatically generating 96 records per day. With nearly 500 million smart meters nationwide, smart meters alone would generate nearly 50 billion records every day. A connected car collects data and sends it to the cloud every 10 to 15 seconds, so a single car can easily generate 1,000 records a day. If all 200 million vehicles in China were connected to the Internet, they would generate 200 billion records every day. Within five years, data generated by IoT devices will account for more than 90% of the world's total data.
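The daily record counts above follow from simple arithmetic: records per device per day multiplied by the number of devices. The sketch below reproduces the smart-meter and connected-car figures; the device counts and the 1,000-records-per-car value are the article's own estimates, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_device_per_day(interval_s: int) -> int:
    """Records produced by one device sampling every interval_s seconds."""
    return SECONDS_PER_DAY // interval_s


def records_per_day(num_devices: int, per_device: int) -> int:
    """Total records produced per day by a fleet of identical devices."""
    return num_devices * per_device


meter = records_per_device_per_day(15 * 60)   # 96 records per meter
print(records_per_day(500_000_000, meter))    # 48000000000 (~50 billion)
print(records_per_day(200_000_000, 1_000))    # 200000000000 (200 billion)
```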
Correlation. In the IoT, data points are closely interlinked. This correlation can be understood from two aspects:
First, temporal correlation: data captured at the same moment is generated by the system at the same time and reflects the state of the system at that moment. From the data's point of view, the system at that moment is simply the collection of data captured at that moment.
Second, process correlation: data at one point affects the data generated at a second point after a certain delay, which reflects how the system behaves as a dynamic process.
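Process correlation can be probed by checking how strongly an upstream signal predicts a downstream one after a delay. The sketch below swaps in a plain lagged Pearson correlation as one simple way to estimate that delay; it is not a method the article prescribes. The signals are synthetic, and real data would need detrending, noise handling, and enough samples for the estimate to be meaningful.

```python
from math import sqrt
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def best_lag(upstream: Sequence[float], downstream: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) at which upstream[t] best correlates with downstream[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        a = upstream[: len(upstream) - lag]
        b = downstream[lag:]
        scores[lag] = pearson(a, b)
    return max(scores, key=scores.get)


# Synthetic example: the downstream sensor repeats the upstream one two samples later.
upstream = [0, 1, 0, 0, 2, 0, 0, 3, 0, 1, 0, 0]
downstream = [0, 0] + upstream[:-2]
print(best_lag(upstream, downstream, max_lag=4))  # 2
```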
Timeliness. The timeliness of data refers to the period from when data is generated to when it is cleared, and it is determined by how the system is implemented and deployed. Data may be used many times, or cleared after a single use. Whether data is kept at the edge or remotely also affects its timeliness: data at an edge deployment generally has a short lifetime, while data at a remote deployment generally has a long one.
Real-time behavior is also part of data timeliness. It is related to where the data is deployed, how important the data is, and how it is transmitted.
Data is sequential and must carry a timestamp: networked devices generate data continuously, either on a configured period or when triggered by external events. Each data point is generated at a specific point in time, and this time is essential for calculation and analysis, so it must be recorded.
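Because data arrives both on a fixed period and on external events, every point needs its own timestamp so the two streams can be combined in time order. The sketch below assumes integer millisecond timestamps and uses `heapq.merge` from the Python standard library to interleave a periodic stream with a hypothetical event stream, both already sorted by time.

```python
import heapq
from typing import Iterator, List, Tuple

Point = Tuple[int, str, float]  # (timestamp in ms, source, value)


def periodic(start_ms: int, period_ms: int, count: int, value: float) -> Iterator[Point]:
    """Samples produced on a fixed schedule."""
    for k in range(count):
        yield (start_ms + k * period_ms, "periodic", value)


def merge_by_time(*streams: Iterator[Point]) -> List[Point]:
    """Merge already time-sorted streams into a single time-ordered series."""
    return list(heapq.merge(*streams, key=lambda p: p[0]))


events = iter([(1_500, "event", 9.9), (2_500, "event", 7.5)])  # hypothetical triggered reads
series = merge_by_time(periodic(0, 1_000, 4, 1.0), events)
print([ts for ts, _, _ in series])  # [0, 1000, 1500, 2000, 2500, 3000]
```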
Data is structured: data generated by IoT devices is usually structured and numerical. For example, the current and voltage collected by a smart meter can each be represented as a standard 4-byte floating-point number.
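To make the 4-byte floating-point point concrete, the sketch below packs one smart-meter reading into a fixed binary layout with Python's `struct` module. The layout (an 8-byte millisecond timestamp followed by voltage and current as little-endian IEEE 754 single-precision floats) is an assumption chosen for illustration, not a format defined by any meter standard.

```python
import struct

# Hypothetical fixed layout: int64 timestamp (ms) + float32 voltage + float32 current.
RECORD = struct.Struct("<qff")


def encode(ts_ms: int, voltage: float, current: float) -> bytes:
    """Pack one reading into RECORD.size bytes."""
    return RECORD.pack(ts_ms, voltage, current)


def decode(raw: bytes):
    """Unpack bytes produced by encode() back into (ts_ms, voltage, current)."""
    return RECORD.unpack(raw)


raw = encode(1_700_000_000_000, 230.5, 10.25)
print(RECORD.size)  # 16
print(decode(raw))  # (1700000000000, 230.5, 10.25)
```

Because every record has the same size, storage needs are easy to estimate: under this layout, a meter producing 96 records a day writes 96 × 16 = 1,536 bytes of raw payload per day.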
Data is rarely updated: data generated by networked devices is machine log data, which generally should not, and need not, be modified. Very few scenarios require modifying the collected raw data. By contrast, typical information-management and Internet applications routinely need to modify or delete records.
The data source is unique: data collected by one IoT device is completely independent of data collected by another. A device's data must be generated by that device and cannot be produced manually or by another device; in other words, each device's data has exactly one producer.
Compared with Internet applications, writes dominate reads: in an Internet application, a record is often written once and read many times. For example, a Weibo post or a WeChat public account article is written once but may be read by millions of people. IoT data is different: it is usually read automatically by calculation and analysis programs, and only a few times. People look at the raw data directly only in scenarios such as accident analysis.
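The unique-source and rarely-updated properties suggest a simple storage model: one append-only series per device, with new points accepted only from that device and only in time order. The sketch below assumes that model; the `DeviceSeries` class is hypothetical and only illustrates the idea.

```python
from typing import Dict, List, Tuple


class DeviceSeries:
    """Append-only, time-ordered readings from exactly one device."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self._points: List[Tuple[int, float]] = []

    def append(self, source_id: str, ts_ms: int, value: float) -> None:
        # Unique source: only the owning device may write to its series.
        if source_id != self.device_id:
            raise PermissionError(f"{source_id} cannot write to {self.device_id}")
        # Machine log data is appended in time order and never rewritten.
        if self._points and ts_ms <= self._points[-1][0]:
            raise ValueError("timestamps must strictly increase")
        self._points.append((ts_ms, value))

    def points(self) -> Tuple[Tuple[int, float], ...]:
        return tuple(self._points)  # read-only copy for analysis programs


store: Dict[str, DeviceSeries] = {"meter-1": DeviceSeries("meter-1")}
store["meter-1"].append("meter-1", 0, 230.1)
store["meter-1"].append("meter-1", 900_000, 229.8)
print(len(store["meter-1"].points()))  # 2
```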
The data has a retention period: collected data usually has a time-based retention policy, for example keeping it for one day, one week, one month, one year, or even longer. To save storage space, the system should ideally delete expired data automatically.
Queries and analysis are based on a time range and a group of devices: when computing on or analyzing IoT data, a time range must be specified, rather than a single point in time or the entire history. Analysis also often targets a subset of devices selected along some dimension, such as devices in a certain geographic area, of a certain model, from a certain batch, or from a certain manufacturer.
Beyond storage and queries, real-time analysis and computation are often required: IoT applications usually have strict real-time computing requirements, because calculation results drive real-time alarms that help prevent accidents.
Traffic is stable and predictable: given the number of IoT devices and their data collection frequency, the required bandwidth and traffic, as well as the volume of new data per day, can be estimated fairly accurately.
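The retention and query patterns above can be sketched together: readings older than a retention window are purged, and every query names a time range plus a group of devices selected by an attribute such as region. The in-memory list, the `region` attribute, and the one-day window are assumptions made for illustration; a time-series database would typically provide these operations itself.

```python
from dataclasses import dataclass
from typing import Dict, List

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Row:
    device_id: str
    ts_ms: int
    value: float


def purge_expired(rows: List[Row], now_ms: int, retention_ms: int) -> List[Row]:
    """Drop rows older than the retention window."""
    cutoff = now_ms - retention_ms
    return [r for r in rows if r.ts_ms >= cutoff]


def range_average(rows: List[Row], regions: Dict[str, str], region: str,
                  start_ms: int, end_ms: int) -> float:
    """Average value over [start_ms, end_ms) for devices in one region."""
    vals = [r.value for r in rows
            if regions.get(r.device_id) == region and start_ms <= r.ts_ms < end_ms]
    if not vals:
        raise ValueError("no data in range")
    return sum(vals) / len(vals)


regions = {"m1": "north", "m2": "north", "m3": "south"}  # hypothetical asset attribute
rows = [Row("m1", 0, 1.0), Row("m1", DAY_MS, 2.0),
        Row("m2", DAY_MS + 1, 4.0), Row("m3", DAY_MS + 2, 8.0)]
rows = purge_expired(rows, now_ms=DAY_MS + 10, retention_ms=DAY_MS)
print(len(rows))                                                  # 3
print(range_average(rows, regions, "north", DAY_MS, 2 * DAY_MS))  # 3.0
```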
Data processing has special requirements: for example, you may need the value of a quantity collected by a device at a specific time, but the sensor did not actually sample at that exact moment, so interpolation is often required. Many scenarios also require complex mathematical functions to be computed on the collected values.
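Linear interpolation between the two nearest samples is one common way to estimate a value at a time the sensor did not sample. The sketch below assumes that method and a time-sorted list of (timestamp, value) pairs; other schemes, such as holding the previous value, may suit some signals better.

```python
from bisect import bisect_left
from typing import List, Tuple


def value_at(samples: List[Tuple[int, float]], t: int) -> float:
    """Linearly interpolate the value at time t from time-sorted (ts, value) samples."""
    times = [ts for ts, _ in samples]
    if not samples or t < times[0] or t > times[-1]:
        raise ValueError("t is outside the sampled range")
    i = bisect_left(times, t)
    if times[i] == t:  # an exact sample exists
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# A meter sampled at 10:00 and 10:15 (minutes since midnight); estimate 10:05.
samples = [(600, 220.0), (615, 223.0)]
print(value_at(samples, 605))  # 221.0
```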