4.2.3 Streaming data

In the context of ADAS or CCAM testing, some data may be transmitted continuously and in real time over a network or the internet, from one source node to one recipient node (peer-to-peer), or from multiple sources to several recipients (broadcasting).

Streaming data refers to real-time data that is continuously transferred from a data source and processed as soon as it arrives at its destination. A streaming data architecture focuses on technologies that process data in motion, in contrast to extract-transform-load (ETL) batch-processing paradigms, which operate on data at rest.

Streaming data may be non-persistent, i.e., volatile: generated, consumed, and destroyed without entering any serialization or persistence process. However, it can also be made persistent if the processing node captures it and stores it in a storage system, in which case it falls under the considerations of “Acquired or derived data”.
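As a minimal sketch of the persistence case described above, the following (hypothetical) helper appends each record of an otherwise volatile stream to a JSON Lines file as it is consumed; the function name and file format are illustrative assumptions, not part of any specific CCAM toolchain.

```python
import json

def capture_stream(stream, path):
    """Persist a volatile data stream by appending each record to a
    JSON Lines file as it is consumed, turning streaming data into
    stored (acquired) data."""
    with open(path, "a", encoding="utf-8") as f:
        for record in stream:
            f.write(json.dumps(record) + "\n")
```

Any downstream tool can then replay the stored records line by line, which is what makes the captured stream subject to the same governance as other acquired data.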

Functions and use cases that rely on short-range communication protocols are especially likely to generate streaming data. It may be produced sparsely (e.g., at vehicles or roadside units (RSUs)) and streamed through communication and networking architectures to recipient nodes at the edge or in the cloud. Examples of such data streams include (but are not limited to):

  • V2V and V2X messages (e.g., ETSI CAM, DENM, and CPM messages)
  • sensor data (e.g., video streaming, point cloud, vehicle data)
  • traffic data (e.g., road occupant statistics).
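To make the "lightweight message" category above concrete, the following sketch builds a simplified CAM-like status message as JSON. The field names are purely illustrative assumptions and do not follow the actual ETSI ASN.1 message schema.

```python
import json
import time

def make_cam_like_message(station_id, lat, lon, speed_mps):
    """Build a simplified, CAM-like V2V status message as a JSON string.
    Field names are illustrative, not the ETSI ASN.1 schema."""
    return json.dumps({
        "type": "CAM",
        "stationId": station_id,
        "timestamp": time.time(),          # generation time of the message
        "position": {"lat": lat, "lon": lon},
        "speed": speed_mps,                # speed in metres per second
    })

msg = make_cam_like_message(42, 48.8566, 2.3522, 13.9)
```

Such messages are small (tens to hundreds of bytes), which is why they place far lighter demands on the network than raw sensor streams.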

Video data streaming may need special treatment, as it often demands higher bandwidth and lower latency to ensure that images are delivered in real time for consumption in each specific use case. The main technical difference from other data flows is that video must flow continuously, whereas other data types can be buffered and processed in batch-processing steps that collect chunks of data to be processed as a group at some future time. Consequently, it is important to distinguish between streaming use cases that require video (or equivalent data, such as point clouds) and lightweight data streaming such as V2X messages or other metadata.
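The streaming-versus-batch distinction above can be sketched with two small consumer functions: one handles each item as soon as it arrives, the other collects items into chunks and handles each chunk as a group. The function names are illustrative.

```python
from itertools import islice

def stream_consume(source, handler):
    """Streaming: handle each item as soon as it arrives."""
    for item in source:
        handler(item)

def batch_consume(source, handler, chunk_size):
    """Batch: collect items into fixed-size chunks and handle each
    chunk as a group at a later point."""
    it = iter(source)
    while chunk := list(islice(it, chunk_size)):
        handler(chunk)

streamed, batched = [], []
stream_consume(range(5), streamed.append)    # items handled one by one
batch_consume(range(5), batched.append, 2)   # items handled in groups of 2
```

Video-like data maps to the first pattern (it cannot wait for a chunk to fill), while lightweight messages or metadata can often tolerate the second.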

Streamed data therefore needs to be taken into account in the context of CCAM projects, as it may play a key role in the computation of KPIs, the aggregation of information from multiple sources, or the provision of provenance and traceability mechanisms.

Processing data streams implies that the node can produce updated responses by observing only recent data, without needing the entire history of received data, which may be discarded once consumed. Such behaviour may be needed for a number of reasons: excessive data volume, high frequency, or the presence of potentially sensitive or private information (e.g., in federated learning frameworks).
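The "recent data only" behaviour described above is commonly implemented with a bounded sliding window. A minimal sketch (the class name and the averaged statistic are illustrative assumptions):

```python
from collections import deque

class WindowedAverage:
    """Running average over only the N most recent samples.
    Older samples are discarded automatically, so no full history
    of the stream is ever retained."""

    def __init__(self, window_size):
        # deque with maxlen drops the oldest sample when a new one
        # arrives and the window is full
        self.window = deque(maxlen=window_size)

    def update(self, sample):
        self.window.append(sample)
        return sum(self.window) / len(self.window)
```

Each call to `update` returns a response computed from recent data only, which also limits how long any potentially sensitive sample stays in memory.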

Depending on the nature of the data (e.g., lightweight messages versus heavy raw sensor data), both the treatment and the format requirements of streaming data may differ.

In general, data streaming implies sending data in small packets rather than in large blocks, which allows for faster and more efficient transmission, greater robustness against errors, etc. Data streaming requires a protocol stack that defines how to interpret binary data, provides decoding methods, adds the necessary redundancy and headers at the application level, and handles other technical aspects of data transmission. Technologies often used at the transmission level include TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
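The packet-oriented transmission described above can be illustrated with UDP, where each datagram is a small, self-contained packet with no delivery or ordering guarantees. A minimal loopback sketch using the standard `socket` module (the payload is an arbitrary example):

```python
import socket

# Receiver: bind a UDP socket to the loopback interface and let the
# OS pick a free port.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
port = recv_sock.getsockname()[1]

# Sender: each sendto() ships one self-contained datagram; UDP adds
# no handshake, retransmission, or ordering on top.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"sensor-frame-001", ("127.0.0.1", port))

data, addr = recv_sock.recvfrom(1024)   # blocks until a datagram arrives
send_sock.close()
recv_sock.close()
```

On a real network a datagram like this may be lost or reordered, which is exactly the trade-off that makes UDP suitable for latency-sensitive streams.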

TCP is a reliable protocol that ensures that data is transmitted accurately and in the correct order. It uses a three-way handshake to establish a connection between the sender and receiver and includes error-checking and flow control mechanisms.
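In contrast to UDP, TCP presents the application with a reliable, ordered byte stream: the receiver reads the bytes exactly as they were sent, possibly split across reads. A minimal loopback sketch (the threading setup and payloads are illustrative):

```python
import socket
import threading

# Server: listen on the loopback interface; accept() completes the
# TCP three-way handshake with the connecting client.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

received = bytearray()

def serve():
    conn, _ = server.accept()
    # recv() returns b"" once the sender closes the connection;
    # TCP guarantees the bytes arrive intact and in order.
    while chunk := conn.recv(1024):
        received.extend(chunk)
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"frame-1;")
client.sendall(b"frame-2;")
client.close()
t.join()
server.close()
```

The two `sendall` calls may be delivered as one, two, or more chunks, but their byte content and order are preserved, which is the reliability property the text refers to.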

UDP, on the other hand, is a faster protocol that does not include the same level of reliability as TCP. It is often used for applications that require fast data transmission, such as online gaming or streaming video. Other protocols are specifically designed for controlling the delivery of real-time data, such as audio or video, over a network. For instance, RTSP (Real-time Streaming Protocol) works by establishing a connection between a client and a server, and then sending requests and responses between the two to control the streaming session. RTSP is often used with other protocols, such as RTP (Real-time Transport Protocol) and RTCP (Real-time Transport Control Protocol) to cover the entire set of requirements of effective real-time data transmission.

There are several other real-time protocols in use. One example is ITS-G5, a communication stack specifically designed for Intelligent Transportation Systems (ITS) applications; it comprises a variety of protocols, including IEEE 802.11p for wireless communication. WebRTC (Web Real-Time Communication) is another example of a real-time data streaming protocol. It is an open-source protocol that enables real-time communication between web browsers and other devices, such as smartphones and tablets. WebRTC uses several technologies, including audio and video codecs, NAT (Network Address Translation) traversal techniques, and SRTP (Secure Real-Time Transport Protocol) encryption, to facilitate real-time communication over the internet.