Data Serialization

Serialization converts an in-memory data structure (such as an object, array, or class instance) into a linear format (a string or a sequence of bytes) that can be stored in a file or transmitted over a network. The receiver reverses the process, called deserialization, to rebuild an equivalent structure on its side.

If your Python script on an edge device detects a defect, it holds a Python dictionary: {'id': 5, 'confidence': 0.99}. You cannot send that dictionary as-is over a TCP socket to a Node.js server: a socket carries only bytes, and the dictionary's in-memory layout is private to the Python interpreter, so Node.js could not interpret it. It must first be serialized into a format both sides agree on.
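As a rough illustration of that round trip, the sketch below serializes the dictionary to JSON bytes, pushes them through a socket, and rebuilds the dictionary on the other end. It uses socket.socketpair() as a stand-in for the real edge-to-server TCP connection, and the variable names are made up for this example.

```python
import json
import socket

detection = {"id": 5, "confidence": 0.99}

# Serialize: dict -> JSON text -> UTF-8 bytes (sockets only carry bytes).
payload = json.dumps(detection).encode("utf-8")

# socketpair() returns two connected sockets in one process, standing in
# for the edge device and the server.
edge_sock, server_sock = socket.socketpair()
with edge_sock, server_sock:
    edge_sock.sendall(payload)
    edge_sock.shutdown(socket.SHUT_WR)  # signal "no more data"

    # Read until the sender has closed its side.
    chunks = []
    while chunk := server_sock.recv(4096):
        chunks.append(chunk)

# Deserialize: bytes -> JSON text -> dict.
received = json.loads(b"".join(chunks).decode("utf-8"))
print(received == detection)  # True
```

A Node.js receiver would do the same last step with JSON.parse(), which is the whole point: both sides only need to agree on the byte format, not on each other's memory layout.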

Below are the most relevant formats for an Edge-to-Cloud architecture.

pie title Approximate Bandwidth Usage (Bytes per Message)
    "JSON (Text, Keys Included)" : 57
    "MessagePack (Binary, Keys Included)" : 47
    "Protobuf (Binary, Schema-Based)" : 16

1. JSON (JavaScript Object Notation)

The undisputed king of web APIs.

  • Pros: Human-readable, native to browsers/JavaScript, extremely easy to debug via network inspectors.
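  • Cons: Verbose (every message repeats its key names as text, which wastes bandwidth), slower to parse than binary formats, and text-only (binary data such as images must be Base64-encoded first).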
  • Example (Python -> JSON String):
import json
data = {"defect": "scratch", "camera_id": 5, "confidence": 0.98}

# Serialize:
serialized_string = json.dumps(data)
# Result: '{"defect": "scratch", "camera_id": 5, "confidence": 0.98}'
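To put a number on "verbose", this small sketch measures the message above in bytes, once with json.dumps defaults and once with compact separators. The variable names are illustrative.

```python
import json

data = {"defect": "scratch", "camera_id": 5, "confidence": 0.98}

# Default output puts a space after every ':' and ','.
default_bytes = json.dumps(data).encode("utf-8")

# Compact separators drop that whitespace; the data is unchanged.
compact_bytes = json.dumps(data, separators=(",", ":")).encode("utf-8")

print(len(default_bytes))  # 57
print(len(compact_bytes))  # 52

# Deserialize on the receiving side:
restored = json.loads(compact_bytes)
print(restored == data)    # True
```

Even in compact form, well over half of the bytes are key names and punctuation rather than values, which is the overhead the binary formats below attack.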

2. Protocol Buffers (Protobuf)

Invented by Google. Protobuf is the default message format for gRPC and a common choice for high-performance microservices. You define a strict schema in a .proto file, and a compiler (protoc) generates the serialization code for C++, Python, JS, etc.

graph LR
    Schema["alert.proto (Schema)"] --> Compiler["protoc (Compiler)"]
    Compiler --> Py["Python Class"]
    Compiler --> JS["Node.js Class"]

    Py -- "Compact binary bytes over network" --> JS

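  • Pros: Binary format, extremely compact, fast to encode and decode, strongly typed (a wrong field type fails at the sender instead of silently corrupting data downstream).
  • Cons: Not human-readable, and the schema must be compiled with protoc before you can use it. Both sides also need the same (or a compatible) schema.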
  • Example: First, define the .proto schema file:
syntax = "proto3";

message DefectAlert {
  string defect_type = 1;
  int32 camera_id = 2;
  float confidence = 3;
}

Then use it in your code:

import alert_pb2  # Generated by protoc from alert.proto (protoc --python_out=. alert.proto)

alert = alert_pb2.DefectAlert()
alert.defect_type = "scratch"
alert.camera_id = 5
alert.confidence = 0.98

# Serialize:
serialized_bytes = alert.SerializeToString()
# Result (16 raw bytes): b'\n\x07scratch\x10\x05\x1dH\xe1z?' (compare with the 57-byte JSON string!)
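To see why the output is so small, the sketch below rebuilds the same message by hand with only the standard library. It is a teaching aid covering just the three field types in DefectAlert, not a replacement for the generated code; the helper names encode_varint and tag are invented for this example.

```python
import struct

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    """A field is identified by its number and wire type, not by its name."""
    return encode_varint((field_number << 3) | wire_type)

VARINT, LEN, I32 = 0, 2, 5  # Protobuf wire types used by this message

defect = "scratch".encode("utf-8")
encoded = (
    tag(1, LEN) + encode_varint(len(defect)) + defect  # string defect_type = 1
    + tag(2, VARINT) + encode_varint(5)                # int32 camera_id = 2
    + tag(3, I32) + struct.pack("<f", 0.98)            # float confidence = 3
)

print(encoded)       # b'\n\x07scratch\x10\x05\x1dH\xe1z?'
print(len(encoded))  # 16
```

The field names never appear on the wire. Each field costs a one-byte tag plus its value, and the receiver uses the schema to map field numbers back to names.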

3. MessagePack (Highly Relevant IoT Standard!)

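MessagePack uses the same data model as JSON (maps, arrays, strings, numbers) but encodes it in a compact binary form instead of text. It is an encoding, not compression: key names are still sent in every message.

  • The Big Advantage: It does not require you to write a schema file and compile it like Protobuf, yet it is still smaller and faster to parse than JSON. The trade-off is that messages stay larger than Protobuf's, because the keys travel with the data.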
  • Example:
import msgpack
data = {"defect": "scratch", "camera_id": 5, "confidence": 0.98}

# Serialize:
serialized_bytes = msgpack.packb(data)
# Result: b'\x83\xa6defect\xa7scratch\xa9camera_id\x05\xaaconfidence\xcb?\xef\\(\xf5\xc2\x8f\\'
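The byte layout above can be reproduced by hand, which shows where the savings come from and where they stop. The sketch below is a minimal encoder for only the four MessagePack types this message uses (small map, short string, small positive integer, 64-bit float); pack_small is a made-up helper, and real code should call msgpack.packb instead.

```python
import struct

def pack_small(value) -> bytes:
    """Encode the few MessagePack types this example needs."""
    if isinstance(value, dict) and len(value) < 16:
        body = b"".join(pack_small(k) + pack_small(v) for k, v in value.items())
        return bytes([0x80 | len(value)]) + body      # fixmap: 1 header byte
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) < 32:
            return bytes([0xA0 | len(raw)]) + raw     # fixstr: 1 header byte
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 128:
        return bytes([value])                         # positive fixint: 1 byte
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)     # float 64, big-endian
    raise TypeError(f"unsupported in this sketch: {value!r}")

data = {"defect": "scratch", "camera_id": 5, "confidence": 0.98}
packed = pack_small(data)
key_bytes = sum(len(pack_small(key)) for key in data)

print(len(packed))  # 47
print(key_bytes)    # 28
```

Quotes, colons, and commas are gone, but 28 of the 47 bytes are still key names. That is the gap between MessagePack and a schema-based format like Protobuf.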

4. FlatBuffers

Similar to Protobuf (binary format with a schema), but designed so that the data can be accessed without a parsing/unpacking step. Fields are read directly from the received byte buffer, with no intermediate objects or copies. This suits game engines and high-speed embedded systems, where per-message decode cost and memory allocations matter.

  • Example (C++ reading directly from the received buffer):
// GetDefectAlert() is generated by flatc from the schema; it returns a pointer into the buffer, with no unpacking pass:
auto alert = GetDefectAlert(network_buffer);
printf("Defect: %s\n", alert->defect_type()->c_str());
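The idea of reading a field in place can be sketched in Python with the standard library. Note the loud assumption: the 8-byte layout below is a hypothetical fixed layout invented for this example and is NOT the real FlatBuffers wire format (which adds offset tables so fields can be optional). It only shows what "read one field without decoding the message" means.

```python
import struct

# Hypothetical fixed layout, NOT the real FlatBuffers wire format:
#   bytes 0-3: camera_id (uint32), bytes 4-7: confidence (float32), little-endian.
received = struct.pack("<If", 5, 0.98)  # stand-in for bytes off the network
view = memoryview(received)             # a view over the buffer, not a copy

# Jump straight to each field's offset; nothing else is decoded.
(confidence,) = struct.unpack_from("<f", view, 4)
(camera_id,) = struct.unpack_from("<I", view, 0)

print(camera_id, round(confidence, 2))  # 5 0.98
```

Compare this with JSON or MessagePack, where the receiver has to walk the whole message and build a dictionary before it can look at a single value.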

5. Struct Padding (C/C++ Bare Metal)

In embedded C (like your Automotive/AUTOSAR work), you often skip complex serialization libraries entirely to save memory. You define a raw struct and just send its exact memory layout over UART or UDP.

  • Example:
struct Message {
    uint8_t id;
    float value;
};
struct Message my_message = {1, 0.98f};
// Send the struct's raw in-memory bytes directly over the network:
send(sock, &my_message, sizeof(my_message), 0);
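  • The Danger (Endianness & Padding): The struct's memory layout is not portable. Different compilers and ABIs may insert different padding (here, typically 3 bytes between id and value), and CPUs may disagree on byte order (Little-Endian vs Big-Endian). ARM on a Raspberry Pi and x86 in the cloud are both little-endian in practice, but some automotive microcontrollers and many network protocols are big-endian. A mismatch raises no error; the receiver just reads wrong values. Explicit serialization formats like Protobuf or MessagePack define the byte layout themselves, so they are safe across architectures.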
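Both hazards can be reproduced from Python with the struct module, which is a handy way to explain them in an interview. The format string "Bf" mirrors struct Message (one uint8_t, one float); the variable names are illustrative.

```python
import struct

# "<" = little-endian, ">" = big-endian; both disable padding.
# "@" = native byte order AND native alignment, like a C compiler.
little = struct.pack("<Bf", 1, 0.98)
big = struct.pack(">Bf", 1, 0.98)

print(len(little))                   # 5 bytes: no padding
print(struct.calcsize("@Bf"))        # usually 8: 3 padding bytes before the float
print(little[1:] == big[1:][::-1])   # True: same float, reversed byte order

# A receiver that assumes the wrong byte order gets a wrong value, not an error.
_, misread = struct.unpack(">Bf", little)
print(misread)
```

If you do send raw structs, the usual mitigation is to pin the layout explicitly on both sides: a fixed byte order plus packed or manually padded fields.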

6. The "Pickle" Anti-Pattern (Interview Trap!)

In a Python-focused interview, a junior engineer might suggest Python's native pickle module to serialize data from the Raspberry Pi. Do NOT use it for data that crosses a network or trust boundary.

  1. Language Locked: A Node.js backend cannot read Python's custom Pickle format.
  2. Massive Security Vulnerability: Unpickling can call arbitrary functions named inside the payload. If an attacker can send or tamper with a message that your server passes to pickle.loads(), they gain Remote Code Execution (RCE) on that server.
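The second point is easy to demonstrate safely. In the sketch below, a hypothetical Payload class uses __reduce__ to tell pickle "rebuild me by calling this function with these arguments". Here the function is a harmless eval of arithmetic; an attacker would substitute something like os.system.

```python
import pickle

class Payload:
    # __reduce__ returns (callable, args); pickle.loads() will call it.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for attacker code

malicious_bytes = pickle.dumps(Payload())

# The "server" only deserializes, yet the embedded call runs during loads().
result = pickle.loads(malicious_bytes)
print(result)  # 42
```

The receiver never gets a Payload object back; it gets whatever the attacker's chosen call returned, after that call has already run. JSON, Protobuf, and MessagePack decoders only build data and never call functions named by the message.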

Always use JSON, Protobuf, or MessagePack.