Timestamped Value Design for Multimodal Video Search
In video search and recommendation systems, multimodal content recognition is a core capability. The system needs to process features from multiple modalities: video, audio, text, and more. These features come from different computation pipelines with varying processing delays and update frequencies. How to select among these features to ensure recommendation timeliness and accuracy is a critical system design problem.
This article provides an in-depth analysis of the Timestamped Value design in industrial-grade code, exploring how to use time-aware data merging strategies to select the freshest features in multimodal scenarios, with a clean-room reconstruction in Go.
The Problem: Feature Fusion Dilemma
In typical multimodal systems, feature fusion is often done like this:
// Pseudo-code - simple overwrite
if (newFeature.HasValue()) {
    currentFeature = newFeature; // Direct overwrite
}
The drawbacks of this approach are obvious:
- No timeliness consideration: A late-arriving feature can be older than the one it replaces (e.g., a stale cache entry that has not been invalidated)
- No source consideration: Cannot distinguish between "calculation failed" and "old data"
- No initialization state: Cannot distinguish between "not set" and "zero value"
Industrial Solution: Timestamped Value Pattern
The original code uses the TTsValue<T> template class to solve this problem:
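template <typename T>
class TTsValue {
    time_t LastUpdate; // time of the last update; 0 is treated as "not set"
    T Value;

public:
    // Take other's value when it is at least as fresh as ours.
    // Returns true if this object was updated.
    bool Merge(const TTsValue& other) {
        if (other.LastUpdate >= LastUpdate) {
            LastUpdate = other.LastUpdate;
            Value = other.Value;
            return true;
        }
        return false;
    }
};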
The core ideas of this design:
- Value bound to timestamp: Each data point carries timing information
- Timestamp merge strategy: Only overwrite when the incoming data is at least as fresh (the comparison is >=, so on a timestamp tie the incoming value wins)
- Initialization state tracking: Use LastUpdate > 0 to indicate initialized
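The three ideas above can be sketched together in Go. This is an illustrative reading of the C++ excerpt, not the original implementation: the `Stamped[T]` type and its `Set`, `IsSet`, and `Get` methods are hypothetical names, the timestamp is assumed to be Unix seconds, and type parameters require Go 1.18 or later.

```go
package main

import "fmt"

// Stamped is a hypothetical generic counterpart of TTsValue<T>.
// lastUpdate holds Unix seconds; 0 means "never set".
type Stamped[T any] struct {
	lastUpdate int64
	value      T
}

// Set stores a value together with an explicit timestamp.
func (s *Stamped[T]) Set(value T, unixSeconds int64) {
	s.lastUpdate = unixSeconds
	s.value = value
}

// IsSet reports whether a value was ever stored (lastUpdate > 0).
func (s *Stamped[T]) IsSet() bool { return s.lastUpdate > 0 }

// Get returns the value and whether it was ever set.
func (s *Stamped[T]) Get() (T, bool) { return s.value, s.IsSet() }

// Merge copies other when its timestamp is at least as new,
// mirroring the >= comparison in the C++ excerpt.
func (s *Stamped[T]) Merge(other Stamped[T]) bool {
	if other.lastUpdate >= s.lastUpdate {
		*s = other
		return true
	}
	return false
}

func main() {
	var score Stamped[float64] // zero value: nothing stored yet
	fmt.Println(score.IsSet()) // false

	var computed Stamped[float64]
	computed.Set(0.0, 1700000000) // a legitimate score of zero
	score.Merge(computed)

	val, ok := score.Get()
	fmt.Println(val, ok) // 0 true: a real zero, not "missing"
}
```

Because the "set" flag is derived from the timestamp rather than from the value, a score of 0.0 is reported as present, which a plain zero-value check could not express.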
Trade-off Analysis
Advantages
- Data freshness guarantee: Always keep the most recently computed features
- Multi-source fusion friendly: Automatically handles latency differences between pipelines
- Clear initialization semantics: Distinguish between "not set" and "zero value"
Costs
- Memory overhead: Each value stores an additional timestamp
- Comparison overhead: Every merge requires timestamp comparison
- Thread safety: Additional synchronization needed for multi-threaded access
Go Clean-Room Demonstration
Below is a clean-room demonstration in Go, reconstructing the above design philosophy:
package main

import (
	"fmt"
	"time"
)

// TsValue - corresponds to C++ TTsValue<T>
type TsValue struct {
	LastUpdate time.Time
	Value      interface{}
}

// NewTsValue stamps the value with the current time.
func NewTsValue(value interface{}) *TsValue {
	return &TsValue{
		LastUpdate: time.Now(),
		Value:      value,
	}
}

// Merge strategy: only merge when incoming data is strictly fresher.
// After is a strict comparison, so unlike the C++ >= check, a tie keeps the current value.
func (v *TsValue) Merge(other *TsValue) bool {
	if other == nil {
		return false
	}
	if other.LastUpdate.After(v.LastUpdate) {
		v.LastUpdate = other.LastUpdate
		v.Value = other.Value
		return true
	}
	return false
}

func main() {
	// Simulate multi-source data fusion; the Unix epoch stands in for "not set"
	mergedFeature := &TsValue{LastUpdate: time.Unix(0, 0), Value: nil}
	// Simulate different sources returning features
	videoFeature := NewTsValue("video_feature")
	mergedFeature.Merge(videoFeature)
	fmt.Printf("Merged: %v\n", mergedFeature.Value)
}
Summary
This article analyzed the timestamped value design for multimodal content and explored the following core trade-offs:
- Time-aware vs Simple Overwrite: Trade extra timestamp overhead for data freshness guarantee
- Initialization State: Use non-zero timestamp to indicate initialization
- Multi-source Fusion: Automatically handles latency differences between pipelines
This design pattern is very common in systems that need to handle multi-source heterogeneous data. Understanding the trade-offs behind it is crucial for designing reliable systems.