String Constant Pool Design for Video Attribute Lookup

In large-scale video search systems, document attributes are the foundation for building efficient indexes and fast retrieval. Each video document may contain dozens of attributes: title, author, duration, view count, category, tags, and more. In massive data scenarios, the performance of attribute lookup directly impacts the throughput of the entire system.

This article provides an in-depth analysis of the String Constant Pool design in industrial-grade code, exploring how compile-time constants can optimize runtime performance, with a clean-room reconstruction in Go.

The Problem: Attribute Lookup Performance Bottlenecks

A typical attribute lookup looks like this:

// Pseudo-code
if (doc.GetAttr("MediaDuration")) {
    // Process duration attribute
}

There is an easily overlooked performance issue here: every lookup compares (or hashes) the attribute name as a string at runtime. A string comparison walks the characters one by one, so in the worst case it costs O(n) in the length of the name. When attribute lookup sits on a high-frequency path, this overhead adds up to a significant bottleneck.
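To make the O(n) cost concrete, here is a minimal sketch in Go. The helper equalCount is hypothetical and written only for illustration; it mirrors a byte-by-byte equality check and counts how many bytes it had to inspect. Real runtimes use optimized routines, but the worst case (two equal strings) still touches every byte.

```go
package main

import "fmt"

// equalCount is a hypothetical helper that compares two strings byte by byte
// and reports how many byte comparisons it performed.
func equalCount(a, b string) (equal bool, compared int) {
	if len(a) != len(b) {
		return false, 0 // different lengths: no bytes need to be inspected
	}
	for i := 0; i < len(a); i++ {
		compared++
		if a[i] != b[i] {
			return false, compared // first mismatch ends the scan early
		}
	}
	return true, compared
}

func main() {
	fmt.Println(equalCount("MediaDuration", "MediaDuration")) // equal: all 13 bytes inspected
	fmt.Println(equalCount("MediaDuration", "MediaLocation")) // same length, mismatch at the 6th byte
	fmt.Println(equalCount("MediaDuration", "views"))         // lengths differ: rejected immediately
}
```

The matching case is the expensive one, and a successful attribute lookup is exactly that case.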

Industrial Solution: String Constant Pool

The original code defines its attribute names as a long list of static constexpr TStringBuf constants:

// Original code
static constexpr TStringBuf DA_DURATION = "MediaDuration";
static constexpr TStringBuf DA_VIEWS = "views";
static constexpr TStringBuf DA_CATEGORY = "category";
// ... dozens of attribute constants

The core ideas of this design:

Compile-time certainty: Constants are determined at compile time, no runtime calculation needed
Pointer comparison: In some implementations, constant comparison can be transformed into pointer comparison (or integer comparison), much faster than string comparison
Compile-time checking: A misspelled constant name fails to compile, whereas a misspelled string literal compiles and simply never matches at runtime
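The pointer/integer comparison idea can be sketched with a small interning table. This is an illustration under stated assumptions, not the original code's mechanism: the names AttrID, Interner, Intern, and Name are all hypothetical. Each distinct name is hashed once when it is interned; after that, equality checks are plain integer comparisons.

```go
package main

import "fmt"

// AttrID is a hypothetical integer handle for an interned attribute name.
type AttrID int

// Interner assigns each distinct name a stable small integer.
type Interner struct {
	ids   map[string]AttrID
	names []string
}

func NewInterner() *Interner {
	return &Interner{ids: make(map[string]AttrID)}
}

// Intern returns the ID for name, allocating a new one the first time
// the name is seen.
func (in *Interner) Intern(name string) AttrID {
	if id, ok := in.ids[name]; ok {
		return id
	}
	id := AttrID(len(in.names))
	in.ids[name] = id
	in.names = append(in.names, name)
	return id
}

// Name returns the string for an ID previously returned by Intern.
func (in *Interner) Name(id AttrID) string { return in.names[id] }

func main() {
	in := NewInterner()
	duration := in.Intern("MediaDuration")
	views := in.Intern("views")
	again := in.Intern("MediaDuration")

	// Both comparisons below are integer comparisons, not string comparisons.
	fmt.Println(duration == again, duration == views, in.Name(views))
}
```

The string work is paid once per name at interning time; the hot path then only handles integers.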

Trade-off Analysis

Advantages

Performance: When lookups are keyed by a pointer or an integer ID, attribute comparison drops from an O(n) string comparison to an O(1) pointer/integer comparison
Zero runtime overhead: Constant definitions have no runtime cost
Code readability: A named constant such as DA_DURATION is clearer than writing "MediaDuration" directly at each call site

Costs

Code space: Each constant occupies binary space
Maintenance burden: Adding new attributes requires manually adding constant definitions
Namespace: A large number of static constants in one scope can collide with other names
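One way to contain the maintenance and naming costs is to keep every attribute in a single enum-plus-table definition and validate it at startup. The sketch below is an assumption about how that could look in Go, not something taken from the original code; AttrID, attrNames, numAttrs, and validateAttrNames are hypothetical names, and the program is standalone.

```go
package main

import "fmt"

// AttrID is a hypothetical typed identifier; the shared Attr prefix and the
// dedicated type keep these names from colliding with unrelated constants.
type AttrID int

const (
	AttrDuration AttrID = iota
	AttrViews
	AttrCategory
	numAttrs // sentinel: number of attributes defined above
)

// attrNames maps each ID to its wire name. Adding an attribute means adding
// one constant above and one entry here.
var attrNames = [numAttrs]string{
	AttrDuration: "MediaDuration",
	AttrViews:    "views",
	AttrCategory: "category",
}

// validateAttrNames reports an ID with a missing name or a name used twice.
func validateAttrNames() error {
	seen := make(map[string]AttrID, numAttrs)
	for id, name := range attrNames {
		if name == "" {
			return fmt.Errorf("attribute id %d has no name", id)
		}
		if prev, dup := seen[name]; dup {
			return fmt.Errorf("name %q used by ids %d and %d", name, prev, id)
		}
		seen[name] = AttrID(id)
	}
	return nil
}

func main() {
	if err := validateAttrNames(); err != nil {
		panic(err)
	}
	fmt.Println(attrNames[AttrDuration])
}
```

If a new constant is added without a table entry, its slot stays empty and the check fails at startup instead of producing a lookup that silently never matches.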

Go Clean-Room Demonstration

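Below is a clean-room demonstration in Go that reconstructs the design. One caveat: Go substitutes string constants at compile time, so AttrDuration and the literal "MediaDuration" produce the same string-keyed map lookup. The two timings should therefore come out roughly equal; what the constants buy in this demo is compile-time checking of attribute names, not a faster comparison.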

package main

import (
	"fmt"
	"time"
)

// Attribute name constants - corresponds to C++ static constexpr TStringBuf
const (
	AttrVideoID  = "videoid"
	AttrAuthorID = "authorid"
	AttrDuration = "MediaDuration"
	AttrViews    = "views"
	AttrCategory = "category"
	AttrSerialID = "serial"
	// ... more attributes
)

// VideoDoc simulates a video document
type VideoDoc struct {
	attrs map[string]string
}

func NewVideoDoc() *VideoDoc {
	return &VideoDoc{
		attrs: make(map[string]string),
	}
}

func (d *VideoDoc) SetAttr(key, value string) {
	d.attrs[key] = value
}

func (d *VideoDoc) GetAttr(key string) (string, bool) {
	v, ok := d.attrs[key]
	return v, ok
}

// Method 1: String literal written at the call site (naive approach)
func getDurationNaive(doc *VideoDoc) string {
	if v, ok := doc.GetAttr("MediaDuration"); ok {
		return v
	}
	return ""
}

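// Method 2: Named constant (optimized approach). The key is still a string,
// so the map lookup itself is the same; a misspelled name now fails to compile.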
func getDurationOptimized(doc *VideoDoc) string {
	if v, ok := doc.GetAttr(AttrDuration); ok {
		return v
	}
	return ""
}

func main() {
	doc := NewVideoDoc()
	doc.SetAttr(AttrDuration, "3600")

	const iterations = 1000000

	// Rough wall-clock timing; for stable numbers, use a testing.B benchmark
	start := time.Now()
	for i := 0; i < iterations; i++ {
		_ = getDurationNaive(doc)
	}
	naiveTime := time.Since(start)

	start = time.Now()
	for i := 0; i < iterations; i++ {
		_ = getDurationOptimized(doc)
	}
	optimizedTime := time.Since(start)

	fmt.Printf("Naive approach: %v\n", naiveTime)
	fmt.Printf("Optimized approach: %v\n", optimizedTime)
}
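Because the demo above keys its map by string in both variants, it cannot show the effect of integer keys. The following sketch, offered as an assumption about how one might measure it, stores attributes in an array indexed by a hypothetical AttrID and compares that with a map[string]string lookup using testing.Benchmark from the standard library. IndexedDoc, IDDuration, IDViews, and sink are illustrative names; no timings are quoted because they depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// AttrID is a hypothetical integer attribute identifier.
type AttrID int

const (
	IDDuration AttrID = iota
	IDViews
	numAttrs // sentinel: number of attributes
)

// IndexedDoc stores attribute values in an array indexed by AttrID,
// so a lookup is an array index rather than a string hash and compare.
type IndexedDoc struct {
	attrs [numAttrs]string
	set   [numAttrs]bool
}

func (d *IndexedDoc) SetAttr(id AttrID, v string) {
	d.attrs[id] = v
	d.set[id] = true
}

func (d *IndexedDoc) GetAttr(id AttrID) (string, bool) {
	return d.attrs[id], d.set[id]
}

// sink keeps the compiler from discarding the benchmark loop bodies.
var sink string

func main() {
	byName := map[string]string{"MediaDuration": "3600"}
	byID := &IndexedDoc{}
	byID.SetAttr(IDDuration, "3600")

	mapRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = byName["MediaDuration"]
		}
	})
	arrRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink, _ = byID.GetAttr(IDDuration)
		}
	})

	fmt.Println("map[string] lookup:", mapRes)
	fmt.Println("array-by-ID lookup:", arrRes)
}
```

The array layout only works when the attribute set is fixed and known at compile time, which is the same assumption the constant pool makes.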

Summary

This article provides an in-depth analysis of string constant pool design for video attribute lookup, exploring the following core trade-offs:

Compile-time vs Runtime: Constants are resolved at compile time, so defining them adds no runtime overhead
Space vs Time: Trade code space for runtime performance
Development Efficiency vs Extreme Performance: Constant definitions require extra maintenance, but in return they give better performance and readability

This design pattern is very common in large-scale systems. Understanding the trade-offs behind it is crucial for designing high-performance systems.