Building Hybrid Search for Static Blogs: BM25 + Vector Search with Cloudflare Workers
Static site generators are great until you need search. Most solutions either require a backend server or rely on basic client-side text matching. This post walks through building a hybrid search system that combines instant BM25 keyword matching with semantic vector search - all while keeping your site fully static.
The Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Build Time │ │ Cloudflare │ │ Browser │
│ │ │ Worker │ │ │
│ Markdown │ │ │ │ ┌───────────┐ │
│ Posts │ │ /api/embedding │ │ │ Keyword │ │
│ ↓ │ │ ↓ │ │ │ Search │ │
│ ┌─────────┐ │ │ OpenAI API │ │ │ (instant) │ │
│ │ Index │ │ │ (protected) │ │ └─────┬─────┘ │
│ │ Builder │ │ │ ↓ │ │ │ │
│ └────┬────┘ │ │ 512-dim vector │ │ ▼ │
│ │ │ │ │ │ ┌───────────┐ │
│ ▼ │ └──────────────────┘ │ │ Merge │ │
│ search.dat │◄────────────────────────────►│ │ Results │ │
│ inverted.json │ │ └───────────┘ │
│ metadata.json │ │ ▲ │
│ │ │ ┌─────┴─────┐ │
└─────────────────┘ │ │ Vector │ │
│ │ Search │ │
│ │ (Voy) │ │
│ └───────────┘ │
└─────────────────┘
Key design decisions:
- No backend required - everything runs in the browser or edge
- API key protection - OpenAI calls go through Cloudflare Worker
- Progressive UX - keyword results appear instantly, AI results merge in
- Hybrid ranking - Reciprocal Rank Fusion combines both result sets
Part 1: Building the Search Index
At build time, we generate three files:
| File | Purpose | Size |
|---|---|---|
| search.dat | Voy vector index | ~50KB |
| search-inverted.json | BM25 inverted index | ~10KB |
| search-metadata.json | Title, date, preview | ~5KB |
The Index Builder
// scripts/index-builder.mjs
import { Voy } from 'voy-search/voy_search.js';
const CONFIG = {
postsDir: 'blog/posts',
dimensions: 512, // text-embedding-3-small supports 512
model: 'text-embedding-3-small',
chunkSize: 500,
};
// Tokenizer for both Chinese and English.
// STOPWORDS is assumed to be a Set of common words to drop ('the', 'and', ...)
function tokenize(text) {
  const normalized = text.toLowerCase().trim();
  const tokens = normalized.match(/[\u4e00-\u9fff]+|[a-z0-9]+/g) || [];
  const result = [];
  for (const token of tokens) {
    if (/[\u4e00-\u9fff]/.test(token)) {
      // Chinese: unigrams + bigrams
      for (let i = 0; i < token.length; i++) {
        result.push(token[i]);
        if (i < token.length - 1) {
          result.push(token.slice(i, i + 2));
        }
      }
    } else if (token.length >= 2) {
      result.push(token);
    }
  }
  // Keep Chinese unigrams (length 1); the >= 2 check applies only to Latin tokens
  return result.filter(
    t => (/[\u4e00-\u9fff]/.test(t) || t.length >= 2) && !STOPWORDS.has(t)
  );
}
// Build inverted index for BM25
function buildInvertedIndex(documents) {
const index = {};
const docLengths = {};
for (const doc of documents) {
const tokens = tokenize(doc.title + ' ' + doc.content);
const termFreq = {};
for (const token of tokens) {
termFreq[token] = (termFreq[token] || 0) + 1;
}
for (const [term, freq] of Object.entries(termFreq)) {
if (!index[term]) index[term] = [];
index[term].push({ id: doc.url, tf: freq });
}
docLengths[doc.url] = tokens.length;
  }
  // Corpus-wide stats the BM25 formula needs at query time
  const docCount = documents.length;
  const avgDocLength =
    Object.values(docLengths).reduce((a, b) => a + b, 0) / docCount;
  return { index, docLengths, avgDocLength, docCount };
}
The bilingual tokenizer is crucial - it handles:
- English: standard word tokenization with stopword removal
- Chinese: character unigrams + bigrams (no word segmentation library needed)
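For example, here is what the tokenizer produces for a mixed Chinese/English query (assuming none of the tokens are stopwords):

tokenize('Docker 容器安全');
// → ['docker', '容', '容器', '器', '器安', '安', '安全', '全']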
Getting Embeddings from OpenAI
async function getEmbeddings(texts) {
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: texts,
dimensions: 512, // A third the storage of the default 1536
}),
});
return (await response.json()).data.map(item => item.embedding);
}
Why 512 dimensions? OpenAI's text-embedding-3-small supports dimension reduction. Using 512 instead of 1536 cuts index size by ~65% with minimal quality loss.
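With the vectors in hand, the build script writes search.dat. A minimal sketch - the resource shape follows the voy-search README, while the url field and output path are assumptions:

import fs from 'node:fs';
// Pair each document chunk with its 512-dim vector, then serialize.
const resource = {
  embeddings: documents.map((doc, i) => ({
    id: String(i),
    title: doc.title,
    url: doc.url, // carried through so results can link back to the post
    embeddings: embeddings[i],
  })),
};
const voyIndex = new Voy(resource); // Voy imported at the top of the script
fs.writeFileSync('dist/search.dat', voyIndex.serialize());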
Embedding Cache for Incremental Updates
Calling OpenAI API on every rebuild is slow and expensive. Content hashing enables incremental updates:
import crypto from 'crypto';
// Compute content hash
function contentHash(text) {
return crypto.createHash('md5').update(text).digest('hex');
}
// Cache format: { [chunkId]: { hash, embedding } }
const cache = loadCache('.cache/embeddings.json');
const embeddings = new Array(documents.length);
const toEmbed = [];
documents.forEach((doc, i) => {
  const hash = contentHash(doc.embeddingText);
  if (cache[doc.id]?.hash === hash) {
    // Cache hit - reuse the stored embedding, skip the API call
    embeddings[i] = cache[doc.id].embedding;
  } else {
    // Content changed - needs re-embedding
    toEmbed.push({ index: i, text: doc.embeddingText, hash, id: doc.id });
  }
});
// Only call API for new/changed documents
const newEmbeddings = await batchGetEmbeddings(toEmbed.map(t => t.text));
Result: First build requires full embedding (~93 batches), but subsequent updates only process changed articles. Adding one new post needs just 1-2 API calls instead of re-processing all 135 articles.
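batchGetEmbeddings can be a thin wrapper around the getEmbeddings function shown earlier; a minimal sketch, where the batch size of 16 is an assumption chosen to stay under OpenAI's per-request token limit:

// Hypothetical batching helper: reuses getEmbeddings() from above.
async function batchGetEmbeddings(texts, batchSize = 16) {
  const all = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    all.push(...await getEmbeddings(texts.slice(i, i + batchSize)));
  }
  return all;
}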
Part 2: The Cloudflare Worker
The Worker acts as a secure proxy, keeping your API key hidden from the browser.
// workers/embed-worker.js
export default {
async fetch(request, env) {
const corsHeaders = {
"Access-Control-Allow-Origin": "https://yuxu.ge",
"Access-Control-Allow-Methods": "POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type",
};
if (request.method === "OPTIONS") {
return new Response(null, { headers: corsHeaders });
}
if (request.method !== "POST") {
return new Response("Method Not Allowed", { status: 405 });
}
try {
const { text } = await request.json();
const response = await fetch("https://api.openai.com/v1/embeddings", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
input: text,
model: "text-embedding-3-small",
dimensions: 512,
}),
});
const data = await response.json();
return new Response(
JSON.stringify({ embedding: data.data[0].embedding }),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
);
} catch (err) {
return new Response(
JSON.stringify({ error: err.message }),
{ status: 500, headers: corsHeaders }
);
}
},
};
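One optional hardening step, not part of the Worker above: validate the input inside the try block before forwarding it, so a scripted caller can't run up your OpenAI bill. The 500-character cap is an arbitrary assumption:

// Goes right after `const { text } = await request.json();`
if (typeof text !== 'string' || text.length > 500) {
  return new Response(JSON.stringify({ error: 'Invalid input' }), {
    status: 400,
    headers: { ...corsHeaders, 'Content-Type': 'application/json' },
  });
}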
Cloudflare setup:
- Create a Worker at api-embedding.your-worker.workers.dev
- Add route: yuxu.ge/api/* → your Worker
- Set environment variable: OPENAI_API_KEY
Now your frontend calls /api/embedding and never sees the API key.
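The call itself is a small fetch wrapper; a minimal sketch of the helper the Part 3 search client uses as this.getVector() (error handling omitted):

// Ask the Worker - never OpenAI directly - for a 512-dim query vector.
async function getVector(text) {
  const res = await fetch('/api/embedding', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  const { embedding } = await res.json();
  return embedding;
}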
Part 3: The Hybrid Search Client
The magic happens in the browser. We run two searches in parallel:
// components/search-client.js
export class SearchClient {
async init() {
// Load all indexes in parallel
const [voyModule, indexRes, metaRes, invertedRes] = await Promise.all([
import('/lib/voy-loader.js').then(m => m.getVoy()),
fetch('/search.dat'),
fetch('/search-metadata.json'),
fetch('/search-inverted.json'),
]);
this.voy = voyModule.deserialize(await indexRes.text());
this.metadata = await metaRes.json();
this.invertedIndex = await invertedRes.json();
}
// BM25 keyword search - instant, no API call
keywordSearch(query, limit = 10) {
const tokens = tokenize(query);
    // Destructure using the field names buildInvertedIndex() returns
    const { index: idx, docLengths: dl, avgDocLength: avgDL, docCount: N } =
      this.invertedIndex;
const scores = {};
for (const term of tokens) {
const postings = idx[term];
if (!postings) continue;
const df = postings.length;
const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
for (const { id, tf } of postings) {
const docLen = dl[id] || avgDL;
const k1 = 1.2, b = 0.75;
const tfNorm = (tf * (k1 + 1)) /
(tf + k1 * (1 - b + b * docLen / avgDL));
scores[id] = (scores[id] || 0) + idf * tfNorm;
}
}
return Object.entries(scores)
.sort((a, b) => b[1] - a[1])
.slice(0, limit);
}
// Semantic search - requires API call
async semanticSearch(query, limit = 10) {
const embedding = await this.getVector(query);
const results = this.voy.search(embedding, limit * 3);
// Deduplicate by URL...
return deduplicated;
}
// Hybrid search with Reciprocal Rank Fusion
async search(query, limit = 5) {
const [keywordResults, semanticResults] = await Promise.all([
this.keywordSearch(query, limit * 2),
this.semanticSearch(query, limit * 2),
]);
    // RRF merging - rank is a document's position in each ranked list
    const rrfScores = {};
    const k = 60;
    keywordResults.forEach(([url], rank) => {
      rrfScores[url] = (rrfScores[url] || 0) +
        0.4 / (k + rank + 1); // 40% weight
    });
    semanticResults.forEach((r, rank) => {
      rrfScores[r.url] = (rrfScores[r.url] || 0) +
        0.6 / (k + rank + 1); // 60% weight
    });
return Object.entries(rrfScores)
.sort((a, b) => b[1] - a[1])
.slice(0, limit);
}
}
Why Reciprocal Rank Fusion?
RRF is elegantly simple: score = Σ w / (k + rank) across retrievers, where rank is a document's position in each result list. It requires no score normalization, so the incompatible scales of BM25 scores and vector distances never become a problem.
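A quick worked example with the weights used above:

// A post ranked #1 by BM25 and #3 by vector search (k = 60):
0.4 / (60 + 1) + 0.6 / (60 + 3); // ≈ 0.0161
// A post found only by BM25, at #1:
0.4 / (60 + 1); // ≈ 0.0066
// Agreement between the two retrievers beats a single strong match.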
Part 4: Progressive UX
The key insight: keyword search is instant, semantic search needs an API call. We show keyword results immediately, then merge in AI results with animation:
async function handleSearch(query) {
  // 1. Instant keyword results
  const keywordResults = client.keywordSearch(query);
  showResults(query, keywordResults, { showAiLoading: true });
  // 2. AI search runs in the background
  const semanticResults = await client.semanticSearch(query);
  // 3. Merge with animation
  mergeResults(semanticResults, keywordResults);
}
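mergeResults can be as simple as appending the rows that aren't already on screen and tagging them so the CSS below animates them in. A sketch - renderResultItem() and resultsEl are hypothetical DOM helpers:

function mergeResults(semanticResults, keywordResults) {
  const alreadyShown = new Set(keywordResults.map(([url]) => url));
  for (const r of semanticResults) {
    if (alreadyShown.has(r.url)) continue; // row exists; just add a badge
    const el = renderResultItem(r); // hypothetical row renderer
    el.classList.add('new'); // triggers the slideIn animation below
    resultsEl.appendChild(el);
  }
}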
The CSS animations make the experience feel polished:
@keyframes slideIn {
from { opacity: 0; transform: translateY(-10px); }
to { opacity: 1; transform: translateY(0); }
}
.search-result-item.new {
animation: slideIn 0.3s ease-out;
}
Each result shows badges indicating its source:
- keyword (blue) - matched via BM25
- AI (pink) - matched via vector similarity
- Both badges if found by both methods
Performance Characteristics
| Metric | Keyword Search | Semantic Search |
|---|---|---|
| Latency | <10ms | 200-500ms |
| API calls | 0 | 1 |
| Index size | ~10KB | ~50KB |
| Ranking | BM25 (exact match) | Cosine similarity |
The hybrid approach gives you the best of both worlds:
- Exact matches via keyword search (searching "Docker" finds "Docker" immediately)
- Semantic matches via vectors (searching "container security" finds Docker articles)
Deployment Checklist
- Build indexes: OPENAI_API_KEY=sk-xxx node scripts/index-builder.mjs
- Deploy Worker: set up the Cloudflare Worker with route mapping
- Upload static files: search.dat, search-inverted.json, search-metadata.json
- Test: open the browser console and run the smoke test below
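The smoke test can be as simple as this, run in the browser console (the import path is an assumption):

const { SearchClient } = await import('/components/search-client.js');
const client = new SearchClient();
await client.init();
console.table(await client.search('docker'));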
Conclusion
This architecture proves you don't need a backend server for sophisticated search. By splitting the work between build-time indexing, edge compute (Cloudflare Worker), and browser-side retrieval (Voy WASM), we get:
- Security: API keys never touch the browser
- Speed: Keyword results appear instantly
- Quality: Semantic search understands intent
- Cost: Only pay for actual search queries, not server uptime
The full source code is available at github.com/geyuxu/yuxu.ge.
Built with: OpenAI text-embedding-3-small, Voy WASM, Cloudflare Workers, vanilla JavaScript