Building a RAG-Powered AI Assistant for My Personal Website

I recently added an AI chat assistant to my personal website that can answer questions about my blog posts, projects, and background. This post documents the implementation journey, from architecture design to security considerations.

The Goal

Create a floating chat widget that:

Answers questions about blog content using RAG (Retrieval-Augmented Generation)
Provides friendly technical discussions
Handles sensitive topics gracefully
Works entirely on static hosting (GitHub Pages) with Cloudflare Workers as the API layer

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Browser (Frontend)                        │
├─────────────────────────────────────────────────────────────────┤
│  Chat Widget ──────▶ Search Client                              │
│       │              ├─ BM25 Keyword Search (local)             │
│       │              └─ Voy WASM Semantic Search                │
│       │                        │                                │
│       │◀───────────────────────┘ (Top 3 chunks as context)      │
└───────┼─────────────────────────────────────────────────────────┘
        │ POST /api/chat { messages, context }
        ▼
┌─────────────────────────────────────────────────────────────────┐
│              Cloudflare Worker (yuxu.ge/api/*)                  │
│  ┌─────────────────┐        ┌─────────────────┐                 │
│  │ /api/embedding  │        │   /api/chat     │                 │
│  │ (query vector)  │        │ system_prompt   │                 │
│  └────────┬────────┘        │ + RAG context   │                 │
│           │                 └────────┬────────┘                 │
└───────────┼──────────────────────────┼──────────────────────────┘
            │                          │
            ▼                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                         OpenAI API                              │
│  text-embedding-3-small (512d)    gpt-4o-mini                   │
└─────────────────────────────────────────────────────────────────┘

Implementation

1. Chat Widget (Frontend)

The chat widget is a self-contained JavaScript module that creates a floating bubble UI:

export class ChatWidget {
    constructor() {
        this.messages = [];
        this.isOpen = false;
        this.searchClient = null;
    }

    async sendMessage(text) {
        // Get RAG context from search client
        let context = '';
        if (this.searchClient?.isReady()) {
            const results = await this.searchClient.search(text, 3);
            context = results.map(r => `[${r.title}]\n${r.text}`).join('\n\n');
        }

        // Call chat API with context
        const response = await fetch('https://yuxu.ge/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                messages: this.messages,
                context,
            }),
        });

        const data = await response.json();
        return data.reply;
    }
}

Key features:

Markdown rendering: Converts **bold** and [links](url) to HTML
CSS-in-JS: All styles are injected dynamically, no external dependencies
ESC to close: Keyboard accessibility
Mobile responsive: Adapts to screen size

2. Hybrid Search for RAG

I already had a search system built with:

BM25 keyword search: Local inverted index for exact term matching
Voy WASM semantic search: Vector similarity using pre-computed embeddings
RRF fusion: Combines both results using Reciprocal Rank Fusion

The chat widget reuses this existing infrastructure:

const [keywordResults, semanticResults] = await Promise.all([
    this.keywordSearch(query, limit * 2),
    this.semanticSearch(query, limit * 2),
]);

// Merge using RRF
for (const result of keywordResults) {
    rrfScores[result.url] = keywordWeight / (k + result.rank);
}
for (const result of semanticResults) {
    rrfScores[result.url] += semanticWeight / (k + result.rank);
}

3. Cloudflare Worker (API Proxy)

The Worker handles two endpoints:

/api/embedding - Converts query text to vector for semantic search:

const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
        'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
        dimensions: 512,
    }),
});

/api/chat - Chat completion with RAG context:

const systemMessage = context
    ? `${env.system_prompt}\n\n## Related blog content:\n${context}`
    : env.system_prompt;

const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${env.OPENAI_API_KEY}` },
    body: JSON.stringify({
        model: 'gpt-4o-mini',
        messages: [
            { role: 'system', content: systemMessage },
            ...messages,
        ],
    }),
});

4. Security: Handling Sensitive Topics

For a personal website, I needed the assistant to:

Freely discuss the website owner (me)
Refuse political/sensitive topics gracefully

This is handled in the system prompt (stored as environment variable):

## About the website owner
Yuxu Ge is the website owner. You can freely discuss:
- Professional background, technical experience, projects
- Blog content, technical opinions
- Public information (education, work history)

## Conversation boundaries
Politely decline and redirect for:
- Political figures, government policies, geopolitical disputes
- Religious or ideological debates

When declining: "This is beyond my scope as a technical assistant.
Shall we discuss something technical instead?"

## Public contact info (can share)
- Email: [email protected]
- GitHub: https://github.com/geyuxu
- LinkedIn: https://linkedin.com/in/yuxuge

5. Conversation History with localStorage

For better UX, the chat widget persists conversation history in the browser's localStorage:

const CHAT_CONFIG = {
    storageKey: 'chat_history',
    historyTTL: 24 * 60 * 60 * 1000, // 24 hours
};

// Save after each successful response
saveHistory() {
    const data = {
        messages: this.messages.slice(-20),
        timestamp: Date.now(),
    };
    localStorage.setItem(CHAT_CONFIG.storageKey, JSON.stringify(data));
}

// Load on widget initialization
loadHistory() {
    const raw = localStorage.getItem(CHAT_CONFIG.storageKey);
    if (!raw) return;

    const data = JSON.parse(raw);

    // Check TTL expiration
    if (Date.now() - data.timestamp > CHAT_CONFIG.historyTTL) {
        localStorage.removeItem(CHAT_CONFIG.storageKey);
        return;
    }

    this.messages = data.messages;
    this.renderHistory();
}

Features:

Auto-save: Saves after each assistant response
Auto-restore: Restores conversation when page loads
24-hour TTL: Automatically clears stale conversations
Clear button: Manual clear via trash icon in header

This is a pragmatic choice over server-side storage (Cloudflare KV) because:

Most visitors have one-time conversations
No user identification needed
Zero additional cost
Simpler implementation

Lessons Learned

Environment variables > hardcoded prompts: Storing system_prompt as a Cloudflare environment variable allows prompt iteration without code deployment.
Reuse existing search infrastructure: Building RAG on top of an existing hybrid search system saved significant effort.
Nuanced content filtering: Initial attempts at sensitive topic filtering were too aggressive (blocking questions about myself). The key is explicitly whitelisting allowed topics.
Markdown in chat: Simple regex-based markdown parsing is sufficient for bold and links. No need for heavy libraries.
Start simple with state management: localStorage is sufficient for conversation history. Server-side storage (KV) adds complexity without clear benefit for a personal site.

Cost Analysis

With gpt-4o-mini and text-embedding-3-small:

Embedding: ~$0.00002 per query
Chat: ~$0.0001-0.0005 per response (depending on context length)
Estimated monthly cost: < $5 for moderate traffic

Next Steps

Add streaming responses for better UX
~~Implement conversation memory~~ (Done with localStorage)
Add usage analytics
Support image understanding for blog screenshots

The complete implementation is open source in my website repository. Feel free to adapt it for your own projects!