Full-Stack Architecture Upgrade: From Free Tier to Cloudflare's $5 Production Plan
My personal website yuxu.ge is more than just a blog: it's a playground for technical exploration, hosting a photo gallery, an AI assistant, semantic search, a graffiti-style comment wall, and more on top of the blog itself. Initially, everything ran on Cloudflare's generous free tier, but as features grew more complex, the limitations of that free lunch became increasingly apparent.
Recently, I decided to upgrade to Cloudflare Workers' $5/month paid plan. This isn't just about spending five dollars more each month—it unlocks a powerful suite of production-grade tools, giving me the opportunity to completely restructure and upgrade the entire website architecture. This blog post documents the complete journey from "toy" to "production-grade" platform.
1. Pre-Upgrade Architecture: Dancing in Free-Tier Chains
Under the free tier, I pushed limited resources to their limits, building what appeared to be a fully-featured system.
Architecture Overview (Free Version)
┌────────────────────────────────┐
│ yuxu.ge User │
└────────────────────────────────┘
│
│ HTTPS
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare Edge │
│ │
│ ┌────────────────┐ ┌─────────────────┐ ┌───────────────┐ │
│ │ Static Assets │──▶│ Cloudflare Worker│──▶│ Cloudflare KV │ │
│ │ (GitHub Pages) │ └─────────────────┘ └───────────────┘ │
│ └────────────────┘ │ │ │
│ │ │ (Session) │
│ ▼ │ │
│ ┌────────────┐ │ │
│ │ OpenAI API │◀──────────────┘ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ (Browser Fallback)
▼
┌──────────────────────────┐
│ Browser │
│ ┌──────────────────────┐ │
│ │ ONNX Runtime (WASM) │ │
│ │ BGE-small-zh Model │ │
│ └──────────────────────┘ │
└──────────────────────────┘
Core System Analysis
AI Assistant & Semantic Search: The site's highlight. I implemented a RAG (Retrieval-Augmented Generation) system:
Server Mode: When users ask questions, the Worker calls OpenAI's text-embedding-3-small API to generate 512-dimensional vectors, then performs brute-force similarity search in KV (yes, iteration!), combined with BM25 keyword retrieval from a 2.6MB search-inverted.json inverted index. Results are fed as context to gpt-4o-mini for answer generation.
Browser Mode (Fallback): To save API calls and speed up cold starts, I implemented a pure client-side fallback using ONNX Runtime with the BGE-small-zh model (384 dimensions) for local vector computation.
Graffiti Comment System: Allows users to post comments anywhere on the page. Comments are stored in both static JSON files and KV. New comments go to KV; reads merge static JSON and KV data.
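The merge-on-read strategy can be sketched as a small pure function (field names like `createdAt` are illustrative, not the site's actual schema): the static JSON is the baseline, and KV entries override it on id collisions since they are newer.

```javascript
// Merge-on-read for comments: static JSON baseline + newer KV entries.
// Shapes are assumed for illustration; KV wins on id collisions.
function mergeComments(staticComments, kvComments) {
  const byId = new Map();
  for (const c of staticComments) byId.set(c.id, c);
  for (const c of kvComments) byId.set(c.id, c); // KV data is newer, so it overrides
  return [...byId.values()].sort((a, b) => a.createdAt - b.createdAt);
}
```

This keeps reads cheap, but as the next section shows, KV's eventual consistency makes the KV half of the merge unreliable right after a write.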
Authentication System: Simple OTP (One-Time Password) email verification. Sessions, users, and OTPs are all stored in KV, with HttpOnly cookies maintaining login state.
2. Bottlenecks and Pain Points: The "Original Sin" of Free Architecture
While this architecture worked, problems emerged as traffic and complexity grew.
Security Issues: Wide Open Doors
- No CSRF Protection: All POST requests lack CSRF token validation—a serious vulnerability.
- No Rate Limiting: Any API endpoint can be called unlimited times, potentially draining OpenAI credits or enabling attacks.
- OTP Brute Force Risk: A 6-digit numeric code has only one million possibilities; without rate limiting it can be brute-forced.
Data Consistency: Schrödinger's Comments
Cloudflare KV has "eventual consistency." A write at one data center takes time (typically tens of seconds) to sync globally. This caused bizarre issues:
- Disappearing Comments: User submits a comment, write goes to Singapore. Page refresh routes to Tokyo where data hasn't synced yet—comment "disappears," then "magically reappears" later.
- Position Reversion: Drag-and-drop graffiti repositioning frequently reverted after refresh due to KV delays.
Performance Issues: Every Deploy is a Major Operation
- Full Index Rebuilds: Every content update requires complete reconstruction of the 2.6MB BM25 index and all document vectors.
- Huge JSON Files: Loading and parsing the inverted index increases cold start time.
- Client Performance: A MutationObserver watching the entire <body> adds unnecessary overhead.
Missing Features: The Eternal TODO List
- Comment Management: No delete or edit functionality.
- Content Moderation: No automatic spam or inappropriate content filtering.
- Real-time Collaboration: No live updates when others interact with the graffiti wall.
3. Unlocking New Capabilities: The $5 Plan's Power Matrix
Five dollars monthly investment opens another door in the Cloudflare ecosystem:
| Capability | Free Tier Limit | Paid Plan ($5) Unlock | Problems Solved |
|---|---|---|---|
| CPU Time | 10ms (Bundled) | 15 minutes (Unbound) | Complex computation, incremental indexing, AI inference |
| D1 Database | Unavailable | Edge SQLite, strong consistency | Data consistency, complex queries, relational data |
| Durable Objects | Unavailable | Strong consistency, WebSocket, Actor model | Real-time collaboration, state management |
| Workers AI | Unavailable | Edge GPU inference (Embeddings, LLM, etc.) | AI cost, latency, content moderation |
| Vectorize | Unavailable | Native vector database | Vector search performance, scalability |
| Logpush | Unavailable | Log streaming to storage | Observability, debugging |
| KV Quotas | Lower | Higher read/write/list operation quotas | Handle higher traffic |
This isn't just quantitative change—it's qualitative transformation. I now have all the weapons needed to build a truly robust, scalable edge application.
4. Architecture Upgrade Blueprint: Four Phases to Production
I divided the upgrade into four phases, progressing steadily to ensure each step is solid.
New Architecture Overview (Paid Version)
┌────────────────────────────────┐
│ yuxu.ge User │
└────────────────────────────────┘
│
│ HTTPS / WebSocket
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare Edge │
│ │
│ ┌────────────────┐ ┌─────────────────┐ │
│ │ Static Assets │──▶│ Cloudflare Worker│──────────┐ │
│ │ (GitHub Pages) │ └─────────────────┘ │ │
│ └────────────────┘ │ │ │
│ │ ▼ │
│ ┌────────────┐◀──────────────┤ ┌──────────────┐ │
│ │ D1 │ │ │ Durable │ │
│ │ (Database) │ │ │ Objects │ │
│ └────────────┘ │ │ (Real-time │ │
│ │ │ Graffiti) │ │
│ ┌────────────┐◀──────────────┤ └──────────────┘ │
│ │ Vectorize │ │ │
│ │ (Vectors) │ │ │
│ └────────────┘ │ │
│ ▼ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Workers AI │◀──────│ KV │ │
│ │ (GPU) │ │(Cache/OTP) │ │
│ └────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Phase 1: Foundation - Security Hardening & D1 Migration
First, address core security and data consistency issues.
1. Security Hardening:
- CSRF Protection: Classic Double Submit Cookie pattern. On login, the Worker generates a CSRF token stored both in the HttpOnly session cookie and in a regular cookie the frontend can read; POST requests then include it in the X-CSRF-Token header.
- Rate Limiting: KV-based sliding window counter keyed by IP address.
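Both checks reduce to small pure functions. A sketch under assumed shapes (key names, window, and limit are illustrative):

```javascript
// Constant-time-ish comparison for the Double Submit Cookie check:
// the X-CSRF-Token header value must equal the token bound to the session.
function tokensMatch(headerToken, sessionToken) {
  if (typeof headerToken !== 'string' || headerToken.length !== sessionToken.length) return false;
  let diff = 0;
  for (let i = 0; i < headerToken.length; i++) {
    diff |= headerToken.charCodeAt(i) ^ sessionToken.charCodeAt(i);
  }
  return diff === 0;
}

// Sliding-window limiter: keep recent request timestamps, drop those outside
// the window, and refuse once the count reaches the limit.
function slidingWindowCheck(timestamps, now, windowMs, limit) {
  const recent = timestamps.filter(t => now - t < windowMs);
  if (recent.length >= limit) return { allowed: false, timestamps: recent };
  recent.push(now);
  return { allowed: true, timestamps: recent };
}

// In the Worker the timestamps would live in KV under a per-IP key, e.g.:
//   const key = `rate:${request.headers.get('CF-Connecting-IP')}`;
//   const prior = JSON.parse(await env.KV.get(key) || '[]');
//   const { allowed, timestamps } = slidingWindowCheck(prior, Date.now(), 60_000, 20);
//   await env.KV.put(key, JSON.stringify(timestamps), { expirationTtl: 60 });
```

Note that KV's eventual consistency makes this limiter approximate across locations, which is acceptable for abuse throttling.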
2. D1 Migration:
D1 is Cloudflare's edge SQLite database with strong consistency:
-- Users table
CREATE TABLE users (
id TEXT PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
nickname TEXT,
created_at INTEGER DEFAULT (strftime('%s', 'now'))
);
-- Sessions table
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
csrf_token TEXT NOT NULL,
expires_at INTEGER NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id)
);
-- Comments table
CREATE TABLE comments (
id TEXT PRIMARY KEY,
page_url TEXT NOT NULL,
content TEXT NOT NULL,
user_id TEXT NOT NULL,
anchor_id TEXT,
anchor_type TEXT,
anchor_offset_x REAL,
anchor_offset_y REAL,
rotation REAL,
status TEXT DEFAULT 'pending',
created_at INTEGER DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (user_id) REFERENCES users(id)
);
CREATE INDEX idx_comments_page ON comments(page_url);
CREATE INDEX idx_comments_status ON comments(status);
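With this schema in place, the old static-JSON-plus-KV merge collapses into a single strongly consistent query. A sketch (the `DB` binding name and the helper's shape are assumptions, not the site's exact code):

```javascript
// Fetch approved comments for one page from D1, in creation order.
// env.DB is the Worker's D1 binding; ?1 is a positional bind parameter.
async function listComments(env, pageUrl) {
  const { results } = await env.DB.prepare(
    `SELECT id, content, user_id, anchor_id, anchor_offset_x, anchor_offset_y, rotation, created_at
       FROM comments
      WHERE page_url = ?1 AND status = 'approved'
      ORDER BY created_at ASC`
  ).bind(pageUrl).all();
  return results;
}
```

Because D1 is strongly consistent, a comment written in one request is visible to the very next read, eliminating the "Schrödinger's comments" problem by construction.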
Phase 2: Search Revolution - Embracing Vectorize and Workers AI
Restructure the entire search system with Cloudflare's native AI services.
Vectorize Replaces KV Storage:
// Insert vectors into Vectorize
const vectors = [{
id: 'post_1_chunk_1',
values: embedding, // from Workers AI
metadata: { postId: 'post_1', text: '...' }
}];
await env.VECTOR_INDEX.insert(vectors);
// Query similar vectors
const results = await env.VECTOR_INDEX.query(queryVector, { topK: 5 });
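Putting embedding and retrieval together, query time looks roughly like this (binding names `AI` and `VECTOR_INDEX` follow the snippets in this section; the bge-m3 response shape is assumed to match the bge family's `{ data: [...] }`):

```javascript
// Embed the user's query on the edge, then find the nearest indexed chunks.
async function semanticSearch(env, query, topK = 5) {
  // Assumed response shape: { data: [[...1024 floats]] } for a single input
  const { data } = await env.AI.run('@cf/baai/bge-m3', { text: [query] });
  const result = await env.VECTOR_INDEX.query(data[0], { topK, returnMetadata: true });
  // Flatten matches into { id, score, ...metadata } for the RAG prompt builder
  return result.matches.map(m => ({ id: m.id, score: m.score, ...m.metadata }));
}
```

The brute-force KV iteration is gone: Vectorize does the nearest-neighbor work natively.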
Workers AI Replaces OpenAI Embeddings - Recommended: bge-m3
Among hosted alternatives, BAAI's bge-m3 on Cloudflare Workers AI is the top choice:
- Model ID: @cf/baai/bge-m3
- Multilingual Champion: Unlike OpenAI's English-centric models, bge-m3 excels in Chinese (and 100+ other languages), perfect for mixed-language environments
- Long Context: Supports 8192-token input, handling long paper abstracts with ease
- Dimensions: Outputs 1024-dimensional vectors
⚠️ Migration Warning: OpenAI's text-embedding-ada-002 and 3-small default to 1536 dimensions. You cannot mix them directly. Switching models requires re-indexing your vector database.
Cloudflare provides an OpenAI-compatible interface—no code logic changes needed:
import os

from openai import OpenAI

CF_ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]  # your Cloudflare account ID

client = OpenAI(
    api_key=os.environ.get("CLOUDFLARE_API_TOKEN"),
    base_url=f"https://api.cloudflare.com/client/v4/accounts/{CF_ACCOUNT_ID}/ai/v1"
)
# Code stays the same, just change model name
response = client.embeddings.create(
model="@cf/baai/bge-m3", # Replace text-embedding-3-small
input=["User intends to apply for a PhD in AI Agents.", "用户想申请人工智能博士"]
)
# Note: embedding length is 1024, update database schema accordingly
print(len(response.data[0].embedding)) # 1024
Benefits:
- Latency drops from seconds to hundreds of milliseconds
- Significant cost reduction (Workers AI is nearly free)
- Data stays entirely within Cloudflare ecosystem
- Better Chinese semantic understanding than OpenAI
- Incremental indexing support
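The incremental indexing point deserves a sketch: instead of rebuilding everything on deploy, hash each content chunk and diff against the previous build, so only changed chunks are re-embedded and upserted into Vectorize (shapes below are illustrative, not the deployed pipeline):

```javascript
// Compare previous content hashes with the current chunks to decide
// which vectors to (re-)embed and upsert, and which stale ids to delete.
function diffChunks(prevHashes, currentChunks) {
  const upserts = currentChunks.filter(c => prevHashes[c.id] !== c.hash);
  const currentIds = new Set(currentChunks.map(c => c.id));
  const deletes = Object.keys(prevHashes).filter(id => !currentIds.has(id));
  return { upserts, deletes };
}
```

For a site where most deploys touch one or two posts, this turns a full re-embedding pass into a handful of Workers AI calls.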
Smart Image Tagging - Llama 3.2 Vision
Don't use outdated resnet-50 (it only outputs rigid ImageNet classifications like "Egyptian cat"). With the Paid plan, use Vision-Language Models (VLM):
- Recommended Model: @cf/meta/llama-3.2-11b-vision-instruct
- Capabilities: Goes beyond object recognition to understand scenes, text (OCR), and relationships between objects
// Smart Image Tagger Worker Example
export default {
async fetch(request, env) {
const imageUrl = "https://example.com/your-image.jpg";
const imageRes = await fetch(imageUrl);
const imageBuffer = await imageRes.arrayBuffer();
const imageArray = [...new Uint8Array(imageBuffer)];
const response = await env.AI.run(
"@cf/meta/llama-3.2-11b-vision-instruct",
{
prompt: "Analyze this image and provide 5-10 relevant tags. Output ONLY a JSON array of strings.",
image: imageArray
}
);
return new Response(JSON.stringify(response));
}
};
Comparison:
- ResNet Output: {"label": "notebook", "score": 0.9} (too rigid)
- Llama 3.2 Vision Output: ["personal knowledge management", "obsidian software", "digital garden", "graph view", "productivity"] (it understands the Obsidian interface in your screenshot!)
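One practical wrinkle: despite the "Output ONLY a JSON array" instruction, VLMs sometimes wrap the array in extra prose, so the tagger should defensively extract the first JSON array from the response. An illustrative helper (not part of the original Worker):

```javascript
// Pull the first JSON array of strings out of free-form model output.
// Falls back to an empty list if nothing parseable is found.
function extractTags(modelOutput) {
  const match = modelOutput.match(/\[[\s\S]*?\]/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed) ? parsed.filter(t => typeof t === 'string') : [];
  } catch {
    return [];
  }
}
```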
Phase 3: Bringing Life - Durable Objects for Real-time Graffiti Wall
Durable Objects (DO) are strongly-consistent, stateful Worker instances perfect for real-time collaboration.
Create a DO instance for each commentable page:
export class DoodleWall {
state: DurableObjectState;
sessions: WebSocket[] = [];
constructor(state: DurableObjectState) {
this.state = state;
}
  async fetch(request: Request) {
    // Upgrade to WebSocket: one end goes back to the client, one stays server-side
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();
    this.sessions.push(server);
    // Load persisted doodles and send to the new client
    const doodles = (await this.state.storage.get<Record<string, any>>('doodles')) || {};
    server.send(JSON.stringify({ type: 'INIT', payload: doodles }));
    server.addEventListener('message', async (msg) => {
      const data = JSON.parse(msg.data as string);
      // Merge the update into persisted state
      const current = (await this.state.storage.get<Record<string, any>>('doodles')) || {};
      await this.state.storage.put('doodles', { ...current, [data.id]: data });
      // Broadcast to all connected clients
      this.broadcast(JSON.stringify({ type: 'UPDATE', payload: data }));
    });
    // Drop the session when the client disconnects
    server.addEventListener('close', () => {
      this.sessions = this.sessions.filter(s => s !== server);
    });
    return new Response(null, { status: 101, webSocket: client });
  }
broadcast(message: string) {
this.sessions.forEach(session => {
try { session.send(message); } catch (e) {}
});
}
}
Now when one user drags a doodle, everyone else viewing that page sees the position change in real time!
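On the client, the wall can treat the socket as a stream of INIT/UPDATE messages (mirroring the server's message types above) and fold them into local state. A browser-side sketch; the endpoint path is illustrative:

```javascript
// Pure reducer: apply one wall message to the local doodle map.
function applyWallMessage(doodles, message) {
  switch (message.type) {
    case 'INIT':   // full snapshot from the Durable Object on connect
      return { ...message.payload };
    case 'UPDATE': // a single doodle added or moved by some client
      return { ...doodles, [message.payload.id]: message.payload };
    default:
      return doodles;
  }
}

// Wiring in the browser (hypothetical endpoint):
//   let state = {};
//   const ws = new WebSocket(`wss://yuxu.ge/api/wall?page=${location.pathname}`);
//   ws.onmessage = (e) => { state = applyWallMessage(state, JSON.parse(e.data)); render(state); };
```

Keeping the reducer pure makes the render path trivial to test and keeps the WebSocket glue to a couple of lines.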
Phase 4: Intelligence - Deep Workers AI Integration
Content Moderation: Call Workers AI before saving comments to D1:
const aiResult = await env.AI.run('@cf/meta/llama-2-7b-chat-fp16', {
  prompt: `Is the following comment spam, hateful, or inappropriate?
Answer with only "safe" or "unsafe".
Comment: "${commentText}"`
});
// Text-generation models on Workers AI return { response: string }
if (aiResult.response.includes('unsafe')) {
  comment.status = 'rejected';
}
Sentiment Analysis: Analyze comment sentiment to display different colors or emojis based on mood.
5. Cost Analysis: Is $5 Really Enough?
The most common question. Answer: Absolutely sufficient.
| Resource | Free Quota | Estimated Usage | Extra Cost |
|---|---|---|---|
| Base Fee | - | - | $5/month |
| D1 Database | 500M reads/5M writes/1GB | Far below limits | $0 |
| Durable Objects | Pay-as-you-go | Only when users online | $1-3/month |
| Workers AI | Per neuron | ~$0.0001/comment | <$1/month |
| Vectorize | Free beta | - | $0 |
| Total | | | $6-9/month |
This investment brings system stability, scalability, and unlimited possibilities—excellent value.
Conclusion: From "Working" to "Well-Built"
This architecture upgrade marks a significant milestone in my personal project development journey. It's not just a tech stack update—it's a transformation from "hacker" thinking to "engineer" thinking.
- Farewell to Compromises: No more complex compensation logic for eventual consistency, no more security vulnerabilities, no more slow builds.
- Embrace Native Services: Deep integration with Cloudflare's native services creates a highly cohesive, low-latency, maintainable system.
- Future-Ready: The new architecture provides a solid foundation for future features like real-time collaboration and smarter AI applications.
For developers who love tinkering with personal projects, the Cloudflare Workers ecosystem offers a smooth growth path. Start free to validate ideas, then upgrade to world-class edge computing infrastructure for the price of a lunch when your project matures.
This is yuxu.ge's next chapter.