Full-Stack Architecture Upgrade: From Free Tier to Cloudflare's $5 Production Plan
My personal website yuxu.ge is more than just a blog: it's a playground for technical exploration, hosting a photo gallery, an AI assistant, semantic search, a graffiti-style comment wall, and more on top of the blog itself. Initially, everything ran on Cloudflare's generous free tier, but as features grew more complex, the limitations of that free lunch became increasingly apparent.
Recently, I decided to upgrade to Cloudflare Workers' $5/month paid plan. This isn't just about spending five dollars more each month—it unlocks a powerful suite of production-grade tools, giving me the opportunity to completely restructure and upgrade the entire website architecture. This blog post documents the complete journey from "toy" to "production-grade" platform.
1. Pre-Upgrade Architecture: Dancing in Free-Tier Chains
Under the free tier, I pushed limited resources to their limits, building what appeared to be a fully-featured system.
Architecture Overview (Free Version)
┌────────────────────────────────┐
│ yuxu.ge User │
└────────────────────────────────┘
│
│ HTTPS
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare Edge │
│ │
│ ┌────────────────┐ ┌─────────────────┐ ┌───────────────┐ │
│ │ Static Assets │──▶│ Cloudflare Worker│──▶│ Cloudflare KV │ │
│ │ (GitHub Pages) │ └─────────────────┘ └───────────────┘ │
│ └────────────────┘ │ │ │
│ │ │ (Session) │
│ ▼ │ │
│ ┌────────────┐ │ │
│ │ OpenAI API │◀──────────────┘ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ (Browser Fallback)
▼
┌──────────────────────────┐
│ Browser │
│ ┌──────────────────────┐ │
│ │ ONNX Runtime (WASM) │ │
│ │ BGE-small-zh Model │ │
│ └──────────────────────┘ │
└──────────────────────────┘
Core System Analysis
AI Assistant & Semantic Search: The site's highlight. I implemented a RAG (Retrieval-Augmented Generation) system:
Server Mode: When users ask questions, the Worker calls OpenAI's text-embedding-3-small API to generate 512-dimensional vectors, then performs brute-force similarity search in KV (yes, iteration!), combined with BM25 keyword retrieval from a 2.6MB search-inverted.json inverted index. Results are fed as context to gpt-4o-mini for answer generation.
Browser Mode (Fallback): To save API calls and speed up cold starts, I implemented a pure client-side fallback using ONNX Runtime with the BGE-small-zh model (384 dimensions) for local vector computation.
Graffiti Comment System: Allows users to post comments anywhere on the page. Comments are stored in both static JSON files and KV. New comments go to KV; reads merge static JSON and KV data.
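The merge-on-read strategy can be sketched as a small pure function (field names like `createdAt` are illustrative, not the site's actual schema): the static JSON is the baseline, and KV entries override it on id collisions since they are newer.

```javascript
// Merge-on-read for comments: static JSON baseline + newer KV entries.
// Shapes are assumed for illustration; KV wins on id collisions.
function mergeComments(staticComments, kvComments) {
  const byId = new Map();
  for (const c of staticComments) byId.set(c.id, c);
  for (const c of kvComments) byId.set(c.id, c); // KV data is newer, so it overrides
  return [...byId.values()].sort((a, b) => a.createdAt - b.createdAt);
}
```

This keeps reads cheap, but as the next section shows, KV's eventual consistency makes the KV half of the merge unreliable right after a write.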
Authentication System: Simple OTP (One-Time Password) email verification. Sessions, users, and OTPs are all stored in KV, with HttpOnly cookies maintaining login state.
2. Bottlenecks and Pain Points: The "Original Sin" of Free Architecture
While this architecture worked, problems emerged as traffic and complexity grew.
Security Issues: Wide Open Doors
- No CSRF Protection: All POST requests lack CSRF token validation—a serious vulnerability.
- No Rate Limiting: Any API endpoint can be called unlimited times, potentially draining OpenAI credits or enabling attacks.
- OTP Brute Force Risk: A 6-digit numeric code has only one million possibilities; without rate limiting it can be brute-forced.
Data Consistency: Schrödinger's Comments
Cloudflare KV has "eventual consistency." A write at one data center takes time (typically tens of seconds) to sync globally. This caused bizarre issues:
- Disappearing Comments: User submits a comment, write goes to Singapore. Page refresh routes to Tokyo where data hasn't synced yet—comment "disappears," then "magically reappears" later.
- Position Reversion: Drag-and-drop graffiti repositioning frequently reverted after refresh due to KV delays.
Performance Issues: Every Deploy is a Major Operation
- Full Index Rebuilds: Every content update requires complete reconstruction of the 2.6MB BM25 index and all document vectors.
- Huge JSON Files: Loading and parsing the inverted index increases cold start time.
- Client Performance: A MutationObserver watching the entire <body> adds unnecessary overhead.
Missing Features: The Eternal TODO List
- Comment Management: No delete or edit functionality.
- Content Moderation: No automatic spam or inappropriate content filtering.
- Real-time Collaboration: No live updates when others interact with the graffiti wall.
3. Unlocking New Capabilities: The $5 Plan's Power Matrix
Five dollars monthly investment opens another door in the Cloudflare ecosystem:
| Capability | Free Tier Limit | Paid Plan ($5) Unlock | Problems Solved |
|---|---|---|---|
| CPU Time | 10ms (Bundled) | 15 minutes (Unbound) | Complex computation, incremental indexing, AI inference |
| D1 Database | Unavailable | Edge SQLite, strong consistency | Data consistency, complex queries, relational data |
| Durable Objects | Unavailable | Strong consistency, WebSocket, Actor model | Real-time collaboration, state management |
| Workers AI | Unavailable | Edge GPU inference (Embeddings, LLM, etc.) | AI cost, latency, content moderation |
| Vectorize | Unavailable | Native vector database | Vector search performance, scalability |
| Logpush | Unavailable | Log streaming to storage | Observability, debugging |
| KV Quotas | Lower | Higher read/write/list operation quotas | Handle higher traffic |
This isn't just quantitative change—it's qualitative transformation. I now have all the weapons needed to build a truly robust, scalable edge application.
4. Architecture Upgrade Blueprint: Four Phases to Production
I divided the upgrade into four phases, progressing steadily to ensure each step is solid.
New Architecture Overview (Paid Version)
┌────────────────────────────────┐
│ yuxu.ge User │
└────────────────────────────────┘
│
│ HTTPS / WebSocket
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cloudflare Edge │
│ │
│ ┌────────────────┐ ┌─────────────────┐ │
│ │ Static Assets │──▶│ Cloudflare Worker│──────────┐ │
│ │ (GitHub Pages) │ └─────────────────┘ │ │
│ └────────────────┘ │ │ │
│ │ ▼ │
│ ┌────────────┐◀──────────────┤ ┌──────────────┐ │
│ │ D1 │ │ │ Durable │ │
│ │ (Database) │ │ │ Objects │ │
│ └────────────┘ │ │ (Real-time │ │
│ │ │ Graffiti) │ │
│ ┌────────────┐◀──────────────┤ └──────────────┘ │
│ │ Vectorize │ │ │
│ │ (Vectors) │ │ │
│ └────────────┘ │ │
│ ▼ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Workers AI │◀──────│ KV │ │
│ │ (GPU) │ │(Cache/OTP) │ │
│ └────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Phase 1: Foundation - Security Hardening & D1 Migration
First, address core security and data consistency issues.
1. Security Hardening:
- CSRF Protection: Classic Double Submit Cookie pattern. On login, the Worker generates a CSRF token stored both in the HttpOnly session cookie and in a regular cookie the frontend can read; POST requests then include it in the X-CSRF-Token header.
- Rate Limiting: KV-based sliding window counter keyed by IP address.
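Both checks reduce to small pure functions. A sketch under assumed shapes (key names, window, and limit are illustrative):

```javascript
// Constant-time-ish comparison for the Double Submit Cookie check:
// the X-CSRF-Token header value must equal the token bound to the session.
function tokensMatch(headerToken, sessionToken) {
  if (typeof headerToken !== 'string' || headerToken.length !== sessionToken.length) return false;
  let diff = 0;
  for (let i = 0; i < headerToken.length; i++) {
    diff |= headerToken.charCodeAt(i) ^ sessionToken.charCodeAt(i);
  }
  return diff === 0;
}

// Sliding-window limiter: keep recent request timestamps, drop those outside
// the window, and refuse once the count reaches the limit.
function slidingWindowCheck(timestamps, now, windowMs, limit) {
  const recent = timestamps.filter(t => now - t < windowMs);
  if (recent.length >= limit) return { allowed: false, timestamps: recent };
  recent.push(now);
  return { allowed: true, timestamps: recent };
}

// In the Worker the timestamps would live in KV under a per-IP key, e.g.:
//   const key = `rate:${request.headers.get('CF-Connecting-IP')}`;
//   const prior = JSON.parse(await env.KV.get(key) || '[]');
//   const { allowed, timestamps } = slidingWindowCheck(prior, Date.now(), 60_000, 20);
//   await env.KV.put(key, JSON.stringify(timestamps), { expirationTtl: 60 });
```

Note that KV's eventual consistency makes this limiter approximate across locations, which is acceptable for abuse throttling.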
2. D1 Migration:
D1 is Cloudflare's edge SQLite database with strong consistency:
-- Users table
CREATE TABLE users (
id TEXT PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
nickname TEXT,
created_at INTEGER DEFAULT (strftime('%s', 'now'))
);
-- Sessions table
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
csrf_token TEXT NOT NULL,
expires_at INTEGER NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id)
);
-- Comments table
CREATE TABLE comments (
id TEXT PRIMARY KEY,
page_url TEXT NOT NULL,
content TEXT NOT NULL,
user_id TEXT NOT NULL,
anchor_id TEXT,
anchor_type TEXT,
anchor_offset_x REAL,
anchor_offset_y REAL,
rotation REAL,
status TEXT DEFAULT 'pending',
created_at INTEGER DEFAULT (strftime('%s', 'now')),
FOREIGN KEY (user_id) REFERENCES users(id)
);
CREATE INDEX idx_comments_page ON comments(page_url);
CREATE INDEX idx_comments_status ON comments(status);
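With this schema in place, the old static-JSON-plus-KV merge collapses into a single strongly consistent query. A sketch (the `DB` binding name and the helper's shape are assumptions, not the site's exact code):

```javascript
// Fetch approved comments for one page from D1, in creation order.
// env.DB is the Worker's D1 binding; ?1 is a positional bind parameter.
async function listComments(env, pageUrl) {
  const { results } = await env.DB.prepare(
    `SELECT id, content, user_id, anchor_id, anchor_offset_x, anchor_offset_y, rotation, created_at
       FROM comments
      WHERE page_url = ?1 AND status = 'approved'
      ORDER BY created_at ASC`
  ).bind(pageUrl).all();
  return results;
}
```

Because D1 is strongly consistent, a comment written in one request is visible to the very next read, eliminating the "Schrödinger's comments" problem by construction.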
Phase 2: Search Revolution - Embracing Vectorize and Workers AI
Restructure the entire search system with Cloudflare's native AI services.
Vectorize Replaces KV Storage:
// Insert vectors into Vectorize
const vectors = [{
id: 'post_1_chunk_1',
values: embedding, // from Workers AI
metadata: { postId: 'post_1', text: '...' }
}];
await env.VECTOR_INDEX.insert(vectors);
// Query similar vectors
const results = await env.VECTOR_INDEX.query(queryVector, { topK: 5 });
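Putting embedding and retrieval together, query time looks roughly like this (binding names `AI` and `VECTOR_INDEX` follow the snippets in this section; the bge-m3 response shape is assumed to match the bge family's `{ data: [...] }`):

```javascript
// Embed the user's query on the edge, then find the nearest indexed chunks.
async function semanticSearch(env, query, topK = 5) {
  // Assumed response shape: { data: [[...1024 floats]] } for a single input
  const { data } = await env.AI.run('@cf/baai/bge-m3', { text: [query] });
  const result = await env.VECTOR_INDEX.query(data[0], { topK, returnMetadata: true });
  // Flatten matches into { id, score, ...metadata } for the RAG prompt builder
  return result.matches.map(m => ({ id: m.id, score: m.score, ...m.metadata }));
}
```

The brute-force KV iteration is gone: Vectorize does the nearest-neighbor work natively.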
Workers AI Replaces OpenAI Embeddings - Recommended: bge-m3
Among hosted alternatives, BAAI's bge-m3 on Cloudflare Workers AI is the top choice:
- Model ID: @cf/baai/bge-m3
- Multilingual Champion: Unlike OpenAI's English-centric models, bge-m3 excels in Chinese (and 100+ other languages), perfect for mixed-language environments
- Long Context: Supports 8192-token input, handling long paper abstracts with ease
- Dimensions: Outputs 1024-dimensional vectors
⚠️ Migration Warning: OpenAI's text-embedding-ada-002 and 3-small default to 1536 dimensions. You cannot mix them directly. Switching models requires re-indexing your vector database.
Cloudflare provides an OpenAI-compatible interface—no code logic changes needed:
import os

from openai import OpenAI

CF_ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]  # your Cloudflare account ID

client = OpenAI(
    api_key=os.environ.get("CLOUDFLARE_API_TOKEN"),
    base_url=f"https://api.cloudflare.com/client/v4/accounts/{CF_ACCOUNT_ID}/ai/v1"
)
# Code stays the same, just change model name
response = client.embeddings.create(
model="@cf/baai/bge-m3", # Replace text-embedding-3-small
input=["User intends to apply for a PhD in AI Agents.", "用户想申请人工智能博士"]
)
# Note: embedding length is 1024, update database schema accordingly
print(len(response.data[0].embedding)) # 1024
Benefits:
- Latency drops from seconds to hundreds of milliseconds
- Significant cost reduction (Workers AI is nearly free)
- Data stays entirely within Cloudflare ecosystem
- Better Chinese semantic understanding than OpenAI
- Incremental indexing support
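The incremental indexing point deserves a sketch: instead of rebuilding everything on deploy, hash each content chunk and diff against the previous build, so only changed chunks are re-embedded and upserted into Vectorize (shapes below are illustrative, not the deployed pipeline):

```javascript
// Compare previous content hashes with the current chunks to decide
// which vectors to (re-)embed and upsert, and which stale ids to delete.
function diffChunks(prevHashes, currentChunks) {
  const upserts = currentChunks.filter(c => prevHashes[c.id] !== c.hash);
  const currentIds = new Set(currentChunks.map(c => c.id));
  const deletes = Object.keys(prevHashes).filter(id => !currentIds.has(id));
  return { upserts, deletes };
}
```

For a site where most deploys touch one or two posts, this turns a full re-embedding pass into a handful of Workers AI calls.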
Smart Image Tagging - Llama 3.2 Vision
Don't use outdated resnet-50 (it only outputs rigid ImageNet classifications like "Egyptian cat"). With the Paid plan, use Vision-Language Models (VLM):
- Recommended Model: @cf/meta/llama-3.2-11b-vision-instruct
- Capabilities: Goes beyond object recognition to understand scenes, text (OCR), and relationships between objects
// Smart Image Tagger Worker Example
export default {
async fetch(request, env) {
const imageUrl = "https://example.com/your-image.jpg";
const imageRes = await fetch(imageUrl);
const imageBuffer = await imageRes.arrayBuffer();
const imageArray = [...new Uint8Array(imageBuffer)];
const response = await env.AI.run(
"@cf/meta/llama-3.2-11b-vision-instruct",
{
prompt: "Analyze this image and provide 5-10 relevant tags. Output ONLY a JSON array of strings.",
image: imageArray
}
);
return new Response(JSON.stringify(response));
}
};
Comparison:
- ResNet Output: {"label": "notebook", "score": 0.9} (too rigid)
- Llama 3.2 Vision Output: ["personal knowledge management", "obsidian software", "digital garden", "graph view", "productivity"] (it understands the Obsidian interface in your screenshot!)
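One practical wrinkle: despite the "Output ONLY a JSON array" instruction, VLMs sometimes wrap the array in extra prose, so the tagger should defensively extract the first JSON array from the response. An illustrative helper (not part of the original Worker):

```javascript
// Pull the first JSON array of strings out of free-form model output.
// Falls back to an empty list if nothing parseable is found.
function extractTags(modelOutput) {
  const match = modelOutput.match(/\[[\s\S]*?\]/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed) ? parsed.filter(t => typeof t === 'string') : [];
  } catch {
    return [];
  }
}
```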
Phase 3: Bringing Life - Durable Objects for Real-time Graffiti Wall
Durable Objects (DO) are strongly-consistent, stateful Worker instances perfect for real-time collaboration.
Create a DO instance for each commentable page:
export class DoodleWall {
state: DurableObjectState;
sessions: WebSocket[] = [];
constructor(state: DurableObjectState) {
this.state = state;
}
  async fetch(request: Request) {
    // Upgrade to WebSocket: one end goes back to the client, one stays server-side
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();
    this.sessions.push(server);
    // Load persisted doodles and send to the new client
    const doodles = (await this.state.storage.get<Record<string, any>>('doodles')) || {};
    server.send(JSON.stringify({ type: 'INIT', payload: doodles }));
    server.addEventListener('message', async (msg) => {
      const data = JSON.parse(msg.data as string);
      // Merge the update into persisted state
      const current = (await this.state.storage.get<Record<string, any>>('doodles')) || {};
      await this.state.storage.put('doodles', { ...current, [data.id]: data });
      // Broadcast to all connected clients
      this.broadcast(JSON.stringify({ type: 'UPDATE', payload: data }));
    });
    // Drop the session when the client disconnects
    server.addEventListener('close', () => {
      this.sessions = this.sessions.filter(s => s !== server);
    });
    return new Response(null, { status: 101, webSocket: client });
  }
broadcast(message: string) {
this.sessions.forEach(session => {
try { session.send(message); } catch (e) {}
});
}
}
Now when one user drags a doodle, everyone else viewing that page sees the position change in real time!
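On the client, the wall can treat the socket as a stream of INIT/UPDATE messages (mirroring the server's message types above) and fold them into local state. A browser-side sketch; the endpoint path is illustrative:

```javascript
// Pure reducer: apply one wall message to the local doodle map.
function applyWallMessage(doodles, message) {
  switch (message.type) {
    case 'INIT':   // full snapshot from the Durable Object on connect
      return { ...message.payload };
    case 'UPDATE': // a single doodle added or moved by some client
      return { ...doodles, [message.payload.id]: message.payload };
    default:
      return doodles;
  }
}

// Wiring in the browser (hypothetical endpoint):
//   let state = {};
//   const ws = new WebSocket(`wss://yuxu.ge/api/wall?page=${location.pathname}`);
//   ws.onmessage = (e) => { state = applyWallMessage(state, JSON.parse(e.data)); render(state); };
```

Keeping the reducer pure makes the render path trivial to test and keeps the WebSocket glue to a couple of lines.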
Phase 4: Intelligence - Deep Workers AI Integration
Content Moderation: Call Workers AI before saving comments to D1:
const aiResult = await env.AI.run('@cf/meta/llama-2-7b-chat-fp16', {
  prompt: `Is the following comment spam, hateful, or inappropriate?
Answer with only "safe" or "unsafe".
Comment: "${commentText}"`
});
// Text-generation models on Workers AI return { response: string }
if (aiResult.response.includes('unsafe')) {
  comment.status = 'rejected';
}
Sentiment Analysis: Analyze comment sentiment to display different colors or emojis based on mood.
5. Cost Analysis: Is $5 Really Enough?
The most common question. Answer: Absolutely sufficient.
| Resource | Free Quota | Estimated Usage | Extra Cost |
|---|---|---|---|
| Base Fee | - | - | $5/month |
| D1 Database | 500M reads/5M writes/1GB | Far below limits | $0 |
| Durable Objects | Pay-as-you-go | Only when users online | $1-3/month |
| Workers AI | Per neuron | ~$0.0001/comment | <$1/month |
| Vectorize | Free beta | - | $0 |
| Total | | | $6-9/month |
This investment brings system stability, scalability, and unlimited possibilities—excellent value.
Conclusion: From "Working" to "Well-Built"
This architecture upgrade marks a significant milestone in my personal project development journey. It's not just a tech stack update—it's a transformation from "hacker" thinking to "engineer" thinking.
- Farewell to Compromises: No more complex compensation logic for eventual consistency, no more security vulnerabilities, no more slow builds.
- Embrace Native Services: Deep integration with Cloudflare's native services creates a highly cohesive, low-latency, maintainable system.
- Future-Ready: The new architecture provides a solid foundation for future features like real-time collaboration and smarter AI applications.
For developers who love tinkering with personal projects, the Cloudflare Workers ecosystem offers a smooth growth path. Start free to validate ideas, then upgrade to world-class edge computing infrastructure for the price of a lunch when your project matures.
This is yuxu.ge's next chapter.