🚀 Technical Architecture

This resume chatbot is built using a modern serverless architecture with Retrieval-Augmented Generation (RAG) capabilities. Here's how it all works together.

🏗️ Core Components

📦 S3 Storage

Document Storage: Raw documents (PDF, TXT) before vectorization

Chat Data: Processed embeddings and chat history

Website: Static files (HTML, CSS, JS)

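A minimal boto3 sketch of how these three kinds of objects could share one bucket. The bucket name and key prefixes here are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical bucket layout -- names and prefixes are assumptions for illustration.
import boto3

s3 = boto3.client("s3")
BUCKET = "resume-chatbot-data"  # assumed bucket name

# Raw documents (PDF, TXT) before vectorization
s3.upload_file("resume.pdf", BUCKET, "documents/resume.pdf")

# Pre-computed embeddings stored as plain JSON
s3.upload_file("embeddings.json", BUCKET, "embeddings/embeddings.json")

# Static site assets, served through CloudFront
s3.upload_file("index.html", BUCKET, "site/index.html")
```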

⚡ Lambda Functions

FastAPI Application: Runs on Lambda via the Mangum ASGI adapter (see the sketch after this list)

RAG Processing: Document embedding and similarity search

Serverless: Pay only for execution time

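A minimal sketch of the FastAPI-plus-Mangum pattern described above; the route and request model are illustrative, not the app's real API surface.

```python
# Sketch: FastAPI app adapted to Lambda with Mangum.
# The /chat route and payload shape are assumptions for illustration.
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # In the real app, this is where the RAG pipeline below would run.
    return {"answer": f"You asked: {req.question}"}

# Mangum translates API Gateway events into ASGI requests;
# Lambda is configured to invoke this handler.
handler = Mangum(app)
```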

🌐 API Gateway

HTTP Routing: Handles incoming requests

HTTPS: Secure communication

Lambda Integration: Routes to appropriate functions


☁️ CloudFront

CDN: Global content delivery

SSL: Custom domain with TLS

Performance: Fast loading worldwide


🧠 RAG (Retrieval-Augmented Generation) System

This application pairs retrieval with generation: it looks up the most relevant passages from your professional documents and supplies them to the model, so responses stay accurate and grounded in the source material.

How RAG Works

Document Processing Flow

1. Document Upload
PDF and text documents stored in S3
2. Text Chunking
Split into 500-token chunks with 50-token overlap
3. Embedding Generation
OpenAI text-embedding-ada-002 model creates vectors
4. Storage
Embeddings stored as JSON in S3 (practically free); a minimal indexing sketch follows this list
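
A condensed sketch of this indexing flow with the stated parameters (500-token chunks, 50-token overlap, text-embedding-ada-002). The helper names, bucket, and key are assumptions for illustration.

```python
# Indexing sketch: chunk a document, embed the chunks, store vectors as JSON in S3.
import json

import boto3
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("text-embedding-ada-002")

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping token windows so context isn't cut mid-thought."""
    tokens = enc.encode(text)
    step = size - overlap
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def embed_chunks(chunks: list[str]) -> list[dict]:
    """One API call embeds every chunk; pair each vector with its source text."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    return [{"text": c, "embedding": d.embedding} for c, d in zip(chunks, resp.data)]

# Store the result as plain JSON in S3 -- no vector database required.
records = embed_chunks(chunk_text(open("resume.txt").read()))
boto3.client("s3").put_object(
    Bucket="resume-chatbot-data",        # assumed bucket name
    Key="embeddings/embeddings.json",    # assumed key
    Body=json.dumps(records),
)
```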

Query Processing Flow

1. User Question
User asks about experience, skills, etc.
2. Question Embedding
Convert question to vector using same model
3. Similarity Search
Cosine similarity finds top 3 most relevant chunks
4. Context Generation
GPT-3.5-turbo generates a response from the retrieved chunks (a query-path sketch follows this list)
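
A condensed sketch of this query path: embed the question with the same model, rank every stored chunk by cosine similarity with NumPy, and pass the top 3 to GPT-3.5-turbo as context. The bucket, key, and prompt wording are illustrative assumptions.

```python
# Query sketch: embed the question, retrieve the top-3 chunks, generate an answer.
import json

import boto3
import numpy as np
from openai import OpenAI

client = OpenAI()

def answer(question: str) -> str:
    # Load the pre-computed vectors (bucket/key match the indexing sketch above).
    body = boto3.client("s3").get_object(
        Bucket="resume-chatbot-data", Key="embeddings/embeddings.json"
    )["Body"].read()
    records = json.loads(body)

    # Embed the question with the same model used at index time.
    q = np.array(
        client.embeddings.create(
            model="text-embedding-ada-002", input=[question]
        ).data[0].embedding
    )

    # Cosine similarity against every stored chunk; keep the three best matches.
    vecs = np.array([r["embedding"] for r in records])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    context = "\n---\n".join(records[i]["text"] for i in np.argsort(sims)[-3:][::-1])

    # Ground the completion in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```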

💰 Cost Analysis

Cost Breakdown (Estimated)

Embedding generation (one-time): ~$0.001
Storage (S3, per month): ~$0.01
Lambda executions (per month): ~$0.10
GPT-3.5-turbo completion (per query): ~$0.00075
Total per query: ~$0.001

Why this is cost-effective: the embeddings are computed once and served as static JSON from S3, so each query pays only for one question embedding and one GPT-3.5-turbo completion. That works out to roughly $0.00075 for the completion plus a fraction of a cent for the embedding and Lambda time, which rounds to the ~$0.001 per-query total above.

🚀 Benefits of This Architecture

⚡ Serverless

No infrastructure to maintain, scales automatically

💰 Cost-Effective

Pay only for what you use, no idle costs

📈 Scalable

Handles growing document sets and query volumes (concrete limits under Performance Metrics)

🔧 Simple

No complex vector database setup required

⚡ Fast

Pre-computed embeddings for quick retrieval

🎯 Accurate

Uses state-of-the-art OpenAI embeddings

🛠️ Maintainable

Easy to update documents and retrain

🔒 Secure

HTTPS, IAM policies, secure API keys

🛠️ Technical Stack

Backend

Python 3.9+ FastAPI Mangum OpenAI PyPDF NumPy

AWS Services

Lambda API Gateway S3 CloudFront Route 53 ACM

Frontend

HTML5 CSS3 JavaScript Fetch API

AI/ML

OpenAI GPT-3.5 text-embedding-ada-002 Cosine Similarity RAG

🔐 Security & Permissions

HTTPS Everywhere: Traffic is encrypted in transit through CloudFront and API Gateway

IAM Policies: The Lambda function is limited to the S3 resources it needs

API Keys: The OpenAI key is kept out of source code (for example, as a Lambda environment variable)

📊 Performance Metrics

Response Time

Embedding Search: ~200ms

GPT Generation: ~1-2 seconds

Total Response: ~1.5-2.5 seconds

Scalability

Concurrent Users: Scales automatically up to the account's Lambda concurrency limit (1,000 by default)

Document Size: Up to 50MB per file

Query Volume: 1000+ requests/minute

Reliability

Uptime: 99.9%+ (AWS SLA)

Backup: S3 versioning enabled

Monitoring: CloudWatch logs