🚀 Technical Architecture

This resume chatbot is built using a modern serverless architecture with Retrieval-Augmented Generation (RAG) capabilities. Here's how it all works together.

🏗️ Core Components

📦 S3 Storage

Document Storage: Raw documents (PDF, TXT) before vectorization

Chat Data: Processed embeddings and chat history

Website: Static files (HTML, CSS, JS)

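A minimal boto3 sketch of how these three kinds of objects could share one bucket. The bucket name and key prefixes here are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical bucket layout -- names and prefixes are assumptions for illustration.
import boto3

s3 = boto3.client("s3")
BUCKET = "resume-chatbot-data"  # assumed bucket name

# Raw documents (PDF, TXT) before vectorization
s3.upload_file("resume.pdf", BUCKET, "documents/resume.pdf")

# Pre-computed embeddings stored as plain JSON
s3.upload_file("embeddings.json", BUCKET, "embeddings/embeddings.json")

# Static site assets, served through CloudFront
s3.upload_file("index.html", BUCKET, "site/index.html")
```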

⚡ Lambda Functions

FastAPI Application: Runs on Lambda via the Mangum ASGI adapter (see the sketch after this list)

RAG Processing: Document embedding and similarity search

Serverless: Pay only for execution time

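A minimal sketch of the FastAPI-plus-Mangum pattern described above; the route and request model are illustrative, not the app's real API surface.

```python
# Sketch: FastAPI app adapted to Lambda with Mangum.
# The /chat route and payload shape are assumptions for illustration.
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # In the real app, this is where the RAG pipeline below would run.
    return {"answer": f"You asked: {req.question}"}

# Mangum translates API Gateway events into ASGI requests;
# Lambda is configured to invoke this handler.
handler = Mangum(app)
```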

🌐 API Gateway

HTTP Routing: Handles incoming requests

HTTPS: Secure communication

Lambda Integration: Routes to appropriate functions


☁️ CloudFront

CDN: Global content delivery

SSL: Custom domain with TLS

Performance: Fast loading worldwide


🧠 RAG (Retrieval-Augmented Generation) System

This application pairs retrieval with generation: it looks up the most relevant passages from your professional documents and supplies them to the model, so responses stay accurate and grounded in the source material.

How RAG Works

Document Processing Flow

1. Document Upload
PDF and text documents stored in S3
2. Text Chunking
Split into 500-token chunks with 50-token overlap
3. Embedding Generation
OpenAI text-embedding-ada-002 model creates vectors
4. Storage
Embeddings stored as JSON in S3 (practically free); a minimal indexing sketch follows this list
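
A condensed sketch of this indexing flow with the stated parameters (500-token chunks, 50-token overlap, text-embedding-ada-002). The helper names, bucket, and key are assumptions for illustration.

```python
# Indexing sketch: chunk a document, embed the chunks, store vectors as JSON in S3.
import json

import boto3
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("text-embedding-ada-002")

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping token windows so context isn't cut mid-thought."""
    tokens = enc.encode(text)
    step = size - overlap
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def embed_chunks(chunks: list[str]) -> list[dict]:
    """One API call embeds every chunk; pair each vector with its source text."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    return [{"text": c, "embedding": d.embedding} for c, d in zip(chunks, resp.data)]

# Store the result as plain JSON in S3 -- no vector database required.
records = embed_chunks(chunk_text(open("resume.txt").read()))
boto3.client("s3").put_object(
    Bucket="resume-chatbot-data",        # assumed bucket name
    Key="embeddings/embeddings.json",    # assumed key
    Body=json.dumps(records),
)
```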

Query Processing Flow

1. User Question
User asks about experience, skills, etc.
2. Question Embedding
Convert question to vector using same model
3. Similarity Search
Cosine similarity finds top 3 most relevant chunks
4. Context Generation
GPT-3.5-turbo generates a response from the retrieved chunks (a query-path sketch follows this list)
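
A condensed sketch of this query path: embed the question with the same model, rank every stored chunk by cosine similarity with NumPy, and pass the top 3 to GPT-3.5-turbo as context. The bucket, key, and prompt wording are illustrative assumptions.

```python
# Query sketch: embed the question, retrieve the top-3 chunks, generate an answer.
import json

import boto3
import numpy as np
from openai import OpenAI

client = OpenAI()

def answer(question: str) -> str:
    # Load the pre-computed vectors (bucket/key match the indexing sketch above).
    body = boto3.client("s3").get_object(
        Bucket="resume-chatbot-data", Key="embeddings/embeddings.json"
    )["Body"].read()
    records = json.loads(body)

    # Embed the question with the same model used at index time.
    q = np.array(
        client.embeddings.create(
            model="text-embedding-ada-002", input=[question]
        ).data[0].embedding
    )

    # Cosine similarity against every stored chunk; keep the three best matches.
    vecs = np.array([r["embedding"] for r in records])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    context = "\n---\n".join(records[i]["text"] for i in np.argsort(sims)[-3:][::-1])

    # Ground the completion in the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```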

💰 Cost Analysis

Cost Breakdown (Estimated)

Embedding generation (one-time): ~$0.001
Storage (S3, per month): ~$0.01
Lambda executions (per month): ~$0.10
GPT-3.5-turbo completion (per query): ~$0.00075
Total per query: ~$0.001

Why this is cost-effective: the embeddings are computed once and served as static JSON from S3, so each query pays only for one question embedding and one GPT-3.5-turbo completion. That works out to roughly $0.00075 for the completion plus a fraction of a cent for the embedding and Lambda time, which rounds to the ~$0.001 per-query total above.

🚀 Benefits of This Architecture

⚡ Serverless

No infrastructure to maintain, scales automatically

💰 Cost-Effective

Pay only for what you use, no idle costs

📈 Scalable

Handles growing document sets and query volumes (concrete limits under Performance Metrics)

🔧 Simple

No complex vector database setup required

⚡ Fast

Pre-computed embeddings for quick retrieval

🎯 Accurate

Uses state-of-the-art OpenAI embeddings

🛠️ Maintainable

Easy to update documents and retrain

🔒 Secure

HTTPS, IAM policies, secure API keys

🛠️ Technical Stack

Backend

Python 3.9+ FastAPI Mangum OpenAI PyPDF NumPy

AWS Services

Lambda API Gateway S3 CloudFront Route 53 ACM

Frontend

HTML5 CSS3 JavaScript Fetch API

AI/ML

OpenAI GPT-3.5 text-embedding-ada-002 Cosine Similarity RAG

🔐 Security & Permissions

HTTPS Everywhere: Traffic is encrypted in transit through CloudFront and API Gateway

IAM Policies: The Lambda function is limited to the S3 resources it needs

API Keys: The OpenAI key is kept out of source code (for example, as a Lambda environment variable)

📊 Performance Metrics

Response Time

Embedding Search: ~200ms

GPT Generation: ~1-2 seconds

Total Response: ~1.5-2.5 seconds

Scalability

Concurrent Users: Scales automatically up to the account's Lambda concurrency limit (1,000 by default)

Document Size: Up to 50MB per file

Query Volume: 1000+ requests/minute

Reliability

Uptime: 99.9%+ (AWS SLA)

Backup: S3 versioning enabled

Monitoring: CloudWatch logs