🚀 Technical Architecture
This resume chatbot is built using a modern serverless architecture with Retrieval-Augmented Generation (RAG) capabilities. Here's how it all works together.
🏗️ Core Components
📦 S3 Storage
- Document Storage: Raw documents (PDF, TXT) before vectorization
- Chat Data: Processed embeddings and chat history
- Website: Static files (HTML, CSS, JS)
⚡ Lambda Functions
- FastAPI Application: Runs on Lambda via the Mangum adapter
- RAG Processing: Document embedding and similarity search
- Serverless: Pay only for execution time
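The wiring above can be sketched as follows. This is a minimal illustration, not the actual application code: the `app` object, the `/chat` route, and the placeholder response body are all hypothetical, and the real function would invoke the RAG pipeline.

```python
from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

@app.get("/chat")
def chat(q: str):
    # Placeholder: the real handler would run the RAG pipeline against q.
    return {"answer": f"placeholder response for: {q}"}

# Mangum translates API Gateway events into ASGI requests, so the same
# FastAPI app runs unchanged inside Lambda.
handler = Mangum(app)
```

Pointing the Lambda function's handler setting at `handler` is all the glue that is needed; there is no web server to manage.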
🌐 API Gateway
- HTTP Routing: Handles incoming requests
- HTTPS: Secure communication
- Lambda Integration: Routes to appropriate functions
☁️ CloudFront
- CDN: Global content delivery
- SSL: Custom domain with TLS
- Performance: Fast loading worldwide
🧠 RAG (Retrieval-Augmented Generation) System
This application grounds its answers in your professional documents: it retrieves the most relevant passages before generating each response, producing accurate, context-aware answers.
How RAG Works
Document Processing Flow
1. PDF and text documents are stored in S3
2. Text is split into 500-token chunks with a 50-token overlap
3. OpenAI's text-embedding-ada-002 model converts each chunk into a vector
4. Embeddings are stored as JSON in S3 (practically free)
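The chunking step above can be sketched like this. For illustration it splits on whitespace "tokens"; the real pipeline would count model tokens (e.g. with a tokenizer such as tiktoken) before calling the embedding API, and the corpus here is dummy text.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into fixed-size chunks, each overlapping
    the previous one by `overlap` tokens so context isn't cut mid-thought."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# Naive whitespace tokens, for illustration only.
words = ("experience with serverless systems " * 300).split()
chunks = chunk_tokens(words, size=500, overlap=50)
# Each chunk's first 50 tokens repeat the last 50 of the previous chunk.
```

The 50-token overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which keeps retrieval quality up at negligible extra embedding cost.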
Query Processing Flow
1. The user asks about experience, skills, etc.
2. The question is converted to a vector using the same embedding model
3. Cosine similarity selects the top 3 most relevant chunks
4. GPT-3.5-turbo generates a response using those chunks as context
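The retrieval step above amounts to a cosine-similarity ranking over the stored vectors. A minimal pure-Python sketch (real ada-002 vectors are 1536-dimensional; the toy 2-D vectors in the usage example are purely illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy example: the first and third vectors point roughly the same way
# as the query, so they rank ahead of the others.
query = [1.0, 0.0]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best = top_k(query, vectors, k=3)
```

With only a few hundred chunks, a brute-force scan like this runs in well under the ~200ms budget, which is why no dedicated vector database is needed.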
💰 Cost Analysis
Monthly Cost Breakdown (Estimated)
Why this is cost-effective:
- text-embedding-ada-002: $0.0001 per 1K tokens (very cheap)
- S3 Storage: Practically free for small datasets
- Lambda Free Tier: 1M requests/month free
- No Vector Database: avoids the $50-200/month cost of a managed vector store
- Serverless: No idle infrastructure costs
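To make the embedding cost concrete, here is the arithmetic at the $0.0001 per 1K tokens rate quoted above. The 200K-token corpus size is a hypothetical figure for illustration, not a measurement of this project.

```python
PRICE_PER_1K_TOKENS = 0.0001  # USD, text-embedding-ada-002 rate from above
corpus_tokens = 200_000       # hypothetical corpus size

# One-time cost to embed the entire corpus.
one_time_embedding_cost = corpus_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"${one_time_embedding_cost:.2f}")  # two cents for the whole corpus
```

Even re-embedding the full corpus after every document update costs pennies, which is why the design skips incremental re-indexing entirely.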
🚀 Benefits of This Architecture
⚡ Serverless
No infrastructure to maintain, scales automatically
💰 Cost-Effective
Pay only for what you use, no idle costs
📈 Scalable
Scales automatically with query volume and document count
🔧 Simple
No complex vector database setup required
⚡ Fast
Pre-computed embeddings for quick retrieval
🎯 Accurate
Uses OpenAI's text-embedding-ada-002 embeddings for semantic retrieval
🛠️ Maintainable
Easy to update documents and re-embed them; no model retraining needed
🔒 Secure
HTTPS, IAM policies, secure API keys
🛠️ Technical Stack
Backend
- Python with FastAPI, running on Lambda via the Mangum adapter
AWS Services
- S3, Lambda, API Gateway, CloudFront, CloudWatch
Frontend
- Static HTML, CSS, and JavaScript served from S3 through CloudFront
AI/ML
- OpenAI text-embedding-ada-002 for embeddings, GPT-3.5-turbo for generation
🔐 Security & Permissions
- S3 Bucket Policy: Allows Lambda function to read/write documents
- IAM Roles: Least privilege access for Lambda execution
- API Keys: OpenAI API key stored as a Lambda environment variable, not in code
- HTTPS: All communication encrypted with TLS
- CORS: Configured for secure cross-origin requests
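The least-privilege S3 access mentioned above can be expressed as a policy statement like the following. The bucket name is a placeholder, and a real deployment would scope the resource ARN to the exact prefixes the function uses.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR-CHATBOT-BUCKET/*"
    }
  ]
}
```

Attaching this to the Lambda execution role grants read/write on the document and embedding objects and nothing else.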
📊 Performance Metrics
Response Time
- Embedding Search: ~200ms
- GPT Generation: ~1-2 seconds
- Total Response: ~1.5-2.5 seconds
Scalability
- Concurrent Users: limited only by the Lambda account concurrency quota, not by fixed servers
- Document Size: up to 50MB per file
- Query Volume: 1000+ requests/minute
Reliability
- Uptime: 99.9%+ (AWS SLA)
- Backup: S3 versioning enabled
- Monitoring: CloudWatch logs