Upendra Pant

Architecting Scalable RAG Systems & Autonomous Agents

I take AI from prototype to production, with a focus on systems that are reliable, observable, and built to last.

Available for Work
10+ AI Systems Shipped
40% Avg. Cost Reduction
10M+ Documents Processed
1000+ Hours of Manual Time Saved

Industries Served
Enterprise SaaS, Content Automation, Customer Support, EdTech Platform

02 / Work

SELECTED WORK

01. Print-on-Demand CLI Tool

POD Automator

End-to-end workflow automated in a single command

CLI tool that automates the full print-on-demand pipeline, from AI image generation to SEO-optimized listings and scheduled Redbubble uploads. Built for freelancers who want to ship designs without the manual overhead.

Python, Selenium, OpenAI, Click, Pillow

02. Adobe CC Caption Add-on

CaptionForge

Production-ready add-on with Document Sandbox integration

A full-featured Adobe CC add-on for advanced video captioning, built with React and TypeScript inside Adobe's Document Sandbox Runtime. Supports customizable caption styles, local storage, and a responsive UI across Adobe applications.

React, TypeScript, Adobe SDK, Webpack, CSS

03. NLP-Powered Spotify Playlist Generator

MoodQueue

Natural language to playlist in under 2 seconds

Full-stack app that lets users describe a mood in plain English and get a matched Spotify playlist back. A React frontend talks to a Python backend that maps natural language to Spotify's audio features API, because keyword search doesn't understand feelings.

React, Python, Spotify API, NLP, FastAPI

04. Document RAG System

DocMind

Instant answers from your own documents

Streamlit-based RAG app that lets users upload documents and query them in natural language. Uses ChromaDB for vector storage and Gemini as the reasoning layer — making document search feel like talking to someone who actually read everything.

Streamlit, ChromaDB, Gemini, Python, LangChain

03 / Stack

TECHNICAL TOOLKIT

Languages

  • Python
  • TypeScript
  • SQL

LLM Frameworks

  • LangChain
  • LlamaIndex
  • LangGraph
  • Haystack

Models

  • OpenAI
  • Claude
  • Gemini
  • Llama
  • Whisper

Vector DBs

  • Pinecone
  • ChromaDB

Infrastructure

  • AWS
  • Docker
  • Vercel

Backend

  • FastAPI
  • PostgreSQL
  • Supabase

Engineering Philosophy

Why I Prioritize Modular Agent Design

Every system I build follows a composable, evaluation-first approach. Agents should be testable in isolation, observable in production, and replaceable without rewriting the orchestration layer. I optimize for debuggability over cleverness, and measure everything.
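To make the idea concrete, here is a minimal sketch of what "testable in isolation" and "replaceable without rewriting the orchestration layer" could look like. It is an illustration only, not code from any project listed here: the `Agent` protocol, the two toy agents, and the `Orchestrator` class are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Hypothetical interface: anything with a name and a run method."""

    name: str

    def run(self, task: str) -> str: ...


@dataclass
class UppercaseAgent:
    """Toy agent standing in for a real step (e.g. a rewriter)."""

    name: str = "uppercase"

    def run(self, task: str) -> str:
        return task.upper()


@dataclass
class ReverseAgent:
    """Toy agent standing in for another real step."""

    name: str = "reverse"

    def run(self, task: str) -> str:
        return task[::-1]


@dataclass
class Orchestrator:
    """Runs agents in sequence and records every step for later inspection."""

    agents: list[Agent]
    trace: list[tuple[str, str, str]] = field(default_factory=list)

    def run(self, task: str) -> str:
        result = task
        for agent in self.agents:
            output = agent.run(result)
            # Record (agent name, input, output) so each hop is observable.
            self.trace.append((agent.name, result, output))
            result = output
        return result
```

Because the orchestrator depends only on the `Agent` interface, each agent can be unit-tested on its own and swapped for another without touching `Orchestrator.run`, and the `trace` list gives a simple record of what each step received and returned.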

04 / Writing & Code

HOW I THINK

Technical Writing

Engineering

Optimizing Vector Search Latency at Scale

A deep dive into HNSW parameter tuning, index sharding, and pre-filtering strategies that brought p99 latency under 200ms for 10M+ documents.

RAG Systems

Handling Context Window Limits in Production RAG

Practical strategies for chunking, summarization chains, and dynamic context assembly when your retrieval returns more than the model can handle.
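One simple form of dynamic context assembly is greedy selection under a token budget, sketched below. This is an assumption-laden illustration rather than the method from the article: `estimate_tokens` is a crude word-count stand-in for a real tokenizer, and `assemble_context` is a hypothetical helper that expects `(relevance_score, text)` pairs from whatever retriever is in use.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def assemble_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within the token budget."""
    selected: list[str] = []
    used = 0
    # Visit chunks from most to least relevant.
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            # This chunk does not fit; a smaller, lower-ranked one still might.
            continue
        selected.append(text)
        used += cost
    return selected
```

Greedy selection is not optimal in general (it is a knapsack-style problem), but it is easy to reason about and keeps the most relevant chunks first; a production system would swap in the model's actual tokenizer for the cost estimate.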

ML Ops

Evaluating LLM Outputs Without Human Labels

Building automated evaluation pipelines using LLM-as-judge, reference-free metrics, and regression testing for continuous deployment.
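The regression-testing part of such a pipeline can be sketched as a comparison of judged scores against a stored baseline. Everything here is hypothetical and for illustration: `generate` stands for the system under test, `judge` stands for an LLM-as-judge call that returns a score between 0 and 1, and `find_regressions` is an invented helper name.

```python
from typing import Callable


def find_regressions(
    prompts: list[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
    baseline: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return prompts whose judged score fell more than `tolerance` below baseline."""
    regressions: list[str] = []
    for prompt in prompts:
        # Score the current output with the (assumed) judge function.
        score = judge(prompt, generate(prompt))
        if score < baseline[prompt] - tolerance:
            regressions.append(prompt)
    return regressions
```

In a continuous deployment setup, a non-empty result from a check like this could be used to fail the build before a prompt or model change ships.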

05 / Contact

LET'S BUILD YOUR AI INFRASTRUCTURE

Have a project in mind? I work best when the problem is hard and the stakes are real.

© 2026 Upendra Pant. All rights reserved.