Hello, I'm

Balachandra Devarangadi Sunil

Software Engineer · Applied Scientist · MS CS @ UMass Amherst

Software engineer who likes building thoughtful, playful things.


About Me

Hi! I'm Balachandra, a graduate student at UMass Amherst who loves turning ideas into polished, working software. I care about clean architecture, thoughtful details, and building things that actually feel good to use.

When I'm not coding, I'm usually outdoors, sketching something, or tinkering with side projects like this one.

By training, I'm a machine learning researcher and software engineer with 2+ years of experience spanning product engineering and applied AI. My M.S. work at UMass focused on LLMs, retrieval, and efficient ML systems, including research on controllable memory for LLM agents, RAG pipelines, and edge inference. I'm drawn to questions about how models behave and how to make them genuinely useful in the real world.

Before grad school, I spent time at Epsilon building backend systems and data pipelines that had to just work at scale. More recently, I've been a Graduate Research Extern at Microsoft, exploring memory frameworks for LLM agents. What excites me most is the space where solid engineering meets real research; that's usually where the interesting problems live.

Experience

Accomplishments

  • Architected a dual-latent encoder splitting memories into a stable 'gist' latent and a noisable 'detail' latent, using cosine-scheduled forward diffusion to simulate forgetting and reverse diffusion to simulate cue-conditioned recall.
  • Ran ablations across one-step vs. multi-step latent estimation, denoiser training length, and semantic projection modules, improving reconstruction quality (BERTScore F1 up to 0.8251) through targeted architectural changes.
  • Evaluated selective recall of PII on a 1.4M-record synthetic privacy dataset (Privasis), uncovering a key finding: the model abstracts or fabricates identity details while reliably preserving event-level structure, a discovery with direct implications for privacy-preserving memory systems.
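Here is a rough sketch of the gist/detail split with cosine-scheduled "forgetting". It is a minimal illustration and not the project's code. It assumes the cosine schedule from improved DDPM (offset s = 0.008) and represents latents as plain Python lists. The names `alpha_bar` and `forget` are hypothetical. Only the detail latent is noised; the gist latent passes through unchanged, so the memory's stable core survives while its specifics degrade as t grows.

```python
import math
import random


def alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cumulative signal fraction at step t under an (assumed) cosine schedule."""
    def f(u: float) -> float:
        return math.cos(((u / T) + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def forget(gist, detail, t, T, rng=None):
    """Forward-diffuse only the detail latent to step t; keep the gist latent intact.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
    """
    rng = rng or random.Random()
    a = alpha_bar(t, T)
    noisy = [math.sqrt(a) * x + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for x in detail]
    return list(gist), noisy


if __name__ == "__main__":
    T = 1000
    for t in (0, 250, 500, 750, 1000):
        print(t, round(alpha_bar(t, T), 4))  # signal fraction shrinks toward 0
    print(forget([0.3, -1.2], [0.9, 0.1, -0.4], t=500, T=T, rng=random.Random(0)))
```

Recall in the described system runs the reverse (denoising) direction conditioned on a cue. That learned denoiser is not sketched here.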

Tools

PyTorch · Diffusion Models · HuggingFace Transformers · Flan-T5 · Weights & Biases · SLURM · Python

Accomplishments

  • Achieved up to a 90.9% latency reduction (11x speedup) on individual ResNet50V2 partitions, and drove system-wide mean latency reductions of up to 63.8% for single-tenant and 77.4% for multi-tenant workloads versus the default Edge TPU compiler.
  • Diagnosed and fixed a hard cross-stack correctness problem (inconsistent TPU/CPU execution results traced to TFLite versioning), restoring consistent outputs and unlocking a validated 24% latency improvement for InceptionV4 pipelines.
  • Built a dual-threadpool inference pipeline (separate TPU and CPU thread pools with closed-loop request handoff) to enable accurate throughput benchmarking across split points and CPU thread counts, and generated model-size and partition tables that directly informed the paper's analytic queueing model design.
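A closed-loop, dual-threadpool benchmark of the kind described above might look roughly like the sketch below. It rests on stated assumptions and is not the project's pipeline. `tpu_stage` and `cpu_stage` are hypothetical stand-ins that sleep instead of running real TFLite/Edge TPU partitions, and the pool sizes are placeholders. The sketch also includes the arithmetic linking a fractional latency reduction to a speedup factor, speedup = 1 / (1 - reduction). By that formula, a 90.9% reduction corresponds to roughly 11x.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction (e.g. 0.909) into a speedup factor."""
    return 1.0 / (1.0 - reduction)


def tpu_stage(x: int) -> int:
    # Hypothetical stand-in for the layers before the split point (Edge TPU partition).
    time.sleep(0.002)
    return x + 1


def cpu_stage(x: int) -> int:
    # Hypothetical stand-in for the layers after the split point (CPU partition).
    time.sleep(0.003)
    return x * 2


def run_closed_loop(num_clients: int = 4, requests_per_client: int = 10, cpu_threads: int = 2):
    """Each client sends its next request only after the previous one finishes (closed loop)."""
    # Assumption: one thread drives the accelerator; CPU thread count is the swept parameter.
    with ThreadPoolExecutor(max_workers=1) as tpu_pool, \
         ThreadPoolExecutor(max_workers=cpu_threads) as cpu_pool, \
         ThreadPoolExecutor(max_workers=num_clients) as clients:

        def client(_client_id: int) -> list:
            outputs = []
            for i in range(requests_per_client):
                intermediate = tpu_pool.submit(tpu_stage, i).result()   # TPU partition
                out = cpu_pool.submit(cpu_stage, intermediate).result()  # hand off to CPU pool
                outputs.append(out)
            return outputs

        start = time.perf_counter()
        results = list(clients.map(client, range(num_clients)))
        elapsed = time.perf_counter() - start

    throughput = num_clients * requests_per_client / elapsed  # requests per second
    return results, throughput


if __name__ == "__main__":
    print(round(speedup_from_reduction(0.909), 2))
    for threads in (1, 2, 4):
        _, rps = run_closed_loop(cpu_threads=threads)
        print(threads, round(rps, 1))
```

To produce throughput curves across configurations, sweep `cpu_threads` and vary how much work sits on each side of the split.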

Tools

TensorFlow Lite · Edge TPU · Raspberry Pi 5 · Jetson Orin Nano · Python · ONNX · Keras

Accomplishments

  • Built an ensemble architecture where a meta-LLM selects the best answer from structured Chain-of-Thought candidates generated by each retrieval method.
  • Implemented dual retrieval backends (BM25 via Elasticsearch, plus FAISS) to support flexible sparse and dense retrieval across different invocation strategies.
  • Solved the challenge of dynamically deciding which retrieval strategy fits a given query, unifying multiple RAG invocation methods into one adaptive, ensemble-driven system.
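The sparse side of the dual-backend setup can be illustrated with the sketch below. It scores documents with Okapi BM25 in plain Python and then picks among per-method candidate answers. This is a simplified illustration, not the project's code. Elasticsearch computes BM25 internally, and the `k1 = 1.2` / `b = 0.75` values used here are the usual Lucene defaults. The `judge` function is a hypothetical placeholder for the meta-LLM that chooses among Chain-of-Thought candidates.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the query (Lucene-style IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))  # document frequency per term
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)  # length normalisation
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_best(candidates: Dict[str, str], judge: Callable[[str], float]) -> str:
    """Return the retrieval method whose candidate answer the judge rates highest.

    `judge` is a hypothetical stand-in for the meta-LLM selector.
    """
    return max(candidates, key=lambda method: judge(candidates[method]))


if __name__ == "__main__":
    docs = [
        "faiss builds dense vector indexes",
        "bm25 ranks documents by term statistics",
        "elasticsearch supports bm25 scoring",
    ]
    print(bm25_scores("bm25 scoring", docs))
    print(select_best({"sparse": "short", "dense": "a longer answer"}, judge=len))
```

In a real ensemble, `judge` would prompt an LLM with the question and all candidates rather than score each answer in isolation. The per-candidate form is kept here only for brevity.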

Tools

BM25 · Elasticsearch · FAISS · LLMs · Python

Accomplishments

  • Designed 32+ conversation stories and NLU training data covering intents like NPA tracking, loan disbursement, regional sales performance, and live loan counts, each with multi-turn slot-filling dialogue flows.
  • Integrated Duckling's time-entity extraction to resolve natural language expressions (e.g. 'last quarter', 'same month last year') into structured timestamps for backend queries.
  • Built custom Rasa SDK actions that mapped extracted entities and slots (type, region, time, order) into structured backend API calls, with automatic slot-resets and fallback handling for robust conversation recovery.
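The slot-to-API mapping in the last bullet can be illustrated without the Rasa SDK itself. The sketch below is hypothetical throughout. The helpers `missing_slots`, `time_range` and `build_query` are illustrative, as are the parameter names. The slot names (type, region, time, order) come from the bullet. The sketch assumes the `time` slot holds either a single ISO timestamp string or an interval dict with `from`/`to` keys, a common shape for Duckling time entities in Rasa. Treat that shape as an assumption and check it against your pipeline's actual output.

```python
from typing import Any, Dict, List, Optional, Union

TimeValue = Union[str, Dict[str, Optional[str]]]

# Hypothetical: slots that must be filled before the backend can be queried.
REQUIRED_SLOTS = ("type", "region", "time")


def missing_slots(slots: Dict[str, Any]) -> List[str]:
    """Slots the bot still needs to ask for (drives the multi-turn slot-filling loop)."""
    return [name for name in REQUIRED_SLOTS if not slots.get(name)]


def time_range(value: TimeValue) -> Dict[str, Optional[str]]:
    """Normalise a Duckling-style time value into explicit start/end fields."""
    if isinstance(value, dict):  # interval, e.g. "last quarter"
        return {"start": value.get("from"), "end": value.get("to")}
    return {"start": value, "end": None}  # single instant


def build_query(slots: Dict[str, Any]) -> Dict[str, Any]:
    """Map filled slots into parameters for a (hypothetical) backend API call."""
    gaps = missing_slots(slots)
    if gaps:
        # A real action would re-prompt or trigger fallback here instead of raising.
        raise ValueError(f"cannot query yet, missing slots: {gaps}")
    params = {"metric": slots["type"], "region": slots["region"], **time_range(slots["time"])}
    if slots.get("order"):
        params["sort"] = slots["order"]
    return params


if __name__ == "__main__":
    print(build_query({
        "type": "npa",
        "region": "south",
        "time": {"from": "2024-01-01", "to": "2024-04-01"},
        "order": "desc",
    }))
```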

Tools

Rasa · spaCy · Facebook Duckling · Python · NLU · CRF

Projects

…and here are more

ML / AI

Fullstack / Web Development

Flutter / Mobile

Hardware / Embedded

Desktop Applications

Wins

Education

Publications

Skills


Languages

  • used in: Epsilon, Research work, LoRA/RAG projects, Microsoft & CIIR
  • used in: Epsilon, backend services & AWS Lambda APIs
  • used in: Systems coursework, performance-critical modules, Personal Projects
  • used in: CI/CD pipelines, Personal Projects
  • used in: Ohealer Healthcare, Personal Projects
  • used in: Flutter app published on the Play Store, Personal Projects

ML & LLM

  • used in: Density estimation, LoRA adapters, LLM alignment
  • used in: Edge TPU inference pipeline, LASS Lab
  • used in: QLoRA fine-tuning, DPO alignment on LLaMA-3.1
  • used in: CIIR, dynamic RAG ensemble pipeline
  • used in: Research prototyping

Distributed Systems

  • used in: Epsilon
  • used in: Epsilon CI/CD
  • used in: crowd-counting deployment
  • used in: Streaming data processing project
  • used in: Epsilon, Ohealer Healthcare

Databases, Cloud & Infra

  • used in: Epsilon
  • used in: Backend datastores at Epsilon
  • used in: Epsilon
  • used in: Epsilon
  • used in: Epsilon, LASS

Frontend

  • used in: Ohealer Healthcare
  • used in: Epsilon
  • used in: Personal Projects
  • used in: Epsilon, Personal Projects
  • used in: Personal Projects

Backend & Data

  • used in: Ohealer Healthcare
  • used in: Personal Projects
  • used in: Previous Company, event pipelines
  • used in: Personal Projects

Contact Me

I'd love to hear from you about work, ideas, or a good trail recommendation.

bdevarangadi@umass.edu

© 2026 Balachandra Devarangadi Sunil · Built with Next.js