Birat Poudel

Research Profile

I am Birat Poudel, an AI/ML Research Engineer with a strong foundation in machine learning theory and extensive experience in transforming research innovations into scalable, production-ready systems. My work bridges academic research and applied AI, with core expertise in Conversational AI, Model Evaluation Frameworks, and Multi-Agent Systems.

My research and professional contributions focus on the following areas:

Conversational AI Systems: Building dialogue and voice-based systems, including interactive voice response (IVR), for healthcare and enterprise use cases, emphasizing contextual understanding and empathetic response generation.
Model Evaluation: Developing comprehensive evaluation frameworks for LLMs and ML models, featuring:
- Automated pipelines for quantitative performance metrics
- Robustness testing using adversarial and edge-case inputs
- A/B testing for production-level model comparison
- LLM-as-a-Judge methods for qualitative assessment of outputs
- Integration of DeepEval for automated evaluation of LLM workflows
- Use of Langfuse for tracing, monitoring, and debugging model behavior in production environments
Multi-Agent Systems: Architecting scalable agentic frameworks for autonomous task orchestration and dynamic workflow management.
Natural Language Processing: Implementing state-of-the-art NLP models for semantic understanding, intent detection, and context-aware response generation.
Retrieval-Augmented Generation (RAG): Designing knowledge-grounded AI systems optimized for accuracy, efficiency, and retrieval quality.
Time Series Forecasting: Applying statistical and deep learning models (SARIMA, LSTM, Prophet, etc.) for predictive analytics and data-driven decision-making.
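
For the A/B testing item above, one common way to decide whether one production model variant really outperforms another is a two-proportion z-test on a binary outcome such as task success or user approval. The following is a minimal sketch under that assumption. The function name and the example counts are hypothetical and are not taken from any specific project.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: p_a == p_b, using the pooled proportion."""
    if min(n_a, n_b) <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis that both variants perform the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: variant B succeeded in 460 of 1,000 sessions vs. 420 of 1,000 for A.
    z, p = two_proportion_z_test(420, 1000, 460, 1000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

The z-test assumes independent sessions and reasonably large samples. With small counts, an exact test would be the safer choice.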

Technical Expertise & Research Tools

Machine Learning & AI Frameworks

Deep Learning: PyTorch, TensorFlow, Transformers, Hugging Face
Classical ML: Scikit-Learn, XGBoost, LightGBM
Time Series: Prophet, SARIMA, LSTM, ARIMA
NLP: BERT, GPT variants, T5, Sentence Transformers

Research & Development Stack

Data Science: NumPy, Pandas, SciPy, Matplotlib, Seaborn
Databases: PostgreSQL, MongoDB, Redis, Vector Databases (Pinecone, Weaviate)
MLOps: Docker, Kubernetes, AWS, Model Versioning, Experiment Tracking
Development: Python, FastAPI, React, TypeScript, Git
Evaluation: Statistical Analysis, A/B Testing, Performance Metrics, LLM-as-a-Judge
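
An LLM-as-a-Judge evaluation typically sends the question, the model's answer, and a scoring rubric to a separate judge model, then parses a structured score from the reply. The sketch below covers only the prompt-and-parse logic. `judge_fn` is a hypothetical stand-in for whatever client actually calls the judge model, and both the 1 to 5 rubric and the JSON reply format are assumptions.

```python
import json
import re
from statistics import mean
from typing import Callable

# Hypothetical rubric; real criteria and scales would be project-specific.
RUBRIC = (
    "Rate the ANSWER to the QUESTION for {criterion} on an integer scale from 1 (poor) "
    "to 5 (excellent). Reply with JSON only: {{\"score\": <int>, \"reason\": <string>}}"
)


def build_judge_prompt(question: str, answer: str, criterion: str) -> str:
    return f"{RUBRIC.format(criterion=criterion)}\n\nQUESTION:\n{question}\n\nANSWER:\n{answer}"


def parse_judge_reply(reply: str) -> int:
    """Extract and validate the integer score from the judge's JSON reply."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)  # tolerate text around the JSON
    if match is None:
        raise ValueError(f"no JSON object in judge reply: {reply!r}")
    score = json.loads(match.group(0)).get("score")
    if type(score) is not int or not 1 <= score <= 5:
        raise ValueError(f"missing or out-of-range score: {score!r}")
    return score


def judge_dataset(samples: list[tuple[str, str]], criterion: str,
                  judge_fn: Callable[[str], str]) -> float:
    """Average judge score over (question, answer) pairs for one criterion."""
    scores = [parse_judge_reply(judge_fn(build_judge_prompt(q, a, criterion)))
              for q, a in samples]
    return mean(scores)


if __name__ == "__main__":
    def fake_judge(prompt: str) -> str:  # hypothetical judge that always returns 4
        return 'Sure: {"score": 4, "reason": "mostly complete"}'

    print(judge_dataset([("What is 2+2?", "4")], "helpfulness", fake_judge))
```

Validating the score range catches judge replies that drift from the requested format. Without that check, such replies would silently skew the average.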

Research & Professional Experience

AI/ML Research Engineer | Leapfrog Technology (June 2024 - Present)

Built AI systems, including an interactive voice response (IVR) system and conversational voice AI, for medical patient follow-ups and referrals
Designed a foundation model evaluation pipeline that generates automated PDF reports for large-scale model benchmarking
Engineered and deployed AI agents and Model Context Protocol (MCP) servers to enable scalable, modular, and autonomous agent orchestration across workflows

Machine Learning Research Engineer | Jobsflow.ai (Dec 2023 - May 2024)

Built AI systems, including an AI voice interviewer and an intelligent chatbot capable of making tool calls to more than ten services, such as Google Calendar, Google Meet, Gmail, and Zoom
Developed algorithms that compute an applicant’s match score for a job from the job description, the applicant’s resume, and their answers to job-related questions
Implemented embedding-based contextual search, filtering, and sorting to improve candidate selection accuracy
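
Embedding-based candidate search of this kind can be sketched in three steps. First, filter candidates on structured fields. Second, score the survivors by cosine similarity between the job-description embedding and each resume embedding. Third, sort by that score. This is an illustrative assumption, not the production algorithm. The `Candidate` fields, the `min_years` filter, and the toy three-dimensional vectors are hypothetical; in practice the vectors would come from a sentence-embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    years_experience: float
    embedding: list[float]  # resume embedding (toy values in this sketch)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(job_embedding: list[float], candidates: list[Candidate],
                    min_years: float = 0.0, top_k: int = 5) -> list[tuple[str, float]]:
    # 1) Filter on a structured field, 2) score by semantic similarity, 3) sort descending.
    eligible = [c for c in candidates if c.years_experience >= min_years]
    scored = [(c.name, cosine(job_embedding, c.embedding)) for c in eligible]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    job = [0.9, 0.1, 0.0]
    pool = [
        Candidate("A", 4, [0.8, 0.2, 0.1]),
        Candidate("B", 1, [0.9, 0.1, 0.0]),  # closest match, but removed by the min_years filter
        Candidate("C", 6, [0.1, 0.9, 0.3]),
    ]
    print(rank_candidates(job, pool, min_years=2))
```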

Machine Learning Research Engineer | Fusemachines (Sep 2023 - Nov 2023)

Preprocessed and transformed datasets using NumPy and Pandas, applying advanced feature engineering techniques for time series forecasting and machine learning applications
Designed and implemented ML models, including SARIMA, LSTM, Prophet, and XGBoost, for time series forecasting and predictive analytics, achieving a 15% improvement over previous models
Enhanced RAG-based systems by optimizing vector storage and retrieval

Machine Learning Research Engineer | Maven Solutions (Aug 2022 - Aug 2024)

Performed data preprocessing and feature engineering with NumPy and Pandas to prepare datasets for model training
Developed and implemented machine learning algorithms with Scikit-Learn for classification, regression, and clustering tasks, achieving a 15% accuracy improvement over previous models
Wrote automation scripts and web scrapers to streamline workflows and collect critical data efficiently
Developed and integrated backend APIs, collaborating closely with cross-functional teams to improve application functionality and performance

Research Projects

Nepali Sign Language Characters Recognition

Deep Learning & Accessibility Technology

Developed a benchmark dataset for Nepali Sign Language (NSL) with 36 gesture classes and 1,500 samples per class (54,000 samples in total)
Fine-tuned MobileNetV2 and ResNet50 architectures, achieving classification accuracies of 90.45% and 88.78%, respectively
Demonstrated the effectiveness of transfer learning and fine-tuning for underexplored sign languages in low-resource settings
Contributed to accessibility research by creating a systematic dataset and deep learning approaches for NSL recognition

Automobile License Plate Detection & Recognition

Computer Vision & Pattern Recognition

Developed a two-stage system (plate detection followed by text recognition), with the detector first implemented using Inception-ResNet-v2 and later upgraded to YOLOv8
Integrated state-of-the-art object detection with Tesseract OCR for license plate localization and text extraction
Achieved robust performance across diverse environmental conditions, including varying lighting, viewing angles, and image quality
Demonstrated practical applicability for real-time traffic monitoring and automated vehicle identification systems

Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations

Natural Language Processing & Human-Computer Interaction

Fine-tuned Microsoft DialoGPT-medium on a synthetically generated dataset of 1,000 doctor–patient dialogues covering ten common diseases prevalent in rural Nepal to create an offline-capable medical conversational AI system
Designed a two-stage data generation and validation pipeline using Gemini 2.5 Pro and Claude Sonnet 4, followed by expert medical review, to ensure the accuracy, empathy, and contextual relevance of the dialogues
Conducted quantitative (perplexity, loss, token-level F1) and qualitative evaluations by healthcare professionals, demonstrating the model’s ability to produce coherent, medically appropriate, and empathetic responses for low-resource healthcare settings
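
Two of the quantitative metrics named above have standard definitions. Perplexity is the exponential of the mean per-token negative log-likelihood. Token-level F1 is the harmonic mean of precision and recall over overlapping tokens; the SQuAD-style bag-of-tokens variant is assumed here. The sketch below assumes whitespace tokenization and lowercasing, and the project's exact tokenization may differ.

```python
import math
from collections import Counter


def perplexity(token_nlls: list[float]) -> float:
    """exp(mean negative log-likelihood per token), with NLLs in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def token_f1(prediction: str, reference: str) -> float:
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 3 predicted tokens, 4 reference tokens, 2 shared: P = 2/3, R = 1/2, F1 = 4/7.
    print(token_f1("drink more water", "drink clean boiled water"))
```

With a Hugging Face causal language model, the `loss` returned when `labels` are set to the input IDs is already the mean token-level cross-entropy. Under that setup, perplexity is simply `exp(loss)`.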

Amazon Bedrock Foundation Model Evaluation Pipeline

Model Evaluation

Implemented an end-to-end evaluation pipeline for Amazon Bedrock models, with multi-region support, covering performance metrics (latency, throughput, time-to-first-token) and quality metrics (helpfulness, faithfulness, completeness, coherence, etc.)
Used LangChain orchestration and an LLM-as-a-judge approach for automated, multi-dimensional quality and responsible-AI assessments (harmfulness, bias, refusal appropriateness), plus async processing for concurrent benchmarking
Produced structured evaluation outputs and human-readable reports to compare models, tune prompts, and inform safe deployment decisions
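
The performance metrics listed above can be measured from a streaming response. Time-to-first-token (TTFT) is the delay until the first chunk arrives. Latency is the time until the stream ends. Throughput here is output tokens divided by that total latency. The sketch below is only an illustration, not the pipeline's actual code. It replaces a real Bedrock streaming client with a hypothetical async generator, `fake_stream`, counts one token per chunk for simplicity, and caps concurrency with an `asyncio.Semaphore`.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Callable


@dataclass
class StreamMetrics:
    ttft_s: float
    latency_s: float
    tokens: int

    @property
    def tokens_per_s(self) -> float:
        return self.tokens / self.latency_s if self.latency_s > 0 else 0.0


async def fake_stream(prompt: str, n_tokens: int = 5, delay_s: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model call: yields one token per delay."""
    for i in range(n_tokens):
        await asyncio.sleep(delay_s)
        yield f"tok{i}"


async def measure(prompt: str, stream_fn: Callable[[str], AsyncIterator[str]],
                  sem: asyncio.Semaphore) -> StreamMetrics:
    async with sem:  # limit how many requests are in flight at once
        start = time.perf_counter()
        ttft = None
        tokens = 0
        async for _chunk in stream_fn(prompt):
            if ttft is None:
                ttft = time.perf_counter() - start  # first chunk arrived
            tokens += 1
        latency = time.perf_counter() - start
        return StreamMetrics(ttft_s=ttft if ttft is not None else latency,
                             latency_s=latency, tokens=tokens)


async def benchmark(prompts: list[str], stream_fn, max_concurrency: int = 4) -> list[StreamMetrics]:
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(measure(p, stream_fn, sem) for p in prompts))


if __name__ == "__main__":
    for r in asyncio.run(benchmark(["q1", "q2", "q3"], fake_stream)):
        print(f"TTFT={r.ttft_s:.3f}s latency={r.latency_s:.3f}s {r.tokens_per_s:.1f} tok/s")
```

Timing inside the semaphore measures model response time rather than queueing delay. This is usually what a per-model comparison needs.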

Deep Research Agent using Amazon Strands Agents

Multi-Agent Orchestration

Engineered an automated research agent pipeline that uses Amazon Strands Agents for domain-specific data gathering, filtering, and summarization, substantially reducing manual research time.
Implemented a multi-agent architecture combining “Agent Loop,” “Agent-as-Tool,” “Web Search,” and “Extraction” modules to autonomously perform requirements analysis, competitive landscape research, and effort estimation.
Incorporated iterative feedback loops and inter-agent coordination to refine project scoping, synthesize comparative analyses, and output actionable estimations and insights.
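
In the "Agent-as-Tool" pattern mentioned above, an orchestrating agent calls other agents exactly as it calls ordinary tools, inside a loop that decides the next action from the current context. The sketch below illustrates the idea in plain Python. It deliberately does not use the Amazon Strands Agents SDK. The `Orchestrator` class, the toy sub-agents, and `scripted_policy` are all hypothetical; in a real system an LLM would play the role of the policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Tool = Callable[[str], str]
Policy = Callable[[str, list[str]], Optional[str]]  # (context, history) -> next tool name or None


def web_search_agent(query: str) -> str:
    # Hypothetical sub-agent; a real one would call a search API and summarize the results.
    return f"[search results for: {query}]"


def extraction_agent(text: str) -> str:
    # Hypothetical sub-agent that would pull structured facts out of raw text.
    return f"[key facts from {text}]"


def scripted_policy(context: str, history: list[str]) -> Optional[str]:
    # Hypothetical stand-in for an LLM choosing the next action; stops after a fixed plan.
    plan = ["web_search", "extract"]
    return plan[len(history)] if len(history) < len(plan) else None


@dataclass
class Orchestrator:
    tools: dict[str, Tool]
    max_steps: int = 5
    trace: list[str] = field(default_factory=list)

    def run(self, task: str, policy: Policy) -> str:
        """Agent loop: ask the policy for a tool, apply it, feed the output back in."""
        context = task
        history: list[str] = []
        for _ in range(self.max_steps):
            tool_name = policy(context, history)
            if tool_name is None:  # the policy judges the task complete
                break
            context = self.tools[tool_name](context)  # a sub-agent invoked like any other tool
            history.append(tool_name)
        self.trace = history
        return context


if __name__ == "__main__":
    orch = Orchestrator(tools={"web_search": web_search_agent, "extract": extraction_agent})
    print(orch.run("competitors of product X", scripted_policy))
    print(orch.trace)
```

The `max_steps` cap guards against a policy that never signals completion. This is a common safeguard in agent loops.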

Vector Search & RAG Systems

Information Retrieval & Knowledge Systems

Implemented natural-language semantic search for movies using a Hugging Face sentence-transformers model and MongoDB Atlas Vector Search
Developed a document Q&A application using the Gemma model, LangChain, and Streamlit, with Google Generative AI Embeddings and a FAISS vector store
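
A document Q&A pipeline of this kind usually splits each document into overlapping chunks before embedding them. The overlap means that a sentence cut at one chunk boundary still appears whole in a neighboring chunk. LangChain provides text splitters for this step. The plain-Python sketch below only illustrates the idea, using hypothetical `chunk_size` and `overlap` values measured in words rather than characters or tokens.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `chunk_size` words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    print(chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1))
```

Larger overlaps improve recall at boundaries but increase index size and the amount of duplicated text in the retrieved context.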

Comprehensive Machine Learning Research

Statistical Learning & Predictive Modeling

Conducted extensive research across multiple ML domains with rigorous evaluation methodologies
Achieved 92.98% accuracy in breast cancer classification using ensemble methods
Developed automobile price prediction models achieving R² = 0.8709 through advanced feature engineering
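
For reference, the R² reported above is the coefficient of determination, R² = 1 − SS_res / SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the targets. For a single, non-constant target, the minimal implementation below follows the same definition as scikit-learn's `r2_score`. The example prices are hypothetical.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical prices (thousands): SS_res = 26, SS_tot = 500, so R² = 1 - 26/500.
    print(r_squared([10, 20, 30, 40], [12, 18, 33, 37]))
```

A model that always predicts the mean scores exactly 0, and a perfect model scores 1.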

Academic Background & Qualifications

Bachelor of Engineering in Electronics, Communication & Information Engineering

Tribhuvan University, Institute of Engineering (IOE), Thapathali Campus (2019-2023)

Relevant Coursework & Research Areas:

Artificial Intelligence & Machine Learning Theory
Advanced Probability & Statistics
Discrete Mathematics & Computational Structures
Big Data Analytics & Processing
Signal Processing & Information Theory
Software Engineering & System Design

Professional Certifications:

Machine Learning Specialization (Stanford University/Coursera)
Advanced Web Development Specialization

Research Interests for Doctoral Studies:

Conversational AI and human-computer interaction
Multimodal AI systems and cross-modal learning
Retrieval-augmented generation and knowledge systems
Time series analysis and predictive modeling

Research Vision & Future Directions

My research vision is to develop AI systems that are scientifically rigorous, ethically grounded, and socially impactful. I aim to bridge the gap between theoretical AI research and deployable real-world solutions, ensuring that the systems we build are not only intelligent but also responsible and accessible.

Primary Research Interests:

Human-Centered AI: Building conversational and interactive systems that exhibit empathy, contextual understanding, and transparency in decision-making.
Multimodal AI Systems: Advancing learning models capable of integrating and reasoning across text, speech, and visual modalities.
Knowledge-Grounded Generation: Enhancing Retrieval-Augmented Generation (RAG) frameworks to improve factual consistency, interpretability, and reliability of large language models.
Temporal Intelligence: Innovating in time series forecasting and dynamic systems modeling to better understand temporal dependencies in data-driven environments.
AI Safety, Ethics, and Alignment: Researching frameworks to ensure that AI systems are safe, fair, and aligned with human values, particularly in sensitive domains like healthcare.

Research Methodology: I emphasize reproducibility, interpretability, and empirical rigor in my work, combining strong theoretical grounding with applied experimentation. My approach draws on interdisciplinary collaboration, leveraging insights from linguistics, cognitive science, and systems engineering to build more holistic AI systems.

Academic Aspirations: I am seeking opportunities to pursue graduate/doctoral studies where I can contribute to the advancement of artificial intelligence through innovative research, collaborate with leading researchers in the field, and mentor the next generation of AI practitioners.

My long-term goal is to contribute to both academia and industry through impactful research, publications, and mentorship, helping shape ethical and intelligent AI systems.

Contact & Collaboration: GitHub · LinkedIn · Email

Open to research collaborations, academic discussions, and graduate/doctoral program opportunities.