Birat Poudel
Research Profile
I am Birat Poudel, an AI/ML Research Engineer with a strong foundation in machine learning theory and extensive experience in transforming research innovations into scalable, production-ready systems. My work bridges academic research and applied AI, with core expertise in Conversational AI, Model Evaluation Frameworks, and Multi-Agent Systems.
My research and professional contributions focus on the following areas:
- Conversational AI Systems: Building advanced dialogue and voice-based systems (IVR) for healthcare and enterprise use cases, emphasizing contextual understanding and empathetic response generation.
- Model Evaluations: Developing comprehensive evaluation frameworks for LLMs and ML models, featuring:
  - Automated pipelines for quantitative performance metrics
  - Robustness testing using adversarial and edge-case inputs
  - A/B testing for production-level model comparison
  - LLM-as-a-Judge methods for qualitative assessment of outputs
  - Integration of DeepEval for automated evaluation of LLM workflows
  - Use of Langfuse for tracing, monitoring, and debugging model behavior in production environments
- Multi-Agent Systems: Architecting scalable agentic frameworks for autonomous task orchestration and dynamic workflow management.
- Natural Language Processing: Implementing state-of-the-art NLP models for semantic understanding, intent detection, and context-aware response generation.
- Retrieval-Augmented Generation (RAG): Designing knowledge-grounded AI systems optimized for accuracy, efficiency, and retrieval quality.
- Time Series Forecasting: Applying statistical and deep learning models (SARIMA, LSTM, Prophet, etc.) for predictive analytics and data-driven decision-making.
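The following is a minimal sketch of the LLM-as-a-Judge pattern listed above. It assumes a judge model prompted to return a JSON object of 1-5 scores per criterion. The criteria, the prompt wording, and `fake_judge` are illustrative assumptions. `fake_judge` is a hypothetical stand-in for a real model call, so the sketch runs offline. None of this describes how DeepEval or any other framework implements judging.

```python
import json
from typing import Callable, Dict, List

# Illustrative rubric; real criteria would be chosen per task.
CRITERIA: List[str] = ["helpfulness", "faithfulness", "coherence"]

JUDGE_TEMPLATE = (
    "You are an impartial evaluator.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (poor) to 5 (excellent) on each of: {criteria}.\n"
    "Reply with a JSON object mapping each criterion to an integer score."
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge prompt for one question/answer pair."""
    return JUDGE_TEMPLATE.format(
        question=question, answer=answer, criteria=", ".join(CRITERIA)
    )


def parse_scores(raw: str) -> Dict[str, int]:
    """Keep only known criteria whose scores are integers in the 1-5 range."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return {}
    scores: Dict[str, int] = {}
    for name in CRITERIA:
        value = data.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5:
            scores[name] = value
    return scores


def judge(question: str, answer: str, call_judge: Callable[[str], str]) -> Dict[str, int]:
    """Run one LLM-as-a-judge evaluation with the supplied model-calling function."""
    return parse_scores(call_judge(build_judge_prompt(question, answer)))


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; always returns the same reply."""
    return '{"helpfulness": 4, "faithfulness": 5, "coherence": 4, "style": 9}'


if __name__ == "__main__":
    print(judge("What is the capital of Nepal?", "Kathmandu", fake_judge))
    # prints {'helpfulness': 4, 'faithfulness': 5, 'coherence': 4}
```

The parser drops unknown keys and out-of-range scores, so a malformed judge reply cannot silently skew aggregate results. To use a live model, replace `fake_judge` with a real client call.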
Technical Expertise & Research Tools
Machine Learning & AI Frameworks
- Deep Learning: PyTorch, TensorFlow, Transformers, Hugging Face
- Classical ML: Scikit-Learn, XGBoost, LightGBM
- Time Series: Prophet, SARIMA, LSTM, ARIMA
- NLP: BERT, GPT variants, T5, Sentence Transformers
Research & Development Stack
- Data Science: NumPy, Pandas, SciPy, Matplotlib, Seaborn
- Databases: PostgreSQL, MongoDB, Redis, Vector Databases (Pinecone, Weaviate)
- MLOps: Docker, Kubernetes, AWS, Model Versioning, Experiment Tracking
- Development: Python, FastAPI, React, TypeScript, Git
- Evaluation: Statistical Analysis, A/B Testing, Performance Metrics, LLM As A Judge
Research & Professional Experience
AI/ML Research Engineer | Leapfrog Technology (June 2024 - Present)
- Built AI systems, including IVR (Interactive Voice Response) and Conversational Voice AI for medical patient follow-ups and referrals
- Designed a foundation model evaluation pipeline that generates automated PDF reports for large-scale model benchmarking
- Engineered and deployed AI Agents and MCP Servers to enable scalable, modular, and autonomous agent orchestration across workflows
Machine Learning Research Engineer | Jobsflow.ai (Dec 2023 - May 2024)
- Built AI systems, including an AI Voice Interviewer and an intelligent chatbot capable of making tool calls to more than ten services, such as Google Calendar, Google Meet, Gmail, and Zoom
- Developed algorithms to compute an applicant's match score for a job from the job description, the applicant's resume, and the applicant's answers to job-related questions
- Implemented embedding-based contextual search, filtering, and sorting to improve candidate selection accuracy
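The match-score and embedding-search bullets above can be illustrated with a minimal sketch. It assumes the job description, resume, and answers have already been embedded as vectors by some sentence-embedding model; the toy 3-dimensional vectors here are placeholders. The sketch combines cosine similarities using hypothetical weights. It is not the actual Jobsflow.ai scoring algorithm, which this profile does not describe.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical weights for each source of evidence (they sum to 1).
RESUME_WEIGHT = 0.6
ANSWERS_WEIGHT = 0.4


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_score(job: Vector, resume: Vector, answers: List[Vector]) -> float:
    """Weighted match score in [0, 100]; negative similarities are clipped to 0."""
    resume_sim = max(cosine(job, resume), 0.0)
    answer_sims = [max(cosine(job, a), 0.0) for a in answers]
    answer_sim = sum(answer_sims) / len(answer_sims) if answer_sims else 0.0
    return 100 * (RESUME_WEIGHT * resume_sim + ANSWERS_WEIGHT * answer_sim)


def rank_candidates(
    job: Vector, candidates: Dict[str, Tuple[Vector, List[Vector]]]
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from best to worst match."""
    scored = [(name, match_score(job, r, a)) for name, (r, a) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    job = [1.0, 0.0, 0.0]  # toy embedding of a job description
    pool = {
        "candidate_a": ([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0]]),
        "candidate_b": ([0.0, 1.0, 0.0], [[0.5, 0.5, 0.0]]),
    }
    print(rank_candidates(job, pool))
```

In a real system, the vectors would come from an embedding model. The weights would presumably be tuned against recruiter judgments. Both details fall outside what this profile specifies.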
Machine Learning Research Engineer | Fusemachines (Sep 2023 - Nov 2023)
- Preprocessed and transformed datasets using NumPy and Pandas, applying advanced feature engineering techniques for time series forecasting and machine learning applications
- Designed and implemented ML models, including SARIMA, LSTM, Prophet, and XGBoost, for time series forecasting and predictive analytics, achieving a 15% improvement over previous models
- Enhanced RAG-based systems by optimizing vector storage and retrieval
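The bullets above do not name the error metric or baseline behind the 15% figure. One common way to quantify such an improvement is to compare a model's mean absolute percentage error (MAPE) against a seasonal-naive baseline on held-out data, then report the relative error reduction. The sketch below assumes that setup. The helper names and toy data are hypothetical.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], season: int, horizon: int) -> List[float]:
    """Forecast each future step with the value observed one season earlier."""
    if season <= 0 or len(history) < season:
        raise ValueError("need a positive season length and at least one full season")
    last_season = list(history[-season:])
    return [last_season[i % season] for i in range(horizon)]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percentage reduction in error relative to the baseline (positive is better)."""
    return 100 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    train = [120.0, 90.0, 150.0, 130.0, 95.0, 160.0]  # toy series, season length 3
    test = [135.0, 100.0, 165.0]
    baseline = seasonal_naive(train, season=3, horizon=len(test))
    model_forecast = [133.0, 99.0, 163.0]  # stand-in for a SARIMA/LSTM/Prophet forecast
    base_err, model_err = mape(test, baseline), mape(test, model_forecast)
    print(f"baseline MAPE {base_err:.2f}%, model MAPE {model_err:.2f}%")
    print(f"relative improvement {relative_improvement(base_err, model_err):.1f}%")
```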
Machine Learning Research Engineer | Maven Solutions (Aug 2022 - Aug 2024)
- Worked on data preprocessing and feature engineering using libraries like NumPy and Pandas to prepare datasets for model training
- Developed and implemented machine learning algorithms using libraries such as Scikit-Learn for classification, regression, and clustering tasks, achieving a 15% accuracy improvement over previous models
- Wrote automation scripts and web scrapers to streamline workflows and collect critical data efficiently
- Developed and integrated backend APIs in close collaboration with cross-functional teams to improve application functionality and performance
Research Projects
Nepali Sign Language Characters Recognition
Deep Learning & Accessibility Technology
- Developed a benchmark dataset for Nepali Sign Language (NSL) with 36 gesture classes and 1,500 samples per class
- Fine-tuned MobileNetV2 and ResNet50 architectures, achieving classification accuracies of 90.45% and 88.78%, respectively
- Demonstrated the effectiveness of transfer learning and fine-tuning for underexplored sign languages in low-resource settings
- Contributed to accessibility research by creating a systematic dataset and deep learning approaches for NSL recognition
Automobile License Plate Detection & Recognition
Computer Vision & Pattern Recognition
- Developed the system in two phases: an initial implementation with Inception-ResNet-v2, later enhanced with YOLOv8
- Integrated state-of-the-art object detection with Tesseract OCR for robust license plate localization and text extraction
- Achieved robust performance across diverse environmental conditions including varying lighting, angles, and image quality
- Demonstrated practical applicability for real-time traffic monitoring and automated vehicle identification systems
Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations
Natural Language Processing & Human-Computer Interaction
- Fine-tuned Microsoft DialoGPT-medium on a synthetically generated dataset of 1,000 doctor–patient dialogues covering ten common diseases prevalent in rural Nepal to create an offline-capable medical conversational AI system
- Designed a two-stage data generation and validation pipeline using Gemini 2.5 Pro and Claude Sonnet 4, followed by expert medical review, to ensure the accuracy, empathy, and contextual relevance of the dialogues
- Conducted quantitative (perplexity, loss, token-level F1) and qualitative evaluations by healthcare professionals, demonstrating the model’s ability to produce coherent, medically appropriate, and empathetic responses for low-resource healthcare settings
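Token-level F1, one of the quantitative metrics above, is usually computed in the style of SQuAD evaluation. It measures precision and recall over the tokens shared by a generated response and a reference. This sketch assumes lowercase whitespace tokenization and omits SQuAD's punctuation and article stripping. The project's exact tokenization is not specified here.

```python
from collections import Counter
from typing import Sequence


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated response and a reference response."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty side scores zero.
        return float(pred_tokens == ref_tokens)
    # Counter intersection keeps each shared token min(count_pred, count_ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Average token-level F1 over paired predictions and references."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(token_f1(p, r) for p, r in zip(predictions, references)) / len(predictions)


if __name__ == "__main__":
    print(token_f1("Drink plenty of water", "drink water and rest"))  # prints 0.5
```

Token overlap rewards lexical agreement only. This is one reason the project pairs it with qualitative review by healthcare professionals.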
Amazon Bedrock Foundation Models Evaluation Pipeline
Models Evaluation
- Implemented an end-to-end evaluation pipeline for Amazon Bedrock models (multi-region support) with performance (latency, throughput, time-to-first-token) and quality metrics (helpfulness, faithfulness, completeness, coherence, etc.)
- Used LangChain orchestration and an LLM-as-a-judge approach for automated, multi-dimensional quality and responsible-AI assessments (harmfulness, bias, refusal appropriateness), plus async processing for concurrent benchmarking
- Produced structured evaluation outputs and human-readable reports to compare models, tune prompts, and inform safe deployment decisions
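The performance metrics named above can be derived from timestamps captured while consuming a streamed response. The sketch below illustrates this under stated assumptions; it is not the pipeline's actual code. It assumes one timestamp per output token. If an API streams multi-token chunks, the token count would instead come from the response's usage metadata. Throughput is defined here as output tokens divided by end-to-end latency, although some benchmarks exclude the time to first token. The demo timestamps are illustrative, not measurements.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Sequence


@dataclass
class StreamMetrics:
    ttft_s: float        # request start -> first streamed token
    latency_s: float     # request start -> last streamed token
    tokens_per_s: float  # output tokens / end-to-end latency


def stream_metrics(start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute TTFT, end-to-end latency, and throughput for one streamed response.

    `start` and `token_times` should come from the same clock, e.g. time.perf_counter().
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - start
    latency = token_times[-1] - start
    throughput = len(token_times) / latency if latency > 0 else float("inf")
    return StreamMetrics(ttft, latency, throughput)


def summarize(runs: List[StreamMetrics]) -> Dict[str, float]:
    """Median of each metric across repeated runs, which dampens network jitter."""
    return {
        "ttft_s": median(r.ttft_s for r in runs),
        "latency_s": median(r.latency_s for r in runs),
        "tokens_per_s": median(r.tokens_per_s for r in runs),
    }


if __name__ == "__main__":
    # Illustrative timestamps in seconds, not measurements.
    runs = [
        stream_metrics(0.0, [0.5, 1.0, 1.5, 2.0]),
        stream_metrics(0.0, [1.0, 2.0]),
    ]
    print(summarize(runs))
```

For concurrent benchmarking, each request would record its own `start` and token timestamps, for example inside separate asyncio tasks. The per-request metrics can then be summarized per model and region.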
Deep Research Agent using Amazon Strands Agents
Multi-Agents Orchestration
- Engineered an automated research agent pipeline that uses Amazon Strands Agents for domain-specific data gathering, filtering, and summarization, reducing manual research time by a significant margin.
- Implemented a multi-agent architecture combining “Agent Loop,” “Agent-as-Tool,” “Web Search,” and “Extraction” modules to autonomously perform requirements analysis, competitive landscape research, and effort estimation.
- Incorporated iterative feedback loops and inter-agent coordination to refine project scoping, synthesize comparative analyses, and output actionable estimations and insights.
Information Retrieval & Knowledge Systems
- Implemented a semantic search feature that finds movies from natural language queries, using a Hugging Face sentence-transformers model and MongoDB Atlas Vector Search
- Developed a document Q&A application using the Gemma model, LangChain, and Streamlit, with Google Generative AI Embeddings and a FAISS vector store
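Both projects above rely on the same retrieval step: embed the query, then return the stored items whose embeddings are most similar to it. Vector stores such as Atlas Vector Search and FAISS perform this step at scale, often with approximate nearest-neighbor indexes. The brute-force sketch below only illustrates the idea. Toy 2-D vectors stand in for sentence-transformers embeddings, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def build_index(items: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Pre-normalize stored embeddings once, instead of on every query."""
    return {key: normalize(vec) for key, vec in items.items()}


def top_k(query: Sequence[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Exact (brute-force) nearest neighbors by cosine similarity, best first."""
    q = normalize(query)
    scored = ((key, sum(a * b for a, b in zip(q, vec))) for key, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings of movie plots.
    index = build_index({
        "space opera": [1.0, 0.0],
        "romantic comedy": [0.0, 1.0],
        "space romance": [0.7, 0.7],
    })
    print(top_k([1.0, 0.1], index, k=2))  # toy query embedding
```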
Comprehensive Machine Learning Research
Statistical Learning & Predictive Modeling
- Conducted extensive research across multiple ML domains with rigorous evaluation methodologies
- Achieved 92.98% accuracy in breast cancer classification using ensemble methods
- Developed automobile price prediction models with R² = 0.8709 through advanced feature engineering
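For reference, the R² reported above is the coefficient of determination: R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors, and SS_tot is the sum of squared deviations of the targets from their mean. Below is a minimal pure-Python version. scikit-learn's `r2_score` uses the same definition for a single target. For constant targets, however, scikit-learn returns a fixed value by default, whereas this sketch raises an error.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```

An R² of 0.8709 therefore means the model's squared error is about 13% of the squared error from always predicting the mean price.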
Academic Background & Qualifications
Bachelor of Engineering in Electronics, Communication & Information Engineering
Tribhuvan University, IOE, Thapathali Campus (2019-2023)
Relevant Coursework & Research Areas:
- Artificial Intelligence & Machine Learning Theory
- Advanced Probability & Statistics
- Discrete Mathematics & Computational Structures
- Big Data Analytics & Processing
- Signal Processing & Information Theory
- Software Engineering & System Design
Professional Certifications:
- Machine Learning Specialization (Stanford University/Coursera)
- Advanced Web Development Specialization
Research Interests for Doctoral Studies:
- Conversational AI and human-computer interaction
- Multimodal AI systems and cross-modal learning
- Retrieval-augmented generation and knowledge systems
- Time series analysis and predictive modeling
Research Vision & Future Directions
My research vision is to develop AI systems that are scientifically rigorous, ethically grounded, and socially impactful. I aim to bridge the gap between theoretical AI research and deployable real-world solutions, ensuring that the systems we build are not only intelligent but also responsible and accessible.
Primary Research Interests:
- Human-Centered AI: Building conversational and interactive systems that exhibit empathy, contextual understanding, and transparency in decision-making.
- Multimodal AI Systems: Advancing learning models capable of integrating and reasoning across text, speech, and visual modalities.
- Knowledge-Grounded Generation: Enhancing Retrieval-Augmented Generation (RAG) frameworks to improve factual consistency, interpretability, and reliability of large language models.
- Temporal Intelligence: Innovating in time series forecasting and dynamic systems modeling to better understand temporal dependencies in data-driven environments.
- AI Safety, Ethics, and Alignment: Researching frameworks to ensure that AI systems are safe, fair, and aligned with human values, particularly in sensitive domains like healthcare.
Research Methodology: I emphasize reproducibility, interpretability, and empirical rigor in my work—combining strong theoretical grounding with applied experimentation. My approach integrates interdisciplinary collaboration, leveraging insights from linguistics, cognitive science, and systems engineering to build more holistic AI systems.
Academic Aspirations: I am seeking opportunities to pursue graduate/doctoral studies where I can contribute to the advancement of artificial intelligence through innovative research, collaborate with leading researchers in the field, and mentor the next generation of AI practitioners.
My long-term goal is to contribute to academia and industry through innovative research, impactful publications, and mentorship, helping shape the next generation of ethical and intelligent AI systems.
Contact & Collaboration: 🔗 GitHub · LinkedIn · Email
Open to research collaborations, academic discussions, and graduate/doctoral program opportunities.