CV

Technical Skills

Speech & Audio: TTS / Flow Matching · Speaker Verification · Diarization · Voice Cloning · Anti-spoofing · Speech Enhancement · Sound Event Detection

ML & LLM: PyTorch · ONNX · Triton · TorchServe · SLM · Reinforcement Learning · Agno Agent · RAG · Speech SSL

Engineering: Docker · FastAPI · Flask · RabbitMQ · Milvus · Qdrant · Slurm · Datadog · Jenkins · JFrog

Languages: Python · C++ · Java · React

Education

VNUHCM – University of Science · Ho Chi Minh City, Vietnam

Bachelor of Science, Advanced Program in Computer Science (APCS K17) · Sep 2017 – Jun 2021

GPA: 3.83 / 4.0 · TOEFL ITP: 550 / 677

Experience

VinFast Automotive AI Development Institute – Vingroup JSC · Ho Chi Minh City, Vietnam

Senior AI Engineer – Conversational Speech Technology · Jun 2026 – Present

Unified TTS Serving Gateway: Architected a production FastAPI gateway that consolidates multiple TTS model tiers and normalization backends, with streaming, an async job queue, API-key authentication with per-key usage tracking, and horizontally sharded deployment for multilingual production serving.
TTS Normalization Automation Harness: Designed an end-to-end QC automation framework with LLM-based error classification, BA-configurable per-project switch-code vocabulary management, automated Java SDK and prompt rebuilds, and full regression evaluation, eliminating engineering intervention for vocabulary updates.

VinSmart Future – Vingroup JSC · Ho Chi Minh City, Vietnam

AI Team Lead – Speech Generation · Feb 2026 – Jun 2026

Research Team Leadership: Led a team of 6 on speech generation: voice design, non-verbal audio, and reinforcement learning for speech synthesis.
Voice Model Product Management: Coordinated between research and product; standardized QA and release processes across robot assistants, in-vehicle AI, and super-app voice mode.

Senior AI Engineer – Core AI Research · Jul 2025 – Feb 2026

Vietnamese Phoneme-Based Flow Matching TTS: Developed a TTS model with phoneme-adaptive fine-tuning that achieves stable quality and low WER in production.
TensorRT & Triton Serving Optimization: Achieved 2× throughput improvement and increased concurrent user capacity.
Hybrid Text Normalization: SLM + rule-based hybrid pipeline improving accuracy 13% over the rule-based baseline.

Vingroup Big Data Institute – Vingroup JSC · Hanoi, Vietnam

Middle AI Engineer – AI Agent & Voice Biometric · Jan 2024 – Jul 2025

AI Agent with Knowledge Base: Built a RAG-based agent with query processing, embedding, function calling, and automatic double-bot QC evaluation (2025).
Speech Data Warehouse: Managed a team building storage, querying, processing, release management, and NL querying integration (2025).
Streaming Diarization Optimization: Optimized EEND-VC on Triton for a 2× speedup and higher quality than third-party solutions (2025).
Anti-spoofing Module: EER of 3.75% combining RawBoost augmentation, speech SSL, and Graph Neural Networks (2024).

AI Engineer – Speech Synthesis & Voice Biometric · Oct 2021 – Dec 2023

Smart AI Voice Recording Service: Speaker Diarization + ASR + Voice Biometrics for insurance fraud detection (2023).
Government Voice Biometric Project: Microservice gateway with SNR Estimator, VAD, ASR, Speaker Counter via TorchServe and Milvus (2023).
Multispeaker Acoustic Model: 4× reduction in GPU usage, 30-minute fine-tuning, and zero-shot cloning from 30 s of audio (2023).
Universal Multistream Vocoder: HiFi-GAN optimization yielding 1.5× faster inference without quality loss (2022).
AI Service User Interfaces: Demo and labeling tools using Streamlit, Material UI React, Wavesurfer.js (2022).
Tacotron2 Enhancement: Monotonic Alignment Attention integration to eliminate noise artifacts (2021).
Massive Speech Data Crawler: Large-scale automated pipeline covering selection, denoising, transcription, and quality ranking (2021).

VinAI Research Institute – Vingroup JSC · Hanoi, Vietnam

Engineering Resident – ERP Batch 1 · Dec 2020 – Sep 2021

Vietnamese TTS for Electric Vehicles: Integrated Tacotron2, FastSpeech2, GlowTTS, HiFi-GAN; optimized with ONNX Runtime for edge inference.
SpeechMT: Real-time voice-based machine translation service (Docker + FastAPI).
Speech-Based QA for Car Manuals: Lightweight end-to-end voice-driven QA system.

KMS Technology · Ho Chi Minh City, Vietnam

Software Engineer – Center of Excellence · Jun 2020 – Sep 2020

RASA Chatbot Core Research: Intent classification combining Dense (ConveRT, BERT) and Sparse (TF-IDF, Count Vectors) features.
Reading Comprehension Backend: BiDAF-based services with AllenNLP, Flask, Swagger, Docker Compose, RabbitMQ.

Scooter Saigon Tour – Tourism Startup · Ho Chi Minh City, Vietnam

SEO Developer – Founder · Jan 2014 – Aug 2017

Built a WordPress site with a PHP backend and VTCpay integration; managed SEO and social channels.

Freelance AI Projects: Voice-Preserving Speech MT · Singing Voice Conversion · Pronunciation Assessment · Speech Enhancement · Sleep Stage Classification · Emotional Dubbing

Research Affiliations

Speech and Language Lab, Nanyang Technological University · Singapore

Research Intern (Remote) · Sep 2022 – Mar 2023

Government project with ST Engineering on classifying emergency sound events in urban areas.

Artificial Intelligence Lab, University of Science · Ho Chi Minh City

Research Assistant · Feb 2019 – Sep 2021

Built a Vietnamese speech corpus (19 GB); researched text normalization and phonology.
Organized ERC2019 – Emotion Recognition Challenge.
Teaching assistant at VNsigma Python beginner class (American Consulate).

Robotics & IoT Lab, University of Science · Ho Chi Minh City

Research Assistant · May 2019 – Sep 2021

Teacher for LEGO Mindstorms EV3; coach for FLL, EVJ Makethon, and WRO competitions.

Computational Linguistics Center, University of Science · Ho Chi Minh City

Research Assistant · Apr 2019 – Sep 2019

Teaching assistant for the English Linguistics faculty (USSH), providing guidance on SDL Trados Studio.

Publications

MAPR 2025 · Unified Acoustic Representation Learning for Vietnamese Speech Classification Tasks
MAPR 2025 · Analyzing the Correlation and Impact of Speech Evaluation Metrics on Speaker Verification and ASR
MAPR 2025 · Adapting WavLM for Vietnamese Speaker Diarization in Real-world Conversations
RIVF 2024 · Enhancing Deepfake Detection: WavLM and Advanced RawBoost Augmentation
Voice Privacy 2024 · Voice Attacker: Multi-Head Factorized Attentive Reconstructor and Gradient Reversal for Random Prosody Anonymization
DCASE 2023 · Sound Event Detection with Soft Labels Using Self-Attention for Global Scene Feature Extraction
IJAL 2022 · FastSpeechStyle: Fast, Emotion-Controllable, High-Quality Speech Synthesis
NeurIPS 2021 · Vietnamese Speech-based Question Answering over Car Manuals
NAFOSTED 2020 · Vietnamese Speech Synthesis with End-to-End Model and Text Normalization
MediaEval 2020 · Emotion Classification Using WaveNet Features with SpecAugment and EfficientNet

PyPI: vinorm (Vietnamese Text Normalization) · 34,933+ downloads / 6 months

PyPI: viphoneme (Vietnamese Grapheme-to-IPA Phonetization) · 4,195+ downloads / 6 months

Thesis: DeepSpeechVC – Voice Cloning Framework with Speech Synthesis and Voice Conversion

Awards & Recognition

Outstanding Employee of the Year – VinBigdata (2023)
First Place, VLSP TTS Emotional Speech Synthesis (2022)
First Prize, Science-A-Thon, 9th Vietnamese Summer School of Science (2022)
Outstanding Student Award in Artificial Intelligence, Ho Chi Minh City (2021)
Excellent Thesis – Science and Technology Student Award (2021)
Runner-Up, KMS Hackathon (2020)

Certifications & Competitions

ISO/IEC 19795-1:2021 (NIST/NVLAP) for Voice Biometric Testing (2024)
IEEE Spoken Language Technology Workshop Hackathon (2022)
7th of 23 teams, IEEE Signal Processing Cup (2022)
FPT Scholarship, Final Round, Code War Competition (2019)
Final Round, Samsung Collegiate Programming Cup (2018)
ACM-ICPC Vietnam National Programming Contest – Rank 33 (2018)
Self-Driving Cars Specialisation – University of Toronto (Coursera) (2018)

Volunteering

Volunteer for medical research on dental surgery at HCMC Oromaxillofacial Hospital (2021)
F1 contact tracing support during the COVID-19 pandemic (2022)
Organized ERC2019 – Emotion Recognition Challenge (2019)
Teaching assistant at VNsigma Python beginner class (American Consulate) (2019)

Do Tri Nhan
