Experience
Sep 2024 - Aug 2025
1 year
- Improved in-car voice assistant responsiveness by optimizing audio data processing.
- Improved dataset quality and model robustness by generating synthetic data from Hugging Face sources.
- Increased contextual relevance of voice assistant outputs through optimized AI prompt engineering.
- Reduced transcription errors by debugging inconsistencies between audio inputs and transcripts.
- Boosted team productivity and software delivery speed by automating workflows and actively driving Agile updates.
- Mapped intents across different datasets and standardized the data in JSON format.
- Created annotation guidelines and evaluated annotator quality.
- Generated diverse synthetic utterances, including Korean utterances, using LLMs.
- Used clustering to assign utterances to intents and find sub-intents.
- Identified and removed noisy or duplicate utterances using LLMs.
- Created visualizations to analyze data and improve dataset quality.
Mar 2023 - Aug 2024
1 year 6 months
- Enabled advanced analysis of large-scale datasets by engineering a text clustering solution with BERT, DistilBERT, RoBERTa, and text-embedding-ada-002.
- Tackled information trust issues in crisis scenarios by developing search engine modules with Elasticsearch on Docker.
- Expanded research data sources by automating API-call scripts to retrieve YouTube transcripts and comments.
- Facilitated research collaboration by leading virtual workshops with structured discussions and participant voting.
- Conducted scientific literature searches.
- Maintained and processed data with Excel and Python data pipelines; reported and presented the results.
- Implemented an analysis of text clustering methods using BERT-based transformers.
- Developed a prototype search engine for a COVID-19 social media data knowledge base using Python, Docker, Jupyter, and Elasticsearch.
- Implemented Python scripts for data retrieval from YouTube and preprint servers.
- Supported internal documentation of technical procedures, such as a guideline for GPU access in the department.
- Moderated and helped organize SciBeh’s 2024 virtual workshop on the topic of epistemic trespassing.