Publications

2025

<ACL 2025 Main> Call for Rigor in Reporting Quality of Instruction Tuning Data

Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

<ACL 2025 Main> Cross-Lingual Optimization for Language Transfer in Large Language Models

Jungseob Lee, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim

<ACL 2025 Main> Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

Chanjun Park

<ACL 2025 Findings> Semantic Aware Linear Transfer by Recycling Pre-trained Language Models for Cross-lingual Transfer

Seungyoon Lee, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim

<ACL 2025 Findings> Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval

Yongchan Chun, Minhyuk Kim, Dongjun Kim, Chanjun Park, Heuiseok Lim

<ACL 2025 Industry-Oral Track> REVISE: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy

Gyuho Shim, Seongtae Hong, Heuiseok Lim

<ICLR 2025> K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models

Jaehyung Seo, Heuiseok Lim

<NAACL 2025 Main> CoME: An Unlearning-based Approach to Conflict-free Model Editing

Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

<NAACL 2025 Main> LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

Chanjun Park

<NAACL 2025 Findings> Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models

Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park, Heuiseok Lim

<NAACL 2025 Findings> FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

<NAACL 2025 Findings> MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation

Chanhee Park, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

<NAACL 2025 Industry Track> Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs

Chanjun Park

<NAACL 2025 Industry Track> Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard

Chanjun Park

<NAACL 2025 Industry Track> CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents

Jeiyoon Park, Chanjun Park, Heuiseok Lim

<NAACL 2025 Demo Track> Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

Chanjun Park

<COLING 2025> MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer

Seongtae Hong, Seungyoon Lee, Hyeonseok Moon, Heuiseok Lim

<COLING 2025> Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models

Chanjun Park

<COLING 2025> sDPO: Don’t Use Your Data All at Once

Chanjun Park

<Expert Systems With Applications> An analysis on language transfer of pre-trained language model with cross-lingual post-training

Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Jungwoo Lim, Heuiseok Lim

2024

<EMNLP 2024 Main> Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts

Seonmin Koo, Jinsung Kim, YoungJoon Jang, Chanjun Park, Heuiseok Lim

<EMNLP 2024 Findings> PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models

Jinsung Kim, Seonmin Koo, Heuiseok Lim

<EMNLP 2024 Findings> Translation of Multifaceted Data without Re-Training of Machine Translation Systems

Hyeonseok Moon, Seungyoon Lee, SeongTae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim

<EMNLP 2024 Findings> Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

Seonmin Koo, Jinsung Kim, Chanjun Park, Heuiseok Lim

<EMNLP 2024 Industry Track> Intelligent Predictive Maintenance RAG framework for Power Plants: Enhancing QA with StyleDFS and Domain Specific Instruction Tuning

Seongtae Hong, Joong Min Shin, Jaehyung Seo, Taemin Lee, Jeongbae Park, Cho Man Young, Byeongho Choi, Heuiseok Lim

<EMNLP 2024 Industry Track> SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

Chanjun Park

<EMNLP 2024 Demo Track> Evalverse: Unified and Accessible Library for Large Language Model Evaluation

Chanjun Park

<ECAI 2024> Revisiting Under-Represented Knowledge of Latin American Literature in Large Language Models

Jinsung Kim, Seonmin Koo, Heuiseok Lim

<ACL 2024 Main> Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

Chanjun Park

<ACL 2024 Findings> KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models

Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee, Heuiseok Lim

<ACL 2024 Findings> Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation

Jungseob Lee, Hyeonseok Moon, Seungjun Lee, Chanjun Park, Sugyeong Eo, Hyunwoong Ko, Jaehyung Seo, Seungyoon Lee, Heuiseok Lim

<ACL 2024 Findings> Towards Precise Localization of Critical Errors in Machine Translation

Dahyun Jung, Sugyeong Eo, Heuiseok Lim

<NAACL Student Research Workshop> Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation

Dahyun Jung, Sugyeong Eo, Chanjun Park, Heuiseok Lim

<NAACL Student Research Workshop> Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4

Seungyoon Lee, Dong Kim, Dahyun Jung, Chanjun Park, Heuiseok Lim

<LREC-COLING 2024> Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean

Seungyoon Lee, Chanjun Park, DaHyun Jung, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim

<EACL 2024 Findings> Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation

Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim

<EACL 2024 Findings> Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing

Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim

<EACL 2024 Main> Ask, Assess, and Refine: Rectifying Factual Consistency and Hallucination in LLMs with Metric-Guided Feedback Learning

Dongyub Lee, Eunhwan Park, Hodong Lee, Heuiseok Lim

2023

<EMNLP 2023 Main> CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients

Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim

<EMNLP 2023 Findings> Explore the Way: Exploring Reasoning Path by Bridging Entities for Effective Cross-Document Relation Extraction

Junyoung Son, Jinsung Kim, Jungwoo Lim, Yoonna Jang, Heuiseok Lim

<EMNLP 2023 Main> KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

<EMNLP 2023 Main> Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

<EMNLP 2023 Findings> Beyond Candidates : Adaptive Dialogue Agent Utilizing Persona and Knowledge

Jungwoo Lim, Myunghoon Kang, Jinsung Kim, Jeongwook Kim, Yuna Hur, Heuiseok Lim

<EMNLP 2023 Findings> CReTIHC: Designing Causal Reasoning Tasks about Temporal Interventions and Hallucinated Confoundings

Changwoo Chun, SongEun Lee, Jaehyung Seo, Heuiseok Lim

<ICDM 2023 DCAI Workshop> Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim

<IJCNLP-AACL 2023> Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection

DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

<IWSLT 2023> Improving Formality-Sensitive Machine Translation using Data-Centric Approaches and Prompt Engineering

Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

<ICML 2023 DMLR Workshop> Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

<ICML 2023 DMLR Workshop> Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

<ICML 2023 DMLR Workshop> Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

<ICML 2023 DMLR Workshop> Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

<ACL 2023 Findings> Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks

Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim, SongEun Lee, Changwoo Chun, Sungsoo Park, Heuiseok Lim

<ACL 2023 Demo Track> PEEP-Talk: A Situational Dialogue-based Chatbot for English Education

Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Yahya, Heuiseok Lim

<IEEE Transactions on Audio, Speech and Language Processing> Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations

Jungwoo Lim, Taesun Whang, Dongyub Lee, Heuiseok Lim

<Expert Systems With Applications> Doubts on the reliability of parallel corpus filtering

Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim

2022

<AACL 2022 Demo> PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities

Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim

<EMNLP 2022 Findings> You Truly Understand What I Need : Intellectual and Friendly Dialog Agents grounding Persona and Knowledge

Jungwoo Lim, Myunghoon Kang, Yuna Hur, Seungwon Jeong, Jinsung Kim, Yoonna Jang, Dongyub Lee, Hyesung Ji, Donghoon Shin, Seungryong Kim, Heuiseok Lim

<AACL 2022 Demo> KU X Upstage’s Submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

<COLING 2022> QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim

<COLING 2022> GRASP: Guiding Model with RelAtional Semantics Using Prompt for Dialogue Relation Extraction

Junyoung Son, Jinsung Kim, Jungwoo Lim, Heuiseok Lim

<COLING 2022> KoCHET: A Korean Cultural Heritage Corpus for Entity-related Tasks

Gyeongmin Kim, Jinsung Kim, Junyoung Son, Heuiseok Lim

<COLING 2022> Don’t Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

Dongsuk Oh, Hodong Lee, Heuiseok Lim

<COLING 2022> Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?

SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim

<NAACL 2022 Findings> A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

<AAAI 2022> Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?

SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim

<ICML 2022 DataPerf> A Self-Supervised Automatic Post-Editing Data Generation Tool

Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim

<LREC 2022 Main> Priming Ancient Korean Neural Machine Translation

Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim

<LREC 2022 Main> FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim

<LREC 2022 Main> Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing

Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo, Heuiseok Lim

<Knowledge-Based Systems> PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge

Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, Chanjun Park, Kisu Yang, Hyeonseok Moon, Kinam Park, Heuiseok Lim