NormGenesis

This repository provides the code and dataset for our EMNLP 2025 paper "NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation Recovery".

🗓️ Update History

🚀 November 2025: Code, Dataset, and Project Page have been released!
🏆 November 2025: We received the SAC Highlights Award at EMNLP 2025!
🏆 October 2025: Nominated for the Outstanding Paper, SAC Highlights, and Resource Paper Awards at EMNLP 2025.
📘 September 2025: Our paper was released on arXiv.
🎤 August 2025: Selected for Main Conference (Oral Presentation) at EMNLP 2025.

🎯 Project Overview

NormGenesis introduces a framework for multicultural dialogue generation that models social norms and recovers from violations using exemplar-guided methods.

Key Components

Dataset: Multicultural social norm datasets (American, Chinese, Korean).
Generation: Pipelines for generating dialogues that adhere to or violate social norms.
Evaluation: Comprehensive evaluation of dialogue quality including consistency, naturalness, relevance, and norm appropriateness.

📁 Repository Structure

NormGenesis/
├── 📊 dataset/                 # Dataset files
│   ├── American/               # American culture dataset
│   ├── Chinese/                # Chinese culture dataset
│   └── Korean/                 # Korean culture dataset
├── 🔬 evaluation_code/         # Evaluation scripts
│   ├── evaluation_dialogue_quality.py    # Dialogue quality evaluation
│   └── evaluation_refinement_quality.py  # Refinement quality evaluation
├── 🏭 generation_code/         # Generation pipelines
│   ├── American/               # American generation scripts
│   ├── Chinese/                # Chinese generation scripts
│   ├── Korean/                 # Korean generation scripts
│   └── refine_situation.py     # Situation refinement script
├── 📄 labeling_dialogue.py     # Dialogue labeling script
└── 📄 README.md
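
As a quick sanity check after cloning, the layout above can be inspected with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `list_dataset_files` is hypothetical, and it only assumes the three culture folders under `dataset/` shown in the tree. It makes no assumption about the file format inside them.

```python
from pathlib import Path

# Folder names taken from the repository tree; nothing else is assumed.
CULTURES = ("American", "Chinese", "Korean")


def list_dataset_files(repo_root):
    """Map each culture folder under dataset/ to its sorted file names."""
    dataset_dir = Path(repo_root) / "dataset"
    listing = {}
    for culture in CULTURES:
        culture_dir = dataset_dir / culture
        if culture_dir.is_dir():
            listing[culture] = sorted(
                p.name for p in culture_dir.iterdir() if p.is_file()
            )
        else:
            listing[culture] = []  # folder missing in this checkout
    return listing


if __name__ == "__main__":
    # Run from the repository root.
    for culture, files in list_dataset_files(".").items():
        print(f"{culture}: {len(files)} file(s)")
```

An empty list for a culture usually means the script was not run from the repository root.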

🚀 Quick Start

Prerequisites

Ensure you have Python installed and the following dependencies:

pip install openai pandas tenacity tqdm
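
To confirm the install worked before running any script, a small check like the following may help. It is a sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the package list simply mirrors the pip command above.

```python
import importlib.util

# Package names copied from the pip install command above.
REQUIRED = ("openai", "pandas", "tenacity", "tqdm")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```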

Setup

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="your-api-key"
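
If you want a script to fail early with a clear message when the variable is missing, a guard along these lines is one option. This is an illustrative sketch: `require_api_key` is a hypothetical helper, not something the repository is known to provide. The only detail taken from the README is the variable name `OPENAI_API_KEY`.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately instead of on the first API request.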

Usage

Generation: Go to generation_code and run the scripts in the American, Chinese, or Korean folder for the culture you need, or run refine_situation.py to refine scenarios. Note: You may need to update the input/output paths in the scripts first.
Evaluation: Run evaluation_code/evaluation_dialogue_quality.py to assess generated dialogues. Note: Configure the evaluation parameters in the script before running it.

🌍 Evaluation Scope

Cultures Covered

🇺🇸 American
🇨🇳 Chinese
🇰🇷 Korean

Evaluation Metrics

Consistency: Logical and contextual consistency.
Naturalness: Fluency and native-like expression.
Relevance: Alignment with the scenario and situation.
Coherence: Logical flow from scenario to dialogue.
Emotion Appropriateness: Matching emotional tone.
Social Norm Appropriateness: Adherence to social norms.
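
Once per-dialogue scores are available, they can be summarized per metric. The sketch below shows one way to do that with the standard library. It assumes, hypothetically, that each dialogue's scores are held in a dict keyed by the metric names listed above. The actual output format and score scale of the evaluation scripts are not described here, so adapt the keys to whatever the scripts produce.

```python
from statistics import mean

# Metric names copied from the list above.
METRICS = (
    "Consistency",
    "Naturalness",
    "Relevance",
    "Coherence",
    "Emotion Appropriateness",
    "Social Norm Appropriateness",
)


def average_scores(records):
    """Average each metric over a list of per-dialogue score dicts.

    Metrics missing from a record are skipped for that record, and
    metrics with no scores at all are left out of the result.
    """
    averages = {}
    for metric in METRICS:
        values = [r[metric] for r in records if metric in r]
        if values:
            averages[metric] = mean(values)
    return averages
```

For example, `average_scores([{"Consistency": 4}, {"Consistency": 2}])` yields an average of 3 for Consistency and omits the other metrics.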

🚀 Release Plan

The code, dataset, and project page were released in November 2025.

[✅] Code
[✅] Dataset
[✅] Project Page

📄 Paper

You can find our paper at the following link:
👉 https://arxiv.org/abs/2509.18395

📚 Citation

If you use this dataset or code in your research, please cite our paper:

@inproceedings{hong2025normgenesis,
  title={NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation Recovery},
  author={Hong, Minki and Choi, Jangho and Kim, Jihie},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={33781--33819},
  year={2025}
}

