
Experiment: Stack Exchange Quality Classifier #596

@BabyChouSr

Description

Current quality classifiers are trained on conversational data from ELI5 and OH2.5. Here we explore training on StackExchange, which is also conversational but more task-specific. It may provide a better data mix for quality classification.
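To make the data mix concrete, here is a minimal sketch of assembling classifier training data with StackExchange posts as the positive class. It assumes a fastText-style supervised classifier (one `__label__<name> <text>` example per line) with web-crawl text as negatives; the label names, the negative source, and the helpers `to_fasttext_line` and `build_training_lines` are hypothetical and not taken from this experiment's code.

```python
import random
import re

# Hypothetical label names; fastText supervised mode expects a "__label__" prefix.
POSITIVE_LABEL = "__label__hq"
NEGATIVE_LABEL = "__label__lq"


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as a single fastText training line.

    fastText reads one example per line, so newlines and runs of
    whitespace inside the document are collapsed to single spaces.
    """
    flat = re.sub(r"\s+", " ", text).strip()
    return f"{label} {flat}"


def build_training_lines(positives, negatives, seed=0):
    """Label StackExchange posts as positives and web text as negatives, then shuffle.

    Shuffling with a fixed seed keeps the mix reproducible across runs.
    """
    lines = [to_fasttext_line(POSITIVE_LABEL, t) for t in positives]
    lines += [to_fasttext_line(NEGATIVE_LABEL, t) for t in negatives]
    random.Random(seed).shuffle(lines)
    return lines


if __name__ == "__main__":
    # Toy stand-ins for a StackExchange dump and a web-crawl sample (hypothetical data).
    stackexchange_posts = [
        "Q: Why is the derivative of e^x equal to e^x?\n"
        "A: It is the unique function equal to its own derivative with value 1 at 0.",
    ]
    web_pages = ["Click here to win a FREE prize!!!"]
    for line in build_training_lines(stackexchange_posts, web_pages):
        print(line)
```

The resulting lines could be written to a text file and passed to a fastText-style trainer; swapping in ELI5 or OH2.5 text as positives would give the baseline classifiers for comparison.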

Hypothesis or Goal

The goal is for a quality classifier trained on StackExchange to better capture diverse information across subject categories (math, science, coding, political science, etc.). If it does, this should be reflected in downstream MMLU evaluation.

Results

The StackExchange classifier achieves:

  • the best c4_en/bpb and eval/bpb
  • a higher mmlu/bpb than the MMLU classifier (lower bpb is better)
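For reference, bits per byte (bpb), the metric behind c4_en/bpb, eval/bpb and mmlu/bpb, is conventionally the model's total negative log-likelihood converted to bits and divided by the UTF-8 byte count of the evaluated text. The sketch below assumes per-token NLLs are given in nats (natural log); the function name `bits_per_byte` is hypothetical, and the exact evaluation harness used for these runs is not described in this issue.

```python
import math


def bits_per_byte(token_nlls_nats, text: str) -> float:
    """Convert per-token negative log-likelihoods (in nats) to bits per byte.

    bpb = (sum of NLL in nats) / (ln 2 * number of UTF-8 bytes in text).
    Normalizing by bytes rather than tokens keeps models with different
    tokenizers comparable; lower is better.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    total_nats = sum(token_nlls_nats)
    return total_nats / (math.log(2) * n_bytes)


if __name__ == "__main__":
    # Four tokens, each costing exactly 1 bit (ln 2 nats), over a 4-byte string.
    print(bits_per_byte([math.log(2)] * 4, "abcd"))  # roughly 1.0
```

Because the denominator counts bytes, a higher mmlu/bpb means the model assigns lower probability to the MMLU text per byte, regardless of how the text was tokenized.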
