Aggregated Span-level Hallucination Evaluation
ASHE is a unified benchmark for span-level hallucination detection across six tasks: open_qa, context_qa, data-to-text, open_biography, summarization, and machine translation (abbreviated in the table as open_qa, ctx_qa, data2txt, open_bio, summ, and mt). It contains 6,461 examples in total.

| Method | Reason | Opt | Overall (std) | open_qa | ctx_qa | data2txt | open_bio | summ | mt |
|---|---|---|---|---|---|---|---|---|---|