Making NLG evaluations reproducible
Talk Title: Making NLG evaluations reproducible
Speaker: Professor Ehud Reiter, University of Aberdeen
Time: October 31, 10:00–11:30 AM
Venue: Lecture Hall 106, Wangxuan Institute of Computer Technology, Peking University
Abstract: Like other scientific experiments, evaluations in natural language generation research must be reproducible by other researchers. Unfortunately, many published experiments cannot be replicated, or give very different (and usually worse) results when they are replicated. I will discuss challenges I have seen in reproducing NLG evaluations, including uncooperative authors, buggy experiments, confused subjects (in human evaluations), data contamination, use of older (and in some cases retired) closed LLMs, and biased (or indeed incorrect) reporting of results. I will conclude by giving advice on how to make evaluations and other NLG experiments more reproducible and reliable.
Speaker Bio: Ehud Reiter is a Professor of Computing Science at the University of Aberdeen, and was formerly Chief Scientist of Arria NLG (a spinout he cofounded). He has been working on Natural Language Generation for 35 years, and in recent years has focused on evaluation of language generation; he also has a longstanding interest in healthcare applications. He is one of the most cited and respected researchers in NLG, and his awards include an INLG Test of Time award for his work on data-to-text. He writes a widely read blog on NLG and evaluation (ehudreiter.com).