RepEval: Effective Text Evaluation with LLM Representation

May 1, 2024, 4:48 a.m. | Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

cs.CL updates on arXiv.org

arXiv:2404.19563v1 Announce Type: new
Abstract: Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires minimal sample pairs …


