Modeling Orthographic Variation in Occitan's Dialects | allainews.com

May 1, 2024, 4:47 a.m. | Zachary William Hopton (Language and Space Lab, University of Zurich), Noëmi Aepli (Department of Computational Linguistics, University of Zurich)

cs.CL updates on arXiv.org

arXiv:2404.19315v1 Announce Type: new
Abstract: Effectively normalizing textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened representations. When the model was …
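The abstract reports that surface similarity between dialects strengthened the model's representations, but it does not say how either quantity was measured. The sketch below shows one way to relate the two over pairs of parallel lexicon entries. It is an illustration, not the authors' method. Assumptions: surface similarity is taken to be normalized Levenshtein similarity, representation similarity is taken to be cosine similarity between embeddings, and the two are related by Pearson correlation. `embed` is a hypothetical stand-in for the fine-tuned model's encoder, and `letter_counts` is a toy embedding used only to make the example runnable.

```python
import math
from typing import Callable, Sequence

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def surface_similarity(a: str, b: str) -> float:
    """Assumed surface measure: 1 - edit distance / length of the longer form."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def surface_vs_embedding(pairs: Sequence[tuple[str, str]],
                         embed: Callable[[str], Sequence[float]]) -> float:
    """Correlate surface similarity with embedding similarity over word pairs
    (e.g. the same lexicon entry written in two dialects)."""
    surf = [surface_similarity(a, b) for a, b in pairs]
    emb = [cosine(embed(a), embed(b)) for a, b in pairs]
    return pearson(surf, emb)

def letter_counts(word: str) -> list[int]:
    """Toy 'embedding' (hypothetical): counts of each ASCII letter a-z."""
    return [word.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]

if __name__ == "__main__":
    # Hypothetical placeholder strings, not Occitan data.
    pairs = [("abc", "abc"), ("abc", "abd"), ("abc", "xyz")]
    print(surface_vs_embedding(pairs, letter_counts))
```

In practice `embed` would return the fine-tuned model's vector for a word, and `pairs` would come from the four-dialect parallel lexicon. A positive correlation would be consistent with the abstract's finding that surface-similar forms receive closer representations.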