Language Resources and Evaluation

This paper presents a case study on motion-related metaphors that demonstrates the viability of the MetaNet computational metaphor identification system in a corpus-based analysis of the expression of conceptual metaphor. The rich annotation that is produced by the MetaNet system supports many types of linguistic analysis, such as the examination of the relative frequencies within a corpus of the…

Experimental and Cognitive Psychology · Language, Metaphor, and Cognition · Psychology · Social Sciences

Under-resourced languages (and musics) pose a challenge to machine translation (MT). The challenge is greater when the content of the collected dataset is a varied sample taken from a data population that is even more diverse and dynamic. This is the challenge of Arab music vocal improvisation (mawwal). Here, we present the development of AMICOR, a parallel dataset consisting of vocal improvisato…

Arts and Humanities · Diverse Musicological Studies · Music · Social Sciences

Social bias in language models continues to create fairness risks in multilingual and multicultural environments. Existing datasets provide limited cultural diversity, insufficient support for overlapping bias categories, and minimal availability of human-interpretable reasoning, which reduces transparency and reliability in bias detection. The ToxicBias-Reasoning dataset addresses these gaps…

Artificial Intelligence · Computer Science · Hate Speech and Cyberbullying Detection · Physical Sciences

In this work, we introduce the construction of a machine translation (MT) assisted and human-in-the-loop multilingual parallel corpus with annotations of multi-word expressions (MWEs), named AlphaMWE. The MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms. The annotated vMWEs are also bilingually and multilingually aligned manuall…

Artificial Intelligence · Computer Science · Natural Language Processing Techniques · Physical Sciences

In this paper, we propose SEMCAT (Semantic Evaluation Metric Conforms to AMR Theory), a novel similarity measuring method for Abstract Meaning Representation (AMR). AMR is a semantic structure used to explicitly express the truth-conditional meaning aspect of a natural language sentence. Our evaluation strategy is mainly designed to reflect the theoretical basis of AMR. Specifically, based on the…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred to in it. In this paper, we consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance on two recently published datasets: the Text Anonymization …

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling
Paper
Christina Niklaus·...·Victoria García-Muñoz
3/3/2026

Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This…

Artificial Intelligence · Computer Science · Physical Sciences · Text Readability and Simplification

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available (https://huggingface.co/datasets/LeandroRibeiro/JurisTCU) and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LI…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling
