Language Resources and Evaluation

Paper

Building a semantic resource for the Moroccan dialect: a hybrid approach with LeOnI and semantic similarity

Said Belbachir·...·Mohamed Chahhou

2d ago

Arts and Humanities · Language and Linguistics · Language, Linguistics, Cultural Analysis · Social Sciences

Paper

Correction: The corpus of aggressive language in Polish parliamentary debates

Justyna Sarzyńska‐Wawer·Aleksander Wawer

2d ago

Arts and Humanities · Discourse Analysis in Language Studies · Literature and Literary Theory · Social Sciences

Paper

Aratox: a multi-dialect, multi-label Arabic dataset and model benchmark for toxicity detection

Faisal Alshargi·+5 more

22d ago

Artificial Intelligence · Computer Science · Physical Sciences · Text and Document Classification Technologies

Paper

Corpus-based computational frame and construction analysis of motion metaphors

Elise Stickles·Ellen Dodge

3/27/2026

This paper presents a case study on motion-related metaphors that demonstrates the viability of the MetaNet computational metaphor identification system in a corpus-based analysis of the expression of conceptual metaphor. The rich annotation that is produced by the MetaNet system supports many types of linguistic analysis, such as the examination of the relative frequencies within a corpus of the…

Experimental and Cognitive Psychology · Language, Metaphor, and Cognition · Psychology · Social Sciences

Paper

Arab music improvisation corpus for research (AMICOR): development and machine translation experiments

Fadi M. Al-Ghawanmeh·...·Alexander Refsum Jensenius

3/27/2026

Under-resourced languages (and musics) pose a challenge to machine translation (MT). The challenge is greater when the content of the collected dataset is a varied sample taken from a data population that is even more diverse and dynamic. This is the challenge of Arab music vocal improvisation (mawwal). Here, we present the development of AMICOR, a parallel dataset consisting of vocal improvisato…

Arts and Humanities · Diverse Musicological Studies · Music · Social Sciences

Paper

Toxicbias-reasoning: a multicultural dataset for social bias detection with human-aligned reasoning

Anuj Kumar·+4 more

3/25/2026

Social bias in language models continues to create fairness risks in multilingual and multicultural environments. Existing datasets provide limited cultural diversity, insufficient support for overlapping bias categories, and minimal availability of human-interpretable reasoning, which reduces transparency and reliability in bias detection. The ToxicBias-Reasoning dataset addresses these gaps…

Artificial Intelligence · Computer Science · Hate Speech and Cyberbullying Detection · Physical Sciences

Paper

Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation

Lifeng Han·+5 more

3/14/2026

In this work, we introduce the construction of a machine translation (MT) assisted and human-in-the-loop multilingual parallel corpus with annotations of multi-word expressions (MWEs), named AlphaMWE. The MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms. The annotated vMWEs are also bilingually and multilingually aligned manuall…

Artificial Intelligence · Computer Science · Natural Language Processing Techniques · Physical Sciences

Paper

Semantic evaluation metric conforming to AMR theory (SEMCAT): a new similarity metric for abstract meaning representation

Kyung Seo Ki·...·Bugeun Kim

3/14/2026

In this paper, we propose SEMCAT (Semantic Evaluation Metric Conforms to AMR Theory), a novel similarity measuring method for Abstract Meaning Representation (AMR). AMR is a semantic structure used to explicitly express the truth-conditional meaning aspect of a natural language sentence. Our evaluation strategy is mainly designed to reflect the theoretical basis of AMR. Specifically, based on the…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

Paper

Neural text sanitization with privacy risk indicators: an empirical analysis

Anthi Papadopoulou·+4 more

3/13/2026

Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred to in it. In this paper, we consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance on two recently published datasets: the Text Anonymization …

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

Paper

Detecting racism in the digital age: a survey of datasets and algorithms

Ikram El Miqdadi·+5 more

3/9/2026

Artificial Intelligence · Computer Science · Hate Speech and Cyberbullying Detection · Physical Sciences

Paper

Aspect sentiment triplet extraction via integrating contextual semantic relevance and syntactic relevance

Xiaodong Zhu·...·Ting Zhang

3/6/2026

Artificial Intelligence · Computer Science · Physical Sciences · Sentiment Analysis and Opinion Mining

Paper

A comparative study of sentence alignment methods for Spanish text simplification

Christina Niklaus·...·Victoria García-Muñoz

3/3/2026

Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This…

Artificial Intelligence · Computer Science · Physical Sciences · Text Readability and Simplification

Paper

Parafrasário: a variety-based paraphrasary for Portuguese

Anabela Barreiro·...·Ida Rebelo-Arnold

3/2/2026

Artificial Intelligence · Computer Science · Natural Language Processing Techniques · Physical Sciences

Paper

OjibweMorph: an approachable finite-state transducer for Ojibwe (and beyond)

Christopher Hammerly·+4 more

2/28/2026

Experimental and Cognitive Psychology · Phonetics and Phonology Research · Psychology · Social Sciences

Paper

Dhati+: fine-tuned large language models for Arabic subjectivity evaluation

Slimane Bellaouar·...·Soumia Souffi

2/26/2026

Artificial Intelligence · Computer Science · Physical Sciences · Sentiment Analysis and Opinion Mining

Paper

Multilingual speech representation for the Manipuri automatic speech recognition system

Thangjam Clarinda Devi·...·Kabita Thaoroijam

2/23/2026

Artificial Intelligence · Computer Science · Physical Sciences · Speech Recognition and Synthesis

Paper

Text-Muddler: an advanced adversarial paradigm for disrupting NLP-based neural architectures in sentiment analysis frameworks

Ashish Bajaj

2/10/2026

Artificial Intelligence · Computer Science · Hate Speech and Cyberbullying Detection · Physical Sciences

Paper

JurisTCU: a Brazilian Portuguese information retrieval dataset with query relevance judgments

Leandro Carísio Fernandes·+4 more

2/9/2026

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available (https://huggingface.co/datasets/LeandroRibeiro/JurisTCU) and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LI…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

Paper

Linguistic knowledge injected into large language model for Urdu-English neural machine translation

Muhammad Naeem Ul Hassan·+7 more

2/9/2026

Artificial Intelligence · Computer Science · Natural Language Processing Techniques · Physical Sciences

Paper

Cyberspace fake news and manipulator accounts detection and language governance

Chen Hongsong·Zhao Xiufeng

1/24/2026

Misinformation and Its Impacts · Social Sciences · Sociology and Political Science
