Proceedings of the AAAI Conference on Artificial Intelligence

Paper

Resource Efficient Sleep Staging via Multi-Level Masking and Prompt Learning

Lejun Ai·+7 more

3/17/2026

Automatic sleep staging plays a vital role in assessing sleep quality and diagnosing sleep disorders. Most existing methods rely heavily on long and continuous EEG recordings, which poses significant challenges for data acquisition in resource-constrained systems, such as wearable or home-based monitoring systems. In this paper, we propose the task of resource-efficient sleep staging, which aims …

Cognitive Neuroscience·EEG and Brain-Computer Interfaces·Life Sciences·Neuroscience

Paper

GRIM: Task-Oriented Grasping with Conditioning on Generative Examples

Shailesh·+6 more

3/14/2026

Task-Oriented Grasping (TOG) presents a significant challenge, requiring a nuanced understanding of task semantics, object affordances, and the functional constraints dictating how an object should be grasped for a specific task. To address these challenges, we introduce GRIM (Grasp Re-alignment via Iterative Matching), a novel training-free framework for task-oriented grasping. Initially, a coar…

Control and Systems Engineering·Engineering·Physical Sciences·Robot Manipulation and Learning

Paper

Mind the Gap: Quantifying and Aligning Human-AI Visual Attention for Accident Anticipation

Hoe Sung Ryu·Christian Wallraven

3/14/2026

Quantifying and understanding human-AI alignment in high-risk tasks such as traffic accident prediction is crucial for deployment of AI systems. Existing alignment studies, however, focus mostly on the static domain and neglect the importance of attentional processing. Here, we present Attention‑DADA, a dataset of accident and non-accident traffic situations that contains detailed human predictio…

Computer Science·Gaze Tracking and Assistive Technology·Human-Computer Interaction·Physical Sciences

Paper

History-Aware Reasoning for GUI Agents

Ziwei Wang·+6 more

3/14/2026

Advances in Multimodal Large Language Models have significantly enhanced Graphical User Interface (GUI) automation. Equipping GUI agents with reliable episodic reasoning capabilities is essential for bridging the gap between users’ concise task descriptions and the complexities of real-world execution. Current methods integrate Reinforcement Learning (RL) with System-2 Chain-of-Thought, yielding …

Artificial Intelligence·Computer Science·Explainable Artificial Intelligence (XAI)·Physical Sciences

Paper

Towards Generalist Robot Learning from Internet Video: A Survey (Abstract Reprint)

Robert McCarthy·+7 more

3/14/2026

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data…

Computer Science·Computer Vision and Pattern Recognition·Human Pose and Action Recognition·Physical Sciences

Paper

Investigating Social Bias Propagation in Federated Fine-tuning of Large Language Models

Jing Zhao·+5 more

3/14/2026

Large language models (LLMs) have achieved remarkable success in many domains, but concerns about data quality and privacy are growing. Federated Learning (FL) offers a privacy-preserving solution by training a model on local clients without sharing data. However, the impact of biased private data on LLMs fine-tuned through FL remains understudied. This work investigates how client-side biased da…

Artificial Intelligence·Computer Science·Physical Sciences·Privacy-Preserving Technologies in Data

Paper

Evaluating, Synthesizing, and Enhancing for Customer Support Conversation

Jie Zhu·+6 more

3/14/2026

Effective customer support requires not only accurate problem-solving but also structured and empathetic communication aligned with professional standards. However, existing dialogue datasets often lack strategic guidance, and real-world service data is difficult to access and annotate. To address this, we introduce the task of Customer Support Conversation (CSC), aimed at training customer servi…

AI in Service Interactions·Artificial Intelligence·Computer Science·Physical Sciences

Paper

Towards Data-Efficient Deep Learning for RNA 3D Structure Prediction and Design

Yimeng Liu

3/14/2026

RNA 3D structure prediction is essential for understanding regulatory mechanisms, catalysis, and therapeutic RNA design, yet progress has lagged behind proteins due to limited structural data and the complexity of RNA folding. This work proposes a data-efficient, physics-informed deep learning framework for full atomistic prediction of transfer RNA (tRNA) tertiary structures directly from sequenc…

Biochemistry, Genetics and Molecular Biology·Life Sciences·Molecular Biology·RNA and protein synthesis mechanisms

Paper

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Yuanyuan Wang·+7 more

3/14/2026

Extending the speech understanding or generation abilities of pre-trained text Large Language Models (LLMs) by introducing various effective speech tokens has attracted great attention in the speech research community. However, building a unified speech understanding and generation model still faces the following challenges: (1) Due to the huge modality gap between speech and text tokens, extending te…

Artificial Intelligence·Computer Science·Physical Sciences·Speech Recognition and Synthesis

Paper

Learning from Imperfect Data: Robust Inference of Dynamic Systems Using Simulation-Based Generative Model

Hyunwoo Cho·...·Hyeontae Jo

3/14/2026

System inference for nonlinear dynamic models, represented by ordinary differential equations (ODEs), remains a significant challenge in many fields, particularly when the data are noisy, sparse, or partially observable. In this paper, we propose a Simulation-based Generative Model for Imperfect Data (SiGMoID), that enables precise and robust inference for dynamic systems. The proposed approach i…

Model Reduction and Neural Networks·Physical Sciences·Physics and Astronomy·Statistical and Nonlinear Physics

Paper

WingBeats and Snapshots: Fusing Sound and Vision for Mosquito Monitoring (Student Abstract)

Ahana Chanda·Akshay Agarwal

3/14/2026

Accurate identification of mosquito species is crucial for controlling vector-borne diseases, yet visual or acoustic methods alone are often insufficient. We propose a multimodal deep-learning framework that combines high-resolution images with wingbeat audio using a SwinV2 vision transformer and an Audio Spectrogram Transformer, thereby capturing complementary cues. On a six-species dataset, it …

Computer Science·Music and Audio Processing·Physical Sciences·Signal Processing

Paper

Post-Hoc Refinement for Multitask Symbolic Regression via Consensus-Accelerated Shapley Analysis

X. Li·...·Wei Hu

3/14/2026

Multitask genetic programming (MTGP) is one of the primary methods for solving multitask symbolic regression (MTSR), the problem of discovering mathematical expressions for multiple interconnected tasks simultaneously. However, conventional MTGP approaches discard a wealth of valuable knowledge from the population of expressions due to their inherent “winner-take-all” selection criteria. To addre…

Artificial Intelligence·Computer Science·Evolutionary Algorithms and Applications·Physical Sciences

Paper

AniTales: End-to-End Multimodal Story Generation Through Natural Language Prompting (Student Abstract)

Mrigendra Agrawal·Yunze Xiao

3/14/2026

We present AniTales, a system designed to generate multimodal visual novels from natural language prompts. Our system integrates large language models for story generation, diffusion models for character art, and text-to-speech for voice acting. This paper describes the system's architecture and presents findings from a pilot user study. We evaluated the system with general users (n=10) and domai…

Computer Science·Computer Vision and Pattern Recognition·Multimodal Machine Learning Applications·Physical Sciences

Paper

Building Domain-Specific Small Language Models via Guided Data Generation

Aman Kumar·+7 more

3/14/2026

Large Language Models (LLMs) have shown remarkable success in supporting a wide range of knowledge-intensive tasks. In specialized domains, there is growing interest in leveraging LLMs to assist subject matter experts with domain-specific challenges. However, deploying LLMs as SaaS solutions raises data privacy concerns, while many open-source models demand significant computational resources for…

Artificial Intelligence·Computer Science·Physical Sciences·Topic Modeling

Paper

ProEthica: A Professional Role Based Ethical Analysis Tool Using LLM-Orchestrated, Ontology Supported Case Based Reasoning

Christopher B. Rauch·Rosina Weber

3/14/2026

Professional ethics committees currently lack structured tools to identify relevant ethical concepts from complex narratives and compare them against prior decisions. ProEthica analyzes professional ethical scenarios against established codes and precedent cases. The system uses large language models (LLMs), leveraging their natural language processing capabilities to extract nine types of compon…

Artificial Intelligence in Law·Political Science and International Relations·Social Sciences

Paper

CL-Guard: Defending DNNs Against Backdoors via Fine-Grained Neuron Analysis and Collaborative Dual-Network Learning

Jie Xiao·+7 more

3/14/2026

Backdoor attacks on deep neural networks (DNNs) have garnered significant attention, particularly in edge computing applications. Given the complexity and opacity of DNNs, defending against backdoor attacks remains a formidable challenge. To address this, we propose CL-Guard, a dual-network-based defense framework designed to effectively eliminate potential backdoors in models. First, it leverage…

Adversarial Robustness in Machine Learning·Artificial Intelligence·Computer Science·Physical Sciences

Paper

Unsupervised Semantic Discovery via Global and Local Semantic Alignment in Multimodal Clustering

Zhengzhong Zhu·+4 more

3/14/2026

Unsupervised multimodal semantic discovery aims to learn discriminative representations from multimodal data. However, existing methods suffer from two key limitations. First, they only align instances across modalities without modeling semantic-level consistency, which fails to mitigate semantic bias caused by the gaps among feature distributions of multiple modalities. Second, they inevitably g…

Artificial Intelligence·Computer Science·Domain Adaptation and Few-Shot Learning·Physical Sciences

Paper

REAP: Enhancing RAG with Recursive Evaluation and Adaptive Planning for Multi-Hop Question Answering

Y. Zhu·+4 more

3/14/2026

Retrieval-augmented generation (RAG) has been extensively employed to mitigate hallucinations in large language models (LLMs). However, existing methods for multi-hop reasoning tasks often lack global planning, increasing the risk of falling into local reasoning impasses. Insufficient exploitation of retrieved content and the neglect of latent clues fail to ensure the accuracy of reasoning outcom…

Artificial Intelligence·Computer Science·Physical Sciences·Topic Modeling

Paper

Atom-level Adaptive Receptive Fields: A Pruning-Based Encoder for 2D Molecular Graphs (Student Abstract)

Yuhao Zhang·+5 more

3/14/2026

The two-dimensional (2D) graph structure of a molecule encodes abundant latent property information. A well-designed molecular graph encoder can capture informative low-dimensional dense representations of molecules, which can subsequently be applied to a wide range of downstream tasks. To achieve fine-grained and discriminative molecular representations that capture localized structural informatio…

Advanced Graph Neural Networks·Artificial Intelligence·Computer Science·Physical Sciences

Paper

De-Speakerizing Accented ASR: Measuring and Mitigating Speaker Entanglement for Fair, Reliable Recognition

Jitao Sun

3/14/2026

This research statement proposes to measure and mitigate speaker entanglement, where accent features inadvertently encode who is speaking in accented automatic speech recognition (ASR). We argue that entanglement inflates scores under lenient split for the same speaker and worsens fairness gaps across accents, and we outline a parameter-efficient mitigation that combines adversarial de-speakeriza…

Artificial Intelligence·Computer Science·Physical Sciences·Speech Recognition and Synthesis
