Proceedings of the AAAI Conference on Artificial Intelligence

Automatic sleep staging plays a vital role in assessing sleep quality and diagnosing sleep disorders. Most existing methods rely heavily on long and continuous EEG recordings, which poses significant challenges for data acquisition in resource-constrained systems, such as wearable or home-based monitoring systems. In this paper, we propose the task of resource-efficient sleep staging, which aims …

Cognitive Neuroscience · EEG and Brain-Computer Interfaces · Life Sciences · Neuroscience

Task-Oriented Grasping (TOG) presents a significant challenge, requiring a nuanced understanding of task semantics, object affordances, and the functional constraints dictating how an object should be grasped for a specific task. To address these challenges, we introduce GRIM (Grasp Re-alignment via Iterative Matching), a novel training-free framework for task-oriented grasping. Initially, a coar…

Control and Systems Engineering · Engineering · Physical Sciences · Robot Manipulation and Learning

Quantifying and understanding human-AI alignment in high-risk tasks such as traffic accident prediction is crucial for deployment of AI systems. Existing alignment studies, however, focus mostly on the static domain and neglect the importance of attentional processing. Here, we present Attention‑DADA, a dataset of accident and non-accident traffic situations that contains detailed human predictio…

Computer Science · Gaze Tracking and Assistive Technology · Human-Computer Interaction · Physical Sciences
Ziwei Wang·+6 more
3/14/2026

Advances in Multimodal Large Language Models have significantly enhanced Graphical User Interface (GUI) automation. Equipping GUI agents with reliable episodic reasoning capabilities is essential for bridging the gap between users’ concise task descriptions and the complexities of real-world execution. Current methods integrate Reinforcement Learning (RL) with System-2 Chain-of-Thought, yielding …

Artificial Intelligence · Computer Science · Explainable Artificial Intelligence (XAI) · Physical Sciences

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data…

Computer Science · Computer Vision and Pattern Recognition · Human Pose and Action Recognition · Physical Sciences

Large language models (LLMs) have achieved remarkable success in many domains, but concerns about data quality and privacy are growing. Federated Learning (FL) offers a privacy-preserving solution by training a model on local clients without sharing data. However, the impact of biased private data on LLMs fine-tuned through FL remains understudied. This work investigates how client-side biased da…

Artificial Intelligence · Computer Science · Physical Sciences · Privacy-Preserving Technologies in Data

Effective customer support requires not only accurate problem-solving but also structured and empathetic communication aligned with professional standards. However, existing dialogue datasets often lack strategic guidance, and real-world service data is difficult to access and annotate. To address this, we introduce the task of Customer Support Conversation (CSC), aimed at training customer servi…

AI in Service Interactions · Artificial Intelligence · Computer Science · Physical Sciences

RNA 3D structure prediction is essential for understanding regulatory mechanisms, catalysis, and therapeutic RNA design, yet progress has lagged behind proteins due to limited structural data and the complexity of RNA folding. This work proposes a data-efficient, physics-informed deep learning framework for full atomistic prediction of transfer RNA (tRNA) tertiary structures directly from sequenc…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · RNA and protein synthesis mechanisms

Extending the speech understanding or generation abilities of pre-trained text Large Language Models (LLMs) by introducing various effective speech tokens has attracted great attention in the speech research community. However, building a unified speech understanding and generation model still faces the following challenges: (1) Due to the huge modality gap between speech and text tokens, extending te…

Artificial Intelligence · Computer Science · Physical Sciences · Speech Recognition and Synthesis

System inference for nonlinear dynamic models, represented by ordinary differential equations (ODEs), remains a significant challenge in many fields, particularly when the data are noisy, sparse, or partially observable. In this paper, we propose a Simulation-based Generative Model for Imperfect Data (SiGMoID), that enables precise and robust inference for dynamic systems. The proposed approach i…

Model Reduction and Neural Networks · Physical Sciences · Physics and Astronomy · Statistical and Nonlinear Physics

Accurate identification of mosquito species is crucial for controlling vector-borne diseases, yet visual or acoustic methods alone are often insufficient. We propose a multimodal deep-learning framework that combines high-resolution images with wingbeat audio using a SwinV2 vision transformer and an Audio Spectrogram Transformer, thereby capturing complementary cues. On a six-species dataset, it …

Computer Science · Music and Audio Processing · Physical Sciences · Signal Processing

Multitask genetic programming (MTGP) is one of the primary methods for solving multitask symbolic regression (MTSR), the problem of discovering mathematical expressions for multiple interconnected tasks simultaneously. However, conventional MTGP approaches discard a wealth of valuable knowledge from the population of expressions due to their inherent “winner-take-all” selection criteria. To addre…

Artificial Intelligence · Computer Science · Evolutionary Algorithms and Applications · Physical Sciences

We present AniTales, a system designed to generate multimodal visual novels from natural language prompts. Our system integrates large language models for story generation, diffusion models for character art, and text-to-speech for voice acting. This paper describes the system's architecture and presents findings from a pilot user study. We evaluated the system with general users (n=10) and domai…

Computer Science · Computer Vision and Pattern Recognition · Multimodal Machine Learning Applications · Physical Sciences

Large Language Models (LLMs) have shown remarkable success in supporting a wide range of knowledge-intensive tasks. In specialized domains, there is growing interest in leveraging LLMs to assist subject matter experts with domain-specific challenges. However, deploying LLMs as SaaS solutions raises data privacy concerns, while many open-source models demand significant computational resources for…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

Professional ethics committees currently lack structured tools to identify relevant ethical concepts from complex narratives and compare them against prior decisions. ProEthica analyzes professional ethical scenarios against established codes and precedent cases. The system uses large language models (LLMs), leveraging their natural language processing capabilities to extract nine types of compon…

Artificial Intelligence in Law · Political Science and International Relations · Social Sciences

Backdoor attacks on deep neural networks (DNNs) have garnered significant attention, particularly in edge computing applications. Given the complexity and opacity of DNNs, defending against backdoor attacks remains a formidable challenge. To address this, we propose CL-Guard, a dual-network-based defense framework designed to effectively eliminate potential backdoors in models. First, it leverage…

Adversarial Robustness in Machine Learning · Artificial Intelligence · Computer Science · Physical Sciences

Unsupervised multimodal semantic discovery aims to learn discriminative representations from multimodal data. However, existing methods suffer from two key limitations. First, they only align instances across modalities without modeling semantic-level consistency, which fails to mitigate semantic bias caused by the gaps among feature distributions of multiple modalities. Second, they inevitably g…

Artificial Intelligence · Computer Science · Domain Adaptation and Few-Shot Learning · Physical Sciences

Retrieval-augmented generation (RAG) has been extensively employed to mitigate hallucinations in large language models (LLMs). However, existing methods for multi-hop reasoning tasks often lack global planning, increasing the risk of falling into local reasoning impasses. Insufficient exploitation of retrieved content and the neglect of latent clues fail to ensure the accuracy of reasoning outcom…

Artificial Intelligence · Computer Science · Physical Sciences · Topic Modeling

The two-dimensional (2D) graph structure of a molecule encodes abundant latent property information. A well-designed molecular graph encoder can capture informative low-dimensional dense representations of molecules, which can subsequently be applied to a wide range of downstream tasks. To achieve fine-grained and discriminative molecular representations that capture localized structural informatio…

Advanced Graph Neural Networks · Artificial Intelligence · Computer Science · Physical Sciences

This research statement proposes to measure and mitigate speaker entanglement, where accent features inadvertently encode who is speaking in accented automatic speech recognition (ASR). We argue that entanglement inflates scores under lenient split for the same speaker and worsens fairness gaps across accents, and we outline a parameter-efficient mitigation that combines adversarial de-speakeriza…

Artificial Intelligence · Computer Science · Physical Sciences · Speech Recognition and Synthesis