IEEE Transactions on Image Processing

Synthetic Aperture Radar (SAR) images offer unique advantages in all-weather, all-day remote sensing, but the high acquisition costs and time-consuming annotation processes limit their widespread implementation. Semi-supervised domain adaptation leverages abundant annotated optical images and a small number of labeled SAR images to achieve great performance on SAR images. However, existing semi-s…

Advanced SAR Imaging Techniques · Aerospace Engineering · Engineering · Physical Sciences

Recent advances in propagation-based phase-contrast imaging, such as hierarchical imaging, have enabled the visualization of internal structures in large biological specimens and material samples. However, modulation-based techniques, which provide quantitative electron density information, face challenges when imaging larger objects due to stringent beam stability requirements and detector disto…

Advanced X-ray Imaging Techniques · Physical Sciences · Physics and Astronomy · Radiation

Deep unfolding networks have gained significant attention for magnetic resonance imaging super-resolution (MRI SR) due to their performance and interpretability. However, (1) existing methods predominantly focus on cross-contrast correlations while neglecting high-order correlations embedded within spatially adjacent slices in volumetric MRI data, and (2) their degradation models are optimized via the prox…

Advanced Image Processing Techniques · Computer Science · Computer Vision and Pattern Recognition · Physical Sciences

The visual quality of point clouds is critical for perception-centric immersive media. Point Cloud Quality Assessment (PCQA) is crucial for reducing costs associated with human evaluation, optimizing compression pipelines, and enhancing human visual perception. However, real-valued PCQA methods often struggle to capture the coupled geometric and perceptual cues that govern quality. Com-PCQA, a nove…

Artificial Intelligence · Computer Science · Physical Sciences · Stochastic Gradient Optimization Techniques
Paper
Rongkun Yang·+8 more
1/1/2026

3D scene CAD recomposition aims to reconstruct a given scene by retrieving and assembling CAD models from a database, so as to accurately simulate the geometric properties and spatial arrangement of the original environment. Recent methods learn this task through training on limited scan-to-CAD annotation data, which hinders their generalization to diverse real-world scenes. In this paper, we pro…

Advanced Vision and Imaging · Computer Science · Computer Vision and Pattern Recognition · Physical Sciences

Facial expressions (FEs) and action units (AUs) are facial emotional representations at different levels of granularity. In the past, recognizing them has often been treated as two separate tasks. There are also some methods that use the knowledge of one to aid in recognizing the other, but currently, unified models capable of recognizing both FEs and AUs simultaneously remain rare. In this paper…

Emotion and Mood Recognition · Experimental and Cognitive Psychology · Psychology · Social Sciences

Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery detection model. Moreover, since every pixel in an AI-generated image is synthesized, traditional saliency-based forgery explanation methods are not well suited f…

Adversarial Robustness in Machine Learning · Artificial Intelligence · Computer Science · Physical Sciences

Infrared (IR) and visible image fusion (IVIF) has become prevalent in recent years. By leveraging the complementary characteristics of infrared and visible images, we can obtain visually-appealing fused images, which further facilitate subsequent scene understanding and object detection from day to night. Integrating complementary information while simultaneously eliminating redundancy is a cruci…

Advanced Image Fusion Techniques · Engineering · Media Technology · Physical Sciences

Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmar…

Computer Science · Computer Vision and Pattern Recognition · Face recognition and analysis · Physical Sciences

The absence of real-world ground truth (GT) remains a challenge in multi-exposure image fusion (MEF). Benchmarks synthesize pseudo GT through algorithm ensembles. Existing methods, hampered by inherent imperfections of pseudo GT and fixed mapping relationships, show limited performance and robustness. To address the limitations, we propose a novel cross-modal diffusion framework that synergizes…

Advanced Image Fusion Techniques · Engineering · Media Technology · Physical Sciences

Online continual learning studies how models learn from continuous and non-stationary data streams. In this paper, we observe that CLIP models exhibit an asymmetric image-text interaction under online continual learning. Specifically, text features of previously seen classes may introduce unfavorable supervision when paired with visual features of newly observed data, leading to catastrophic forg…

Artificial Intelligence · Computer Science · Domain Adaptation and Few-Shot Learning · Physical Sciences

Each sequence in existing RGBT tracking datasets is typically captured from a single platform equipped with both RGB (visible light) and TIR (thermal infrared) sensors. In real-world applications, tracking some objects requires cross-platform collaboration and these platforms might be equipped with different sensors. However, changes in modalities and platforms may cause significant variations in…

Adversarial Robustness in Machine Learning · Artificial Intelligence · Computer Science · Physical Sciences

Semi-supervised learning (SSL) provides an effective means of reducing reliance on large-scale annotated datasets by leveraging unlabeled data. However, existing SSL methods often struggle with semantic ambiguity, especially under limited supervision. Recent studies have incorporated textual information to provide contextual guidance, yet most focus on feature fusion rather than emphasizing targe…

Computer Science · Computer Vision and Pattern Recognition · Multimodal Machine Learning Applications · Physical Sciences

Fine-grained cross-view localization seeks to estimate precise camera poses by matching ground images with GPS-tagged aerial imagery. Existing methods typically employ first-order iterative optimization to progressively update the camera pose based on cross-view feature correspondences. However, they rely on local features and neglect global and complementary contextual information, making them p…

Control and Systems Engineering · Engineering · Physical Sciences · Robotic Mechanisms and Dynamics

In low-light environments, conventional cameras often struggle to capture clear multi-view images of objects due to dynamic range limitations and motion blur caused by long exposure. Event cameras, with their high-dynamic range and high-speed properties, have the potential to mitigate these issues. Additionally, 3D Gaussian Splatting (GS) enables radiance field reconstruction, facilitating bright…

Nuclear and High Energy Physics · Particle Detector Development and Performance · Physical Sciences · Physics and Astronomy

Multi-view clustering (MVC) based on anchor learning has been proven to be effective in improving clustering accuracy and efficiency. Existing MVC methods are mainly based on single-granularity anchor learning, that is, the number of anchors corresponding to different views is constant and consistent, which will lead to information redundancy or insufficient mining. In addition, aggregating ancho…

Advanced Clustering Algorithms Research · Artificial Intelligence · Computer Science · Physical Sciences

The effective fusion of multi-modal remote sensing images, particularly hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data, is pivotal for accurate land use and land cover (LULC) classification. However, this process is hindered by two inherent challenges: pervasive data redundancy and the underutilization of cross-modal complementarity, largely due to the lack of a unifying…

Engineering · Media Technology · Physical Sciences · Remote-Sensing Image Classification

Although video generation and editing models have advanced significantly, individual models remain restricted to specific tasks, often failing to meet diverse user needs. Effectively coordinating these models in pipelines can unlock a wide range of video generation and editing capabilities. However, manual orchestration is complex, time-consuming, and requires deep expertise in model performance …

Computer Science · Computer Vision and Pattern Recognition · Generative Adversarial Networks and Image Synthesis · Physical Sciences

Accurate cross-modality cardiac image segmentation is essential for effectively diagnosing and treating heart disease. Different imaging modalities help to determine suitable pre-procedure planning. However, most methods face the difficulty of spatial-temporal confounding, where the anatomy element and modality element of cardiac images are intertwined across both spatial and temporal dimensions.…

Computer Science · Computer Vision and Pattern Recognition · Medical Image Segmentation Techniques · Physical Sciences

As most optical satellites remotely acquire multispectral images (MSIs) with limited spatial resolution, multispectral unmixing (MU) becomes a critical signal processing technology for analyzing the pure material spectra for high-precision classification and identification. Unlike the widely investigated hyperspectral unmixing (HU) problem, MU is much more challenging as it corresponds to the und…

Engineering · Media Technology · Physical Sciences · Remote-Sensing Image Classification