IEEE Transactions on Image Processing
Recent advances in propagation-based phase-contrast imaging, such as hierarchical imaging, have enabled the visualization of internal structures in large biological specimens and material samples. However, modulation-based techniques, which provide quantitative electron density information, face challenges when imaging larger objects due to stringent beam stability requirements and detector disto…
Deep unfolding networks have gained significant attention for magnetic resonance imaging super-resolution (MRI SR) due to their performance and interpretability. However, 1) existing methods predominantly focus on cross-contrast correlations while neglecting high-order correlations embedded within spatially adjacent slices in volumetric MRI data. 2) Their degradation models are optimized via the prox…
The visual quality of point clouds is critical for perception-centric immersive media. Point Cloud Quality Assessment (PCQA) is crucial for reducing costs associated with human evaluation, optimizing compression pipelines, and enhancing human visual perception. However, real-valued PCQA methods often struggle to capture the coupled geometric and perceptual cues that govern quality. Com-PCQA, a nove…
3D scene CAD recomposition aims to reconstruct a given scene by retrieving and assembling CAD models from a database, so as to accurately simulate the geometric properties and spatial arrangement of the original environment. Recent methods learn this task through training on limited scan-to-CAD annotation data, which hinders their generalization to diverse real-world scenes. In this paper, we pro…
Facial expressions (FEs) and action units (AUs) are facial emotional representations at different levels of granularity. In the past, recognizing them has often been treated as two separate tasks. There are also some methods that use the knowledge of one to aid in recognizing the other, but currently, unified models capable of recognizing both FEs and AUs simultaneously remain rare. In this paper…
Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery detection model. Moreover, since every pixel in an AI-generated image is synthesized, traditional saliency-based forgery explanation methods are not well suited f…
Infrared (IR) and visible image fusion (IVIF) has become prevalent in recent years. By leveraging the complementary characteristics of infrared and visible images, we can obtain visually-appealing fused images, which further facilitate subsequent scene understanding and object detection from day to night. Integrating complementary information while simultaneously eliminating redundancy is a cruci…
Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmar…
The absence of real-world ground truth (GT) remains a challenge in multi-exposure image fusion (MEF). Benchmarks synthesize pseudo GT through algorithm ensembles. Existing methods, hampered by inherent imperfections of pseudo GT and fixed mapping relationships, show limited performance and robustness. To address these limitations, we propose a novel cross-modal diffusion framework that synergizes…
Online continual learning studies how models learn from continuous and non-stationary data streams. In this paper, we observe that CLIP models exhibit an asymmetric image-text interaction under online continual learning. Specifically, text features of previously seen classes may introduce unfavorable supervision when paired with visual features of newly observed data, leading to catastrophic forg…
Each sequence in existing RGBT tracking datasets is typically captured from a single platform equipped with both RGB (visible light) and TIR (thermal infrared) sensors. In real-world applications, tracking some objects requires cross-platform collaboration and these platforms might be equipped with different sensors. However, changes in modalities and platforms may cause significant variations in…
Semi-supervised learning (SSL) provides an effective means of reducing reliance on large-scale annotated datasets by leveraging unlabeled data. However, existing SSL methods often struggle with semantic ambiguity, especially under limited supervision. Recent studies have incorporated textual information to provide contextual guidance, yet most focus on feature fusion rather than emphasizing targe…
Fine-grained cross-view localization seeks to estimate precise camera poses by matching ground images with GPS-tagged aerial imagery. Existing methods typically employ first-order iterative optimization to progressively update the camera pose based on cross-view feature correspondences. However, they rely on local features and neglect global and complementary contextual information, making them p…
In low-light environments, conventional cameras often struggle to capture clear multi-view images of objects due to dynamic range limitations and motion blur caused by long exposure. Event cameras, with their high-dynamic range and high-speed properties, have the potential to mitigate these issues. Additionally, 3D Gaussian Splatting (GS) enables radiance field reconstruction, facilitating bright…
Multi-view clustering (MVC) based on anchor learning has proven effective in improving clustering accuracy and efficiency. Existing MVC methods are mainly based on single-granularity anchor learning, that is, the number of anchors corresponding to different views is constant and consistent, which leads to information redundancy or insufficient mining. In addition, aggregating ancho…
The effective fusion of multi-modal remote sensing images, particularly hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data, is pivotal for accurate land use and land cover (LULC) classification. However, this process is hindered by two inherent challenges: pervasive data redundancy and the underutilization of cross-modal complementarity, largely due to the lack of a unifying…
Although video generation and editing models have advanced significantly, individual models remain restricted to specific tasks, often failing to meet diverse user needs. Effectively coordinating these models in pipelines can unlock a wide range of video generation and editing capabilities. However, manual orchestration is complex, time-consuming, and requires deep expertise in model performance …
Accurate cross-modality cardiac image segmentation is essential for effectively diagnosing and treating heart disease. Different imaging modalities help to determine suitable pre-procedure planning. However, most methods face the difficulty of spatial-temporal confounding, where the anatomy element and modality element of cardiac images are intertwined across both spatial and temporal dimensions.…
As most optical satellites remotely acquire multispectral images (MSIs) with limited spatial resolution, multispectral unmixing (MU) becomes a critical signal processing technology for analyzing the pure material spectra for high-precision classification and identification. Unlike the widely investigated hyperspectral unmixing (HU) problem, MU is much more challenging as it corresponds to the und…