IEEE Transactions on Pattern Analysis and Machine Intelligence
CFSM: A Novel Causal Feature Selection Module for Two-Dimensional Out-of-Distribution Generalization
In real-world scenarios, training and test data are often collected in diverse settings, leading to domain shifts arising from evolving environments and selection bias. While causality-inspired methods have shown promising results in tackling the out-of-distribution (OOD) generalization issue, prior methods treat the discovered differences across domains as confounding variables. While effective …
Recently, the mainstream practice for training low-light raw image denoising methods has shifted towards employing synthetic data. Noise modeling, which focuses on characterizing the noise distribution of real-world sensors, profoundly influences the effectiveness and practicality of synthetic data. Currently, physics-based noise modeling struggles to characterize the entire real noise distributi…
This paper addresses the important and challenging task of large-scale unsupervised semantic segmentation (LUSS). We present the first attempt to unleash the power of foundation models (FMs) for the challenging, dense prediction task LUSS, and our main objective is to present simple, effective yet efficient solutions for LUSS, namely Prompting foundation models for LUSS (PLUSS). Firstly, we propo…
Minimax optimization is gaining increasing attention in modern machine learning applications. Driven by large-scale models and massive volumes of data collected from edge devices, as well as the concern to preserve client privacy, distributed minimax optimization algorithms become popular, such as Local Stochastic Gradient Descent Ascent (Local-SGDA), and Local Decentralized SGDA (Local-DSGDA). W…
Trajectory prediction is a fundamental problem in computer vision, vision-language-action models, world models, and autonomous systems, with broad impact on applications including autonomous driving, robotics, and surveillance. Most existing approaches assume observations are complete and relatively clean, and thus do not adequately address out-of-sight agents or the intrinsic noise in sensing mod…
Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we introduce an Equilibrium Deviation Metric (EDM) to quantify this imbalance and verify, in both theoretical and empirical terms, that the optimization order of moda…
Despite strong performance on many generative tasks, diffusion and flow matching models require a large number of sampling steps to generate high-quality images. This has motivated the community to develop effective methods to distill pre-trained models into more efficient models. In this paper, we present Implicit Generator Matching (IGM), a systematic approach to distill both pre-trained diffu…
Large graphs are becoming ubiquitous, presenting significant computational hurdles in data processing and analysis. Graph Coarsening algorithms are frequently employed to condense large graphs while preserving key graph properties. Real-world graphs also have features or contexts associated with each node. However, existing coarsening methods often overlook simultaneity across node features and s…
The remarkable success of GNNs has provoked the challenge of high computational and memory overhead when training with large-scale graphs. As a promising solution, graph condensation is committed to constructing synthetic graphs with significantly smaller size, which are expected to preserve the essential characteristics of the original ones. During this process, a core problem is how to accurate…
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities to improve multimodal understanding. Traditional methods often depend on pairwise contrastive learning, which relies on a predefined anchor modality, restricting alignment across all modalities. Recent advances have investigated the simultaneous alignment of multiple modalities…
In this paper, we consider the problem of long-term point tracking, which requires consistent identification of points across video frames under significant appearance changes, motion, and occlusion. We target the online setting, i.e., tracking points frame-by-frame, making it suitable for real-time and streaming applications. We extend our prior model Track-On into Track-On2, a simple and effici…
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data. However, in the context of semi-supervised multi-label learning (SSMLL), conventional pseudo-labeling methods encounter difficulties when dealing with instances associated with multiple labels and an unknown label count. These limitations often result in the introduction of false positive labels or the n…
Recently, enhancing the generative capability of text-to-image (T2I) models has become a promising direction in both academia and industry. Prior studies often focused on either improving generative quality or reducing inference latency, but typically failed to improve both quality and speed simultaneously. Moreover, existing inference-enhancement methods do not achieve significant improvements s…
Hyperparameter recommendation through meta-learning (HPR-MtL) has proven effective in a wide range of studies. At its core, HPR-MtL constructs a recommendation model using metadata extracted from historical learning tasks, such as dataset characteristics and the empirical performance of hyperparameter configurations. Existing approaches, typically based on k-nearest neighbors (KNN), linear regress…
Data is the foundation for the development of computer vision, and the establishment of datasets plays an important role in advancing the techniques of fine-grained visual categorization (FGVC). In the existing FGVC datasets used in computer vision, it is generally assumed that each collected instance has fixed characteristics and the distribution of different categories is relatively balanced. I…
Inducing-point-based sparse variational approximation scales Gaussian process models to large datasets but tends to overestimate observation noise and underestimate posterior variance. Parametric predictive Gaussian process regressors (PPGPR) improve on point-wise uncertainty estimates, especially for heteroskedastic data, by repairing a mismatch between the training loss and the predictive met…
Embodied intelligence and related disciplines have identified several mechanisms that help embodied agents learn how to solve complex problems. Reinforcement learning (RL) is one of the most promising computational approaches toward enhancement of the learning-based problem-solving abilities of such agents. Given the recent rapid evolution of artificial intelligence, RL has become a keystone tech…
Invariant feature extraction is a critical challenge in intelligent image processing, particularly with the rapid advancement of multi-source/modal imaging. Cross-modal matching has attracted considerable attention, yet current studies primarily focus on targeted modalities rather than realizing a general approach. In this paper, cross-arbitrary-modal image invariant feature extraction and matchi…
Face Forgery Detection (FFD), or Deepfake detection, aims to determine whether a digital face is real or fake. Due to different face synthesis algorithms with diverse forgery patterns, FFD models often overfit specific patterns in training datasets, resulting in poor generalization to other unseen forgeries. Existing FFD methods primarily leverage pre-trained backbones with general image represen…