computer-vision-and-pattern-recognition
TL;DR: We needed to caption 1.2M reconstructed event-camera frames using vision-language models for auxiliary supervision. The first run died at 340K from Anthropic rate limits. Putting Bifrost in front of three VLM providers cut the rerun cost by 22% and finished in 9 hours. So, the thing is, when you work at a neuromorphic vision startup, your training data looks strange. At Prophesee we accumu…
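A minimal sketch of the failover pattern that a gateway in front of several VLM providers enables: when one provider is rate-limited, the frame is routed to the next one instead of killing the run. It assumes each provider is wrapped as a Python callable that raises a rate-limit exception when throttled. The `RateLimited` exception, the `caption_with_failover` helper, and the provider stand-ins are hypothetical; this is not Bifrost's API or configuration, only the routing idea.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when throttled (e.g. on HTTP 429)."""


# (provider name, function that captions one frame)
Provider = tuple[str, Callable[[bytes], str]]


def caption_with_failover(frame: bytes, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; on a rate limit, fall through to the next one.

    Returns (provider_name, caption). Raises RateLimited only if every provider is throttled.
    """
    for name, call in providers:
        try:
            return name, call(frame)
        except RateLimited:
            continue  # this provider is throttled; route the frame elsewhere
    raise RateLimited("all providers are rate-limited")


# Hypothetical stand-ins for three VLM backends; no network calls are made.
def provider_a(frame: bytes) -> str:
    raise RateLimited("429 from provider A")


def provider_b(frame: bytes) -> str:
    return f"caption for {len(frame)}-byte frame"


def provider_c(frame: bytes) -> str:
    return "caption from provider C"


if __name__ == "__main__":
    chain = [("a", provider_a), ("b", provider_b), ("c", provider_c)]
    print(caption_with_failover(b"\x00" * 16, chain))
```

For scale, captioning 1.2M frames in 9 hours works out to roughly 37 frames per second across all providers combined, which is why a single provider's rate limit can stall the whole job.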
Scientific Reports, Published online: 25 May 2026; doi:10.1038/s41598-026-54974-3 Optimization of image restoration technology and AI iterative upgrade based on PCNN
I shipped www.imgto3d.ai a few months ago. It's an AI-powered 3D model generator: you upload an image, pick a generator (upscale, denoise, face recovery, or full restoration), and get back a .glb file you can drop into Blender, Unity, or straight into a web page. The AI backend part (calling Replicate, polling for results, and handling credits) was straightforward. The part that actually made me l…
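The "polling for results" step usually reduces to a loop like the one below: check a job's status at a fixed interval until it reaches a terminal state or a deadline passes. This is a minimal sketch; `poll_until_done`, the status strings, and the injected `get_status` callable are assumptions for illustration, not Replicate's client API.

```python
import time
from typing import Callable

# Assumed terminal status names; check the provider's docs for the real ones.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() every interval_s seconds until it reports a terminal status.

    Returns that status, or raises TimeoutError once timeout_s has elapsed.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated job that finishes on the third check; no real API is called.
    statuses = iter(["starting", "processing", "succeeded"])
    print(poll_until_done(lambda: next(statuses), sleep=lambda s: None))
```

One reasonable credit policy (not necessarily the one imgto3d.ai uses) is to charge only after a `succeeded` status and skip the charge on failure, cancellation, or timeout, so a user's balance matches what they actually received.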

In a previous article, we benchmarked three open-source Vision-Language Models on zero-shot object detection and arrived at an uncomfortable conclusion: even the fastest contender, Phi-3.5-vision-instruct, takes 4.45 seconds per frame on an NVIDIA L4. LLaVA-v1.6 sits at 8.13 seconds. For any application that needs to process a live video stream, these numbers are disqualifying. But the conclusion…
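To make "disqualifying" concrete, per-frame latency converts directly into a throughput ceiling and a gap against a live stream's per-frame budget. The snippet below does that arithmetic for the two latencies quoted above. The 30 fps target stream is an assumption, not a figure from the article, and the calculation ignores any batching or pipelining across frames.

```python
def realtime_gap(seconds_per_frame: float, stream_fps: float = 30.0) -> tuple[float, float]:
    """Return (achievable fps, slowdown factor relative to the stream's per-frame budget)."""
    achievable_fps = 1.0 / seconds_per_frame
    budget_s = 1.0 / stream_fps  # time available per frame to keep up with the stream
    return achievable_fps, seconds_per_frame / budget_s


for model, latency in [("Phi-3.5-vision-instruct", 4.45), ("LLaVA-v1.6", 8.13)]:
    fps, slowdown = realtime_gap(latency)
    print(f"{model}: {fps:.3f} fps, {slowdown:.1f}x too slow for a 30 fps stream")
```

At about 0.22 fps, Phi-3.5-vision-instruct misses a roughly 33 ms per-frame budget by a factor of about 130; LLaVA-v1.6, at about 0.12 fps, misses it by a factor of about 240.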
Scientific Reports, Published online: 24 May 2026; doi:10.1038/s41598-026-52885-x Geometry-aware localization evaluation of grad-CAM for wafer map defect classification
Check out this new preprint, now live on EcoEvoRxiv, from a collaboration co-led by Jordan and Yuval Cohen: "Expanding the sentinel approach through multimodal integration: resolving underlying ecological processes with eDNA and computer vision." Sentinel approaches can quantify in-field ecological interactions and processes in a semi-controlled way while also reducing bias and labour. The assignment of …
Abstract One-stream transformer trackers have received widespread attention for their excellent discriminative ability. However, most existing trackers try to mine more information about the target while ignoring the background around it. In this work, we propose a single-stream progressive background elimination transformer for target tracking. This model employs a prog…
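The abstract is cut off before it describes the elimination mechanism, so the sketch below shows only a generic version of the idea, in the spirit of the score-based candidate elimination used in one-stream trackers such as OSTrack: rank search-region tokens by how strongly the template tokens attend to them and drop the lowest-scoring ones as likely background. The function name, the keep ratio, and the use of mean template-to-search attention as the score are assumptions, not this paper's method.

```python
import math
from typing import Sequence


def eliminate_background(
    search_tokens: Sequence[int],
    attn: Sequence[Sequence[float]],
    keep_ratio: float = 0.7,
) -> list[int]:
    """Keep the search tokens that receive the most attention from template tokens.

    search_tokens: indices of the search-region tokens still alive.
    attn[t][j]: attention from template token t to search token search_tokens[j].
    Returns the surviving token indices in their original order.
    """
    n = len(search_tokens)
    if not attn or any(len(row) != n for row in attn):
        raise ValueError("attn must have one row per template token and one column per search token")
    n_keep = max(1, math.ceil(keep_ratio * n))
    # Score each search token by its mean attention from all template tokens.
    scores = [sum(row[j] for row in attn) / len(attn) for j in range(n)]
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:n_keep]
    return [search_tokens[j] for j in sorted(top)]


if __name__ == "__main__":
    tokens = [10, 11, 12, 13]
    attn = [[0.6, 0.1, 0.2, 0.1], [0.4, 0.2, 0.3, 0.1]]
    print(eliminate_background(tokens, attn, keep_ratio=0.5))
```

A progressive scheme would call a step like this at several encoder layers, recomputing attention over the surviving tokens each time, so later layers spend their compute on the target region rather than on background.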
This is a submission for the Gemma 4 Challenge: Build with Gemma 4. What I Built: BugLens is an AI-powered screenshot debugger for developers. You upload a screenshot of any bug, error message, or broken UI, and Gemma 4 instantly analyzes it and returns a structured diagnosis, including the error type, severity, root cause, affected area, a step-by-step fix, and prevention tips. No more copy-pasti…
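A "structured diagnosis" implies the app asks the model for a fixed schema and validates the reply before showing it. The sketch below assumes the model is prompted to answer with a JSON object containing one key per field listed above; the key names, the `Diagnosis` dataclass, and the allowed severity values are hypothetical, not BugLens's actual format.

```python
import json
from dataclasses import dataclass, fields

# Assumed severity scale; the real app may use different labels.
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class Diagnosis:
    """One field per item the post says the diagnosis includes."""
    error_type: str
    severity: str
    root_cause: str
    affected_area: str
    fix_steps: list[str]
    prevention_tips: list[str]


FIELD_NAMES = [f.name for f in fields(Diagnosis)]


def parse_diagnosis(raw: str) -> Diagnosis:
    """Parse and validate a model reply that should be a single JSON object.

    Raises ValueError on invalid JSON, a non-object reply, missing fields, or an unknown severity.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")
    missing = [name for name in FIELD_NAMES if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")
    # Ignore any extra keys the model adds; keep only the schema's fields.
    return Diagnosis(**{name: data[name] for name in FIELD_NAMES})
```

Validating before rendering lets a malformed reply trigger a retry or a clear error message instead of a half-filled diagnosis in the UI.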
Scientific Reports, Published online: 23 May 2026; doi:10.1038/s41598-026-54147-2 A well-perceived, blind image quality assessment algorithm using an enhanced noise feature criterion

Behind every self-supervised vision model lies a chain of human design choices that shape its performance, robustness, and transferability. Choices regarding pretext data, pretext tasks, model architecture, and transfer strategies matter. Successful self-supervision depends on their alignment.
This is a submission for the Gemma 4 Challenge: Build with Gemma 4. I Built a Smart Kitchen AI with Gemma 4 That Turns Fridge Photos Into Recipes. What I Built: Smart Kitchen AI is a multimodal, AI-powered cooking assistant designed to make everyday cooking smarter and easier. The idea started during a Build With AI bootcamp, where my team and I wanted to explore how AI could solve practical real-worl…
Scientific Data, Published online: 23 May 2026; doi:10.1038/s41597-026-07464-0 Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis
We present NPC Nano, a 501M-parameter decoder-only language model pretrained from random initialization on 8.93B tokens using a single NVIDIA A40 GPU. We document the pretraining recipe; a label-shift bug encountered during training, together with the pre-launch sanity gate that prevents its recurrence; an identity-layer methodology with empirically recalibrated capability gates; and a four-experiment chara…
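The abstract does not say what the label-shift bug was, so the following is only a generic illustration. In decoder-only pretraining, the target at position i is the token at position i + 1; shifting twice (for example once in the data pipeline and again inside the loss) or not at all silently trains the wrong objective. A cheap pre-launch check is to assert that alignment on a real batch. `make_next_token_pair` and `check_label_shift` are hypothetical helpers, not NPC Nano's code.

```python
from typing import Sequence


def make_next_token_pair(token_ids: Sequence[int]) -> tuple[list[int], list[int]]:
    """Split one token sequence into (inputs, labels) for next-token prediction.

    labels[i] is the token the model must predict after seeing inputs[: i + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form an (input, label) pair")
    return list(token_ids[:-1]), list(token_ids[1:])


def check_label_shift(inputs: Sequence[int], labels: Sequence[int]) -> None:
    """Sanity gate: fail loudly unless labels are inputs shifted left by exactly one position.

    Raises AssertionError explicitly (not via `assert`) so the check survives `python -O`.
    """
    if len(inputs) != len(labels):
        raise AssertionError("inputs and labels differ in length")
    if list(labels[:-1]) != list(inputs[1:]):
        raise AssertionError("labels are not inputs shifted by one position")


if __name__ == "__main__":
    inputs, labels = make_next_token_pair([5, 8, 13, 21])
    check_label_shift(inputs, labels)  # passes: correctly shifted
    print(inputs, labels)
```

Run the gate on real tokenized text rather than synthetic data: a sequence of identical tokens would pass even when unshifted.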
Abstract Detection of deepfakes has become more difficult due to increasingly sophisticated reproductions, particularly from D-F architectures. Existing methods struggle with cross-dataset generalization because they rely on single-stream deep features and naive concatenation. In this paper, we present AFFD-Net (Attention-Guided Feature Fusi…
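The abstract contrasts attention-guided fusion with naive concatenation of features. The sketch below shows one generic form of attention-weighted fusion for two feature streams: each stream gets a scalar relevance score, the scores are softmax-normalized, and the output is the weighted sum, so the model can lean on whichever stream is more informative for a given input. The scoring vector `w` and the function names are hypothetical; AFFD-Net's actual fusion module is not described in the truncated abstract.

```python
import math
from typing import Sequence


def concat_fusion(a: Sequence[float], b: Sequence[float]) -> list[float]:
    """Naive fusion: stack the two feature vectors end to end, with no weighting."""
    return list(a) + list(b)


def attention_fusion(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> list[float]:
    """Weight two same-sized feature vectors by softmax-normalized relevance scores.

    Each stream's score is its dot product with w (learned in a real model, fixed here).
    """
    if not (len(a) == len(b) == len(w)):
        raise ValueError("feature vectors and w must have the same length")
    scores = [sum(x * wi for x, wi in zip(stream, w)) for stream in (a, b)]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha_a, alpha_b = (e / total for e in exps)
    return [alpha_a * x + alpha_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(concat_fusion([1.0, 0.0], [0.0, 1.0]))
    print(attention_fusion([2.0, 0.0], [0.0, 2.0], w=[1.0, 0.0]))
```

Concatenation doubles the feature size and leaves all weighting to later layers; attention fusion keeps the size fixed and makes the per-input weighting of each stream explicit.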