computer-vision-and-pattern-recognition

Scientific Data
DEV Community

TL;DR: We needed to caption 1.2M reconstructed event-camera frames using vision-language models for auxiliary supervision. The first run died at 340K from Anthropic rate limits. Putting Bifrost in front of three VLM providers cut the rerun cost by 22% and finished in 9 hours. So, the thing is, when you work at a neuromorphic vision startup, your training data looks strange. At Prophesee we accumu…

ai, computer-science, computer-vision, machine-learning
Scientific Reports
DEV Community

I shipped www.imgto3d.ai a few months ago. It's an AI-powered 3D model generator: you upload an image, pick a generator (upscale, denoise, face recovery, or full restoration), and get back a .glb file you can drop into Blender, Unity, or straight into a web page. The AI backend part (calling Replicate, polling for results, handling credits) was straightforward. The part that actually made me l…

ai, computer-vision, machine-learning
Biological sciences : Scientific Reports subject feeds
DEV Community

In a previous article, we benchmarked three open-source Vision-Language Models on zero-shot object detection and arrived at an uncomfortable conclusion: even the fastest contender, Phi-3.5-vision-instruct, takes 4.45 seconds per frame on an NVIDIA L4. LLaVA-v1.6 sits at 8.13 seconds. For any application that needs to process a live video stream, these numbers are disqualifying. But the conclusion…

ai, computer-vision, machine-learning
Scientific Reports
Foraging Ecology Research Group

Check out this new preprint, now live on EcoEvoRxiv, from a collaboration co-led by Jordan and Yuval Cohen: "Expanding the sentinel approach through multimodal integration: resolving underlying ecological processes with eDNA and computer vision". Sentinel approaches can quantify in-field ecological interactions and processes in a semi-controlled way, also reducing bias and labour. The assignment of …

bioinformatics, biology, computer-vision, ecology
Zenodo (CERN European Organization for Nuclear Research)
Paper
Jon Halstead
1d ago
Computer Science, Computer Vision and Pattern Recognition, Data Visualization and Analytics, Physical Sciences
Journal of Information Security and Applications
Discover Artificial Intelligence

Abstract: One-stream transformer trackers have received widespread attention for their excellent discriminatory ability. However, most existing trackers try to mine more information about the target while ignoring the background around it. In this work, we propose a single-stream progressive background elimination transformer for target tracking. This model employs a prog…

Computer Science, Computer Vision and Pattern Recognition, Physical Sciences, Video Surveillance and Tracking Methods
HAL (Le Centre pour la Communication Scientifique Directe)
DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4. What I Built: BugLens is an AI-powered screenshot debugger for developers. You upload a screenshot of any bug, error message, or broken UI, and Gemma 4 instantly analyzes it and returns a structured diagnosis, including the error type, severity, root cause, affected area, a step-by-step fix, and prevention tips. No more copy-pasti…

ai, computer-vision, machine-learning
Scientific Reports
Research Communities by Springer Nature
DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4. I Built a Smart Kitchen AI with Gemma 4 That Turns Fridge Photos Into Recipes. What I Built: Smart Kitchen AI is a multimodal AI-powered cooking assistant designed to make everyday cooking smarter and easier. The idea started during a Build With AI bootcamp where my team and I wanted to explore how AI could solve practical real-worl…

ai, computer-vision, generative-ai, machine-learning
Scientific Data
Intelligent Service Robotics
Zenodo (CERN European Organization for Nuclear Research)

We present NPC Nano, a 501M-parameter decoder-only language model pretrained from random initialization on 8.93B tokens using a single NVIDIA A40 GPU. We document the pretraining recipe, a label-shift bug encountered during training and the pre-launch sanity gate that prevents its recurrence, an identity layer methodology with empirically recalibrated capability gates, and a four-experiment chara…

Advanced Neural Network Applications, Computer Science, Computer Vision and Pattern Recognition, Physical Sciences
Zenodo (CERN European Organization for Nuclear Research)

Abstract: Detection of deepfakes has become more difficult due to the escalating sophistication of reproductive reproductions, particularly D-F architectures. Existing methods have problems with cross-dataset generalization because they rely on single-stream deep features and naive concatenation approaches. In this paper, we present AFFD-Net (Attention-Guided Feature Fusi…

Computer Science, Computer Vision and Pattern Recognition, Generative Adversarial Networks and Image Synthesis, Physical Sciences
