data-science

DEV Community

Originally published at https://shai-kr.github.io/data-ninja-ai-lab/blog/2026-05-24-fabric-ai-functions-data-workflows.html
Most enterprise GenAI demos start in the wrong place. They start with a chat window. The more useful place is usually earlier: inside the data workflow, before the dashboard, before the semantic model, before the analyst has to clean the same messy text for the tenth time. T…

ai, data-science, machine-learning
Scientific Reports
DEV Community

What 3.9M powerlifting records tell us about competition strategy: an EDA with Python. When I started this EDA project for my Data Science Master at Evolve, I picked the Open Powerlifting dataset because, beyond being a gym rat, I've always been curious about competition strategy in powerlifting. The dataset: Open Powerlifting is an open-source project that tracks powerlifting competition resu…

computer-science, data-analysis, data-science, python
Resources

Most financial institutions already operate across a complex mix of legacy systems, core banking tools, payment providers, customer communication channels, data warehouses, and manual workflows. These systems still perform critical work, holding account histories, customer records, balances, statuses, treatment logic, and reporting data used every day by collections teams.

ai, computer-science, data-science, machine-learning
DEV Community

Data has become one of the most powerful drivers of business success. However, managing data remains a significant challenge. Organizations require reliable analytical systems that can store, process, and interpret vast volumes of information. No wonder Analytics & Data Management SaaS led the industry in 2025. Whether you are aiming to disrupt the financial sector or embarking on complex healthc…

computer-science, data-science
DEV Community

Can an AI agent handle a 10M-row dataset in under 2 seconds? By leveraging DuckDB and a columnar vectorized execution engine, I built a conversational BI agent that bridges the gap between raw data and executive decisions. In this post, I explore how we integrated generative AI with high-performance analytical databases to create a seamless experience for data-driven teams. Originally published at …

ai, data-science, machine-learning
Frontiers in Environmental Science | New and Recent Articles

Short-term forecasting of the Air Quality Index (AQI) can support public health risk management and real-time environmental decision-making. In this study, we propose a multivariate, one-step-ahead time-series forecasting approach based on a Transformer encoder. The model predicts the AQI at the next time point using an observation sequence within a fixed historical window (five time steps in thi…

data-science, environment, pollution
DEV Community

Over several months, we audited 1,000 French SMB sites using a unified protocol: same criteria, same tools, same thresholds. The goal: identify what actually separates sites capturing organic traffic from sites that stagnate. The results are sharper than we expected. The headline number: Technically top-tier sites, those that pass Core Web Vitals, carry valid structured data, and use a coherent …

computer-science, data-science
DEV Community

You have 100 features. Most of them are correlated. Training is slow. Visualization is impossible. KNN is useless (curse of dimensionality). PCA is the tool that handles this. It takes your 100 features and finds 10 new features that capture 95% of the original information. Training gets faster, visualization becomes possible, and your models often get better too. It's one of those techniques you…

ai, data-science, machine-learning
DEV Community

Power Query is the backbone of data preparation in Power BI. Before you can build stunning dashboards or write complex DAX, your data needs to be clean, consistently shaped, and properly related. In this guide, we'll use the real CodeSphere Hub dataset, comprising sales transactions, booking records, product info, and calendar data, to demonstrate every essential Power Query technique. The datase…

computer-science, data-science
DEV Community

As part of my Master in Data Science & AI at Evolve, I worked on a data analysis project using a real Steam games dataset from 2025. The goal of the project was to practice the full workflow of a data analysis project: understanding the dataset, cleaning it, transforming variables, creating visualizations, and explaining the results in a way that is easy to understand. The main question I wanted …

computer-science, data-science
Newswise: Latest News

The shift from slow, trial-and-error experimentation to data-driven discovery is reshaping how we develop energy materials. A new Perspective argues that the design of materials databases--how they ingest, curate and share information--directly determines the trustworthiness of modern artificial intelligence (AI) models.

ai, data-science, machine-learning, materials
e-Publications@Marquette

This study examined the longitudinal relationships among eighth grade mathematics proficiency, high school physics availability, AP® Physics exam participation, and ACT STEM scores across Wisconsin public school districts. A retrospective, multi-cohort design was used, and two cohorts were tracked from middle school through junior year. Publicly available datasets were analyzed using an applied d…

data-science, education, stem-education
DEV Community

Introduction: Data cleaning is the process of identifying and correcting errors, anomalies, and inconsistencies in raw data sets to improve the quality of the data and get it ready for advanced analysis and modeling. In today’s data-driven world, raw data is often messy and rarely ready for analysis. The real value of a data analyst lies not just in collecting data, but in their ability to prepare …

computer-science, data-science
DEV Community

This report provides a comprehensive analysis of learner performance and program completion outcomes across ALX Nigeria programs, focusing primarily on the AI Career Essentials (AICE) track. The insights are derived from a cleaned hypothetical dataset of 5,002 learners, excluding those who deferred. Executive Summary: With an overall graduation rate of 38%, this analysis reveals key performance pat…

ai, data-science, machine-learning
DEV Community

You loaded your data. You ran head(). Everything looks fine. It is not fine. The data that looks fine in head() hides its problems. The missing values are three thousand rows down. The duplicates are in the middle. The date column that looks like a date is actually a string and will break your model silently. The salary column has a value of negative forty thousand that nobody caught. Every real…

computer-science, data-science
Frontiers in Environmental Science | New and Recent Articles

In petroleum geophysics, well logs are fundamental for subsurface characterization; however, missing logs frequently occur due to tool failure, legacy data gaps, or economic constraints, limiting reliable reservoir evaluation. The primary aim of this study is to develop and evaluate a simple, nonparametric machine learning framework for predicting missing geophysical well logs using K-Nearest Nei…

ai, data-science, engineering, machine-learning
bionity.com News
DEV Community

Working with Open Data can feel deceptively simple at first. You find a dataset, explore a few endpoints, maybe even build a quick prototype. Everything seems straightforward until you try to turn that prototype into something more stable. At that point, a different set of challenges starts to appear. This article is not about how to use Open Data Hub step by step. Instead, it focuses on somethin…

computer-science, data-science
DEV Community

How I built an end-to-end clickstream pipeline with exactly-once delivery guarantees. When I set out to build Pulse, I had a specific goal: demonstrate that I could work with streaming data, not just batch. My first portfolio project (Ballistics) was a batch pipeline: API calls on a schedule, Airflow orchestration, daily refreshes. That's the bread and butter of most data engineering work, but it…

computer-science, data-science