
data-science

This report provides a comprehensive analysis of learner performance and program completion outcomes across ALX Nigeria programs, focusing primarily on the AI Career Essentials (AICE) track. The insights are derived from a cleaned hypothetical dataset of 5,002 learners, excluding those who deferred. Executive Summary: With an overall graduation rate of 38%, this analysis reveals key performance pat…
You loaded your data. You ran head(). Everything looks fine. It is not fine. The data that looks fine in head() hides its problems. The missing values are three thousand rows down. The duplicates are in the middle. The date column that looks like a date is actually a string and will break your model silently. The salary column has a value of negative forty thousand that nobody caught. Every real…
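A minimal sketch of the checks this excerpt describes, assuming pandas. The `audit` helper, the column names `hired` and `salary`, and the toy rows are all hypothetical illustrations, not taken from the article.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def audit(df: pd.DataFrame, date_col: str, salary_col: str) -> dict:
    """Summarize problems that a quick look at head() tends to miss.

    `date_col` and `salary_col` are hypothetical column names chosen
    for this illustration.
    """
    return {
        # Count of missing values in each column, across all rows.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # False means the "date" column is not stored as a datetime.
        "date_is_datetime": bool(is_datetime64_any_dtype(df[date_col])),
        # Salaries below zero (missing values are not counted).
        "negative_salaries": int((df[salary_col] < 0).sum()),
    }


# Hypothetical toy data: one duplicate row, one missing salary,
# one negative salary, and dates stored as text.
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Ben", "Cy", "Di"],
        "hired": ["2021-01-05", "2020-03-01", "2020-03-01", "2019-07-15", "2022-11-30"],
        "salary": [52000.0, 61000.0, 61000.0, -40000.0, None],
    }
)

report = audit(df, date_col="hired", salary_col="salary")
print(report)

# Converting the text column makes the date type explicit.
df["hired"] = pd.to_datetime(df["hired"])
```

Running a summary like this over the whole frame, rather than the first five rows, is one way to surface the issues before they reach a model.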
In petroleum geophysics, well logs are fundamental for subsurface characterization; however, missing logs frequently occur due to tool failure, legacy data gaps, or economic constraints, limiting reliable reservoir evaluation. The primary aim of this study is to develop and evaluate a simple, nonparametric machine learning framework for predicting missing geophysical well logs using K-Nearest Nei…

Numbers are the language of science—yet in research articles, they are often buried within the text and difficult to analyze. Researchers at Jülich have developed an AI system that automatically identifies these numbers, categorizes them, and converts them into structured data. The Quinex framework ...
Working with Open Data can feel deceptively simple at first. You find a dataset, explore a few endpoints, maybe even build a quick prototype. Everything seems straightforward until you try to turn that prototype into something more stable. At that point, a different set of challenges starts to appear. This article is not about how to use Open Data Hub step by step. Instead, it focuses on somethin…
How I built an end-to-end clickstream pipeline with exactly-once delivery guarantees. When I set out to build Pulse, I had a specific goal: demonstrate that I could work with streaming data, not just batch. My first portfolio project (Ballistics) was a batch pipeline: API calls on a schedule, Airflow orchestration, daily refreshes. That's the bread and butter of most data engineering work, but it…
How I turned my eight-year weekly visualization habit into a reusable AI workflow. The post "Beyond Prompting: Using Agent Skills in Data Science" appeared first on Towards Data Science.
AI research links mosquitoes and physics: New methods estimate class prevalence under shifting data conditions.
Read the complete Open Source and the Lakehouse series: Part 1: Apache Software Foundation: History, Purpose, and Process Part 2: What is Apache Parquet? Part 3: What is Apache Iceberg? Part 4: What is Apache Polaris? Part 5: What is Apache Arrow? Part 6: Assembling the Apache Lakehouse Part 7: Agentic Analytics on the Apache Lakehouse If you grant a Large Language Model direct access to a raw Am…
If you pull a million records from a database into a Python n…
You've decided you want a career in data analytics. Or maybe you're already in a non-data role and you see the direction things are going. You've heard "Power BI" mentioned in job listings, LinkedIn posts, and company meetings. You open YouTube. You watch a 3-hour tutorial. You build something that kind of works. Then you don't know what to learn next. That's not a learning problem. That's a road…
Connecting Data from Multiple Sources in Power BI: A Comprehensive Technical Guide. Introduction: In the world of data analytics, insights are only as good as the data that powers them. Often, organizations store critical business information across multiple platforms: spreadsheets, databases, APIs, PDFs, SharePoint, and cloud services. To build accurate, insightful Power BI reports, data analysts m…
Hey! I recently created my first ever data pipeline around energy information authority in the US. I'll be very happy if you take the time to check it out and/or provide feedback (: GitHub - eia
Have you ever tried to open your Apple Health export.xml file? If you've been wearing an Apple Watch for more than a year, that file is likely a multi-gigabyte monster that makes your standard text editor cry. We are living in the golden age of Quantified Self, yet our data remains trapped in bloated, hierarchical formats that are nearly impossible to analyze efficiently. In this tutorial, we’re …
library(tidyverse) library(arrow) In this chapter, we extend univariate portfolio analysis to bivariate sorts, which means we assign stocks to portfolios based on two characteristics. Bivariate sorts are regularly used in the academic asset pricing literature and are the basis for the factors in the Fama-French three-factor model. However, some scholars also use sorts with three grouping variable…
This paper discusses the role of data within scientific reasoning and as evidence for theoretical claims, arguing for the idea that data can yield theoretically grounded models and be inferred, predicted, or explained from/by such models. Contrary to Bogen and Woodward’s rejection of data-to-theory and theory-to-data inferences/predictions, we draw upon artificial intelligence as applied to scien…
Data science is everywhere, a driving force behind modern decisions. When a streaming service suggests a movie, a bank sends a warning about unusual activity on an account, or a weather app predicts the rain, these are all examples of data science at work. If the internet creates data, data scientists are the ones who make that data useful. Data science is the practice of using information to mak…
What I learned about data wrangling, segmentation, and storytelling while building an application security report from scratch. The post "Turning 127 Million Data Points Into an Industry Report" appeared first on Towards Data Science.
Every second, scientific experiments produce a flood of data — so much that transmitting and analyzing it can slow down even the most advanced research. To help scientists better manage this data deluge, researchers at the U.S. Department of Energy’s (DOE) Argonne National Laboratory have developed a new computer chip that rapidly compresses and processes the huge amounts of data generated by adv…

