data-science

DEV Community

Introduction: Data cleaning is the process of identifying and correcting errors, anomalies, and inconsistencies in raw data sets to improve data quality and prepare the data for advanced analysis and modeling. In today’s data-driven world, raw data is often messy and rarely ready for analysis. The real value of a data analyst lies not just in collecting data, but in their ability to prepare …

computer-science, data-science
DEV Community

This report provides a comprehensive analysis of learner performance and program completion outcomes across ALX Nigeria programs, focusing primarily on the AI Career Essentials (AICE) track. The insights are derived from a cleaned hypothetical dataset of 5,002 learners, excluding those who deferred. Executive Summary: With an overall graduation rate of 38%, this analysis reveals key performance pat…

ai, data-science, machine-learning
DEV Community

You loaded your data. You ran head(). Everything looks fine. It is not fine. The data that looks fine in head() hides its problems. The missing values are three thousand rows down. The duplicates are in the middle. The date column that looks like a date is actually a string and will break your model silently. The salary column has a value of negative forty thousand that nobody caught. Every real…

computer-science, data-science
Frontiers in Environmental Science | New and Recent Articles

In petroleum geophysics, well logs are fundamental for subsurface characterization; however, missing logs frequently occur due to tool failure, legacy data gaps, or economic constraints, limiting reliable reservoir evaluation. The primary aim of this study is to develop and evaluate a simple, nonparametric machine learning framework for predicting missing geophysical well logs using K-Nearest Nei…

ai, data-science, engineering, machine-learning
bionity.com News
DEV Community

Working with Open Data can feel deceptively simple at first. You find a dataset, explore a few endpoints, maybe even build a quick prototype. Everything seems straightforward until you try to turn that prototype into something more stable. At that point, a different set of challenges starts to appear. This article is not about how to use Open Data Hub step by step. Instead, it focuses on somethin…

computer-science, data-science
DEV Community

How I built an end-to-end clickstream pipeline with exactly-once delivery guarantees. When I set out to build Pulse, I had a specific goal: demonstrate that I could work with streaming data, not just batch. My first portfolio project (Ballistics) was a batch pipeline: API calls on a schedule, Airflow orchestration, daily refreshes. That's the bread and butter of most data engineering work, but it…

computer-science, data-science
Towards Data Science

How I turned my eight-year weekly visualization habit into a reusable AI workflow. The post Beyond Prompting: Using Agent Skills in Data Science appeared first on Towards Data Science.

ai, data-science, machine-learning
Lamarr Institute
DEV Community

Read the complete Open Source and the Lakehouse series: Part 1: Apache Software Foundation: History, Purpose, and Process Part 2: What is Apache Parquet? Part 3: What is Apache Iceberg? Part 4: What is Apache Polaris? Part 5: What is Apache Arrow? Part 6: Assembling the Apache Lakehouse Part 7: Agentic Analytics on the Apache Lakehouse If you grant a Large Language Model direct access to a raw Am…

ai, data-science
DEV Community

If you pull a million records from a database into a Python n…

computer-science, data-science
DEV Community

You've decided you want a career in data analytics. Or maybe you're already in a non-data role and you see the direction things are going. You've heard "Power BI" mentioned in job listings, LinkedIn posts, and company meetings. You open YouTube. You watch a 3-hour tutorial. You build something that kind of works. Then you don't know what to learn next. That's not a learning problem. That's a road…

computer-science, data-science
DEV Community

Connecting Data from Multiple Sources in Power BI: A Comprehensive Technical Guide. Introduction: In the world of data analytics, insights are only as good as the data that powers them. Often, organizations store critical business information across multiple platforms: spreadsheets, databases, APIs, PDFs, SharePoint, and cloud services. To build accurate, insightful Power BI reports, data analysts m…

computer-science, data-science
DEV Community

Hey! I recently created my first ever data pipeline, built around data from the Energy Information Administration (EIA) in the US. I'd be very happy if you took the time to check it out and/or provide feedback (: GitHub - eia

computer-science, data-science
DEV Community

Have you ever tried to open your Apple Health export.xml file? If you've been wearing an Apple Watch for more than a year, that file is likely a multi-gigabyte monster that makes your standard text editor cry. We are living in the golden age of Quantified Self, yet our data remains trapped in bloated, hierarchical formats that are nearly impossible to analyze efficiently. In this tutorial, we’re …

computer-science, data-science
Tidy Finance Blog

library(tidyverse) library(arrow) In this chapter, we extend univariate portfolio analysis to bivariate sorts, which means we assign stocks to portfolios based on two characteristics. Bivariate sorts are regularly used in the academic asset pricing literature and are the basis for the factors in the Fama-French three-factor model. However, some scholars also use sorts with three grouping variable…

algorithms, computer-science, data-science, quant-finance
Philosophy of Science

This paper discusses the role of data within scientific reasoning and as evidence for theoretical claims, arguing for the idea that data can yield theoretically grounded models and be inferred, predicted, or explained from/by such models. Contrary to Bogen and Woodward’s rejection of data-to-theory and theory-to-data inferences/predictions, we draw upon artificial intelligence as applied to scien…

ai, computer-science, data-science, machine-learning
Michigan Tech News and Stories

Data science is everywhere, a driving force behind modern decisions. When a streaming service suggests a movie, a bank sends a warning about unusual activity on an account, or a weather app predicts the rain, these are all examples of data science at work. If the internet creates data, data scientists are the ones who make that data useful. Data science is the practice of using information to mak…

computer-science, data-science
Towards Data Science

What I learned about data wrangling, segmentation, and storytelling while building an application security report from scratch. The post Turning 127 Million Data Points Into an Industry Report appeared first on Towards Data Science.

computer-science, data-science
Semiconductor Digest

Every second, scientific experiments produce a flood of data — so much that transmitting and analyzing it can slow down even the most advanced research. To help scientists better manage this data deluge, researchers at the U.S. Department of Energy’s (DOE) Argonne National Laboratory have developed a new computer chip that rapidly compresses and processes the huge amounts of data generated by adv…

algorithms, computer-science, data-science
research.io
