devops

DEV Community

ECS Service Discovery: Cloud Map, Service Connect, or an Internal Load Balancer?

Matt

2h ago

Originally published at https://fortem.dev/blog/ecs-service-discovery-guide. Cloud Map, Service Connect, or an internal ALB? A practical decision framework for ECS Fargate teams, with the July 2025 blue/green unblock, real cost math, and a Terraform snippet. TL;DR: AWS docs say "We recommend Service Connect" for new ECS-to-ECS traffic, built…

cloud-computing, computer-science, devops

DEV Community

Monitor Kubernetes Pods Live with kubectl get --watch

DevOps Start

4d ago

This article was originally published on devopsstart.com. I'm cross-posting it here to share valuable insights with the Dev.to community on monitoring Kubernetes pods. Introduction When you are working with Kubernetes, you often need to see what is happening to your pods in real time. For instance, you might have just applied a new deployment or are debugging a flaky application. Manually running…

computer-science, devops

DEV Community

The Real AI Coding Breakthrough Is Not More Context. It Is Better Diagnostics.

Scarab Systems

4d ago

Scarab Diagnostics Suite in the Wild: field testing on open GitHub issues. When I started building what became Scarab Diagnostic Suite, I was not trying to create a theory of AI-assisted software development. I was trying to survive my own repo. I was building an intricate frontend/back…

ai, devops, machine-learning

DEV Community

Build a DevOps Slack Agent with Cosmic: From "What Broke?" to PR in One Conversation

Tony Spiro

4d ago

If you've ever been paged at 2am, opened Slack, typed 'what broke?' and then spent 20 minutes switching between terminals, dashboards, and GitHub tabs to figure out the answer, this tutorial is for you. We're going to build a DevOps agent that lives in your Slack channel. When an engineer asks 'what broke in prod?', the agent: pulls recent access logs from your Vercel deployment; identifies the er…

ai, devops, engineering, software-engineering

DEV Community

Turning GitHub into an R&D Accelerator: A Practical Workflow from Issue to Release

Mao开霖

4d ago

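Many teams treat GitHub as just a code repository: if they can push, open pull requests, and view diffs, they consider it in use. Truly efficient use goes a step further: let GitHub carry requirement tracking, change discussion, quality checks, releases, and knowledge retention. The benefit is not that you use a few more features, but that the context of engineering work stays next to the code. Someone who joins the project later does not need to dig back and forth through chat logs, meeting notes, and local documents; by following the chain of Issues, Pull Requests, Actions, and Releases, they can understand why a change happened, how it was implemented, how it was verified, and what was finally released. This article takes the perspective of a small web project and lays out a GitHub workflow that can be applied directly. It suits personal projects as well as small teams of 3 to 20 people. The examples cover Issue templates, branch naming, Pull Request…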

computer-science, devops, engineering, software-engineering

DEV Community

DevOps Pipeline: Stages, Tools, and CI/CD Explained

ilyas Elaissi

5d ago

Most teams don't fail at writing code. They fail at getting it to production reliably, quickly, and without someone staying late to babysit a deployment script. A well-constructed DevOps pipeline is the answer to that specific problem, and once you've set one up properly, you'll wonder how you survived the manual version. Table of Contents: The Core Idea Behind Automated Delivery; What a DevOps Pi…

computer-science, devops, software-engineering

DEV Community

Zero-Stall AI: Building a Self-Managing TDD Pipeline with Autonomous Agents

Tzvi Gregory Kaidanov

6d ago

How to design an AI-driven TDD loop that never gets stuck: GitHub Issues as memory, Playwright for tests, Vercel for staging, and Telegram for one-tap human approval. TL;DR: Point an AI agent at a GitHub Issue, have it write a failing E2E test, implem…

ai, devops, machine-learning

DEV Community

Security Audit of 6 Python Projects: 25 Issues Found & Fixed

JustJinoIT

9d ago

Published on: 2026-06-06. Reading time: 8 min. Tags: #security #python #audit #devops. Overview: Over 3 months, I developed and audited 6 Python projects (3 bots + 3 libraries): a FastAPI + Telegram Bot + LLM integration system. I discovered 25 security/code issues and fixed 23 immediately. Audit scope: 91 Python files. Issues found: 25 (5 critical, 18 medium, 2 minor). Fix rate: 92% (23/25). Crit…

computer-science, devops, python, security

DEV Community

Linux: The Operating System That Runs the Internet

Nerav Doshi

9d ago

Pipeline & Prompts | Byte size guides on DevOps, Cloud and AI. The Day I Realised Linux Was Everywhere: When I first started working in Cloud and Infrastructure, I assumed most servers ran Windows, because that's what I grew up using on my laptop. Then I got access to my first cloud environment and was greeted with a black screen, a blinking cursor, and absolutely no Start menu in sight. That was …

computer-science, devops

DEV Community

Kubernetes: The Platform That Keeps the Internet Running at Scale

Nerav Doshi

9d ago

From Supply Chain to Container Orchestration: When IBM acquired Red Hat, I was working as a technical seller trying to position IBM’s data science platform to clients. Our internal team was containerising CPLEX (a powerful optimisation engine used in warehouse management and supply chain applications) and running it on OpenShift. I ha…

computer-science, devops

DEV Community

From Minikube to AWS EKS: How I Built a Zero-Downtime Blue-Green Deployment Pipeline for ShopSwift

Oluwagbade Odimayo

11d ago

I built ShopSwift, a Node.js/Express e-commerce API, and wrapped it in a production-grade blue-green deployment pipeline: Docker, Kubernetes, Minikube local validation, NGINX Ingress, GitHub Actions CI, AWS EKS, Amazon ECR, and Prometheus + Grafana monitoring. Zero failed requests across every switch and rollback. Here is exactly how I did it - including the architecture mistake that caused a …

computer-science, devops, software-engineering

DEV Community

Inside Modern CI/CD Pipelines: How Automation Is Redefining DevOps

Eva Clari

13d ago

Engineering teams no longer view Continuous Integration and Continuous Deployment (CI/CD) as optional. For over a decade, pipelines have served to automate the transition from code commit to production. However, a major shift is occurring. Modern software delivery has outgrown simple bash scripts and basic test runners. Automation now redefines the entire DevOps landscape, transforming static del…

computer-science, devops, software-engineering

DEV Community

How 23,000 Repos Got Their Secrets Stolen Through Their Own CI/CD Pipeline

Vincent Olagbemide

14d ago

Been thinking about writing this one for a while. Supply chain attacks against CI/CD pipelines have been picking up pace over the past two years and the March 2025 tj-actions incident was the one that finally made me sit down and document everything properly. This is how I think about hardening GitHub Actions pipelines and what I actually do in practice. Original is on my blog but happy to have t…

cybersecurity, devops, supply-chain-attacks

DEV Community

From Zero to DevOps in Pakistan: My Real Journey With No CS Degree

zubairahmed687

15d ago

Everyone told me "You need a CS degree for DevOps." They were wrong. I'm Zubair from Pakistan. No CS degree. No bootcamp. Just a laptop and crazy curiosity. In 2 years, I went from knowing nothing about Linux to automating deployments with Docker, Kubernetes, and GitHub Actions. Here's exactly how I did it: 1. I stopped watching tutorials and started breaking things My first Ubuntu server crashed…

computer-science, devops

DEV Community

Unlocking Insights with Observability: My Journey with OpenTelemetry

Naveen Malothu

18d ago

As a Full Stack Engineer specializing in DevOps, AI Infrastructure, and Cloud, I've come to realize the importance of observability in ensuring the reliability and performance of complex systems. In my experience, having visibility into the inner workings of our applications and infrastructure is crucial for identifying issues, …

computer-science, devops, distributed-systems

DEV Community

Performance Tuning: The Day the Server Got “Tired” and Started Acting Funny

AndrewDangerously

18d ago

Every sysadmin eventually encounters a system that isn’t technically down—but is clearly not doing well. It responds slowly, logs look fine, CPU usage is “not that bad,” and yet everything feels like it’s running through molasses. This is the story of a performance incident where a server slowly degraded into existential confusion, and the admin had to figure out whether the problem was CPU, memo…

computer-science, devops

DEV Community

Terraform + Terragrunt + Ansible: A Hands-On Learning Journey

Taha Yağız Güler

21d ago

I recently got interview feedback that changed how I approach learning: "You've used these tools, but the technical depth wasn't there." Instead of just reading documentation, I decided to build a real multi-environment infrastructure setup from scratch — dev, staging, and prod — using Terraform, Terragrunt, and Ansible. This post is a walkthrough of what I built, why each decision was made, and …

computer-science, devops

DEV Community

Cloud Cost Elasticity

Khushi Dubey

25d ago

Cloud spending rarely grows predictably. As systems scale, organizations face limited visibility, sudden cost spikes, and increasing pressure on margins. This often prompts leadership to ask whether to build an in-house cloud cost-optimization platform or adopt a specialized solution. While evaluating both options is responsible and encouraged by FinOps practices, what appears to be a cost-saving…

computer-science, devops

DEV Community

Managing Terraform State Locking in S3 Without DynamoDB

sanjay yadav

26d ago

Introduction: If you’ve worked with Terraform, you’ve probably followed the standard setup: S3 for storing Terraform state, and DynamoDB for state locking. It’s widely recommended, and most teams implement it without questioning why. But Terraform has evolved. Today, Terraform S3 backend locking can handle state locking without DynamoDB. This introduces a simpler alternative, but also raises an importa…

computer-science, devops

DEV Community

Building a Cloud-Era DevOps Automation Platform: Six Pillars of Modern Ops

James Lee

29d ago

A mature automation ops platform in the cloud and DevOps era should be built around six core capabilities: 1. Hybrid-Cloud CMDB; 2. Monitoring + APM; 3. Batch Ops (Web UI); 4. C…

cloud-computing, computer-science, devops
