computer-architecture
If you've ever deployed memory-bound workloads on AWS Graviton, you know that CPU compute speed is only part of the story. Another factor in real-world performance is how efficiently your code accesses the memory subsystem, specifically the cache hierarchy, interconnects, and physical DRAM. In this article, I will walk through how to use the Arm System Characterization Tool (ASCT) to analyze the …
Originally published on Alpinum Consulting. The growth of open processor architectures has significantly increased the adoption of RISC‑V across embedded systems, AI accelerators, and high-performance computing platforms. This flexibility allows engineering teams to design processors with highly customised instruction sets and microarchitectures. However, this flexibility also increases verificati…

Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power
This is the fifth installment of the 80386 series. The FPGA CPU is now far enough along to run real software, and this post is about how it works. z386 is a 386-class CPU built around the original Intel microcode, in the same spirit as z8086. The core is not an instruction-by-instruction emulator in RTL. The goal is to recreate enough of the original machine that the recovered 386 control ROM can…

External GPU (eGPU) + NVIDIA Drivers on Linux: Solving the Display Manager Initialization Problem TL;DR: If your NVIDIA eGPU works in recovery mode but gives a black screen on normal boot, you're missing one critical Xorg option: AllowExternalGpus. This guide shows how to fix it properly on any X11-based Linux distribution. Introduction Installing NVIDIA drivers on a Linux system with an externa…
A new technical paper, “Emulation-based System-on-Chip Security Verification: Challenges and Opportunities,” was published by researchers at University of Florida. Abstract “Increasing system-on-chip (SoC) heterogeneity, deep hardware/software integration, and the proliferation of third-party intellectual property (IP) have brought security validation to the forefront of semiconductor design. Whi…
WARNING: This article is meant to be informal and fun! Okay, so you're a CS graduate and you did a hardware course as part of your degree, but perhaps that was a few years ago now, and you haven't really kept up with the details of processor designs since then. In particular, you might not be aware of some key topics that developed rapidly in recent times... - pipelining (superscalar, OOO, VLIW, …
For decades, we have designed chips in fundamentally the same way: human intuition applied to a vanishingly small slice of an impossibly large design space. That paradigm worked when Moore’s Law was lifting everything. We could afford to be wrong. We could afford to miss the best design. Process scaling would close the gap. That […]
Over about 10 weeks, I built a bare-metal SPMC at S-EL2 that boots Linux, manages Secure Partitions, and runs alongside Android pKVM on the same SoC. I built an ARM64 hypervisor that runs next to Google's pKVM on the same chip. pKVM takes the Normal world at NS-EL2. My hypervisor takes the Secure world at S-EL2. They coordinate through ARM's FF-A protocol, relayed by EL3 firmware. 35 end-to-end t…
IEEE Transactions on Parallel and Distributed Systems (TPDS) -- Special Issue on CMP Architectures
Paper 2026/677 SPLASH: SPeculative Leakage-Adaptive Secure Hardware Abstract Modern processors are largely fixed at the time of fabrication, rendering post-silicon security updates infeasible. This lack of flexibility is especially problematic for speculative execution attacks, which exploit microarchitectural optimizations to leak sensitive information through transient execution. However, exist…
Today, Cadence announced an expansion of its broad collaboration with NVIDIA to accelerate Cadence’s Design for AI and AI for Design strategy. The next generation of agentic AI design solutions includes autonomous, long-running agents that require accelerated, trusted, physics-grounded engines to translate design intent into automated flows, generate designs and debug errors, and manage long, com…
The honor recognizes their influential contributions to trustworthy AI, software systems, computer architecture, and design automation.
As we close the book on 2025, Computer Architecture Today has seen another successful year of community engagement. We published 29 posts covering a wide spectrum of topics—from datacenter energy-efficiency to the evolving debate on LLMs in peer review, alongside trip reports from our major conferences. I want to thank all our authors for their insights, with special appreciation for those who co…
Large language model (LLM) agents are quickly moving from “single agent” to *multi-agent systems*: tool-using agents, planner-orchestrator, debate teams, specialized sub-agents that collaborate to solve tasks. At the same time, the *context* these agents must operate within is becoming more complex: longer histories, multiple modalities, structured traces, and customized environments. This combin…
It is a sunny morning in the computer architecture research community. In the last few years, our community has multiplied in size, our conferences consistently reach record-high attendance, and the number of active research areas is mind-boggling. Members of our community are recognized with the Turing Award and are leading NSF CISE. While times may be exhilarating, it is important that all of u…
A new technical paper titled “BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism” was published by Georgia Tech. Abstract “This paper studies the impact of DRAM writes on DDR5-based system. To efficiently perform DRAM writes, modern systems buffer write requests and try to complete multiple write operations whenever the DRAM mode is switched...
For a decade, the promise of probabilistic computing has been overshadowed by a single, physical bottleneck: the need for bulky, power-draining analog control circuits. This technology relies on hardware elements, called p-bits, that naturally fluctuate between ‘one’ and ‘zero,’ allowing systems to efficiently solve optimization and inference problems that baffle traditional computers. Yet, to fi…
Microarchitecture simulators have been conceived and implemented to be valuable tools for the design of computing chips of all types (SimpleScalar, gem5, SMTSIM, Sniper, Qflex, Scarab, GPGPU-sim, Accel-Sim, Multi2Sim, NaviSim, SCALE-sim, gem5-Salam, TAO, PyTorchSim – the list is neither historically complete nor updated). In essence, microarchitecture simulators have an “impossible” objective: to…
