In 2019, a research team at MIT published a paper demonstrating that they could recover the architecture and approximate weights of a neural network running on a commercial edge AI accelerator — without ever querying the model's API, without accessing the device's memory, and without breaking any cryptographic protection the device had in place.

Their method: a microphone and a power probe.

The attack worked by capturing the acoustic and power side-channel emissions of the accelerator while it ran inference. Different layers of the network produced different power signatures. Different weight values produced measurable differences in both power draw and acoustic profile. The research team developed a signal processing pipeline that could map those physical emissions back to architectural decisions and weight approximations with enough fidelity to clone a competitor's model.

I read that paper when it came out, and my reaction was probably different from most AI engineers': I wasn't surprised. I was surprised it had taken this long.

My PhD research at NYU Tandon's Department of Electrical and Computer Engineering was focused on hardware security — specifically, on finding and analyzing the ways that the physical properties of computing devices leak information that the device's designers didn't intend to reveal. Side-channel attacks have been a known threat to cryptographic hardware for decades. The extension of these attacks to ML accelerators was, from a hardware security perspective, entirely predictable. The only question was when the AI community would start paying attention.

The answer appears to be: not yet, and not nearly fast enough.

What Side-Channel Attacks Actually Are

Before going into the ML-specific threat, let me ground the concept for readers who come from AI engineering backgrounds rather than hardware security.

A side-channel attack exploits physical information that leaks from a computing system as a byproduct of its normal operation. Unlike traditional software vulnerabilities — buffer overflows, injection attacks, authentication bypasses — side-channel attacks don't exploit bugs in code. They exploit the physics of computation.

The most practically important side channels are:

Power analysis. Every computation performed by a hardware device consumes power. The amount of power consumed varies with the specific values being processed and the operations being performed — a multiplication by zero draws different power than a multiplication by a large value, because the number of transistors that switch state, and therefore the energy dissipated, depends on the data. Simple Power Analysis (SPA) reads the power trace directly and makes inferences about individual operations. Differential Power Analysis (DPA) uses statistical techniques across many measurements to extract secrets even when individual measurements are noisy. (A toy numerical illustration of this value dependence follows the list below.)

Electromagnetic analysis. Computing hardware radiates electromagnetic emissions as electrons move through conductors. Like power analysis, the emission profile varies with the values being processed. EM analysis is in some ways more powerful than power analysis because it can be performed without physical contact — from distances of up to several meters with appropriate equipment.

Timing analysis. The time required to complete an operation varies depending on the input. Cache timing attacks, in particular, have been the basis of some of the most significant security breaks of the past two decades — the Spectre and Meltdown vulnerabilities that affected nearly every modern processor rely, at their core, on cache-timing side channels to exfiltrate data. For ML inference, execution time varies with model architecture, input characteristics, and hardware utilization patterns in ways that can leak architectural information.

Acoustic and thermal analysis. The acoustic emissions paper I described at the opening is the extreme version of this category. Switching activity in digital circuits produces faint audible vibrations — for example through piezoelectric effects in ceramic capacitors and package materials. Thermal cameras can observe heat dissipation patterns that correlate with computational activity.
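
To make the value dependence behind power analysis concrete, here is a minimal simulation of the standard first-order leakage model used throughout the power analysis literature: per-operation power modeled as proportional to the Hamming weight of the operand, plus measurement noise. Everything here is synthetic — no real hardware is involved — but it shows why averaging over many operations separates zero-valued operands from large ones even when individual measurements are noisy.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming_weight(x: np.ndarray) -> np.ndarray:
    """Number of set bits per 8-bit value — the standard first-order leakage model."""
    return np.unpackbits(x.astype(np.uint8)[:, None], axis=1).sum(axis=1)

# Two hypothetical operand streams: multiplications by zero vs. by large values.
operands_zero = np.zeros(1000, dtype=np.uint8)
operands_large = rng.integers(200, 256, size=1000, dtype=np.uint8)

# Simulated per-operation "power": leakage proportional to switching activity, plus noise.
power_zero = 0.5 * hamming_weight(operands_zero) + rng.normal(0, 1.0, 1000)
power_large = 0.5 * hamming_weight(operands_large) + rng.normal(0, 1.0, 1000)

print(f"mean simulated power, zero operands:  {power_zero.mean():.2f}")
print(f"mean simulated power, large operands: {power_large.mean():.2f}")
# The two means separate cleanly: value-dependent power draw survives the noise.
```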

In cryptographic hardware, defending against these attacks is a well-understood discipline with established countermeasures: constant-time implementations, power noise injection, physical shielding. The smart card in your credit card has been designed to resist power analysis attacks for more than two decades.

ML accelerators have essentially none of these protections. The designers of GPU clusters, edge AI chips, and cloud inference hardware did not, until recently, need to think about power-based attacks. The entire discipline is being imported from cryptographic hardware security into AI hardware security — rapidly, because attackers are already several years into understanding the attack surface.

The Threat Model for ML Accelerators

Let me be specific about what an attacker can extract through side-channel analysis of ML inference hardware, because the threat is more serious than most AI security discussions acknowledge.

Model architecture extraction. The sequence of layer types, their dimensions, and their activation functions produce distinctive power signatures. Research has demonstrated that a full transformer architecture — number of layers, attention head configuration, embedding dimensions — can be recovered from power traces with sufficient measurement fidelity. This is particularly dangerous for proprietary model architectures where the architecture itself is trade secret IP.
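
As a toy illustration of what "distinctive power signatures" means in practice, the sketch below segments a simulated power trace into layer-sized regions using a crude rolling-mean changepoint heuristic. The trace, the layer structure, and the thresholds are all invented; real attacks use far more sophisticated signal processing, but the basic recipe — segment the trace, then classify each segment — is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical power trace: three "layers" with different mean activity and duration,
# standing in for, e.g., convolution / attention / dense blocks on an accelerator.
trace = np.concatenate([
    rng.normal(1.0, 0.1, 4000),   # layer 1
    rng.normal(2.5, 0.1, 9000),   # layer 2: larger layer, longer and hotter
    rng.normal(1.6, 0.1, 3000),   # layer 3
])

# Smooth the trace, then flag samples where the rolling mean jumps sharply.
window = 200
rolling = np.convolve(trace, np.ones(window) / window, mode="valid")
jumps = np.where(np.abs(np.diff(rolling)) > 0.004)[0]

# Collapse runs of adjacent flagged samples into distinct layer boundaries.
boundaries = []
for j in jumps:
    if not boundaries or j - boundaries[-1] > window:
        boundaries.append(int(j))

print("estimated layer boundaries (sample index):", boundaries)
# Segment durations and mean power levels then become features
# for guessing each layer's type and size.
```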

Weight extraction (partial and full). Full weight extraction from a modern large language model is not currently feasible through side channels — the dimensionality is too high and the signal-to-noise ratio from a single inference run is insufficient. But partial weight extraction is feasible for specific layers, particularly the final few layers of a classification model or the embedding layers of an encoder. For models where specific capabilities depend on specific weight values — a specialized medical image classifier, a fraud detection model trained on proprietary data — even partial weight extraction can be sufficient to replicate the capability.

Training data inference. More subtle: the specific patterns in weight values can be used to make inferences about what data the model was trained on. This is related to membership inference attacks that operate at the API level, but the hardware side channel can provide additional signal that API-level attacks cannot access.

Input reconstruction. For edge inference deployments — an AI chip embedded in a camera, a medical device, an industrial sensor — the power trace of an inference run can be used to reconstruct approximate inputs. An attacker who can measure the power draw of an autonomous vehicle's perception system while it's processing a scene can make inferences about what the camera was seeing. This is not a theoretical capability; it has been demonstrated in research contexts.

The attack surface is not uniform. Cloud inference — where the target model runs on hardware that the attacker doesn't control and can't place probes on — is harder to attack through hardware side channels than edge inference. But cloud inference is not immune. Timing channels accessible through network interfaces can leak architectural information. Co-tenancy attacks — placing an attacker-controlled workload on the same physical hardware as the target, as is common in multi-tenant cloud environments — open power and cache-timing channels that don't require physical hardware access.

What I Saw Building Hardware Security Tools

My research in hardware security wasn't purely defensive. One of the most useful ways to understand an attack surface is to build tools that operationalize the attacks, and that's what I did during my doctoral work.

The TAINT tool I developed — focused on hardware trojan detection in FPGAs, discussed in my earlier piece on FPGA supply chain security — was part of a broader research program that included side-channel analysis. When you're building tools to detect malicious hardware modifications, you need to understand what the side channel looks like on a clean device before you can identify what's different about a compromised one. That work gave me a detailed empirical understanding of how power signatures map to computational operations on real hardware.

Several things stood out from that work that are directly relevant to ML accelerator security:

The attack tools are not exotic. The equipment required to perform a serious power analysis attack on an embedded AI device costs less than $5,000. An oscilloscope, a current probe, and a laptop running open-source SCA software — ChipWhisperer is the most commonly used, and it's actively maintained — are sufficient to conduct research-quality attacks. The barrier to these attacks is expertise, not equipment cost. And the expertise is increasingly available: university courses in hardware security, research papers, and conference workshops at venues like CHES (Cryptographic Hardware and Embedded Systems) all teach the techniques.

Noise doesn't protect you. A common intuition is that the complexity of a modern ML accelerator — millions of operations happening in parallel, enormous switching noise from the power supply — would obscure the side-channel signal. It raises the bar, but far less than you might hope. Statistical techniques developed in the power analysis literature are specifically designed to extract signals from noisy measurements. Correlation Power Analysis (CPA) and higher-order DPA methods have been shown to work even when the signal-to-noise ratio from individual measurements is poor, by accumulating evidence across many independent measurements. An attacker who can trigger many inference runs — trivially achievable by querying a deployed model — can overcome substantial amounts of noise.
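
Here is a minimal, fully synthetic sketch of the idea behind CPA. A toy "device" leaks the Hamming weight of a secret-dependent intermediate value (an XOR with a secret byte here, standing in for any value-dependent operation), buried in noise larger than the signal. The attacker enumerates candidate values, predicts the leakage each candidate would produce for the known inputs, and keeps the candidate whose predictions correlate best with the measurements. None of this is real hardware or a real model; it only illustrates how correlating across many traces defeats noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def hw(x: np.ndarray) -> np.ndarray:
    """Hamming weight of 8-bit values (the usual first-order leakage model)."""
    return np.unpackbits(np.asarray(x, dtype=np.uint8)[..., None], axis=-1).sum(axis=-1)

# Toy target: one secret byte combined with known, attacker-controlled inputs.
secret = 0xA7
n_traces = 3000
inputs = rng.integers(0, 256, size=n_traces, dtype=np.uint8)

# "Measured" leakage: Hamming weight of the secret-dependent intermediate, plus heavy noise.
intermediate = inputs ^ secret
traces = hw(intermediate) + rng.normal(0.0, 4.0, n_traces)

# CPA: correlate each candidate's predicted leakage against the measurements.
candidates = np.arange(256, dtype=np.uint8)
predictions = hw(inputs[None, :] ^ candidates[:, None])        # shape (256, n_traces)
corr = np.array([np.corrcoef(p, traces)[0, 1] for p in predictions])

best = int(corr.argmax())
print(f"best candidate: 0x{best:02X}   true secret: 0x{secret:02X}")
# With a few thousand traces, the correct candidate's correlation stands clear of the rest.
```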

The defenses are known but expensive. The hardware countermeasures that protect cryptographic chips — masking, shuffling, random dummy operations, physical shielding — are directly applicable to ML accelerators in principle. The cost is performance. Adding masking to a hardware AES implementation typically adds 30-100% area overhead and corresponding power consumption. Adding equivalent countermeasures to an ML accelerator would reduce the efficiency advantages that make custom silicon attractive in the first place. As I described in my analysis of what AI chip design actually requires, the AI hardware industry is built around power efficiency as the primary competitive differentiator — security gets designed out because there's no market pressure to put it in. Nobody is currently building ML accelerators with these countermeasures because nobody is currently requiring them.

The Deployment Contexts Where This Actually Matters

I want to be precise about the threat severity, because I have colleagues who work in AI who correctly point out that not every ML model warrants the same security posture.

High-risk deployment contexts where side-channel attack defense is not optional:

Edge AI in adversarially accessible environments. A medical AI system embedded in a device that could be physically acquired by an adversary. A defense-sector perception system. Industrial control AI in facilities where an adversary might have physical access during a maintenance window. These are the smart card equivalents: they face the same threat model as contactless payment cards, which have had SCA defenses designed in for decades.

Proprietary model IP with significant commercial value. A specialized model trained on proprietary datasets over years of effort, deployed on hardware that customers lease or purchase. The customer's hardware is physically accessible to competitors. The model weights are the core IP. Side-channel extraction is the equivalent of a firmware dump — and it's harder to detect.

Multi-tenant cloud inference where co-tenancy creates attack channels. This is underappreciated. GPU virtualization is not as robust as CPU virtualization from a side-channel perspective. Several published research results have demonstrated that co-resident GPU tenants can observe each other's memory access patterns through shared hardware resources. For inference workloads where the model is the IP, co-tenancy on shared GPU infrastructure is a threat that the current generation of cloud security posture management tools doesn't adequately address.

Lower-risk contexts:

Open-source models deployed on commodity hardware: if the weights are already public, there's nothing to extract. Public-facing API models whose architecture is already described in published papers. And models where the commercial value lies in the training data pipeline and business process rather than in the weights themselves.

What a Defensible Posture Looks Like

For organizations in the high-risk categories above, here's what a technically serious response to side-channel threats looks like:

For edge AI devices: Treat the hardware security requirement the same way you'd treat it for a hardware security module (HSM). That means working with vendors who have SCA characterization in their security certification process — look for IEC 62443 or Common Criteria certification that includes side-channel testing. It means physical shielding as part of the enclosure design. It means noise injection at the power supply level. This is engineering work, and it costs money. The question is whether the IP value warrants the investment.

For cloud inference with proprietary models: Evaluate your co-tenancy risk. Dedicated instances eliminate co-tenancy channels at the cost of instance efficiency. Confidential computing offerings — AMD SEV-SNP, Intel TDX — are designed partly to address this by encrypting memory and protecting against co-tenant observation. For a detailed treatment of how these hardware root-of-trust mechanisms work in production AI infrastructure, see my guide to hardware root-of-trust in cloud AI deployments. They don't fully address power-based channels, but they substantially reduce the attack surface for the most accessible attack vectors.

For all high-value deployments: Model extraction detection. If an attacker needs to perform many inference runs to accumulate sufficient side-channel data, API-level anomaly detection — unusually high inference volumes from a single client, queries that systematically probe edge cases, timing patterns inconsistent with normal use — can detect extraction attempts before they complete. This doesn't stop physical hardware access attacks, but it addresses the remote co-tenancy threat.
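
As a minimal sketch of what the API-level piece of this can look like, here is a sliding-window query-volume monitor. The window, the threshold, and the client identifier are hypothetical; a production system would baseline per-customer traffic and examine query structure and timing, not just volume.

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds -- tune against your own traffic baseline.
WINDOW_SECONDS = 3600
MAX_QUERIES_PER_WINDOW = 10_000

class ExtractionMonitor:
    """Flags clients whose inference volume in a sliding window exceeds a baseline threshold."""

    def __init__(self) -> None:
        self._history: dict[str, deque] = defaultdict(deque)

    def record_query(self, client_id: str, now: float | None = None) -> bool:
        """Record one inference request; return True if the client should be flagged."""
        now = time.time() if now is None else now
        window = self._history[client_id]
        window.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_QUERIES_PER_WINDOW

monitor = ExtractionMonitor()
if monitor.record_query("client-42"):        # hypothetical client identifier
    print("possible extraction attempt: throttle and alert")
```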

For hardware development teams: The hardware security research community has been developing SCA-resistant architectures for decades. The techniques — masked computations, constant-time operations, randomized execution order — are directly applicable to ML accelerator design. Teams building custom inference silicon should be engaging with this literature now, not waiting for a high-profile attack to create regulatory pressure.
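
To make one of those countermeasures concrete, here is a toy, software-level analogue of shuffling (randomized execution order). Real implementations live at the RTL and microarchitecture level, not in Python, and masking and constant-time design are separate techniques; this only illustrates why randomizing operation order decorrelates any given weight from a fixed position in the power trace.

```python
import numpy as np

rng = np.random.default_rng(3)

def shuffled_dot(x: np.ndarray, w: np.ndarray) -> float:
    """Dot product with a randomized accumulation order.

    The result is unchanged, but the point in time at which any particular weight is
    processed varies from run to run, so an attacker can no longer line up a fixed
    trace position with a fixed weight -- the software analogue of hardware shuffling.
    """
    order = rng.permutation(len(w))          # fresh random order on every invocation
    acc = 0.0
    for i in order:
        acc += float(x[i]) * float(w[i])
    return acc

x = rng.normal(size=16)
w = rng.normal(size=16)
assert np.isclose(shuffled_dot(x, w), float(x @ w))   # functionally identical to x . w
```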

Why This Matters Now

The AI hardware security landscape is roughly where cryptographic hardware security was in the late 1990s: the attacks are known, the defenses are understood in research contexts, and deployment practices haven't caught up. The difference is the pace of deployment. ML accelerators are being embedded in critical infrastructure, medical devices, defense systems, and edge computing platforms at a rate that far exceeds the 1990s rollout of smart card payment systems — the previous comparable case of rapid deployment of high-value hardware with inadequate side-channel protection.

The window to establish hardware security baselines before high-profile attacks force reactive compliance requirements is narrow. The research community is producing attacks faster than the industry is adopting defenses. The hardware security work I'm doing, along with the broader academic community's, produces a clearer picture of the threat landscape every year.

My recommendation to AI engineering teams: even if your current deployment doesn't require SCA defense, assign someone to track the threat. The attack capability is advancing, the equipment required is declining in cost, and the ML systems being deployed today will be in service for five to ten years in many enterprise contexts. The threat that's theoretical for your 2024-vintage edge AI deployment may be practical for a well-resourced adversary by 2027.

This is not a reason not to deploy ML systems. It is a reason to be thoughtful about which deployments warrant investment in hardware-level security — and to make that investment before an adversary demonstrates that you needed to.

The physics of computation are not going to change. The side channels are going to remain open. What can change is whether we build systems that treat this as a first-class engineering constraint rather than a threat category that someone else will deal with eventually.

For a deeper technical treatment of the power analysis attack class specifically — including differential power analysis (DPA) on neural network accelerators, the statistical methodology, and circuit-level defenses — see Differential Power Analysis on Neural Networks.