Microarchitectural Security of AWS Firecracker VMM: Abstract & Intro

13 Jun 2024

Authors:

(1) Zane Weissman, Worcester Polytechnic Institute, Worcester, MA, USA {zweissman@wpi.edu};

(2) Thomas Eisenbarth, University of Lübeck, Lübeck, S-H, Germany {thomas.eisenbarth@uni-luebeck.de};

(3) Thore Tiemann, University of Lübeck, Lübeck, S-H, Germany {t.tiemann@uni-luebeck.de};

(4) Berk Sunar, Worcester Polytechnic Institute, Worcester, MA, USA {sunar@wpi.edu}.

ABSTRACT

Firecracker is a virtual machine manager (VMM) purpose-built by Amazon Web Services (AWS) for serverless cloud platforms: services that run code for end users on a per-task basis, automatically managing server infrastructure. Firecracker provides fast and lightweight VMs and promises a combination of the speed of containers, typically used to isolate small tasks, and the security of VMs, which tend to provide greater isolation at the cost of performance. This combination of security and efficiency, AWS claims, makes it not only possible but safe to run thousands of tasks from different users on the same hardware, with the host system rapidly and frequently switching between active tasks. Though AWS states that microarchitectural attacks are included in their threat model, this class of attacks directly relies on shared hardware, just as the scalability of serverless computing relies on sharing hardware between unprecedented numbers of users.

In this work, we investigate just how secure Firecracker is against microarchitectural attacks. First, we review Firecracker’s stated isolation model and recommended best practices for deployment, identify potential threat models for serverless platforms, and analyze potential weak points. Then, we use microarchitectural attack proof-of-concepts to test the isolation provided by Firecracker and find that it offers little protection against Spectre or MDS attacks. We discover two particularly concerning cases: 1) a Medusa variant that threatens Firecracker VMs but not processes running outside them, and is not mitigated by defenses recommended by AWS, and 2) a Spectre-PHT variant that remains exploitable even if recommended countermeasures are in place and SMT is disabled in the system. In summary, we show that AWS overstates the security inherent to the Firecracker VMM and provides incomplete guidance for properly securing cloud systems that use Firecracker.

CCS CONCEPTS

• Security and privacy → Virtualization and security; Side-channel analysis and countermeasures.

KEYWORDS

system security, microarchitectural security, virtual machines, hypervisor, serverless, cloud systems

1. INTRODUCTION

Serverless computing is an emerging trend in cloud computing where cloud service providers (CSPs) serve runtime environments to their customers. This way, customers can focus on maintaining their function code while leaving the administrative work related to hardware, operating system (OS), and sometimes runtime to the CSPs. Common serverless platform models include function-as-a-service (FaaS) and container-as-a-service (CaaS). Since individual functions are typically small, but customers’ applications can each be running anywhere from one to thousands of functions, CSPs aim to fit as many functions on a single server as possible to minimize idle times and, in turn, maximize profit. A rather light-weight approach to serving runtime environments is to run containers, which encapsulate a process with its dependencies so that only the necessary files for each process are loaded in virtual filesystems on top of a shared kernel. This reduces a switch between containers to little more than a context switch between processes. On the other hand, full virtualization provides good isolation between virtual machines (VMs) and therefore security between tenants, while being rather heavy-weight as each VM comes with its own kernel.

Neither of these approaches, container or VM, is ideal for use in serverless environments, where ideally many short-lived functions owned by many users will run simultaneously and switch often, so new mechanisms of isolation have been developed for this use case. For example, mechanisms for in-process isolation [38, 45, 49] set out to improve the security of containers by reducing the attack surface of the runtime and underlying kernel. Protecting the kernel is important, as a compromised kernel directly leads to a fully compromised system in the container case. However, certain powerful protections, like limiting syscalls, also limit the functionality that is available to the container and even break compatibility with some applications. In VM research, developers created ever smaller and more efficient VMs, eventually leading to so-called microVMs. MicroVMs provide the same isolation guarantees as usual virtual machines, but are very limited in their capabilities when it comes to device or OS support, which makes them more light-weight compared to usual VMs and therefore better suited for serverless computing.

Firecracker [1] is a virtual machine manager (VMM) designed to run microVMs while providing memory overhead and start times comparable to those of common container systems. Firecracker is actively developed by Amazon Web Services (AWS) and has been used in production for AWS Lambda [5] and AWS Fargate [4] serverless compute services since 2018 [1]. AWS’s design paper [1] describes the features of Firecracker, how it diverges from more traditional virtual machines, and the intended isolation model that it provides: safety for “multiple functions run[ning] on the same hardware, protected against privilege escalation, information disclosure, covert channels, and other risks” [1]. Furthermore, AWS provides production host setup recommendations [8] for securing parts of the CPU and kernel that a Firecracker VM interacts with. In this paper, we challenge the claim that Firecracker protects functions from covert and side-channels across microVMs. We show that Firecracker itself does not add to the microarchitectural attack countermeasures but fully relies on the host and guest Linux kernels and CPU firmware/microcode updates.

Microarchitectural attacks like the various Spectre [10, 13, 22, 30, 31, 33, 52] and microarchitectural data sampling (MDS) [14, 37, 46, 50] variants pose a threat to multi-tenant systems as they are often able to bypass both software and architectural isolation boundaries, including those of VMs. Spectre and MDS threaten tenants that share CPU core resources like the branch prediction unit (BPU) or the line-fill buffer (LFB). CSPs providing more traditional services can mitigate the problem of shared hardware resources by pinning long-lived tenant VMs to separate CPU cores, which effectively partitions the resources between the tenants and ensures that the microarchitectural state is only affected by a single tenant at a time.
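The core-pinning strategy described above can be sketched as a simple placement rule. This is an illustrative sketch, not code from Firecracker or from any CSP: the function names and tenant labels are hypothetical, and the core IDs are assumed to denote physical cores (two SMT sibling threads of one core would still share a BPU and LFB). The only real API used is Python's `os.sched_setaffinity`, which is available on Linux.

```python
import os


def plan_exclusive_cores(tenants, cores):
    """Map each tenant to its own core; refuse to over-commit.

    `tenants` is an ordered list of hypothetical tenant labels and
    `cores` an iterable of core IDs assumed to be physical cores.
    """
    cores = sorted(cores)
    if len(tenants) > len(cores):
        # With more tenants than cores, some core must be time-shared,
        # which is the situation serverless platforms are in.
        raise ValueError("more tenants than cores: exclusive pinning is impossible")
    return {tenant: core for tenant, core in zip(tenants, cores)}


def pin_process(pid, core):
    """Restrict a process to a single core (Linux only; pid 0 is the caller)."""
    os.sched_setaffinity(pid, {core})


if __name__ == "__main__":
    # Two long-lived tenant VMs on a four-core host each get a private core.
    print(plan_exclusive_cores(["vm-a", "vm-b"], range(4)))
```

The `ValueError` branch marks where this approach stops working: once tenants outnumber cores, at least one core has to be shared over time.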

In serverless environments, however, the threat of microarchitectural attacks is greater. The reason for this is the short lifetime of the functions that are run by the different tenants. Server resources in serverless environments are expected to be over-committed, which leads to tenant functions competing for compute resources on the same hardware. Disabling simultaneous multi-threading (SMT), which would disable the concurrent use of CPU resources by two sibling threads, reduces the compute power of a CPU by up to 30% [34]. If customers rent specific CPU cores, this performance penalty may be acceptable, or both threads on a CPU core might be rented together. But for serverless services, the performance penalty directly translates to up to 30% fewer customers that can be served in a given amount of time. This is why it has to be assumed that most serverless CSPs keep SMT enabled in their systems unless they state otherwise. The microarchitectural attack surface is largest if SMT is enabled and the malicious thread has concurrent access to a shared core. But there are also attack variants that perform just as well if the attacker thread prepares the microarchitecture before it yields the CPU core to the victim thread or executes right after the victim thread has paused execution. And even if SMT is disabled by the CSP (as is the case for AWS Lambda), tenants still share CPUs with multiple others in this time-sliced fashion.

AWS claims that Firecracker running on a system with up-to-date microarchitectural defenses will provide sufficient hardening against microarchitectural attacks [1]. The Firecracker documentation also contains specific recommendations for microarchitectural security measures that should be enabled. In this work, we examine Firecracker’s security claims and recommendations and reveal oversights in its guidance as well as wholly unmitigated threats.
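As a rough illustration of what auditing such host-level defenses can look like, the sketch below reads the mitigation status that the Linux kernel reports through sysfs. It is not the authors' tooling and not part of Firecracker. The helper names are hypothetical; the paths `/sys/devices/system/cpu/vulnerabilities/` and `/sys/devices/system/cpu/smt/control` are standard Linux kernel interfaces. A status line of "Mitigation: ..." only says that the kernel has enabled a defense, not that every attack variant is blocked.

```python
from pathlib import Path


def read_mitigation_status(sysfs_root="/sys/devices/system/cpu"):
    """Return (smt_control, {vulnerability: status}) as reported by the kernel.

    `sysfs_root` is a parameter so the function can be pointed at a
    copy of the directory tree; the default is the real sysfs location.
    """
    root = Path(sysfs_root)
    smt_file = root / "smt" / "control"
    smt = smt_file.read_text().strip() if smt_file.is_file() else "unknown"

    status = {}
    vuln_dir = root / "vulnerabilities"
    if vuln_dir.is_dir():
        for entry in sorted(vuln_dir.iterdir()):
            if entry.is_file():
                status[entry.name] = entry.read_text().strip()
    return smt, status


def unmitigated(status):
    """Names of entries whose kernel status line starts with 'Vulnerable'."""
    return [name for name, line in status.items() if line.startswith("Vulnerable")]


if __name__ == "__main__":
    smt, status = read_mitigation_status()
    print("SMT control:", smt)
    for name, line in status.items():
        print(f"{name}: {line}")
    print("Reported vulnerable:", unmitigated(status))
```

Running the same check on the host and inside a guest shows what each kernel reports, which matters because the defenses in question live in the host and guest kernels and in CPU microcode, not in the VMM.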

In summary, our main contributions are:

• We provide a comprehensive security analysis of the cross-tenant and tenant-hypervisor isolation of serverless compute when based on Firecracker VMs.

• We test Firecracker’s defense capabilities against microarchitectural attack proof-of-concepts (PoCs), employing protections available through microcode updates and the Linux kernel. We show that the virtual machine itself provides negligible protection against major classes of microarchitectural attacks.

• We identify a variant of the Medusa MDS attack that becomes exploitable from within Firecracker VMs even though it is not present on the host. The kernel mitigation that protects against this exploit, and most known Medusa variants, is not mentioned by AWS’s Firecracker host setup recommendations. Additionally, we show that disabling SMT provides insufficient protection against the identified Medusa variant, which underscores the need for the kernel mitigation.

• We identify Spectre-PHT and Spectre-BTB variants which leak data even with recommended countermeasures in place. The Spectre-PHT variants even remain a problem when SMT is disabled if the attacker and victim share a CPU core in a time-sliced fashion.

1.1 Responsible Disclosure

We informed the AWS security team about our findings and discussed technical details. The AWS security team claims that the AWS services are not affected by our findings due to additional security measures. AWS agreed that Firecracker does not provide microarchitectural security on its own but only in combination with microcode updates and secure host and guest operating systems, and plans to update its host setup recommendations for Firecracker installations.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.