Authors:
(1) Zane Weissman, Worcester Polytechnic Institute Worcester, MA, USA {zweissman@wpi.edu};
(2) Thomas Eisenbarth, University of Lübeck Lübeck, S-H, Germany {thomas.eisenbarth@uni-luebeck.de};
(3) Thore Tiemann, University of Lübeck Lübeck, S-H, Germany {t.tiemann@uni-luebeck.de};
(4) Berk Sunar, Worcester Polytechnic Institute Worcester, MA, USA {sunar@wpi.edu}.
Table of Links
- Abstract and Introduction
- Background
- Threat Models
- Analysis of Firecracker's Containment Systems
- Analysis of Microarchitectural Attacks and Defenses in Firecracker MicroVMs
- Conclusion, Acknowledgments, and References
5. ANALYSIS OF MICROARCHITECTURAL ATTACKS AND DEFENSES IN FIRECRACKER MICROVMS
In this section we present our analysis of a number of microarchitectural side-channel and speculative attack PoCs on Firecracker microVMs. We test these PoCs on bare metal and in Firecracker instances, and test relevant microcode defenses in the various scenarios. We run our tests on a server with an Intel Skylake 4114 CPU, which has virtualization hardware extensions and SMT enabled. The CPU runs on microcode version 0x2006b06[2]. The host OS is Ubuntu 20.04 with a Linux 5.10 kernel. We used Firecracker v1.0.0 and v1.4.0, the latest version as of July 2023, to run an Ubuntu 18.04 guest with Linux kernel 5.4, which is provided by Amazon when following the quick-start guide[3].
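To compare mitigation states between a host and a guest like the ones described above, it helps to record what each kernel reports about itself. The following is a minimal sketch, not the authors' tooling: it reads the standard Linux sysfs entries under /sys/devices/system/cpu (the SMT control file and the per-vulnerability status files). The function name and the optional directory argument are our own, added so the logic can be exercised without root access or a real sysfs tree.

```python
from pathlib import Path

# Standard Linux sysfs location for CPU state (Linux only).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_mitigation_status(cpu_dir=SYSFS_CPU):
    """Return the kernel-reported SMT state and per-vulnerability status.

    `cpu_dir` is a parameter only so the function can be pointed at a
    copy of the sysfs tree; it is not part of any kernel interface.
    """
    cpu_dir = Path(cpu_dir)
    status = {}

    # smt/control holds values such as "on", "off" or "forceoff".
    smt_control = cpu_dir / "smt" / "control"
    if smt_control.is_file():
        status["smt"] = smt_control.read_text().strip()

    # One file per known vulnerability class (mds, l1tf, spectre_v2, ...).
    vuln_dir = cpu_dir / "vulnerabilities"
    if vuln_dir.is_dir():
        for entry in sorted(vuln_dir.iterdir()):
            if entry.is_file():
                status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, value in read_mitigation_status().items():
        print(f"{name}: {value}")
```

Running the script once on the host and once inside a guest gives two reports that can be compared directly. Note that the guest kernel only reports its own view, which need not match the host's configuration.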
In summary, the recommended production host setup provided with AWS Firecracker is insufficient when it comes to protecting tenants from malicious neighbors. Firecracker therefore fails in providing its claimed isolation guarantees. This is because
(1) we identify a Medusa variant that only becomes exploitable when it is run across microVMs. In addition, the recommended countermeasures do not contain the necessary steps to mitigate the side-channel, or most other Medusa variants.
(2) we show that tenants are not properly protected from information leaks induced through Spectre-PHT or Spectre-BTB when applying the recommended countermeasures. The Spectre-PHT variants remain a problem even when disabling SMT.
(3) we observed no differences in PoC performance between Firecracker v1.0.0 and v1.4.0.
We conclude that the virtualization layer provided by Firecracker has little effect on microarchitectural attacks, and Firecracker’s system security recommendations are insufficient.
5.1 Medusa
We evaluated Moghimi’s PoCs [35] for the Medusa [37] side-channels (classified by Intel as MLPDS variants of MDS [25]) on the bare metal of our test system and in Firecracker VMs. There is one leaking PoC for each of the three known variants described in section 2.4.2. We used two victim programs from the PoC library:
• The “Block Write” program writes a large amount of consecutive data in a loop (so that the processor will identify repeated stores and combine them).
• The “REP MOV” program performs a similar operation, but with the REP MOV instruction instead of many instructions moving smaller blocks of data in a loop.
5.1.1 Results. Table 1 shows the cases in which data is successfully leaked with all microarchitectural protections in the kernel disabled. The left two columns show the possible combinations of the three Medusa PoCs and the two included victim programs. The right columns indicate which configurations work on bare metal and with the secret and leaking program running in parallel Firecracker instances. Most notably, with the Cache Indexing variant, the Block Write secret only works with Firecracker. This is likely because of the memory address virtualization that the virtual machine provides: the guest only sees virtual memory regions mapped by KVM, and KVM traps memory access instructions and handles the transactions on behalf of the guest.
We tested the effectiveness of mds and nosmt defenses against each combination of attacker and victim PoC on bare metal and in Firecracker VMs. Table 2 lists the protections necessary to prevent Medusa attacks in bare metal and Firecracker scenarios. Across the four vulnerabilities in Firecracker, only one is mitigated by nosmt alone, and AWS does not explicitly recommend enabling the mds protection, though most Linux distributions ship with it enabled by default. That is to say, a multi-tenant cloud platform could be using Firecracker in a way that is compliant with AWS’s recommended security measures and still be vulnerable to the majority of Medusa variants, including one where the Firecracker VMM itself leaks the user’s data that would not otherwise be leaked.
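The mds and nosmt defenses are selected through kernel boot parameters, so a first check of a configuration is to see which of them a given command line requests. The sketch below is a simplified model under stated assumptions: it treats mds as enabled unless `mds=off` or `mitigations=off` is present (the kernel's documented default is equivalent to `mds=full`), and it treats `nosmt`, `mds=full,nosmt` and `mitigations=auto,nosmt` as requests to disable SMT. It only reflects what was requested, not what the CPU and kernel actually applied. The function name is ours.

```python
def parse_mitigation_flags(cmdline):
    """Report whether a kernel command line requests mds and nosmt.

    Simplified model: it ignores CPU-specific behaviour and only looks
    at the `mds=`, `nosmt` and `mitigations=` parameters.
    """
    mds_enabled = True     # kernel default is equivalent to mds=full
    smt_disabled = False
    mitigations_off = False

    for token in cmdline.split():
        key, _, value = token.partition("=")
        options = value.split(",") if value else []
        if key == "nosmt":                 # also covers nosmt=force
            smt_disabled = True
        elif key == "mds":
            mds_enabled = "off" not in options
            smt_disabled = smt_disabled or "nosmt" in options
        elif key == "mitigations":
            mitigations_off = "off" in options
            smt_disabled = smt_disabled or "nosmt" in options

    # mitigations=off disables the mds mitigation as well.
    return {"mds": mds_enabled and not mitigations_off, "nosmt": smt_disabled}


if __name__ == "__main__":
    # On Linux, the running kernel's command line is in /proc/cmdline.
    with open("/proc/cmdline") as f:
        print(parse_mitigation_flags(f.read()))
```

For example, a host booted with only `nosmt` plus `mds=off` would report `{"mds": False, "nosmt": True}`, which is the kind of configuration the paragraph above describes as still exposed to most Medusa variants.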
5.2 RIDL and More
In this section, we present an evaluation of the RIDL PoC programs [51] provided alongside van Schaik et al.’s 2019 paper [50]. RIDL is a class of MDS attacks that exploits speculative loads from buffers inside the CPU (not from cache or memory). The RIDL PoC repository also includes examples of attacks released in later addenda to the paper as well as one variant of the Fallout MDS attack.
5.2.1 Results. Table 3 shows some basic information about the RIDL PoCs that we tested and the efficacy of relevant countermeasures at preventing the attacks. We compared attacks on bare metal and in Firecracker to evaluate Amazon’s claims of the heightened hardware security of the Firecracker microVM system. For tests on the Firecracker system, we distinguish between countermeasure flags enabled on the host system (H) and the Firecracker guest kernel (VM). Besides the nosmt and mds kernel flags, we tested other relevant flags (cf. section 2.4.4, [21]), including kaslr, pti, and l1tf, but did not find that they had an effect on any of these programs. We excluded the tsx_async_abort mitigation since the CPU we tested on includes mds mitigation, which makes the tsx_async_abort kernel flag redundant [20].
In general, we found that the mds protection does not adequately protect against the majority of RIDL attacks. However, disabling SMT does mitigate the majority of these exploits. This is consistent with Intel’s [25] and the Linux developers’ [21] statements that SMT must be disabled to prevent MDS attacks across hyperthreads. The two outliers among these PoCs are alignment_write, which requires both nosmt and mds on the host, and pgtable_leak_notsx, which is mitigated only by mds countermeasures. The leak relying on alignment_write uses an alignment fault rather than a page table fault leak to trigger speculation [50]. The RIDL paper [50] and Intel’s documentation of the related VRS exploit [26] are unclear about what exactly differentiates this attack from the page-fault-based MFBDS attacks found in other PoCs, but our experimental findings indicate that the microarchitectural mechanism of the leakage is distinct. There is a simple explanation for the behavior of pgtable_leak_notsx, which is unique among these PoCs for one key reason: it is the only exploit that crosses security boundaries (leaking page table values from the kernel) within a single thread rather than leaking from another thread. Disabling multi-threading therefore has little effect on this single-threaded exploit. However, the mds countermeasure flushes microarchitectural buffers before switching from kernel-privilege execution to user-privilege execution within the same thread, wiping the page table data accessed by kernel code from the LFB before the attacking user code can leak it.
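The three behaviors described in this paragraph can be written down as a small decision table. This is only a transcription of the prose, not a reproduction of Table 3: the label `typical_cross_thread` is a placeholder of ours for the majority of PoCs that leak across hyperthreads, and the flags refer to the host configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostFlags:
    """Countermeasure flags set on the host kernel."""
    nosmt: bool
    mds: bool


# Rules as stated in the text; "typical_cross_thread" is a hypothetical
# label standing in for the majority of the RIDL PoCs.
MITIGATION_RULES = {
    "typical_cross_thread": lambda f: f.nosmt,        # stopped by disabling SMT
    "alignment_write": lambda f: f.nosmt and f.mds,   # needs both flags
    "pgtable_leak_notsx": lambda f: f.mds,            # single-thread, needs buffer flush
}


def is_mitigated(poc, flags):
    """Return True if the described rule says `flags` stop `poc`."""
    return MITIGATION_RULES[poc](flags)


if __name__ == "__main__":
    smt_only = HostFlags(nosmt=True, mds=False)
    for poc in MITIGATION_RULES:
        print(poc, "mitigated with nosmt only:", is_mitigated(poc, smt_only))
```

With SMT disabled but mds left off, the table flags alignment_write and pgtable_leak_notsx as still open, which matches the two outliers discussed above.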
In contrast to Medusa, most of these PoCs are mitigated by AWS’s recommendation of disabling SMT. However, as with Medusa, the Firecracker VMM itself provides no microarchitectural protection against these attacks.
5.3 Spectre
Next we focus on Spectre vulnerabilities. While many countermeasures have been developed since Spectre attacks were first discovered, many of them either come with a (significant) performance penalty or only partially mitigate the attack. Therefore, system operators often have to decide on a performance vs. security trade-off. In this section we evaluate the Spectre attack surface available to Firecracker tenants in both threat models described earlier. To evaluate the wide range of Spectre attacks, we rely on the PoCs provided in [15]. For Spectre-PHT, Spectre-BTB, and Spectre-RSB, the repository contains four PoCs each. They differ in the way the attacker mistrains the BPU. The four possibilities are:
(1) same-process: the attacker has control over the victim process or its inputs to mistrain the BPU,
(2) cross-process: the attacker runs its own code in a separate process to influence the branch predictions of the victim process,
(3) in-place: the attacker mistrains the BPU with branch instructions that reside at the same virtual address as the target branch that the attacker wants to be mispredicted in the victim process,
(4) out-of-place: the attacker mistrains the BPU with branch instructions that reside at addresses that are congruent to the target branches in the victim process.
The first two and latter two situations are orthogonal, so each PoC combines two of them. For Spectre-STL, only same-process variants are known, which is why the repository only provides two PoCs in this case. For cross-VM experiments, we disabled address space layout randomization for the host and guest kernels as well as for the host and guest user level to ease finding congruent addresses that are used for mistraining.
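The combination of the two orthogonal choices, and the notion of congruent addresses, can be illustrated with a short sketch. The congruence test here rests on a strong assumption: that the predictor is indexed by the low `index_bits` bits of the branch address. Real index functions are undocumented and more complex, and the default of 12 bits is a hypothetical value chosen only for illustration.

```python
from itertools import product

TRAINING_CONTEXT = ("same-process", "cross-process")
BRANCH_PLACEMENT = ("in-place", "out-of-place")


def mistraining_variants():
    """Enumerate the four PoC variants as context/placement pairs."""
    return [f"{ctx} {place}" for ctx, place in product(TRAINING_CONTEXT, BRANCH_PLACEMENT)]


def is_congruent(addr_a, addr_b, index_bits=12):
    """Assumed model: two branches collide if their low bits match."""
    mask = (1 << index_bits) - 1
    return (addr_a & mask) == (addr_b & mask)


def classify_placement(attacker_branch, victim_branch, index_bits=12):
    """Classify an attacker branch relative to the victim's target branch."""
    if attacker_branch == victim_branch:
        return "in-place"
    if is_congruent(attacker_branch, victim_branch, index_bits):
        return "out-of-place"
    return None  # under this model the branch cannot be used for mistraining


if __name__ == "__main__":
    print(mistraining_variants())
    print(classify_placement(0x7F0000401234, 0x401234))
```

The sketch also shows why disabling address space layout randomization helps: with fixed load addresses, an attacker branch that is equal or congruent to the victim's branch stays that way across runs instead of having to be rediscovered.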
5.3.1 Results. With AWS recommended countermeasures [8] (the default for the Linux kernels in use, cf. Figure 5) enabled on the host system and inside Firecracker VMs, we see that Spectre-RSB is successfully mitigated both on the host and inside and across VMs (cf. Table 4). On the other hand, Spectre-STL, Spectre-BTB, and Spectre-PHT allowed information leakage in particular situations.
The PoCs for Spectre-STL show leakage. However, the leakage only occurs within the same process and the same privilege level. Since no cross-process variants are known, we did not test the cross-VM scenario for Spectre-STL. In our user-to-user threat model, Spectre-STL is therefore not a possible attack vector. If two tenant workloads were isolated by in-process isolation within the same VM, Spectre-STL could still be a viable attack vector. In the user-to-host model, Spectre-STL is mitigated by countermeasures that are included in current Linux kernels and enabled by default.
For Spectre-PHT, the kernel countermeasures include the sanitization of user-pointers and the utilization of barriers (lfence) on privilege level switches. We therefore conclude that Spectre-PHT poses little to no threat to the host system. However, these mitigations do not protect two hyperthreads from each other if they execute on the same physical core in parallel. This is why all four Spectre-PHT mistraining variants are fully functional on the host system as well as inside Firecracker VMs. As can be seen in Table 4, this remains true even if SMT is disabled[4]. In fact, pinning both VMs to the same physical thread enables the cross-process out-of-place version of Spectre-PHT, whereas we did not observe leakage in the SMT case. This makes Spectre-PHT a significant threat in the user-to-user threat model.
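Pinning attacker and victim to a chosen hardware thread, or to two sibling threads of one core, requires knowing the CPU topology. The following sketch shows one way to do this on Linux and is not taken from the paper's setup: it parses the sysfs `thread_siblings_list` file for a logical CPU and uses `os.sched_setaffinity` to restrict a process to a single logical CPU. The helper names are ours, and how the authors pinned their Firecracker processes is not stated in the text.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,20' or '0-3,8' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, is_range, high = part.partition("-")
        last = int(high) if is_range else int(low)
        cpus.update(range(int(low), last + 1))
    return sorted(cpus)


def thread_siblings(cpu, sysfs="/sys/devices/system/cpu"):
    """Return the logical CPUs sharing a physical core with `cpu` (Linux)."""
    path = f"{sysfs}/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (0 means the caller) to one logical CPU (Linux)."""
    os.sched_setaffinity(pid, {cpu})


if __name__ == "__main__":
    siblings = thread_siblings(0)
    print("logical CPUs on the same core as cpu0:", siblings)
    pin_to_cpu(siblings[0])  # same-thread case: both parties share this CPU
```

Pinning both parties to the same logical CPU corresponds to the single-thread case used to simulate disabled SMT, while pinning them to two different entries of the sibling list corresponds to the SMT case.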
Spectre-BTB PoCs are partially functional when AWS recommended countermeasures are enabled. The original variant that mistrains the BTB in the same process and at the same address is fully functional, while same-process out-of-place mistraining is successfully mitigated. Also, none of the attempts to leak information across process boundaries via out-of-place mistraining showed any leakage. With cross-process in-place mistraining, however, we observed leakage. On the host system, the leakage occurred independent of SMT. Inside a VM, the leakage only occurred if each virtual CPU core was assigned to a separate physical thread. Across VMs, disabling SMT removed the leakage.
Besides the countermeasures listed in Figure 5, the host kernel has Spectre countermeasures compiled into the VM entry and exit point[5] to fully disable malicious guests from attacking the host kernel while the kernel handles a VM exit.
In summary, we can say that the Linux default countermeasures, which are recommended by AWS Firecracker, only partially mitigate Spectre. Precisely, we show:
• Spectre-PHT and Spectre-BTB can still leak information between tenants in the guest-to-guest scenario with the AWS recommended countermeasures, which include disabling SMT, in place.
• The host kernel is likely sufficiently protected by the additional precautions that are compiled into the Linux kernel to shield VM entries and exits. This, however, is orthogonal to security measures provided by Firecracker.
All leakage observed was independent of the Firecracker version in use.
5.3.2 Evaluation. We find that Firecracker does not add to the mitigations against Spectre but solely relies on general protection recommendations, which include mitigations provided by the host and guest kernels and optional microcode updates. Even worse, the recommended countermeasures insufficiently protect serverless applications from leaking information to other tenants. We therefore claim that Firecracker does not achieve its isolation goal on a microarchitectural level, even though microarchitectural attacks are considered in-scope of the Firecracker threat model.
The alert reader might wonder why Spectre-BTB remains an issue with the STIBP countermeasure in place (cf. Figure 5) as this microcode patch was designed to stop the branch prediction from using prediction information that originates from another thread. This also puzzled us for a while until recently Google published a security advisory[6] that identifies a flaw in Linux 6.2 that kept disabling the STIBP mitigation when IBRS is enabled. We verified that the code section that was identified as being responsible for the issue is also present in the Linux 5.10 source code. Our assumption therefore is that the same problem identified by Google also occurs on our system.
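One quick check related to this discussion is what the kernel itself reports about STIBP. The sketch below parses the comma-separated status line that Linux exposes in /sys/devices/system/cpu/vulnerabilities/spectre_v2. The example strings in the code are illustrative and were not captured from the test system. The report reflects the kernel's own bookkeeping, so it may not reveal a flaw like the one in the advisory; treat it as a first sanity check only.

```python
def parse_spectre_v2(status):
    """Split a spectre_v2 sysfs status line into named fields.

    Fields of the form 'KEY: value' map to their value; bare fields
    such as 'RSB filling' map to True.
    """
    fields = {}
    for part in status.strip().split(","):
        part = part.strip()
        if not part:
            continue
        key, has_value, value = part.partition(":")
        fields[key.strip()] = value.strip() if has_value else True
    return fields


def stibp_reported(status):
    """Return True if the status line reports STIBP as active."""
    value = parse_spectre_v2(status).get("STIBP")
    return value not in (None, "disabled")


if __name__ == "__main__":
    # Illustrative input, not output recorded on the paper's test system.
    example = "Mitigation: Retpolines, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling"
    print(parse_spectre_v2(example))
    print("STIBP reported:", stibp_reported(example))
```

A status line without any STIBP field, or with `STIBP: disabled`, is a hint that cross-thread branch predictor isolation is not in effect and that cross-hyperthread Spectre-BTB should be tested explicitly.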
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
[2] Updating the microcode to a newer version would disable TSX on our system which would make tests with TSX-based MDS variants impossible.
[3] https://github.com/firecracker-microvm/firecracker/blob/dbd9a84b11a63b5e5bf201e244fe83f0bc76792a/docs/getting-started.md
[4] This is simulated by pinning attacker and victim process to the same core (1PT).
[5] https://elixir.bootlin.com/linux/v5.10/source/arch/x86/kvm/vmx/vmenter.S#L191
[6] https://github.com/google/security-research/security/advisories/GHSA-mj4w-6495-6crx