Microarchitectural Security of AWS Firecracker VMM: Conclusion, Acknowledgments, and References

13 Jun 2024

Authors:

(1) Zane Weissman, Worcester Polytechnic Institute Worcester, MA, USA {zweissman@wpi.edu};

(2) Thomas Eisenbarth, University of Lübeck Lübeck, S-H, Germany {thomas.eisenbarth@uni-luebeck.de};

(3) Thore Tiemann, University of Lübeck Lübeck, S-H, Germany {t.tiemann@uni-luebeck.de};

(4) Berk Sunar, Worcester Polytechnic Institute Worcester, MA, USA {sunar@wpi.edu}.

6. CONCLUSIONS

Cloud technologies constantly shift to meet the needs of their customers. At the same time, CSPs aim to maximize efficiency and profit, which incentivizes serverless CSPs to over-commit available compute resources. While this is reasonable from an economic perspective, the resulting system behavior can be disastrous in the context of microarchitectural attacks that exploit shared hardware resources. In the past few years, the microarchitectural threat landscape has changed frequently and rapidly. There are mitigations that work reasonably well to prevent many attacks, but they often incur significant performance costs, which forces CSPs to find a tradeoff between economic value and security. Furthermore, some microarchitectural attacks are simply not hindered by existing mitigations. CSP customers have little control over the microarchitectural defenses deployed and must trust their providers to keep up with the pace of microarchitectural attack and mitigation development. Defense-in-depth requires security at every level, from the microcode to the VMM to the container. Each system must be considered as a whole, as some protections at one system level may open vulnerabilities at another.

We showed that the default countermeasures recommended for the Firecracker VMM are insufficient to meet its isolation goals. In fact, many of the tested attack vectors showed leakage while countermeasures were in place. We identified the Medusa cache indexing/block write variant as an attack vector that only works across VMs, i.e., with additional isolation mechanisms in place. Additionally, we showed that disabling SMT, an expensive mitigation technique recommended and performed by AWS, does not provide full protection against Medusa variants. The aforementioned Medusa variant and Spectre-PHT are still capable of leaking information between cloud tenants even if SMT is disabled, as long as the attacker and target threads keep competing for hardware resources of the same physical CPU core. Unfortunately, this is inevitably the case in high-density serverless environments. At present, serverless CSPs must remain vigilant in keeping firmware up-to-date and employing all possible defenses against microarchitectural attacks. Users must not only trust their CSPs of choice to keep their systems up-to-date and properly configured, but also be aware that some microarchitectural vulnerabilities, particularly certain Spectre variants, are still able to cross containment boundaries. Furthermore, processor designs continue to evolve, and speculative and out-of-order execution remain important factors in improving performance from generation to generation. It is therefore unlikely that we have seen the last of new microarchitectural vulnerabilities, as the recent wave of newly discovered attacks [36, 47, 53] shows.
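As a rough illustration of what "properly configured" can be checked against on a Linux host, the sketch below reads the mitigation status that the kernel reports under /sys/devices/system/cpu/vulnerabilities and the SMT setting under /sys/devices/system/cpu/smt/control. It is not part of the paper's tooling; the function names (read_mitigation_status, flag_exposed, smt_state) are hypothetical and chosen for this example only. Note that it reports only the kernel's own view: as the results above indicate, a "Mitigation" entry or disabled SMT does not by itself imply that cross-tenant leakage is prevented.

```python
from pathlib import Path

# Standard Linux sysfs locations; they may be absent on other systems.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")
SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_string} as reported by the kernel."""
    status = {}
    if not vuln_dir.is_dir():
        return status
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


def flag_exposed(status):
    """List entries whose status mentions 'vulnerable' (case-insensitive).

    This also catches lines such as 'Mitigation: ...; SMT vulnerable'.
    """
    return [name for name, text in status.items() if "vulnerable" in text.lower()]


def smt_state(control_file=SMT_CONTROL):
    """Return the SMT control value (e.g. 'on', 'off', 'forceoff') or 'unknown'."""
    try:
        return control_file.read_text().strip()
    except OSError:
        return "unknown"


if __name__ == "__main__":
    report = read_mitigation_status()
    for name, text in report.items():
        print(f"{name:28s} {text}")
    print("SMT control:", smt_state())
    print("Reported as exposed:", flag_exposed(report) or "none")
```

Running the script inside a guest and on the host can give different answers, since the guest kernel sees only what the hypervisor exposes; the host-side view is the one relevant to the production host setup recommendations [8].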

ACKNOWLEDGMENTS

This work was supported by the German Research Foundation (DFG) under Grants No. 439797619 and 456967092, by the German Federal Ministry for Education and Research (BMBF) under Grants SASVI and SILGENTAS, by the National Science Foundation (NSF) under Grant CNS-2026913, and in part by a grant from the Qatar National Research Fund.

REFERENCES

[1] Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. Firecracker: Lightweight Virtualization for Serverless Applications. In NSDI. USENIX Association, 419–434.

[2] Alejandro Cabrera Aldaya and Billy Bob Brumley. 2022. HyperDegrade: From GHz to MHz Effective CPU Frequencies. In USENIX Security Symposium. USENIX Association, 2801–2818.

[3] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan, Cesar Pereida García, and Nicola Tuveri. 2019. Port Contention for Fun and Profit. In IEEE Symposium on Security and Privacy. IEEE, 870–887.

[4] Amazon Web Services. 2023. AWS Fargate. https://docs.aws.amazon.com/eks/latest/userguide/fargate.html accessed: Aug 17, 2023.

[5] Amazon Web Services. 2023. AWS Lambda Features. https://aws.amazon.com/lambda/features/ accessed: Aug 17, 2023.

[6] Amazon Web Services. 2023. Firecracker Design. https://github.com/firecracker-microvm/firecracker/blob/9c51dc6852d68d0f6982a4017a63645fa75460c0/docs/design.md.

[7] Amazon Web Services. 2023. The Firecracker Jailer. https://github.com/firecracker-microvm/firecracker/blob/main/docs/jailer.md. accessed: Aug 14, 2023.

[8] Amazon Web Services. 2023. Production Host Setup Recommendations. https://github.com/firecracker-microvm/firecracker/blob/9ddeaf322a74c20cfb6b5af745112c95b7cecb75/docs/prod-host-setup.md. accessed: May 22, 2023.

[9] Abhiram Balasubramanian, Marek S. Baranowski, Anton Burtsev, Aurojit Panda, Zvonimir Rakamaric, and Leonid Ryzhyk. 2017. System Programming in Rust: Beyond Safety. In HotOS. ACM, 156–161.

[10] Enrico Barberis, Pietro Frigo, Marius Muench, Herbert Bos, and Cristiano Giuffrida. 2022. Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks. In USENIX Security Symposium. USENIX Association, 971–988.

[11] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. 2019. SMoTherSpectre: Exploiting Speculative Execution through Port Contention. In CCS. ACM, 785–800.

[12] Jo Van Bulck, Daniel Moghimi, Michael Schwarz, Moritz Lipp, Marina Minkin, Daniel Genkin, Yuval Yarom, Berk Sunar, Daniel Gruss, and Frank Piessens. 2020. LVI: Hijacking Transient Execution through Microarchitectural Load Value Injection. In IEEE Symposium on Security and Privacy. IEEE, 54–72.

[13] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. 2019. A Systematic Evaluation of Transient Execution Attacks and Defenses. In USENIX Security Symposium. USENIX Association, 249–266.

[14] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Lipp, Marina Minkin, Daniel Moghimi, Frank Piessens, Michael Schwarz, Berk Sunar, Jo Van Bulck, and Yuval Yarom. 2019. Fallout: Leaking Data on Meltdown-resistant CPUs. In CCS. ACM, 769–784.

[15] Claudio Canella, Jo Van Bulck, Michael Schwarz, Daniel Gruss, Catherine Easdon, and Saagar Jha. 2019. Transient Fail [Source Code]. https://github.com/IAIK/transientfail

[16] Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang Lin, and Ten-Hwang Lai. 2019. SgxPectre: Stealing Intel Secrets from SGX Enclaves Via Speculative Execution. In EuroS&P. IEEE, 142–157.

[17] Marie Doleželová, Milan Navrátil, Eva Majoršinová, Peter Ondrejka, Douglas Silas, Martin Prpič, and Rüdiger Landmann. 2020. Red Hat Enterprise Linux 7 Resource Management Guide–Using cgroups to manage system resources on RHEL. Red Hat, Inc. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/pdf/resource_management_guide/red_hat_enterprise_linux-7-resource_management_guide-en-us.pdf accessed: Aug 17, 2023.

[18] Jacob Fustos, Michael Garrett Bechtel, and Heechul Yun. 2020. SpectreRewind: Leaking Secrets to Past Instructions. In ASHES@CCS. ACM, 117–126.

[19] Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice, and Stefan Mangard. 2017. KASLR is Dead: Long Live KASLR. In ESSoS (Lecture Notes in Computer Science, Vol. 10379). Springer, 161–176.

[20] Pawan Gupta. 2020. TAA - TSX Asynchronous Abort. The Linux Kernel Organization. https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html accessed: Aug 17, 2023.

[21] Tyler Hicks. 2019. MDS - Microarchitectural Data Sampling. The Linux Kernel Organization. https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html accessed: Aug 17, 2023.

[22] Jann Horn. 2018. Speculative execution, variant 4: speculative store bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528 accessed: Aug 17, 2023.

[23] Intel. 2018. Speculative Execution Side Channel Mitigations. https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf. rev. 3.0 accessed: Mar 22, 2023.

[24] Intel. 2019. Intel Transactional Synchronization Extensions (Intel TSX) Asynchronous Abort. Technical Report. Intel Corp. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-tsx-asynchronous-abort.html accessed: Aug 17, 2023.

[25] Intel. 2019. Microarchitectural Data Sampling. Technical Report. Intel Corp. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html ver. 3.0, accessed: Aug 17, 2023.

[26] Intel. 2020. Vector Register Sampling. Technical Report. Intel Corp. https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/vector-register-sampling.html accessed: Aug 17, 2023.

[27] Brian Johannesmeyer, Jakob Koschel, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2022. Kasper: Scanning for Generalized Transient Execution Gadgets in the Linux Kernel. In NDSS. The Internet Society.

[28] Vladimir Kiriansky and Carl A. Waldspurger. 2018. Speculative Buffer Overflows: Attacks and Defenses. CoRR abs/1807.03757 (2018).

[29] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lubin, and Anthony Liguori. 2007. kvm: the Linux Virtual Machine Monitor. In Linux Symposium, Vol. 1. kernel.org, 225–230.

[30] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In IEEE Symposium on Security and Privacy. IEEE, 1–19.

[31] Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael B. Abu-Ghazaleh. 2018. Spectre Returns! Speculation Attacks using the Return Stack Buffer. In WOOT @ USENIX Security Symposium. USENIX Association.

[32] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from User Space. In USENIX Security Symposium. USENIX Association, 973–990.

[33] Giorgi Maisuradze and Christian Rossow. 2018. ret2spec: Speculative Execution Using Return Stack Buffers. In CCS. ACM, 2109–2122.

[34] Debora T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, and Michael Upton. 2002. Hyper-Threading Technology Architecture and Microarchitecture. Intel Technology Journal 6, 1 (2002), 4–15.

[35] Daniel Moghimi. 2020. Medusa Code Repository [Source Code]. https://github.com/flowyroll/medusa

[36] Daniel Moghimi. 2023. Downfall: Exploiting Speculative Data Gathering. In USENIX Security Symposium. USENIX Association, 7179–7193.

[37] Daniel Moghimi, Moritz Lipp, Berk Sunar, and Michael Schwarz. 2020. Medusa: Microarchitectural Data Leakage via Automated Attack Synthesis. In USENIX Security Symposium. USENIX Association, 1427–1444.

[38] Shravan Narayan, Craig Disselkoen, Daniel Moghimi, Sunjay Cauligi, Evan Johnson, Zhao Gang, Anjo Vahldiek-Oberwagner, Ravi Sahita, Hovav Shacham, Dean M. Tullsen, and Deian Stefan. 2021. Swivel: Hardening WebAssembly against Spectre. In USENIX Security Symposium. USENIX Association, 1433–1450.

[39] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache Attacks and Countermeasures: The Case of AES. In CT-RSA (Lecture Notes in Computer Science, Vol. 3860). Springer, 1–20.

[40] Antoon Purnal, Furkan Turan, and Ingrid Verbauwhede. 2021. Prime+Scope: Overcoming the Observer Effect for High-Precision Cache Contention Attacks. In CCS. ACM, 2906–2920.

[41] Qumranet Inc. 2006. KVM: Kernel-based Virtualization Driver, White Paper. Technical Report. Qumranet Inc. https://docs.huihoo.com/kvm/kvm-white-paper.pdf accessed: Aug 17, 2023.

[42] Hany Ragab, Enrico Barberis, Herbert Bos, and Cristiano Giuffrida. 2021. Rage Against the Machine Clear: A Systematic Analysis of Machine Clears and Their Implications for Transient Execution Attacks. In USENIX Security Symposium. USENIX Association, 1451–1468.

[43] Thomas Rokicki, Clémentine Maurice, Marina Botvinnik, and Yossi Oren. 2022. Port Contention Goes Portable: Port Contention Side Channels in Web Browsers. In AsiaCCS. ACM, 1182–1194.

[44] Thomas Rokicki, Clémentine Maurice, and Michael Schwarz. 2022. CPU Port Contention Without SMT. In ESORICS (3) (Lecture Notes in Computer Science, Vol. 13556). Springer, 209–228.

[45] David Schrammel, Samuel Weiser, Stefan Steinegger, Martin Schwarzl, Michael Schwarz, Stefan Mangard, and Daniel Gruss. 2020. Donky: Domain Keys - Efficient In-Process Isolation for RISC-V and x86. In USENIX Security Symposium. USENIX Association, 1677–1694.

[46] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss. 2019. ZombieLoad: Cross-Privilege-Boundary Data Sampling. In CCS. ACM, 753–768.

[47] Daniël Trujillo, Johannes Wikner, and Kaveh Razavi. 2023. Inception: Exposing New Attack Surfaces with Training in Transient Execution. In USENIX Security Symposium. USENIX Association, 7303–7320.

[48] Paul Turner. 2018. Retpoline: a software construct for preventing branch-target-injection. https://support.google.com/faqs/answer/7625886. accessed: Mar 22, 2023.

[49] Anjo Vahldiek-Oberwagner, Eslam Elnikety, Nuno O. Duarte, Michael Sammler, Peter Druschel, and Deepak Garg. 2019. ERIM: Secure, Efficient In-process Isolation with Protection Keys (MPK). In USENIX Security Symposium. USENIX Association, 1221–1238.

[50] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo, Giorgi Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2019. RIDL: Rogue In-Flight Data Load. In IEEE Symposium on Security and Privacy. IEEE, 88–105.

[51] Stephan van Schaik, Alyssa Milburn, genBTC, Paul Menzel, jun1x, Stephen Kitt, pit fr, Sebastian Österlund, and Cristiano Giuffrida. 2020. RIDL [Source Code]. https://github.com/vusec/ridl

[52] Johannes Wikner and Kaveh Razavi. 2022. RETBLEED: Arbitrary Speculative Code Execution with Return Instructions. In USENIX Security Symposium. USENIX Association, 3825–3842.

[53] Johannes Wikner, Daniël Trujillo, and Kaveh Razavi. 2023. Phantom: Exploiting Decoder-detectable Mispredictions. In MICRO (to appear). IEEE.

[54] Yuval Yarom and Katrina Falkner. 2014. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium. USENIX Association, 719–732.

[55] Ethan G. Young, Pengfei Zhu, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2019. The True Cost of Containing: A gVisor Case Study. In HotCloud. USENIX Association.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.