Date: September 4, 2024
By: Rafal Gancarz, Translated by: Liu Yameng, Edited by: Ding Xiaoyun
In a significant strategic move, Figma has migrated its compute platform from Amazon Web Services (AWS) Elastic Container Service (ECS) to Kubernetes on Elastic Kubernetes Service (EKS) in less than 12 months, with minimal impact on its customers. Figma chose Kubernetes to run its containerized workloads primarily to benefit from the extensive ecosystem supported by the Cloud Native Computing Foundation (CNCF). It also wanted to reduce costs, improve the developer experience, and increase resilience.
Background and Challenges
At the beginning of 2023, Figma moved its application services into containers and adopted ECS as its container orchestration platform, which let the company deploy containerized workloads quickly. However, engineers soon ran into ECS limitations. ECS lacks support for StatefulSets and Helm charts, and it does not make it easy to run open-source software (OSS) such as Temporal. Figma also realized it was missing out on the broad set of CNCF community tooling built for Kubernetes, such as advanced autoscaling with KEDA or Karpenter, service mesh with Istio/Envoy, and many other tools and capabilities.
The Decision to Migrate
Figma also weighed two further factors: the substantial engineering effort it would take to customize ECS to meet its needs, and the availability of experienced Kubernetes engineers on the job market. Together, these factors led to the decision to move to Kubernetes on EKS.
Migration Strategy
Once the decision was made, the team agreed on a migration scope that kept required changes to services to a minimum, to avoid delays and risk. Within that limited scope, Figma still included a few specific improvements. It simplified resource definitions to improve the developer experience. It also increased reliability by deploying services across three Kubernetes clusters, which limits the impact of software defects and operator errors.
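The article does not say how Figma rolls changes out across its three clusters. One common way a multi-cluster layout limits the impact of a defect is to roll out to one cluster at a time and stop at the first failure. The minimal sketch below assumes hypothetical cluster names, a caller-supplied `deploy` function that reports health, and traffic split evenly across clusters.

```python
from typing import Callable, List

# Hypothetical cluster names; the article only says "three Kubernetes clusters".
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def staged_rollout(version: str, clusters: List[str],
                   deploy: Callable[[str, str], bool]) -> List[str]:
    """Deploy `version` one cluster at a time and stop at the first failure.

    Returns the clusters that received the new version, including a
    cluster where the deploy failed its health check.
    """
    updated: List[str] = []
    for cluster in clusters:
        healthy = deploy(cluster, version)  # hypothetical deploy + health check
        updated.append(cluster)
        if not healthy:
            # Halt: the remaining clusters keep serving the previous version.
            break
    return updated


def traffic_exposed(updated: List[str], clusters: List[str]) -> float:
    """Fraction of traffic hitting the new version, assuming an even split."""
    return len(updated) / len(clusters)
```

Under this even-split assumption, a release that fails on the first cluster reaches only about a third of traffic before the rollout stops.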
Ian VonSeggern, Figma’s software engineering manager, discussed the cost optimization goals of the migration project:
During the migration, we didn't want to take on much complex cost-optimization work, with one exception: we decided to support node autoscaling from the start. For ECS services on EC2, we had simply been over-provisioning, so that we had enough machines to absorb the surge during deployments. That setup was expensive, so we added this one cost optimization to the migration, because it could save a significant amount of money for relatively little effort. We used the open-source CNCF project Karpenter to scale nodes up and down dynamically based on demand.
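To see why over-provisioning is expensive, compare node-hours for a fleet sized permanently for peak demand plus deploy headroom against a fleet resized each hour to match demand. The sketch below is a toy calculation with hypothetical demand figures, node size, and surge factor. It is not how Karpenter makes decisions; Karpenter responds to unschedulable pods and chooses instance types. The sketch also ignores the brief extra nodes an autoscaler would add during deploys.

```python
import math
from typing import List


def nodes_needed(cpu_demand: float, node_cpu: float) -> int:
    """Smallest node count whose total CPU covers the demand (no bin-packing)."""
    return math.ceil(cpu_demand / node_cpu)


def node_hours_static(hourly_demand: List[float], node_cpu: float,
                      surge_factor: float) -> int:
    """Fixed fleet sized for peak demand times a deploy-surge headroom factor."""
    peak = max(hourly_demand) * surge_factor
    return nodes_needed(peak, node_cpu) * len(hourly_demand)


def node_hours_dynamic(hourly_demand: List[float], node_cpu: float) -> int:
    """Fleet resized every hour to match demand, as a node autoscaler would."""
    return sum(nodes_needed(d, node_cpu) for d in hourly_demand)


# Hypothetical vCPU demand over six hours, 16-vCPU nodes, 1.5x deploy headroom.
demand = [40, 40, 120, 200, 120, 40]
static_hours = node_hours_static(demand, node_cpu=16, surge_factor=1.5)
dynamic_hours = node_hours_dynamic(demand, node_cpu=16)
```

With these made-up numbers, the static fleet uses 114 node-hours and the dynamic fleet uses 38. The real savings depend entirely on how spiky the workload is.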
Execution and Success
To make the project succeed, Figma assembled a well-staffed team to drive the migration and enlisted support from the broader organization. Engineers load tested their Kubernetes configuration to avoid surprises and used weighted DNS entries to shift traffic incrementally. Early in the process, they also deployed services to a temporary Kubernetes cluster to surface issues before the production rollout.
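Weighted DNS switching works by publishing two records with the same name. Each record points at a different backend and carries a relative weight, so raising the EKS weight step by step shifts traffic gradually. The article does not name a DNS provider. The sketch below assumes AWS Route 53 and only builds a `ChangeBatch` payload in Route 53's weighted-record format. The hostnames, TTL, and ramp schedule are hypothetical.

```python
from typing import Dict, List


def weighted_change_batch(name: str, ecs_target: str, eks_target: str,
                          eks_percent: int, ttl: int = 60) -> Dict:
    """Build a Route 53-style ChangeBatch splitting traffic between ECS and EKS."""
    if not 0 <= eks_percent <= 100:
        raise ValueError("eks_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> Dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes records sharing one name
                "Weight": weight,         # relative share of DNS answers
                "TTL": ttl,               # short TTL so weight changes take effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [record("ecs", ecs_target, 100 - eks_percent),
                        record("eks", eks_target, eks_percent)]}


# Hypothetical ramp: percentage of traffic sent to EKS at each step.
RAMP: List[int] = [1, 10, 25, 50, 100]
batches = [weighted_change_batch("api.example.com", "ecs-lb.example.com",
                                 "eks-lb.example.com", pct) for pct in RAMP]
```

Each batch could be passed to boto3's Route 53 client via `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, with a pause between steps to watch error rates. Because resolvers cache answers for the TTL, each shift takes effect gradually rather than instantly.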
The compute platform team worked with service owners to provide a golden path that kept services consistent and maintainable. The initial migration took less than 12 months. Once the core services had moved, the team began planning follow-on work, such as introducing KEDA-based autoscaling.
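A golden path with simplified resource definitions typically lets service owners write a small spec, which the platform then expands into full Kubernetes manifests with shared defaults. Figma's actual format is not described in the article. In the sketch below, the `ServiceSpec` fields, the default values, and the `/healthz` probe path are all hypothetical. The output follows the standard `apps/v1` Deployment structure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServiceSpec:
    """Hypothetical minimal definition a service owner might write."""
    name: str
    image: str
    replicas: int = 3
    cpu: str = "500m"
    memory: str = "512Mi"
    port: int = 8080


def to_deployment(spec: ServiceSpec) -> Dict:
    """Expand the minimal spec into an apps/v1 Deployment with platform defaults."""
    labels = {"app": spec.name}
    resources = {"cpu": spec.cpu, "memory": spec.memory}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": dict(labels)},
        "spec": {
            "replicas": spec.replicas,
            # Selector must match the pod template labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": spec.name,
                        "image": spec.image,
                        "ports": [{"containerPort": spec.port}],
                        # Platform default: requests equal limits.
                        "resources": {"requests": dict(resources),
                                      "limits": dict(resources)},
                        # Hypothetical health endpoint every service exposes.
                        "readinessProbe": {"httpGet": {"path": "/healthz",
                                                       "port": spec.port}},
                    }],
                },
            },
        },
    }
```

Keeping the defaults in one function lets the platform team change conventions for every service at once, while service owners only edit a handful of fields.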
Post-Migration Improvements
Based on user feedback, engineers simplified the developer tooling so that it supports working across the three Kubernetes clusters and with new fine-grained RBAC roles. The migration has improved the company's operational efficiency and positioned Figma to take advantage of new developments in the CNCF ecosystem.
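Fine-grained RBAC in Kubernetes usually means namespace-scoped `Role` objects that grant only specific verbs on specific resources. The article does not list Figma's roles. The "viewer" and "operator" tiers and the namespace below are hypothetical. The sketch builds manifests in the standard `rbac.authorization.k8s.io/v1` format.

```python
from typing import Dict, List

# Hypothetical permission tiers for Deployments; not Figma's actual roles.
TIERS: Dict[str, List[str]] = {
    "viewer": ["get", "list", "watch"],
    "operator": ["get", "list", "watch", "patch"],
}


def namespaced_role(namespace: str, tier: str) -> Dict:
    """Build a Role scoped to one namespace for the given permission tier."""
    verbs = TIERS[tier]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"{namespace}-{tier}", "namespace": namespace},
        "rules": [
            # Core API group ("") for pods and their logs: read-only for all tiers.
            {"apiGroups": [""], "resources": ["pods", "pods/log"],
             "verbs": ["get", "list", "watch"]},
            # apps group for Deployments; operators may also patch (e.g. a restart).
            {"apiGroups": ["apps"], "resources": ["deployments"],
             "verbs": list(verbs)},
        ],
    }
```

Each Role would still need a RoleBinding to attach it to users or groups. Generating roles per namespace and tier keeps permissions uniform across the three clusters.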
Conclusion
Figma's migration from ECS to Kubernetes reflects the company's commitment to keeping its infrastructure current. By adopting Kubernetes, Figma has reduced costs, notably by autoscaling nodes instead of over-provisioning them. It has also improved the experience for both its users and its developers. As the company continues to evolve, Kubernetes and the broader CNCF ecosystem are likely to play a central role in its platform.
About the Author: Rafal Gancarz is an experienced technology leader and expert. He is currently helping Starbucks build a scalable, resilient, and cost-effective business platform. Previously, Rafal designed and built large-scale, distributed, cloud-based systems for companies such as Cisco, Accenture, Kedba, ICE, and Callsign. His interests span architecture and design, continuous delivery, observability and operability, and the socio-technical and organizational aspects of software delivery.
Original Article – This article was translated and published by InfoQ. Unauthorized reproduction is prohibited.