News Title: Netflix Releases Open-Source Data Workflow Engine Maestro: The Digital Engine of the AI Era
Keywords: Open Source, Netflix, Maestro
News Content:
Netflix recently announced the open-source release of Maestro, its next-generation data workflow engine. Maestro gives the company's data scientists and business line managers a Workflow-as-a-Service platform. Released under the Apache 2.0 license, the orchestrator supports hundreds of thousands of workflows and has completed as many as 2 million jobs in a single day inside Netflix.
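For a sense of scale, the 2-million-jobs-a-day figure can be turned into an average rate. This back-of-the-envelope calculation assumes, purely for illustration, that jobs are spread evenly over 24 hours. Real traffic has peaks, so the rate during busy periods would be higher than this average.

```latex
\frac{2\,000\,000\ \text{jobs}}{24 \times 3600\ \text{s}}
  = \frac{2\,000\,000}{86\,400}\ \text{jobs/s}
  \approx 23.1\ \text{jobs/s}
  \quad (\approx 1\,389\ \text{jobs/min})
```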
Maestro is built and run with a set of open-source tools: Git, Java 21, Gradle, and Docker. It can be driven from the command line with cURL, and it accepts business logic in many forms, including Docker images, Jupyter notebooks, bash scripts, SQL, and Python. Behind the scenes, Maestro manages the entire workflow lifecycle: it handles retries and queuing and dispatches tasks to compute engines. Beyond directed acyclic graphs (DAGs), it also supports cyclic workflows and several reusable patterns, including foreach loops, subworkflows, and conditional branches.
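The article does not describe Maestro's own interfaces, so what follows is only a conceptual sketch in Python of the ideas in the paragraph above: a ready queue that dispatches DAG steps once their dependencies finish, per-step retries, and the foreach, conditional-branch and subworkflow patterns. All names (`run_dag`, `run_with_retries`, `foreach`, `branch`, `subworkflow`) are hypothetical and are not Maestro's API. Steps are assumed to be no-argument callables, and data passing between steps is not modeled. The sketch also handles only DAGs: Python's `graphlib.TopologicalSorter` rejects cycles with `CycleError`, so the cyclic workflows Maestro supports are out of scope here.

```python
# Toy, in-memory sketch of workflow-orchestration ideas. Every name here is
# hypothetical; none of it is Maestro's actual API.
from collections import deque
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Deque, Dict, Iterable, List, Set

Step = Callable[[], object]  # assumption: a step is a no-argument callable


def run_with_retries(step: Step, max_retries: int) -> object:
    """Run a step, retrying it up to max_retries extra times on failure."""
    last_error: Exception = RuntimeError("step never ran")
    for _ in range(max_retries + 1):
        try:
            return step()
        except Exception as err:  # toy model: treat every error as retryable
            last_error = err
    raise last_error  # retries exhausted: surface the last error


def run_dag(steps: Dict[str, Step], deps: Dict[str, Set[str]],
            max_retries: int = 2) -> Dict[str, object]:
    """Queue steps whose dependencies are done, then run them in queue order.

    Raises CycleError (from prepare) if the dependency graph has a cycle.
    """
    graph = {name: deps.get(name, set()) for name in steps}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    ready: Deque[str] = deque()
    results: Dict[str, object] = {}
    while sorter.is_active():
        ready.extend(sorter.get_ready())  # enqueue newly unblocked steps
        while ready:
            name = ready.popleft()
            results[name] = run_with_retries(steps[name], max_retries)
            sorter.done(name)  # unblock this step's successors
    return results


def foreach(items: Iterable[object], body: Callable[[object], object]) -> List[object]:
    """foreach pattern: run the same body once per item."""
    return [body(item) for item in items]


def branch(condition: Callable[[], bool], if_true: Step, if_false: Step) -> object:
    """Conditional branch: run exactly one of two steps."""
    return if_true() if condition() else if_false()


def subworkflow(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Step:
    """Wrap a whole workflow so it can be reused as a single step."""
    return lambda: run_dag(steps, deps)


# Usage: a small workflow that exercises each pattern.
calls: List[str] = []


def flaky_extract() -> str:
    calls.append("extract")
    if len(calls) < 2:
        raise RuntimeError("transient error")  # first attempt fails
    return "extracted"


results = run_dag(
    steps={
        "extract": flaky_extract,
        "square": lambda: foreach([1, 2, 3], lambda x: x * x),
        "route": lambda: branch(lambda: True, lambda: "full", lambda: "skip"),
        "report": subworkflow({"render": lambda: "report"}, {}),
    },
    deps={"square": {"extract"}, "route": {"extract"}, "report": {"square", "route"}},
)
# results == {"extract": "extracted", "square": [1, 4, 9],
#             "route": "full", "report": {"render": "report"}}
```

Driving the loop with `prepare`/`get_ready`/`done`, rather than a precomputed order, mirrors how an orchestrator can queue a step as soon as its predecessors complete. A real system would dispatch the queued steps to compute engines concurrently instead of running them inline.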
Netflix engineers say Maestro is designed to be highly scalable and extensible, so that it can meet strict service level objectives (SLOs) even during traffic peaks. Maestro was created because Meson, Netflix's previous orchestrator, struggled under load, especially at peak usage times. Netflix therefore designed Maestro to handle the workflow scale and load it expects in the future.
Netflix has previously open-sourced many internally developed tools, such as the resilience-testing tool Chaos Monkey and the routing gateway Zuul. Releasing Maestro continues that record of contributing to and supporting the open-source community.
The open-source release of Maestro gives data scientists and business analysts a powerful tool for better understanding user behavior and other large-scale, data-driven trends. It also offers the open-source community a scalable and flexible workflow orchestration solution.
Source: https://mp.weixin.qq.com/s/jzM7NVNeXkvfFXsE9DEgJA