## Uber Pushes for Kafka Tiered Storage, Sparking Efficiency Debate
**August 20, 2024** – Ride-hailing giant Uber recently announced tiered storage for Apache Kafka, aiming to address the scalability and efficiency problems of large Kafka clusters. The move has sparked a heated industry discussion about the pros and cons of tiered storage.
Tiered storage, which Uber helped bring to Apache Kafka and which shipped as an early-access feature in version 3.6.0, lets Kafka extend data storage to remote systems such as HDFS and Amazon S3. By decoupling Kafka’s storage from its compute resources, the feature aims to reduce cost and operational complexity.
Uber says the traditional way to scale a Kafka cluster, adding broker nodes, also adds memory and CPU that the workload does not need, which makes storage cost-inefficient. Tiered storage instead keeps older data in external storage, lowering storage costs.
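
As a rough illustration of how the split between recent and older data is expressed, the sketch below builds topic-level overrides for a tiered topic. The config keys (`remote.storage.enable`, `retention.ms`, `local.retention.ms`) are the ones Apache Kafka defines for tiered storage; the helper name `tiered_topic_config` and the retention values are assumptions made for this example, not Uber's settings. The broker would also need remote log storage switched on (`remote.log.storage.system.enable`) and a remote storage plugin configured, which is outside this sketch.

```python
def tiered_topic_config(total_retention_hours: int, local_retention_hours: int) -> dict[str, str]:
    """Build topic-level overrides for a tiered topic.

    Hypothetical helper: only the config keys come from Kafka itself.
    Values are strings because Kafka topic configs are passed as strings.
    """
    if local_retention_hours > total_retention_hours:
        raise ValueError("local retention cannot exceed total retention")
    hour_ms = 60 * 60 * 1000
    return {
        # Allow closed log segments of this topic to be copied to the remote tier.
        "remote.storage.enable": "true",
        # How long data is kept overall (local tier plus remote tier).
        "retention.ms": str(total_retention_hours * hour_ms),
        # How long data stays on the broker's local disks.
        "local.retention.ms": str(local_retention_hours * hour_ms),
    }


# Example values (assumed): keep 30 days in total, but only 6 hours on local disk.
config = tiered_topic_config(total_retention_hours=30 * 24, local_retention_hours=6)
for key, value in config.items():
    print(f"{key}={value}")
```

The resulting dictionary could be supplied as topic configuration when creating or altering a topic with whichever admin tool is in use.
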
Red Hat has published a detailed analysis of the advantages of tiered storage, highlighting the following:
* **Elasticity:** Compute and storage resources can be scaled independently.
* **Isolation:** Latency-sensitive data can be accessed through the local tier, while historical data can be accessed through the remote tier without requiring changes to Kafka clients.
* **Cost-effectiveness:** Remote object storage systems are typically cheaper than local disks, reducing Kafka’s storage costs.
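
The cost argument can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only: the function name, the prices, and the data volumes are all assumptions, not figures from Red Hat, Uber, or any cloud provider. It also assumes that local data is stored once per replica while the remote tier is billed for a single copy, and it ignores the overlap between segments that are already uploaded but not yet expired locally.

```python
def monthly_storage_cost(
    total_gb: float,
    local_gb: float,
    replication_factor: int,
    local_price_per_gb: float,
    remote_price_per_gb: float,
) -> tuple[float, float]:
    """Return (untiered_cost, tiered_cost) per month under the stated assumptions."""
    if local_gb > total_gb:
        raise ValueError("local_gb cannot exceed total_gb")
    # Without tiering, every byte lives on broker disks, once per replica.
    untiered = total_gb * replication_factor * local_price_per_gb
    # With tiering, only the recent slice is replicated on broker disks;
    # the rest is billed once at the (assumed cheaper) remote price.
    remote_gb = total_gb - local_gb
    tiered = local_gb * replication_factor * local_price_per_gb + remote_gb * remote_price_per_gb
    return untiered, tiered


# Hypothetical inputs: 10 TB retained, 1 TB kept local, replication factor 3,
# 0.10 per GB-month for local disk, 0.02 per GB-month for object storage.
untiered, tiered = monthly_storage_cost(10_000, 1_000, 3, 0.10, 0.02)
print(f"untiered: {untiered:.2f}  tiered: {tiered:.2f}")
```

The model deliberately leaves out request and data-transfer charges for the remote tier, which can narrow the gap.
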
AWS has also introduced tiered storage functionality in Amazon Managed Streaming for Apache Kafka (Amazon MSK), emphasizing its benefits:
* **Faster Broker Recovery:** Data is automatically moved from faster Amazon EBS volumes to a more cost-effective storage tier, which speeds up recovery after a broker failure.
* **Efficient Load Balancing:** Reduces the amount of data that needs to be moved during partition reassignment, improving load balancing efficiency.
* **Faster Scaling:** Allows for faster scaling of MSK clusters without significant data transfers.
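
A simple estimate shows why less data movement matters for reassignment and scaling. The sketch below assumes that a new replica only has to copy the data still held in the local tier, and that the copy runs at a fixed throughput; the function name and all numbers are hypothetical and not taken from AWS.

```python
def reassignment_seconds(
    partition_gb: float,
    local_gb: float,
    throughput_mb_per_s: float,
) -> tuple[float, float]:
    """Return (untiered_seconds, tiered_seconds) to copy one partition replica."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    mb_per_gb = 1024
    # Without tiering, the whole partition is copied to the new broker.
    untiered = partition_gb * mb_per_gb / throughput_mb_per_s
    # With tiering (assumption), only the local slice is copied;
    # older segments already sit in remote storage.
    tiered = local_gb * mb_per_gb / throughput_mb_per_s
    return untiered, tiered


# Hypothetical partition: 500 GB in total, 20 GB still local, 100 MB/s copy rate.
untiered_s, tiered_s = reassignment_seconds(500, 20, 100)
print(f"untiered: {untiered_s / 60:.1f} min  tiered: {tiered_s / 60:.1f} min")
```
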
However, not everyone is optimistic about tiered storage. Richard Artoul of WarpStream argues that although tiered storage can reduce costs, it may also introduce new complexity and new failure modes. He points out that managing two storage tiers increases operational overhead and could affect system reliability. In addition, fetching data from remote storage may add latency, which can hurt real-time processing.
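
The latency concern follows from how reads are routed between the two tiers. The sketch below is a simplified model, not Kafka's broker code: a fetch for an offset that is older than the first offset still held on local disk has to be served from the remote tier. The function and parameter names are chosen for this example.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"
    REMOTE = "remote"


def tier_for_offset(
    fetch_offset: int,
    log_start_offset: int,
    local_log_start_offset: int,
    log_end_offset: int,
) -> Tier:
    """Decide which tier would serve a fetch in this simplified model."""
    if not (log_start_offset <= local_log_start_offset <= log_end_offset):
        raise ValueError("inconsistent log offsets")
    if not (log_start_offset <= fetch_offset <= log_end_offset):
        raise ValueError("fetch offset is out of range")
    # Offsets below the local start exist only in remote storage, so the
    # read pays the remote round trip; everything newer is served locally.
    if fetch_offset < local_log_start_offset:
        return Tier.REMOTE
    return Tier.LOCAL


# A consumer at the tail reads locally; a consumer replaying history does not.
print(tier_for_offset(4_999, 0, 1_000, 5_000))
print(tier_for_offset(50, 0, 1_000, 5_000))
```

In this model, consumers that keep up with the tail never touch the remote tier, while backfills and replays do, which is where the extra latency would show up.
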
The cost savings of tiered storage could be offset by the expenses associated with managing and accessing remote storage systems, especially when accessing across regions. Therefore, enterprises need to carefully weigh the pros and cons before adopting tiered storage and choose the appropriate solution based on their specific needs.
Uber’s move will undoubtedly drive adoption of Kafka tiered storage, but its ultimate impact remains to be seen. Going forward, the refinement and optimization of tiered storage will be a key focus for the industry.
Source: https://mp.weixin.qq.com/s/kcLcdhMXqSEorjqLr0dt7w