news studio

By Alex Merced, Senior Technical Evangelist at Dremio

The rapid evolution of the digital landscape has ushered in a new era of data management and utilization. Open standards and technologies, such as Apache Iceberg and open lakehouse catalogs, are empowering businesses with unprecedented flexibility and control, breaking free from the shackles of traditional proprietary systems. This article delves into how these innovative technologies are driving the evolution of data architecture, enabling businesses to stay ahead in the fiercely competitive market.

The Rise of Open Standards

Open standards are rapidly becoming the foundation for scalable business value, driving innovation, momentum, and action. With the recent incubation of Apache Polaris, an open-source lakehouse catalog implementation for tracking Apache Iceberg tables, we are moving towards a world where data and its governance are truly portable, says Alex Merced, Senior Technical Evangelist at Dremio. This means you can use a variety of data tools without duplicating data or compromising governance.

For years, enterprises have relied on proprietary data warehouses like Teradata and Oracle. While these data warehouses were powerful, they led to expensive vendor lock-in, hindering innovation and flexibility. Moving data or integrating different technologies was not only cumbersome but also costly.

Apache Iceberg: The Disruptor

The rise of data lakes offered a new way to store data: keeping raw data on cheap storage media. However, data lakes struggled to match the performance and management capabilities of traditional data warehouses. Apache Iceberg, an open table format, brings warehouse-style table capabilities to the data lake, including ACID (Atomicity, Consistency, Isolation, Durability) guarantees. This combination of data warehouse performance with the flexibility and low cost of data lakes is known as the lakehouse, and it has made Iceberg a game-changer.

Apache Iceberg stands out by providing features like time travel and schema evolution, once exclusive to expensive proprietary data warehouses, without locking businesses into a single vendor ecosystem. As enterprises increasingly recognize the importance of independent control over their data, Iceberg’s open-source nature means you can integrate it into your existing data infrastructure without being confined to a specific technology stack. That openness is what gives businesses freedom and flexibility.
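To make time travel and schema evolution concrete, here is a minimal sketch in plain Python. It is a toy model, not the Iceberg API: the `ToyTable` and `Snapshot` names and their methods are hypothetical, and the only idea borrowed from a table format is that every change is recorded as a new, immutable snapshot.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table: its schema and its rows."""
    snapshot_id: int
    schema: tuple
    rows: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format; not the Iceberg API."""
    snapshots: list = field(default_factory=list)

    def _commit(self, schema, rows):
        # Every change appends a new snapshot; earlier ones are never modified.
        snap = Snapshot(len(self.snapshots) + 1, tuple(schema), tuple(rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def current(self):
        return self.snapshots[-1]

    def create(self, schema):
        return self._commit(schema, [])

    def append(self, new_rows):
        cur = self.current()
        return self._commit(cur.schema, cur.rows + tuple(new_rows))

    def add_column(self, name):
        # Schema evolution: only the schema changes; existing rows are not rewritten.
        cur = self.current()
        return self._commit(cur.schema + (name,), cur.rows)

    def scan(self, snapshot_id=None):
        # Time travel: read the latest snapshot, or an earlier one by id.
        snap = self.current() if snapshot_id is None else self.snapshots[snapshot_id - 1]
        # Rows written before a column existed read back as None for that column.
        return [{col: row.get(col) for col in snap.schema} for row in snap.rows]


table = ToyTable()
table.create(["id", "amount"])
v2 = table.append([{"id": 1, "amount": 10}])
table.add_column("region")
table.append([{"id": 2, "amount": 20, "region": "EU"}])

latest = table.scan()                  # both rows, with the new "region" column
as_of_v2 = table.scan(snapshot_id=v2)  # the table as it looked before the column was added
```

Reading with `snapshot_id=v2` returns the single original row with its original two columns, while the default scan shows both rows under the evolved schema. A real table format tracks this in metadata files rather than in memory, but the reasoning is the same.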

The Emergence of Lakehouse Catalogs

Iceberg is just one part of the lakehouse architecture, alongside the storage layer (i.e., the data lake) and the lakehouse catalog (a tool for tracking tables so other tools can discover Iceberg tables). Traditional metadata catalogs or enterprise data catalogs (like Collibra or Alation) help provide context for humans to understand available data. Lakehouse catalogs are different. They act as a directory for table metadata, enabling tools to discover and use these tables. Essentially, one catalog is for human discovery, and the other is for system discovery.
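The "directory for table metadata" idea can be sketched in a few lines of Python. This is an illustration only: `ToyLakehouseCatalog`, the two engine functions, and the `s3://example-bucket/...` paths are all hypothetical, and a real lakehouse catalog is a service rather than an in-memory dictionary.

```python
class ToyLakehouseCatalog:
    """Hypothetical in-memory catalog; not the API of any real catalog."""

    def __init__(self):
        # Maps "namespace.table" to the location of that table's current metadata.
        self._tables = {}

    def register(self, identifier, metadata_location):
        self._tables[identifier] = metadata_location

    def list_tables(self, namespace):
        # System discovery: a tool asks which tables exist in a namespace.
        prefix = namespace + "."
        return sorted(name for name in self._tables if name.startswith(prefix))

    def load(self, identifier):
        # A tool resolves a table name to the metadata it should read.
        return self._tables[identifier]


def query_engine_plan(catalog, identifier):
    """Stand-in for a SQL engine resolving a table through the catalog."""
    return f"scan {catalog.load(identifier)}"


def notebook_preview(catalog, identifier):
    """Stand-in for a second, unrelated tool resolving the same table."""
    return f"preview {catalog.load(identifier)}"


catalog = ToyLakehouseCatalog()
catalog.register("sales.orders", "s3://example-bucket/sales/orders/metadata/v3.json")
catalog.register("sales.customers", "s3://example-bucket/sales/customers/metadata/v1.json")
catalog.register("hr.people", "s3://example-bucket/hr/people/metadata/v7.json")

discovered = catalog.list_tables("sales")
plan = query_engine_plan(catalog, "sales.orders")
preview = notebook_preview(catalog, "sales.orders")
```

Both tools end up pointing at the same metadata location without holding their own copy of the data, which is what lets several engines work on one table.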

In fact, catalogs do more than list tables for your favorite tools. They are evolving into universal governance hubs where you can set access rules that any tool must follow when accessing your tables. This is incredibly valuable because, in the past, access permissions had to be set individually for each tool, which led to inconsistent governance.
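A small Python sketch shows why defining access rules once, at the catalog, keeps governance consistent. Everything here is hypothetical (the `GovernedCatalog` class, the principal and engine names, the storage path); it models the idea rather than the access-control API of any particular catalog.

```python
class GovernedCatalog:
    """Hypothetical catalog that checks a grant before handing out a table."""

    def __init__(self):
        self._tables = {}
        self._grants = set()  # (principal, table, privilege) triples

    def register(self, table, metadata_location):
        self._tables[table] = metadata_location

    def grant(self, principal, table, privilege):
        # The rule is stored once, in the catalog, not in each tool.
        self._grants.add((principal, table, privilege))

    def load_table(self, principal, table, privilege="read"):
        if (principal, table, privilege) not in self._grants:
            raise PermissionError(f"{principal} may not {privilege} {table}")
        return self._tables[table]


def can_read(engine_name, catalog, principal, table):
    """Simulate an engine requesting a table on a user's behalf.

    engine_name is deliberately ignored: the decision depends only on the
    grant held by the catalog, not on which tool is asking.
    """
    try:
        catalog.load_table(principal, table)
        return True
    except PermissionError:
        return False


catalog = GovernedCatalog()
catalog.register("sales.orders", "s3://example-bucket/sales/orders/metadata/v3.json")
catalog.grant("analyst", "sales.orders", "read")

engines = ["sql_engine", "notebook", "bi_dashboard"]
analyst_access = {e: can_read(e, catalog, "analyst", "sales.orders") for e in engines}
intern_access = {e: can_read(e, catalog, "intern", "sales.orders") for e in engines}
```

The analyst is allowed through every engine and the intern through none, because there is only one place where the rule lives. With per-tool permissions, each of those three answers could drift independently.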

When catalogs become the core of table governance, it is crucial to build them on open standards to avoid vendor lock-in at the catalog level. With more companies adopting Apache Iceberg and open lakehouse catalogs like Apache Polaris (in incubation) and Nessie, the focus is shifting towards strengthening these open standards to support the needs of various specialized computing engines. The goal is clear: to create an ecosystem that maximizes flexibility and minimizes vendor lock-in.

The Future is Open

For businesses, this means investing in open technologies to meet current needs and support future growth and adaptation. It’s not just about keeping pace with the competition; it’s about setting the stage for the next wave of data innovation.

As we step further into the age of artificial intelligence, the importance of open data architecture will only grow. AI and machine learning algorithms rely on data. Simply put, the more data they have, and the richer the variety, the better they perform. To provide the data needed for AI and ML projects, you need a flexible, open data architecture that can efficiently deliver data.

Lakehouses built on table formats like Apache Iceberg and open catalogs like Apache Polaris and Nessie are opening the door to this world. The future of data is open. As businesses continue to recognize the limitations of proprietary systems, they will turn to solutions like Apache Iceberg and open lakehouse catalogs to gain the control and flexibility they need. The days of being locked into a single vendor ecosystem are numbered.

Moving to open standards is not just a trend; it’s a necessity for any business that wants to thrive in the digital age. The choice is clear: adapt or fall behind.

