Uber Optimizes SQL-Based Data Analysis with Presto and Fast Query Identification

By [Your Name], Staff Writer

Uber, a global transportation giant, relies heavily on data analysis for operational efficiency and strategic decision-making. Its engineers have significantly improved the speed of SQL-based data analysis by using Presto, the open-source distributed SQL query engine, together with a fast query identification system. This system prioritizes queries expected to complete within two minutes, a category comprising roughly half of Uber’s total query volume. This article details Uber’s approach, highlighting the challenges overcome and the resulting performance gains.

Presto allows Uber to run analyses across diverse data sources such as Apache Hive, Apache Pinot, MySQL, and Apache Kafka. However, the initial approach to handling fast queries (those completing within two minutes) proved inefficient. Treating them identically to slower queries led to underutilization of Presto clusters and increased latency, because queries had to be throttled to prevent system overload.

The key innovation lies in proactively identifying fast queries before they enter the processing pipeline. Uber engineers developed a predictive model based on historical query data. Each query is assigned a unique fingerprint—a hash calculated after removing comments, whitespace, and literal values. Both exact fingerprints (preserving the query’s structure) and abstract fingerprints (a more generalized representation) were tested against the P90 and P95 execution times using 2-day, 5-day, and 7-day lookback windows.
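The fingerprinting step described above can be sketched roughly as follows. This is a minimal illustration, not Uber's implementation: the function names, the regular expressions, and the choice of SHA-256 are all assumptions. It also assumes that the exact fingerprint keeps literal values while the abstract fingerprint replaces them with a placeholder, which is one plausible reading of the distinction. Whitespace is collapsed rather than removed entirely so that tokens stay separated.

```python
import hashlib
import re

# Hypothetical patterns; a production system would use a real SQL parser,
# since regexes mis-handle cases such as "--" inside a string literal.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"--[^\n]*")
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")
_WHITESPACE = re.compile(r"\s+")


def _strip_comments(sql: str) -> str:
    """Replace block and line comments with a single space."""
    sql = _BLOCK_COMMENT.sub(" ", sql)
    return _LINE_COMMENT.sub(" ", sql)


def _collapse_whitespace(sql: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return _WHITESPACE.sub(" ", sql).strip()


def _hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_fingerprint(sql: str) -> str:
    """Hash of the query with comments and extra whitespace removed (assumed definition)."""
    return _hash(_collapse_whitespace(_strip_comments(sql)))


def abstract_fingerprint(sql: str) -> str:
    """Like exact_fingerprint, but literal values become '?' (assumed definition)."""
    sql = _strip_comments(sql)
    sql = _STRING_LITERAL.sub("?", sql)   # strings first, so digits inside them vanish too
    sql = _NUMBER_LITERAL.sub("?", sql)
    return _hash(_collapse_whitespace(sql))
```

Under these assumptions, two queries that differ only in a filter value, such as `WHERE city_id = 12` and `WHERE city_id = 99`, share an abstract fingerprint but have different exact fingerprints.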

The optimal predictive model emerged from analyzing abstract fingerprints with a 5-day lookback window. This approach accurately predicts whether a query will complete within two minutes based on its past performance. The system maintains a table storing sufficient historical data to allow flexibility in adjusting parameters like percentile (P90, P95) and lookback window as needed.
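A prediction of this kind could look roughly like the sketch below. It assumes the history is available as `(fingerprint, finished_at, runtime_seconds)` tuples and uses a nearest-rank percentile. The function names, the record layout, the default of P90, and the decision to treat queries with no history as not fast are all illustrative assumptions rather than details from Uber.

```python
import math
from datetime import datetime, timedelta

FAST_THRESHOLD_SECONDS = 120  # "within two minutes"


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_predicted_fast(history, fingerprint, now, lookback_days=5, pct=90):
    """Predict whether a query is fast from past runs of the same fingerprint.

    history: iterable of (fingerprint, finished_at, runtime_seconds) tuples
             (a hypothetical record layout).
    """
    cutoff = now - timedelta(days=lookback_days)
    runtimes = [
        runtime
        for fp, finished_at, runtime in history
        if fp == fingerprint and finished_at >= cutoff
    ]
    if not runtimes:
        # Assumption: with no recent history, do not treat the query as fast.
        return False
    return percentile(runtimes, pct) <= FAST_THRESHOLD_SECONDS
```

Keeping `lookback_days` and `pct` as parameters mirrors the flexibility the article describes: the same stored history can be re-evaluated with a different percentile or window without changing the data.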

Implementing this prediction, however, proved more complex than initially anticipated. The initial design placed fast and slow queries in the same queue, differentiated only by user priority (e.g., batch vs. interactive). This resulted in underutilization of the dedicated Presto cluster for fast queries, as slow queries bottlenecked their processing.

A revised design introduced a dedicated queue for fast queries. Upon verification, these queries are immediately routed to the optimized cluster, eliminating the bottleneck caused by mixing query types. This streamlined approach dramatically improved the utilization of the fast query cluster and reduced latency for time-sensitive analyses.
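The two-queue idea can be illustrated with a small in-memory sketch. The `QueryRouter` class, its method names, and the use of `deque` are hypothetical; the article does not describe how Uber's gateway is actually built, and the `is_fast` callable stands in for whatever prediction check is used.

```python
from collections import deque


class QueryRouter:
    """Toy router: predicted-fast queries get their own queue (illustrative only)."""

    def __init__(self, is_fast):
        # is_fast: callable taking a fingerprint and returning True or False.
        self._is_fast = is_fast
        self.fast_queue = deque()
        self.general_queue = deque()

    def submit(self, query_id, fingerprint):
        """Enqueue a query and return the name of the queue it was placed in."""
        if self._is_fast(fingerprint):
            self.fast_queue.append(query_id)
            return "fast"
        self.general_queue.append(query_id)
        return "general"

    def next_for_fast_cluster(self):
        """Return the next fast query, or None; slow queries never block this queue."""
        return self.fast_queue.popleft() if self.fast_queue else None
```

Because the fast cluster only ever pulls from `fast_queue`, a backlog of slow queries in `general_queue` cannot delay it, which is the bottleneck the single-queue design suffered from.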

Conclusion:

Uber’s innovative approach to identifying and prioritizing fast queries using Presto demonstrates a significant advancement in optimizing large-scale data analysis. By leveraging historical data and a sophisticated predictive model, Uber has achieved substantial improvements in query processing speed and resource utilization. This case study highlights the importance of proactive query management and the potential for significant performance gains through intelligent system design. Future research could explore the application of machine learning techniques for even more accurate query prediction and dynamic resource allocation.


