Uber Upgrades Search Infrastructure to Apache Lucene 9.5

Uber Revamps Search Infrastructure with Apache Lucene 9.5 Upgrade: A Six-Month Journey to Enhanced Performance

By [Your Name], Senior Technology Journalist

Uber’s engineering team recently announced a significant upgrade to its search infrastructure, migrating from Apache Lucene 8.0 to version 9.5. This upgrade, detailed in a recent blog post by Anand Kotriwal, Aparajita Pandey, Charu Jain, and Yupeng Fu of Uber’s Search Platform and Data Engineering teams, promises enhanced search capabilities, performance, and efficiency across various Uber services. The move represents a substantial undertaking, highlighting the complexity of scaling a search infrastructure for a global ride-hailing giant.

Uber’s search platform comprises a service layer (read path), an ingestion layer (write path), and components for offline processing. The service layer handles user queries, retrieving information from the Lucene index. This layer is divided into a routing service, which directs queries to the appropriate search nodes and manages load balancing, and a search service, which queries the Lucene index and returns results in real time. The ingestion layer, meanwhile, updates the Lucene index whenever data changes. An Apache Flink-based ingestion service handles real-time updates, keeping the search index current. Offline processing relies on Apache Spark jobs to build or rebuild the Lucene index in batch over large datasets.
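To make the split between read path and write path concrete, here is a minimal in-memory sketch. It is not Uber's implementation: the `SearchNode` and `Router` classes, the hash-based shard placement, and the term-count scoring are all assumptions chosen for illustration, and a real deployment would sit on top of Lucene, Flink, and Spark rather than Python dictionaries.

```python
import heapq
import zlib
from dataclasses import dataclass, field


@dataclass
class SearchNode:
    """Hypothetical search node holding one shard of documents in memory."""
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, text: str) -> None:
        # Write path: the ingestion layer adds or overwrites a document.
        self.docs[doc_id] = text.lower().split()

    def search(self, term: str, k: int) -> list:
        # Read path: score each document by how often the term occurs.
        term = term.lower()
        hits = [(tokens.count(term), doc_id) for doc_id, tokens in self.docs.items()]
        return heapq.nlargest(k, [h for h in hits if h[0] > 0])


class Router:
    """Hypothetical routing service: places documents and fans out queries."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [SearchNode() for _ in range(num_nodes)]

    def _node_for(self, doc_id: str) -> SearchNode:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return self.nodes[zlib.crc32(doc_id.encode()) % len(self.nodes)]

    def ingest(self, doc_id: str, text: str) -> None:
        self._node_for(doc_id).index(doc_id, text)

    def query(self, term: str, k: int = 3) -> list:
        # Scatter the query to every node, then merge the per-node top-k lists.
        partial = [hit for node in self.nodes for hit in node.search(term, k)]
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


router = Router(num_nodes=2)
router.ingest("r1", "pizza pizza place downtown")
router.ingest("r2", "sushi bar")
router.ingest("r3", "late night pizza")
results = router.query("pizza")
```

Because a document ID always maps to the same node, re-ingesting an existing ID overwrites the old copy, which is roughly the role the article assigns to real-time updates.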

The upgrade to Lucene 9.5, however, wasn’t a simple task. The team faced the challenge of integrating over 400 files from a monolithic repository, incompatible with the existing codebase. To mitigate this, a phased rollout was implemented. The upgrade was first deployed to lower-priority internal use cases before gradually expanding to higher-tier services. This meticulous, six-month process involved comprehensive code reviews, rigorous validation, collaboration with client teams, and a staged rollout before final branch merging.
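The rollout order described above can be sketched as a small gating loop. This is an illustration under stated assumptions, not Uber's tooling: the `phased_rollout` function, the use-case names, the tier numbers (larger number meaning lower priority), and the `validate` callback are all hypothetical.

```python
from typing import Callable, Dict, List


def phased_rollout(
    use_cases: Dict[str, int],
    validate: Callable[[str], bool],
) -> List[str]:
    """Upgrade use cases tier by tier, lowest priority first.

    `use_cases` maps a use-case name to a tier number; by the convention
    assumed here, a larger number means a lower-priority use case.
    Returns the use cases upgraded before the rollout finished or halted.
    """
    upgraded: List[str] = []
    # Walk from the least critical tier to the most critical one.
    for tier in sorted(set(use_cases.values()), reverse=True):
        batch = sorted(name for name, t in use_cases.items() if t == tier)
        for name in batch:
            if not validate(name):
                # Halt the rollout; nothing later in the order is touched.
                return upgraded
            upgraded.append(name)
    return upgraded


# Hypothetical use cases and tiers, for illustration only.
use_cases = {
    "internal-dashboard": 3,
    "support-tools": 3,
    "store-search": 2,
    "rider-search": 1,
}
order = phased_rollout(use_cases, validate=lambda name: True)
halted = phased_rollout(use_cases, validate=lambda name: name != "store-search")
```

In this sketch a failed validation on a mid-tier use case stops the rollout before any higher-priority service is upgraded, which is the main benefit of starting with the least critical tier.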

The choice of Apache Lucene, a Java-based search engine library, is strategic. It supports a wide array of search needs, including structured and full-text search, faceted search, nearest neighbor search, spell correction, and query suggestions. Its sub-project, PyLucene, provides Python bindings for the Lucene Core. The newer Lucene 10 release brings further improvements, such as a new prefetch API on IndexInput, support for sparse indexing of doc values, and an upgraded Snowball dictionary that improves tokenization.
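For readers unfamiliar with these terms, the toy index below shows what full-text, structured, and faceted search mean in practice. It is a concept sketch only: `TinyIndex` and its methods are hypothetical names, it does not use the Lucene or PyLucene API, and it omits scoring, analyzers, and everything else a real engine provides.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index; a concept sketch, not the Lucene API."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.fields = {}                  # doc id -> structured field values

    def add(self, doc_id: int, text: str, **fields: str) -> None:
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.fields[doc_id] = fields

    def search(self, query: str, **filters: str) -> list:
        # Full-text part: a document must contain every query term (AND).
        terms = query.lower().split()
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Structured part: keep documents whose fields equal every filter.
        return sorted(
            d for d in matches
            if all(self.fields[d].get(k) == v for k, v in filters.items())
        )

    def facet(self, doc_ids: list, field_name: str) -> Counter:
        # Faceting: count matching documents under each value of a field.
        return Counter(
            self.fields[d][field_name]
            for d in doc_ids
            if field_name in self.fields[d]
        )


index = TinyIndex()
index.add(1, "wood fired pizza", city="SF", cuisine="italian")
index.add(2, "deep dish pizza", city="Chicago", cuisine="american")
index.add(3, "pizza by the slice", city="SF", cuisine="american")

hits = index.search("pizza")                # full-text search
sf_hits = index.search("pizza", city="SF")  # full-text plus structured filter
counts = index.facet(hits, "city")          # facet counts over the matches
```

The facet counts are what a search UI would typically show next to each filter option, for example the number of matching results per city.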

The results of the upgrade are significant. Uber reports faster search speeds and reduced resource consumption, translating to quicker search result delivery for application users. According to the team, some searches now execute considerably faster.

This successful migration underscores the importance of continuous improvement in large-scale search infrastructure. The phased rollout strategy, coupled with thorough testing and collaboration, exemplifies best practices for managing complex upgrades in a production environment. The upgrade to Lucene 9.5 not only enhances Uber’s search capabilities but also serves as a valuable case study for other companies facing similar scalability challenges.

