News Report
Date: September 16, 2024
Source: InfoQ
In a significant move to enhance the capabilities of the OpenTelemetry project, Elastic has donated a production-grade, eBPF-based continuous profiling agent. This contribution is set to accelerate the standardization of profiling within the OpenTelemetry ecosystem, a key development that promises to revolutionize how developers monitor and optimize their applications.
Background
OpenTelemetry, a Cloud Native Computing Foundation (CNCF) project formed by the merger of OpenTracing and OpenCensus, has gained substantial traction in the observability space. It aims to provide a set of APIs, libraries, agents, and collector services to capture distributed traces, metrics, and logs from applications. The recent addition of continuous profiling as a core telemetry signal marks a pivotal step forward for the project.
Elastic’s Contribution
Elastic’s donation introduces a production-ready continuous profiling agent based on extended Berkeley Packet Filter (eBPF) technology. This agent offers full-system, always-on profiling capabilities with minimal performance overhead, addressing many limitations of traditional profiling methods. The donation follows the merging of the profiling data model OTEP (OpenTelemetry Enhancement Proposal) in March 2024.
The donated agent stands out for several reasons:
- Low Performance Impact: The agent adds approximately 1% CPU overhead, ensuring minimal interference with application performance.
- Language and Runtime Support: It supports multiple programming languages and runtimes, providing visibility into third-party libraries and kernel operations.
- Comprehensive Visualization: The agent identifies suboptimal code paths and provides detailed visualizations of runtime behavior.
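To make the idea of identifying suboptimal code paths concrete, here is a minimal sketch of how stack samples from any sampling profiler can be aggregated into hot spots. It is an illustration only, not the donated agent's implementation: the sample data and the function names `fold_stacks`, `self_time`, and `hottest` are hypothetical, and a real eBPF agent collects its stacks in the kernel rather than from a hard-coded list.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first. A real profiler would
# capture these by sampling running threads at a fixed rate.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "parse_json"),
    ("main", "gc_sweep"),
]


def fold_stacks(samples):
    """Count identical stacks, keyed in the 'a;b;c' folded-stack form."""
    return Counter(";".join(stack) for stack in samples)


def self_time(samples):
    """Count how often each function was the innermost (running) frame."""
    return Counter(stack[-1] for stack in samples)


def hottest(samples, n=3):
    """Return the top-n innermost functions with their share of all samples."""
    total = len(samples)
    return [(name, count / total) for name, count in self_time(samples).most_common(n)]


if __name__ == "__main__":
    for stack, count in fold_stacks(SAMPLES).most_common():
        print(f"{stack} {count}")
    for name, share in hottest(SAMPLES):
        print(f"{name}: {share:.0%} of samples")
```

Folded stacks of this kind are a common input for flame graph tools, which is one way the aggregated counts become the visualizations mentioned above.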
Standardization Efforts
The donation is part of a broader initiative to standardize profiling within the OpenTelemetry framework. A Special Interest Group (SIG) dedicated to profiling has been established to work through several open questions: whether to build on an existing data model or create a new one, how to balance domain-specific profiling practices with OpenTelemetry’s own conventions, and which existing profiling format to use as a basis.
The SIG is also tasked with integrating profiling data into the OpenTelemetry collector, ensuring that the data is extracted, parsed into the collector’s internal format, pdata, and processed uniformly with other telemetry signals.
Beyond Performance Analysis
Continuous profiling extends beyond traditional performance and cost analysis. It enables use cases such as signal correlation, event response, and detailed resource consumption analysis. The technology has shown promise in identifying CPU spikes, memory issues, mutex contention, and network jitter.
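As a rough sketch of what signal correlation can look like, the example below attributes CPU time from profile samples to the trace spans that were active when each sample was taken. Everything here is an assumption made for illustration: the field names (`span_id`, `leaf`, `cpu_ms`), the sample data, and the function `cpu_by_span` are hypothetical and do not reflect the OpenTelemetry profiling schema.

```python
from collections import defaultdict

# Hypothetical span lookup table: span ID -> span name.
SPANS = {"a1": "GET /checkout", "b2": "GET /search"}

# Hypothetical profile samples, each tagged with the span that was active
# when the sample was taken (None if no span was active).
PROFILE_SAMPLES = [
    {"span_id": "a1", "leaf": "compute_tax", "cpu_ms": 20},
    {"span_id": "a1", "leaf": "compute_tax", "cpu_ms": 30},
    {"span_id": "a1", "leaf": "render_cart", "cpu_ms": 10},
    {"span_id": "b2", "leaf": "rank_results", "cpu_ms": 40},
    {"span_id": None, "leaf": "gc_sweep", "cpu_ms": 5},
]


def cpu_by_span(samples, spans):
    """Sum CPU time per (span name, innermost function) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        span_name = spans.get(sample["span_id"], "(no span)")
        totals[span_name][sample["leaf"]] += sample["cpu_ms"]
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {name: dict(funcs) for name, funcs in totals.items()}


if __name__ == "__main__":
    for span_name, funcs in cpu_by_span(PROFILE_SAMPLES, SPANS).items():
        print(span_name, funcs)
```

Grouping samples this way lets an engineer move from a slow request in a trace to the functions that consumed CPU during that request.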
Incorporating continuous profiling into OpenTelemetry will assist engineers in identifying resource-intensive code and enhance vendor neutrality by reducing reliance on proprietary APM agents.
Industry Trends
The integration of eBPF technology into profiling solutions, as seen with Elastic’s agent, represents an important trend. While eBPF offers comprehensive system-wide profiling with minimal overhead, it presents challenges in symbol management and runtime compatibility.
Community feedback, such as that from Reddit user SuperQue, highlights the need for continuous profiling within OpenTelemetry. SuperQue expressed disappointment at the lack of profiling capabilities in OpenTelemetry, noting that tools like Polar Signals and Pyroscope provide more detailed insights into slow code sections than traditional tracing.
Growing Adoption
OpenTelemetry’s adoption of continuous profiling aligns with a growing industry trend. Several startups and major observability vendors have entered the space, recognizing the value of profiling data when correlated with other telemetry signals. Other continuous profiling agents in the market include Polar Signals’ Parca Agent, Grafana Alloy, and Grafana Agent.
Expert Insights
A video released on the OpenObservability Talks YouTube channel featuring experts from Datadog and Grafana Labs discusses the integration of continuous profiling into OpenTelemetry. Felix Geisendörfer from Datadog and Ryan Perry from Grafana Labs explain the evolution of profiling from a performance and cost analysis tool to a critical observability signal. They also delve into the decision to adopt an extended version of the pprof format as the OTel profiling data standard and the challenges of balancing performance requirements with existing OpenTelemetry conventions.
Conclusion
Elastic’s donation of an eBPF-based profiling agent to OpenTelemetry marks a significant milestone in the project’s evolution. By enhancing the observability ecosystem with continuous profiling, OpenTelemetry is poised to offer developers more comprehensive insights into application behavior, ultimately improving performance and reliability. As the industry continues to embrace this technology, the future of application monitoring looks promising.