How to Integrate Prometheus and Grafana with a Self-Managed Frappe Instance for Monitoring Metrics?

I’m currently running a self-managed Frappe instance and I’m interested in integrating Prometheus and Grafana to monitor various metrics such as latency, request rates, and system performance.

Could anyone guide me on the following:

  1. Best Practices: What are the best practices for setting up Prometheus and Grafana with Frappe?
  2. Exposing Metrics: How can I expose Frappe-specific metrics that Prometheus can scrape? Are there any built-in features or third-party tools/plugins that are recommended?
  3. Configuration Examples: If anyone has a sample prometheus.yml configuration or Grafana dashboard setup specifically tailored for Frappe, that would be very helpful.
  4. Potential Challenges: Are there any known challenges or considerations I should be aware of when integrating these tools with Frappe?

When it comes to observability, OpenTelemetry standards are your friend. You can integrate any OpenTelemetry-compliant Python package that exposes a metrics endpoint for Prometheus to scrape and store, and then visualize the data in Grafana. Take a look at this guide.
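To make the "metrics endpoint" idea concrete, here is a minimal standard-library sketch of what such an endpoint serves: plain text in the Prometheus exposition format. It is not Frappe's API and not an OpenTelemetry SDK. The metric names (`frappe_requests_total`, `frappe_request_seconds_total`), the `observe_request` helper, and the `/metrics` path are assumptions made up for illustration. In a real setup you would more likely let an existing exporter or client library generate this output for you.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Lock

# In-memory state for this process only. All names here are hypothetical.
_lock = Lock()
_request_count = {}    # (method, status) -> number of requests seen
_request_seconds = {}  # method -> total seconds spent handling requests


def observe_request(method, status, seconds):
    """Record one handled request; call this from your own request hook."""
    with _lock:
        key = (method, str(status))
        _request_count[key] = _request_count.get(key, 0) + 1
        _request_seconds[method] = _request_seconds.get(method, 0.0) + seconds


def render_metrics():
    """Render the recorded values in the Prometheus text exposition format."""
    lines = [
        "# HELP frappe_requests_total Requests handled, by method and status.",
        "# TYPE frappe_requests_total counter",
    ]
    with _lock:
        for (method, status), count in sorted(_request_count.items()):
            lines.append(
                f'frappe_requests_total{{method="{method}",status="{status}"}} {count}'
            )
        lines.append("# HELP frappe_request_seconds_total Time spent handling requests.")
        lines.append("# TYPE frappe_request_seconds_total counter")
        for method, total in sorted(_request_seconds.items()):
            lines.append(f'frappe_request_seconds_total{{method="{method}"}} {total}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered metrics on GET /metrics and 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9105):
    """Block forever, serving metrics on the given (arbitrarily chosen) port."""
    ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `observe_request("GET", 200, 0.25)` from a request hook and running `serve()` in a background thread would let Prometheus scrape `http://<host>:9105/metrics`. One caveat with this approach: the counters live in a single process, so if your bench runs several web workers, each worker would report only its own numbers unless you aggregate them somewhere.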

Additionally, you can send the logs generated by Frappe to Grafana and define alerts on them. The challenges will mainly be around how long you want to retain this data and how you purge it, RBAC for Grafana dashboards, and building the dashboards themselves (in Grafana).

Related code for reference: GitHub - athul/bench-exporter: Prometheus exporter for frappe benches
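
On the scrape side, a minimal prometheus.yml could look roughly like the sketch below. The job name, host, and port are placeholders I picked for illustration, not values taken from bench-exporter or Frappe, so check the exporter's own README for the address and path it actually listens on.

```yaml
global:
  scrape_interval: 15s   # how often Prometheus pulls metrics

scrape_configs:
  - job_name: frappe     # placeholder job name
    metrics_path: /metrics
    static_configs:
      # Placeholder target: replace with the host:port where your exporter listens.
      - targets: ["frappe.example.internal:9105"]
```

Once Prometheus is scraping the target, add Prometheus as a data source in Grafana and build panels from the scraped series.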