Introduction to Prometheus

Hey everyone, welcome back to my blog. Today we will explore what Prometheus is and how to get started with it as a beginner.

What is Monitoring?

Think of monitoring like checking your car's dashboard while driving: you watch fuel, speed, and temperature to spot problems early. For apps and servers it's the same: constant health checks to catch crashes, slowdowns, and downtime before they wreck everything. Monitoring involves tracking system metrics such as CPU usage, memory usage, and network performance, and raising alerts based on predefined thresholds and conditions. Basically, it tells you what is happening.

Why Do We Need It?

We need monitoring mainly because it:

Checks whether everything is working as expected
Collects metrics like CPU usage, memory usage, and error rates
Sends notifications when something goes wrong
Alerts us on thresholds, for example when a server's CPU usage goes above 90%

So it basically identifies potential issues before they become critical.

Common Monitoring Challenges

Even with monitoring tools, things often get tricky. In complex apps with tons of servers or microservices, figuring out which part failed, whether a slow database or an overloaded API, is challenging.

Imagine the Amazon website during Diwali sales: orders flood in from across the country, but is it the payment gateway glitching with UPI, the warehouse API delaying shipments, or servers crashing under traffic from Tier-2 cities? Without clear signals, you're guessing while customers abandon carts. Old tools often bury us in alerts or miss the real issue.

Introducing Prometheus

Prometheus steps in like a smart friend for these messy times. First built at SoundCloud for their busy apps, it collects live metrics from every app and server, tags them with simple labels (like "payment issues" or "stock check"), and lets you ask exactly what's wrong.

In that Diwali sale mess, it quickly shows the payment link failing or servers getting too busy, and warns you right away with clear facts. Its simple way of collecting data and its query language make watching your apps straightforward.

Prometheus Basics

The first things we need to learn are targets and metrics.

Target - Let's say you have a website or server that needs to be monitored for issues; that is known as a target. A target can be anything we need to monitor, like a web server or an application.
Metrics - In simple words, metrics are numerical measurements. For example, a digital watch shows heart rate, weather, oxygen level, and so on; those are metrics. In Prometheus, metrics are what we scrape (extract) from our targets, such as a web server.
Metrics play an important role in understanding why your application behaves in a certain way. Let's assume you are running a web application and discover that it is slow. To learn what is happening, you need some information. For example, the application may become slow when the number of requests is high. If you have a request count metric, you can determine the cause and increase the number of servers to handle the load.
Time Series - The term time series refers to the recording of changes over time. What users want to measure differs from application to application. For a web server, it could be request times; for a database, it could be the number of active connections or active queries, and so on.
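To make these three terms concrete, here is a minimal sketch in plain Python of how one time series could be modelled: a metric name plus labels identifies the series, and each scrape appends a (timestamp, value) sample. The class names, the metric name, and the target address are made up for illustration; this is not how Prometheus stores data internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: a metric name plus its labels."""
    name: str
    labels: tuple  # sorted (label, value) pairs, so the key is hashable


@dataclass
class TimeSeries:
    key: SeriesKey
    samples: list = field(default_factory=list)  # (timestamp, value) pairs

    def append(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))


def make_key(name: str, **labels: str) -> SeriesKey:
    """Build a key whose identity does not depend on label order."""
    return SeriesKey(name, tuple(sorted(labels.items())))


# Hypothetical request-count metric scraped from one web server target.
series = TimeSeries(make_key("http_requests_total", job="web", instance="10.0.0.1:8080"))
series.append(1700000000.0, 120.0)
series.append(1700000015.0, 150.0)  # next scrape, 15 seconds later

latest_timestamp, latest_value = series.samples[-1]
print(series.key.name, dict(series.key.labels), latest_value)
```

The same metric name with a different label value (say, another instance) would be a separate series, which is what lets you slice the data per server later.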

Key Features to Know First
A multi-dimensional data model with time series data identified by metric name and key/value pairs
PromQL, a flexible query language to leverage this dimensionality
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support

Inside Prometheus Architecture

The Prometheus architecture mainly consists of:

Prometheus Server: The main component that scrapes metrics, stores them in a local time-series database (TSDB), runs alerting/recording rules, and answers PromQL queries.

Scrape Engine: Inside the server, it pulls metrics via HTTP from targets' /metrics endpoints at set intervals (like every 15s).

Service Discovery: Finds targets automatically in dynamic setups (no manual IP lists). Mechanisms include static_configs (fixed targets), file_sd (from JSON/YAML files), kubernetes_sd (from the K8s API for pods/services), ec2_sd (AWS instances), and consul_sd (service registry).

Exporters: Tools like node_exporter (system metrics) or mysqld_exporter (MySQL metrics) that collect data from apps/databases, format it for Prometheus, and expose /metrics.

Alertmanager: Takes alerts from the server, groups and deduplicates them, and sends them to email, Slack, or pagers. It handles silencing too.

Pushgateway: For short-lived jobs, such as one-time scripts or batch tasks that finish too quickly to be scraped. These push their metrics here instead, and the server scrapes the Pushgateway later. (In Prometheus terms, a "job" is just a group of targets you want to monitor together, like "all web servers".)

Prometheus follows a straightforward loop to keep your apps in check. It starts by discovering targets, like servers or apps, through service discovery or your config file. At every scrape interval (for example, every 15 seconds) the scrape engine pulls fresh metrics from their /metrics endpoints: think CPU usage, error counts, or request times. This data lands in the local time-series database with timestamps and labels for easy tracking. The server evaluates its rules next; if CPU spikes too high, it sends an alert to Alertmanager, which handles the notifications. You can query trends anytime with PromQL via the web UI or Grafana. Batch jobs push to the Pushgateway first and then join the same flow. One clean cycle keeps your systems under control.
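The loop above can be sketched in a few lines of plain Python. This is only a simulation under stated assumptions: the targets are ordinary functions standing in for HTTP GETs of /metrics, the target names and metric names are invented, and the 90% threshold simply mirrors the CPU example from earlier in the post.

```python
import time


# Hypothetical targets: each function stands in for an HTTP GET of /metrics.
def healthy_target():
    return {"cpu_usage_percent": 93.0, "http_requests_total": 1027.0}


def broken_target():
    raise ConnectionError("connection refused")


TARGETS = {
    "web-1:8080": healthy_target,
    "web-2:8080": broken_target,
}

CPU_ALERT_THRESHOLD = 90.0  # illustrative threshold


def scrape_once(targets, job, now):
    """Run one scrape cycle and return the samples it produced."""
    samples = []
    for instance, fetch in targets.items():
        labels = {"job": job, "instance": instance}
        try:
            metrics = fetch()
            up = 1.0
        except ConnectionError:
            metrics = {}
            up = 0.0
        # A synthetic "up" sample records whether the scrape succeeded.
        samples.append(("up", labels, now, up))
        for name, value in metrics.items():
            samples.append((name, labels, now, value))
    return samples


def evaluate_rules(samples):
    """Return a readable alert for every sample that breaks a rule."""
    alerts = []
    for name, labels, _timestamp, value in samples:
        if name == "up" and value == 0.0:
            alerts.append(f"TargetDown {labels['instance']}")
        elif name == "cpu_usage_percent" and value > CPU_ALERT_THRESHOLD:
            alerts.append(f"HighCpu {labels['instance']}")
    return alerts


stored_samples = scrape_once(TARGETS, job="web", now=time.time())
alerts = evaluate_rules(stored_samples)
print(alerts)
```

In a real deployment the storage step is the TSDB and the alerts go to Alertmanager rather than a list, but the order of the steps is the same: discover, scrape, store, evaluate.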

Types of Metrics Explained

Prometheus uses four main metric types to track different kinds of data from your apps and servers. Counters only go up, like a request counter or total errors. They never decrease (they only reset to zero when the process restarts), and you check how fast they grow with rate() queries. Gauges bounce up and down, like current CPU usage or memory levels, which makes them perfect for values that change often.

Histograms sort values like request times into buckets (for example, up to 100ms and up to 500ms) and also track the total count and sum of observations, which helps in spotting slow requests. Summaries work like histograms but pre-calculate percentiles (like the 95th percentile) over a time window inside the application. That is good for quick stats, but less flexible because the percentiles are fixed in advance. Pick counters for totals, gauges for snapshots, histograms for distributions, and summaries for fixed percentiles.
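Here is a rough sketch of how the first three types behave, written in plain Python rather than with an official client library. The class names and bucket boundaries are assumptions for illustration; summaries are left out to keep it short.

```python
import bisect


class Counter:
    """Monotonically increasing value, e.g. total requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Value that can go up or down, e.g. memory in use."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Cumulative buckets plus a running sum and count of observations."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        first = bisect.bisect_left(self.upper_bounds, value)
        # Buckets are cumulative, so every larger bucket counts it too.
        for i in range(first, len(self.upper_bounds)):
            self.bucket_counts[i] += 1
        self.sum += value
        self.count += 1


requests_total = Counter()
memory_in_use = Gauge()
latency_seconds = Histogram([0.1, 0.5])  # buckets: <=0.1s, <=0.5s, +Inf

for seconds in (0.05, 0.3, 0.3, 2.0):
    requests_total.inc()
    latency_seconds.observe(seconds)
memory_in_use.set(512.0)
memory_in_use.set(480.0)  # gauges may go down

print(requests_total.value, memory_in_use.value, latency_seconds.bucket_counts)
```

Because the buckets are cumulative, the last bucket always equals the total count, and the average can be derived as sum divided by count.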

How Data Collection Works

Prometheus collects data through a pull model called scraping. At a fixed interval (15s in many example configs; the built-in default is 1m) the server sends HTTP GET requests to the targets' /metrics endpoints. Targets, like your apps or exporters, respond with plain-text metrics in the Prometheus format: a metric name, optional labels, a value, and an optional timestamp.

Your config file lists jobs and targets via static lists, service discovery, or files. The scrape engine handles timeouts and relabeling, which can add, drop, or rewrite labels. Collected samples go straight into the TSDB for storage. If a target fails to respond, Prometheus records it as down (the up metric becomes 0), and you can alert on that with a rule. Exporters bridge non-native apps by exposing their metrics ready for scraping. Simple, reliable, and built-in.
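To show what a scrape response looks like, the sketch below parses a small, made-up /metrics payload with Python's standard library. It is a simplified reader, not the official parser: it assumes label values contain no closing brace and silently skips anything it does not recognise.

```python
import re

SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+(?P<timestamp>-?\d+))?$'         # optional timestamp
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples from text-format metrics."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # this sketch ignores lines it cannot parse
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical response body from a target's /metrics endpoint.
PAYLOAD = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="500"} 3
process_resident_memory_bytes 2.5e+07
"""

parsed = parse_exposition(PAYLOAD)
for name, labels, value in parsed:
    print(name, labels, value)
```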

Exporters and Client Libraries

Exporters are simple helper tools that run alongside your apps or services. They grab data from things like databases, servers, or hardware, convert it to the Prometheus format, and expose it on a /metrics HTTP endpoint. Common ones include node_exporter for CPU, memory, and disk on Linux, mysqld_exporter for MySQL database metrics, and blackbox_exporter for checking whether websites are up. Just run them as separate processes and add their addresses to your Prometheus config.

Client libraries let you add metrics directly into your own code. Pick one for your language, like Python, JavaScript, or Go, then use a few lines to track custom things like request counts or user logins. Your app then serves /metrics itself for Prometheus to scrape. Use exporters for software you cannot change, and client libraries for white-box metrics from inside your own app. Both make any system Prometheus-ready.
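As a rough idea of what a client library does for you, here is a hand-rolled /metrics endpoint using only Python's standard library. The metric name, the tracked paths, and port 8000 are all assumptions for the example; in a real app you would normally use the official client library for your language instead of formatting the text yourself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a client library would manage these for you.
REQUEST_COUNTS = {"/": 0, "/login": 0}


def record_request(path):
    """Count one handled request for the given path."""
    REQUEST_COUNTS[path] = REQUEST_COUNTS.get(path, 0) + 1


def render_metrics():
    """Format the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by path.",
        "# TYPE app_requests_total counter",
    ]
    for path, count in sorted(REQUEST_COUNTS.items()):
        lines.append(f'app_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) an HTTP server that exposes /metrics."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)


record_request("/login")
record_request("/login")
print(render_metrics())
```

To try it, call make_server().serve_forever() and add localhost:8000 as a target in your Prometheus config; the server is deliberately not started in the snippet so it can be run without blocking.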

Pull vs Push: What's Special?

Prometheus is pull-based by design.

Pulling over HTTP offers a number of advantages:

You can start extra monitoring instances as needed, e.g. on your laptop when developing changes.
You can more easily and reliably tell if a target is down.
You can manually go to a target and inspect its health with a web browser.

Overall, the Prometheus project believes that pulling is slightly better than pushing, but it should not be considered a major point when choosing a monitoring system.

Push is only for rare cases like short batch jobs via the Pushgateway: those jobs send metrics there first, then Prometheus pulls from the gateway like normal. It's not a full push system and has downsides, like losing the "up/down" status of the job itself. Pull keeps things simple, reliable, and observable.

Your First Config File

Let's consider a simple configuration file, prometheus.yml, which tells Prometheus what to monitor. Its main parts are:

global: Sets defaults for the whole setup. scrape_interval: 15s means "check targets every 15 seconds"; you can change this to 30s or whatever fits.

scrape_configs: The main list of what to monitor. Each item is a "job", a group of similar targets like "web servers" or "databases".

job_name: Just a name for this job, like 'prometheus' or 'myapp'. It helps you identify the job in graphs and alerts.

static_configs: A simple way to list fixed targets. targets: ['localhost:9090'] tells Prometheus to scrape itself at that address and port.
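Putting those four keys together, the file could look like the text inside the small Python helper below. The YAML uses only the values mentioned above (15s, the 'prometheus' job, and localhost:9090); the helper function and its name are just a convenience for this post, and you can equally paste the YAML into prometheus.yml by hand.

```python
# Minimal prometheus.yml assembled from the keys described above.
PROMETHEUS_YML = """\
global:
  scrape_interval: 15s   # how often to scrape targets

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']   # Prometheus scraping itself
"""


def write_config(path="prometheus.yml"):
    """Write the sample configuration to disk and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(PROMETHEUS_YML)
    return path


print(PROMETHEUS_YML)
```

Calling write_config() creates prometheus.yml in the current directory. Indentation matters in YAML, so keep the two-space nesting exactly as shown.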

To install Prometheus for your OS, visit https://prometheus.io/download/. After downloading, you can set up the prometheus.yml file described above.

Save it as prometheus.yml, run ./prometheus --config.file=prometheus.yml, and visit localhost:9090/targets in the Prometheus UI to check that the target is being scraped.

More about Alerting

Alerting with Prometheus is separated into two parts. Alerting rules in Prometheus servers send alerts to an Alertmanager. The Alertmanager then manages those alerts, including silencing, inhibition, aggregation and sending out notifications via methods such as email, on-call notification systems, and chat platforms.

The main steps to setting up alerting and notifications are:

Set up and configure the Alertmanager
Configure Prometheus to talk to the Alertmanager
Create alerting rules in Prometheus
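To give a feel for what the Alertmanager does with incoming alerts, here is a toy model in Python of deduplication, silencing, and grouping. The alert names, label values, and the idea of silencing by instance are assumptions made for this example; the real Alertmanager is configured through its own config file and supports much richer matching.

```python
from collections import defaultdict

# Hypothetical alerts as label dictionaries, roughly as a server might send them.
incoming = [
    {"alertname": "HighCpu", "instance": "web-1:8080", "severity": "page"},
    {"alertname": "HighCpu", "instance": "web-2:8080", "severity": "page"},
    {"alertname": "HighCpu", "instance": "web-1:8080", "severity": "page"},  # repeat
    {"alertname": "DiskFull", "instance": "db-1:9100", "severity": "ticket"},
]


def group_alerts(alerts, group_by=("alertname",), silenced_instances=()):
    """Deduplicate alerts, drop silenced ones, and group the rest by labels."""
    groups = defaultdict(list)
    seen = set()
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:
            continue  # identical alert already handled
        seen.add(fingerprint)
        if alert["instance"] in silenced_instances:
            continue  # silenced alerts produce no notification
        key = tuple(alert[label] for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# db-1 is under planned maintenance in this example, so it is silenced.
notifications = group_alerts(incoming, silenced_instances={"db-1:9100"})
for key, members in notifications.items():
    print(key, [alert["instance"] for alert in members])
```

Instead of four separate messages, this produces one notification for the HighCpu group listing both web servers, which is the main point of grouping.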

Storing and Querying Data

Prometheus stores metrics in a built-in time-series database on your local disk. Every scraped sample has a timestamp, labels, and a value. It is first written to a write-ahead log for safety and later compacted into 2-hour blocks. Data is kept for 15 days by default, but you can change that with --storage.tsdb.retention.time=30d. No external database is needed; it's all local, fast, and reliable.

You query with PromQL, a simple language available right in the web UI at localhost:9090/graph. Type a selector such as cpu_usage{job="node"} (an illustrative metric name) to see trends, or rate(http_requests_total[5m]) for the per-second request rate over the last 5 minutes. Aggregations like avg or sum across jobs make spotting issues easy. Alerting rules and Grafana dashboards use the same query language.
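If rate() feels like magic, the sketch below shows the basic idea in Python: add up the increases of a counter inside the window, treat any drop as a restart from zero, and divide by the elapsed time. It is a simplification with made-up sample data; the real rate() also extrapolates to the edges of the window, so its results will differ slightly.

```python
def simple_rate(samples, window_seconds, now):
    """Approximate per-second increase of a counter over a trailing window.

    samples is a list of (timestamp, value) pairs in time order.
    """
    in_window = [(t, v) for t, v in samples if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return None  # not enough data points to compute a rate
    increase = 0.0
    for (_, previous), (_, current) in zip(in_window, in_window[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter reset: it restarted from zero
    elapsed = in_window[-1][0] - in_window[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Hypothetical http_requests_total samples, one per minute, with a restart at t=180.
counter_samples = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 10.0), (240, 70.0), (300, 130.0)]

requests_per_second = simple_rate(counter_samples, window_seconds=300, now=300)
print(requests_per_second)
```

This is also why counters plus rate() are preferred over storing "requests per second" directly: a restart of the app does not corrupt the result.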

Conclusion

Prometheus joined the CNCF in 2016 as its second hosted project after Kubernetes and graduated in 2018. It powers monitoring for giants like SoundCloud and beyond.

We've come full circle, from understanding monitoring basics and its Diwali-sale headaches to Prometheus as the simple fix, covering pull-based scraping, architecture, metric types, config files, alerting, and easy queries. It grabs data, stores it locally, and spots issues fast, all from a single server.

Thanks for reading, I hope you learned something new! If you want to explore more about Prometheus 💻, check out https://prometheus.io/
