Add the monitoring stack
80
README.md
Normal file
@@ -0,0 +1,80 @@
# Home Infrastructure Monitoring Stack

## Overview

This stack is intended to monitor a homelab environment consisting of:

- Proxmox
- Kubernetes running on Proxmox
- Home router / firewall via SNMP

Because the stack is built on Prometheus, any router, IoT device, or other data source that exposes metrics through an exporter or probe can be monitored as well.
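
A quick way to see whether a device can be added is to fetch its exporter's `/metrics` endpoint and look at what it exposes. The following is a minimal sketch, assuming the exporter serves the standard Prometheus text format; `metric_names` is a hypothetical helper name and is not part of this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): list the unique metric names
# found in Prometheus text-format output read from stdin.
metric_names() {
  # Drop comment lines (# HELP / # TYPE) and blank lines, then cut each
  # sample line at the first "{" or space so only the metric name remains.
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -E 's/[{ ].*$//' | sort -u
}
```

For example, `curl -s http://<device-ip>:9100/metrics | metric_names` would list what a node exporter on its default port exposes; the address is a placeholder for your own device.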

## Hardware

This uses whatever hardware is lying around: in my case, an old Raspberry Pi 3 (RPi) and 4 VMs running a Talos cluster.
This setup is heavily tied to my own infrastructure, but most of what you will find here is easily adaptable.

## Component Summary

| Component | Location | Purpose |
|-----------|----------|---------|
| Grafana | RPi | Single UI for all metrics and logs |
| Prometheus (infra) | RPi | Scrapes network devices, Proxmox, NFS VM |
| Prometheus (cluster) | Talos | Scrapes Kubernetes workloads and nodes |
| Loki | Talos | Centralized log storage |
| Promtail (syslog) | RPi | Receives syslog from network devices, forwards to Loki |
| Promtail (k8s) | Talos | Collects container and Talos logs |
| SNMP Exporter | RPi | Translates SNMP to Prometheus metrics |
| Node Exporter | Talos (DaemonSet) | Host-level metrics for Talos nodes |
| Kube State Metrics | Talos | Kubernetes object metrics |

## Directory Structure

**NOTE:** The `ansible` directory can be downloaded from its own [repository]().

```
monitoring-stack/
├── README.md                 # This file
├── ansible/                  # RPi setup
│   ├── inventory.yml
│   ├── playbook.yml
│   └── roles/
│       ├── common/
│       ├── prometheus/
│       ├── promtail/
│       └── grafana/
└── kubernetes/               # Talos cluster manifests
    ├── namespace.yaml
    ├── prometheus/
    ├── loki/
    ├── promtail/
    ├── node-exporter/
    └── kube-state-metrics/
```
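
Since the `ansible` directory comes from a separate repository, it can help to confirm the combined checkout matches the tree above before deploying. This is a minimal sketch; `check_layout` is a hypothetical helper, and the path list is simply copied from the tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every expected path that is missing under the
# given checkout root. Returns non-zero if anything is missing.
check_layout() {
  local root="${1:-.}" missing=0 path
  for path in \
    ansible/inventory.yml ansible/playbook.yml ansible/roles \
    kubernetes/namespace.yaml kubernetes/prometheus kubernetes/loki \
    kubernetes/promtail kubernetes/node-exporter kubernetes/kube-state-metrics
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_layout .` from the `monitoring-stack/` root prints nothing when every listed path is present.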

## Deployment Order

1. **RPi Setup** (Ansible)
   ```bash
   cd ansible
   ansible-playbook -i inventory.yml playbook.yml
   ```

2. **Talos Cluster** (kubectl/Ansible)
   ```bash
   kubectl apply -f kubernetes/namespace.yaml
   kubectl apply -f kubernetes/prometheus/
   kubectl apply -f kubernetes/loki/
   kubectl apply -f kubernetes/promtail/
   kubectl apply -f kubernetes/node-exporter/
   kubectl apply -f kubernetes/kube-state-metrics/
   ```

3. **Configure Network Devices**
   - Point syslog to the RPi IP on port 514 (UDP)
   - Enable SNMP on devices

4. **Add Data Sources in Grafana**
   - Prometheus (local): `http://localhost:9090`
   - Prometheus (cluster): `http://<talos-node-ip>:30090`
   - Loki (cluster): `http://<talos-node-ip>:30100`
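
Before adding the data sources in Grafana, it may be worth confirming that each endpoint answers. The sketch below relies on the readiness endpoints that Prometheus (`/-/ready`) and Loki (`/ready`) expose; `ready_url` and `check_ready` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the readiness URL for a data source.
# Prometheus serves /-/ready and Loki serves /ready.
ready_url() {
  local kind="$1" base="${2%/}"   # strip one trailing slash from the base URL
  case "$kind" in
    prometheus) echo "$base/-/ready" ;;
    loki)       echo "$base/ready" ;;
    *)          echo "unknown data source: $kind" >&2; return 1 ;;
  esac
}

# Probe one data source; curl -f turns HTTP error statuses into a non-zero exit.
check_ready() {
  local url
  url="$(ready_url "$1" "$2")" || return 1
  curl -fsS --max-time 5 -o /dev/null "$url" && echo "ok: $url"
}
```

On the RPi, `check_ready prometheus http://localhost:9090` covers the local instance; `check_ready prometheus http://<talos-node-ip>:30090` and `check_ready loki http://<talos-node-ip>:30100` cover the cluster side, with the node IP filled in for your environment.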