# Home Infrastructure Monitoring Stack

## Overview

This stack is intended to monitor a homelab environment consisting of:

- Proxmox
- Kubernetes running on Proxmox
- Home router / firewall via SNMP

Because this is built on Prometheus, any router, IoT device, or other data source you want to monitor can be added, provided it has a probe (exporter) available.

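To make the probe idea concrete, the sketch below builds the URL used to ask SNMP Exporter to query a single device, which is a handy check before adding that device as a Prometheus scrape target. It assumes the exporter listens on its default port 9116 on the RPi and that an `if_mib` module exists in its configuration. The function name and the router address are placeholders, and newer exporter versions may also require an `auth` parameter.

```bash
#!/usr/bin/env bash
# Sketch: build the URL that asks SNMP Exporter to probe one device.
# Assumptions (not from this repo): exporter on localhost:9116, module "if_mib".

snmp_probe_url() {
  local target="$1"                             # device to probe, e.g. the router IP
  local module="${2:-if_mib}"                   # SNMP Exporter module name (assumed)
  local exporter="${3:-http://localhost:9116}"  # exporter base URL (assumed default port)
  printf '%s/snmp?target=%s&module=%s\n' "$exporter" "$target" "$module"
}

# Example (placeholder IP; needs a running exporter):
#   curl -s "$(snmp_probe_url 192.168.1.1)" | head
```

If the `curl` call returns Prometheus-formatted metrics, the device is ready to be scraped through the exporter.
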
## Hardware

This uses whatever hardware was lying around: in my case, an old model 3 RPi and 4 VMs running a Talos cluster.
This setup is heavily tied to my own infra, but the majority of what you will find here is easily adaptable.

## Component Summary

| Component | Location | Purpose |
|-----------|----------|---------|
| Grafana | RPi | Single UI for all metrics and logs |
| Prometheus (infra) | RPi | Scrapes network devices, Proxmox, NFS VM |
| Prometheus (cluster) | Talos | Scrapes Kubernetes workloads and nodes |
| Loki | Talos | Centralized log storage |
| Promtail (syslog) | RPi | Receives syslog from network devices, forwards to Loki |
| Promtail (k8s) | Talos | Collects container and Talos logs |
| SNMP Exporter | RPi | Translates SNMP to Prometheus metrics |
| Node Exporter | Talos (DaemonSet) | Host-level metrics for Talos nodes |
| Kube State Metrics | Talos | Kubernetes object metrics |

## Directory Structure

**NOTE:** The `ansible/` directory can be downloaded from its own [repository]().

```
monitoring-stack/
├── README.md                    # This file
├── ansible/                     # RPi setup
│   ├── inventory.yml
│   ├── playbook.yml
│   └── roles/
│       ├── common/
│       ├── prometheus/
│       ├── promtail/
│       └── grafana/
└── kubernetes/                  # Talos cluster manifests
    ├── namespace.yaml
    ├── prometheus/
    ├── loki/
    ├── promtail/
    ├── node-exporter/
    └── kube-state-metrics/
```

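Because the `ansible/` directory lives in a separate repository, it can be worth confirming that a checkout has both halves in place before deploying. The following is a minimal sketch of such a check. The function name `check_layout` is illustrative, and the path list simply mirrors the tree above.

```bash
#!/usr/bin/env bash
# Sketch: report which expected paths are missing from a checkout.
# The path list mirrors the directory tree; the function name is illustrative.

check_layout() {
  local root="${1:-.}" missing=0 path
  for path in \
    ansible/inventory.yml ansible/playbook.yml ansible/roles \
    kubernetes/namespace.yaml kubernetes/prometheus kubernetes/loki \
    kubernetes/promtail kubernetes/node-exporter kubernetes/kube-state-metrics
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"   # 0 when everything is present, 1 otherwise
}

# Example: check_layout ./monitoring-stack && echo "layout OK"
```
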
## Deployment Order

1. **RPi Setup** (Ansible)
   ```bash
   cd ansible
   ansible-playbook -i inventory.yml playbook.yml
   ```

2. **Talos Cluster** (kubectl/Ansible)
   ```bash
   kubectl apply -f kubernetes/namespace.yaml
   kubectl apply -f kubernetes/prometheus/
   kubectl apply -f kubernetes/loki/
   kubectl apply -f kubernetes/promtail/
   kubectl apply -f kubernetes/node-exporter/
   kubectl apply -f kubernetes/kube-state-metrics/
   ```

3. **Configure Network Devices**
   - Point syslog to RPi IP:514 (UDP)
   - Enable SNMP on devices

4. **Add Data Sources in Grafana**
   - Prometheus (local): `http://localhost:9090`
   - Prometheus (cluster): `http://<talos-node-ip>:30090`
   - Loki (cluster): `http://<talos-node-ip>:30100`

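Once the steps above are done, a quick smoke test can confirm that each endpoint is reachable before building dashboards. The sketch below is illustrative rather than part of this repository. It assumes `curl` and the util-linux `logger` are installed, relies on the readiness endpoints that Prometheus (`/-/ready`) and Loki (`/ready`) expose, and treats the IP addresses passed in as placeholders for your own.

```bash
#!/usr/bin/env bash
# Sketch: smoke-test the endpoints from the deployment steps.
# Ports come from the data source list; IPs passed in are placeholders.

# Print the readiness URL for a service ("prometheus" or "loki").
ready_url() {
  local kind="$1" base="$2"
  case "$kind" in
    prometheus) printf '%s/-/ready\n' "$base" ;;
    loki)       printf '%s/ready\n' "$base" ;;
    *)          echo "unknown service: $kind" >&2; return 1 ;;
  esac
}

# Probe one endpoint; curl -f turns HTTP errors into a non-zero exit.
check_ready() {
  local url
  url="$(ready_url "$1" "$2")" || return 1
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "OK   $url"
  else
    echo "FAIL $url"
    return 1
  fi
}

# Usage: smoke_test <talos-node-ip> <rpi-ip>
smoke_test() {
  local talos_ip="$1" rpi_ip="$2"
  check_ready prometheus "http://localhost:9090"
  check_ready prometheus "http://${talos_ip}:30090"
  check_ready loki       "http://${talos_ip}:30100"
  # Send one UDP syslog line to the RPi; if the syslog receiver accepts it,
  # the message should become searchable in Loki.
  logger --server "$rpi_ip" --port 514 --udp "monitoring-stack syslog test"
}
```

Run `smoke_test <talos-node-ip> <rpi-ip>` from the RPi itself, since `localhost:9090` refers to the Prometheus instance running there.
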