API Performance Data Agent - Part 1
An API (Application Programming Interface) is a set of rules and tools that lets different software systems communicate with each other.
In this example, the API is part of an agent that runs on a Windows server and collects hardware performance metrics, so that another system can retrieve the data through the API endpoints.
High-level design
Purpose:
A small FastAPI-based “Windows agent” that:
Collects hardware/Windows inventory once at API startup.
Continuously collects CPU / memory / swap metrics every second.
Stores metrics:
In-memory ring buffer (for quick API access).
Rotated daily CSV files with retention.
Main parts:
Configuration (via agent_settings.conf)
Logging (rotating log file + console)
Server hardware inventory collection
Hardware performance metrics collection
CSV management (per-day files + retention)
FastAPI app (auth middleware + endpoints)
Configuration & runtime
Config file: agent_settings.conf
Key sections:
[agent]
listen_host (default 0.0.0.0)
listen_port (default 8443)
collect_interval_sec (metrics sampling interval, default 1 second)
csv_dir (CSV directory, default csv)
retention_days (CSV retention, default 7)
log_file (e.g. agent.log)
log_level (INFO, DEBUG, etc.)
[security]
api_token (Bearer token)
use_tls (bool)
cert_file, key_file (for TLS)
If the config file is missing, startup of the API fails.
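The settings above can be read with the standard-library configparser. The following is a minimal sketch, not the agent's actual loader: the function name load_settings and the returned dictionary layout are assumptions, as are the fallbacks for log_file, api_token, use_tls, cert_file and key_file (the text above only gives defaults for the other keys).

```python
import configparser
from pathlib import Path


def load_settings(path="agent_settings.conf"):
    """Read the agent config file and return a flat dict of typed settings."""
    conf_path = Path(path)
    if not conf_path.is_file():
        # Mirrors the documented behaviour: no config file, no startup.
        raise FileNotFoundError(f"Config file not found: {conf_path}")

    parser = configparser.ConfigParser()
    parser.read(conf_path, encoding="utf-8")

    # `fallback` is used when either the section or the key is missing.
    return {
        "listen_host": parser.get("agent", "listen_host", fallback="0.0.0.0"),
        "listen_port": parser.getint("agent", "listen_port", fallback=8443),
        "collect_interval_sec": parser.getfloat(
            "agent", "collect_interval_sec", fallback=1.0
        ),
        "csv_dir": parser.get("agent", "csv_dir", fallback="csv"),
        "retention_days": parser.getint("agent", "retention_days", fallback=7),
        # The fallbacks below are assumptions made for this sketch.
        "log_file": parser.get("agent", "log_file", fallback="agent.log"),
        "log_level": parser.get("agent", "log_level", fallback="INFO"),
        "api_token": parser.get("security", "api_token", fallback=""),
        "use_tls": parser.getboolean("security", "use_tls", fallback=False),
        "cert_file": parser.get("security", "cert_file", fallback=""),
        "key_file": parser.get("security", "key_file", fallback=""),
    }
```

Calling load_settings() with no argument looks for agent_settings.conf in the working directory and raises FileNotFoundError if it is absent, which is one way to make startup fail early.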
Data collection from the OS
Inventory collection runs once at startup and populates the global INVENTORY dictionary.
Sources:
OS & basic info via platform.uname():
hostname, os_system, os_release, os_version, architecture.
CPU topology & frequency via psutil:
cpu_physical_cores
cpu_logical_cores
cpu_max_mhz (from psutil.cpu_freq().max)
CPU brand/vendor/model:
Tries py-cpuinfo if installed (cpuinfo.get_cpu_info()).
Total RAM via psutil.virtual_memory().total.
Manufacturer / model / serial:
Windows: via WMI (Win32_ComputerSystem, Win32_BIOS).
Result: a JSON-serializable dictionary with keys like:
{
"hostname": "...",
"os_system": "Windows",
"cpu_vendor": "GenuineIntel",
"cpu_model": "...",
"cpu_name": "...",
"cpu_physical_cores": 8,
"cpu_logical_cores": 16,
"memory_total_bytes": 34359738368,
"manufacturer": "...",
"model": "...",
"serial": "..."
}
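As a rough illustration of how such a dictionary could be assembled, the sketch below covers only the platform.uname() and psutil sources. The function name collect_inventory is an assumption, the py-cpuinfo and WMI lookups are left out, and the guarded import is only there so the sketch still runs on a machine without psutil (the agent itself depends on it).

```python
import json
import platform

try:
    import psutil  # third-party; required by the real agent
except ImportError:  # lets this sketch run where psutil is not installed
    psutil = None


def collect_inventory():
    """Build a JSON-serializable inventory dict from the available sources."""
    uname = platform.uname()
    inventory = {
        "hostname": uname.node,
        "os_system": uname.system,
        "os_release": uname.release,
        "os_version": uname.version,
        "architecture": uname.machine,
    }
    if psutil is not None:
        inventory["cpu_physical_cores"] = psutil.cpu_count(logical=False)
        inventory["cpu_logical_cores"] = psutil.cpu_count(logical=True)
        freq = psutil.cpu_freq()  # can be None when the platform gives no data
        if freq is not None:
            inventory["cpu_max_mhz"] = freq.max
        inventory["memory_total_bytes"] = psutil.virtual_memory().total
    return inventory


if __name__ == "__main__":
    INVENTORY = collect_inventory()
    print(json.dumps(INVENTORY, indent=2))
```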
Metrics collection (collect_once + collector_loop)
Per-sample collection (collect_once) uses psutil to gather:
CPU:
cpu_total – overall CPU usage (percent)
cpu_per_core – list of per-core percentages
freq_mhz – current CPU frequency
Memory:
mem_percent
mem_total
mem_used
Swap:
swap_percent
swap_total
swap_used
The collector loop (collector_loop):
Runs as an async task started in the FastAPI lifespan handler.
Warms up psutil.cpu_percent(interval=None) so that subsequent calls report accurate deltas.
Ensures a CSV file for today exists and has a header.
On each iteration:
Calls collect_once() to get a sample dict.
Calls append_csv(sample) to append to the daily CSV.
Appends to the in-memory ring buffer (trimmed to MAX_BUF = 600 samples).
Every ~60 seconds, calls _prune_old_csv() to delete old CSVs.
Sleeps for INTERVAL seconds.
Ring buffer:
ring holds up to 600 samples (~10 minutes at 1 Hz). This is what the API reads.
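The loop and ring buffer described above can be sketched as follows. This is an illustration under stated assumptions, not the agent's code: the three callables are passed in as parameters so the sketch stays self-contained, max_iterations is a hypothetical parameter that only exists to let the loop stop in a demo, and the buffer uses collections.deque(maxlen=...) as one way to get the "trimmed to MAX_BUF" behaviour. The warm-up call and the initial CSV header check are omitted.

```python
import asyncio
import time
from collections import deque

MAX_BUF = 600  # about 10 minutes of samples at 1 Hz
ring = deque(maxlen=MAX_BUF)  # the oldest sample is dropped once the buffer is full


async def collector_loop(collect_once, append_csv, prune_old_csv,
                         interval=1.0, prune_every=60.0, max_iterations=None):
    """Sample, persist, buffer, prune roughly every `prune_every` seconds, sleep."""
    last_prune = time.monotonic()
    done = 0
    while max_iterations is None or done < max_iterations:
        sample = collect_once()   # one dict of CPU / memory / swap values
        append_csv(sample)        # durable copy in today's CSV
        ring.append(sample)       # fast copy for the API endpoints
        if time.monotonic() - last_prune >= prune_every:
            prune_old_csv()
            last_prune = time.monotonic()
        done += 1
        await asyncio.sleep(interval)
```

In a FastAPI lifespan handler this coroutine would typically be started with asyncio.create_task(...) and cancelled on shutdown. For a quick local check it can be driven with asyncio.run(collector_loop(fake_collect, rows.append, lambda: None, interval=0, max_iterations=5)), where fake_collect and rows are stand-ins you supply.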
CSV handling & retention
Per-day CSV file naming: metrics-YYYY-MM-DD.csv under CSV_DIR.
_csv_path_for_day() → file path for a date string.
_ensure_csv_header() → writes a header line if file is new/empty.
_get_current_csv() → picks today’s file, sets globals _current_csv_path & _current_csv_date, ensures header.
append_csv(sample) → writes a single row:
cpu_total, cpu_per_core (JSON-serialized list)
mem_percent, mem_total, mem_used
swap_percent, swap_total, swap_used
freq_mhz
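A simplified sketch of the per-day file handling follows. It reuses the helper names from the list above, but the bodies are assumptions: the column order is a guess, the ts_iso column is added here because the sample carries it (the text does not say whether the CSV includes a timestamp), and the module-level _current_csv_path / _current_csv_date globals are replaced by a plain lookup on every call.

```python
import csv
import json
from datetime import date
from pathlib import Path

CSV_DIR = Path("csv")
# Column order is an assumption; ts_iso is an assumed extra column.
FIELDS = ["ts_iso", "cpu_total", "cpu_per_core", "mem_percent", "mem_total",
          "mem_used", "swap_percent", "swap_total", "swap_used", "freq_mhz"]


def _csv_path_for_day(day, csv_dir=CSV_DIR):
    """Return the path of the CSV for a YYYY-MM-DD date string."""
    return Path(csv_dir) / f"metrics-{day}.csv"


def _ensure_csv_header(path):
    """Write the header row if the file is missing or empty."""
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists() or path.stat().st_size == 0:
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerow(FIELDS)


def append_csv(sample, csv_dir=CSV_DIR, today=None):
    """Append one sample as a row to today's CSV and return the file path."""
    day = today or date.today().isoformat()
    path = _csv_path_for_day(day, csv_dir)
    _ensure_csv_header(path)
    row = dict(sample)
    # A list does not fit in one CSV cell, so it is stored as a JSON string.
    row["cpu_per_core"] = json.dumps(sample["cpu_per_core"])
    with path.open("a", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops sample keys that are not CSV columns.
        csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore").writerow(row)
    return path
```

Because the file name is derived from the date on each call, the first sample after midnight lands in a new file with its own header.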
Retention (_prune_old_csv):
Computes a cutoff date = (today – RETENTION_DAYS).
Iterates metrics-*.csv files and parses the date.
If file’s date < cutoff, deletes it and logs the prune.
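The retention steps can be sketched like this. The signature (explicit csv_dir, retention_days and today arguments, and a returned list of removed file names) is an assumption chosen to keep the example self-contained and testable; the agent's _prune_old_csv() takes no arguments and reads its settings from globals.

```python
import logging
from datetime import date, datetime, timedelta
from pathlib import Path

log = logging.getLogger("agent")


def prune_old_csv(csv_dir, retention_days, today=None):
    """Delete metrics-YYYY-MM-DD.csv files dated before the cutoff."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for path in sorted(Path(csv_dir).glob("metrics-*.csv")):
        stamp = path.stem[len("metrics-"):]
        try:
            file_day = datetime.strptime(stamp, "%Y-%m-%d").date()
        except ValueError:
            continue  # name does not carry a valid date; leave it alone
        if file_day < cutoff:
            path.unlink()
            removed.append(path.name)
            log.info("Pruned old CSV: %s", path.name)
    return removed
```

Note that the comparison is strict: with retention_days = 7, a file dated exactly seven days ago is kept.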
GET /health (no auth)
Simple liveness probe.
Response:
{
"status": "ok",
"time": "2025-11-14T12:34:56.789012+00:00",
"host": "hostname"
}
Good for Kubernetes/monitoring health checks.
GET /inventory (auth)
Returns the static INVENTORY object collected at startup.
Use case: One-time hardware/OS discovery.
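Endpoints marked (auth) expect the Bearer token configured as api_token. How the middleware compares tokens is not described here, so the following is only a sketch of one reasonable check: the function name is_authorized is hypothetical, and both the constant-time comparison and the rejection of an empty configured token are assumptions.

```python
import hmac


def is_authorized(authorization_header, api_token):
    """Return True when the header is 'Bearer <token>' and the token matches."""
    if not api_token or not authorization_header:
        return False  # assumption: an unset token never grants access
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"), api_token.encode("utf-8")
    )
```

A middleware would call this with the value of the request's Authorization header and answer 401 when it returns False, while letting /health through unchecked.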
GET /metrics/latest (auth)
Returns the most recent sample from ring.
If no metrics exist yet (e.g. during the first seconds after start), returns:
503 Service Unavailable with "No data yet".
Example response:
{
"ts_unix": 1731582890.123,
"ts_iso": "2025-11-14T12:34:50.123456+00:00",
"host": "my-hostname",
"cpu_total": 7.5,
"cpu_per_core": [3.0, 5.0, 10.0, 12.0],
"mem_percent": 42.3,
"mem_total": 17179869184,
"mem_used": 7264423936,
"swap_percent": 0.0,
"swap_total": 0,
"swap_used": 0,
"freq_mhz": 3500.0
}
GET /metrics/range
Returns a list of samples from the last N seconds, filtered from the in-memory ring.
Query parameter:
seconds (int, default 10).
If seconds <= 0 → 400 Bad Request.
Internals:
Computes cutoff = current_time - seconds.
Returns [sample for sample in ring if sample["ts_unix"] >= cutoff].
Note: This only returns up to what’s in memory (max MAX_BUF samples). Older data exists in CSVs, but those aren’t exposed yet.
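Both metrics endpoints reduce to small lookups on ring. The sketch below expresses them as plain functions that return a (status code, body) pair, so it runs without FastAPI. The function names, the {"detail": ...} body shape and the wording of the 400 message are assumptions; in the agent these cases would more likely be raised as HTTP errors inside the route handlers.

```python
import time


def latest_sample(ring):
    """Logic behind GET /metrics/latest: newest sample, or 503 if none yet."""
    if not ring:
        return 503, {"detail": "No data yet"}
    return 200, ring[-1]


def samples_in_range(ring, seconds=10, now=None):
    """Logic behind GET /metrics/range: samples from the last `seconds` seconds."""
    if seconds <= 0:
        return 400, {"detail": "seconds must be > 0"}  # message text is assumed
    now = time.time() if now is None else now
    cutoff = now - seconds
    return 200, [s for s in ring if s["ts_unix"] >= cutoff]
```

With the default MAX_BUF of 600 and a 1-second interval, asking for more than about 600 seconds returns the same result as asking for exactly 600.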
Logging & server startup
Logging
Logger name: "agent".
Level: from LOG_LEVEL (default INFO).
Outputs:
RotatingFileHandler to LOG_FILE:
maxBytes=5MB, backupCount=5.
StreamHandler to console.
Format:
2025-11-14 12:34:56,789 [INFO] agent: message...
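A logging setup matching this description could look like the sketch below. The function name setup_logging is an assumption, and 5MB is interpreted as 5 * 1024 * 1024 bytes, which may differ from the value the agent uses. The format string relies on the default asctime layout, which produces timestamps of the form shown above.

```python
import logging
from logging.handlers import RotatingFileHandler


def setup_logging(log_file="agent.log", log_level="INFO"):
    """Configure the 'agent' logger with a rotating file and a console handler."""
    logger = logging.getLogger("agent")
    # Unknown level names fall back to INFO.
    logger.setLevel(getattr(logging, log_level.upper(), logging.INFO))
    logger.handlers.clear()  # avoid duplicate output if called more than once

    formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    file_handler = RotatingFileHandler(
        log_file, maxBytes=5 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

With backupCount=5, the handler keeps agent.log plus up to five rotated files (agent.log.1 to agent.log.5), so log storage stays bounded at roughly six times maxBytes.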
Server startup (main())
If USE_TLS is true:
Validates CERT_FILE & KEY_FILE exist.
Passes them to uvicorn.run() as ssl_certfile & ssl_keyfile.
Logs “Starting Agent on host:port TLS=X”.
Runs Uvicorn with the FastAPI app and chosen host/port.
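The TLS branch of main() can be sketched as a helper that validates the files and builds the keyword arguments for uvicorn.run(). The helper name build_uvicorn_kwargs is hypothetical and the error message is an assumption; ssl_certfile and ssl_keyfile are the parameter names named above.

```python
from pathlib import Path


def build_uvicorn_kwargs(host, port, use_tls, cert_file="", key_file=""):
    """Return keyword arguments for uvicorn.run(), adding TLS files if enabled."""
    kwargs = {"host": host, "port": port}
    if use_tls:
        for label, name in (("cert_file", cert_file), ("key_file", key_file)):
            if not name or not Path(name).is_file():
                raise FileNotFoundError(
                    f"TLS is enabled but {label} was not found: {name!r}"
                )
        kwargs["ssl_certfile"] = cert_file
        kwargs["ssl_keyfile"] = key_file
    return kwargs


# In main(), assuming `app` is the FastAPI instance and uvicorn is installed:
#   import uvicorn
#   uvicorn.run(app, **build_uvicorn_kwargs(host, port, use_tls, cert, key))
```

Failing before uvicorn.run() is called gives a clear error in the agent log rather than a less specific failure from inside the server.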
API responding to requests
2025-11-14 17:19:39,989 [INFO] agent: Agent mode: configured=auto effective=windows
2025-11-14 17:19:39,990 [INFO] agent: Inventory: {"hostname": "development", "os_system": "Windows", "os_release": "10", "os_version": "10.0.26200", "architecture": "AMD64", "mode": "windows", "cpu_physical_cores": 14, "cpu_logical_cores": 20, "cpu_max_mhz": 2300.0, "memory_total_bytes": 34087518208}
2025-11-14 17:19:39,992 [INFO] agent: Starting Agent on 0.0.0.0:8080 TLS=False mode=auto(effective=windows)
INFO: Started server process [27064]
INFO: Waiting for application startup.
2025-11-14 17:19:40,054 [INFO] agent: Collector started: interval=1s csv_dir=csv retention_days=7
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO: 127.0.0.1:64289 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:64290 - "GET /inventory HTTP/1.1" 200 OK
INFO: 127.0.0.1:64293 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64294 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64295 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64296 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64297 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64298 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64299 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64300 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64301 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:64302 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64303 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64304 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64305 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64306 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64307 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64308 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64309 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64310 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64311 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64312 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO: 127.0.0.1:64313 - "GET /metrics/latest HTTP/1.1" 200 OK