API Performance Data Agent - Part 1

An API (Application Programming Interface) is a set of rules and tools that lets different software systems communicate with each other.

In this example, the API is part of an agent that runs on a Windows server and gathers various hardware performance metrics, so that another system can retrieve the data through the API endpoints.

High-level design

Purpose:
A small FastAPI-based “Windows agent” that:

  • Collects hardware/Windows inventory once, at API startup.

  • Continuously collects CPU / memory / swap metrics, every second by default (collect_interval_sec).

  • Stores metrics:

    • In-memory ring buffer (for quick API access).

    • Rotated daily CSV files with retention.

Main parts:

  1. Configuration (via agent_settings.conf)

  2. Logging (rotating log file + console)

  3. Server hardware inventory collection

  4. Hardware performance metrics collection

  5. CSV management (per-day files + retention)

  6. FastAPI app (auth middleware + endpoints)

Configuration & runtime

Config file: agent_settings.conf

Key sections:

  • [agent]

    • listen_host (default 0.0.0.0)

    • listen_port (default 8443)

    • collect_interval_sec (metrics sampling interval, default 1 second)

    • csv_dir (CSV directory, default csv)

    • retention_days (CSV retention, default 7)

    • log_file (e.g. agent.log)

    • log_level (INFO, DEBUG, etc.)

  • [security]

    • api_token (Bearer token)

    • use_tls (bool)

    • cert_file, key_file (for TLS)

If the config file is missing, the agent fails to start.
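A minimal sketch of loading these settings with Python's standard configparser, assuming the section and key names listed above. The load_settings name, the returned dict, and the fallbacks for log_file, api_token, use_tls, cert_file and key_file are illustrative assumptions; the other defaults come from the list above.

```python
import configparser
import os


def load_settings(path: str = "agent_settings.conf") -> dict:
    """Read agent_settings.conf (hypothetical loader), applying the documented defaults.

    Raises FileNotFoundError when the file is missing, so the agent refuses
    to start instead of silently running with defaults.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Config file not found: {path}")

    cp = configparser.ConfigParser()
    cp.read(path, encoding="utf-8")

    return {
        # [agent] section: fallbacks follow the defaults listed in the article
        "listen_host": cp.get("agent", "listen_host", fallback="0.0.0.0"),
        "listen_port": cp.getint("agent", "listen_port", fallback=8443),
        "collect_interval_sec": cp.getfloat("agent", "collect_interval_sec", fallback=1.0),
        "csv_dir": cp.get("agent", "csv_dir", fallback="csv"),
        "retention_days": cp.getint("agent", "retention_days", fallback=7),
        "log_file": cp.get("agent", "log_file", fallback="agent.log"),  # assumed default
        "log_level": cp.get("agent", "log_level", fallback="INFO"),
        # [security] section: these fallbacks are assumptions
        "api_token": cp.get("security", "api_token", fallback=""),
        "use_tls": cp.getboolean("security", "use_tls", fallback=False),
        "cert_file": cp.get("security", "cert_file", fallback=None),
        "key_file": cp.get("security", "key_file", fallback=None),
    }
```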

Data collection from the OS

Runs once at startup and populates the global INVENTORY dictionary.

Sources:

  • OS & basic info via platform.uname():

    • hostname, os_system, os_release, os_version, architecture.

  • CPU topology & frequency via psutil:

    • cpu_physical_cores

    • cpu_logical_cores

    • cpu_max_mhz (from psutil.cpu_freq().max)

  • CPU brand/vendor/model:

    • Tries py-cpuinfo if installed (cpuinfo.get_cpu_info()).

  • Total RAM via psutil.virtual_memory().total.

  • Manufacturer / model / serial:

    • Windows: via WMI (Win32_ComputerSystem, Win32_BIOS).

Result: a JSON-serializable dictionary with keys like:

{
  "hostname": "...",
  "os_system": "Windows",
  "cpu_vendor": "GenuineIntel",
  "cpu_model": "...",
  "cpu_name": "...",
  "cpu_physical_cores": 8,
  "cpu_logical_cores": 16,
  "memory_total_bytes": 34359738368,
  "manufacturer": "...",
  "model": "...",
  "serial": "..."
}
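The collection steps above can be sketched roughly as follows. This is an illustration, not the agent's actual code: collect_inventory is a hypothetical name, psutil and py-cpuinfo are treated as optional imports, mapping py-cpuinfo's vendor_id_raw/brand_raw keys onto cpu_vendor/cpu_name is an assumption, and the Windows-only WMI lookup (manufacturer, model, serial) is left out.

```python
import json
import platform


def collect_inventory() -> dict:
    """Build a JSON-serializable inventory dict (hypothetical sketch)."""
    u = platform.uname()
    inv = {
        "hostname": u.node,
        "os_system": u.system,
        "os_release": u.release,
        "os_version": u.version,
        "architecture": u.machine,
    }
    try:
        import psutil  # optional in this sketch
        inv["cpu_physical_cores"] = psutil.cpu_count(logical=False)
        inv["cpu_logical_cores"] = psutil.cpu_count(logical=True)
        try:
            freq = psutil.cpu_freq()  # may be None where frequency is unavailable
        except (OSError, NotImplementedError):
            freq = None
        inv["cpu_max_mhz"] = freq.max if freq else None
        inv["memory_total_bytes"] = psutil.virtual_memory().total
    except ImportError:
        pass
    try:
        import cpuinfo  # provided by the py-cpuinfo package
        info = cpuinfo.get_cpu_info()
        inv["cpu_vendor"] = info.get("vendor_id_raw")  # assumed key mapping
        inv["cpu_name"] = info.get("brand_raw")         # assumed key mapping
    except ImportError:
        pass
    return inv


if __name__ == "__main__":
    print(json.dumps(collect_inventory(), indent=2))
```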

Metrics: collect_once + collector_loop

Per-sample collection (collect_once) uses psutil to gather:

  • CPU:

    • cpu_total – overall CPU usage (percent)

    • cpu_per_core – list of per-core percentages

    • freq_mhz – current CPU frequency

  • Memory:

    • mem_percent

    • mem_total

    • mem_used

  • Swap:

    • swap_percent

    • swap_total

    • swap_used
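A possible shape for collect_once, assuming the field names from the /metrics/latest example response. The sketch splits it into a pure helper (build_sample, a hypothetical name) that turns psutil-style readings into the sample dict, and a thin wrapper that imports psutil lazily, so the dict layout can be checked without psutil installed. Taking host from platform.node() and writing ts_iso in UTC are assumptions based on that example.

```python
import platform
import time
from datetime import datetime, timezone
from typing import Optional


def build_sample(cpu_total, cpu_per_core, vm, sw, freq, now: Optional[float] = None) -> dict:
    """Assemble one sample from psutil-shaped readings (hypothetical helper)."""
    now = time.time() if now is None else now
    return {
        "ts_unix": now,
        "ts_iso": datetime.fromtimestamp(now, tz=timezone.utc).isoformat(),
        "host": platform.node(),
        "cpu_total": cpu_total,
        "cpu_per_core": list(cpu_per_core),
        "mem_percent": vm.percent,
        "mem_total": vm.total,
        "mem_used": vm.used,
        "swap_percent": sw.percent,
        "swap_total": sw.total,
        "swap_used": sw.used,
        "freq_mhz": freq.current if freq else None,  # cpu_freq() can return None
    }


def collect_once() -> dict:
    """Take one reading with psutil (imported lazily in this sketch)."""
    import psutil

    return build_sample(
        psutil.cpu_percent(interval=None),               # usage since the previous call
        psutil.cpu_percent(interval=None, percpu=True),  # per-core list
        psutil.virtual_memory(),
        psutil.swap_memory(),
        psutil.cpu_freq(),
    )
```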

The collector loop (collector_loop):

  • Runs as an async task started in the FastAPI lifespan handler.

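  • Warms up psutil.cpu_percent(interval=None) once, because the first call with interval=None returns a meaningless 0.0; later calls report usage since the previous call.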

  • Ensures a CSV file for today exists and has a header.

  • On each iteration:

    1. Calls collect_once() to get a sample dict.

    2. Calls append_csv(sample) to append to the daily CSV.

    3. Appends to the in-memory ring buffer (trimmed to MAX_BUF = 600 samples).

    4. Every ~60 seconds, calls _prune_old_csv() to delete old CSVs.

    5. Sleeps for INTERVAL seconds.
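The loop can be sketched with asyncio as below. Assumptions: the collection, CSV, and pruning functions are passed in as callables (in the agent they are presumably module-level functions), the warm_up and stop parameters are illustrative additions, pruning runs on the first pass and then about every 60 seconds, and collections.deque(maxlen=MAX_BUF) stands in for an explicit trim step. Creating today's CSV and its header is assumed to happen inside append_csv.

```python
import asyncio
import time
from collections import deque
from typing import Callable, Optional

MAX_BUF = 600  # ring size from the description above
# Sketch choice: a deque with maxlen discards the oldest sample automatically,
# replacing the explicit "trim to MAX_BUF" step.
ring: deque = deque(maxlen=MAX_BUF)

PRUNE_EVERY_SEC = 60.0


async def collector_loop(
    collect_once: Callable[[], dict],
    append_csv: Callable[[dict], None],
    prune_old_csv: Callable[[], None],
    interval: float = 1.0,
    warm_up: Optional[Callable[[], object]] = None,
    stop: Optional[asyncio.Event] = None,
) -> None:
    if warm_up is not None:
        warm_up()  # e.g. lambda: psutil.cpu_percent(interval=None); result discarded
    last_prune = float("-inf")  # forces a prune on the first iteration
    while stop is None or not stop.is_set():
        sample = collect_once()  # 1. take one sample
        append_csv(sample)       # 2. persist it to today's CSV
        ring.append(sample)      # 3. keep it in memory; oldest drops off at MAX_BUF
        now = time.monotonic()
        if now - last_prune >= PRUNE_EVERY_SEC:
            prune_old_csv()      # 4. retention check roughly once a minute
            last_prune = now
        await asyncio.sleep(interval)  # 5. wait for the next sample
```

In a FastAPI lifespan handler, the task would typically be started with asyncio.create_task(collector_loop(...)) before the yield and cancelled after it. The psutil and file calls block briefly inside the event loop; at a 1-second cadence that is usually tolerable, and asyncio.to_thread could move them off the loop if it is not.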

Ring buffer:
The in-memory ring buffer (ring) holds up to MAX_BUF = 600 samples, about 10 minutes of history at the default 1-second interval. This buffer is what the API endpoints read from.

CSV handling & retention

Per-day CSV file naming: metrics-YYYY-MM-DD.csv under CSV_DIR.

  • _csv_path_for_day() → file path for a date string.

  • _ensure_csv_header() → writes a header line if file is new/empty.

  • _get_current_csv() → picks today’s file, sets globals _current_csv_path & _current_csv_date, ensures header.

  • append_csv(sample) → writes a single row:

    • cpu_total, cpu_per_core (JSON-serialized list)

    • mem_percent, mem_total, mem_used

    • swap_percent, swap_total, swap_used

    • freq_mhz
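A sketch of the per-day CSV helpers using the standard csv module. The column order follows the list above plus an assumed ts_iso column (the row layout above lists no timestamp), the day parameter is an added convenience for testing, and the caching of _current_csv_path/_current_csv_date done by _get_current_csv() is omitted.

```python
import csv
import json
import os
from datetime import date
from typing import Optional

CSV_DIR = "csv"
# Column order follows the article; "ts_iso" is an assumed extra column.
FIELDS = [
    "ts_iso",
    "cpu_total", "cpu_per_core",
    "mem_percent", "mem_total", "mem_used",
    "swap_percent", "swap_total", "swap_used",
    "freq_mhz",
]


def _csv_path_for_day(day: str, csv_dir: str = CSV_DIR) -> str:
    """Map a YYYY-MM-DD string to <csv_dir>/metrics-YYYY-MM-DD.csv."""
    return os.path.join(csv_dir, f"metrics-{day}.csv")


def _ensure_csv_header(path: str) -> None:
    """Write the header row if the file is missing or empty."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(FIELDS)


def append_csv(sample: dict, csv_dir: str = CSV_DIR, day: Optional[str] = None) -> str:
    """Append one sample to the file for `day` (today's local date by default)."""
    day = day or date.today().isoformat()
    path = _csv_path_for_day(day, csv_dir)
    _ensure_csv_header(path)
    row = [
        # the per-core list is stored as a JSON string in a single cell
        json.dumps(sample.get(k, [])) if k == "cpu_per_core" else sample.get(k)
        for k in FIELDS
    ]
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
    return path
```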

Retention (_prune_old_csv):

  • Computes a cutoff date = (today – RETENTION_DAYS).

  • Iterates over the metrics-*.csv files and parses the date from each file name.

  • If a file’s date is earlier than the cutoff, deletes the file and logs the deletion.

GET /health (no auth)

Simple liveness probe.

  • Response:

{
  "status": "ok",
  "time": "2025-11-14T12:34:56.789012+00:00",
  "host": "hostname"
}

Good for Kubernetes/monitoring health checks.
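The /health body can be produced with standard-library calls only. health_payload is a hypothetical helper; in the agent it would be returned from a handler registered with FastAPI's @app.get("/health"). Using platform.node() for host is an assumption.

```python
import platform
from datetime import datetime, timezone


def health_payload() -> dict:
    """Body for GET /health: fixed status, current UTC time, host name."""
    return {
        "status": "ok",
        # isoformat() on an aware UTC datetime gives e.g. 2025-11-14T12:34:56.789012+00:00
        "time": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
    }
```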

GET /inventory (auth)

Returns the static INVENTORY object collected at startup.

  • Use case: One-time hardware/OS discovery.
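The Bearer-token check behind the (auth) endpoints could look like the sketch below. is_authorized and PUBLIC_PATHS are hypothetical names; exempting only /health follows the endpoint labels in this article, and rejecting every request when api_token is empty is an assumption. In FastAPI, such a check would typically run inside an @app.middleware("http") function that returns a 401 response when it fails.

```python
import hmac
from typing import Optional

PUBLIC_PATHS = {"/health"}  # per the endpoint labels, only /health skips auth


def is_authorized(path: str, authorization: Optional[str], api_token: str) -> bool:
    """Return True if a request to `path` may proceed.

    Expects an 'Authorization: Bearer <api_token>' header on every
    non-public path.
    """
    if path in PUBLIC_PATHS:
        return True
    if not api_token or not authorization:
        return False  # no token configured, or none sent
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(token.strip().encode(), api_token.encode())
```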

GET /metrics/latest (auth)

Returns the most recent sample from the ring buffer (ring).

  • If no metrics exist yet (e.g. during first seconds after start), returns:

    • 503 Service Unavailable with "No data yet".

Example response:

{
  "ts_unix": 1731582890.123,
  "ts_iso": "2025-11-14T12:34:50.123456+00:00",
  "host": "my-hostname",
  "cpu_total": 7.5,
  "cpu_per_core": [3.0, 5.0, 10.0, 12.0],
  "mem_percent": 42.3,
  "mem_total": 17179869184,
  "mem_used": 7264423936,
  "swap_percent": 0.0,
  "swap_total": 0,
  "swap_used": 0,
  "freq_mhz": 3500.0
}
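The 503 behaviour can be expressed as a small helper that returns a status code and body. latest_response is a hypothetical name, and the {"detail": ...} body assumes the endpoint raises FastAPI's HTTPException(status_code=503, detail="No data yet").

```python
from typing import Sequence, Tuple


def latest_response(ring: Sequence[dict]) -> Tuple[int, dict]:
    """Return (status_code, body) for GET /metrics/latest."""
    if len(ring) == 0:
        # body shape produced by HTTPException(status_code=503, detail="No data yet")
        return 503, {"detail": "No data yet"}
    return 200, ring[-1]  # newest sample sits at the right end of the ring
```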

GET /metrics/range (auth)

Returns a list of samples from the last N seconds, filtered from the in-memory ring.

  • Query parameter:

    • seconds (int, default 10).

    • If seconds <= 0 → 400 Bad Request.

  • Internals:

    • Computes cutoff = current_time - seconds.

    • Returns [sample for sample in ring if sample["ts_unix"] >= cutoff].

Note: this endpoint can only return what is held in memory (at most MAX_BUF = 600 samples, about 10 minutes at 1-second sampling). Older data is kept in the daily CSV files, but the API does not expose it yet.
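The range filter described above, as a standalone function. metrics_range and the now parameter (added so the cutoff can be tested deterministically) are assumptions; the endpoint would map the ValueError to a 400 Bad Request.

```python
import time
from typing import Iterable, List, Optional


def metrics_range(ring: Iterable[dict], seconds: int = 10, now: Optional[float] = None) -> List[dict]:
    """Samples whose ts_unix falls within the last `seconds` seconds.

    Raises ValueError for seconds <= 0.
    """
    if seconds <= 0:
        raise ValueError("seconds must be > 0")
    now = time.time() if now is None else now
    cutoff = now - seconds
    return [sample for sample in ring if sample["ts_unix"] >= cutoff]
```

Because samples are appended in time order, a larger buffer could scan from the newest end and stop at the first old sample; with at most 600 entries, the plain filter is simple and cheap enough.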

Logging & server startup

Logging

  • Logger name: "agent".

  • Level: from LOG_LEVEL (default INFO).

  • Outputs:

    • RotatingFileHandler to LOG_FILE:

      • maxBytes=5MB, backupCount=5.

    • StreamHandler to console.

Format:

2025-11-14 12:34:56,789 [INFO] agent: message...
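A logging setup matching this description, using logging.handlers.RotatingFileHandler. setup_logging is a hypothetical name, and reading 5MB as 5 MiB (5 * 1024 * 1024 bytes) is an assumption. With the default asctime format, the formatter string below produces lines shaped like the example above.

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"


def setup_logging(log_file: str = "agent.log", log_level: str = "INFO") -> logging.Logger:
    """Configure the "agent" logger with a rotating file and the console."""
    logger = logging.getLogger("agent")
    # fall back to INFO when the configured level name is not recognised
    logger.setLevel(getattr(logging, str(log_level).upper(), logging.INFO))
    if logger.handlers:
        return logger  # already configured; avoid duplicate output
    formatter = logging.Formatter(LOG_FORMAT)
    file_handler = RotatingFileHandler(
        log_file, maxBytes=5 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    console_handler = logging.StreamHandler()  # writes to stderr
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```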

Server startup (main())

  • If USE_TLS is true:

    • Validates CERT_FILE & KEY_FILE exist.

    • Passes them to uvicorn.run() as ssl_certfile & ssl_keyfile.

  • Logs “Starting Agent on host:port TLS=X”.

  • Runs Uvicorn with the FastAPI app and chosen host/port.
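The TLS check and the Uvicorn call might be structured as below. build_server_kwargs is a hypothetical helper split out so the validation can be tested without starting a server; ssl_certfile and ssl_keyfile are the uvicorn.run() keyword arguments named above, and uvicorn is imported only inside main() in this sketch.

```python
import logging
import os
from typing import Optional

log = logging.getLogger("agent")


def build_server_kwargs(host: str, port: int, use_tls: bool,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> dict:
    """Return keyword arguments for uvicorn.run(), validating TLS files first."""
    kwargs = {"host": host, "port": port}
    if use_tls:
        for label, path in (("cert_file", cert_file), ("key_file", key_file)):
            if not path or not os.path.isfile(path):
                raise FileNotFoundError(f"TLS enabled but {label} not found: {path!r}")
        kwargs["ssl_certfile"] = cert_file
        kwargs["ssl_keyfile"] = key_file
    return kwargs


def main(app, host: str = "0.0.0.0", port: int = 8443, use_tls: bool = False,
         cert_file: Optional[str] = None, key_file: Optional[str] = None) -> None:
    kwargs = build_server_kwargs(host, port, use_tls, cert_file, key_file)
    log.info("Starting Agent on %s:%s TLS=%s", host, port, use_tls)
    import uvicorn  # lazy import so the helper above works without uvicorn installed
    uvicorn.run(app, **kwargs)
```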

API responding to requests

2025-11-14 17:19:39,989 [INFO] agent: Agent mode: configured=auto effective=windows
2025-11-14 17:19:39,990 [INFO] agent: Inventory: {"hostname": "development", "os_system": "Windows", "os_release": "10", "os_version": "10.0.26200", "architecture": "AMD64", "mode": "windows", "cpu_physical_cores": 14, "cpu_logical_cores": 20, "cpu_max_mhz": 2300.0, "memory_total_bytes": 34087518208}
2025-11-14 17:19:39,992 [INFO] agent: Starting Agent on 0.0.0.0:8080 TLS=False mode=auto(effective=windows)
INFO:     Started server process [27064]
INFO:     Waiting for application startup.
2025-11-14 17:19:40,054 [INFO] agent: Collector started: interval=1s csv_dir=csv retention_days=7
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     127.0.0.1:64289 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:64290 - "GET /inventory HTTP/1.1" 200 OK
INFO:     127.0.0.1:64293 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64294 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64295 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64296 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64297 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64298 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64299 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64300 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64301 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:64302 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64303 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64304 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64305 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64306 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64307 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64308 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64309 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64310 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64311 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64312 - "GET /metrics/latest HTTP/1.1" 200 OK
INFO:     127.0.0.1:64313 - "GET /metrics/latest HTTP/1.1" 200 OK
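On the collecting side, another system could poll these endpoints with nothing more than the standard library. build_request and fetch_json are hypothetical names, and the base URL and token in the usage line are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET request, adding the Bearer token when one is given."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_json(base_url: str, path: str, token: Optional[str] = None, timeout: float = 5.0):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(build_request(base_url, path, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_json("http://127.0.0.1:8080", "/metrics/latest", token="<api_token>") called once a second would produce request lines like those in the log above. urlopen raises urllib.error.HTTPError for the 503 returned before the first sample exists, and a self-signed TLS certificate would additionally require an ssl context passed to urlopen.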