Project Overview: documents


We anchored the documents project around a single reliable workflow: Paperless-ngx document intake and review through a scriptable, authenticated path. It works with both local files and Discord attachments and introduces no extra services.

What We Built

  • A focused automation surface for Paperless-ngx, with operational rules captured in AGENTS.md and execution centered on scripts/paperless-upload.sh.
  • A secure credential model that explicitly avoids hardcoded secrets and pulls runtime values from encrypted sec keys (PAPERLESS_NGX_URL, token, and Cloudflare Access client credentials).
  • A practical Discord-to-Paperless intake flow: the upload script accepts file paths or Discord CDN URLs, normalizes and uploads them, polls tasks, and records run logs at uploads/paperless-runs/upload-<timestamp>.jsonl.
  • A documented authenticated API pattern that combines Paperless token auth with Cloudflare Access headers and API version pinning to avoid known access/signature failure modes.
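
As a rough illustration of the intake step described above, the sketch below shows one way a script could tell Discord CDN URLs from local paths and name its run log. It is not the contents of scripts/paperless-upload.sh: the function names (classify_input, url_basename, run_log_path), the accepted CDN hosts, and the timestamp value passed in are all assumptions made for illustration.

```shell
# Sketch only: names and accepted hosts are hypothetical, not taken from the repo.

# Classify one input as "url" (Discord CDN link), "file" (existing local
# path), or "unknown" (anything else).
classify_input() {
  case "$1" in
    https://cdn.discordapp.com/*|https://media.discordapp.net/*)
      echo url ;;
    *)
      if [ -f "$1" ]; then echo file; else echo unknown; fi ;;
  esac
}

# Derive a local filename from a URL: drop the query string, then keep
# only the last path segment.
url_basename() {
  _no_query="${1%%\?*}"
  printf '%s\n' "${_no_query##*/}"
}

# Build the run-log path for a caller-supplied timestamp string.
run_log_path() {
  printf 'uploads/paperless-runs/upload-%s.jsonl\n' "$1"
}
```

For example, run_log_path "$(date -u +%Y%m%dT%H%M%SZ)" yields a path of the form uploads/paperless-runs/upload-<timestamp>.jsonl; the real script may use a different timestamp layout.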

Why We Built It

  • The core need was fast, dependable document retrieval and processing, as reflected in session requests such as: “I need to pull a series of documents… My 2025 W2(s)…”.
  • Paperless is protected behind Cloudflare Access, so “just curl it” is not enough; we needed one known-good request shape that agents can reuse without trial-and-error.
  • We prioritized operational reliability over platform expansion: one script, explicit headers, visible per-file progress, and durable run logs instead of new backend infrastructure.
  • This gives us repeatable behavior under pressure (especially tax/records workflows) and creates an auditable trail for what was uploaded and when.

How It Works

  • Runtime auth is assembled from sec, then applied consistently to API calls with:
    • Authorization: Token ...
    • CF-Access-Client-Id / CF-Access-Client-Secret
    • Accept: application/json; version=9
  • scripts/paperless-upload.sh is the main operator entrypoint for ingestion; it handles mixed inputs (local paths and URLs) and reports per-file progress states (DOWNLOAD, UPLOAD, TASK, DONE/FAIL), so it is always clear whether a job is still running or has stalled.
  • The repository is intentionally light on app scaffolding (it defines no package scripts), which keeps the workflow transparent and shell-native for both humans and agents.
  • At this point, the project is stable rather than rapidly changing: the baseline write-up exists and there are no new implementation changes to report, so the main value is disciplined execution of the established flow.
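
The request shape described above can be sketched as a few shell functions. This is a minimal illustration, not the project's actual script: it assumes the values pulled from sec have already been exported as environment variables, and apart from PAPERLESS_NGX_URL the variable names (PAPERLESS_NGX_TOKEN, CF_ACCESS_CLIENT_ID, CF_ACCESS_CLIENT_SECRET) and all function names are hypothetical. Headers are piped to curl with -H @- (available in curl 7.55.0 and later) so that secrets stay out of the process argument list.

```shell
# Sketch only. Assumes the secrets are already in the environment;
# every name except PAPERLESS_NGX_URL is hypothetical.

# Print the four request headers, one per line; fail if a secret is missing.
paperless_headers() {
  [ -n "${PAPERLESS_NGX_TOKEN:-}" ] || { echo "PAPERLESS_NGX_TOKEN is not set" >&2; return 1; }
  [ -n "${CF_ACCESS_CLIENT_ID:-}" ] || { echo "CF_ACCESS_CLIENT_ID is not set" >&2; return 1; }
  [ -n "${CF_ACCESS_CLIENT_SECRET:-}" ] || { echo "CF_ACCESS_CLIENT_SECRET is not set" >&2; return 1; }
  printf 'Authorization: Token %s\n' "$PAPERLESS_NGX_TOKEN"
  printf 'CF-Access-Client-Id: %s\n' "$CF_ACCESS_CLIENT_ID"
  printf 'CF-Access-Client-Secret: %s\n' "$CF_ACCESS_CLIENT_SECRET"
  printf 'Accept: application/json; version=9\n'
}

# Join the base URL and an API path without doubling the slash.
paperless_url() {
  [ -n "${PAPERLESS_NGX_URL:-}" ] || { echo "PAPERLESS_NGX_URL is not set" >&2; return 1; }
  printf '%s%s\n' "${PAPERLESS_NGX_URL%/}" "$1"
}

# GET an API path, e.g. paperless_get /api/documents/
paperless_get() {
  _headers="$(paperless_headers)" || return 1
  _url="$(paperless_url "$1")" || return 1
  printf '%s\n' "$_headers" | curl --fail --silent --show-error -H @- "$_url"
}

# Upload one local file to the post_document endpoint and print the
# response body as-is. File names containing ',' or ';' would need extra
# quoting for curl -F.
paperless_upload() {
  _headers="$(paperless_headers)" || return 1
  _url="$(paperless_url /api/documents/post_document/)" || return 1
  printf '%s\n' "$_headers" | curl --fail --silent --show-error -H @- -F "document=@$1" "$_url"
}
```

With the variables set, paperless_get /api/documents/ would list documents and paperless_upload followed by a local path would submit a file. Both calls depend on the server accepting the pinned version=9 and on the Cloudflare Access credentials being valid.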