# Eldersafe Production Architecture v1

## Scale targets

| Metric | Value |
|--------|-------|
| Members | 1,000,000 |
| Peak concurrent life signals | 5,000 |
| Life signal response time | < 500ms p99 |
| Life signals / year | 365,000,000 (1 per member per day), batched into ~3,650,000 BSV transactions |
| Uptime target | 99.9% (~8.8 h downtime/year, acceptable for a break-glass system) |
| Data sovereignty | Denmark / EU only |

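The annual figures follow from one signal per member per day and the BSV worker's 100-signal batches. A quick arithmetic check of the table:

```python
MEMBERS = 1_000_000
SIGNALS_PER_MEMBER_PER_DAY = 1
SIGNALS_PER_TX = 100          # batch size used by the BSV worker
UPTIME = 0.999

signals_per_year = MEMBERS * SIGNALS_PER_MEMBER_PER_DAY * 365
tx_per_year = signals_per_year // SIGNALS_PER_TX
tx_per_day = tx_per_year // 365
downtime_hours_per_year = (1 - UPTIME) * 365 * 24

print(f"{signals_per_year:,} signals/year")              # 365,000,000
print(f"{tx_per_year:,} BSV tx/year")                    # 3,650,000
print(f"{tx_per_day:,} BSV tx/day")                      # 10,000
print(f"{downtime_hours_per_year:.2f} h downtime/year")  # 8.76
```
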
---

## Architecture Overview

```
                          INTERNET
                             │
                             ▼
         ┌───────────────────────────────────────┐
         │  Caddy v2 (:443/:80)                  │
         │  TLS termination, HTTP/3, rate limit  │
         │                                       │
         │  app.eldersafe.dk  ──┐                │
         │  api.eldersafe.dk  ──┤                │
         │  pay.eldersafe.dk  ──┤                │
         │  mail.eldersafe.dk ──┤ subdomain      │
         │  help.eldersafe.dk ──┤ routing        │
         │  status. ...       ──┤                │
         │  docs. ...         ──┤                │
         │  8-15 subdomains   ──┘                │
         └───────┬───────────────────────────────┘
                 │
     ┌───────────┼───────────────┬────────────────┐
     │           │               │                │
     ▼           ▼               ▼                ▼
┌─────────┐ ┌─────────┐ ┌─────────────┐  ┌──────────────┐
│ TIER 1  │ │ TIER 2  │ │ TIER 3      │  │ TIER 4       │
│ Gunicorn│ │ Gunicorn│ │ Gunicorn    │  │ Caddy static │
│ 8 wkrs  │ │ 4 wkrs  │ │ 3 wkrs      │  │ file server  │
│ :8001   │ │ :8002   │ │ :8003       │  │ no Python    │
│         │ │         │ │             │  │              │
│ app     │ │ pay     │ │ mail help   │  │ status docs  │
│ api     │ │ wallet  │ │ partners*   │  │ blog         │
│ auth    │ │ sponsor │ │             │  │ manifest     │
│ signal  │ │         │ │             │  │              │
│         │ │         │ │             │  │              │
│ 800 MB  │ │ 400 MB  │ │ ~400 MB     │  │ ~0 MB        │
└────┬────┘ └────┬────┘ └──────┬──────┘  └──────┬───────┘
     │           │              │                │
     └───────────┴──────────────┴────────────────┘
                 │
     ┌───────────┼───────────────┐
     ▼           ▼               ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│PostgreSQL│ │ Redis 7  │ │ BSV Worker   │
│17 primary│ │ queue    │ │ background   │
└────┬─────┘ └──────────┘ └──────┬───────┘
     │                           │
     ▼                           ▼
┌──────────┐              ┌──────────────┐
│pg_replica│              │ BSV Network  │
│+ pg_dump │              │ API or node  │
└──────────┘              └──────────────┘
```

### Subdomain Routing

| Subdomain | Tier | Port | Workers | RAM | Purpose |
|-----------|------|------|---------|-----|---------|
| app.eldersafe.dk | 1 | 8001 | 8 | ~800 MB | PWA dashboard, life signals, auth |
| api.eldersafe.dk | 1 | 8001 | shared | — | REST API for PWA + mobile |
| pay.eldersafe.dk | 2 | 8002 | 4 | ~400 MB | Payments, wallet, sponsorship |
| wallet.eldersafe.dk | 2 | 8002 | shared | — | Member balance and transaction history |
| mail.eldersafe.dk | 3 | 8003 | 3 | ~400 MB | Pseudonymous email (@eldersafe.dk) |
| help.eldersafe.dk | 3 | 8003 | shared | — | Support escalation, knowledge base |
| partners*.eldersafe.dk | 3 | 8003 | shared | — | F-Secure, YouSee, etc. stubs |
| status.eldersafe.dk | 4 | static | 0 | ~0 MB | Health page, uptime |
| docs.eldersafe.dk | 4 | static | 0 | ~0 MB | Documentation |
| **Total (8-15 services)** | | | **15** | **~1.6 GB** | |

### Why Tiered?

Tier 1 (critical path) and Tier 2 (payments) are isolated for security and independent scaling. If a bug in the mail service crashes Tier 3, life signals are unaffected. If payment traffic spikes, pay scales independently of app. Tier 4 has zero Python footprint — Caddy serves static files directly.

---

## Component Specs

### 1. Reverse Proxy — Caddy

| Setting | Value |
|---------|-------|
| TLS | Let's Encrypt auto-renew |
| Protocols | HTTP/2, HTTP/3 (QUIC) |
| Rate limit | Per IP (caddy-ratelimit plugin): 5000 req/s burst on Tier 1, 500 req/s on Tiers 2-3; 100/s sustained |
| Health check | `/health` endpoint, 5s interval |
| Static files | `/static/*` served directly by Caddy |
| Max body size | 10 KB (no file uploads needed) |

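```
  # Caddyfile
  # rate_limit is not a standard Caddy directive: it comes from the
  # caddy-ratelimit plugin (github.com/mholt/caddy-ratelimit, built in with
  # xcaddy) and must be given an explicit position in the directive order.
  {
      order rate_limit before basic_auth
  }
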
  # Tier 1 — Critical path (life signals, auth, API)
  app.eldersafe.dk, api.eldersafe.dk {
      reverse_proxy 127.0.0.1:8001 {
          health_uri /health
          health_interval 5s
      }
      request_body {
          max_size 10KB
      }
      rate_limit {
          zone dynamic {
              key {remote_host}
              events 5000
              window 1s
          }
      }
      header {
          X-Frame-Options DENY
          X-Content-Type-Options nosniff
          -Server
      }
  }

  # Tier 2 — Payments (isolated for security)
  pay.eldersafe.dk, wallet.eldersafe.dk {
      reverse_proxy 127.0.0.1:8002 {
          health_uri /health
          health_interval 5s
      }
      request_body {
          max_size 10KB
      }
      rate_limit {
          zone dynamic {
              key {remote_host}
              events 500
              window 1s
          }
      }
      header {
          X-Frame-Options DENY
          X-Content-Type-Options nosniff
          -Server
      }
  }

  # Tier 3 — Support services (mail, help, partners)
  mail.eldersafe.dk, help.eldersafe.dk,
  partner1.eldersafe.dk, partner2.eldersafe.dk {
      reverse_proxy 127.0.0.1:8003 {
          health_uri /health
          health_interval 5s
      }
      request_body {
          max_size 10KB
      }
      rate_limit {
          zone dynamic {
              key {remote_host}
              events 500
              window 1s
          }
      }
      header {
          X-Frame-Options DENY
          X-Content-Type-Options nosniff
          -Server
      }
  }

  # Tier 4 — Static (no Python, Caddy serves files directly)
  status.eldersafe.dk, docs.eldersafe.dk {
      root * /opt/eldersafe/static/
      file_server
  }

  # Root domain — redirect to app
  eldersafe.dk, www.eldersafe.dk, eldersafe.cloud {
      redir https://app.eldersafe.dk{uri} 301
  }
```

### 2. Application Servers — Gunicorn (Tiered)

**Tier 1: Critical Path (app + api)**

| Setting | Value |
|---------|-------|
| Workers | 8 |
| Worker class | gevent (async) |
| Worker connections | 2000 |
| Bind | 127.0.0.1:8001 |

```
# systemd: /etc/systemd/system/eldersafe-tier1.service
ExecStart=/opt/eldersafe/venv/bin/gunicorn \
    --workers 8 --worker-class gevent --worker-connections 2000 \
    --preload --timeout 30 --max-requests 100000 \
    --bind 127.0.0.1:8001 \
    app_tier1:create_app()
```
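
The same Tier 1 flags can live in a Gunicorn config file, which also leaves room for a `post_fork` hook. gevent only pays off if database calls yield: psycopg2 is a C driver that blocks the whole worker (all 2000 greenlets) during a query unless a wait callback is installed, which is what the psycogreen package provides. A sketch, assuming psycogreen is installed and that the Flask app opens its PostgreSQL pool lazily inside each worker rather than at import time (with `--preload`, a pool created at import would be shared by every forked worker, which is unsafe for libpq connections). The file name is hypothetical.

```python
# gunicorn_tier1.conf.py  (hypothetical file name)
# Same settings as the Tier 1 ExecStart flags. Start with, e.g.:
#   gunicorn -c gunicorn_tier1.conf.py 'app_tier1:create_app()'

bind = "127.0.0.1:8001"
workers = 8
worker_class = "gevent"
worker_connections = 2000
preload_app = True          # --preload
timeout = 30
max_requests = 100000


def post_fork(server, worker):
    """Gunicorn server hook: runs in each worker process right after fork.

    Installs psycogreen's wait callback so psycopg2 queries yield to other
    greenlets instead of blocking the whole worker (needs the psycogreen package).
    """
    from psycogreen.gevent import patch_psycopg

    patch_psycopg()
```

Tiers 2 and 3 would use the same pattern with their own `bind`, `workers`, `worker_connections` and `max_requests` values.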

**Tier 2: Payments (pay + wallet)**

| Setting | Value |
|---------|-------|
| Workers | 4 |
| Worker class | gevent |
| Worker connections | 1000 |
| Bind | 127.0.0.1:8002 |

```
# systemd: /etc/systemd/system/eldersafe-tier2.service
ExecStart=/opt/eldersafe/venv/bin/gunicorn \
    --workers 4 --worker-class gevent --worker-connections 1000 \
    --preload --timeout 30 --max-requests 50000 \
    --bind 127.0.0.1:8002 \
    app_tier2:create_app()
```

**Tier 3: Support Services (mail, help, partners)**

| Setting | Value |
|---------|-------|
| Workers | 3 |
| Worker class | gevent |
| Worker connections | 1000 |
| Bind | 127.0.0.1:8003 |

```
# systemd: /etc/systemd/system/eldersafe-tier3.service
ExecStart=/opt/eldersafe/venv/bin/gunicorn \
    --workers 3 --worker-class gevent --worker-connections 1000 \
    --preload --timeout 30 --max-requests 50000 \
    --bind 127.0.0.1:8003 \
    app_tier3:create_app()
```

| Tier | Workers | Bind | RAM | systemd unit |
|------|---------|------|-----|-------------|
| 1 (app, api) | 8 | :8001 | ~800 MB | eldersafe-tier1 |
| 2 (pay, wallet) | 4 | :8002 | ~400 MB | eldersafe-tier2 |
| 3 (mail, help, partners) | 3 | :8003 | ~400 MB | eldersafe-tier3 |
| 4 (static) | 0 | — | ~0 MB | (Caddy file_server) |
| **Total** | **15** | | **~1.6 GB** | |

Each tier runs as an independent systemd service. Restart Tier 3 → Tier 1 is unaffected. Upgrade Tier 2 → life signals continue processing.

### 3. Application — Flask

| Setting | Value |
|---------|-------|
| State | Stateless (no in-memory session state) |
| Secret key | From environment variable |
| JWT expiry | 24 hours |
| Session cookie | HttpOnly, Secure, SameSite=Strict |
| Database | PostgreSQL via psycopg2 connection pool (one pool per worker process; all pools combined stay within the 50-connection limit) |
| Redis | redis-py, connection pool |

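A psycopg2 pool lives inside a single process, so each Gunicorn worker holds its own and the pool maximum multiplies by the worker count (15 workers with max=50 each could open 750 connections). A sketch of splitting the 50-connection budget across the three tiers; reserving 5 connections for the BSV worker and admin sessions is an assumption, and `per_worker_pool_max` is a hypothetical helper. If three connections per worker prove too few for 2000 greenlets, placing PgBouncer in front of PostgreSQL is the usual way to multiplex more clients onto the same server connections.

```python
def per_worker_pool_max(total_budget: int, workers_per_tier: dict[str, int], reserved: int) -> int:
    """Largest pool size per Gunicorn worker such that all pools together fit the budget."""
    total_workers = sum(workers_per_tier.values())
    available = total_budget - reserved          # connections left for Gunicorn workers
    if available < total_workers:
        raise ValueError("budget too small for one connection per worker")
    return available // total_workers


tiers = {"tier1": 8, "tier2": 4, "tier3": 3}     # workers per tier (Gunicorn section)
per_worker = per_worker_pool_max(total_budget=50, workers_per_tier=tiers, reserved=5)
print(per_worker)                                # 3
print(per_worker * sum(tiers.values()))          # 45 connections at most from Gunicorn
```
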
**Endpoints:**

| Method | Path | Auth | Rate limit | Notes |
|--------|------|------|------------|-------|
| GET | `/health` | None | No limit | Returns DB + Redis status |
| POST | `/api/auth/login` | None | 5/min per IP | Member # + email → JWT magic link |
| GET | `/api/auth/verify?token=` | Token | 10/min per IP | Magic link verification |
| POST | `/api/signal` | JWT | 1/10s per member | Life signal. Idempotent per calendar day (one signal per member per day). |
| GET | `/api/member/status` | JWT | 10/min | Balance, signal streak, contract state |
| POST | `/api/sponsor/pay` | JWT | 10/min | Sponsor another member |

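The login and verify endpoints hinge on a signed, expiring token. A stdlib-only sketch of issuing and verifying the magic-link JWT with HS256 (RFC 7519), assuming the `sub` claim carries the internal member id and the secret is the Flask secret key read from the environment; the function names are hypothetical, and a maintained library such as PyJWT is the safer production choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _json_b64(obj: dict) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def issue_token(member_id: int, secret: bytes, ttl_s: int = 24 * 3600,
                now: Optional[int] = None) -> str:
    """HS256 JWT with sub = member id and exp = now + ttl_s (24 h, the JWT expiry setting)."""
    now = int(time.time()) if now is None else now
    signing_input = (_json_b64({"alg": "HS256", "typ": "JWT"}) + "."
                     + _json_b64({"sub": str(member_id), "iat": now, "exp": now + ttl_s}))
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_token(token: str, secret: bytes, now: Optional[int] = None) -> int:
    """Return the member id; raise ValueError on a malformed, forged or expired token."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):   # constant-time compare
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return int(payload["sub"])
```

The login handler would e-mail a link such as `/api/auth/verify?token=<token>`, and the verify handler calls `verify_token` before issuing the session cookie.
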
### 4. Database — PostgreSQL 17

| Setting | Value |
|---------|-------|
| Version | 17 |
| Connection pool | 50 max |
| WAL level | replica (for streaming replication) |
| Backup | pg_dump daily, WAL archiving to off-site |

**Schema (core tables):**

```sql
-- Pseudonymous: never stores names, CPR, addresses, personal email
CREATE TABLE members (
    id              SERIAL PRIMARY KEY,
    member_number   TEXT UNIQUE NOT NULL,     -- Ældre Sagen member number
    email_hash      TEXT UNIQUE NOT NULL,     -- SHA256(lower(email))
    bsv_public_key  TEXT,                      -- derived from salted hash
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    last_signal_at  TIMESTAMPTZ,
    signal_streak   INTEGER DEFAULT 0,
    contract_state  TEXT DEFAULT 'active',    -- active, breach_day_N, terminated
    reward_balance  BIGINT DEFAULT 0,         -- satoshis earned
    sms_escrow      BIGINT DEFAULT 0          -- satoshis reserved for SMS
);

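CREATE TABLE signals (
    id              BIGSERIAL PRIMARY KEY,     -- ~365M rows/year would exhaust SERIAL (INT4) in ~6 years
    member_id       INTEGER NOT NULL REFERENCES members(id),
    channel         TEXT,                      -- app, sms, email
    logged_at       TIMESTAMPTZ DEFAULT NOW(),
    signal_date     DATE NOT NULL DEFAULT (NOW() AT TIME ZONE 'Europe/Copenhagen')::date,
                                               -- Danish calendar day; UNIQUE cannot take DATE(logged_at)
    bsv_txid        TEXT,                      -- populated after batch broadcast
    UNIQUE(member_id, signal_date)             -- one signal per day
);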

CREATE TABLE audit_log (
    id              BIGSERIAL PRIMARY KEY,     -- at least one row per signal, so INT4 would overflow
    member_id       INTEGER,
    event_type      TEXT,                      -- signal, breach, verify, sponsor
    event_data      JSONB,
    bsv_txid        TEXT,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_signals_member_date ON signals(member_id, logged_at);
CREATE INDEX idx_audit_member ON audit_log(member_id, created_at);
CREATE INDEX idx_members_contract ON members(contract_state, last_signal_at);
```
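
The `email_hash` column can be computed in Python as below. Stripping surrounding whitespace before lowercasing is an assumption; the schema comment only specifies `lower(email)`. Because e-mail addresses are guessable, anyone holding the table can test candidate addresses against an unsalted SHA-256; a keyed HMAC with a server-side secret is the usual way to close that gap.

```python
import hashlib


def email_hash(email: str) -> str:
    """Hex SHA-256 of the normalised address, as stored in members.email_hash."""
    normalised = email.strip().lower()   # strip() is an assumption; the schema says lower(email)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


print(email_hash("Alice@Example.DK") == email_hash("alice@example.dk"))   # True
```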

### 5. Queue — Redis 7

| Key | Type | Purpose |
|-----|------|---------|
| `pending_tx` | LIST | Signals waiting for BSV broadcast |
| `rate_limit:{member_id}` | STRING | Per-member rate limiting (INCR + TTL) |
| `session:{token}` | STRING | JWT blacklist / invalidation |

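`rate_limit:{member_id}` follows the fixed-window pattern: INCR the key and set its TTL on the first hit. A sketch using the `/api/signal` limit (1 request per 10 s) as defaults; `allow_signal` and `InMemoryStore` are hypothetical names, and the store is only a local stand-in exposing the two redis-py methods used (`incr`, `expire`), so a `redis.Redis` client can be passed in its place.

```python
from typing import Protocol


class CounterStore(Protocol):
    """The two redis-py methods the limiter uses (redis.Redis provides both)."""
    def incr(self, name: str, amount: int = 1) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


def allow_signal(r: CounterStore, member_id: int, limit: int = 1, window_s: int = 10) -> bool:
    """Fixed-window limiter: at most `limit` requests per `window_s` seconds per member."""
    key = f"rate_limit:{member_id}"
    count = r.incr(key)          # atomic; creates the key at 1 if missing
    if count == 1:
        r.expire(key, window_s)  # first hit opens the window
    return count <= limit


class InMemoryStore:
    """Local stand-in for tests: counts like INCR, records TTLs but never expires keys."""
    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.ttls: dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self.counts[name] = self.counts.get(name, 0) + amount
        return self.counts[name]

    def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True


store = InMemoryStore()
print(allow_signal(store, 42))   # True: first signal in the window
print(allow_signal(store, 42))   # False: second request within 10 s
print(allow_signal(store, 7))    # True: other members are independent
```

If the process dies between INCR and EXPIRE, the key never expires and the member stays blocked; on Redis 7, sending both commands in one MULTI/EXEC with `EXPIRE ... NX` closes that gap.
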
### 6. BSV Worker

Background process (systemd service or supervisor):

```python
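import json
import time

BATCH_SIZE = 100
IDLE_POLL_S = 60


def run_worker(r, conn):
    """r: redis-py client; conn: psycopg2 connection.
    build_batched_bsv_tx / broadcast_to_bsv are the BSV helpers (defined elsewhere)."""
    while True:
        # Drain continuously; only sleep 60 s when the queue is empty
        raw = r.lpop('pending_tx', BATCH_SIZE)   # LPOP with a count needs Redis >= 6.2
        if not raw:
            time.sleep(IDLE_POLL_S)
            continue
        signals = [json.loads(item) for item in raw]   # {member_id, signal_id, timestamp}

        # Build one batched BSV transaction; retry the broadcast with
        # exponential backoff (1 s, 2 s, 4 s ... capped at 5 min)
        tx_hex = build_batched_bsv_tx(signals)
        delay = 1
        while True:
            try:
                txid = broadcast_to_bsv(tx_hex)
                break
            except Exception:   # network/API error: batch stays in memory, retry
                time.sleep(delay)
                delay = min(delay * 2, 300)

        # Record txid and credit rewards atomically (commit on success, rollback on error)
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE signals SET bsv_txid = %s WHERE id = ANY(%s)",
                (txid, [s['signal_id'] for s in signals]),
            )
            for s in signals:
                cur.execute(
                    "UPDATE members SET reward_balance = reward_balance + 1 WHERE id = %s",
                    (s['member_id'],),
                )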
```
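
At full scale `pending_tx` receives roughly 1,000,000 signals a day, which is why the worker drains batches back-to-back and only sleeps when the queue is empty. A quick check using the scale targets and the 100-signal batch size:

```python
SIGNALS_PER_DAY = 1_000_000     # 1M members x 1 signal/day
BATCH_SIZE = 100
PEAK_BURST = 5_000              # peak concurrent life signals

batches_needed_per_day = SIGNALS_PER_DAY // BATCH_SIZE       # about 7 batches per minute
one_batch_per_minute_capacity = 24 * 60 * BATCH_SIZE          # signals/day at 1 batch per 60 s
batches_per_peak_burst = -(-PEAK_BURST // BATCH_SIZE)         # ceiling division

print(batches_needed_per_day)                                 # 10000
print(one_batch_per_minute_capacity < SIGNALS_PER_DAY)        # True: would fall behind
print(batches_per_peak_burst)                                 # 50
```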

### 7. BSV Network Integration

| Option | Pros | Cons |
|--------|------|------|
| **Whatsonchain API** (paid tier) | No node to maintain, reliable | External dependency, API limits |
| **Own BSV node** | Full sovereignty, no limits | ~200 GB disk, maintenance, sync time |
| **Hybrid** | WOC for broadcast, own node for verification | Best of both, moderate ops |

**Recommendation:** Start with Whatsonchain paid tier for pilot and early scale. Add own BSV node when transaction volume justifies it (~$200/month for a node server).

### 8. Monitoring

| What | How |
|------|-----|
| Uptime | Caddy health check + external ping (healthchecks.io or self-hosted) |
| Signal processing lag | `pending_tx` list length in Redis (alert if > 1000) |
| Failed BSV broadcasts | Worker failure counter in PostgreSQL |
| DB replication lag | `pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)` from `pg_stat_replication`, queried on the primary (bytes) |
| Disk space | Standard system monitoring |
| Error rate | Flask error logging to file + structured JSON |

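Two of the checks above, sketched in Python. The 1000-entry `pending_tx` threshold comes from the table; the 64 MiB replication-lag threshold and the function names are hypothetical placeholders. The lag query runs on the primary and uses PostgreSQL 10+ function names. An empty result from `pg_stat_replication` means no replica is connected at all, which deserves its own alert.

```python
REPLICATION_LAG_SQL = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication
"""


def check_alerts(pending_tx_len: int, lag_bytes_by_replica: dict[str, int],
                 queue_threshold: int = 1000,
                 lag_threshold_bytes: int = 64 * 1024 * 1024) -> list[str]:
    """Return human-readable alerts for the queue backlog and replica lag."""
    alerts = []
    if pending_tx_len > queue_threshold:
        alerts.append(f"pending_tx backlog {pending_tx_len} > {queue_threshold}")
    for name, lag in lag_bytes_by_replica.items():
        if lag > lag_threshold_bytes:
            alerts.append(f"replica {name} lag {lag} bytes > {lag_threshold_bytes}")
    return alerts


def collect_alerts(r, cur) -> list[str]:
    """r: redis-py client; cur: psycopg2 cursor connected to the primary."""
    cur.execute(REPLICATION_LAG_SQL)
    lags = {name: int(lag or 0) for name, lag in cur.fetchall()}   # replay_lsn can be NULL
    return check_alerts(r.llen("pending_tx"), lags)
```
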
---

## Production Server Spec

| Component | Pilot (100 members) | Scale (1M members, 5K concurrent) |
|-----------|---------------------|----------------------------------|
| CPU | 2 vCPU | 8 vCPU |
| RAM | 8 GB | 16 GB |
| Disk | 100 GB SSD | 500 GB NVMe |
| OS | Ubuntu 24.04 LTS | Ubuntu 24.04 LTS |
| Provider | This VPS | Hetzner AX52 or equivalent |
| Gunicorn RAM | ~200 MB (1 instance) | ~1,600 MB (3 tiered instances) |
| PostgreSQL RAM | ~200 MB | ~2,000 MB (shared_buffers) |
| Redis RAM | ~50 MB | ~500 MB |
| Caddy RAM | ~50 MB | ~100 MB |
| BSV Worker RAM | ~100 MB | ~200 MB |
| OS overhead | ~800 MB | ~1,000 MB |
| **Actual used** | **~1.4 GB** | **~5.4 GB** |
| **Headroom** | **6.6 GB** | **~10.6 GB** |

**Estimated monthly hosting: 500-700 DKK**

---

## Data Flow: Life Signal

```
Step 1: Member taps "I'm here" (app/SMS/email)
        │
Step 2: Caddy terminates TLS, forwards to Gunicorn
        │
Step 3: Flask validates JWT, checks rate limit (Redis INCR)
        │
Step 4: INSERT INTO signals (member_id, channel, logged_at)
        │  ON CONFLICT (member_id, signal_date) DO NOTHING  ← idempotent
        │  RETURNING id  (no row returned = already signalled today: skip 5-6)
        │
Step 5: RPUSH pending_tx {member_id, signal_id, timestamp}
        │
Step 6: UPDATE members SET last_signal_at = NOW(),
        │  signal_streak = signal_streak + 1
        │
Step 7: Return 200 {signal_id, streak, balance}
        │  Total: ~10-20ms

        ── ASYNCHRONOUS ──

Step 8: BSV Worker LPOPs up to 100 entries from pending_tx (polls every 60s when idle)
Step 9: Build batched BSV transaction (up to 100 signals)
Step 10: Broadcast to BSV network (retry with exponential backoff)
Step 11: UPDATE signals SET bsv_txid = ...
Step 12: UPDATE members SET reward_balance = reward_balance + 1
```
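
Step 6 only ever increments `signal_streak`. A pure-function sketch of a fuller streak rule, assuming streaks count consecutive Danish calendar days and that a missed day resets the streak to 1 (the reset rule is an assumption; missed days may instead be handled through `contract_state`). `next_streak` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Optional


def next_streak(last_signal_day: Optional[date], today: date, streak: int) -> int:
    """Streak after a signal on `today` (both days as Danish calendar dates)."""
    if last_signal_day == today:
        return streak                 # second signal today: no change (idempotent)
    if last_signal_day == today - timedelta(days=1):
        return streak + 1             # signalled yesterday: streak continues
    return 1                          # first signal ever, or a missed day (assumed reset)


d = date(2025, 3, 10)
print(next_streak(None, d, 0))                     # 1
print(next_streak(d - timedelta(days=1), d, 6))    # 7
print(next_streak(d, d, 7))                        # 7
print(next_streak(d - timedelta(days=3), d, 7))    # 1
```
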

---

## Recovery & Failover

| Failure | Impact | Response |
|---------|--------|----------|
| PostgreSQL down | No new signals processed | Gunicorn returns 503. Worker pauses. Restart DB. |
| Redis down | BSV broadcast delayed | Signals written to PostgreSQL. Queue lost on restart. Worker catches up from DB. |
| Gunicorn crash | No web traffic | systemd auto-restarts. Caddy returns 502 briefly. |
| Caddy crash | No HTTPS | systemd auto-restarts. ~2s outage. |
| BSV API down | No on-chain confirmation | Worker retries with exponential backoff. Signals stored in DB. |
| Whole server down | Full outage | DB replica can be promoted. DNS failover to standby. |

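The "Worker catches up from DB" path after a Redis restart can be sketched as below: select signals that were never broadcast and push them back onto `pending_tx` in the step 5 shape. The 10-minute cut-off (to skip signals still legitimately queued), the batch limit, the ISO 8601 timestamp format and the function names are assumptions. Run it only when the Redis queue is known to be empty, otherwise signals get queued twice; a partial index such as `CREATE INDEX ON signals (id) WHERE bsv_txid IS NULL` keeps the query cheap as `signals` grows.

```python
import json

# Signals never broadcast and older than the cut-off, so entries still
# waiting in Redis are skipped.
UNBROADCAST_SQL = """
SELECT id, member_id, logged_at
FROM signals
WHERE bsv_txid IS NULL
  AND logged_at < NOW() - INTERVAL '10 minutes'
ORDER BY id
LIMIT %s
"""


def to_queue_payloads(rows) -> list[str]:
    """Map (id, member_id, logged_at) rows to the step 5 JSON shape."""
    return [
        json.dumps({"member_id": member_id, "signal_id": signal_id,
                    "timestamp": logged_at.isoformat()})
        for signal_id, member_id, logged_at in rows
    ]


def requeue_unbroadcast(cur, r, limit: int = 10_000) -> int:
    """Push un-broadcast signals back onto pending_tx; returns how many were queued.

    cur: psycopg2 cursor; r: redis-py client.
    """
    cur.execute(UNBROADCAST_SQL, (limit,))
    payloads = to_queue_payloads(cur.fetchall())
    if payloads:
        r.rpush("pending_tx", *payloads)
    return len(payloads)
```
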
---

## Cost Summary (Operational Pool)

| Line | Monthly | Annual | Notes |
|------|---------|--------|-------|
| Production server (Hetzner AX52, 16 GB, 8 cores) | 600 DKK | 7,200 DKK | Dedicated bare metal, no noisy neighbors |
| BSV API (Whatsonchain paid tier) | 300 DKK | 3,600 DKK | Guaranteed throughput, SLA-backed |
| SMS gateway (GatewayAPI.dk, paid) | 100 DKK | 1,200 DKK | Danish provider, per-SMS pricing |
| Email delivery (SendGrid paid, 100K emails/mo) | 150 DKK | 1,800 DKK | Paid plan with dedicated IP; no dependence on a free tier |
| Domains + DNS | 40 DKK | 480 DKK | eldersafe.dk, eldersafe.cloud, DNS management |
| Off-site backups (Hetzner Storage Box 500 GB) | 50 DKK | 600 DKK | Encrypted, geographically separate |
| Monitoring (Better Uptime, 50 monitors + status page) | 200 DKK | 2,400 DKK | Public status page, on-call alerts, 30s checks |
| PostgreSQL (self-hosted, but hardware cost included above) | — | — | Covered by server |
| Redis (self-hosted, included above) | — | — | Covered by server |
| Caddy + Let's Encrypt (self-hosted, included above) | — | — | Covered by server |
| **Total infrastructure** | **~1,440 DKK** | **~17,280 DKK** | |

From the 21.5M DKK operational pool: **0.08%**

The remaining 21,482,720 DKK covers engineering, compliance, security audits, insurance, DPO, and the notary retainer (already allocated separately at 5 DKK/member = 5M DKK/year).

### Why nothing is free

Every line item corresponds to a real vendor contract with an invoice, an SLA, and an audit trail. "Free" services disappear, change terms, or monetize your data. Eldersafe members pay 50 DKK/year; they're entitled to infrastructure that is contractually accountable. A Better Uptime status page means any member can verify uptime without trusting Eldersafe's word. A paid SendGrid account with a dedicated IP means Eldersafe's sender reputation is its own rather than shared with other senders, which protects deliverability (no provider can guarantee inbox delivery).
