Multi-Tenancy — Initial Setup Runbook

Ticket: TQ-115 — Phase 0 / Phase 1 deliverable
Audience: Ops. Run this once per Keycloak / PostgreSQL environment.
Tenant onboarding (per-tenant): doc/operations/tenant-provisioning.md

This runbook covers the foundational install of TQPro multi-tenancy. The main path is greenfield — a clean stack with no pre-existing TQPro data. If you're migrating an existing single-tenant install, do the greenfield steps below and then run Appendix A to import the existing tenant.


0. Execution checklist

Tick each item as you go. Every row points to the section that explains it. Phases 1–8 are the greenfield main path. Appendices A and B are conditional.

Phase 0 — Prep

  • [ ] Verify prerequisites (Keycloak admin, PG rights, sudo, tooling, schema dump source) — §1
  • [ ] Decide path: greenfield (run §2–§8) or in-place migration (run §2–§8, then Appendix A) — document header
  • [ ] Understand tourlinq.properties (API-side) — location, scope, lifecycle — §1A
  • [ ] Understand tqpro-platform.properties (orchestration-side) — three sections, shared-keys policy — §1B
  • [ ] Skim the deployment layout reference — what file lives on which host — §1C
  • [ ] On API hosts: create TLINQ_HOME, install an initial tourlinq.properties, set mode 0640 owner tqpro-svc — §1A — Initial setup
  • [ ] Export TLINQ_HOME in the tqpro-api systemd unit (no need on the orchestration host) — §1A — Initial setup

Phase 1 — Keycloak platform service account (§2)

  • [ ] Create tqpro-platform-admin client in master realm — §2 steps 1–3
  • [ ] Copy and save the client secret — §2 step 4
  • [ ] Assign only the create-realm realm role to the service account — §2 step 5
  • [ ] Run verification curls (create test realm → 201, read master users → 403, delete test realm → 204) — §2 Verification curls

Phase 2 — Platform DB (§3)

  • [ ] Create tqpro_platform Postgres role and tqplatform database — §3
  • [ ] Apply 0001-tqplatform-schema.sql — §3
  • [ ] Verify tenant is empty and schema_migrations has the bootstrap row — §3
  • [ ] Record platform.db.* credentials in tourlinq.properties (API host) AND mirror in tqpro-platform.properties Section B (orchestration host) — §3 / §1B

Phase 3 — PostgreSQL tuning (§4)

  • [ ] Raise max_connections, shared_buffers, work_mem and restart PostgreSQL — §4
  • [ ] Set tenant.db.host / tenant.db.port in tourlinq.properties (API) AND mirror in tqpro-platform.properties Section B — §4
  • [ ] Set template.db.name in tqpro-platform.properties Section A (orchestration only) — §4

Phase 4 — Encryption key (§5)

  • [ ] Generate 64-hex-char TQPRO_ENCRYPTION_KEY — §5 step 5.1
  • [ ] Install /etc/tqpro/tqpro.env with correct ownership and 0600 perms — §5 step 5.2
  • [ ] Wire the env file into the tqpro-api systemd unit (EnvironmentFile=) — §5
  • [ ] Store a sealed offline backup of the key — §5

Phase 5 — Template DB (§6)

  • [ ] Produce pg_dump --schema-only from a current TQPro source DB — §6.1
  • [ ] Run scripts/db/bootstrap-template-db.sh --schema-file <dump> — §6.2
  • [ ] Verify nts.booking count is 0 and schema_migrations matches the file count in config/db-changes/ — §6.2
  • [ ] (Ongoing) Run scripts/db/apply-tenant-migrations.sh after every deploy shipping new migrations — §6.3

Phase 6 — nginx three-tier (§7)

Order matters: §7.1.4 SSHes to the gateway, so §7.2 must finish before it.

  • [ ] Orchestration host: install the toolkit (pinned repo checkout) at /opt/tqpro-orchestration/ — §7.1.1
  • [ ] Orchestration host (host-wide): create tqpro-ops group, add operators, install tqpro-platform.properties at /etc/tqpro/platform/, generate the shared tqpro-deploy SSH key under /etc/tqpro/ssh/, seed /etc/ssh/ssh_known_hosts — §7.1.2
  • [ ] Orchestration host (per operator): set up ~/.pgpass, point ~/.ssh/config at the shared deploy key (no TLINQ_HOME needed) — §7.1.3
  • [ ] Gateway (DMZ): install nginx + certbot, create tqpro-deploy user, install its SSH key, drop the NOPASSWD sudoers file — §7.2
  • [ ] Each web host: install nginx, create tqpro-deploy, install SSH key, drop sudoers, deploy SPA bundle to /opt/tqpro/tqweb-adm — §7.3
  • [ ] Orchestration host: run scripts/platform/render-upstreams.sh to render the shared upstreams file (depends on §7.2) — §7.1.4
  • [ ] Set upstream.api, upstream.web, gateway.deploy.host, web.deploy.hosts in tqpro-platform.properties Section A — §7.4
  • [ ] Confirm no per-tenant vhosts are expected until tenant 1 is onboarded — §7.5
  • [ ] (Dev only) Add /etc/hosts or dnsmasq entries for local tenants — §7.6

Phase 7 — Smoke test (§8)

  • [ ] Prereq: API server installed and configured per doc/deployment/bare-metal-deployment.md — §8 prereq
  • [ ] Pre-flight: TLINQ_HOME readable by tqpro-svc, platform DB reachable, tqpro.env wired into systemd unit, ports free — §8.0
  • [ ] Restart tqpro-api, confirm log shows TenantRegistry loaded 0 active tenant(s) — §8.1
  • [ ] Confirm /tlinq-api/auth/config returns 404 unknown-tenant with no tenants present — §8 step 8.2
  • [ ] Confirm the platform service account can issue a token — §8 step 8.3

Phase 8 — Proceed to tenant onboarding

  • [ ] Continue with the per-tenant runbook doc/operations/tenant-provisioning.md — §8 / §10

Appendix A — Only if migrating an existing single-tenant install

  • [ ] A.1 Add manager + finance realm roles and create the tqpro-admin-api client in the existing tqpro-adm realm — §A.1
  • [ ] A.2 Apply 0072-schema-migrations-ledger.sql and run bootstrap-schema-migrations.sh tlinq — §A.2
  • [ ] A.3 Dump existing tlinq schema and bootstrap the template DB from it — §A.3
  • [ ] A.4 Insert the seed tenant row into tqplatform.tenant and refresh the registry — §A.4
  • [ ] A.5 Replace legacy nginx-gw.conf with a per-tenant vhost, disable the old block, reload nginx — §A.5

Appendix B — Only if running in a closed lab without public Let's Encrypt

  • [ ] B.1.1 Pick CA hostname; add /etc/hosts entry on step-ca host if lab DNS doesn't have it — §B.1.1
  • [ ] B.1.2 Pre-flight: hostname resolves, port 8443 free — §B.1.2
  • [ ] B.1.3 dpkg -i step-cli/step-ca; step ca init --acme with password files — §B.1.3
  • [ ] B.1.4 Move ~/.step → /etc/step-ca/, rewrite absolute paths in ca.json — §B.1.4
  • [ ] B.1.5 Install systemd unit, enable + start step-ca — §B.1.5
  • [ ] B.1.6 Smoke test: /health and /acme/acme/directory respond — §B.1.6
  • [ ] B.1.7 Remove ~/.step, back up ca-pass.txt and root fingerprint — §B.1.7
  • [ ] B.2 Add /etc/hosts entry on gateway, push root cert, install with update-ca-certificates, verify curl https://<ca-host>:8443/health without -k — §B.2
  • [ ] B.3 Set certbot.acme.server in tqpro-platform.properties Section A on the orchestration host (single edit, no API-side change) — §B.3
  • [ ] B.4 Per dev machine: install root in OS trust store, install in Firefox separately if used, add /etc/hosts entries — §B.4
  • [ ] B.5 Provision a lab tenant and confirm step-ca issued the cert — §B.5

Troubleshooting reference

  • Common issues: TLINQ_HOME unset, certbot HTTP-01 failure, Keycloak 401 after realm create, template-bootstrap schema path, migrations failing on tenants but not template — §9
  • Lab / step-ca–specific failures — §B.8

1. Prerequisites checklist

Before starting, make sure you have:

  • Admin access to the Keycloak master realm
  • A PostgreSQL role with CREATE DATABASE and max_connections tuning rights
  • sudo on the gateway host (for nginx + certbot)
  • python3 + openssl for key generation
  • A pg_dump --schema-only file of the tenant schema, produced from any current TQPro source DB (see §6.1). Rebuilding the schema by re-running early migrations against an empty DB is not safe: the early migrations were themselves reverse-engineered and may not faithfully reproduce the production schema. A verified dump is the only trusted source of truth.
  • The TQPro repo checked out where you'll run scripts
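
A quick preflight can confirm that the tooling from this list is on PATH before you start. The sketch below is illustrative only: missing_tools is a hypothetical helper, not part of the TQPro toolkit, and the tool list in the usage comment (psql, pg_dump and curl on top of python3 and openssl) is an assumption based on the commands this runbook uses.

```shell
# Print every named command that is not found on PATH.
# Returns 0 only when nothing is missing.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (tool list is an assumption, adjust to your hosts):
#   missing_tools python3 openssl psql pg_dump curl || echo "install the tools listed above"
```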

1A. The tourlinq.properties configuration file

tourlinq.properties is the API-side configuration file, read by the Java API server at startup. It carries application-level keys (mail, payment gateway, AI, third-party APIs) plus the platform / tenant DB connection details the API needs to wire up PlatformDbConfig and TenantAwareDBSession.

Orchestration-side configuration lives in a separate file: tqpro-platform.properties. See §1B for that one. The provisioning shell scripts no longer read tourlinq.properties at all — splitting the two means the orchestration host doesn't need $TLINQ_HOME and there's no "edit both copies" risk.

Location

The file lives at:

$TLINQ_HOME/tourlinq.properties

TLINQ_HOME is the environment variable that anchors the API host's config layout: point it at the directory that holds tourlinq.properties, tlinqapi.properties, the entity XMLs, and the properties.d/ override directory.

Host | Typical TLINQ_HOME
Production API host (bare-metal, recommended) | /var/tqpro/api — symlink to the current release; tourlinq.properties inside is itself a symlink to /var/tqpro/conf/tourlinq.properties so upgrades don't overwrite config. See doc/deployment/bare-metal-deployment.md.
Small single-host install | /etc/tqpro/config — file lives directly, no symlink layer
Orchestration host (build server / dedicated ops box) | N/A — orchestration host does not run the API and does not need $TLINQ_HOME. It uses tqpro-platform.properties instead; see §1B.
Developer checkout | <repo>/config — the tourlinq.properties committed to the repo is a dev template

Key files next to tourlinq.properties in the same directory:

File | Purpose
tourlinq.properties | Application settings, DB connections, integrations
tlinqapi.properties | Jetty / API-server settings (HTTP/HTTPS ports, auth, dev-mode). Separate file — don't mix app config and server config
tourlinq-config.xml | Entity + transformer config (references entities/*.xml via XInclude)
api-roles.properties | Endpoint → role mappings
properties.d/*.properties | Override files merged on top (see below)

Who reads it

  • Java API server only — read once at startup by the AppConfig singleton (tqcommon/…/util/AppConfig.java) and cached for the JVM lifetime. Consumed by PlatformDbConfig, TenantAwareDBSession, facades, renderers, and anything calling AppConfig.getInstance().getProp(...).
  • Provisioning shell scripts: NOT read. Scripts read tqpro-platform.properties instead — see §1B.

File format

Standard Java Properties format — UTF-8, key=value, # for comments:

# Section header comment
platform.db.url=jdbc:postgresql://localhost:5432/tqplatform
platform.db.user=tqpro_platform
platform.db.pass=s3cret

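Whitespace around = is trimmed by Java. Trailing whitespace after a value is not trimmed by java.util.Properties, so avoid it, especially on password lines.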

What's in it — logical sections

Block | Keys | Set up in
Company / content | company.code, content.directory, content.cdn-prefix | Legacy
Mail | mail.server, mail.user, mail.password, mail.usetls | Legacy
Payment gateway | pgw.class, telr.*, pgw.callback-base | Legacy
Third-party APIs | goglobal.api.password, tiqets.api.key, rapidapi.visa.key | Legacy
AI | ai.provider, ai.api.key, ai.model, ai.prompt.file | ai-outline-admin-guide.md
PDF / branding | tqpro.company.name, tqpro.company.logo.url | Legacy
Platform registry DB | platform.db.url, platform.db.user, platform.db.pass | §3; also mirror in §1B Section B
Tenant DB host | tenant.db.host, tenant.db.port | §4; also mirror in §1B Section B

Override files — properties.d/

On startup, AppConfig scans $TLINQ_HOME/properties.d/ for *.properties files, loads them in alphabetical order, and merges them on top of tourlinq.properties. Later files override earlier ones and the base file.
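
To preview which value wins for a key before restarting the API, the merge order described above can be imitated from a shell. This is a minimal sketch, not AppConfig itself: effective_prop is a hypothetical helper, it assumes "alphabetical" means plain byte order, and it ignores Java-specific syntax such as line continuations and the ":" separator.

```shell
# Print the value a key would likely end up with: base file first,
# then properties.d/*.properties in sorted order, last match wins.
# The body runs in a subshell so the LC_ALL change does not leak.
effective_prop() (
    key="$1"
    home="$2"
    LC_ALL=C; export LC_ALL     # assumption: byte-order sort of file names
    set -- "$home/tourlinq.properties"
    for file in "$home"/properties.d/*.properties; do
        # An unmatched glob stays literal and is skipped here.
        if [ -r "$file" ]; then set -- "$@" "$file"; fi
    done
    awk -v k="$key" '
        /^[ \t]*[#!]/ { next }              # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                value = substr($0, eq + 1)
                sub(/^[ \t]+/, "", value)   # drop whitespace after "="
            }
        }
        END { print value }
    ' "$@"
)

# Usage: effective_prop mail.server "$TLINQ_HOME"
```

Because the result depends only on file order, renaming an override file (for example prefixing it with a number) is enough to change which one wins.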

Existing override files in the repo:

  • properties.d/messaging.properties — Twilio SMS/WhatsApp, broadcast settings
  • properties.d/erp-booking.properties — Odoo ERP channel + default product mappings

Use properties.d/ for:

  • Module-scoped configuration you want to add or remove as a unit
  • Host-specific overrides (staging vs prod) kept outside the base file
  • Secrets kept off-repo (e.g. secrets.properties managed by your secrets tooling, dropped onto the host separately)

properties.d/ is read only by the API. The orchestration scripts read tqpro-platform.properties (single file, no overrides) — see §1B.

Initial setup (greenfield)

  1. Create the config directory on the target host, owned by the service user:
id tqpro-svc >/dev/null 2>&1 || sudo useradd -r -s /usr/sbin/nologin tqpro-svc   # create only if not already present
sudo install -d -m 0755 -o tqpro-svc -g tqpro-svc /etc/tqpro/config
sudo install -d -m 0755 -o tqpro-svc -g tqpro-svc /etc/tqpro/config/properties.d
  2. Copy the template tourlinq.properties from your release bundle (or from the repo's config/tourlinq.properties) into TLINQ_HOME:
sudo install -m 0640 -o tqpro-svc -g tqpro-svc \
    tourlinq.properties /etc/tqpro/config/tourlinq.properties

Mode 0640 — the file holds secrets and must not be world-readable.

  3. Do the same for tlinqapi.properties, tourlinq-config.xml, api-roles.properties, the entities/ tree, and anything else the API server needs at startup. In bare-metal deployments this is handled by the release process (see doc/deployment/bare-metal-deployment.md §8.3 for the canonical TLINQ_HOME layout).

  4. Edit tourlinq.properties and fill in the blocks that this runbook assigns values for. Every section below uses property references of the form platform.db.url=…; open the file now in an editor so you can fill them in as you go.

  5. Export TLINQ_HOME for every consumer on the API host:

# API server (systemd unit file, see §5 for the full env-file pattern):
#   Environment=TLINQ_HOME=/etc/tqpro/config

# Ops shell on the API host (optional; the provisioning scripts read
# tqpro-platform.properties and do not need TLINQ_HOME):
echo 'export TLINQ_HOME=/etc/tqpro/config' | sudo tee -a /etc/profile.d/tqpro.sh >/dev/null
sudo chmod 0644 /etc/profile.d/tqpro.sh

Lifecycle — when to edit, when to restart

Change | What to do
Edit a key read by the API server (platform.db.*, mail.*, ai.*, etc.) | Save the file, then systemctl restart tqpro-api. AppConfig is a static singleton and caches values for the JVM lifetime — there is no live-reload endpoint.
Edit a key read only by shell scripts (upstream.*, web.deploy.hosts, gateway.deploy.*, certbot.acme.server) | These keys live in tqpro-platform.properties (§1B), not in this file. No API restart; the next script invocation picks up the change. For upstream.* edits, re-run scripts/platform/render-upstreams.sh to push the updated pools to the gateway.
Add / remove a file in properties.d/ | Restart the API server. Shell scripts ignore this directory.
Rotate a secret (SMTP password, API key, DB password) | Edit the value in place, restart the API. If the file is replicated across hosts, push the updated file first and restart in a rolling fashion.
Onboard a new tenant | No property edit needed. The tenant row is inserted into tqplatform.tenant by tenant-provision.sh, not into this file.
Scale the API or web tier (add a host) | Append the new host:port to upstream.api or upstream.web in tqpro-platform.properties, then re-run scripts/platform/render-upstreams.sh. No per-tenant changes, no API restart — the gateway re-renders the shared upstreams file.
Deploy new tenant schema migrations | No property edit; scripts/db/apply-tenant-migrations.sh covers the template + every ACTIVE tenant DB.
Move to a lab / closed network | Set certbot.acme.server in tqpro-platform.properties Section A to your step-ca URL (§B.3). No other keys change.

Permissions & secrets

  • Mode 0640, owner tqpro-svc. Never world-readable — the file carries SMTP passwords, API keys, DB passwords, payment-gateway auth keys, and similar.
  • If multiple human operators need to read it, create a tqpro-ops group, chown tqpro-svc:tqpro-ops, and add operator accounts to the group.
  • Never commit environment-specific values to the repo. The config/tourlinq.properties in Git is a dev template; production values live on the host.
  • The one secret that is not in this file by design is TQPRO_ENCRYPTION_KEY — it lives in /etc/tqpro/tqpro.env and is read from the process environment (see §5). This isolation means a leak of tourlinq.properties does not also leak the master key used to decrypt tqplatform.tenant secrets.
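
The modes listed above are easy to verify from a shell. The following is a small sketch, assuming GNU coreutils stat (the -c option) as found on typical Linux hosts; check_mode is a hypothetical helper. An owner check could be added the same way with the %U format.

```shell
# Compare a file's octal mode with the expected one.
# Prints OK on stdout, FAIL on stderr; returns 2 if the file cannot be stat'ed.
check_mode() {
    local file="$1" want="$2" mode
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" != "$want" ]; then
        echo "FAIL $file: mode $mode, expected $want" >&2
        return 1
    fi
    echo "OK $file: mode $mode"
}

# Usage:
#   check_mode "$TLINQ_HOME/tourlinq.properties" 640
#   check_mode /etc/tqpro/tqpro.env 600
```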

Backup

Back up TLINQ_HOME as a unit. At minimum:

  • tourlinq.properties
  • tlinqapi.properties
  • properties.d/*.properties
  • any customized entities/*.xml

Back it up alongside /etc/tqpro/tqpro.env — together they are the full host-side config state. Add the tqplatform database and every tenant DB, and you have a complete, restorable deployment.
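
One way to capture TLINQ_HOME as a unit is a single tar archive. This is a sketch under assumptions, not a prescribed backup procedure: backup_tlinq_home is a hypothetical helper, and the -h flag is used because the production layout described in this section keeps the real files behind symlinks.

```shell
# Archive the whole TLINQ_HOME tree into one gzip-compressed tarball.
#   $1 = TLINQ_HOME directory, $2 = output archive (must live outside $1)
backup_tlinq_home() {
    local home="$1" out="$2"
    (
        umask 077    # the archive holds secrets; keep it private to the caller
        # -h follows symlinks so the archive stores file contents, not links.
        tar -czhf "$out" -C "$home" .
    )
}

# Usage: backup_tlinq_home "$TLINQ_HOME" "/var/backups/tlinq-home-$(date +%F).tgz"
```

/etc/tqpro/tqpro.env is deliberately left out here; keeping the encryption key in a separate, sealed backup preserves the isolation described under Permissions & secrets.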


1B. The tqpro-platform.properties configuration file

tqpro-platform.properties is the orchestration-side configuration file, read by the tenant provisioning shell scripts. It carries the keys those scripts need: tenant DB topology, the nginx upstream pools, SSH deploy targets, certbot's ACME server URL, the platform domain, the platform API URL, and the Keycloak base URL.

The Java API does NOT read this file. Some keys (DB connection details, URLs) are intentionally duplicated between this file and tourlinq.properties / tlinqapi.properties on the API host — see the Shared keys section below.

Location

The file lives at:

/etc/tqpro/platform/tqpro-platform.properties

Override the path via env var: TQPRO_PLATFORM_CONFIG.

Host | Has this file?
Production API host | No — API host doesn't run the scripts. Has tourlinq.properties instead (§1A).
Orchestration host (build server / dedicated ops box) | Yes — installed at /etc/tqpro/platform/tqpro-platform.properties owned root:tqpro-ops 0640. No $TLINQ_HOME is required on this host.
Single-host dev (orch + API on one box) | Both files coexist — different paths, different filenames, no collision. The shared keys in Section B/C must agree between the two files.

Who reads it

Read on every invocation by two groups of shell scripts via plain grep against $TQPRO_PLATFORM_CONFIG.

Orchestration scripts (scripts/platform/) — read most of the file. Live on the orchestration host because they need the SSH deploy keys, nginx vhost templates, and the platform API URL:

  • scripts/platform/tenant-provision.sh
  • scripts/platform/tenant-rollback.sh
  • scripts/platform/render-upstreams.sh

DB-management scripts (scripts/db/) — read only template.db.name from this file (everything else comes from libpq env / ~/.pgpass). Runnable from anywhere with PostgreSQL client tooling — orchestration host, build server, DBA laptop, API host post-deploy:

  • scripts/db/bootstrap-template-db.sh
  • scripts/db/bootstrap-schema-migrations.sh
  • scripts/db/apply-tenant-migrations.sh
  • scripts/db/apply-platform-migrations.sh
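
For ad-hoc checks it helps to read a key the same way, with grep against $TQPRO_PLATFORM_CONFIG. The helper below is a hypothetical sketch and not the function the scripts actually use; it assumes keys start in column one with no spaces around "=".

```shell
# Print the value of one key from the platform properties file.
# Honours the TQPRO_PLATFORM_CONFIG override, else uses the default path.
get_prop() {
    local key="$1" escaped
    local file="${TQPRO_PLATFORM_CONFIG:-/etc/tqpro/platform/tqpro-platform.properties}"
    # Turn each "." into "[.]" so it matches a literal dot only.
    escaped=$(printf '%s' "$key" | sed 's/[.]/[.]/g')
    # Last occurrence wins; cut -f2- keeps any "=" inside the value.
    grep "^${escaped}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Usage: get_prop template.db.name
```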

Contents — three sections

The committed template (config/platform/tqpro-platform.properties) groups keys into three sections with explicit headers:

Section A — orchestration-only keys (no equivalent in tourlinq.properties)

Key | Purpose
platform.domain | DNS suffix per-tenant subdomains live under (e.g. acme.tourlinq.com)
template.db.name | Source DB for tenant clones (default tqpro_template)
upstream.api, upstream.web | Comma-separated host:port lists rendered into tqpro-upstreams.conf on the gateway
gateway.deploy.host, gateway.deploy.user | SSH target for nginx fan-out on the gateway
web.deploy.hosts, web.deploy.user | SSH targets for the web tier
certbot.acme.server | Optional private ACME URL (lab); empty = public Let's Encrypt
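
To make the upstream.api / upstream.web format concrete, here is a sketch of how a comma-separated host:port list could map onto an nginx upstream block. It is an illustration only: the real output comes from upstreams.conf.template via render-upstreams.sh, whose contents are not shown in this runbook, and render_upstream is a hypothetical name. The pool names tqpro_api and tqpro_web are the ones listed in §1C.

```shell
# Render one nginx upstream block from a comma-separated host:port list.
#   $1 = upstream name, $2 = list such as "10.0.0.1:11080, 10.0.0.2:11080"
render_upstream() {
    echo "upstream $1 {"
    # read trims surrounding whitespace, so "a, b" and "a,b" behave the same.
    printf '%s\n' "$2" | tr ',' '\n' | while read -r entry; do
        if [ -n "$entry" ]; then
            echo "    server ${entry};"
        fi
    done
    echo "}"
}

# Usage: render_upstream tqpro_api "api1.internal:11080,api2.internal:11080"
```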

Section B — duplicated from tourlinq.properties

Key in tqpro-platform.properties | Source on API host
platform.db.url | platform.db.url in tourlinq.properties
platform.db.user | platform.db.user in tourlinq.properties
platform.db.pass | platform.db.pass in tourlinq.properties
tenant.db.host | tenant.db.host in tourlinq.properties
tenant.db.port | tenant.db.port in tourlinq.properties

These are duplicated so the orchestration host can be self-sufficient — an operator running scripts there can discover where the platform DB and tenant DBs live without SSHing to an API host.

Section C — duplicated from tlinqapi.properties

Key in tqpro-platform.properties | Source on API host
platform.api.url | Constructed from http-port in tlinqapi.properties + the API host's external FQDN. This is where tenant-provision.sh POSTs /platform/tenant/provision.
keycloak.base.url | oidc-keycloak-base-url in tlinqapi.properties. Used by tenant-rollback.sh to clean up Keycloak realms.

platform.api.url — base URL only, no path

scripts/platform/tenant-provision.sh constructs the full request URL as ${platform.api.url}/tlinq-api/platform/tenant/provision, so the property holds just the base: <scheme>://<host>[:<port>].

Do NOT include /tlinq-api, any other path component, or a trailing slash. The script appends them.

# Behind a load balancer (recommended for HA — Jetty is stateless):
platform.api.url=http://api-lb.internal:11080

# Single API host, reachable by FQDN:
platform.api.url=http://api1.internal:11080

# Single-host dev (orchestration + API on one box):
platform.api.url=http://127.0.0.1:11080

The legacy PLATFORM_API_URL env var (which still overrides the property) follows the opposite convention — it expects the full URL including /tlinq-api. That's an artefact of pre-existing scripts; the property is the cleaner shape going forward.
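
A value can be checked against the "base URL only" rule before it goes into the file. This is a minimal sketch with hypothetical helper names (is_base_url, provision_url); it covers the platform.api.url property only, not the legacy PLATFORM_API_URL variable, which follows the opposite convention.

```shell
# Succeeds only for <scheme>://<host>[:<port>] with no path and no trailing slash.
is_base_url() {
    printf '%s\n' "$1" | grep -Eq '^https?://[^/[:space:]]+$'
}

# Build the provision endpoint the way this section describes:
# base URL plus /tlinq-api/platform/tenant/provision.
provision_url() {
    if ! is_base_url "$1"; then
        echo "not a bare base URL: $1" >&2
        return 1
    fi
    printf '%s/tlinq-api/platform/tenant/provision\n' "$1"
}

# Usage: provision_url "http://api-lb.internal:11080"
```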

keycloak.base.url — base URL only, no path

scripts/platform/tenant-rollback.sh constructs the full request URL as ${keycloak.base.url}/admin/realms/<tenant-code>. The property holds just the Keycloak base URL: <scheme>://<host>[:<port>].

Do NOT include /admin, /realms, or a trailing slash. Exception: on Keycloak ≤ 16 (legacy WildFly distro) include /auth because that's the context root used there; Keycloak ≥ 17 (Quarkus) dropped it.

# Modern Keycloak (≥ 17, Quarkus distribution — what TQ-115 targets):
keycloak.base.url=https://kc.example.com
keycloak.base.url=https://keycloak.internal:8443

# Lab with self-signed / step-ca cert:
keycloak.base.url=https://kc.lab.local

# Legacy Keycloak (≤ 16, WildFly distribution — pre-Quarkus):
keycloak.base.url=https://kc.example.com/auth

The same value goes in tlinqapi.properties on the API host as oidc-keycloak-base-url — keep them in sync (that's the whole point of Section C).

Shared keys — must stay in sync with the API

The following keys exist in both tqpro-platform.properties (orchestration host) and tourlinq.properties / tlinqapi.properties (API host(s)). Operator responsibility: when changing any of these on one side, change the other.

Key | API-side file | API-side key
platform.db.url | tourlinq.properties | platform.db.url
platform.db.user | tourlinq.properties | platform.db.user
platform.db.pass | tourlinq.properties | platform.db.pass
tenant.db.host | tourlinq.properties | tenant.db.host
tenant.db.port | tourlinq.properties | tenant.db.port
platform.api.url | tlinqapi.properties | http-port (combined with FQDN)
keycloak.base.url | tlinqapi.properties | oidc-keycloak-base-url

Recommended pattern: store the canonical values in config-management (Ansible / private Git) and render to both hosts. For small lab installs, maintain the discipline manually.

This is intentional duplication. The trade-off: one synchronization point per shared key in exchange for a self-sufficient orchestration host that doesn't need to SSH into an API host to discover service endpoints.
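
Where the values are kept in sync by hand, a drift check on the Section B keys can catch a missed edit. The sketch below assumes you have both files available locally (for example a copy of the API host's tourlinq.properties fetched for the comparison); shared_key_drift and prop_from are hypothetical helpers. It covers only the keys that carry the same name on both sides, not the Section C keys, which are derived.

```shell
# Print the value of KEY from FILE (exact key match, last occurrence wins).
prop_from() {
    awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# Compare the Section B keys between the two files.
#   $1 = tqpro-platform.properties, $2 = tourlinq.properties
# Prints "DRIFT <key>" per mismatch and returns non-zero if any differ.
shared_key_drift() {
    local platform_file="$1" api_file="$2" key rc=0
    for key in platform.db.url platform.db.user platform.db.pass tenant.db.host tenant.db.port; do
        if [ "$(prop_from "$platform_file" "$key")" != "$(prop_from "$api_file" "$key")" ]; then
            echo "DRIFT $key"    # key name only; values may be secrets
            rc=1
        fi
    done
    return "$rc"
}
```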

Initial setup

# On the orchestration host (the tqpro-ops group must already exist; see §7.1.2):
sudo install -d -m 0750 -o root -g tqpro-ops /etc/tqpro/platform
sudo install -m 0640 -o root -g tqpro-ops \
    config/platform/tqpro-platform.properties \
    /etc/tqpro/platform/tqpro-platform.properties

# Edit the deployed file with values for your environment.
sudo ${EDITOR:-vi} /etc/tqpro/platform/tqpro-platform.properties

# Sanity check: scripts can read the file (honours the TQPRO_PLATFORM_CONFIG override).
PLATFORM_CONFIG="${TQPRO_PLATFORM_CONFIG:-/etc/tqpro/platform/tqpro-platform.properties}"
test -r "${PLATFORM_CONFIG}" && echo "readable" \
    || echo "check group membership (tqpro-ops) and re-login"

Lifecycle — when to edit, when scripts pick changes up

Change | What to do
Edit any key in this file | No service restart. Next script invocation picks up the change.
Add or remove an API host | Update upstream.api (and any other API-side state); re-run scripts/platform/render-upstreams.sh to push the new upstreams to the gateway.
Add or remove a web host | Update upstream.web and web.deploy.hosts; re-run scripts/platform/render-upstreams.sh to push the upstream change. New web hosts get per-tenant vhosts on the next tenant-provision.sh invocation.
Move to a lab/closed network | Set certbot.acme.server to your step-ca URL; subsequent provisions issue against it.
Switch back to public LE | Empty certbot.acme.server; new provisions use LE. Existing renewals stay with whichever server issued them (per-cert state in /etc/letsencrypt/renewal/).
Rotate a shared key (e.g. platform.db.pass) | Edit on the API host's tourlinq.properties AND in this file's Section B; restart tqpro-api; verify scripts still authenticate.

Permissions & secrets

  • Mode 0640, owner root:tqpro-ops. Read access via group membership; see §7.1.2.
  • Carries platform.db.pass and (potentially) URLs that hint at internal topology. Treat as sensitive.
  • Never commit environment-specific values to the repo. The committed config/platform/tqpro-platform.properties is a dev template.

Backup

Back up /etc/tqpro/platform/ alongside /etc/tqpro/tqpro.env (the encryption key) and the API hosts' $TLINQ_HOME config trees. Together they describe the full host-side configuration of the deployment.


1C. Where things live — deployment layout reference

A scannable map of every file path the runbook touches, grouped by host. Useful for the build script (which files have to land where) and for ops (what's where on a running install).

The runbook itself walks through creating each item; this section is a reference, not a setup guide.

Orchestration host (build-server-deployed toolkit + ops config)

Path | Source | Notes
/opt/tqpro-orchestration/scripts/platform/ | Build artifact | tenant-provision.sh, tenant-rollback.sh, render-upstreams.sh, render-vhost.py
/opt/tqpro-orchestration/scripts/db/ | Build artifact | bootstrap-template-db.sh, bootstrap-schema-migrations.sh, apply-platform-migrations.sh, apply-tenant-migrations.sh
/opt/tqpro-orchestration/config/Nginx Config/templates/ | Build artifact | upstreams.conf.template, tenant-gw.conf.template, tenant-web.conf.template
/opt/tqpro-orchestration/config/db-changes/*.sql | Build artifact | Migration SQL files (read by scripts/db/apply-*.sh and scripts/db/bootstrap-*.sh)
/etc/tqpro/platform/tqpro-platform.properties | Ops, separate from artifact | Sections A/B/C — see §1B. root:tqpro-ops mode 0640.
/etc/tqpro/ssh/tqpro-deploy | Ops, generated once in §7.1.2 | Shared SSH private key. root:tqpro-ops mode 0640.
/etc/tqpro/ssh/tqpro-deploy.pub | Ops, generated alongside the private key | Public key — installed into authorized_keys on the gateway and every web host. root:tqpro-ops mode 0644.
/etc/ssh/ssh_known_hosts | Ops, ssh-keyscan in §7.1.2 | Hashed host keys for gateway + web hosts. Required because scripts use BatchMode=yes.
~/.pgpass | Per-operator | libpq passwords for tqplatform + tenant DBs. Mode 0600, owned by the operator.
~/.ssh/config | Per-operator | Match user tqpro-deploy host … block pointing at /etc/tqpro/ssh/tqpro-deploy.

No $TLINQ_HOME on this host. The orchestration host does not run the API. Scripts read $TQPRO_PLATFORM_CONFIG (default /etc/tqpro/platform/tqpro-platform.properties) instead.

API host(s)

Path | Source | Notes
/var/tqpro/api/api-<build>/ | Release deploy | $TLINQ_HOME — symlink to current release. See doc/deployment/bare-metal-deployment.md.
/var/tqpro/conf/tourlinq.properties | Ops | App + platform.db.* + tenant.db.host/port. Symlinked into $TLINQ_HOME. See §1A.
/var/tqpro/conf/tlinqapi.properties | Ops | Jetty / API-server settings (ports, auth, OIDC). Symlinked into $TLINQ_HOME.
/var/tqpro/conf/tourlinq-config.xml + entities/*.xml | Ops | Entity + transformer config
/var/tqpro/conf/api-roles.properties | Ops | API endpoint → role mappings
/var/tqpro/conf/properties.d/*.properties | Ops | Override files (messaging, ERP) — merged onto tourlinq.properties
/etc/tqpro/tqpro.env | Ops, §5 | TQPRO_ENCRYPTION_KEY + TQPRO_PLATFORM_ADMIN_SECRET. Mode 0600, owner tqpro-svc.
/etc/systemd/system/tqpro-api.service | Ops | systemd unit. EnvironmentFile=/etc/tqpro/tqpro.env, Environment=TLINQ_HOME=/var/tqpro/api.

For single-host dev installs (orchestration = API on one box), $TLINQ_HOME is typically /etc/tqpro/config directly with no symlink layer; the same files live there.

Gateway host (DMZ, 1 host today)

Path | Source | Notes
/etc/nginx/conf.d/tqpro-upstreams.conf | scripts/platform/render-upstreams.sh | Shared upstream pools (tqpro_api, tqpro_web). Re-rendered when scaling either tier.
/etc/nginx/sites-available/<tenant-code>.conf | scripts/platform/tenant-provision.sh | Per-tenant gateway vhost. One per ACTIVE tenant.
/etc/nginx/sites-enabled/<tenant-code>.conf | scripts/platform/tenant-provision.sh | Symlink → sites-available.
/var/www/certbot/ | Ops, one-time in §7.2 | Webroot for HTTP-01 challenges. Owned www-data.
/etc/letsencrypt/live/<tenant-host>/ | certbot | Per-tenant cert + key chain
/etc/letsencrypt/renewal/<tenant-host>.conf | certbot | Renewal config; records the issuing ACME server URL per cert
/home/tqpro-deploy/.ssh/authorized_keys | Ops, one-time | Holds tqpro-deploy.pub from the orchestration host. Owned by tqpro-deploy, mode 0600.
/etc/sudoers.d/tqpro-deploy-gateway | Ops, one-time | NOPASSWD rules for the deploy user. Mode 0440.

Web host(s) (1..N — every entry in web.deploy.hosts)

Path | Source | Notes
/etc/nginx/sites-available/<tenant-code>.conf | scripts/platform/tenant-provision.sh | Per-tenant web vhost.
/etc/nginx/sites-enabled/<tenant-code>.conf | scripts/platform/tenant-provision.sh | Symlink → sites-available.
/opt/tqpro/tqweb-adm/ | Ops/CI deploy | Static multi-tenant-aware SPA bundle.
/home/tqpro-deploy/.ssh/authorized_keys | Ops, one-time | Same shared key as gateway. Owned by tqpro-deploy, mode 0600.
/etc/sudoers.d/tqpro-deploy-web | Ops, one-time | Narrower than the gateway's (no certbot). Mode 0440.

API hosts get no per-tenant nginx config — Jetty is stateless and learns about new tenants via TenantRegistry.refresh() against the platform DB.

PostgreSQL host

DB | Created by | Notes
tqplatform | §3: createdb + 0001-tqplatform-schema.sql | Tenant registry, WhatsApp phone routing, platform migration ledger
tqpro_template | scripts/db/bootstrap-template-db.sh | Schema-only skeleton; source for tenant clones
tlinq_<tenant-code> | scripts/platform/tenant-provision.sh step 2 | One per ACTIVE tenant; cloned from tqpro_template
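
This runbook does not show how tenant-provision.sh performs the clone. One standard PostgreSQL mechanism for copying a schema-only skeleton is CREATE DATABASE ... TEMPLATE, and the sketch below assumes that approach purely for illustration. clone_sql is a hypothetical helper, and the rule that tenant codes are lowercase letters, digits and underscores is also an assumption.

```shell
# Emit the SQL that would clone the template into a tenant database.
#   $1 = tenant code, $2 = template DB name (defaults to tqpro_template)
clone_sql() {
    local code="$1" template="${2:-tqpro_template}"
    # Reject anything that could break out of the quoted identifier.
    if ! printf '%s\n' "$code" | grep -Eq '^[a-z][a-z0-9_]*$'; then
        echo "unsafe tenant code: $code" >&2
        return 1
    fi
    printf 'CREATE DATABASE "tlinq_%s" TEMPLATE "%s";\n' "$code" "$template"
}

# Usage (review the statement before running it):
#   clone_sql acme | psql -d postgres
```

PostgreSQL refuses to copy a template database while other sessions are connected to it, which is one reason to keep tqpro_template free of ad-hoc connections.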

Keycloak

Realm | Created by | Notes
master | Bundled with Keycloak | Holds the tqpro-platform-admin service-account client (created in §2)
<tenant-code> | tenant-provision.sh step 6 (via POST /platform/tenant/provision) | Per-tenant realm with tqweb-adm (browser SPA) + tqpro-admin-api (server-to-server) clients

Build artifact contents (for the build script)

The build server's orchestration artifact must contain at least:

scripts/platform/        ← orchestration scripts (4 files)
scripts/db/              ← DB management scripts (4 files)
config/Nginx Config/templates/   ← 3 templates
config/db-changes/       ← migration SQL files (read by scripts/db/*)

Everything else under config/ (tourlinq.properties, entities/, tourlinq-config.xml, etc.) is API-side and does not need to be in the orchestration artifact. If your build packages the whole repo, those files will be inert on the orchestration host — ignore them.

tqpro-platform.properties is not part of the build artifact — it holds environment-specific values and is deployed/managed separately (typically via config-management like Ansible).
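
A build script can assert the minimum contents listed above before publishing the artifact. This is a sketch with a hypothetical helper name (check_artifact); it checks only that the four directories exist, not the file counts.

```shell
# Verify that an unpacked orchestration artifact has the required directories.
#   $1 = artifact root, e.g. /opt/tqpro-orchestration
# Prints "MISSING <dir>" per absent directory and returns non-zero if any are absent.
check_artifact() {
    local root="$1" dir rc=0
    for dir in "scripts/platform" "scripts/db" "config/Nginx Config/templates" "config/db-changes"; do
        if [ ! -d "$root/$dir" ]; then
            echo "MISSING $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_artifact /opt/tqpro-orchestration
```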


2. Keycloak — master-realm platform service account

Required regardless of greenfield vs in-place. The platform service account exists once per Keycloak install; tenant realms are created later by the provisioning script.

  1. Admin console → realm master → Clients → Create client
       • Client type: OpenID Connect
       • Client ID: tqpro-platform-admin
       • Name: TQPro platform provisioner
  2. Capability config
       • Client authentication: ON
       • Service accounts roles: ON
       • Standard flow / Direct access grants / Implicit / Authorization: OFF
  3. Settings — leave Valid redirect URIs and Web origins empty.
  4. Credentials tab — copy the Client secret. Save it; it goes into /etc/tqpro/tqpro.env in §5.
  5. Service accounts roles → Assign role
       • Switch the filter dropdown to "Filter by realm roles"
       • Check create-realm only → Assign
       • Do NOT assign admin or anything on the master-realm client
       • default-roles-master will appear auto-assigned — this is harmless (self-service account-portal roles, no admin scope)
  6. Create + assign the platform-admin realm role (this is our app-level role used by the TQPro API to gate /platform/* endpoints — it is NOT the Keycloak built-in admin role, so D-4's prohibition on master-realm admin still holds):
       • Admin console → realm master → Realm roles → Create role
       • Role name: platform-admin
       • Description: TQPro platform API access — grants /platform/tenant/* endpoints
       • Save
       • Back to Clients → tqpro-platform-admin → Service accounts roles
       • Assign role → filter by realm roles → check platform-admin → Assign
       • Without this step, every call to /platform/tenant/provision (and siblings) returns ERR0008 "User is not authorized to access this API."
  7. (No KC admin step needed for SMTP — handled automatically.) Tenant-realm SMTP is configured by KeycloakRealmProvisioner at tenant-creation time from the mail.* keys in tourlinq.properties (the same keys used by the application's own outbound mail). Make sure mail.server, mail.port, mail.user, mail.password, mail.from, mail.name, mail.usetls are populated in $TLINQ_HOME/tourlinq.properties before the first tenant-provision.sh run — the provisioner does a PUT-update of the realm right after creation. If mail.server is empty, the tenant realm is created without SMTP and the welcome-email send logs a non-fatal warning; the admin user still exists but ops must set the password manually in the KC admin UI.

Verification curls

export KC_URL=https://<your-keycloak-host>
export TQPRO_PLATFORM_ADMIN_SECRET=<the secret from step 4>

TOKEN=$(curl -s -X POST \
  "${KC_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=tqpro-platform-admin" \
  -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" \
  | jq -r .access_token)

# Must succeed (HTTP/1.1 201):
curl -si -X POST "${KC_URL}/admin/realms" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"realm":"tqpro-verify-delete-me","enabled":true}' | head -1

# Must fail (HTTP/1.1 403) — proves narrow privilege:
curl -si -X GET "${KC_URL}/admin/realms/master/users?max=1" \
  -H "Authorization: Bearer ${TOKEN}" | head -1

# Cleanup the test realm. Note the token must be re-issued — see §6
# Troubleshooting "401 after realm-create".
TOKEN=$(curl -s -X POST "${KC_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" -d "client_id=tqpro-platform-admin" \
  -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" | jq -r .access_token)
curl -si -X DELETE "${KC_URL}/admin/realms/tqpro-verify-delete-me" \
  -H "Authorization: Bearer ${TOKEN}" | head -1   # 204 No Content
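The checks above read the first response line, which shows up as `HTTP/2 201` rather than `HTTP/1.1 201` when Keycloak is reached over HTTP/2. A minimal sketch that compares only the numeric status code; `kc_status` and `expect_status` are hypothetical helper names, not part of the TQPro toolkit:

```shell
# Hypothetical helpers; neither ships with the TQPro toolkit.

# Print only the numeric HTTP status of a request; all arguments go to curl.
kc_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Compare an expected status with the one observed.
# Usage: expect_status <label> <expected> <actual>
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "OK   $1 ($3)"
  else
    echo "FAIL $1: expected $2, got $3"
    return 1
  fi
}

# Example (needs KC_URL and TOKEN as exported in this section):
#   expect_status "read master users" 403 \
#     "$(kc_status -H "Authorization: Bearer ${TOKEN}" \
#        "${KC_URL}/admin/realms/master/users?max=1")"
```

A non-zero return from `expect_status` makes the check usable in a wrapper script that should stop at the first unexpected status.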

3. Platform DB

The tqplatform DB holds the tenant registry, WhatsApp phone routing, and its own migration ledger. Greenfield installs leave its tenant table empty — the first tenant is added later by tenant-provision.sh.

sudo -u postgres createuser tqpro_platform --pwprompt
sudo -u postgres createdb --owner=tqpro_platform tqplatform

sudo -u postgres psql tqplatform \
    -f config/db-changes/platform/0001-tqplatform-schema.sql

# Verify (greenfield: tenant is empty, schema_migrations has the bootstrap row):
sudo -u postgres psql tqplatform -c "SELECT count(*) FROM tenant"
# Expected: 0
sudo -u postgres psql tqplatform -c "SELECT * FROM schema_migrations"
# Expected: 0001-tqplatform-schema.sql | <today> | NULL

Record the DB password in two places — the API-side tourlinq.properties (read by PlatformDbConfig) and the orchestration-side tqpro-platform.properties Section B (script-side discovery). The values must match; see §1B "Shared keys".

In $TLINQ_HOME/tourlinq.properties on each API host:

platform.db.url=jdbc:postgresql://localhost:5432/tqplatform
platform.db.user=tqpro_platform
platform.db.pass=<your-tqpro_platform-password>

In /etc/tqpro/platform/tqpro-platform.properties Section B on the orchestration host (same values):

platform.db.url=jdbc:postgresql://localhost:5432/tqplatform
platform.db.user=tqpro_platform
platform.db.pass=<your-tqpro_platform-password>
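Because the same three keys live in two files, drift is easy to miss. The following is a sketch of a comparison you could run once both files are readable from one place (for example, copies pulled from config-management). `prop_get` and `shared_keys_match` are hypothetical helpers, and the parser assumes plain `key=value` lines with no continuations or escapes:

```shell
# Hypothetical helpers; not part of the TQPro toolkit.

# Print the value of <key> from a Java-style properties file.
# Usage: prop_get <file> <key>
prop_get() {
  awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Report every listed key whose value differs between two files.
# Usage: shared_keys_match <file-a> <file-b> <key>...
shared_keys_match() {
  a=$1; b=$2; shift 2; rc=0
  for key in "$@"; do
    if [ "$(prop_get "$a" "$key")" != "$(prop_get "$b" "$key")" ]; then
      echo "MISMATCH: $key"   # prints the key name only, never the value
      rc=1
    fi
  done
  return $rc
}

# Example:
#   shared_keys_match api-tourlinq.properties tqpro-platform.properties \
#     platform.db.url platform.db.user platform.db.pass
```

Only key names are printed on a mismatch, so the output is safe to paste into a ticket.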

4. PostgreSQL connection tuning + tenant-DB topology

Multi-tenancy means many simultaneous connection pools. Raise the ceiling before production traffic.

# On the DB host (requires a restart):
sudo vim /etc/postgresql/<version>/main/postgresql.conf
# Set:
#   max_connections = 500           # was typically 100
#   shared_buffers  = 2GB            # ~25% of RAM (2GB assumes an 8 GB host)
#   work_mem        = 8MB
sudo systemctl restart postgresql
sudo -u postgres psql -c "SHOW max_connections"     # expect 500

Set the tenant-DB host/port in two places — tourlinq.properties (API reads via TenantAwareDBSession) and tqpro-platform.properties Section B (scripts use for pg_dump / pg_restore). The values must match. The template DB name lives in tqpro-platform.properties Section A only.

In $TLINQ_HOME/tourlinq.properties on each API host:

tenant.db.host=localhost
tenant.db.port=5432

In /etc/tqpro/platform/tqpro-platform.properties on the orchestration host:

# Section A — orchestration only
template.db.name=tqpro_template

# Section B — must match tourlinq.properties on API host
tenant.db.host=localhost
tenant.db.port=5432

Revisit max_connections at ~30 tenants; pgbouncer is out of scope for Phase 0 (D-5, OD-1).
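To judge when the ceiling needs revisiting, a rough connection budget helps. The sketch below assumes one connection pool per tenant on each API host with a fixed maximum size; that pool model and the sample numbers are assumptions for illustration, not values confirmed by this runbook.

```shell
# Rough upper bound on PostgreSQL connections.
# Usage: conn_budget <tenants> <pool-max-per-tenant> <api-hosts> <reserved>
# Assumes (not confirmed by this runbook) one pool per tenant per API host.
conn_budget() {
  echo $(( $1 * $2 * $3 + $4 ))
}

# Succeeds when the budget fits under max_connections.
# Usage: fits_max_connections <max_connections> <tenants> <pool> <hosts> <reserved>
fits_max_connections() {
  [ "$(conn_budget "$2" "$3" "$4" "$5")" -le "$1" ]
}

# Example: 30 tenants, pools of 8, 2 API hosts, 20 reserved for ops tooling.
#   conn_budget 30 8 2 20    # prints 500
```

With these illustrative numbers the budget reaches 500 at exactly 30 tenants, which is consistent with revisiting the setting around that point.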


5. TQPRO_ENCRYPTION_KEY

Used by TenantConfig.encrypt/decrypt to protect sensitive fields in tqplatform.tenant (and later in tenant nts.system_settings).

# 5.1 Generate
python3 -c "import secrets; print(secrets.token_hex(32))"
# Output: 64 hex chars.

# 5.2 Place on disk
sudo install -d -m 0700 -o tqpro-svc -g tqpro-svc /etc/tqpro
sudo tee /etc/tqpro/tqpro.env >/dev/null <<'EOF'
TQPRO_ENCRYPTION_KEY=<paste-the-64-hex-chars-here>
TQPRO_PLATFORM_ADMIN_SECRET=<from §2 step 4>
EOF
sudo chmod 0600 /etc/tqpro/tqpro.env
sudo chown tqpro-svc:tqpro-svc /etc/tqpro/tqpro.env
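A paste error in the key (a stray space, a truncated line) is easier to catch now than at first decrypt. A minimal format check, assuming only what step 5.1 states, namely that the key is 64 hex characters; `is_valid_key` is a hypothetical helper and says nothing about how TenantConfig parses the value:

```shell
# Succeeds only if the argument is exactly 64 hexadecimal characters.
is_valid_key() {
  case $1 in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Example, without echoing the key itself:
#   is_valid_key "$(sudo sed -n 's/^TQPRO_ENCRYPTION_KEY=//p' /etc/tqpro/tqpro.env)" \
#     && echo "key format OK"
```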

5.3 Wire the env file into the systemd unit

This step is load-bearing — without it the JVM never sees TQPRO_ENCRYPTION_KEY / TQPRO_PLATFORM_ADMIN_SECRET, even though the file is on disk. Symptoms when missing: the provisioning API returns ERR00014 "Keycloak client secret missing — set TQPRO_PLATFORM_ADMIN_SECRET".

# Confirm whether the unit already references the env file.
sudo systemctl cat tqpro-api | grep -E 'EnvironmentFile|Environment='
# If output is empty, add a drop-in override:

sudo systemctl edit tqpro-api
# In the editor that opens, paste:
[Service]
EnvironmentFile=/etc/tqpro/tqpro.env

# Save + exit, then:
sudo systemctl daemon-reload
sudo systemctl restart tqpro-api

# Verify the JVM actually received the vars (after ~5s startup):
sudo cat /proc/$(pgrep -n jsvc)/environ 2>/dev/null | tr '\0' '\n' \
    | grep -E '^(TQPRO_|TLINQ_HOME)'
# Should show TQPRO_ENCRYPTION_KEY and TQPRO_PLATFORM_ADMIN_SECRET set.
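The verification above prints the secret values into the terminal and its scrollback. If that is a concern, a variant that lists only the variable names may be preferable; `env_names_only` is a hypothetical helper:

```shell
# Read a NUL-separated environment block on stdin and print only the
# names of the TQPro-related variables, never their values.
env_names_only() {
  tr '\0' '\n' | grep -E '^(TQPRO_[A-Z0-9_]*|TLINQ_HOME)=' | cut -d= -f1
}

# Example:
#   sudo cat /proc/$(pgrep -n jsvc)/environ | env_names_only
```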

5.4 Backup

Make a sealed offline backup of the key — losing it makes every encrypted value in tqplatform.tenant and per-tenant nts.system_settings (Phase 3) unreadable.


6. Skeleton (template) DB

Every new tenant DB is cloned from tqpro_template. The skeleton holds the schema with no customer data and is maintained in schema sync with every migration via scripts/db/apply-tenant-migrations.sh.

The DB-management scripts under scripts/db/ (bootstrap-template-db.sh, bootstrap-schema-migrations.sh, apply-platform-migrations.sh, apply-tenant-migrations.sh) only need a libpq connection (PGHOST/PGPORT/PGUSER/PGPASSWORD or ~/.pgpass) and read access to config/db-changes/. They don't depend on the SSH/nginx toolchain in scripts/platform/, so you can run them from any host that has the toolkit checkout plus a route to the PostgreSQL server — orchestration host, build server, DBA laptop, or an API host post-deploy. They do read template.db.name from $TQPRO_PLATFORM_CONFIG if available, but fall back to tqpro_template and accept overrides via env / CLI.
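The lookup order for the template DB name can be pictured as follows. This is a sketch, not the scripts' actual code: the precedence of CLI over environment and the variable name `TEMPLATE_DB_NAME` are assumptions; only `$TQPRO_PLATFORM_CONFIG`, the `template.db.name` key, and the `tqpro_template` fallback come from this runbook.

```shell
# Hypothetical resolution order: CLI value, then environment, then the
# template.db.name key in the config file, then the built-in default.
# TEMPLATE_DB_NAME is an assumed variable name, not a documented one.
# Usage: resolve_template_db <cli-value-or-empty>
resolve_template_db() {
  cfg=${TQPRO_PLATFORM_CONFIG:-/etc/tqpro/platform/tqpro-platform.properties}
  if [ -n "$1" ]; then echo "$1"; return; fi
  if [ -n "${TEMPLATE_DB_NAME:-}" ]; then echo "$TEMPLATE_DB_NAME"; return; fi
  if [ -r "$cfg" ]; then
    v=$(sed -n 's/^template\.db\.name=//p' "$cfg" | head -n 1)
    if [ -n "$v" ]; then echo "$v"; return; fi
  fi
  echo tqpro_template
}
```

On a DBA laptop with no config file and no override, the function falls through to `tqpro_template`, matching the fallback described above.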

6.1 Produce the schema dump

Run this once, on a host that has psql access to a current TQPro source DB. The source DB MUST have all current migrations applied — the template inherits whatever the source has. If the source is missing migrations, your template (and every tenant cloned from it) will be missing schema.

# From any current TQPro source — dev, staging, or your existing prod.
pg_dump --schema-only --no-owner --no-privileges \
        -d <source-db> > /tmp/tqpro-schema.sql
# Sanity check: file should be tens of KB to a few MB, all DDL, no INSERTs.
wc -l /tmp/tqpro-schema.sql
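The "all DDL, no INSERTs" check can be made explicit instead of eyeballed. A minimal sketch; `dump_is_schema_only` is a hypothetical helper that looks for the two statement forms pg_dump uses for row data in plain-text dumps:

```shell
# Succeeds when a plain-text dump contains no row data. INSERT INTO and
# COPY ... FROM stdin are the two forms pg_dump uses for table contents.
# Usage: dump_is_schema_only <dump-file>
dump_is_schema_only() {
  ! grep -Eq '^(INSERT INTO |COPY .* FROM stdin;)' "$1"
}

# Example:
#   dump_is_schema_only /tmp/tqpro-schema.sql && echo "no row data found"
```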

If you don't yet have any TQPro install to dump from (true greenfield), take a dump from the development team's reference DB. Reverse-engineering the schema by re-running migrations against an empty DB is not safe — early migrations were themselves reverse-engineered and may diverge from production.

6.2 Bootstrap the template

scripts/db/bootstrap-template-db.sh --schema-file /tmp/tqpro-schema.sql

# Verify:
psql tqpro_template -c "SELECT count(*) FROM nts.booking"
# Expected: 0 (skeleton holds zero customer data)
psql tqpro_template -c "SELECT count(*) FROM public.schema_migrations"
# Expected: equals the number of files under config/db-changes/*.sql

bootstrap-template-db.sh is idempotent — re-running refreshes the schema from the supplied dump and re-seeds the ledger. Re-run whenever the schema source has changed (rare).

6.3 Ongoing maintenance

After bootstrap, scripts/db/apply-tenant-migrations.sh automatically processes tqpro_template alongside every ACTIVE tenant. Run it after every deploy that ships new tenant migrations.

scripts/db/apply-tenant-migrations.sh
# Iterates: every ACTIVE tenant DB + tqpro_template.
# Applies any *.sql under config/db-changes/ not yet in each DB's ledger.

The runner is idempotent and safe to re-run. Per-file transaction boundaries: a failing migration leaves the ledger untouched so the next run retries.
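The idea behind "not yet in each DB's ledger" is a set difference between the migration files on disk and the names already recorded. The following is an illustrative sketch, not the runner's implementation; `pending_migrations` is hypothetical, and the ledger is represented as a text file with one applied file name per line because the ledger's column names are not given in this runbook.

```shell
# Hypothetical sketch: list migration files not yet recorded in a ledger.
# <ledger-file> holds one applied file name per line.
# Usage: pending_migrations <migrations-dir> <ledger-file>
pending_migrations() {
  all=$(mktemp) || return 1
  # Collect the *.sql file names (top level only), sorted for comm.
  for f in "$1"/*.sql; do
    if [ -e "$f" ]; then basename "$f"; fi
  done | LC_ALL=C sort > "$all"
  # Lines only in the first input are files missing from the ledger.
  LC_ALL=C sort "$2" | LC_ALL=C comm -23 "$all" -
  rm -f "$all"
}
```

Because a failed migration never reaches the ledger, it stays in this difference and is picked up again on the next run.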


7. nginx — three-tier setup with internal orchestration

The architecture (D-14, revised — see also config/Nginx Config/templates/README.md):

Orchestration host (protected subnet — dedicated ops box, typically
                    co-located with the build server)
  ├─ pg_dump / pg_restore (network) ──────► PostgreSQL
  ├─ POST /platform/...   (network) ──────► Jetty on the API host
  ├─ ssh tqpro-deploy@gateway ────────────► gateway nginx + certbot
  └─ ssh tqpro-deploy@web1, web2, ... ─────► web nginx

Browser
  ↓ HTTPS
Host A (DMZ): gateway nginx — TLS termination, no DB credentials
  ├─ /tlinq-api/* → upstream tqpro_api → 1..N Jetty hosts (protected)
  └─ /*           → upstream tqpro_web → 1..N web nginx hosts (protected)
                                          → /opt/tqpro/tqweb-adm

Security model. The gateway lives in the DMZ. It must NOT have PostgreSQL credentials or run TQPro processes. All provisioning runs from a dedicated internal orchestration host that SSHes outward to make narrowly-scoped changes via NOPASSWD-sudo'd commands on each remote host. If the gateway is compromised, the attacker gains a TLS terminator with nginx config — not the platform DB, not the Java API, not the ability to provision new tenants.

The orchestration host is separate from the API host by design: keeping the toolkit (provisioning scripts, deploy SSH keys, platform DB credentials) off the API server reduces blast radius if either side is compromised. The build server is a natural co-location target — it already has SSH access to deploy targets, already holds repo checkouts, and is already trusted with deployment credentials. A standalone ops box works equally well. Co-locating on the API server is acceptable for single-host dev installs only.

Per tenant: gateway gets one server block + cert; every web host listed in web.deploy.hosts gets one server block. API hosts need NO per-tenant changes — Jetty is stateless and learns about new tenants from the platform DB via TenantRegistry.refresh().

The shared tqpro-upstreams.conf (rendered by scripts/platform/render-upstreams.sh) defines the upstream pools by name. To scale a tier, add hosts to upstream.api / upstream.web in tqpro-platform.properties (Section A, see §7.4) and re-run render-upstreams.sh. Every existing tenant picks up the change at once because all tenant vhosts reference the same upstream names.
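To make the shared-upstream idea concrete, here is a sketch of how a comma-separated host list could become a named nginx upstream block. `render_upstream` is hypothetical; the real render-upstreams.sh and its template may emit additional directives. The pool names tqpro_api and tqpro_web are taken from the architecture diagram above.

```shell
# Hypothetical sketch of an upstream block built from a comma-separated
# host:port list. The real template may add directives not shown here.
# Usage: render_upstream <upstream-name> <host:port,host:port,...>
render_upstream() {
  echo "upstream $1 {"
  echo "$2" | tr ',' '\n' | while IFS= read -r server; do
    if [ -n "$server" ]; then echo "    server $server;"; fi
  done
  echo "}"
}

# Example:
#   render_upstream tqpro_api api1.internal:11080,api2.internal:11080
```

Tenant vhosts that proxy to `tqpro_api` by name never need to change when the server list inside the block does.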

7.1 Orchestration host — one-time setup

This is a dedicated host in the protected subnet that runs the provisioning scripts. Recommended target: the build server — it already has the repo checked out and SSH access to deploy targets, so adding the orchestration role costs almost nothing. A standalone ops box works the same way; co-locating on the Java API server is acceptable only for single-host dev installs.

The orchestration host needs two distinct directories — keep them separate:

| Path | What it holds | Why separate |
|---|---|---|
| /opt/tqpro-orchestration/ | The orchestration toolkit: scripts/platform/* + config/Nginx Config/templates/*.template. Built and deployed by the build server as a self-contained artifact. DB-management scripts (scripts/db/*) typically ship in the same artifact but are runnable from any host with libpq access; they don't need the SSH/nginx toolchain. | Replaced wholesale on each release; no operator state lives here |
| /etc/tqpro/platform/tqpro-platform.properties | The orchestration config file (see §1B). Carries platform DB connection details, deploy host names, upstream pools, certbot ACME URL, platform API URL, Keycloak base URL. Mode 0640, owned root:tqpro-ops. | Survives toolkit upgrades. No $TLINQ_HOME is needed on the orchestration host. |

7.1.1 Install the toolkit

The orchestration host is a deploy target, not a dev environment. The build server produces an artifact (tarball, package, or pinned git tag) containing scripts/platform/, scripts/db/, config/db-changes/, and config/Nginx Config/templates/; deploy that to /opt/tqpro-orchestration/.

Production / staging — deploy from the build artifact:

# On the orchestration host. Replace the deploy command with whatever
# your build pipeline produces — rsync, dpkg -i, tar -xzf, etc.
sudo install -d -m 0755 -o root -g root /opt/tqpro-orchestration

# Example: untar a build artifact uploaded by the build server.
sudo tar -xzf /tmp/tqpro-orchestration-<release>.tgz -C /opt/tqpro-orchestration

# Verify the layout:
ls /opt/tqpro-orchestration/scripts/platform/   # tenant-provision.sh, tenant-rollback.sh, render-upstreams.sh, render-vhost.py
ls /opt/tqpro-orchestration/scripts/db/         # bootstrap-*, apply-*
ls "/opt/tqpro-orchestration/config/Nginx Config/templates/"

Dev / lab — git clone fallback:

# Acceptable for a single-host dev install or a hand-managed lab where
# the build server isn't producing artifacts yet. Pin to a release tag
# so CI churn doesn't surprise ops.
sudo install -d -m 0755 -o root -g root /opt/tqpro-orchestration
sudo git clone --depth 1 --branch <release-tag> <repo-url> /opt/tqpro-orchestration

The toolkit directory is not TLINQ_HOME and does not need to be. Scripts compute their own paths to the nginx templates relative to their own location, and they read configuration from $TQPRO_PLATFORM_CONFIG (default /etc/tqpro/platform/tqpro-platform.properties), installed in §7.1.2.

If your artifact ships a checked-in dev config/ sub-tree (because the build server packaged the whole repo), ignore everything under /opt/tqpro-orchestration/config/ except the Nginx Config/templates/ and db-changes/ directories. Those are the only config files the toolkit uses at runtime (templates by scripts/platform/*, migration SQL by scripts/db/*).

7.1.2 Host-wide one-time setup (group, config, shared deploy key)

tqpro-ops is a unix group, not a user — the orchestration host does not run any service as tqpro-ops. Membership in the group grants human operators read access to tqpro-platform.properties and the shared deploy SSH key; provisioning scripts run as the operator's own login (so audit trail is per-person, not "the shared service account did it").

# Group for ops permissions (do NOT create a tqpro-ops user).
sudo groupadd -f tqpro-ops

# Add each human operator to the group. Group membership only takes
# effect on next login — log out/in or run `newgrp tqpro-ops` to
# refresh the current shell.
sudo usermod -aG tqpro-ops <operator-login>

# Install tqpro-platform.properties — the only config file the
# orchestration host needs. NO $TLINQ_HOME is required on this host;
# tourlinq.properties / tlinqapi.properties live on API hosts only.
# See §1B for the full reference, including the "shared keys" duplication
# policy (Sections B and C must match the values on the API hosts).
sudo install -d -m 0750 -o root -g tqpro-ops /etc/tqpro/platform
sudo install -m 0640 -o root -g tqpro-ops \
    config/platform/tqpro-platform.properties \
    /etc/tqpro/platform/tqpro-platform.properties

# Edit values for your environment (especially Sections B and C — they
# must match the API hosts' tourlinq.properties / tlinqapi.properties).
sudo ${EDITOR:-vi} /etc/tqpro/platform/tqpro-platform.properties

# Shared deploy SSH key. Holding the key here (rather than in each
# operator's home) means rotating it once propagates to everyone. The
# matching public key gets added to authorized_keys on each remote
# host in §7.2 and §7.3.
sudo install -d -m 0750 -o root -g tqpro-ops /etc/tqpro/ssh
sudo ssh-keygen -t ed25519 -N '' \
    -f /etc/tqpro/ssh/tqpro-deploy \
    -C "tqpro-deploy@$(hostname)"
sudo chgrp tqpro-ops /etc/tqpro/ssh/tqpro-deploy /etc/tqpro/ssh/tqpro-deploy.pub
sudo chmod 0640 /etc/tqpro/ssh/tqpro-deploy
sudo chmod 0644 /etc/tqpro/ssh/tqpro-deploy.pub

# Seed system-wide known_hosts with the gateway and every web host.
# Without this, render-upstreams.sh and tenant-provision.sh fail with
# "Host key verification failed" because BatchMode=yes disables the
# interactive "accept new key" prompt — the symptom is a misleading
# "lost connection" from scp.
#
# Substitute your actual hostnames from gateway.deploy.host and
# web.deploy.hosts in tqpro-platform.properties.
sudo ssh-keyscan -H \
    gateway.internal \
    web1.internal web2.internal \
    | sudo tee -a /etc/ssh/ssh_known_hosts >/dev/null
sudo chmod 0644 /etc/ssh/ssh_known_hosts

# Verify the captured fingerprints match what each host's sshd actually
# advertises (in production, cross-check against an out-of-band record).
sudo ssh-keygen -F gateway.internal -f /etc/ssh/ssh_known_hosts -l
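Rather than retyping hostnames for ssh-keyscan, the list can be derived from the config file so it cannot drift from what the scripts actually connect to. A sketch, assuming plain `key=value` lines; `deploy_hosts` is a hypothetical helper:

```shell
# Print the unique deploy hosts named in the orchestration config,
# one per line, ready to pass to ssh-keyscan.
# Usage: deploy_hosts <properties-file>
deploy_hosts() {
  sed -n -e 's/^gateway\.deploy\.host=//p' -e 's/^web\.deploy\.hosts=//p' "$1" \
    | tr ',' '\n' | sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d' | LC_ALL=C sort -u
}

# Example:
#   sudo ssh-keyscan -H $(deploy_hosts /etc/tqpro/platform/tqpro-platform.properties) \
#     | sudo tee -a /etc/ssh/ssh_known_hosts >/dev/null
```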

If you co-locate on the build server, the tqpro-deploy SSH key may already exist (the build server uses it for releases). Skip the ssh-keygen and instead sudo install the existing private/public key pair into /etc/tqpro/ssh/ with the ownership and modes shown above.

For the shared keys in tqpro-platform.properties Sections B and C, keep them in sync with the corresponding values in the API hosts' tourlinq.properties and tlinqapi.properties. Recommended pattern: canonical copy in config-management (Ansible / private Git), rendered to both hosts. See §1B "Shared keys".

7.1.3 Per-operator setup (each operator runs once for themselves)

Each human operator (member of the tqpro-ops group) does this once on the orchestration host. Nothing here requires sudo — these are personal shell, libpq, and SSH config files in your own home directory.

# 1. (No TLINQ_HOME needed.) The orchestration scripts read
#    /etc/tqpro/platform/tqpro-platform.properties — installed in §7.1.2 —
#    via the default path. Override with TQPRO_PLATFORM_CONFIG only if
#    you want to point at a different file (e.g. for testing).
#
#    If you previously set TLINQ_HOME on this host, remove it from your
#    shell rc and from any /etc/profile.d/ script:
#       sed -i '/TLINQ_HOME/d' ~/.bashrc
#       sudo rm -f /etc/profile.d/tqpro-ops.sh

# 2. libpq passwords for the tenant-DB clone + platform-DB access.
#    libpq REQUIRES mode 0600 and the file owned by you — it cannot
#    be shared across operators.
cat >> ~/.pgpass <<EOF
<tenant-db-host>:5432:*:tlinq:<tlinq-db-pass>
<platform-db-host>:5432:tqplatform:tqpro_platform:<platform-db-pass>
EOF
chmod 0600 ~/.pgpass

# 3. Point ~/.ssh/config at the shared deploy key from §7.1.2 so the
#    provisioning scripts don't need explicit -i flags.
#
#    Use Match (not Host) so the rule fires ONLY when the remote user is
#    tqpro-deploy. A Host-based rule would also catch manual connections
#    like `ssh ubuntu@gateway.internal` and force the wrong key + BatchMode,
#    breaking interactive ops access to the same hosts.
#
#    API hosts are intentionally NOT in this list — tenant provisioning
#    reaches them via HTTP POST /platform/tenant/provision (using
#    PLATFORM_ADMIN_TOKEN), not SSH. Only the gateway and web hosts
#    receive per-tenant nginx vhosts and therefore need SSH access.
#    See the §7 architecture diagram for the full channel breakdown.
cat >> ~/.ssh/config <<'EOF'
Match user tqpro-deploy host gateway.internal,web1.internal,web2.internal
    IdentityFile /etc/tqpro/ssh/tqpro-deploy
    IdentitiesOnly yes
    BatchMode yes
EOF
chmod 0600 ~/.ssh/config

# Verify: the first should show tqpro-deploy + the explicit IdentityFile,
# the second should fall through to your normal user + default identities.
ssh -G tqpro-deploy@gateway.internal | grep -E '^(identityfile|batchmode|user)'
ssh -G ubuntu@gateway.internal       | grep -E '^(identityfile|batchmode|user)'

# 4. Sanity check.
[ -r /etc/tqpro/platform/tqpro-platform.properties ] && echo "platform config readable"
[ -r /etc/tqpro/ssh/tqpro-deploy ] && echo "deploy key readable"

If either line fails to print, your group membership hasn't taken effect yet. Log out and back in (or run newgrp tqpro-ops).

Next: before running §7.1.4, complete §7.2 (gateway one-time setup) and §7.3 (web hosts one-time setup) — those steps create the tqpro-deploy user and authorize the public key from /etc/tqpro/ssh/tqpro-deploy.pub on each remote host. Without them, render-upstreams.sh will fail with lost connection when it SSHes to the gateway.

7.1.4 Render the shared upstreams file

Prerequisite: §7.2 and §7.3 must be complete on the gateway and every web host. This step SSHes to gateway.deploy.host as tqpro-deploy to install the rendered file. If you see lost connection, the deploy user or its authorized_keys entry is not yet in place on the gateway — finish §7.2 first.

cd /opt/tqpro-orchestration
scripts/platform/render-upstreams.sh
# Output:
#   Rendering upstreams (api='...' web='...')
#   Installing → gateway.internal:/etc/nginx/conf.d/tqpro-upstreams.conf
#   ... nginx test successful ...
#   Done.

The script reads upstream.api / upstream.web from $TQPRO_PLATFORM_CONFIG (default /etc/tqpro/platform/tqpro-platform.properties, installed in §7.1.2) and SSHes to gateway.deploy.host to install the rendered file. It does not need TLINQ_HOME.

Adding or removing a backend host later: edit upstream.api / upstream.web in tqpro-platform.properties on the orchestration host (Section A, orchestration-only), then re-run scripts/platform/render-upstreams.sh. No per-tenant vhost re-rendering is needed: every tenant picks up the change immediately because they all reference the same shared upstream names.

7.2 Gateway host (DMZ) — one-time setup

sudo apt-get install -y nginx certbot
sudo install -d -m 0755 -o www-data /var/www/certbot
sudo systemctl is-active certbot.timer   # expect "active"

# Create the deploy user. NOTE: the username here MUST match
# gateway.deploy.user in tqpro-platform.properties on the orchestration
# host (§7.4). Linux convention favours hyphens; if you change to a
# different name, update the property file and the `useradd`/install
# lines below in lockstep.
sudo useradd -m -s /bin/bash tqpro-deploy

# Add the orchestration host's SSH public key to authorized_keys.
# CRITICAL: the file must be owned by tqpro-deploy, not root — sshd's
# privilege-separated child reads it as the target user, and root-owned
# 0600 files are silently unreadable from that context, surfacing as
# "Failed publickey" with no other clue.
sudo install -d -m 0700 -o tqpro-deploy -g tqpro-deploy /home/tqpro-deploy/.ssh
echo "<paste orchestration-host's tqpro-deploy public key>" \
    | sudo -u tqpro-deploy tee /home/tqpro-deploy/.ssh/authorized_keys >/dev/null
sudo -u tqpro-deploy chmod 0600 /home/tqpro-deploy/.ssh/authorized_keys

# Sanity check: file must show owner tqpro-deploy:tqpro-deploy.
sudo ls -la /home/tqpro-deploy/.ssh/

# NOPASSWD sudo for exactly the commands provisioning + rollback run.
# The list is narrow on purpose: an attacker compromising the gateway
# cannot escalate beyond editing nginx vhosts and managing certbot.
sudo tee /etc/sudoers.d/tqpro-deploy-gateway >/dev/null <<'EOF'
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/install -m 0644 /tmp/*.conf /etc/nginx/sites-available/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/install -m 0644 /tmp/tqpro-upstreams.conf /etc/nginx/conf.d/tqpro-upstreams.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/ln -sf /etc/nginx/sites-available/*.conf /etc/nginx/sites-enabled/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/rm -f /etc/nginx/sites-enabled/*.conf /etc/nginx/sites-available/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/sbin/nginx -t
tqpro-deploy ALL=(root) NOPASSWD: /bin/systemctl reload nginx
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/certbot certonly *
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/certbot revoke *
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/certbot delete *
EOF
sudo chmod 0440 /etc/sudoers.d/tqpro-deploy-gateway
sudo visudo -c   # syntax check

The gateway has NO PostgreSQL client, no TQPro process, no platform-admin token. Its only role is to terminate TLS and proxy.

7.3 Web host(s) — one-time setup, repeat per host in web.deploy.hosts

sudo apt-get install -y nginx

# Username must match web.deploy.user in tqpro-platform.properties (§7.4);
# see §7.2 for the rationale and the same-username caveat.
sudo useradd -m -s /bin/bash tqpro-deploy

# Add the orchestration host's SSH public key. As in §7.2, the file
# MUST be owned by tqpro-deploy (not root) or sshd's privilege-separated
# child can't read it.
sudo install -d -m 0700 -o tqpro-deploy -g tqpro-deploy /home/tqpro-deploy/.ssh
echo "<paste orchestration-host's tqpro-deploy public key>" \
    | sudo -u tqpro-deploy tee /home/tqpro-deploy/.ssh/authorized_keys >/dev/null
sudo -u tqpro-deploy chmod 0600 /home/tqpro-deploy/.ssh/authorized_keys

# Narrow NOPASSWD sudo. Web hosts run no certbot — gateway holds all certs.
sudo tee /etc/sudoers.d/tqpro-deploy-web >/dev/null <<'EOF'
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/install -m 0644 /tmp/*.conf /etc/nginx/sites-available/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/ln -sf /etc/nginx/sites-available/*.conf /etc/nginx/sites-enabled/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/bin/rm -f /etc/nginx/sites-enabled/*.conf /etc/nginx/sites-available/*.conf
tqpro-deploy ALL=(root) NOPASSWD: /usr/sbin/nginx -t
tqpro-deploy ALL=(root) NOPASSWD: /bin/systemctl reload nginx
EOF
sudo chmod 0440 /etc/sudoers.d/tqpro-deploy-web
sudo visudo -c

# Drop the static SPA bundle into /opt/tqpro/tqweb-adm.
#
# Ownership pattern: tqpro-deploy owns + writes, www-data (nginx) reads via
# group bits, setgid (2 in 2775) makes new files inherit group www-data so
# subsequent deploys preserve nginx's read access. tqpro-deploy does NOT
# need to be in the www-data group — owner-bit writes are independent of
# group membership.
sudo install -d -o tqpro-deploy -g www-data -m 2775 /opt/tqpro/tqweb-adm

From the orchestration host (one-time per web host, then again on every SPA bundle change — usually wired into the CI/CD deploy step):

# Initial deploy of the SPA bundle. No sudo needed on the web host —
# tqpro-deploy owns /opt/tqpro/tqweb-adm so it can scp/rsync directly.
rsync -av --delete "${REPO_ROOT}/tqweb-adm/" \
    "tqpro-deploy@<web-host>:/opt/tqpro/tqweb-adm/"

Verify on the web host:

ls -la /opt/tqpro/tqweb-adm/index.html
# -rw-r--r-- 1 tqpro-deploy www-data ... index.html
#   owner: tqpro-deploy (writer)        ↑ group: www-data (reader, via setgid)

Without this ownership pattern (for example, a 0755 directory owned by www-data:www-data), the CI/CD deploy can't write the bundle without sudo escalation, which the narrow tqpro-deploy sudoers deliberately does not grant. Symptoms: the rsync fails with Permission denied, or the operator ends up granting broader sudo and widening the blast radius.

Web hosts have NO per-tenant config until tenant-provision.sh drops its first server block. The default install can leave nginx with the distro default site enabled — TQPro's blocks coexist via server_name matching.

7.4 Properties summary on the orchestration host

In /etc/tqpro/platform/tqpro-platform.properties — Section A (orchestration-only) keys relevant to the nginx fan-out:

upstream.api=api1.internal:11080,api2.internal:11080
upstream.web=web1.internal:80,web2.internal:80

gateway.deploy.host=gateway.internal
gateway.deploy.user=tqpro-deploy

web.deploy.hosts=web1.internal,web2.internal
web.deploy.user=tqpro-deploy

See §1B for the full file reference, including the duplicated keys (Sections B and C) that must match the API hosts' configuration.

Username consistency. gateway.deploy.user and web.deploy.user MUST match the actual user accounts created on the gateway and web hosts in §7.2 / §7.3 (default: tqpro-deploy, hyphenated — Linux convention). If you change the name, update the property here AND the useradd + install -o ... lines in §7.2 / §7.3 in lockstep, plus any sudoers entries that reference the username. A mismatch surfaces as lost connection from render-upstreams.sh or tenant-provision.sh, not as a clear "user not found" — the script's BatchMode=yes + -q flags suppress the underlying "Permission denied" error.

The hosts in upstream.web and web.deploy.hosts are usually the same list; the only reason they're separate is upstream.web includes :port. Empty gateway.deploy.host and empty web.deploy.hosts = "run locally on the orchestration host" — appropriate for single-host dev installs only, never for production.
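Since the two lists are meant to name the same machines, a quick comparison catches a host added to one and forgotten in the other. A sketch with hypothetical helpers `strip_ports` and `same_web_hosts`; it handles plain hostnames and IPv4 addresses only:

```shell
# Drop the :port suffix from each entry of a comma-separated host list
# and print the hosts sorted, comma-separated.
strip_ports() {
  echo "$1" | tr ',' '\n' | sed 's/:[0-9]*$//' | LC_ALL=C sort | paste -sd, -
}

# Succeeds when upstream.web (minus ports) names the same hosts as
# web.deploy.hosts, regardless of order.
# Usage: same_web_hosts <upstream.web value> <web.deploy.hosts value>
same_web_hosts() {
  [ "$(strip_ports "$1")" = "$(strip_ports "$2")" ]
}

# Example:
#   same_web_hosts "web1.internal:80,web2.internal:80" "web1.internal,web2.internal" \
#     && echo "lists agree"
```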

7.5 Greenfield: no per-tenant vhosts until tenant 1

In greenfield, no tenant exists yet — there are no per-tenant vhosts on either tier. The first tenant's vhosts (gateway + web) are created automatically by scripts/platform/tenant-provision.sh when ops onboards tenant 1.

7.6 Local development hosts

Add one line per tenant being tested to /etc/hosts once tenants exist:

127.0.0.1  acme-travel.tourlinq.local

Or set address=/tourlinq.local/127.0.0.1 in dnsmasq for wildcard local resolution. For local single-host dev where the gateway, web, and API all run on 127.0.0.1, leave web.deploy.hosts= empty and the provisioner applies the web vhost locally.
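When several tenants are under test, the /etc/hosts lines can be generated rather than typed. A small sketch; `local_hosts_lines` is a hypothetical helper and the tourlinq.local suffix is the one used in the example above:

```shell
# Print one /etc/hosts line per tenant code for local testing.
# Usage: local_hosts_lines <tenant-code>...
local_hosts_lines() {
  for code in "$@"; do
    printf '127.0.0.1  %s.tourlinq.local\n' "$code"
  done
}

# Example:
#   local_hosts_lines acme-travel | sudo tee -a /etc/hosts >/dev/null
```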


8. Smoke test (post-bootstrap, pre-tenant-1)

Prerequisite — API server installed and configured. This runbook assumes the TQPro API is already deployed on each host in upstream.api, with TLINQ_HOME pointing at a valid config tree (§1A) and TQPRO_ENCRYPTION_KEY exported via EnvironmentFile= in the systemd unit (§5). The actual install of tqapi.jar + companion JARs is covered in doc/deployment/bare-metal-deployment.md.

8.0 Pre-flight (run on each API host before starting the service)

A tenant-less first start is the expected initial state — the tqplatform.tenant table is empty, TenantRegistry loads zero rows, and any incoming request returns 404 unknown-tenant. That is correct. What can actually break the first start is unrelated to tenants:

# 1. TLINQ_HOME readable by the service user.
sudo -u tqpro-svc test -r "$TLINQ_HOME/tourlinq.properties" && echo "config OK"

# 2. Platform DB reachable from this host with the credentials in
#    tourlinq.properties.
PG_PASS=$(grep '^platform\.db\.pass=' "$TLINQ_HOME/tourlinq.properties" | cut -d= -f2-)
PGPASSWORD="$PG_PASS" psql \
    -h <platform-db-host> -U tqpro_platform -d tqplatform \
    -c 'SELECT count(*) FROM tenant'
# Expected: count = 0 (greenfield)

# 3. Encryption key wired into the systemd unit's environment.
sudo systemctl cat tqpro-api | grep -E 'EnvironmentFile|Environment='
# Expected to show /etc/tqpro/tqpro.env among the EnvironmentFile entries.

# 4. Listening ports free.
sudo ss -tlnp | grep -E ':110(79|80)\b' || echo "11080/11079 free"
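
The grep/cut pipeline in check 2 generalizes to any key. A small sketch, assuming a hypothetical helper name prop_get and plain key=value lines (no continuation lines or escapes); everything after the first "=" is kept, so values that themselves contain "=" survive intact:

```shell
# prop_get: print the value of the first "<key>=..." line in a properties file.
# Hypothetical helper; assumes plain key=value lines (no continuations).
prop_get() {
    # index() does a literal prefix match, so dots in the key are not treated
    # as regex wildcards; everything after the first "=" is the value.
    awk -v k="$2=" 'index($0, k) == 1 { print substr($0, length(k) + 1); exit }' "$1"
}

# Example: PG_PASS=$(prop_get "$TLINQ_HOME/tourlinq.properties" platform.db.pass)
```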

Common first-start failure modes:

| Symptom | Cause | Fix |
| --- | --- | --- |
| Cannot locate tourlinq.properties! | TLINQ_HOME not set in the systemd unit, or wrong path | systemctl show tqpro-api -p Environment; verify with ls $TLINQ_HOME/tourlinq.properties |
| password authentication failed for user "tqpro_platform" | platform.db.pass mismatch | Re-check value vs what createuser --pwprompt accepted in §3 |
| relation "tenant" does not exist | 0001-tqplatform-schema.sql not applied | Re-run §3 against tqplatform |
| TQPRO_ENCRYPTION_KEY environment variable is not set | EnvironmentFile=/etc/tqpro/tqpro.env missing from the unit, or file unreadable | systemctl cat tqpro-api; ls -l /etc/tqpro/tqpro.env (must be readable by tqpro-svc) |
| Port already in use | Another process holds 11080/11079 | sudo ss -tlnp \| grep 11080 |
| Logs show TenantRegistry loaded 0 active tenant(s) | Working as intended (the greenfield first-start state) | Proceed with §8.1 |

8.1 Start the API and watch for the "ready" line

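# 8.1 Start TQPro API server.
sudo systemctl restart tqpro-api
# Watch the startup log; the registry line listed in the §8.0 table is expected.
sudo journalctl -u tqpro-api -n 50 --no-pager | grep 'TenantRegistry loaded'
# Expected (greenfield): TenantRegistry loaded 0 active tenant(s)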

# 8.2 With no tenants yet, /auth/config for any subdomain returns 404
# (unknown-tenant) — that's the correct behaviour.
curl -si -H "Host: acme-travel.tourlinq.com" \
    http://localhost:11080/tlinq-api/auth/config \
    -H "Content-Type: application/json" -d '{}' | head -3
# Expected: HTTP/1.1 404 Not Found, body { "error": "unknown-tenant", ... }

# 8.3 Verify the platform service account can obtain a token.
TOKEN=$(curl -s -X POST "${KC_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" -d "client_id=tqpro-platform-admin" \
  -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" | jq -r '.access_token // empty')
# Listing realms requires master-realm view-realm, which this narrow service
# account does not have, so just confirm token issuance succeeded.
# ('// empty' keeps a failed grant from yielding the literal string "null".)
[ -n "${TOKEN}" ] && echo "token issued OK"
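
For scripted smoke tests, the status code can be pulled out of the curl -si output and compared directly. A sketch assuming a hypothetical helper http_status that reads the raw response from stdin:

```shell
# http_status: print the numeric status code from the first line of a raw
# HTTP response (as produced by curl -si). Hypothetical helper.
http_status() {
    # Status line looks like "HTTP/1.1 404 Not Found"; field 2 is the code.
    # tr strips the CR that HTTP line endings leave on the line.
    sed -n '1p' | tr -d '\r' | awk '{ print $2 }'
}

# Example, reusing the request from step 8.2:
#   code=$(curl -si -H "Host: acme-travel.tourlinq.com" \
#       http://localhost:11080/tlinq-api/auth/config \
#       -H "Content-Type: application/json" -d '{}' | http_status)
#   [ "$code" = "404" ] && echo "unknown-tenant response OK"
```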

The platform is now ready for tenant onboarding. Continue with doc/operations/tenant-provisioning.md for the per-tenant flow.


9. Troubleshooting

TenantRegistry logs "TLINQ_HOME is not set" — the service user's environment must include TLINQ_HOME. systemd: Environment=TLINQ_HOME=/etc/tqpro/config.

certbot HTTP-01 challenge fails — DNS hasn't propagated. dig +short <host> must return the gateway's public IP before certbot will succeed.

Keycloak returns 401 on POST /admin/realms/... after POST /admin/realms just succeeded — the existing service-account token predates the auto-granted realm-management role. Re-fetch the token (client-credentials grant) and retry. Phase 1's KeycloakRealmProvisioner does this automatically; it's only an issue for manual curls.

bootstrap-template-db.sh errors with "schema file not found" — pass an absolute path to --schema-file and verify the file exists with ls -la.

Migrations fail when applied to tenants but not the template — most likely the template was bootstrapped from a stale source. Re-run bootstrap-template-db.sh --schema-file <fresh dump> after dumping a source DB that has all current migrations applied.

sudo VAR=value command doesn't pass VAR to the command: sudo sanitizes the environment by default, so the variable gets set in the calling shell and then thrown away. Operators hit this trying to pass e.g. SSL_CERT_FILE to certbot. Three workarounds:

  • sudo env VAR=value command: most explicit, no sudoers change.
  • export VAR=value && sudo -E command: preserves all caller env.
  • Add VAR to Defaults env_keep += "VAR" in /etc/sudoers.d/... if it needs to be permanent.
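
The effect of an environment reset can be reproduced without sudo. The sketch below uses env -i as a stand-in for a sanitizing sudo policy (an analogy only, not sudo's actual policy engine) and shows that the child sees a variable only when it is passed explicitly on the env command line; show_var is a hypothetical helper:

```shell
# show_var: run a child shell behind the given wrapper command and print the
# value of VAR as the child sees it, or "unset".
show_var() {
    "$@" /bin/sh -c 'echo "${VAR:-unset}"'
}

export VAR=value
show_var env -i             # scrubbed environment: prints "unset"
show_var env -i VAR=value   # passed explicitly, like "sudo env VAR=value": prints "value"
```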


9A. Known gaps in this runbook (discovered 2026-05-16)

During the first end-to-end manual tenant provisioning, ops hit several steps that this runbook is missing or that need code changes. Read this section if you're following the runbook for a fresh install — the provisioning will fail in non-obvious ways without these adjustments. Full per-gap detail and workarounds live in tenant-provisioning.md Appendix A.3.

| # | Gap | Where it belongs in this runbook | Status |
| --- | --- | --- | --- |
| 1 | JWTValidator accepts only one oidc-client-id, so it can't serve master-realm AND tenant-realm tokens simultaneously | §5 / §8 | RESOLVED 2026-05-17: oidc-client-id is now a comma-separated list. Set oidc-client-id=tqweb-adm,tqpro-platform-admin in tlinqapi.properties to accept both. First entry is the SPA-facing primary. |
| 2 | TenantProvisioningFacade doesn't write db_user/db_pass to tqplatform.tenant | n/a (per-tenant) | RESOLVED 2026-05-17: ProvisionRequest now requires dbUser + dbPass; facade encrypts and inserts them alongside the other tenant fields. |
| 3 | Missing step: create platform-admin realm role in master realm + assign to tqpro-platform-admin service account | §2 | RESOLVED 2026-05-17: added as §2 step 6. D-4 prohibition still holds (this is our app-level role, not KC's built-in admin). |
| 4 | Missing step: systemd unit must reference EnvironmentFile=/etc/tqpro/tqpro.env (otherwise TQPRO_PLATFORM_ADMIN_SECRET / TQPRO_ENCRYPTION_KEY are unreachable to the JVM) | §5 | RESOLVED 2026-05-17: added as §5.3 (drop-in via systemctl edit tlinq + verification). |
| 5 | Missing step: migrate tlinqapi.properties from single-tenant layout (auth-mode, dev-mode=false, drop oidc-issuer, add oidc-keycloak-base-url, update oidc-client-id) | Appendix A (in-place migration) | RESOLVED 2026-05-17: added as A.6 with the exact sed script and the new comma-separated oidc-client-id recommendation. |
| 6 | Missing section: web-tier host bootstrap. /opt/tqpro/tqweb-adm directory + ownership (tqpro-deploy:www-data + setgid 2775) + initial SPA rsync from ${REPO_ROOT}/tqweb-adm/ | new section between §7 and §8 | RESOLVED 2026-05-17: §7.3 updated with the correct ownership pattern + initial rsync command from the orchestration host. |
| 7 | Missing step: certbot + step-ca needs SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt (and REQUESTS_CA_BUNDLE=) in the certbot systemd unit's Environment= block, OR step-ca root spliced into certifi's bundle | Appendix B §B.1.7 or §B.2 | RESOLVED 2026-05-17: added as B.3.1 with both options (env vars vs certifi splice) and the 24-hour-cert urgency note. |
| 8 | Missing step: CI/orchestration host (the one running provisioning) also needs the step-ca root installed | Appendix B §B.4 | RESOLVED 2026-05-17: added as B.3.2. |
| 9 | Script gap: tenant-provision.sh doesn't create the per-tenant Postgres role | n/a (per-tenant, fix in script) | RESOLVED 2026-05-17: script step 2 now auto-generates DB_PASS (or honors TENANT_DB_PASS env var), runs CREATE ROLE + fan-out grants across every schema, passes credentials to the Java provision call. Rollback drops the role too. |
| 10 | Script gap: tenant-provision.sh runs nginx -t before certbot, but the rendered vhost references not-yet-existing cert files | n/a (per-tenant, fix in script) | RESOLVED 2026-05-17: script now does two-pass install (stub vhost → certbot → swap in full HTTPS vhost in new step 4b). Mirrors the manual walkthrough's B.6+B.7.5 approach. |
| 11 | tqpro_template may be missing public.schema_migrations; every tenant cloned from it inherits the gap | §6 | RESOLVED 2026-05-17: tenant-provision.sh step 2 now pre-flights for the ledger and fails fast with a pointer to bootstrap-schema-migrations.sh for repair. |
| 12 | Operator note: sudo VAR=val command does NOT pass VAR to the command because sudo sanitizes the environment. Use sudo env VAR=val command instead | (general) | RESOLVED 2026-05-17: added to §9 troubleshooting with all three workarounds. |
| 13 | Operator note: KC realm-template SMTP isn't set, so the welcome-email send during KeycloakRealmProvisioner.provisionTenant fails with Invalid sender address 'null'. Realm is still created and usable; operator must set the admin user's password manually in the KC admin UI on first onboard | §2 (KC) | RESOLVED 2026-05-17: KeycloakRealmProvisioner.provisionTenant now accepts an optional SMTP config; TenantProvisioningFacade builds it from tourlinq.properties mail. keys and passes it in. Welcome emails work out of the box if mail. is configured (which it already is for application mail). Empty config → graceful skip with the existing warning. |
| 14 | Environment cleanup: a migrated gateway may carry stale single-tenant vhosts (tqweb-pub, tqweb-adm, tqweb-b2b, etc.) that don't break the new per-tenant vhost but interfere with the implicit default for port 443 | Appendix A (in-place migration) | RESOLVED 2026-05-17: added as A.7 with the move-to-sites-available approach + rollback note. |
| 15 | Pre-existing bug in tenant-provision.sh:106: apostrophe in ${ACME_SERVER:-Let's Encrypt production (default)} is a real bash parser error, so the script was unparseable | n/a (per-tenant, fix in script) | RESOLVED 2026-05-17: apostrophe removed (now LetsEncrypt). Surfaced during code-review of the script's syntax validity. |

Action (2026-05-17 update): all 15 gaps are now fully resolved inline. The next install of this runbook from greenfield should hit no surprises.


10. References

  • Tenant provisioning runbook (per-tenant): doc/operations/tenant-provisioning.md
  • Tenant-provisioning known gaps + manual walkthrough: doc/operations/tenant-provisioning.md Appendix A.3 + Appendix B
  • Multi-tenant runtime architecture: doc/architecture/multitenant-architecture.md
  • Tenant-aware coding for developers: doc/developer/getting-started/tenant-aware-coding.md
  • Execution plan: doc/plans/multitenancy-execution.md
  • Architecture decisions: §2 of the execution plan (D-1 .. D-19)
  • Appendix A below: migrating an existing single-tenant install

Appendix A: Migrating an existing single-tenant install

If you already have a working single-tenant TQPro deployment (an existing tlinq DB + tqpro-adm Keycloak realm + nginx-gw.conf single-server block), this appendix covers the one-time conversion. Once done, every new tenant uses the greenfield onboarding flow — there's nothing recurring to maintain here.

A.1 Bring the seed Keycloak realm into the two-client / five-role pattern

The existing tqpro-adm realm has one client (tqweb-adm) and three realm roles (guest, agent, admin). Add the missing pieces so it matches what every new tenant gets from tenant-provision.sh:

  1. Admin console → realm tqpro-adm → Realm roles → Create role
     • manager
     • finance
  2. Clients → Create client
     • Client ID: tqpro-admin-api
     • Client authentication: ON
     • Service accounts roles: ON
     • Standard flow / Direct access grants: OFF
  3. Credentials: copy the secret. Save it for §A.4.
  4. Service accounts roles → Filter by clients → realm-management
     • Check manage-users, view-users only (NOT manage-realm)
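
The role and client creation can also be scripted against the Keycloak Admin REST API. The sketch below only builds the JSON payloads; the field names follow Keycloak's RoleRepresentation and ClientRepresentation, while the function names and the ADMIN_TOKEN variable in the commented curl calls are assumptions for illustration. Assigning the realm-management client roles is not covered here:

```shell
# role_payload: JSON body for POST /admin/realms/<realm>/roles
role_payload() {
    printf '{"name":"%s"}\n' "$1"
}

# service_client_payload: JSON body for POST /admin/realms/<realm>/clients.
# Confidential client, service account on, browser and password flows off.
service_client_payload() {
    printf '{"clientId":"%s","publicClient":false,"serviceAccountsEnabled":true,"standardFlowEnabled":false,"directAccessGrantsEnabled":false}\n' "$1"
}

# Example (ADMIN_TOKEN is assumed to hold a token allowed to manage the realm):
#   for r in manager finance; do
#     role_payload "$r" | curl -sf -X POST "${KC_URL}/admin/realms/tqpro-adm/roles" \
#       -H "Authorization: Bearer ${ADMIN_TOKEN}" \
#       -H "Content-Type: application/json" -d @-
#   done
#   service_client_payload tqpro-admin-api \
#     | curl -sf -X POST "${KC_URL}/admin/realms/tqpro-adm/clients" \
#       -H "Authorization: Bearer ${ADMIN_TOKEN}" \
#       -H "Content-Type: application/json" -d @-
```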

The existing tqweb-adm client stays unchanged. Verify Valid redirect URIs cover both your existing host and the new <seed-code>.<platform.domain> once DNS for the new pattern is in place.

A.2 Bootstrap the seed-tenant DB ledger

Apply migration 0072 to the existing tlinq DB and seed the ledger with every existing migration filename marked applied:

psql tlinq -f config/db-changes/0072-schema-migrations-ledger.sql
scripts/db/bootstrap-schema-migrations.sh tlinq
# Expected: total_rows = (number of files under config/db-changes/*.sql)

After this, apply-tenant-migrations.sh will process the existing tlinq DB alongside the template and any future tenants.
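
The expected total_rows value can be computed before running the bootstrap. A sketch assuming a hypothetical helper count_migrations and that every migration is a top-level .sql file in the directory:

```shell
# count_migrations: number of *.sql files directly under a directory.
count_migrations() {
    find "$1" -maxdepth 1 -type f -name '*.sql' | wc -l | tr -d ' '
}

# Example: count_migrations config/db-changes
```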

A.3 Use the existing DB as the source for the template dump

If you don't already have a clean schema source, dump from the existing tlinq DB:

pg_dump --schema-only --no-owner --no-privileges -d tlinq > /tmp/tqpro-schema.sql
scripts/db/bootstrap-template-db.sh --schema-file /tmp/tqpro-schema.sql

Then complete steps §3 onwards from the main runbook.

A.4 Insert the seed tenant row in tqplatform.tenant

The greenfield migration leaves tqplatform.tenant empty. For an in-place install, insert the existing tenant manually:

# First, encrypt the seed-realm tqpro-admin-api client secret captured in §A.1.
# The cleanest path: use a small Java helper that calls TenantConfig.encrypt;
# until that ships, this column accepts plaintext as well — TenantConfig.decrypt
# treats non-'encrypted:'-prefixed values as passthrough. Replace with the
# encrypted form before going to production.
SEED_TENANT_ID=$(uuidgen)
SEED_KC_ADMIN_SECRET='<paste from A.1 step 3, plain or encrypted:<base64>>'

psql tqplatform <<SQL
INSERT INTO tenant (tenant_id, tenant_code, tenant_name,
                    db_name, db_user, db_pass,
                    kc_realm, kc_admin_client_secret, status)
VALUES ('${SEED_TENANT_ID}', 'tqpro-adm', 'Default Agency',
        'tlinq', 'tlinq', 'TlinqAdmin',
        'tqpro-adm', '${SEED_KC_ADMIN_SECRET}', 'ACTIVE');
SQL

Then call /platform/tenant/refresh (or restart the API server) so the in-memory TenantRegistry picks up the new row.
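
Because plaintext is accepted as passthrough, the follow-up encryption is easy to forget. A sketch of a guard, assuming a hypothetical helper is_encrypted_value and the encrypted: prefix described in the comment above:

```shell
# is_encrypted_value: succeed only if the value carries the "encrypted:"
# prefix followed by at least one character.
is_encrypted_value() {
    case $1 in
        encrypted:?*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example:
#   is_encrypted_value "$SEED_KC_ADMIN_SECRET" \
#       || echo "WARNING: kc_admin_client_secret is plaintext" >&2
```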

A.5 Replace the legacy nginx-gw.conf with a per-tenant vhost

The old single-server nginx-gw.conf block needs to retire. Render the seed tenant's vhost from the template (same renderer the provisioner uses):

python3 scripts/platform/render-vhost.py \
    --template 'config/Nginx Config/templates/tenant-gw.conf.template' \
    --output /etc/nginx/sites-available/tqpro-adm.conf \
    --tenant-code tqpro-adm \
    --tenant-host tqpro-adm.<platform.domain>

sudo ln -sf /etc/nginx/sites-available/tqpro-adm.conf \
            /etc/nginx/sites-enabled/tqpro-adm.conf
sudo nginx -t

# Disable the legacy block (keep the file for rollback).
sudo mv /etc/nginx/sites-enabled/nginx-gw.conf /etc/nginx/sites-available/

sudo systemctl reload nginx

Then issue a Let's Encrypt certificate for the seed tenant's host (see §7.2). DNS for the new <seed-code>.<platform.domain> must resolve before certbot will succeed.

After all this, the in-place install behaves identically to a greenfield install: the seed tenant is just "the first tenant" in tqplatform.tenant, and every subsequent onboarding goes through scripts/platform/tenant-provision.sh like any greenfield tenant.

A.6 Migrate tlinqapi.properties from the single-tenant layout

Pre-multi-tenancy installs carry an API config that's hardcoded to one realm and runs with dev-mode=true. The multi-tenant validator needs oidc-keycloak-base-url (a host prefix, no realm path), no oidc-issuer (it's derived per-token), and dev-mode=false so JWT validation failures don't silently fall through to a fake dev user.

# On each API host. Adjust TLINQ_HOME to wherever your tlinqapi.properties
# actually lives (it may be /var/tqpro/conf, not /etc/tqpro).
sudo cp "${TLINQ_HOME}/tlinqapi.properties" "${TLINQ_HOME}/tlinqapi.properties.bak"

sudo sed -i \
    -e 's/^dev-mode=.*/dev-mode=false/' \
    -e 's|^oidc-issuer=.*|# oidc-issuer (removed — multi-tenant uses oidc-keycloak-base-url)|' \
    -e 's|^oidc-client-id=.*|oidc-client-id=tqweb-adm,tqpro-platform-admin|' \
    "${TLINQ_HOME}/tlinqapi.properties"

# Add oidc-keycloak-base-url if not already present. Value must match the
# token's iss claim prefix exactly: iss = <base-url>/realms/<realm>, no
# trailing slash on the base URL.
grep -q '^oidc-keycloak-base-url=' "${TLINQ_HOME}/tlinqapi.properties" || \
    echo 'oidc-keycloak-base-url=https://auth.vanevski.net' \
        | sudo tee -a "${TLINQ_HOME}/tlinqapi.properties" > /dev/null

# Verify, then restart.
grep -E "^(auth-mode|oidc-|dev-)" "${TLINQ_HOME}/tlinqapi.properties"
sudo systemctl restart tlinq

# After restart, look for the OIDC config-loaded log line:
sudo journalctl -u tlinq -n 30 --no-pager | grep -iE "OIDC config"
# Expected: "OIDC configuration loaded. keycloakBaseUrl=https://auth.vanevski.net, ..."

oidc-client-id is now a comma-separated list — first entry is the SPA-facing primary (used in /auth/config responses and logout URLs), all entries are accepted by the audience check. The recommended setting tqweb-adm,tqpro-platform-admin lets one API instance serve both tenant-realm browser SSO and master-realm /platform/* calls without the operator having to flip the config between provisioning runs.
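
The list semantics can be illustrated in a few lines. This is a sketch of the behaviour described above, not the validator's actual code; the function names are hypothetical and the list is assumed to contain no spaces around the commas:

```shell
# oidc_primary: first entry of a comma-separated client-id list.
oidc_primary() {
    printf '%s\n' "${1%%,*}"
}

# oidc_accepts: succeed if <client-id> is exactly one of the entries in <list>.
# Usage: oidc_accepts <list> <client-id>
oidc_accepts() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: oidc_primary 'tqweb-adm,tqpro-platform-admin'   # tqweb-adm
```

Wrapping both the list and the candidate in commas makes the match exact, so a partial name such as tqweb is not accepted.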

A.7 Clean up legacy single-tenant vhosts on the gateway

A migrated gateway typically carries vhosts from before multi-tenancy (tqweb-pub, tqweb-adm, tqweb-b2b, tqweb-auth, auth.<domain>). After A.5 brings the seed tenant into the new per-tenant pattern, these legacy blocks are no longer needed, and they can interfere with the implicit default server selection for port 443. Alphabetical load order can make one of the legacy blocks the implicit default; the symptom is a confusing redirect such as https://<new-tenant>/ landing on https://auth.<domain>/admin/.

# On the gateway, after every consumer (CLI tooling, admin staff
# bookmarks, internal scripts) has cut over to the per-tenant URLs:
sudo mv /etc/nginx/sites-enabled/tqweb-pub /etc/nginx/sites-available/
sudo mv /etc/nginx/sites-enabled/tqweb-adm /etc/nginx/sites-available/
sudo mv /etc/nginx/sites-enabled/tqweb-b2b /etc/nginx/sites-available/
sudo mv /etc/nginx/sites-enabled/tqweb-auth /etc/nginx/sites-available/

sudo nginx -t && sudo systemctl reload nginx

Keep the files in sites-available/ so they can be quickly re-enabled if you need to roll back. Once stable for a release cycle, delete them.
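
The same cleanup can be made tolerant of vhosts that are absent or that were enabled as symlinks. A sketch, assuming a hypothetical helper disable_vhosts; it removes a symlink rather than moving it, on the assumption that the link's target already lives in sites-available:

```shell
# disable_vhosts: disable each named vhost found in <enabled-dir>.
# Usage: disable_vhosts <enabled-dir> <available-dir> <name>...
disable_vhosts() {
    enabled=$1
    available=$2
    shift 2
    for name in "$@"; do
        if [ -L "$enabled/$name" ]; then
            # Symlink: drop the link only, the real file stays where it is.
            rm "$enabled/$name" && echo "unlinked $name"
        elif [ -e "$enabled/$name" ]; then
            # Regular file: keep it for rollback under <available-dir>.
            mv "$enabled/$name" "$available/" && echo "moved $name"
        else
            echo "skipped $name"
        fi
    done
}

# Example (as root, followed by: nginx -t && systemctl reload nginx):
#   disable_vhosts /etc/nginx/sites-enabled /etc/nginx/sites-available \
#       tqweb-pub tqweb-adm tqweb-b2b tqweb-auth
```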


Appendix B: Lab / closed-network setup with step-ca

The provisioning flow assumes the gateway can reach Let's Encrypt and LE can reach the gateway over public internet for HTTP-01 validation. In a closed lab where the gateway is only on the internal network, neither side of that holds. This appendix walks through running a private ACME server (smallstep step-ca) so the same tenant-provision.sh flow works unchanged, with --server <local-ACME-URL> passed to certbot.

B.1 Stand up step-ca

step-ca is a single Go binary (~50 MB). Install it on any host the gateway can reach over the lab network — the orchestration host is a natural fit since it's already in the protected subnet.

B.1.1 Pick a hostname for the CA

The CA's TLS cert is issued for one hostname (you can add SANs later, but the canonical one is set at init time). Both the gateway and any browser machines you'll test from must resolve this name to the orchestration host's IP.

This runbook uses ca.vanevski.net as an example. If your lab DNS doesn't have a record for it, plan to add a /etc/hosts entry on every host that needs to reach step-ca (see §B.2 and §B.4 below).

B.1.2 Pre-flight

# On the orchestration host (intended step-ca host):
dig +short ca.vanevski.net
hostname -I
# If dig returns nothing, add a hosts entry on this host first:
#   echo "$(hostname -I | awk '{print $1}')  ca.vanevski.net" | sudo tee -a /etc/hosts

sudo ss -tlnp | grep ':8443' || echo "8443 free"

B.1.3 Install + initialize

# Install (Ubuntu/Debian — substitute the .deb names from the GitHub
# releases page if version naming has changed).
cd /tmp
wget -q https://github.com/smallstep/cli/releases/latest/download/step-cli_amd64.deb
wget -q https://github.com/smallstep/certificates/releases/latest/download/step-ca_amd64.deb
sudo dpkg -i step-cli_amd64.deb step-ca_amd64.deb
step version && step-ca version

# Generate a strong CA password (saved so the systemd unit can start
# unattended).
mkdir -p ~/.step
openssl rand -base64 32 > ~/.step/ca-pass.txt
chmod 0600 ~/.step/ca-pass.txt

# Initialize. The --provisioner-password-file ensures the JWK provisioner
# also unlocks unattended; --acme adds the ACME provisioner that certbot
# will use.
step ca init \
    --name "TQPro Lab CA" \
    --dns "ca.vanevski.net" \
    --address ":8443" \
    --provisioner admin \
    --password-file ~/.step/ca-pass.txt \
    --provisioner-password-file ~/.step/ca-pass.txt \
    --acme

# CAPTURE TWO THINGS from the output and stash them safely:
#   1. The root cert SHA256 fingerprint (printed at the end).
#   2. The ACME directory URL — should be exactly
#      https://ca.vanevski.net:8443/acme/acme/directory

B.1.4 Move data to /etc/step-ca and rewrite paths

step ca init writes to ~/.step/ and bakes the absolute paths into ca.json. To run as a system service we move the data to a system location and rewrite the paths — otherwise the daemon (running as the step-ca user) tries to read /home/<your-user>/.step/... and fails with permission denied.

sudo useradd -r -s /usr/sbin/nologin -d /etc/step-ca step-ca || true
sudo install -d -m 0750 -o step-ca -g step-ca /etc/step-ca
sudo cp -a ~/.step/* /etc/step-ca/
sudo chown -R step-ca:step-ca /etc/step-ca
sudo chmod 0640 /etc/step-ca/ca-pass.txt

# CRITICAL: rewrite the absolute paths in ca.json from the user-home
# location to the new system location. Without this, the daemon fails
# at startup with "Invalid Dir: /home/<user>/.step/db: permission denied".
sudo sed -i "s|$HOME/.step|/etc/step-ca|g" /etc/step-ca/config/ca.json

# Verify all paths now point at /etc/step-ca:
sudo grep -E '"db"|"address"|"root"|"crt"|"key"|"password"' \
    /etc/step-ca/config/ca.json
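
A count of leftover references gives a pass/fail answer instead of a visual scan. A sketch assuming a hypothetical helper stale_path_count; on the real file it needs to run with sudo, since ca.json is owned by step-ca:

```shell
# stale_path_count: number of lines in <file> that still mention <old-prefix>.
# Usage: stale_path_count <file> <old-prefix>
stale_path_count() {
    # grep -c prints 0 but exits 1 when nothing matches, so mask the status.
    grep -cF -- "$2" "$1" || true
}

# Example: a result of 0 means the rewrite caught every path.
#   stale_path_count /etc/step-ca/config/ca.json "$HOME/.step"
```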

B.1.5 Run as a systemd service

sudo tee /etc/systemd/system/step-ca.service >/dev/null <<'EOF'
[Unit]
Description=smallstep step-ca server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=step-ca
Group=step-ca
Environment=STEPPATH=/etc/step-ca
ExecStart=/usr/bin/step-ca /etc/step-ca/config/ca.json --password-file /etc/step-ca/ca-pass.txt
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/etc/step-ca

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now step-ca
sudo systemctl status step-ca --no-pager

ProtectHome=true is what makes the path-rewrite mandatory — without it the daemon would silently read from /home/<user>/.step/ if perms allowed, hiding the configuration error.

B.1.6 Smoke test

curl -k https://ca.vanevski.net:8443/health
# Expected: {"status":"ok"}

curl -k https://ca.vanevski.net:8443/acme/acme/directory
# Expected: a JSON object with newAccount, newOrder, newNonce,
# revokeCert, keyChange URLs.

-k is needed here because the root isn't installed yet on this host. Once the root is trusted (§B.2 for the gateway, §B.3.2 for the orchestration host), -k is no longer needed.
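
The directory check can be automated by testing for the five keys listed above. A sketch assuming a hypothetical helper acme_directory_ok that reads the JSON from stdin; it uses grep rather than a JSON parser, so it only confirms that the key names appear:

```shell
# acme_directory_ok: succeed only if all five ACME directory keys are present
# in the JSON document read from stdin.
acme_directory_ok() {
    body=$(cat)
    for key in newNonce newAccount newOrder revokeCert keyChange; do
        printf '%s' "$body" | grep -q "\"$key\"" || {
            echo "missing $key" >&2
            return 1
        }
    done
}

# Example:
#   curl -sk https://ca.vanevski.net:8443/acme/acme/directory | acme_directory_ok \
#       && echo "ACME directory OK"
```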

B.1.7 Cleanup + backup

# Stale user-home copy is no longer used at runtime. Remove it.
rm -rf ~/.step

# BACK UP the password and the root fingerprint to a sealed location.
# Losing /etc/step-ca/ca-pass.txt means losing the ability to issue or
# rotate certs (and there is no recovery — you'd have to stand up a new
# CA and re-trust everywhere).
sudo cat /etc/step-ca/ca-pass.txt    # → secure backup
sudo step certificate fingerprint /etc/step-ca/certs/root_ca.crt    # → secure backup

B.2 Trust the step-ca root cert on the gateway

certbot has to trust the step-ca cert to talk to it. Copy the root cert to the gateway and install it as a system trust anchor.

B.2.1 Add a /etc/hosts entry on the gateway

The gateway must resolve ca.vanevski.net to the step-ca host's IP. If lab DNS doesn't have the record, add it manually:

# On the gateway. Substitute the orchestration host's actual IP.
echo '192.168.88.163  ca.vanevski.net' | sudo tee -a /etc/hosts
getent hosts ca.vanevski.net    # confirm

B.2.2 Push the root cert to the gateway

Note on SSH access. The §7.1.3 SSH config restricts the FQDN of the gateway to tqpro-deploy connections only — scp as ubuntu or any other operator login fails with Permission denied (publickey). Two practical paths:

  1. Use the short hostname (dev-gw01 instead of dev-gw01.vanevski.net) — bypasses the Match rule and falls back to default identities + password.
  2. Push your own SSH key to the gateway as ubuntu once, via console or ssh-copy-id if password auth is enabled. Useful for other ops tasks too.

The Match-based SSH config from §7.1.3 step 3 prevents this conflict for new installs (it only matches tqpro-deploy@host), but if you followed an earlier iteration of this doc with a Host-based block, you'll hit the constraint.

# On orchestration host — get the cert into a transferable location.
sudo cp /etc/step-ca/certs/root_ca.crt /tmp/tqpro-step-ca-root.crt
sudo chmod 0644 /tmp/tqpro-step-ca-root.crt

# Push (use whichever account has SSH access to the gateway).
scp /tmp/tqpro-step-ca-root.crt ubuntu@dev-gw01:/tmp/

# Alternative: paste the PEM block over an existing console session
# instead of scp.
sudo cat /etc/step-ca/certs/root_ca.crt   # copy output to clipboard

B.2.3 Install on the gateway

# On the gateway.
sudo install -m 0644 /tmp/tqpro-step-ca-root.crt \
    /usr/local/share/ca-certificates/tqpro-step-ca.crt
sudo update-ca-certificates
# Expected: "1 added, 0 removed; ..."

# RHEL/CentOS variant:
#   sudo install ... /etc/pki/ca-trust/source/anchors/tqpro-step-ca.crt
#   sudo update-ca-trust

# Verify — no -k flag this time.
curl https://ca.vanevski.net:8443/health
# Expected: {"status":"ok"}

rm /tmp/tqpro-step-ca-root.crt

If the curl fails with an SSL error after update-ca-certificates reported success, the file probably didn't end in .crt (Ubuntu's update script ignores other extensions) or wasn't in /usr/local/share/ca-certificates/.
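
Both failure conditions are properties of the path, so they can be checked before running update-ca-certificates. A sketch assuming a hypothetical helper trust_anchor_path_ok; it inspects the path string only, not the certificate contents:

```shell
# trust_anchor_path_ok: succeed if <path> ends in .crt and sits under
# /usr/local/share/ca-certificates/.
trust_anchor_path_ok() {
    case $1 in
        /usr/local/share/ca-certificates/*.crt) return 0 ;;
        *)                                      return 1 ;;
    esac
}

# Example:
#   trust_anchor_path_ok /usr/local/share/ca-certificates/tqpro-step-ca.crt \
#       || echo "wrong location or extension" >&2
```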

B.3 Point certbot at step-ca

On the orchestration host, set certbot.acme.server in /etc/tqpro/platform/tqpro-platform.properties (Section A):

certbot.acme.server=https://ca.vanevski.net:8443/acme/acme/directory

That's the only change — and it's a single edit on the orchestration host. The API hosts' tourlinq.properties does NOT carry this key.

tenant-provision.sh reads this property at each run and appends --server <url> to the certbot call. Renewals via certbot.timer automatically use the same server because it's stored per-cert in /etc/letsencrypt/renewal/<host>.conf.
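
The property-to-flag mapping described above can be sketched as follows. This illustrates the behaviour and is not an excerpt from tenant-provision.sh; the helper name certbot_server_args is hypothetical:

```shell
# certbot_server_args: print "--server <url>" when certbot.acme.server has a
# non-empty value in the given properties file, otherwise print nothing.
certbot_server_args() {
    url=$(grep '^certbot\.acme\.server=' "$1" | head -n 1 | cut -d= -f2-)
    if [ -n "$url" ]; then
        printf '%s %s\n' --server "$url"
    fi
    return 0
}

# Example: certbot_server_args /etc/tqpro/platform/tqpro-platform.properties
```

An empty value yields no arguments, which leaves certbot on its default (public Let's Encrypt) server.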

B.3.1 Make certbot trust step-ca (env vars for the renewal timer)

Even after update-ca-certificates installs the step-ca root into the OS bundle, certbot's Python requests/urllib3/certifi stack uses its own bundled Mozilla CA list and ignores the OS trust store. Result: the first certbot call (and every subsequent renewal) fails with:

requests.exceptions.SSLError: ... CERTIFICATE_VERIFY_FAILED ...
  unable to get local issuer certificate

Two fixes; choose one. The systemd env-var approach is the cleaner of the two — survives python3-certifi upgrades and doesn't touch a distro-managed file.

Option A — env vars in the certbot systemd unit (recommended):

sudo systemctl edit certbot.service
# Paste:
[Service]
Environment=SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
Environment=REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

sudo systemctl daemon-reload

Then for manual invocations of certbot (during initial provisioning, not via the timer), prefix the call with sudo env:

sudo env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
         REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
    certbot certonly --webroot --webroot-path /var/www/certbot \
        --server <step-ca-acme-url> \
        --non-interactive --agree-tos \
        --email ops@<domain> \
        --domain <tenant>.<domain> \
        --deploy-hook 'systemctl reload nginx'

Note sudo env rather than sudo VAR=… certbot … — plain sudo VAR=val command sanitizes the env and the variable never reaches the command. See §9 troubleshooting.

tenant-provision.sh doesn't currently set these env vars when calling certbot over SSH — operators using step-ca should append them to sudoers env_keep (Defaults env_keep += "SSL_CERT_FILE REQUESTS_CA_BUNDLE") so they propagate from the sudo certbot call inside the script.

Option B — splice the step-ca root into certifi's bundle:

CERTIFI_BUNDLE=$(sudo find /usr/lib/python3 -name "cacert.pem" 2>/dev/null | head -1)
# Count before, so the +1 can be confirmed afterwards.
sudo grep -c "BEGIN CERTIFICATE" "${CERTIFI_BUNDLE}"
sudo tee -a "${CERTIFI_BUNDLE}" < /usr/local/share/ca-certificates/tqpro-step-ca.crt > /dev/null
# Verify:
sudo grep -c "BEGIN CERTIFICATE" "${CERTIFI_BUNDLE}"   # +1 vs before

Trade-off: the splice is overwritten the next time python3-certifi is upgraded, so you would need to re-splice (or wire it into a post-upgrade hook). Until you do, expect surprise renewal failures the day after the upgrade.

24-hour step-ca leaf cert urgency: step-ca's default leaf lifetime is 24h. Whichever fix you pick, do it before the first issued cert expires — otherwise the renewal timer fails silently overnight and the tenant URL serves a self-signed snake-oil or an expired cert.

B.3.2 Trust step-ca on the orchestration / CI host

Optional, but useful: without it, any verification curl from the orchestration host to a tenant URL (https://<tenant>.<domain>/) fails with SSL certificate problem. The tenant-provisioning flow itself doesn't need it (the script's calls to the API host use plain HTTP on the internal port platform.api.url), but verification commands and ad-hoc debugging do.

# From the orchestration host
scp tqpro-deploy@<gateway>:/usr/local/share/ca-certificates/tqpro-step-ca.crt /tmp/
sudo install -m 0644 /tmp/tqpro-step-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

# Verify
curl -sI https://<tenant>.<domain>/ | head -1
# Should return an HTTP status with no "SSL certificate problem" preamble.

B.4 Trust the root cert on developer machines

Browsers visiting https://<tenant>.<platform.domain> validate the TLS chain against the OS (and Firefox's separate) trust store. Until the step-ca root is installed, every tenant subdomain shows NET::ERR_CERT_AUTHORITY_INVALID — correct behaviour for an unknown CA, just inconvenient in dev. This step installs the root on every machine you'll browse from.

Why this isn't automatic. Browsers ship with a curated trust list of ~150 well-known CAs (DigiCert, Let's Encrypt, etc.). Your private step-ca isn't in that list and never will be; only manually installed roots are trusted. Public Let's Encrypt certs validate without this step, which is the trade-off when you switch to a private ACME server for a closed lab. If/when you go public with real LE (certbot.acme.server= empty in tqpro-platform.properties, see §B.3), the manual trust step disappears.

B.4.0 Get the root cert off the step-ca host

On the step-ca host (orchestration host):

sudo cat /etc/step-ca/certs/root_ca.crt
# Copy the entire BEGIN/END CERTIFICATE block to your clipboard.

# Optional — note the SHA256 fingerprint to verify after install.
sudo step certificate fingerprint /etc/step-ca/certs/root_ca.crt

B.4.1 Linux (Ubuntu / Debian / Mint / Pop!_OS)

# 1. Save the cert. Filename MUST end in .crt for update-ca-certificates
#    to pick it up.
sudo nano /usr/local/share/ca-certificates/tqpro-step-ca.crt
# Paste the cert content. Save.

# 2. Update the system trust store.
sudo update-ca-certificates
# Expected: "1 added, 0 removed; done."

# 3. Verify (covers Chrome, Chromium, Edge, curl, wget, Java, Python —
#    anything using the OS trust store).
curl https://ca.vanevski.net:8443/health
# Expected: {"status":"ok"} with no warnings.

RHEL / Fedora / CentOS variant:

sudo install -m 0644 root_ca.crt /etc/pki/ca-trust/source/anchors/tqpro-step-ca.crt
sudo update-ca-trust

B.4.2 Windows

Tested on Windows 10/11. Chrome, Edge, and IE use the system trust store; Firefox is separate (see B.4.3).

Save the cert. Open Notepad as Administrator, paste the cert contents, save as:

C:\ProgramData\tqpro-step-ca.crt

When saving: change "Save as type" to All Files (*.*) so the extension is .crt not .txt.

Install — pick one path. Each requires admin privileges.

GUI:

  1. Double-click C:\ProgramData\tqpro-step-ca.crt in File Explorer
  2. Click Install Certificate…
  3. Store Location: Local Machine (not "Current User", if you want it for all users on the box)
  4. Next → Place all certificates in the following store → Browse → Trusted Root Certification Authorities → OK
  5. Next → Finish
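  6. Confirm the security warning after checking the displayed thumbprint against the root cert. Windows shows a SHA-1 thumbprint, while the fingerprint noted in §B.4.0 is SHA256 by default; run step certificate fingerprint --sha1 on the step-ca host to get a comparable value.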

PowerShell (Administrator):

Import-Certificate `
    -FilePath "C:\ProgramData\tqpro-step-ca.crt" `
    -CertStoreLocation Cert:\LocalMachine\Root

cmd (Administrator):

certutil -addstore -f "Root" C:\ProgramData\tqpro-step-ca.crt

Verify. In Chrome or Edge, navigate to:

https://ca.vanevski.net:8443/health

Should show {"status":"ok"} with a normal padlock — no certificate warning.

B.4.3 Firefox (Linux + Windows + macOS)

Firefox maintains its own trust store, separate from the OS. Install per profile:

  1. Firefox → ☰ menu → Settings → Privacy & Security
  2. Scroll to "Certificates" → "View Certificates…"
  3. Tab "Authorities" → Import…
  4. Pick the tqpro-step-ca.crt file you saved earlier
  5. Check "Trust this CA to identify websites"
  6. OK

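CLI alternative on Linux (writes to the profile's NSS db; replace <profile> with your profile directory's prefix and restart Firefox afterwards):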

sudo apt-get install -y libnss3-tools
certutil -A -n "TQPro Lab CA" -t "C,," \
    -i /usr/local/share/ca-certificates/tqpro-step-ca.crt \
    -d sql:$HOME/.mozilla/firefox/<profile>.default-release

The GUI is faster for one-off setup.
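
For machines with several Firefox profiles, the CLI import can be repeated per profile. The following is a dry-run sketch that only prints the certutil command for each profile directory containing an NSS database (cert9.db). The function names are hypothetical, and the default base directory is an assumption: snap-packaged Firefox on Ubuntu keeps profiles under ~/snap/firefox/common/.mozilla/firefox instead.

```shell
# Hypothetical helper: list Firefox profile directories under a base dir
# that already contain an NSS database (cert9.db).
list_nss_profiles() {
    local base="${1:-$HOME/.mozilla/firefox}" db
    for db in "$base"/*/cert9.db; do
        [ -e "$db" ] && dirname "$db"
    done
    return 0
}

# Dry run: print one certutil import command per profile instead of running it.
print_import_commands() {
    local cert="$1" base="${2:-$HOME/.mozilla/firefox}" profile
    list_nss_profiles "$base" | while IFS= read -r profile; do
        printf 'certutil -A -n "TQPro Lab CA" -t "C,," -i %s -d sql:%s\n' "$cert" "$profile"
    done
}

# Usage sketch:
#   print_import_commands /usr/local/share/ca-certificates/tqpro-step-ca.crt
```

Printing rather than executing keeps the sketch safe to try; review the output, then run the lines you want.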

B.4.4 macOS

sudo security add-trusted-cert -d -r trustRoot \
    -k /Library/Keychains/System.keychain \
    /path/to/tqpro-step-ca.crt

(Or drag the .crt into Keychain Access → System keychain → set "When using this certificate" to "Always Trust".)

B.4.5 Hosts entries on the dev machine

Browsers also need to resolve ca.vanevski.net and any tenant subdomains you'll test. Until lab DNS has the records, edit /etc/hosts (Linux/macOS) or C:\Windows\System32\drivers\etc\hosts (Windows, Notepad as Admin):

192.168.88.163  ca.vanevski.net
192.168.88.169  acme-travel.vanevski.net

(Substitute IPs and hostnames for your lab. The step-ca IP is the orchestration host; the tenant IP is the gateway.) On Windows, run ipconfig /flushdns after editing.
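
On Linux or macOS the hosts edit can be scripted so that re-running it does not duplicate lines. This is a minimal sketch, assuming a hypothetical helper named add_hosts_entry; the file path is a parameter so it can be tried on a scratch copy first, and writing to the real /etc/hosts needs root.

```shell
# Hypothetical helper: append "<ip>  <hostname>" to a hosts file unless the
# hostname already appears as a field on a non-comment line.
add_hosts_entry() {
    local ip="$1" host="$2" file="${3:-/etc/hosts}"
    if awk -v h="$host" '
            /^[ \t]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break      # stop at a trailing comment
                    if ($i == h) found = 1
                }
            }
            END { exit !found }
        ' "$file"; then
        echo "already present: $host"
    else
        printf '%s  %s\n' "$ip" "$host" >> "$file"
        echo "added: $host"
    fi
}

# Usage sketch (on a copy, then install the result as root):
#   cp /etc/hosts /tmp/hosts.new
#   add_hosts_entry 192.168.88.163 ca.vanevski.net /tmp/hosts.new
```

The check matches on the hostname only, so an existing entry with a different IP is left alone rather than silently overridden.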

B.4.6 Common gotchas

| Symptom | Fix |
|---|---|
| Browser still shows "Not Secure" after install | Hard reload (Ctrl+Shift+R); browsers cache TLS errors. Try an incognito window if it persists. |
| Firefox still complains, OS-store browsers fine | The separate Firefox import was skipped (B.4.3). |
| curl: SSL certificate problem after install on Linux | The file didn't end in .crt, or wasn't in /usr/local/share/ca-certificates/. Re-check, then re-run update-ca-certificates. |
| Windows: "the requested operation requires elevation" | Run Notepad / PowerShell / cmd as Administrator. Importing into LocalMachine\Root needs it. |
| nslookup still hits public DNS, not the hosts entry | Expected: nslookup queries DNS servers directly and never reads the hosts file. Test with ping (or getent hosts on Linux) instead. If applications still resolve the old address, run ipconfig /flushdns (Windows) or sudo resolvectl flush-caches (Linux; systemd-resolve --flush-caches on older systemd). |
| Tenant subdomain shows a different cert error than ca.vanevski.net | The tenant serves its own cert, issued under the same root. If the CA URL works in the browser and the tenant doesn't, the tenant cert is the problem (e.g. incomplete issuer chain), not the root trust. |

B.5 Provision a lab tenant

export PLATFORM_ADMIN_TOKEN=$(curl -s -X POST \
    "https://kc.lab.local/realms/master/protocol/openid-connect/token" \
    -d "grant_type=client_credentials" \
    -d "client_id=tqpro-platform-admin" \
    -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" | jq -r .access_token)

# Add 127.0.0.1 acme-travel.tourlinq.local to /etc/hosts on whatever
# machine you'll browse from, OR resolve the hostname properly via lab DNS.

scripts/platform/tenant-provision.sh acme-travel "Acme Lab" admin@example.lab

Watch the output — step 4 should now mention the step-ca URL:

[4/6] Issuing TLS certificate for acme-travel.tourlinq.local
    Requesting a certificate for acme-travel.tourlinq.local
    ... using server https://step-ca.lab.local:8443/acme/acme/directory ...
    Successfully received certificate.

Browse to https://acme-travel.tourlinq.local/ — full green padlock (once the root is trusted per B.4).
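
The token export at the top of this section fails quietly: with curl -s a failed request yields an empty variable, and jq -r .access_token prints the literal string null when the response is an error body. A small guard before calling tenant-provision.sh catches both cases. The function name token_looks_valid is hypothetical, and the three-segment check assumes the access token is a JWT, which is Keycloak's usual format.

```shell
# Hypothetical helper: reject the values a failed token request leaves behind.
token_looks_valid() {
    local token="$1" dots
    # Empty (curl failed) or the literal "null" (no access_token in the JSON).
    [ -n "$token" ] && [ "$token" != "null" ] || return 1
    # A JWT has exactly three dot-separated segments, i.e. two dots.
    dots=$(printf '%s' "$token" | tr -cd '.')
    [ "${#dots}" -eq 2 ]
}

# Usage sketch:
#   token_looks_valid "$PLATFORM_ADMIN_TOKEN" || { echo "token request failed" >&2; exit 1; }
```

This checks shape only; it does not verify the signature or expiry.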

B.6 Renewal in lab

step-ca leaf certs are short-lived (24h by default), compared with 90 days from Let's Encrypt. certbot.timer (active on the gateway) checks daily and renews anything with fewer than 30 days left, so lab certs renew on every timer run. That's fine: the renewal call goes to step-ca, not LE, so there are no rate-limit concerns.
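
The renewal rule is simple arithmetic, sketched here with a hypothetical needs_renewal function: a cert is due when its remaining lifetime is shorter than the renewal window (30 days, i.e. 720 hours, unless a different window is passed).

```shell
# Sketch of the renewal decision described above, not certbot's own code:
# renew when the time left is shorter than the renewal window.
needs_renewal() {
    local remaining_hours="$1" window_days="${2:-30}"
    [ "$remaining_hours" -lt $((window_days * 24)) ]
}

# Usage sketch:
#   needs_renewal 24   && echo "24h lab cert: due on every run"
#   needs_renewal 2160 || echo "fresh 90-day cert (2160h): not due"
```

A 24h leaf can never have more than 24 hours left, so it is always inside the 720-hour window.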

B.7 Switching back to public Let's Encrypt

When you move a lab tenant to a publicly reachable environment, remove the certbot.acme.server value (or set it to the LE production URL, which has the same effect). The next tenant-provision.sh run then uses LE. Already-provisioned tenants keep renewing against step-ca, because each renewal config records the server it was issued from, until you explicitly run certbot renew --force-renewal --server <new-url>.
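
To see which already-provisioned tenants still point at step-ca, the server line in each certbot renewal config can be listed. This is a minimal sketch with a hypothetical function name; it assumes the standard /etc/letsencrypt/renewal layout and may need sudo to read the files.

```shell
# Hypothetical helper: print "<lineage> <server URL>" for every certbot
# renewal config in a directory.
list_renewal_servers() {
    local dir="${1:-/etc/letsencrypt/renewal}" conf server
    for conf in "$dir"/*.conf; do
        [ -e "$conf" ] || continue
        server=$(sed -n 's/^server[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
        printf '%s %s\n' "$(basename "$conf" .conf)" "${server:-(none recorded)}"
    done
}

# Usage sketch:
#   list_renewal_servers | grep step-ca
```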

B.8 Failure modes specific to step-ca

| Symptom | Cause | Fix |
|---|---|---|
| certbot certonly fails with TLS handshake errors | step-ca root not trusted on the gateway | Install per §B.2, then run update-ca-certificates |
| Browser shows "untrusted CA" on tenant subdomain | step-ca root not installed on dev machine | Install per §B.4 |
| certbot certonly fails with "Connection refused" | step-ca not running, or the gateway can't reach step-ca.lab.local:8443 | Check the step-ca process; verify the gateway can curl https://step-ca.lab.local:8443/health |
| Renewal works but tenant cert isn't picked up | nginx didn't reload | The --deploy-hook 'systemctl reload nginx' from issuance time should run automatically. If it didn't, check that /etc/letsencrypt/renewal/<host>.conf has the renew_hook line. |
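
When checking the health endpoint from the gateway, curl's exit status already separates the first and third rows of the table. A minimal sketch follows, with a hypothetical explain_curl_exit function; the codes used (6, 7, 60) are curl's documented exit values, and the wording of each message is illustrative.

```shell
# Hypothetical helper: translate a curl exit status into the likely cause.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: endpoint reachable and certificate trusted" ;;
        6)  echo "DNS: hostname did not resolve (check lab DNS or hosts entries)" ;;
        7)  echo "connect: connection failed (step-ca down or port unreachable)" ;;
        60) echo "trust: certificate not verified (step-ca root missing from the trust store)" ;;
        *)  echo "other: curl exit $1, see the curl man page" ;;
    esac
}

# Usage sketch, run on the gateway:
#   curl -sS -o /dev/null https://step-ca.lab.local:8443/health; explain_curl_exit $?
```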