Tenant Provisioning Runbook

Ticket: TQ-115 (Phase 1 deliverable)
Audience: Ops. Run this once per new tenant after Phase 0 is in place.
Initial-install runbook: doc/operations/multitenancy-setup.md

This runbook covers onboarding a new tenant against an installation that already has the multi-tenancy foundation deployed. If you're standing up TQPro multi-tenancy from scratch for the first time, do multitenancy-setup.md first.

1. Prerequisites checklist (per tenant)

Before running scripts/platform/tenant-provision.sh:

[ ] Tenant code chosen: lowercase alphanumerics and hyphens (-), 3-50 chars, must not start or end with a hyphen. Used as the DNS subdomain, the Keycloak realm name, and the slug in tlinq_<code> for the DB name (hyphens become underscores there, e.g. tlinq_acme_travel).
[ ] DNS A record for <tenant-code>.<platform.domain> points at the gateway public IP and has propagated. Verify: dig +short <tenant-code>.<platform.domain>
[ ] Email address for the initial tenant admin user known. Keycloak will send the welcome / set-password email to this address.
[ ] tqpro_template DB exists and is current — created via scripts/db/bootstrap-template-db.sh --schema-file <dump> per multitenancy-setup.md §6 and kept in schema sync by scripts/db/apply-tenant-migrations.sh. If you've added migrations recently, double-check the runner ran successfully against the template before you provision a new tenant.
[ ] PLATFORM_ADMIN_TOKEN available in the ops shell environment. Get one via the master-realm client_credentials grant — same call KeycloakAdminClient.fetchToken makes:
```
export PLATFORM_ADMIN_TOKEN=$(curl -s -X POST \
  "${KC_URL}/realms/master/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=tqpro-platform-admin" \
  -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" \
  | jq -r .access_token)
```
Tokens are short-lived (~5 min by default). If the script fails on the curl to /platform/tenant/provision with HTTP 401, re-fetch the token and re-run.
[ ] (Optional) Meta WhatsApp phone_number_id for the tenant. If supplied as the 4th positional arg, the script inserts a row into tqplatform.wa_phone_routing so inbound webhooks route to the right tenant (Phase 5).
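The tenant-code rules in the checklist above can be checked mechanically before anything is provisioned. The following is a minimal sketch, assuming bash; the function names validate_tenant_code and tenant_db_name are hypothetical and are not taken from tenant-provision.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not taken from tenant-provision.sh.

# Succeeds only for lowercase alphanumerics and '-', 3-50 chars,
# not starting or ending with '-'.
validate_tenant_code() {
    local LC_ALL=C   # keep [a-z] strictly ASCII regardless of the caller's locale
    local code="$1"
    local re='^[a-z0-9][a-z0-9-]{1,48}[a-z0-9]$'
    [[ "$code" =~ $re ]]
}

# Derives the tenant DB name: hyphens become underscores (tlinq_<code>).
tenant_db_name() {
    local code="$1"
    printf 'tlinq_%s\n' "${code//-/_}"
}
```

For example, `validate_tenant_code "acme-travel" && tenant_db_name "acme-travel"` prints the DB name only when the code passes the rules.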

2. The provisioning command

./scripts/platform/tenant-provision.sh <tenant_code> "<tenant name>" <admin_email> [<wa_phone_id>]

# Example:
./scripts/platform/tenant-provision.sh acme-travel "Acme Travel LLC" admin@acmetravel.example

The script does six things in order. If any step fails, it invokes scripts/platform/tenant-rollback.sh <tenant_code> automatically before exiting.

Step	What	Where to look if it fails
1	DNS pre-flight (`dig +short`)	The script exits before any state change. Wait for DNS or fix the A record.
2	Clone `tqpro_template` → `tlinq_<code>` (pg_dump | pg_restore)	PG logs (`journalctl -u postgresql`). Verify `tqpro_template` exists and the calling user has CREATEDB.
3	Render nginx vhost + symlink + `nginx -t`	Stderr of the script. `nginx -t` output is verbose.
4	`systemctl reload nginx`	`journalctl -u nginx`
5	`certbot certonly --webroot`	`/var/log/letsencrypt/letsencrypt.log`. Common causes: DNS not propagated to LE servers (wait + retry), rate-limit hit (50 certs / registered domain / week — pause new tenants).
6	`POST /platform/tenant/provision` (Java side)	TQPro application logs (`journalctl -u tqpro-api` or wherever you log). The body of the failed response is in `/tmp/provision-response.json` after the script exits.
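The "run the steps in order, roll back on the first failure" behaviour described above can be sketched as follows. This is an illustration of the pattern only, not the script's actual code; run_with_rollback and the step names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run named steps in order; on the first failure call
# the rollback command once and stop. Not the script's actual code.
run_with_rollback() {
    local rollback_cmd="$1"
    shift
    local step
    for step in "$@"; do
        echo "==> ${step}"
        if ! "$step"; then
            echo "step '${step}' failed, rolling back" >&2
            "$rollback_cmd" || echo "rollback reported errors" >&2
            return 1
        fi
    done
}
```

Each argument is the name of a shell function, for example `run_with_rollback rollback_tenant step_dns step_clone_db step_vhost`. Later steps never run once an earlier one has failed.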

A successful run prints:

Tenant 'acme-travel' provisioned. Log in at https://acme-travel.tourlinq.com/
(after DNS + cert finalise). Ops follow-up: confirm the welcome email reached
admin@acmetravel.example.

3. Verifying success

# Keycloak — realm exists with both clients and 5 roles + 1 admin user.
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    "${KC_URL}/admin/realms/acme-travel" | jq '.realm, .enabled'
# realm: "acme-travel", enabled: true

curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    "${KC_URL}/admin/realms/acme-travel/clients?clientId=tqweb-adm" | jq 'length'
# 1
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    "${KC_URL}/admin/realms/acme-travel/clients?clientId=tqpro-admin-api" | jq 'length'
# 1
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    "${KC_URL}/admin/realms/acme-travel/users" | jq 'length, .[0].requiredActions'
# 1
# ["UPDATE_PASSWORD"]

# Platform DB — tenant row present, kc_admin_client_secret encrypted.
psql tqplatform -c "SELECT tenant_code, status, kc_admin_client_secret LIKE 'encrypted:%' AS sec_enc FROM tenant WHERE tenant_code='acme-travel'"

# Tenant DB — exists, ledger is current, no customer data.
psql tlinq_acme_travel -c "SELECT count(*) FROM public.schema_migrations"   # >= 73
psql tlinq_acme_travel -c "SELECT count(*) FROM nts.booking"                # 0

# nginx — vhost active and serving.
ls -l /etc/nginx/sites-enabled/acme-travel.conf
sudo nginx -t

# Certificate — issued for the tenant host.
sudo certbot certificates | grep acme-travel

After welcome-email arrival, the tenant admin sets a password and lands in the TQPro admin UI on https://acme-travel.<platform.domain>/.

4. Rollback

If anything went wrong and the auto-rollback didn't fire (or you want to remove a tenant for any other reason during Phase 1), run:

./scripts/platform/tenant-rollback.sh acme-travel

The script tolerates partial state — every step skips if the prior step's artefact doesn't exist. It does NOT touch DNS records.

If the Keycloak realm DELETE returns 401, re-fetch PLATFORM_ADMIN_TOKEN and re-run — the platform service account picked up realm-management roles when the realm was created, but the cached token predates them.
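The "re-fetch the token and re-run" advice, here and in §1, amounts to a single retry on HTTP 401. A minimal sketch follows, assuming you supply two shell functions of your own; retry_once_on_401 and both callback names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch. Both arguments are names of shell functions you supply:
#   fetch_token : refreshes PLATFORM_ADMIN_TOKEN in the current shell
#   api_call    : performs one request and prints only the HTTP status code
retry_once_on_401() {
    local fetch_token="$1" api_call="$2" status
    status="$("$api_call")"
    if [ "$status" = "401" ]; then
        echo "HTTP 401, re-fetching token and retrying once" >&2
        "$fetch_token"
        status="$("$api_call")"
    fi
    printf '%s\n' "$status"
    # Any 2xx status counts as success; everything else is a failure.
    case "$status" in
        2??) return 0 ;;
        *)   return 1 ;;
    esac
}
```

Retrying only once is deliberate: a second 401 with a fresh token points at a missing role or a wrong client secret rather than an expired token.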

5. Suspending and reactivating

The platform admin API (P1.4) supports temporary suspension:

# Suspend (status → SUSPENDED). Hazelcast fan-out evicts DB pools on every node.
curl -s -X POST "https://<platform-host>/tlinq-api/platform/tenant/suspend" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"tenantId":"<uuid-from-platform-list>"}' | jq

# Reactivate.
curl -s -X POST "https://<platform-host>/tlinq-api/platform/tenant/activate" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"tenantId":"<uuid>"}' | jq

Suspension only flips the platform-DB status flag and evicts pools. The nginx vhost and TLS cert remain in place. The Keycloak realm is left enabled — login attempts hit the API and get 403 because the tenant is no longer ACTIVE in the registry. Hard deprovisioning (with realm disabled) is Phase 8.

6. Manual registry refresh (escape hatch)

If Hazelcast propagation fails for any reason and one node's TenantRegistry is out of sync, force a refresh:

# Refresh entire registry on every node.
curl -X POST "https://<platform-host>/tlinq-api/platform/tenant/refresh" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" -d '{}'

# Refresh just one tenant — also runs the eviction fan-out for non-ACTIVE
# tenants, useful after manually editing tqplatform.tenant.
curl -X POST "https://<platform-host>/tlinq-api/platform/tenant/refresh" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"tenantId":"<uuid>"}'

7. Troubleshooting

Symptom	Cause	Fix
Step 1 fails — DNS does not resolve	A record missing or not propagated	Wait 5-15 min and re-run; verify with `dig +short`
Step 2 fails with `database "tqpro_template" does not exist`	Template not bootstrapped yet	Run `scripts/db/bootstrap-template-db.sh`
Step 5 (certbot) fails with rate-limit error	Let's Encrypt limits 50 certs / registered domain / week	Wait, or request a rate limit increase from LE. New tenants will need to queue.
Step 5 fails with "Connection refused" / no challenge response	nginx didn't reload, or port 80 firewalled	`nginx -t && systemctl reload nginx`; check firewall
Step 6 returns HTTP 401	`PLATFORM_ADMIN_TOKEN` expired	Re-fetch token and re-run
Step 6 returns "Keycloak unreachable"	KC down or `oidc-keycloak-base-url` wrong	Check Keycloak; verify the property in `tlinqapi.properties`
`KeycloakRealmProvisioner` logs "401 after realm-create"	Token-refresh logic regression	The provisioner re-fetches automatically (P1.3). If it's still failing, check `KeycloakAdminClient.invalidateToken()` is called on the line right after `POST /admin/realms`.
`tenant-rollback.sh` Keycloak DELETE returns 401	Cached token predates realm-admin grant	Re-fetch token, re-run rollback
Tenant logs in but immediately fails with "tenant-unresolved"	Subdomain doesn't match a registered tenant code	Check the tenant_code in `tqplatform.tenant` matches the subdomain; or call `/platform/tenant/refresh` if the row was added recently

8. References

Initial install runbook: doc/operations/multitenancy-setup.md
Multi-tenant runtime architecture: doc/architecture/multitenant-architecture.md
Tenant-aware coding for developers: doc/developer/getting-started/tenant-aware-coding.md
Execution plan: doc/plans/multitenancy-execution.md
Architecture decisions: §2 of the execution plan (D-1 .. D-19)
Phase 8 deprovisioning runbook: TBD (separate document)

Appendix A: Orchestration topology

Provisioning is always driven from an orchestration host that is not the gateway and not (necessarily) the API host. In a typical TQPro deployment that's either a dedicated ops box or the CI/CD build server that runs the deploy pipelines. The orchestration host holds the credentials and the repo; everything else is acted upon remotely.

A.1 Where each step actually runs

The shell script (scripts/platform/tenant-provision.sh) follows this topology end-to-end — the table below also applies if you do the work manually following Appendix B.

Step	Runs on	Reaches out to
0 Read prereqs (`tqpro-platform.properties`, env)	Orchestration	—
1 Pick tenant params	Orchestration (shell vars)	—
2 DNS A record	DNS provider UI (browser)	—
3 `pg_dump` template → `createdb` → `pg_restore` tenant DB	Orchestration (`psql`/`pg_dump` clients)	Postgres host (network)
4 `CREATE ROLE` + grants for the tenant DB role (not in script — see A.3)	Orchestration (`psql` as a Postgres superuser)	Postgres host (network)
5 Render gateway nginx vhost (`render-vhost.py`)	Orchestration (repo + templates here)	—
5 Install vhost + reload nginx	Orchestration	SSH → gateway host
6 `certbot certonly`	Orchestration	SSH → gateway host (runs `sudo certbot` there)
7 Web vhost render + install	Orchestration	SSH → each `web.deploy.hosts` entry
8 Fetch `PLATFORM_ADMIN_TOKEN` from Keycloak master realm	Orchestration (`curl`)	HTTPS → Keycloak
9 `POST /platform/tenant/provision`	Orchestration (`curl`)	HTTPS → API host (internal port)
10 `UPDATE tenant SET db_user, db_pass` (not in API — see A.3)	Orchestration (`psql`)	Postgres host (platform DB)
11 `POST /platform/tenant/refresh`	Orchestration (`curl`)	HTTPS → API host
12 Verifications	Orchestration	KC, API, Postgres

The gateway host is intentionally in the DMZ and must not have Postgres credentials or the PLATFORM_ADMIN_TOKEN. Keep these on the orchestration side only.

A.2 What the orchestration host needs

For a CI/build server playing this role:

Requirement	Why
TQPro repo checked out	`scripts/platform/render-vhost.py`, `config/Nginx Config/templates/`, `tqpro-platform.properties`
`psql`, `pg_dump`, `pg_restore` clients	DB clone + role creation + tenant-row UPDATE
`~/.pgpass` (or `PGPASSWORD`) for: a Postgres superuser account on the cluster (for `createdb`/`CREATE ROLE`), and the `tqpro_platform` user on `tqplatform`	Steps 3, 4, 10
SSH key with passwordless access to `${gateway.deploy.user}@${gateway.deploy.host}`	Steps 5, 6
Same for every host in `web.deploy.hosts` (with `${web.deploy.user}`)	Step 7
The remote user must have `NOPASSWD sudo` for at least: `install`, `ln`, `nginx`, `systemctl reload nginx`, `certbot`	Steps 5, 6, 7
Network reachability to: Postgres (5432), Keycloak (443), API host's internal Jersey port (typically 11080, not publicly exposed)	Steps 3, 4, 8, 9, 10, 11
`TQPRO_PLATFORM_ADMIN_SECRET` available as an env var, OR a way to fetch the platform token (e.g., from a secrets manager)	Step 8
`jq` and `curl`	Token fetch + API calls

For pipelines, mark TQPRO_PLATFORM_ADMIN_SECRET, PLATFORM_ADMIN_TOKEN, and the generated DB_PASS as masked variables, and redirect curl output to files instead of stdout — bodies that carry tokens or encrypted secrets shouldn't appear in build logs.
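One way to keep response bodies out of build logs is to have curl write the body to a file and print only the status code. The sketch below uses curl's standard -o and -w '%{http_code}' options; the function name api_post_to_file is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for pipelines: POST a JSON body, write the response body
# to a caller-supplied file, and print only the HTTP status code on stdout.
api_post_to_file() {
    local url="$1" json="$2" body_file="$3"
    : "${PLATFORM_ADMIN_TOKEN:?export PLATFORM_ADMIN_TOKEN first}"
    curl -sS -o "$body_file" -w '%{http_code}' \
        -X POST "$url" \
        -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "$json"
}
```

A caller would create the body file with mktemp (which creates it readable by the owner only), capture the status with `status="$(api_post_to_file "$url" '{}' "$body")"`, log the status alone, and delete the file once the needed fields have been extracted.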

A.3 Known gaps in the current automation

All gaps below were surfaced during the first end-to-end manual provisioning walkthrough on 2026-05-16 (tenant perun). They split into four categories: PENDING CODE FIXES that require Java changes, SCRIPT GAPS in tenant-provision.sh, SETUP / RUNBOOK GAPS in multitenancy-setup.md, and OPERATOR NOTES that are gotchas rather than bugs.

A.3.1 PENDING CODE FIXES (planned for next sprint)

Update 2026-05-17: items A.3.1 #1, A.3.1 #2, and A.3.2 #3 below have been RESOLVED. See the commit that follows the gap-discovery commit for the patch. The corresponding manual workarounds in §B.4 + §B.10 + §B.11 are no longer needed when running against an up-to-date API + script. The text is preserved for historical reference and for ops following an older deployment.

JWTValidator accepts only one oidc-client-id (high priority). JWTValidator.validateToken() (lines 92-104) checks that the audience contains config.getClientId() OR that azp == config.getClientId(), which is a single client id. But multi-tenancy needs to accept tokens from at least two distinct clients on the same API instance:
- tqpro-platform-admin (master realm, for /platform/* endpoints)
- tqpro-admin-api and/or tqweb-adm (per-tenant realm, for browser SSO + tenant APIs)

Today this forces an either/or config: set oidc-client-id=tqpro-platform-admin and master-realm calls work but tenant-realm browser SSO fails (the /auth/config endpoint also returns this wrong clientId to the SPA); set it to tqweb-adm and the inverse. Fix: accept a comma-separated list of client ids and check membership.

Until fixed, operators must flip oidc-client-id to tqpro-platform-admin to provision a tenant, then flip back to tqweb-adm for browser SSO to work — with a systemctl restart tlinq between flips.

STATUS: RESOLVED 2026-05-17. oidc-client-id now accepts a comma-separated list. The first entry is the primary client (returned by /auth/config to the SPA, used in logout URLs). JWTValidator.validateToken() accepts a token whose aud or azp matches any entry. Recommended setting: oidc-client-id=tqweb-adm,tqpro-platform-admin.
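The membership rule is small enough to illustrate in shell. The real check lives in the Java JWTValidator; the sketch below only mirrors the rule as described here, and both function names are hypothetical. Tolerating spaces around the commas is an assumption, not something the runbook states.

```shell
#!/usr/bin/env bash
# Illustration only: the real check lives in the Java JWTValidator.
# Succeeds when the candidate (a token's aud or azp value) equals any entry
# of the comma-separated oidc-client-id setting.
client_id_allowed() {
    local configured="$1" candidate="$2" entry
    local -a entries
    IFS=',' read -ra entries <<<"$configured"
    for entry in "${entries[@]}"; do
        entry="${entry//[[:space:]]/}"   # tolerate spaces around commas (assumption)
        [ -n "$entry" ] && [ "$entry" = "$candidate" ] && return 0
    done
    return 1
}

# The first entry is the primary client (the one /auth/config returns to the SPA).
primary_client_id() {
    local first="${1%%,*}"
    printf '%s\n' "${first//[[:space:]]/}"
}
```

With the recommended setting, `client_id_allowed "tqweb-adm,tqpro-platform-admin" "tqpro-platform-admin"` succeeds and `primary_client_id` yields tqweb-adm.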

TenantProvisioningFacade.insertTenantRow does not write db_user / db_pass. Already documented — it writes everything else (tenant_id, tenant_code, tenant_name, db_name, kc_realm, kc_admin_client_secret, status) but leaves the credential columns null. TenantAwareDBSession.requireTenantCredentials then rejects every request to the tenant with TenantConfigException. Workaround is the manual UPDATE in §B.10 + a /platform/tenant/refresh in §B.11 so the in-memory TenantInfo picks up the new values.

STATUS: RESOLVED 2026-05-17. ProvisionRequest now requires dbUser + dbPass. The facade encrypts dbPass via TenantConfig.encrypt (unless already prefixed encrypted:) and inserts it alongside the other columns. Plain POST /platform/tenant/provision produces a complete row — no follow-up UPDATE needed. §B.10 and §B.11 in this appendix become unnecessary when running against an up-to-date API.

A.3.2 SCRIPT GAPS (`scripts/platform/tenant-provision.sh`)

The script does not create the per-tenant Postgres role. createdb happens, but no CREATE ROLE for the tenant. The Java API later tries to connect as that role and fails. Workaround: run §B.4 yourself. Note the corrected role+grants block in §B.4 fans the GRANT loop across all tenant-DB schemas — not just public — so the role can actually read every schema (amadeus, goglobal, nts, public, rayna, tiqets, tqwa).

STATUS: RESOLVED 2026-05-17. tenant-provision.sh step 2 now auto-generates DB_PASS (or accepts a pre-set TENANT_DB_PASS env var) and runs CREATE ROLE + the same fan-out GRANT loop §B.4 describes. The role + password are then passed to the Java provision call in step 6. tenant-rollback.sh step 4b drops the role on rollback. §B.4 in this appendix becomes unnecessary.

The script runs nginx -t immediately after vhost install, but on a fresh gateway the per-tenant cert files don't exist yet. The rendered vhost references /etc/letsencrypt/live/<tenant-host>/{fullchain,privkey}.pem, and nginx -t fails loudly if they're missing — which aborts the script before certbot ever runs. On a gateway with at least one prior tenant, it accidentally works (other tenants' files keep nginx -t happy). Manual workaround is a two-pass install (§B.6 + §B.7.5): stub HTTP-only vhost first → certbot → replace stub with the full HTTPS vhost. Script fix should follow the same pattern, or pre-create a snake-oil cert before the install.

STATUS: RESOLVED 2026-05-17. tenant-provision.sh step 3 now installs an HTTP-only stub (handling just the /.well-known/acme-challenge/ location); step 4 runs certbot as before; new step 4b installs the full HTTPS vhost from the render output. nginx -t is happy at every step. §B.6 (stub) and §B.7.5 (full swap) in the manual walkthrough are no longer needed.
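For orientation, an HTTP-only stub of the kind described above might look like the output of the sketch below. This is an assumption about its shape, not the script's actual template: render_stub_vhost is a hypothetical name, the webroot default is taken from the certbot call in B.7, and the catch-all 404 is an illustrative choice.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the HTTP-only stub; the script's real stub may differ.
# The webroot default matches the --webroot-path used with certbot in B.7.
render_stub_vhost() {
    local tenant_host="$1" webroot="${2:-/var/www/certbot}"
    cat <<EOF
server {
    listen 80;
    server_name ${tenant_host};

    # Serve ACME HTTP-01 challenge files written by certbot --webroot.
    location /.well-known/acme-challenge/ {
        root ${webroot};
    }

    # Everything else is refused until the full HTTPS vhost replaces this stub.
    location / {
        return 404;
    }
}
EOF
}
```

Because the stub references no certificate files, nginx -t can pass before certbot has run; the full HTTPS vhost replaces it once the certificate exists.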

Pre-existing bug in tenant-provision.sh:106 — apostrophe in ${ACME_SERVER:-Let's Encrypt production (default)} is a real bash parser error. Inside ${VAR:-...}, the apostrophe opens a single quote that is never closed. bash -n reports a syntax error at end-of-file; runtime invocations refuse to start. The script was probably never successfully run end-to-end since the line was introduced — operators presumably did manual provisioning all along without realising the automation didn't actually work.

STATUS: RESOLVED 2026-05-17. Apostrophe removed (now LetsEncrypt production (default)). bash -n passes cleanly.
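A syntax check in CI would have caught this before any operator hit it. The sketch below reproduces the problem in isolation and confirms the fix with bash -n. The two one-line scripts are reduced stand-ins for the real line 106, which is assumed here to have used the expansion unquoted.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the parser error in isolation and confirm the fix with bash -n.
# The two one-line scripts below are reduced stand-ins, not the real line 106.
syntax_ok() {
    bash -n "$1" 2>/dev/null
}

broken="$(mktemp)"
fixed="$(mktemp)"

# Unquoted expansion: the apostrophe opens a single quote that never closes.
cat >"$broken" <<'EOF'
echo ${ACME_SERVER:-Let's Encrypt production (default)}
EOF

# Apostrophe removed and the expansion double-quoted.
cat >"$fixed" <<'EOF'
echo "${ACME_SERVER:-LetsEncrypt production (default)}"
EOF

syntax_ok "$broken" && broken_result=pass || broken_result=fail
syntax_ok "$fixed"  && fixed_result=pass  || fixed_result=fail
echo "broken: ${broken_result}, fixed: ${fixed_result}"
rm -f "$broken" "$fixed"
```

Running `bash -n scripts/platform/tenant-provision.sh` as a pipeline gate is a cheap way to keep this class of error from reaching operators again.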

A.3.3 SETUP / RUNBOOK GAPS (see also `multitenancy-setup.md`)

Missing step: create + assign platform-admin realm role in master realm. multitenancy-setup.md §2 currently only tells operators to assign create-realm to the tqpro-platform-admin service account (per D-4). But the Java API also requires the token to carry the realm role platform-admin to reach /platform/* endpoints — without it, every call returns ERR0008. The platform-admin role is our app-level role (entirely separate from KC's built-in admin role, so D-4's prohibition still holds). Needs a new sub-step in §2.
Missing step: EnvironmentFile= in the systemd unit. multitenancy-setup.md §5 documents /etc/tqpro/tqpro.env but doesn't enforce that the systemd unit actually references it. If sudo systemctl cat tlinq | grep Environment returns nothing, the JVM doesn't see TQPRO_ENCRYPTION_KEY or TQPRO_PLATFORM_ADMIN_SECRET, and the provisioning API fails with ERR00014 even though the env file is on disk. Fix is a drop-in:
```
# /etc/systemd/system/tlinq.service.d/override.conf
[Service]
EnvironmentFile=/etc/tqpro/tqpro.env
```
followed by systemctl daemon-reload && systemctl restart tlinq.
Missing step: migrate tlinqapi.properties from single-tenant layout. Pre-multi-tenancy installs have auth-mode=hybrid, dev-mode=true, oidc-issuer=<hardcoded one realm>, oidc-client-id=tqweb-adm, no oidc-keycloak-base-url. The multi-tenant validator needs oidc-keycloak-base-url=https://<kc-host> (no trailing slash, exactly matching the prefix of the JWT's iss claim minus /realms/<realm>), and dev-mode=false so JWT validation failures don't silently fall through to a fake dev user. multitenancy-setup.md Appendix A (in-place migration) should call this out explicitly with a sed/diff against the old config.
Missing section: web-tier host bootstrap. The rendered web vhost hardcodes root /opt/tqpro/tqweb-adm, but the SPA bundle isn't deployed there automatically. Web hosts need a one-time setup that:
- Creates /opt/tqpro/tqweb-adm with ownership tqpro-deploy:www-data and mode 2775 (setgid so new files inherit group www-data)
- Runs an initial rsync of ${REPO_ROOT}/tqweb-adm/ to that path
This belongs in multitenancy-setup.md as its own section because it's per-web-host setup, not per-tenant.
Missing step: certbot + step-ca needs SSL_CERT_FILE / REQUESTS_CA_BUNDLE env vars in the renewal timer. Python's requests/urllib3/certifi stack ignores the OS trust store and uses its own bundled Mozilla list. Even after update-ca-certificates installs the step-ca root into the OS bundle, certbot fails with SSLError: unable to get local issuer certificate. Fix with one of:
- Add REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt and SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt to certbot's systemd unit Environment= block (renewal-safe)
- Splice the step-ca root into certifi's bundle (/usr/lib/python3/dist-packages/certifi/cacert.pem); this gets overwritten on python3-certifi package upgrade
The step-ca leaf cert has a 24-hour lifetime, so renewal MUST work. Add this to the gateway-bootstrap section of multitenancy-setup.md Appendix B (step-ca lab setup).
tqpro_template may be missing public.schema_migrations. bootstrap-template-db.sh is supposed to create and seed the ledger (its comment header step 4), but the template observed in the perun walkthrough did not have it. This pre-dates per-tenant provisioning; every new tenant cloned from a ledger-less template will also lack it, breaking future apply-tenant-migrations.sh runs. Verify with: psql -d tqpro_template -c "SELECT count(*) FROM public.schema_migrations". If it errors, the template needs a bootstrap fix and every existing tenant needs the ledger back-filled.

A.3.4 OPERATOR NOTES (gotchas)

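VAR=value sudo command does NOT pass VAR to the command. sudo resets the environment by default (env_reset), so the variable is set for the sudo process itself and then dropped before the command runs. The sudo VAR=value command form is honoured only when the sudoers policy permits setting that variable. Use one of: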
- sudo env VAR=value command (explicit env as the first command under sudo)
- export VAR=value && sudo -E command (sudo -E preserves all caller-side env)
- Add VAR to Defaults env_keep in /etc/sudoers.d/
CI/orchestration host doesn't automatically trust step-ca. Without this, curl from the orchestration host to KC or to https://<tenant>.<platform.domain>/ fails with SSL certificate problem: unable to get local issuer certificate. Fix (one-time per CI host):
```
scp tqpro-deploy@<gateway>:/usr/local/share/ca-certificates/tqpro-step-ca.crt /tmp/
sudo install -m 0644 /tmp/tqpro-step-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
```
Not blocking the manual flow (all CI → API calls use internal HTTP on platform.api.url, not HTTPS), but blocks any verification curl that would hit the public TLS endpoint.
Gateway may have stale single-tenant vhosts (tqweb-pub, tqweb-adm, tqweb-b2b, tqweb-auth, auth.vanevski.net) from the pre-multi-tenancy era. They don't break the new per-tenant vhost, but they pollute the config and the implicit-default selection for port 443 is whichever loads first alphabetically — which can cause confusing diagnostics like https://<new-tenant>.../ redirecting to the legacy auth realm until the new vhost loads. Plan a cleanup once all tenants are migrated to the per-tenant pattern.
First-tenant KC realm has no SMTP sender configured — welcome / password-reset email fails with Invalid sender address 'null'. KeycloakRealmProvisioner logs a WARNING rather than failing the provision call, so the realm is created and usable, but the initial admin user has no password and no way to get one via email. Operator workaround: in KC admin UI → Realm <code> → Users → admin user → Credentials tab → Set password manually. Long-term fix: provision sets default SMTP from realm template, or multitenancy-setup.md adds a step to configure master-level email-realm defaults that get inherited.

Most of these collapse into single steps once fixed:
- items 2, 3 → §B.4 + §B.10 + §B.11 become unnecessary
- items 5, 6, 7 → setup runbook gains the missing one-time steps
- item 1 → the oidc-client-id flip-flop disappears
- items 8-10 → setup runbook expands to cover the whole platform install

Appendix B: Manual provisioning (without the script)

Use this when you want to understand the flow end-to-end, when you're debugging a script failure, or when running in an environment the script doesn't natively support. All steps execute from the orchestration host unless explicitly noted; see Appendix A for the topology rules.

What this appendix does NOT replace: the Keycloak realm + clients + roles + initial admin user. That work happens inside the Java API (POST /platform/tenant/provision) and is too fiddly to redo by hand at the KC admin API level — there's no benefit to bypassing it.

B.1 Pick parameters

CODE='acme-travel'                         # Keycloak realm + DNS + DB suffix
NAME='Acme Travel LLC'
ADMIN_EMAIL='admin@acmetravel.example'
PLATFORM_DOMAIN='vanevski.net'             # from tqpro-platform.properties

TENANT_HOST="${CODE}.${PLATFORM_DOMAIN}"   # e.g. acme-travel.vanevski.net
DB_NAME="tlinq_${CODE//-/_}"               # hyphens → underscores
DB_ROLE="tlinq_${CODE//-/_}"               # convention: role == db name
DB_PASS="$(openssl rand -base64 24)"
echo "Tenant DB password (save this): $DB_PASS"

B.2 DNS

Register an A record at the DNS provider: ${CODE}.${PLATFORM_DOMAIN} → gateway IP.

Verify from the orchestration host:

dig +short ${TENANT_HOST}
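dig +short succeeds even when the record points somewhere else, so it helps to compare the answer with the gateway address. A minimal sketch follows; dns_points_at is a hypothetical helper and GATEWAY_IP is an assumed variable that this runbook does not define.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when one of the resolved lines equals the
# expected gateway IP exactly. GATEWAY_IP in the usage note is an assumed variable.
dns_points_at() {
    local resolved="$1" expected_ip="$2"
    [ -n "$expected_ip" ] || return 1
    printf '%s\n' "$resolved" | grep -Fxq -- "$expected_ip"
}
```

Usage: `dns_points_at "$(dig +short "${TENANT_HOST}")" "${GATEWAY_IP}" || echo "A record not pointing at the gateway yet"`.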

B.3 Clone the template DB

pg_dump -Fc -d tqpro_template -f /tmp/seed.dump
createdb "${DB_NAME}"
pg_restore --no-owner --no-privileges -d "${DB_NAME}" /tmp/seed.dump
rm /tmp/seed.dump

# Verify
psql -d "${DB_NAME}" -c "SELECT count(*) FROM schema_migrations;"

B.4 Create the tenant's Postgres role (gap, see A.3)

Run as a Postgres superuser. From the orchestration host (assuming superuser credentials in ~/.pgpass):

psql -h "${PG_HOST:-localhost}" -U postgres <<SQL
CREATE ROLE ${DB_ROLE} WITH LOGIN PASSWORD '${DB_PASS}';
GRANT ALL PRIVILEGES ON DATABASE ${DB_NAME} TO ${DB_ROLE};
\c ${DB_NAME}
GRANT ALL PRIVILEGES ON SCHEMA public TO ${DB_ROLE};
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ${DB_ROLE};
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ${DB_ROLE};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${DB_ROLE};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${DB_ROLE};
SQL

# Verify
PGPASSWORD="${DB_PASS}" psql -h "${PG_HOST:-localhost}" -U "${DB_ROLE}" \
    -d "${DB_NAME}" -c "SELECT count(*) FROM schema_migrations;"
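A.3.2 notes that the grants need to cover every tenant-DB schema, not just public. One way to get there is to generate the same five statements once per schema. The sketch below assumes the schema list named in A.3.2 (amadeus, goglobal, nts, public, rayna, tiqets, tqwa); emit_schema_grants is a hypothetical name and is not taken from the script.

```shell
#!/usr/bin/env bash
# Sketch: emit the B.4 grants once per schema. The schema list is the one named
# in A.3.2; the function name is hypothetical. Role names are emitted unquoted,
# as in B.4, so they must be plain identifiers.
TENANT_SCHEMAS=(amadeus goglobal nts public rayna tiqets tqwa)

emit_schema_grants() {
    local role="$1" schema
    for schema in "${TENANT_SCHEMAS[@]}"; do
        cat <<SQL
GRANT ALL PRIVILEGES ON SCHEMA ${schema} TO ${role};
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA ${schema} TO ${role};
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA ${schema} TO ${role};
ALTER DEFAULT PRIVILEGES IN SCHEMA ${schema} GRANT ALL ON TABLES TO ${role};
ALTER DEFAULT PRIVILEGES IN SCHEMA ${schema} GRANT ALL ON SEQUENCES TO ${role};
SQL
    done
}
```

The output can be reviewed first and then piped to psql against the tenant DB, for example `emit_schema_grants "${DB_ROLE}" | psql -h "${PG_HOST:-localhost}" -U postgres -d "${DB_NAME}" -v ON_ERROR_STOP=1`.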

B.5 Render the gateway nginx vhost

python3 scripts/platform/render-vhost.py \
    --template 'config/Nginx Config/templates/tenant-gw.conf.template' \
    --output /tmp/${CODE}-gw.conf \
    --tenant-code "${CODE}" \
    --tenant-host "${TENANT_HOST}"

cat /tmp/${CODE}-gw.conf   # always review before deploying

B.6 Install the vhost on the gateway

GW_HOST="$(grep ^gateway.deploy.host= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
GW_USER="$(grep ^gateway.deploy.user= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
GW_USER="${GW_USER:-tqpro-deploy}"

# Pre-flight: the shared upstreams.conf must already exist on the gateway
ssh "${GW_USER}@${GW_HOST}" "test -f /etc/nginx/conf.d/tqpro-upstreams.conf" || \
    { echo "MISSING — run scripts/platform/render-upstreams.sh first"; exit 1; }

scp /tmp/${CODE}-gw.conf "${GW_USER}@${GW_HOST}:/tmp/"
ssh "${GW_USER}@${GW_HOST}" "\
    sudo install -m 0644 /tmp/${CODE}-gw.conf /etc/nginx/sites-available/${CODE}.conf && \
    sudo ln -sf /etc/nginx/sites-available/${CODE}.conf /etc/nginx/sites-enabled/${CODE}.conf && \
    sudo nginx -t && sudo systemctl reload nginx && \
    rm -f /tmp/${CODE}-gw.conf"
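The grep | cut lookups against tqpro-platform.properties recur throughout this appendix, and the unescaped dots in the grep pattern match any character. A small helper can do the lookup literally and apply a default in one place. This is a sketch only; read_prop is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper replacing the repeated grep | cut pipelines.
# Prints the value of KEY from a key=value properties FILE, or DEFAULT when the
# key is missing or empty. Matches the key literally (dots are not wildcards).
read_prop() {
    local key="$1" file="$2" default="${3:-}" line value=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "${key}="*) value="${line#"${key}="}"; break ;;
        esac
    done <"$file"
    printf '%s\n' "${value:-$default}"
}
```

Usage: `GW_USER="$(read_prop gateway.deploy.user /etc/tqpro/platform/tqpro-platform.properties tqpro-deploy)"`. The first matching line wins, and everything after the first = is kept as the value.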

B.7 Issue the TLS certificate

ssh "${GW_USER}@${GW_HOST}" "\
    sudo certbot certonly --webroot --webroot-path /var/www/certbot \
        --non-interactive --agree-tos \
        --email ops@${PLATFORM_DOMAIN} \
        --domain ${TENANT_HOST} \
        --deploy-hook 'systemctl reload nginx'"

# Verify
ssh "${GW_USER}@${GW_HOST}" "sudo ls /etc/letsencrypt/live/${TENANT_HOST}/"
curl -sI https://${TENANT_HOST}/ | head -1

In a closed lab without public ACME reach, append --server <private-acme-url> (the script does this when certbot.acme.server is set in tqpro-platform.properties; see multitenancy-setup.md Appendix B).

B.8 Render and install the web vhost (multi-host deployments only)

Skip if your dev box collapses the gateway and web tiers into one nginx.

python3 scripts/platform/render-vhost.py \
    --template 'config/Nginx Config/templates/tenant-web.conf.template' \
    --output /tmp/${CODE}-web.conf \
    --tenant-code "${CODE}" \
    --tenant-host "${TENANT_HOST}"

WEB_HOSTS="$(grep ^web.deploy.hosts= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
WEB_USER="$(grep ^web.deploy.user= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
WEB_USER="${WEB_USER:-tqpro-deploy}"

IFS=',' read -ra hosts <<<"${WEB_HOSTS}"
for h in "${hosts[@]}"; do
    h="$(echo "$h" | tr -d '[:space:]')"
    scp /tmp/${CODE}-web.conf "${WEB_USER}@${h}:/tmp/"
    ssh "${WEB_USER}@${h}" "\
        sudo install -m 0644 /tmp/${CODE}-web.conf /etc/nginx/sites-available/${CODE}.conf && \
        sudo ln -sf /etc/nginx/sites-available/${CODE}.conf /etc/nginx/sites-enabled/${CODE}.conf && \
        sudo nginx -t && sudo systemctl reload nginx && \
        rm -f /tmp/${CODE}-web.conf"
done

B.9 Call the Java provisioning endpoint

Fetch a fresh PLATFORM_ADMIN_TOKEN (see §1) then:

PLATFORM_API_URL="$(grep ^platform.api.url= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)/tlinq-api"

RESPONSE=$(curl -s -X POST "${PLATFORM_API_URL}/platform/tenant/provision" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{
        \"tenantCode\":  \"${CODE}\",
        \"tenantName\":  \"${NAME}\",
        \"dbName\":      \"${DB_NAME}\",
        \"tenantHost\":  \"${TENANT_HOST}\",
        \"adminEmail\":  \"${ADMIN_EMAIL}\"
    }")

echo "${RESPONSE}" | jq
TENANT_ID=$(echo "${RESPONSE}" | jq -r .tenantId)
echo "Provisioned tenant_id: ${TENANT_ID}"

Behind that single call, the Java code (TenantProvisioningFacade.provision):
1. Creates the Keycloak ${CODE} realm with two clients (tqweb-adm browser, tqpro-admin-api server-to-server) and five roles (guest, agent, admin, manager, finance) plus the initial admin user.
2. Encrypts the admin-client secret returned by KC.
3. Inserts the row into tqplatform.tenant.
4. Calls TenantRegistry.refresh() to publish via Hazelcast.

On failure after the realm exists, it best-effort deletes the realm and returns an error.
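The hand-interpolated JSON in the call above breaks if the tenant name contains a double quote or backslash, and a failed call leaves TENANT_ID set to the string "null". A sketch using jq for both follows. The function names are hypothetical, and dbUser/dbPass are included on the assumption that the API is at the 2026-05-17 level described in A.3.1; drop them when targeting an older API.

```shell
#!/usr/bin/env bash
# Sketch: build the provision body with jq so quotes or backslashes in the
# tenant name cannot break the JSON. dbUser/dbPass are included on the
# assumption that the API is at the 2026-05-17 level described in A.3.1.
# Function names are hypothetical.
build_provision_payload() {
    jq -n -c \
        --arg tenantCode "$1" \
        --arg tenantName "$2" \
        --arg dbName     "$3" \
        --arg tenantHost "$4" \
        --arg adminEmail "$5" \
        --arg dbUser     "$6" \
        --arg dbPass     "$7" \
        '{tenantCode: $tenantCode, tenantName: $tenantName, dbName: $dbName,
          tenantHost: $tenantHost, adminEmail: $adminEmail,
          dbUser: $dbUser, dbPass: $dbPass}'
}

# Prints the tenantId from a provision response, failing when it is absent
# (plain jq -r would print the string "null" for a missing field).
extract_tenant_id() {
    local id
    id="$(jq -r '.tenantId // empty' <<<"$1")" || return 1
    [ -n "$id" ] && printf '%s\n' "$id"
}
```

The payload can be passed to curl with `-d "$(build_provision_payload "${CODE}" "${NAME}" "${DB_NAME}" "${TENANT_HOST}" "${ADMIN_EMAIL}" "${DB_ROLE}" "${DB_PASS}")"`, and `TENANT_ID="$(extract_tenant_id "${RESPONSE}")" || exit 1` stops the flow before B.10 runs with an empty id.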

B.10 Manual patch: set `db_user` / `db_pass` (gap, see A.3)

PGPASSWORD="$(grep ^platform.db.pass= /etc/tqpro/tourlinq.properties | cut -d= -f2-)" \
  psql -h "${PG_PLATFORM_HOST:-localhost}" -U tqpro_platform -d tqplatform <<SQL
UPDATE tenant
SET db_user = '${DB_ROLE}',
    db_pass = '${DB_PASS}'
WHERE tenant_id = '${TENANT_ID}';
SQL

Plaintext vs encrypted db_pass: the column stores either form. TenantConfig.decrypt() treats values that don't start with encrypted: as passthrough plaintext — fine for dev/test. For production, encrypt with TenantConfig.encrypt() and store with the encrypted: prefix (same convention as kc_admin_client_secret).
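Since the only marker is the encrypted: prefix, a pre-flight check before the UPDATE can flag a plaintext value destined for production. This is a minimal sketch; db_pass_form is a hypothetical helper and does not call TenantConfig.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight for the UPDATE above: classify a db_pass value by the
# "encrypted:" prefix convention described in this section.
db_pass_form() {
    case "$1" in
        encrypted:*) echo "encrypted" ;;
        "")          echo "empty" ;;
        *)           echo "plaintext" ;;
    esac
}
```

For example, `[ "$(db_pass_form "${DB_PASS}")" = "encrypted" ] || echo "warning: storing plaintext db_pass" >&2` leaves the decision to the operator while making the dev/test shortcut visible.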

B.11 Refresh the registry

The provision call above already called TenantRegistry.refresh() once, but that ran before the UPDATE of db_user/db_pass in B.10, so the cached TenantInfo is missing the credentials. Force a re-read:

curl -s -X POST "${PLATFORM_API_URL}/platform/tenant/refresh" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"tenantId\": \"${TENANT_ID}\"}" | jq

B.12 Verify

# Platform DB row complete
psql ... -d tqplatform -c "SELECT tenant_id, tenant_code, db_name, db_user IS NOT NULL AS has_user,
                                  db_pass IS NOT NULL AS has_pass, status
                          FROM tenant WHERE tenant_id='${TENANT_ID}';"

# Keycloak realm
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    "${KC_URL}/admin/realms/${CODE}" | jq '{realm, enabled}'

# API resolves the tenant
curl -si -H "Host: ${TENANT_HOST}" \
    "${PLATFORM_API_URL}/auth/config" \
    -H "Content-Type: application/json" -d '{}' | head -20
# Expected: HTTP 200 with realm + clientId for ${CODE}

# First request triggers lazy SessionFactory build — watch the API log:
#   "Building NTS factory for tenant ${CODE} → jdbc:postgresql://..."
journalctl -u tlinq -n 50 | grep "Building.*factory.*${CODE}"

B.13 Browser smoke test

Open https://${TENANT_HOST}/. You'll be redirected through Keycloak (${CODE} realm). Log in as ${ADMIN_EMAIL}; KC prompts for password reset on first login (welcome email, or set directly via the KC admin console for dev). Empty dashboard = correctly provisioned tenant with no data yet.

B.14 Mapping to the script's flow

The complete correspondence between manual steps and the shell script, for anyone debugging a script failure or extending the automation:

Appendix B step	Script section / line range
B.2 — DNS	`tenant-provision.sh:138-145` (Step 1)
B.3 — DB clone	`tenant-provision.sh:147-165` (Step 2)
B.4 — DB role (gap)	not in script
B.5–B.6 — Gateway vhost	`tenant-provision.sh:167-195` (Step 3)
B.7 — Certbot	`tenant-provision.sh:197-212` (Step 4)
B.8 — Web vhost	`tenant-provision.sh:214-245` (Step 5)
B.9 — Java provision	`tenant-provision.sh:247-280` (Step 6)
B.10 — UPDATE db_user/pass (gap)	not in script
B.11 — Registry refresh	done implicitly inside the Java facade