Tenant Provisioning Runbook¶
Ticket: TQ-115 — Phase 1 deliverable
Audience: Ops. Run this once per new tenant after Phase 0 is in place.
Initial-install runbook: doc/operations/multitenancy-setup.md
This runbook covers onboarding a new tenant against an installation that
already has the multi-tenancy foundation deployed. If you're standing up
TQPro multi-tenancy from scratch for the first time, do
multitenancy-setup.md first.
1. Prerequisites checklist (per tenant)¶
Before running scripts/platform/tenant-provision.sh:
- [ ] Tenant code chosen: lowercase alphanumerics plus -, 3-50 chars, doesn't start or end with -. Used as the DNS subdomain, the Keycloak realm name, and the slug in tlinq_<code> for the DB name.
- [ ] DNS A record for <tenant-code>.<platform.domain> points at the gateway public IP and has propagated. Verify: dig +short <tenant-code>.<platform.domain>
- [ ] Email address for the initial tenant admin user known. Keycloak will send the welcome / set-password email to this address.
- [ ] tqpro_template DB exists and is current: created via scripts/db/bootstrap-template-db.sh --schema-file <dump> per multitenancy-setup.md §6 and kept in schema sync by scripts/db/apply-tenant-migrations.sh. If you've added migrations recently, double-check the runner ran successfully against the template before you provision a new tenant.
- [ ] PLATFORM_ADMIN_TOKEN available in the ops shell environment. Get one via the master-realm client_credentials grant, the same call KeycloakAdminClient.fetchToken makes:

      export PLATFORM_ADMIN_TOKEN=$(curl -s -X POST \
        "${KC_URL}/realms/master/protocol/openid-connect/token" \
        -d "grant_type=client_credentials" \
        -d "client_id=tqpro-platform-admin" \
        -d "client_secret=${TQPRO_PLATFORM_ADMIN_SECRET}" \
        | jq -r .access_token)

  Tokens are short-lived (~5 min by default). If the script fails on the curl to /platform/tenant/provision with HTTP 401, just re-fetch the token and re-run.
- [ ] (Optional) Meta WhatsApp phone_number_id for the tenant. If supplied as the 4th positional arg, the script inserts a row into tqplatform.wa_phone_routing so inbound webhooks route to the right tenant (Phase 5).
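The naming rules in the first checklist item can be checked mechanically before anything else runs. The following is a minimal sketch, not the check tenant-provision.sh itself performs; the function names validate_tenant_code and tenant_db_name are hypothetical, and the hyphen-to-underscore mapping is taken from Appendix B.1.

```shell
# Sketch only: validate_tenant_code and tenant_db_name are illustrative names,
# not functions from tenant-provision.sh.

# Returns 0 if the code is 3-50 chars of lowercase alphanumerics and hyphens
# and neither starts nor ends with a hyphen.
validate_tenant_code() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{1,48}[a-z0-9]$'
}

# Derives the tenant DB name: hyphens become underscores (see B.1).
tenant_db_name() {
  printf 'tlinq_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

if validate_tenant_code "acme-travel"; then
  tenant_db_name "acme-travel"    # prints tlinq_acme_travel
fi
```

Running the check first keeps a malformed code from ever reaching DNS, Keycloak, or createdb.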
2. The provisioning command¶
./scripts/platform/tenant-provision.sh <tenant_code> "<tenant name>" <admin_email> [<wa_phone_id>]
# Example:
./scripts/platform/tenant-provision.sh acme-travel "Acme Travel LLC" admin@acmetravel.example
The script does six things in order. If any step fails, it invokes
scripts/platform/tenant-rollback.sh <tenant_code> automatically before exiting.
| Step | What | Where to look if it fails |
|---|---|---|
| 1 | DNS pre-flight (dig +short) | The script exits before any state change. Wait for DNS or fix the A record. |
| 2 | Clone tqpro_template → tlinq_<code> (pg_dump \| pg_restore) | PG logs (journalctl -u postgresql). Verify tqpro_template exists and the calling user has CREATEDB. |
| 3 | Render nginx vhost + symlink + nginx -t | Stderr of the script. nginx -t output is verbose. |
| 4 | systemctl reload nginx | journalctl -u nginx |
| 5 | certbot certonly --webroot | /var/log/letsencrypt/letsencrypt.log. Common causes: DNS not propagated to LE servers (wait + retry), rate-limit hit (50 certs / registered domain / week; pause new tenants). |
| 6 | POST /platform/tenant/provision (Java side) | TQPro application logs (journalctl -u tqpro-api or wherever you log). The body of the failed response is in /tmp/provision-response.json after the script exits. |
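Step 1 is cheap to reproduce by hand when the script refuses to start. A rough sketch of such a pre-flight follows; it assumes the operator knows the gateway's public IP (GATEWAY_IP is a hypothetical variable, not something this runbook defines), and it may differ from how tenant-provision.sh implements the check.

```shell
# Sketch only: resolves_to and GATEWAY_IP are illustrative, not from tenant-provision.sh.

# Returns 0 if the expected IP ($2) is one of the addresses in the resolver
# output ($1, one address per line, as printed by `dig +short`).
resolves_to() {
  printf '%s\n' "$1" | grep -Fxq -- "$2"
}

# Typical use (needs network access, so it is left commented out here):
#   resolved="$(dig +short "acme-travel.example.net")"
#   resolves_to "${resolved}" "${GATEWAY_IP}" || { echo "DNS not ready" >&2; exit 1; }

# Offline demonstration with canned resolver output:
canned="203.0.113.10
203.0.113.11"
if resolves_to "${canned}" "203.0.113.10"; then
  echo "A record points at the gateway"
fi
```

Matching whole lines (-x) avoids treating 203.0.113.1 as a match for 203.0.113.10.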
A successful run prints:
Tenant 'acme-travel' provisioned. Log in at https://acme-travel.tourlinq.com/
(after DNS + cert finalise). Ops follow-up: confirm the welcome email reached
admin@acmetravel.example.
3. Verifying success¶
# Keycloak — realm exists with both clients and 5 roles + 1 admin user.
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
"${KC_URL}/admin/realms/acme-travel" | jq '.realm, .enabled'
# realm: "acme-travel", enabled: true
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
"${KC_URL}/admin/realms/acme-travel/clients?clientId=tqweb-adm" | jq 'length'
# 1
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
"${KC_URL}/admin/realms/acme-travel/clients?clientId=tqpro-admin-api" | jq 'length'
# 1
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
"${KC_URL}/admin/realms/acme-travel/users" | jq 'length, .[0].requiredActions'
# 1
# ["UPDATE_PASSWORD"]
# Platform DB — tenant row present, kc_admin_client_secret encrypted.
psql tqplatform -c "SELECT tenant_code, status, kc_admin_client_secret LIKE 'encrypted:%' AS sec_enc FROM tenant WHERE tenant_code='acme-travel'"
# Tenant DB — exists, ledger is current, no customer data.
psql tlinq_acme_travel -c "SELECT count(*) FROM public.schema_migrations" # >= 73
psql tlinq_acme_travel -c "SELECT count(*) FROM nts.booking" # 0
# nginx — vhost active and serving.
ls -l /etc/nginx/sites-enabled/acme-travel.conf
sudo nginx -t
# Certificate — issued for the tenant host.
sudo certbot certificates | grep acme-travel
After welcome-email arrival, the tenant admin sets a password and lands in
the TQPro admin UI on https://acme-travel.<platform.domain>/.
4. Rollback¶
If anything went wrong and the auto-rollback didn't fire (or you want to remove a tenant for any other reason during Phase 1), run:
./scripts/platform/tenant-rollback.sh <tenant_code>
The script tolerates partial state — every step skips if the prior step's artefact doesn't exist. It does NOT touch DNS records.
If the Keycloak realm DELETE returns 401, re-fetch PLATFORM_ADMIN_TOKEN
and re-run — the platform service account picked up realm-management
roles when the realm was created, but the cached token predates them.
5. Suspending and reactivating¶
The platform admin API (P1.4) supports temporary suspension:
# Suspend (status → SUSPENDED). Hazelcast fan-out evicts DB pools on every node.
curl -s -X POST "https://<platform-host>/tlinq-api/platform/tenant/suspend" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"tenantId":"<uuid-from-platform-list>"}' | jq
# Reactivate.
curl -s -X POST "https://<platform-host>/tlinq-api/platform/tenant/activate" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"tenantId":"<uuid>"}' | jq
Suspension only flips the platform-DB status flag and evicts pools. The nginx vhost and TLS cert remain in place. The Keycloak realm is left enabled — login attempts hit the API and get 403 because the tenant is no longer ACTIVE in the registry. Hard deprovisioning (with realm disabled) is Phase 8.
6. Manual registry refresh (escape hatch)¶
If Hazelcast propagation fails for any reason and one node's
TenantRegistry is out of sync, force a refresh:
# Refresh entire registry on every node.
curl -X POST "https://<platform-host>/tlinq-api/platform/tenant/refresh" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" -d '{}'
# Refresh just one tenant — also runs the eviction fan-out for non-ACTIVE
# tenants, useful after manually editing tqplatform.tenant.
curl -X POST "https://<platform-host>/tlinq-api/platform/tenant/refresh" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"tenantId":"<uuid>"}'
7. Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| Step 1 fails — DNS does not resolve | A record missing or not propagated | Wait 5-15 min and re-run; verify with dig +short |
| Step 2 fails with database "tqpro_template" does not exist | Template not bootstrapped yet | Run scripts/db/bootstrap-template-db.sh |
| Step 5 (certbot) fails with rate-limit error | Let's Encrypt limits 50 certs / registered domain / week | Wait, or request a rate limit increase from LE. New tenants will need to queue. |
| Step 5 fails with "Connection refused" / no challenge response | nginx didn't reload, or port 80 firewalled | nginx -t && systemctl reload nginx; check firewall |
| Step 6 returns HTTP 401 | PLATFORM_ADMIN_TOKEN expired | Re-fetch token and re-run |
| Step 6 returns "Keycloak unreachable" | KC down or oidc-keycloak-base-url wrong | Check Keycloak; verify the property in tlinqapi.properties |
| KeycloakRealmProvisioner logs "401 after realm-create" | Token-refresh logic regression | The provisioner re-fetches automatically (P1.3). If it's still failing, check KeycloakAdminClient.invalidateToken() is called on the line right after POST /admin/realms. |
| tenant-rollback.sh Keycloak DELETE returns 401 | Cached token predates realm-admin grant | Re-fetch token, re-run rollback |
| Tenant logs in but immediately fails with "tenant-unresolved" | Subdomain doesn't match a registered tenant code | Check the tenant_code in tqplatform.tenant matches the subdomain; or call /platform/tenant/refresh if the row was added recently |
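Several rows above come down to a stale PLATFORM_ADMIN_TOKEN. One way to catch this before step 6 is to read the exp claim from the token locally. The sketch below assumes the token is a standard three-part JWT whose payload carries a numeric exp claim (verify this against your Keycloak deployment); jwt_exp and token_is_fresh are hypothetical helper names.

```shell
# Sketch only: jwt_exp and token_is_fresh are illustrative helpers.

# Prints the numeric "exp" claim of a JWT, or nothing if it cannot be read.
jwt_exp() {
  local payload
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "${payload}" | base64 -d 2>/dev/null \
    | grep -o '"exp"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Returns 0 if the token stays valid for more than $2 seconds (default 30).
token_is_fresh() {
  local exp now
  exp="$(jwt_exp "$1")"
  now="$(date +%s)"
  [ -n "${exp}" ] && [ $(( exp - now )) -gt "${2:-30}" ]
}

# Typical use before calling /platform/tenant/provision:
#   token_is_fresh "${PLATFORM_ADMIN_TOKEN}" || echo "re-fetch the token first" >&2
```

The check only reads the claim; it does not verify the signature, so it is a convenience for operators and not a security control.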
8. References¶
- Initial install runbook: doc/operations/multitenancy-setup.md
- Multi-tenant runtime architecture: doc/architecture/multitenant-architecture.md
- Tenant-aware coding for developers: doc/developer/getting-started/tenant-aware-coding.md
- Execution plan: doc/plans/multitenancy-execution.md
- Architecture decisions: §2 of the execution plan (D-1 .. D-19)
- Phase 8 deprovisioning runbook: TBD (separate document)
Appendix A — Orchestration topology¶
Provisioning is always driven from an orchestration host that is not the gateway and not (necessarily) the API host. In a typical TQPro deployment that's either a dedicated ops box or the CI/CD build server that runs the deploy pipelines. The orchestration host holds the credentials and the repo; everything else is acted upon remotely.
A.1 Where each step actually runs¶
The shell script (scripts/platform/tenant-provision.sh) follows this
topology end-to-end — the table below also applies if you do the work
manually following Appendix B.
| Step | Runs on | Reaches out to |
|---|---|---|
| 0 Read prereqs (tqpro-platform.properties, env) | Orchestration | (none) |
| 1 Pick tenant params | Orchestration (shell vars) | (none) |
| 2 DNS A record | DNS provider UI (browser) | (none) |
| 3 pg_dump template → createdb → pg_restore tenant DB | Orchestration (psql/pg_dump clients) | Postgres host (network) |
| 4 CREATE ROLE + grants for the tenant DB role (not in script; see A.3) | Orchestration (psql as a Postgres superuser) | Postgres host (network) |
| 5 Render gateway nginx vhost (render-vhost.py) | Orchestration (repo + templates here) | (none) |
| 5 Install vhost + reload nginx | Orchestration | SSH → gateway host |
| 6 certbot certonly | Orchestration | SSH → gateway host (runs sudo certbot there) |
| 7 Web vhost render + install | Orchestration | SSH → each web.deploy.hosts entry |
| 8 Fetch PLATFORM_ADMIN_TOKEN from Keycloak master realm | Orchestration (curl) | HTTPS → Keycloak |
| 9 POST /platform/tenant/provision | Orchestration (curl) | HTTPS → API host (internal port) |
| 10 UPDATE tenant SET db_user, db_pass (not in API; see A.3) | Orchestration (psql) | Postgres host (platform DB) |
| 11 POST /platform/tenant/refresh | Orchestration (curl) | HTTPS → API host |
| 12 Verifications | Orchestration | KC, API, Postgres |
The gateway host is intentionally in the DMZ and must not have
Postgres credentials or the PLATFORM_ADMIN_TOKEN. Keep these on the
orchestration side only.
A.2 What the orchestration host needs¶
For a CI/build server playing this role:
| Requirement | Why |
|---|---|
| TQPro repo checked out | scripts/platform/render-vhost.py, config/Nginx Config/templates/, tqpro-platform.properties |
| psql, pg_dump, pg_restore clients | DB clone + role creation + tenant-row UPDATE |
| ~/.pgpass (or PGPASSWORD) for: a Postgres superuser account on the cluster (for createdb/CREATE ROLE), and the tqpro_platform user on tqplatform | Steps 3, 4, 10 |
| SSH key with passwordless access to ${gateway.deploy.user}@${gateway.deploy.host} | Steps 5, 6 |
| Same for every host in web.deploy.hosts (with ${web.deploy.user}) | Step 7 |
| The remote user must have NOPASSWD sudo for at least: install, ln, nginx, systemctl reload nginx, certbot | Steps 5, 6, 7 |
| Network reachability to: Postgres (5432), Keycloak (443), API host's internal Jersey port (typically 11080, not publicly exposed) | Steps 3, 4, 8, 9, 10, 11 |
| TQPRO_PLATFORM_ADMIN_SECRET available as an env var, OR a way to fetch the platform token (e.g., from a secrets manager) | Step 8 |
| jq and curl | Token fetch + API calls |
For pipelines, mark TQPRO_PLATFORM_ADMIN_SECRET, PLATFORM_ADMIN_TOKEN,
and the generated DB_PASS as masked variables, and redirect curl
output to files instead of stdout — bodies that carry tokens or
encrypted secrets shouldn't appear in build logs.
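For the advice about redirecting curl output to files, one possible shape is a wrapper that writes the response body to an owner-only file and prints only the HTTP status code. This is a sketch under assumptions: api_post is a hypothetical helper, and the flags used (-o, -w '%{http_code}', --data-binary @file) are standard curl options, not anything specific to TQPro.

```shell
# Sketch only: api_post is an illustrative helper, not part of the TQPro scripts.

# POSTs the JSON file $2 to URL $1, stores the response body in file $3,
# and prints only the HTTP status code on stdout.
# The bearer token is taken from PLATFORM_ADMIN_TOKEN in the environment.
api_post() {
  local url="$1" body_file="$2" out_file="$3"
  ( umask 077; : > "${out_file}" )      # create the output file owner-only
  curl -s -o "${out_file}" -w '%{http_code}' -X POST "${url}" \
    -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "@${body_file}"
}

# Typical use in a pipeline step (commented out because it needs a live API):
#   status="$(api_post "${PLATFORM_API_URL}/platform/tenant/provision" req.json resp.json)"
#   case "${status}" in
#     2*) ;;
#     *)  echo "provision failed: HTTP ${status}" >&2; exit 1 ;;
#   esac
```

Only the status code reaches the build log; the body stays in resp.json, where a later step can pick out non-sensitive fields with jq.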
A.3 Known gaps in the current automation¶
All gaps below were surfaced during the first end-to-end manual
provisioning walkthrough on 2026-05-16 (tenant perun). They split
into four categories: PENDING CODE FIXES that require Java changes,
SCRIPT GAPS in tenant-provision.sh, SETUP / RUNBOOK GAPS that belong
in multitenancy-setup.md, and OPERATOR NOTES that are gotchas rather
than bugs.
A.3.1 PENDING CODE FIXES (planned for next sprint)¶
Update 2026-05-17: items A.3.1 #1, A.3.1 #2, and A.3.2 #3 below have been RESOLVED. See the commit that follows the gap-discovery commit for the patch. The corresponding manual workarounds in §B.4 + §B.10 + §B.11 are no longer needed when running against an up-to-date API + script. The text is preserved for historical reference and for ops following an older deployment.
1. JWTValidator accepts only one oidc-client-id (high priority). JWTValidator.validateToken() (line 92-104) checks audience contains config.getClientId() OR azp == config.getClientId(), that is, a single client id. But multi-tenancy needs to accept tokens from at least two distinct clients on the same API instance:
   - tqpro-platform-admin (master realm, for /platform/* endpoints)
   - tqpro-admin-api and/or tqweb-adm (per-tenant realm, for browser SSO + tenant APIs)
Today this forces an either/or config: set oidc-client-id=tqpro-platform-admin
and master-realm calls work but tenant-realm browser SSO fails (the
/auth/config endpoint also returns this wrong clientId to the SPA);
set it to tqweb-adm and the inverse. Fix: accept a comma-separated
list of client ids and check membership.
Until fixed, operators must flip oidc-client-id to
tqpro-platform-admin to provision a tenant, then flip back to
tqweb-adm for browser SSO to work — with a systemctl restart tlinq
between flips.
STATUS: RESOLVED 2026-05-17. oidc-client-id now accepts a
comma-separated list. The first entry is the primary client
(returned by /auth/config to the SPA, used in logout URLs).
JWTValidator.validateToken() accepts a token whose aud or azp
matches any entry. Recommended setting:
oidc-client-id=tqweb-adm,tqpro-platform-admin.
2. TenantProvisioningFacade.insertTenantRow does not write db_user / db_pass. Already documented: it writes everything else (tenant_id, tenant_code, tenant_name, db_name, kc_realm, kc_admin_client_secret, status) but leaves the credential columns null. TenantAwareDBSession.requireTenantCredentials then rejects every request to the tenant with TenantConfigException. Workaround is the manual UPDATE in §B.10 + a /platform/tenant/refresh in §B.11 so the in-memory TenantInfo picks up the new values.
STATUS: RESOLVED 2026-05-17. ProvisionRequest now requires
dbUser + dbPass. The facade encrypts dbPass via
TenantConfig.encrypt (unless already prefixed encrypted:) and
inserts it alongside the other columns. Plain POST /platform/tenant/provision
produces a complete row — no follow-up UPDATE needed. §B.10 and §B.11
in this appendix become unnecessary when running against an up-to-date
API.
A.3.2 SCRIPT GAPS (scripts/platform/tenant-provision.sh)¶
- The script does not create the per-tenant Postgres role. createdb happens, but no CREATE ROLE for the tenant. The Java API later tries to connect as that role and fails. Workaround: run §B.4 yourself. Note the corrected role+grants block in §B.4 fans the GRANT loop across all tenant-DB schemas, not just public, so the role can actually read every schema (amadeus, goglobal, nts, public, rayna, tiqets, tqwa).
STATUS: RESOLVED 2026-05-17. tenant-provision.sh step 2 now
auto-generates DB_PASS (or accepts a pre-set TENANT_DB_PASS env
var) and runs CREATE ROLE + the same fan-out GRANT loop §B.4
describes. The role + password are then passed to the Java provision
call in step 6. tenant-rollback.sh step 4b drops the role on
rollback. §B.4 in this appendix becomes unnecessary.
- The script runs nginx -t immediately after vhost install, but on a fresh gateway the per-tenant cert files don't exist yet. The rendered vhost references /etc/letsencrypt/live/<tenant-host>/{fullchain,privkey}.pem, and nginx -t fails loudly if they're missing, which aborts the script before certbot ever runs. On a gateway with at least one prior tenant, it accidentally works (other tenants' files keep nginx -t happy). Manual workaround is a two-pass install (§B.6 + §B.7.5): stub HTTP-only vhost first → certbot → replace stub with the full HTTPS vhost. Script fix should follow the same pattern, or pre-create a snake-oil cert before the install.
STATUS: RESOLVED 2026-05-17. tenant-provision.sh step 3 now
installs an HTTP-only stub (handling just the
/.well-known/acme-challenge/ location); step 4 runs certbot as
before; new step 4b installs the full HTTPS vhost from the
render output. nginx -t is happy at every step. §B.6 (stub) and
§B.7.5 (full swap) in the manual walkthrough are no longer needed.
- Pre-existing bug in tenant-provision.sh:106: the apostrophe in ${ACME_SERVER:-Let's Encrypt production (default)} is a real bash parser error. Inside ${VAR:-...}, the apostrophe opens a single quote that is never closed. bash -n reports a syntax error at end-of-file; runtime invocations refuse to start. The script was probably never successfully run end-to-end since the line was introduced; operators presumably did manual provisioning all along without realising the automation didn't actually work.
STATUS: RESOLVED 2026-05-17. Apostrophe removed (now
LetsEncrypt production (default)). bash -n passes cleanly.
A.3.3 SETUP / RUNBOOK GAPS (see also multitenancy-setup.md)¶
- Missing step: create + assign platform-admin realm role in master realm. multitenancy-setup.md §2 currently only tells operators to assign create-realm to the tqpro-platform-admin service account (per D-4). But the Java API also requires the token to carry the realm role platform-admin to reach /platform/* endpoints; without it, every call returns ERR0008. The platform-admin role is our app-level role (entirely separate from KC's built-in admin role, so D-4's prohibition still holds). Needs a new sub-step in §2.
- Missing step: EnvironmentFile= in the systemd unit. multitenancy-setup.md §5 documents /etc/tqpro/tqpro.env but doesn't enforce that the systemd unit actually references it. If sudo systemctl cat tlinq | grep Environment returns nothing, the JVM doesn't see TQPRO_ENCRYPTION_KEY or TQPRO_PLATFORM_ADMIN_SECRET, and the provisioning API fails with ERR00014 even though the env file is on disk. Fix is a drop-in that sets EnvironmentFile=/etc/tqpro/tqpro.env, followed by systemctl daemon-reload && systemctl restart tlinq.
- Missing step: migrate tlinqapi.properties from single-tenant layout. Pre-multi-tenancy installs have auth-mode=hybrid, dev-mode=true, oidc-issuer=<hardcoded one realm>, oidc-client-id=tqweb-adm, no oidc-keycloak-base-url. The multi-tenant validator needs oidc-keycloak-base-url=https://<kc-host> (no trailing slash, exactly matching the prefix of the JWT's iss claim minus /realms/<realm>), and dev-mode=false so JWT validation failures don't silently fall through to a fake dev user. multitenancy-setup.md Appendix A (in-place migration) should call this out explicitly with a sed/diff against the old config.
- Missing section: web-tier host bootstrap. The rendered web vhost hardcodes root /opt/tqpro/tqweb-adm, but the SPA bundle isn't deployed there automatically. Web hosts need a one-time setup that:
  - Creates /opt/tqpro/tqweb-adm with ownership tqpro-deploy:www-data and mode 2775 (setgid so new files inherit group www-data)
  - Does an initial rsync of ${REPO_ROOT}/tqweb-adm/ to that path

  This belongs in multitenancy-setup.md as its own section because it's per-web-host setup, not per-tenant.
- Missing step: certbot + step-ca needs SSL_CERT_FILE / REQUESTS_CA_BUNDLE env vars in the renewal timer. Python's requests/urllib3/certifi ignores the OS trust store and uses its own bundled Mozilla list. Even after update-ca-certificates installs the step-ca root into the OS bundle, certbot fails with SSLError: unable to get local issuer certificate. Fix one of:
  - Add REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt (and SSL_CERT_FILE with the same path) to certbot's systemd unit Environment= block (renewal-safe)
  - Splice the step-ca root into certifi's bundle (/usr/lib/python3/dist-packages/certifi/cacert.pem); this gets overwritten on python3-certifi package upgrade

  The 24-hour-lifetime step-ca leaf cert dies fast, so renewal MUST work. Add to gateway-bootstrap section of multitenancy-setup.md Appendix B (step-ca lab setup).
- tqpro_template may be missing public.schema_migrations. bootstrap-template-db.sh is supposed to create and seed the ledger (its comment header step 4), but the template observed in the perun walkthrough did not have it. This pre-dates per-tenant provisioning; every new tenant cloned from a ledger-less template will also lack it, breaking future apply-tenant-migrations.sh runs. Verify with: psql -d tqpro_template -c "SELECT count(*) FROM public.schema_migrations". If it errors, the template needs a bootstrap fix and every existing tenant needs the ledger back-filled.
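For the EnvironmentFile= item in the list above, the drop-in could look roughly like the sketch below. Assumptions are stated loudly: the unit is taken to be tlinq.service (inferred from systemctl restart tlinq), the file name 10-envfile.conf is arbitrary, and write_env_dropin is a hypothetical helper. The third argument lets you render into a scratch directory for review before touching /etc/systemd/system.

```shell
# Sketch only: write_env_dropin and the file name 10-envfile.conf are illustrative.

# Writes <root>/<unit>.service.d/10-envfile.conf pointing the unit at an
# environment file, and prints the path of the file it wrote.
#   $1 unit name without ".service" (assumed: tlinq)
#   $2 environment file path
#   $3 systemd unit root (defaults to /etc/systemd/system)
write_env_dropin() {
  local unit="$1" env_file="$2" root="${3:-/etc/systemd/system}"
  local dir="${root}/${unit}.service.d"
  mkdir -p "${dir}" || return 1
  printf '[Service]\nEnvironmentFile=%s\n' "${env_file}" > "${dir}/10-envfile.conf"
  printf '%s\n' "${dir}/10-envfile.conf"
}

# Dry run into a scratch directory so the result can be reviewed first:
scratch="$(mktemp -d)"
cat "$(write_env_dropin tlinq /etc/tqpro/tqpro.env "${scratch}")"

# On the API host (as root), omit the third argument and then run:
#   systemctl daemon-reload && systemctl restart tlinq
```

Afterwards, sudo systemctl cat tlinq | grep Environment should show the EnvironmentFile= line.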
A.3.4 OPERATOR NOTES (gotchas)¶
- sudo VAR=value command does NOT reliably pass VAR to the command. sudo sanitizes the environment by default, so the variable can be dropped or rejected depending on the sudoers policy. Use one of:
  - sudo env VAR=value command (explicit env as the first command under sudo)
  - export VAR=value && sudo -E command (sudo -E preserves all caller-side env)
  - Add VAR to Defaults env_keep in /etc/sudoers.d/
- CI/orchestration host doesn't automatically trust step-ca. Without this, curl from the orchestration host to KC or to https://<tenant>.<platform.domain>/ fails with SSL certificate problem: unable to get local issuer certificate. Fix (one-time per CI host):

      scp tqpro-deploy@<gateway>:/usr/local/share/ca-certificates/tqpro-step-ca.crt /tmp/
      sudo install -m 0644 /tmp/tqpro-step-ca.crt /usr/local/share/ca-certificates/
      sudo update-ca-certificates

  Not blocking the manual flow (all CI → API calls use internal HTTP on platform.api.url, not HTTPS), but blocks any verification curl that would hit the public TLS endpoint.
- Gateway may have stale single-tenant vhosts (tqweb-pub, tqweb-adm, tqweb-b2b, tqweb-auth, auth.vanevski.net) from the pre-multi-tenancy era. They don't break the new per-tenant vhost, but they pollute the config, and the implicit-default selection for port 443 is whichever loads first alphabetically. That can cause confusing diagnostics like https://<new-tenant>.../ redirecting to the legacy auth realm until the new vhost loads. Plan a cleanup once all tenants are migrated to the per-tenant pattern.
- First-tenant KC realm has no SMTP sender configured: welcome / password-reset email fails with Invalid sender address 'null'. KeycloakRealmProvisioner logs a WARNING rather than failing the provision call, so the realm is created and usable, but the initial admin user has no password and no way to get one via email. Operator workaround: in KC admin UI → Realm <code> → Users → admin user → Credentials tab → Set password manually. Long-term fix: provision sets default SMTP from realm template, or multitenancy-setup.md adds a step to configure master-level email-realm defaults that get inherited.
Most of these collapse into single steps once fixed:
items 2, 3 → §B.4 + §B.10 + §B.11 become unnecessary
items 5, 6, 7 → setup runbook gains the missing one-time steps
item 1 → the oidc-client-id flip-flop disappears
items 8-10 → setup runbook expands to cover the whole platform install
Appendix B — Manual provisioning (without the script)¶
Use this when you want to understand the flow end-to-end, when you're debugging a script failure, or when running in an environment the script doesn't natively support. All steps execute from the orchestration host unless explicitly noted; see Appendix A for the topology rules.
What this appendix does NOT replace: the Keycloak realm + clients + roles + initial admin user. That work happens inside the Java API (
POST /platform/tenant/provision) and is too fiddly to redo by hand at the KC admin API level — there's no benefit to bypassing it.
B.1 Pick parameters¶
CODE='acme-travel' # Keycloak realm + DNS + DB suffix
NAME='Acme Travel LLC'
ADMIN_EMAIL='admin@acmetravel.example'
PLATFORM_DOMAIN='vanevski.net' # from tqpro-platform.properties
TENANT_HOST="${CODE}.${PLATFORM_DOMAIN}" # e.g. acme-travel.vanevski.net
DB_NAME="tlinq_${CODE//-/_}" # hyphens → underscores
DB_ROLE="tlinq_${CODE//-/_}" # convention: role == db name
DB_PASS="$(openssl rand -base64 24)"
echo "Tenant DB password (save this): $DB_PASS"
B.2 DNS¶
Register an A record at the DNS provider:
${CODE}.${PLATFORM_DOMAIN} → gateway IP.
Verify from the orchestration host:
dig +short "${CODE}.${PLATFORM_DOMAIN}"
B.3 Clone the template DB¶
pg_dump -Fc -d tqpro_template -f /tmp/seed.dump
createdb "${DB_NAME}"
pg_restore --no-owner --no-privileges -d "${DB_NAME}" /tmp/seed.dump
rm /tmp/seed.dump
# Verify
psql -d "${DB_NAME}" -c "SELECT count(*) FROM schema_migrations;"
B.4 Create the tenant's Postgres role (gap — see A.3)¶
Run as a Postgres superuser. From the orchestration host (assuming
superuser credentials in ~/.pgpass):
psql -h "${PG_HOST:-localhost}" -U postgres <<SQL
CREATE ROLE ${DB_ROLE} WITH LOGIN PASSWORD '${DB_PASS}';
GRANT ALL PRIVILEGES ON DATABASE ${DB_NAME} TO ${DB_ROLE};
\c ${DB_NAME}
GRANT ALL PRIVILEGES ON SCHEMA public TO ${DB_ROLE};
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ${DB_ROLE};
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ${DB_ROLE};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${DB_ROLE};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${DB_ROLE};
SQL
# Verify
PGPASSWORD="${DB_PASS}" psql -h "${PG_HOST:-localhost}" -U "${DB_ROLE}" \
-d "${DB_NAME}" -c "SELECT count(*) FROM schema_migrations;"
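A.3.2 says the role needs grants on every tenant-DB schema, while the block above only covers public. One way to fan the same five statements out is to generate the SQL per schema and pipe it into psql. This is a sketch, not the loop tenant-provision.sh runs: grant_sql is a hypothetical helper, and the schema list is copied from A.3.2 and should be checked against the actual template.

```shell
# Sketch only: grant_sql is an illustrative helper; the schema list comes from A.3.2.

# Emits the B.4 grant statements for role $1 on every schema in the
# space-separated list $2.
grant_sql() {
  local role="$1" schema
  for schema in $2; do
    printf 'GRANT ALL PRIVILEGES ON SCHEMA %s TO %s;\n' "${schema}" "${role}"
    printf 'GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA %s TO %s;\n' "${schema}" "${role}"
    printf 'GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA %s TO %s;\n' "${schema}" "${role}"
    printf 'ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON TABLES TO %s;\n' "${schema}" "${role}"
    printf 'ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON SEQUENCES TO %s;\n' "${schema}" "${role}"
  done
}

SCHEMAS="amadeus goglobal nts public rayna tiqets tqwa"

# Review the generated SQL first:
grant_sql "tlinq_acme_travel" "${SCHEMAS}"

# Then apply it against the tenant DB, for example:
#   grant_sql "${DB_ROLE}" "${SCHEMAS}" \
#     | psql -h "${PG_HOST:-localhost}" -U postgres -d "${DB_NAME}"
```

Generating the statements separately from applying them keeps the superuser session short and leaves a reviewable record of what was granted.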
B.5 Render the gateway nginx vhost¶
python3 scripts/platform/render-vhost.py \
--template 'config/Nginx Config/templates/tenant-gw.conf.template' \
--output /tmp/${CODE}-gw.conf \
--tenant-code "${CODE}" \
--tenant-host "${TENANT_HOST}"
cat /tmp/${CODE}-gw.conf # always review before deploying
B.6 Install the vhost on the gateway¶
GW_HOST="$(grep ^gateway.deploy.host= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
GW_USER="$(grep ^gateway.deploy.user= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
GW_USER="${GW_USER:-tqpro-deploy}"
# Pre-flight: the shared upstreams.conf must already exist on the gateway
ssh "${GW_USER}@${GW_HOST}" "test -f /etc/nginx/conf.d/tqpro-upstreams.conf" || \
{ echo "MISSING — run scripts/platform/render-upstreams.sh first"; exit 1; }
scp /tmp/${CODE}-gw.conf "${GW_USER}@${GW_HOST}:/tmp/"
ssh "${GW_USER}@${GW_HOST}" "\
sudo install -m 0644 /tmp/${CODE}-gw.conf /etc/nginx/sites-available/${CODE}.conf && \
sudo ln -sf /etc/nginx/sites-available/${CODE}.conf /etc/nginx/sites-enabled/${CODE}.conf && \
sudo nginx -t && sudo systemctl reload nginx && \
rm -f /tmp/${CODE}-gw.conf"
B.7 Issue the TLS certificate¶
ssh "${GW_USER}@${GW_HOST}" "\
sudo certbot certonly --webroot --webroot-path /var/www/certbot \
--non-interactive --agree-tos \
--email ops@${PLATFORM_DOMAIN} \
--domain ${TENANT_HOST} \
--deploy-hook 'systemctl reload nginx'"
# Verify
ssh "${GW_USER}@${GW_HOST}" "sudo ls /etc/letsencrypt/live/${TENANT_HOST}/"
curl -sI https://${TENANT_HOST}/ | head -1
In a closed lab without public ACME reach, append
--server <private-acme-url> (the script does this when
certbot.acme.server is set in tqpro-platform.properties; see
multitenancy-setup.md Appendix B).
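The conditional --server handling can be reproduced by hand along these lines. This is a sketch of the idea, not the script's actual code: prop_get and certbot_command are hypothetical helpers, and the property file path is the one used elsewhere in this appendix.

```shell
# Sketch only: prop_get and certbot_command are illustrative helpers.

# Prints the value of key $1 from properties file $2 (nothing if absent).
# Everything after the first "=" is the value, like `cut -d= -f2-`.
prop_get() {
  awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$2" 2>/dev/null
}

# Prints the certbot command line for host $1 and contact email $2,
# appending --server only when an ACME URL is passed as $3.
certbot_command() {
  local host="$1" email="$2" server="$3"
  local cmd="sudo certbot certonly --webroot --webroot-path /var/www/certbot"
  cmd="${cmd} --non-interactive --agree-tos --email ${email} --domain ${host}"
  cmd="${cmd} --deploy-hook 'systemctl reload nginx'"
  if [ -n "${server}" ]; then
    cmd="${cmd} --server ${server}"
  fi
  printf '%s\n' "${cmd}"
}

# Typical use from the orchestration host (commented out: needs the gateway):
#   PROPS=/etc/tqpro/platform/tqpro-platform.properties
#   ssh "${GW_USER}@${GW_HOST}" \
#     "$(certbot_command "${TENANT_HOST}" "ops@${PLATFORM_DOMAIN}" "$(prop_get certbot.acme.server "${PROPS}")")"
```

Matching the key exactly in awk avoids the loose dot matching of grep ^certbot.acme.server=, where each dot matches any character.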
B.8 Render and install the web vhost (multi-host deployments only)¶
Skip if your dev box collapses the gateway and web tiers into one nginx.
python3 scripts/platform/render-vhost.py \
--template 'config/Nginx Config/templates/tenant-web.conf.template' \
--output /tmp/${CODE}-web.conf \
--tenant-code "${CODE}" \
--tenant-host "${TENANT_HOST}"
WEB_HOSTS="$(grep ^web.deploy.hosts= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
WEB_USER="$(grep ^web.deploy.user= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)"
WEB_USER="${WEB_USER:-tqpro-deploy}"
IFS=',' read -ra hosts <<<"${WEB_HOSTS}"
for h in "${hosts[@]}"; do
h="$(echo "$h" | tr -d '[:space:]')"
scp /tmp/${CODE}-web.conf "${WEB_USER}@${h}:/tmp/"
ssh "${WEB_USER}@${h}" "\
sudo install -m 0644 /tmp/${CODE}-web.conf /etc/nginx/sites-available/${CODE}.conf && \
sudo ln -sf /etc/nginx/sites-available/${CODE}.conf /etc/nginx/sites-enabled/${CODE}.conf && \
sudo nginx -t && sudo systemctl reload nginx && \
rm -f /tmp/${CODE}-web.conf"
done
B.9 Call the Java provisioning endpoint¶
Fetch a fresh PLATFORM_ADMIN_TOKEN (see §1) then:
PLATFORM_API_URL="$(grep ^platform.api.url= /etc/tqpro/platform/tqpro-platform.properties | cut -d= -f2-)/tlinq-api"
RESPONSE=$(curl -s -X POST "${PLATFORM_API_URL}/platform/tenant/provision" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d "{
\"tenantCode\": \"${CODE}\",
\"tenantName\": \"${NAME}\",
\"dbName\": \"${DB_NAME}\",
\"tenantHost\": \"${TENANT_HOST}\",
\"adminEmail\": \"${ADMIN_EMAIL}\"
}")
echo "${RESPONSE}" | jq
TENANT_ID=$(echo "${RESPONSE}" | jq -r .tenantId)
echo "Provisioned tenant_id: ${TENANT_ID}"
Behind that single call, the Java code (TenantProvisioningFacade.provision):
1. Creates the Keycloak ${CODE} realm with two clients
(tqweb-adm browser, tqpro-admin-api server-to-server) and five
roles (guest, agent, admin, manager, finance) plus the
initial admin user.
2. Encrypts the admin-client secret returned by KC.
3. Inserts the row into tqplatform.tenant.
4. Calls TenantRegistry.refresh() to publish via Hazelcast.
On failure after the realm exists, it best-effort deletes the realm and returns an error.
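Building the request body by string interpolation breaks as soon as a tenant name contains a double quote or backslash. A sketch that lets jq do the escaping is shown below. It is written for an API at or after the 2026-05-17 fix described in A.3.1, where ProvisionRequest requires dbUser and dbPass; the exact JSON key names for those two fields are an assumption, and provision_payload is a hypothetical helper.

```shell
# Sketch only: provision_payload is illustrative. The dbUser/dbPass keys are
# assumed from the ProvisionRequest description in A.3.1.

# Prints the provision request body; jq handles all JSON escaping.
# Args: code, name, db name, tenant host, admin email, db user, db password.
provision_payload() {
  jq -n \
    --arg tenantCode "$1" \
    --arg tenantName "$2" \
    --arg dbName "$3" \
    --arg tenantHost "$4" \
    --arg adminEmail "$5" \
    --arg dbUser "$6" \
    --arg dbPass "$7" \
    '{tenantCode: $tenantCode, tenantName: $tenantName, dbName: $dbName,
      tenantHost: $tenantHost, adminEmail: $adminEmail,
      dbUser: $dbUser, dbPass: $dbPass}'
}

# Typical use (commented out: needs a live API and a fresh token):
#   provision_payload "${CODE}" "${NAME}" "${DB_NAME}" "${TENANT_HOST}" \
#       "${ADMIN_EMAIL}" "${DB_ROLE}" "${DB_PASS}" > /tmp/provision-request.json
#   curl -s -X POST "${PLATFORM_API_URL}/platform/tenant/provision" \
#     -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
#     -H "Content-Type: application/json" \
#     --data-binary @/tmp/provision-request.json | jq
```

Against an older API that predates the fix, drop the last two fields and keep the manual UPDATE in §B.10.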
B.10 Manual patch: set db_user / db_pass (gap — see A.3)¶
PGPASSWORD="$(grep ^platform.db.pass= /etc/tqpro/tourlinq.properties | cut -d= -f2-)" \
psql -h "${PG_PLATFORM_HOST:-localhost}" -U tqpro_platform -d tqplatform <<SQL
UPDATE tenant
SET db_user = '${DB_ROLE}',
db_pass = '${DB_PASS}'
WHERE tenant_id = '${TENANT_ID}';
SQL
Plaintext vs encrypted db_pass: the column stores either form. TenantConfig.decrypt() treats values that don't start with encrypted: as passthrough plaintext, which is fine for dev/test. For production, encrypt with TenantConfig.encrypt() and store with the encrypted: prefix (same convention as kc_admin_client_secret).
B.11 Refresh the registry¶
The provision call above already called TenantRegistry.refresh() once,
but that ran before you UPDATEd db_user/db_pass. The cached
TenantInfo is therefore missing the credentials. Force a re-read:
curl -s -X POST "${PLATFORM_API_URL}/platform/tenant/refresh" \
-H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"tenantId\": \"${TENANT_ID}\"}" | jq
B.12 Verify¶
# Platform DB row complete
psql ... -d tqplatform -c "SELECT tenant_id, tenant_code, db_name, db_user IS NOT NULL AS has_user,
db_pass IS NOT NULL AS has_pass, status
FROM tenant WHERE tenant_id='${TENANT_ID}';"
# Keycloak realm
curl -s -H "Authorization: Bearer ${PLATFORM_ADMIN_TOKEN}" \
"${KC_URL}/admin/realms/${CODE}" | jq '{realm, enabled}'
# API resolves the tenant
curl -si -H "Host: ${TENANT_HOST}" \
"${PLATFORM_API_URL}/auth/config" \
-H "Content-Type: application/json" -d '{}' | head -20
# Expected: HTTP 200 with realm + clientId for ${CODE}
# First request triggers lazy SessionFactory build — watch the API log:
# "Building NTS factory for tenant ${CODE} → jdbc:postgresql://..."
journalctl -u tlinq -n 50 | grep "Building.*factory.*${CODE}"
B.13 Browser smoke test¶
Open https://${TENANT_HOST}/. You'll be redirected through Keycloak
(${CODE} realm). Log in as ${ADMIN_EMAIL}; KC prompts for password
reset on first login (welcome email, or set directly via the KC admin
console for dev). Empty dashboard = correctly provisioned tenant with no
data yet.
B.14 Mapping to the script's flow¶
The complete correspondence between manual steps and the shell script, for anyone debugging a script failure or extending the automation:
| Appendix B step | Script section / line range |
|---|---|
| B.2 — DNS | tenant-provision.sh:138-145 (Step 1) |
| B.3 — DB clone | tenant-provision.sh:147-165 (Step 2) |
| B.4 — DB role (gap) | not in script |
| B.5–B.6 — Gateway vhost | tenant-provision.sh:167-195 (Step 3) |
| B.7 — Certbot | tenant-provision.sh:197-212 (Step 4) |
| B.8 — Web vhost | tenant-provision.sh:214-245 (Step 5) |
| B.9 — Java provision | tenant-provision.sh:247-280 (Step 6) |
| B.10 — UPDATE db_user/pass (gap) | not in script |
| B.11 — Registry refresh | done implicitly inside the Java facade |