Multi-tenant runtime architecture

How tenant identity is propagated through the JVM and how per-tenant DB connections are obtained. Read this before writing any code that touches a DB, schedules a background task, or runs outside the request lifecycle.

This document covers the runtime model. The operator-side multi-tenancy runbook lives in doc/operations/multitenancy-setup.md, and the implementation history is in doc/plans/multitenancy.md and doc/plans/multitenancy-execution.md.

1. The model in one paragraph

TQPro runs one JVM process per node, serving N tenants concurrently. Every request must execute against exactly one tenant's database, picked at runtime. Rather than threading a tenantId parameter through every method signature, the framework stashes it in a ThreadLocal (RequestContext). Per-tenant Hibernate session helpers read the ThreadLocal on every getSession() call and route to the correct per-tenant SessionFactory — which is built lazily, cached per tenant, and backed by its own HikariCP pool. Two and only two code paths set the ThreadLocal: the request entry filter (automatic, request path) and TenantScope.run(...) (explicit, background/manual path). Any code that calls getSession() outside one of those two paths fails immediately with IllegalStateException.

2. The three classes you need to know

`RequestContext`: `tqcommon/.../util/RequestContext.java`

A plain immutable carrier of (userId, userName, userEmail, correlationId, tenantId) stored in a ThreadLocal<RequestContext>. Only three static methods: set(), current(), clear(). No business logic. The class is deliberately dumb — its job is just to be reachable from any code on the current thread.
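As a rough illustration of the carrier described above, the following minimal sketch holds an immutable context in a ThreadLocal behind the three static accessors. The class name RequestContextSketch and the helper tenantSeenOnFreshThread() are hypothetical stand-ins, not the TQPro source; only the field list and the set() / current() / clear() trio come from the description.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for RequestContext; not the TQPro source.
final class RequestContextSketch {
    private static final ThreadLocal<RequestContextSketch> CURRENT = new ThreadLocal<>();

    private final String userId;
    private final String userName;
    private final String userEmail;
    private final String correlationId;
    private final String tenantId;

    RequestContextSketch(String userId, String userName, String userEmail,
                         String correlationId, String tenantId) {
        this.userId = userId;
        this.userName = userName;
        this.userEmail = userEmail;
        this.correlationId = correlationId;
        this.tenantId = tenantId;
    }

    // The only three static entry points: no business logic.
    static void set(RequestContextSketch ctx) { CURRENT.set(ctx); }
    static RequestContextSketch current() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    String getUserId() { return userId; }
    String getTenantId() { return tenantId; }

    // Assumed helper for illustration: reports the tenant a brand-new thread sees.
    static String tenantSeenOnFreshThread() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread probe = new Thread(() -> {
            RequestContextSketch ctx = current();
            seen.set(ctx == null ? null : ctx.getTenantId());
        });
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

A plain ThreadLocal is not inherited by other threads, so the probe thread in this sketch sees no context at all. That is the property that makes background threads need an explicit scope.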

`TenantScope`: `tqcommon/.../tenant/TenantScope.java`

The only sanctioned API for setting RequestContext outside the JAX-RS filter chain. Wraps a block of code with a save/restore pair:

TenantScope.run("tenant-uuid", () -> {
    // RequestContext.current().getTenantId() == "tenant-uuid"
    Session s = NTSDBSession.getSession();      // picks the right tenant DB
    // ...
});

Three flavours:

run(tenantId, Runnable) — no return value.
call(tenantId, Callable<T>) — returns T, propagates checked exceptions.
callUnchecked(tenantId, Callable<T>) — same but wraps checked exceptions in RuntimeException so callers don't have to declare throws.

Important invariants:

Every flavour saves the previous context and restores it in a finally block (try/finally), so calls are safe to nest.
When nested inside an existing context, keeps the outer user identity and only swaps tenantId — so a fan-out inside a user request stays attributed to that user, not "system".
When outside any context, synthesizes userId="system", userName="system". Use this for scheduled work that has no human caller.
null tenantId → IllegalArgumentException. Fails loud.
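The invariants above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual TenantScope: ScopeContext and TenantScopeSketch are hypothetical names, and the context is trimmed to the fields the invariants mention.

```java
import java.util.concurrent.Callable;

// Hypothetical, trimmed-down context: only the fields the invariants mention.
final class ScopeContext {
    final String userId;
    final String userName;
    final String tenantId;

    ScopeContext(String userId, String userName, String tenantId) {
        this.userId = userId;
        this.userName = userName;
        this.tenantId = tenantId;
    }
}

// Hypothetical stand-in for TenantScope; not the TQPro source.
final class TenantScopeSketch {
    static final ThreadLocal<ScopeContext> CURRENT = new ThreadLocal<>();

    private TenantScopeSketch() {}

    static <T> T call(String tenantId, Callable<T> body) throws Exception {
        if (tenantId == null) {
            throw new IllegalArgumentException("tenantId must not be null");
        }
        ScopeContext previous = CURRENT.get();
        // Nested: keep the outer user, swap only the tenant. Top level: act as "system".
        ScopeContext scoped = (previous == null)
                ? new ScopeContext("system", "system", tenantId)
                : new ScopeContext(previous.userId, previous.userName, tenantId);
        CURRENT.set(scoped);
        try {
            return body.call();
        } finally {
            // Restore whatever was there before, including "nothing".
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    static <T> T callUnchecked(String tenantId, Callable<T> body) {
        try {
            return call(tenantId, body);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void run(String tenantId, Runnable body) {
        callUnchecked(tenantId, () -> {
            body.run();
            return null;
        });
    }
}
```

In this sketch, restoring in the finally block is what makes nesting safe: the inner scope puts the outer context back even when the body throws.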

`TenantAwareDBSession`: `tqcommon/.../tenant/TenantAwareDBSession.java`

Abstract base class extended by every per-schema session helper (NTSDBSession, RaynaDBSession, GoGlobalDBSession, TiqetsDBSession, AirportCacheDBSession, ...). Each subclass declares its JPA-annotated entity classes via annotatedClasses() and a short prefix via poolPrefix(). The base class holds a ConcurrentHashMap<tenantId, SessionFactory> and does everything else.

On every openSession() call:

Read RequestContext.current(). If null → IllegalStateException.
Pull tenantId from the context. If null → IllegalStateException.
factories.computeIfAbsent(tenantId, this::buildFactory) — first call for a tenant builds its SessionFactory; subsequent calls reuse it.
factory.openSession() and return.
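The lookup steps above can be sketched as follows. This is a minimal illustration under stated assumptions: TenantContext and TenantFactoryCache are hypothetical names, and the factory type is left generic so the example runs without Hibernate on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for RequestContext, reduced to the tenant id.
final class TenantContext {
    final String tenantId;

    TenantContext(String tenantId) {
        this.tenantId = tenantId;
    }
}

// Hypothetical sketch of the per-tenant factory cache; F stands in for SessionFactory.
final class TenantFactoryCache<F> {
    static final ThreadLocal<TenantContext> CURRENT = new ThreadLocal<>();

    private final Map<String, F> factories = new ConcurrentHashMap<>();
    private final Function<String, F> buildFactory;

    TenantFactoryCache(Function<String, F> buildFactory) {
        this.buildFactory = buildFactory;
    }

    F factoryForCurrentTenant() {
        TenantContext ctx = CURRENT.get();
        if (ctx == null) {
            throw new IllegalStateException(
                    "No context on this thread; wrap the call in TenantScope.run(...)");
        }
        if (ctx.tenantId == null) {
            throw new IllegalStateException("Context has no tenantId");
        }
        // First call for a tenant builds its factory; later calls reuse it.
        return factories.computeIfAbsent(ctx.tenantId, buildFactory);
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, so concurrent first requests for the same tenant still end up sharing a single factory.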

buildFactory(tenantId):

Looks up the tenant row from TenantRegistry.instance().requireById().
Decrypts db_pass via TenantConfig.decrypt().
Assembles jdbc:postgresql://${tenant.db.host}:${tenant.db.port}/${tenant.db_name} — host/port come from tourlinq.properties, DB name from the tenant row.
Spins up a Hibernate Configuration with the subclass's entity list and a dedicated HikariCP pool (minimumIdle=2, maximumPoolSize=5).
Pool name: <prefix>-<tenantCode>-<dbName>, visible in metrics.

Net result: N tenants × M plugin schemas = N × M pools, each ~2–5 connections. Postgres max_connections must be sized accordingly — see multitenancy-setup.md §4.
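To make the naming and sizing concrete, here is a small sketch of the JDBC URL, the pool name, and the connection arithmetic. PoolSizingSketch and its methods are hypothetical names. Because each node runs its own JVM with its own pools, the per-node figure presumably has to be multiplied by the node count when sizing max_connections.

```java
// Hypothetical helper, for illustration only.
final class PoolSizingSketch {
    private PoolSizingSketch() {}

    // jdbc:postgresql://<host>:<port>/<dbName>
    static String jdbcUrl(String host, int port, String dbName) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + dbName;
    }

    // <prefix>-<tenantCode>-<dbName>
    static String poolName(String prefix, String tenantCode, String dbName) {
        return prefix + "-" + tenantCode + "-" + dbName;
    }

    // N tenants x M schemas pools, each holding connectionsPerPool connections.
    static int tenantConnectionsPerNode(int tenants, int schemas, int connectionsPerPool) {
        return tenants * schemas * connectionsPerPool;
    }
}
```

With 10 tenants and 5 plugin schemas (example numbers only), a node holds 50 pools: about 100 connections at the idle floor of 2 per pool and at most 250 at the ceiling of 5 per pool. Since factories are built lazily, the real figure depends on which tenant and schema combinations have actually been used on that node.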

3. The two flows

Flow A: Request flow (automatic)

┌─────────────────────────────────────────────────────────────────────┐
│  HTTP request arrives at Jetty / Jersey                             │
│                                                                     │
│  AuthenticationFilter (tqapi/.../AuthenticationFilter.java:214):    │
│    - Validates JWT or dev-mode credentials                          │
│    - Extracts tenantId from the Keycloak realm / Host header        │
│    - RequestContext.set(new RequestContext(                         │
│            userId, userName, userEmail, correlationId, tenantId))   │
│                                                                     │
│      ┌─────────────────────────────────────────────────────────┐    │
│      │  Jersey resource → Facade → Service                     │    │
│      │     ...                                                 │    │
│      │     XxxDBSession.getSession()  ← reads ThreadLocal      │    │
│      │       → returns a Session bound to the tenant's pool    │    │
│      │     ...                                                 │    │
│      └─────────────────────────────────────────────────────────┘    │
│                                                                     │
│  CORSResponseFilter (tqapi/.../CORSResponseFilter.java:79):         │
│    - RequestContext.clear()                                         │
│      (Jetty worker threads are pooled — a leaked context would      │
│       attribute the next request to the wrong tenant)               │
│                                                                     │
│  HTTP response goes out                                             │
└─────────────────────────────────────────────────────────────────────┘

Request-path developers don't need to think about any of this. As long as they obtain a session through XxxDBSession.getSession() — never via a side-channel — the right tenant is selected automatically.
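The reason the response filter clears the context can be shown with a small experiment. In the sketch below, a single-thread executor stands in for one pooled Jetty worker, and a plain ThreadLocal of String stands in for RequestContext; LeakedContextDemo and its method are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: one pooled worker thread serving two consecutive "requests".
final class LeakedContextDemo {
    static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    private LeakedContextDemo() {}

    // Returns the tenant the second request finds before any filter has run for it.
    static String tenantSeenByNextRequest(boolean clearAfterFirst) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                CURRENT_TENANT.set("tenant-a");       // entry filter
                // ... resource, facade, service ...
                if (clearAfterFirst) {
                    CURRENT_TENANT.remove();          // response filter
                }
            };
            Callable<String> nextRequestProbe = CURRENT_TENANT::get;
            worker.submit(firstRequest).get();
            return worker.submit(nextRequestProbe).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }
}
```

Without the clear step the second task still sees "tenant-a" because it runs on the same thread. With the clear step it sees null.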

Flow B: Manual flow (background)

Used by:

Scheduled tasks (ScheduledExecutorService runners — Rayna's SDRefreshRunner, NTS's PackageRetirementRunner / OptionExpiryRunner, GoGlobal's GGRefreshRunner, Google Flights' AirportRefreshRunner, Tiqets' refresh task)
Hazelcast topic listeners (cache invalidations, registry refreshes)
Plugin initialization that touches the per-tenant DB (e.g. RaynaCacheManager.instance() triggering eager cache load)
Startup / smoke checks
Apache Commons Daemon (init / start) hooks before any request has been served

These run on threads that the JAX-RS filter chain never touches, so RequestContext is unset. They must establish it themselves:

┌─────────────────────────────────────────────────────────────────────┐
│  Background thread wakes up (executor tick, listener callback, ...) │
│                                                                     │
│  Collection<TenantInfo> tenants =                                   │
│      TenantRegistry.instance().listActive();                        │
│                                                                     │
│  if (tenants.isEmpty()) { log "skipping"; return; }                 │
│                                                                     │
│  for (TenantInfo t : tenants) {                                     │
│    try {                                                            │
│      TenantScope.run(t.getTenantId(), () -> {                       │
│          ┌──────────────────────────────────────────────────┐       │
│          │  XxxDBSession.getSession()  ← reads ThreadLocal  │       │
│          │    → returns a Session for tenant t              │       │
│          │  ... do per-tenant work ...                      │       │
│          └──────────────────────────────────────────────────┘       │
│      });                                                            │
│    } catch (Exception ex) {                                         │
│      logger.log(WARNING, "failed for tenant " + t.getTenantCode()); │
│      // continue with next tenant                                   │
│    }                                                                │
│  }                                                                  │
└─────────────────────────────────────────────────────────────────────┘

Three rules for background tasks:

Always check the registry first. Empty registry → log "no active tenants — skipping" and return cleanly. This is the documented greenfield state (see multitenancy-setup.md §8.0).
One tenant per TenantScope. Wrap each tenant's work in its own TenantScope.run(...). A failure for one tenant must not abort the others — log it and move on.
Distributed locks (Hazelcast) stay global. When a runner uses a Hazelcast lock to prevent duplicate execution across cluster nodes, acquire the lock once for the whole tick and fan out tenants inside the locked section. Per-tenant locks are an option but not required — keep the existing lock structure unless there's a reason to change.
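The three rules can be combined into one runner skeleton, sketched below with stand-ins so it runs on its own. A java.util.concurrent.locks.Lock plays the role of the Hazelcast lock, a list of tenant ids replaces TenantRegistry.instance().listActive(), and runScoped() is a hypothetical miniature of TenantScope.run(...). Skipping the tick when the lock is already held is an assumption of this sketch, not something the rules prescribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical runner skeleton; not the TQPro source.
final class FanOutRunnerSketch {
    private static final Logger LOG = Logger.getLogger(FanOutRunnerSketch.class.getName());
    static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    private final Lock clusterLock; // stand-in for the cluster-wide Hazelcast lock

    FanOutRunnerSketch(Lock clusterLock) {
        this.clusterLock = clusterLock;
    }

    // Runs one tick and returns the tenants whose work completed.
    List<String> tick(List<String> activeTenantIds, Consumer<String> perTenantWork) {
        List<String> completed = new ArrayList<>();
        if (activeTenantIds.isEmpty()) {              // rule 1: check the registry first
            LOG.info("no active tenants - skipping");
            return completed;
        }
        if (!clusterLock.tryLock()) {                 // rule 3: one lock for the whole tick
            return completed;
        }
        try {
            for (String tenantId : activeTenantIds) {
                try {                                 // rule 2: one scope per tenant
                    runScoped(tenantId, () -> perTenantWork.accept(tenantId));
                    completed.add(tenantId);
                } catch (RuntimeException ex) {
                    // Log with the exception attached, then continue with the next tenant.
                    LOG.log(Level.WARNING, "failed for tenant " + tenantId, ex);
                }
            }
        } finally {
            clusterLock.unlock();
        }
        return completed;
    }

    private static void runScoped(String tenantId, Runnable body) {
        String previous = CURRENT_TENANT.get();
        CURRENT_TENANT.set(tenantId);
        try {
            body.run();
        } finally {
            if (previous == null) {
                CURRENT_TENANT.remove();
            } else {
                CURRENT_TENANT.set(previous);
            }
        }
    }
}
```

Passing the caught exception to the logger keeps the stack trace, which helps when one tenant fails and the others succeed.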

4. The platform-DB carve-out

There is one other DB connection path that does not go through TenantAwareDBSession: the platform database (tqplatform).

tqplatform is a separate Postgres DB that holds:

The tenant table — registry of all tenants (DB credentials, KC realm, status, etc.)
The wa_phone_routing table — WhatsApp phone-number-to-tenant routing
The platform's own schema_migrations ledger

It is not part of any tenant's data. There is no per-tenant view of it — every node sees the same tqplatform DB. Tenants don't talk to it directly; only platform-management code does:

| Caller | File | Purpose |
| --- | --- | --- |
| `TenantRegistry` | `tqcommon/.../tenant/TenantRegistry.java:174` | Loads `tenant` rows into memory at startup and on refresh |
| `TenantProvisioningFacade` | `tqapp/.../entity/tenant/TenantProvisioningFacade.java:142, 157` | Inserts / updates `tenant` rows during onboarding |
| `PlatformAdminApi` | `tqapi/.../api/PlatformAdminApi.java:257, 283` | Admin endpoints that act on the registry directly |

These callers obtain a java.sql.Connection from PlatformDbConfig.instance().getDataSource(), a small, separate HikariCP pool (minimumIdle=1, maximumPoolSize=2) defined in tqcommon/.../tenant/PlatformDbConfig.java. Its connection details come from platform.db.url/user/pass in tourlinq.properties, not from any tenant row.

Do not call PlatformDbConfig.instance() from request-path code or from per-tenant business logic. It exists specifically for tenant management and is intentionally cordoned off from the tenant flow.

5. Enforcement: what stops you from getting it wrong

Three mechanisms apply, of which only the runtime hard fail is a real guarantee:

Compile-time: RequestContext.set() and RequestContext.clear() are normal public methods — nothing forbids you from calling them directly. This is a deliberate trade-off; the discipline is "use TenantScope, not raw set()," documented here and enforced by code review.
Runtime, hard fail: TenantAwareDBSession.openSession() throws IllegalStateException immediately when RequestContext.current() is null or has no tenant. The exception message names the missing class and tells you to wrap in TenantScope.run(...). This is why any background task that forgets the wrap blows up on first DB call instead of silently writing to the wrong DB.
Runtime, soft fail: scheduled runners that probe TenantRegistry first log "no active tenants — skipping" and return cleanly when the registry is empty (e.g., greenfield first start before any tenant is provisioned). This is not a safety check — it's a correctness shortcut so the JVM can boot before the first tenant exists.

6. Decision guide for developers

| Situation | What to do |
| --- | --- |
| Writing a Jersey resource, facade, or service called from a Jersey resource | Nothing special: just call `XxxDBSession.getSession()`. The filter set the context for you. |
| Writing a `Runnable` for `scheduledExecutor.scheduleAtFixedRate(...)` | Inside the runnable, iterate `TenantRegistry.instance().listActive()` and wrap each iteration in `TenantScope.run(...)`. Skip if registry empty. |
| Writing a Hazelcast topic listener | Listener fires on a Hazelcast thread, so wrap the body in `TenantScope.run(tenantId, ...)` using the tenantId from the message payload. |
| Plugin `initializePlugin()` that needs to load per-tenant data | Fan out over `TenantRegistry.instance().listActive()` exactly like a scheduled task. Empty registry → log "no active tenants — deferring" and return. |
| Singleton `Manager.instance()` whose constructor eagerly loads from DB | Wrap the first `instance()` call in `TenantScope.run(...)`. The constructor runs once on first call; the tenant whose scope is active at that moment determines what gets loaded. Currently "last-tenant-wins" for these singletons (`RaynaCacheManager`, `SupplierCache`, `StaticMapCache.MEALPLAN_CACHE`, `TiqetsCacheManager`); a shared-cache refactor is planned. |
| Reading or writing the `tqplatform.tenant` table itself | Use `PlatformDbConfig.instance().getDataSource()`. Do not route this through `TenantAwareDBSession`: `tqplatform` is not a tenant. |
| Anything else that obtains a DB connection | Don't. If you find yourself needing a third path, stop and discuss: this document is intentionally exhaustive. |
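The eager-singleton row is the least obvious entry in the table, so here is a small sketch of the hazard. EagerCacheSketch is a hypothetical class and a ThreadLocal of String stands in for RequestContext. It only illustrates that a constructor which runs once captures whichever tenant is in scope at that moment.

```java
// Hypothetical eager-loading singleton, for illustration only.
final class EagerCacheSketch {
    static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    private static EagerCacheSketch instance;

    // Records whose data the "cache" was loaded from.
    final String loadedForTenant;

    private EagerCacheSketch() {
        String tenantId = CURRENT_TENANT.get();
        if (tenantId == null) {
            throw new IllegalStateException(
                    "no tenant context; wrap instance() in TenantScope.run(...)");
        }
        this.loadedForTenant = tenantId; // the real managers would query the tenant DB here
    }

    static synchronized EagerCacheSketch instance() {
        if (instance == null) {
            instance = new EagerCacheSketch();
        }
        return instance;
    }
}
```

In this sketch a caller running under a different tenant later gets the same instance, still holding the data loaded for the tenant that was in scope at construction. That shared state is what the planned shared-cache refactor mentioned in the table is meant to address.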

7. "No third path": how this was verified

Both claims below were verified by exhaustive grep of the non-test, non-build source tree:

Claim 1: `RequestContext` is set in exactly two places

grep -rn "RequestContext\.set\b\|new RequestContext\b" \
    --include='*.java' --exclude-dir={build,test,.idea}

Returns only:

AuthenticationFilter.java:214 — the request entry point
TenantScope.java:30, 40 (the RequestContext.set(...) calls) and 65, 72 (where it constructs new RequestContext(...)), the manual/background helper

No other production code constructs or stores a RequestContext. Test fixtures (tqcommon/src/test/...) do, but they are excluded from the grep and never run in production.

Claim 2: Tenant DB sessions are obtained in exactly one place

grep -rn "DriverManager\.getConnection\|buildSessionFactory" \
    --include='*.java' --exclude-dir={build,test,.idea}

buildSessionFactory appears only inside TenantAwareDBSession.buildFactory(). DriverManager.getConnection does not appear in production source at all.

The platform-DB callers (PlatformDbConfig.instance().getDataSource().getConnection() in TenantRegistry, TenantProvisioningFacade, PlatformAdminApi) are the only DataSource.getConnection() call sites, and they all target tqplatform, not any tenant DB. See §4.

Re-running the audit

If a future change adds a new path, the same two greps will surface it. Run them as part of any review that touches DB or filter code. Any new RequestContext.set(...) call outside AuthenticationFilter.java and TenantScope.java, or any buildSessionFactory() call outside TenantAwareDBSession.java, is by definition a bug.
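For teams that prefer a check runnable from a test or build step, the two greps can be approximated in Java. The sketch below is an assumption-based illustration, not an existing TQPro tool: TenantPathAudit is a hypothetical class, and the allowlist is keyed by file name only, so a same-named file elsewhere in the tree would also be skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical audit helper mirroring the two grep commands in this section.
final class TenantPathAudit {
    private static final Pattern SUSPECT = Pattern.compile(
            "RequestContext\\.set\\b|new RequestContext\\b"
            + "|DriverManager\\.getConnection|buildSessionFactory");
    private static final Set<String> SKIPPED_DIRS =
            new HashSet<>(Arrays.asList("build", "test", ".idea"));
    private static final Set<String> ALLOWED_FILES = new HashSet<>(Arrays.asList(
            "AuthenticationFilter.java", "TenantScope.java", "TenantAwareDBSession.java"));

    private TenantPathAudit() {}

    // Returns "relative/path/File.java:line" for every suspect line outside the allowlist.
    static List<String> findViolations(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            List<Path> candidates = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .filter(p -> !underSkippedDir(root.relativize(p)))
                    .filter(p -> !ALLOWED_FILES.contains(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
            List<String> hits = new ArrayList<>();
            for (Path file : candidates) {
                List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
                for (int i = 0; i < lines.size(); i++) {
                    if (SUSPECT.matcher(lines.get(i)).find()) {
                        hits.add(root.relativize(file) + ":" + (i + 1));
                    }
                }
            }
            return hits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean underSkippedDir(Path relative) {
        for (Path part : relative) {
            if (SKIPPED_DIRS.contains(part.toString())) {
                return true;
            }
        }
        return false;
    }
}
```

An empty result corresponds to a clean grep. Each returned entry names a file and line that would need the same review as a new grep hit.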

8. References

Operator runbook: doc/operations/multitenancy-setup.md
Per-tenant onboarding: doc/operations/tenant-provisioning.md
TQ-115 implementation plan: doc/plans/multitenancy.md
Phase-by-phase execution: doc/plans/multitenancy-execution.md
Code: tqcommon/src/main/java/com/perun/tlinq/tenant/{RequestContext,TenantScope,TenantAwareDBSession,TenantRegistry,PlatformDbConfig}.java
Filter wiring: tqapi/src/main/java/com/perun/tlinq/{AuthenticationFilter,CORSResponseFilter}.java