Multi-tenant runtime architecture
How tenant identity is propagated through the JVM and how per-tenant DB connections are obtained. Read this before writing any code that touches a DB, schedules a background task, or runs outside the request lifecycle.
This document covers the runtime model — the operator-side multi-tenancy
runbook lives in doc/operations/multitenancy-setup.md,
and the implementation history is in doc/plans/multitenancy.md
and doc/plans/multitenancy-execution.md.
1. The model in one paragraph
TQPro runs one JVM process per node, serving N tenants concurrently.
Every request must execute against exactly one tenant's database, picked at
runtime. Rather than threading a tenantId parameter through every method
signature, the framework stashes it in a ThreadLocal (RequestContext).
Per-tenant Hibernate session helpers read the ThreadLocal on every
getSession() call and route to the correct per-tenant SessionFactory —
which is built lazily, cached per tenant, and backed by its own HikariCP
pool. Two and only two code paths set the ThreadLocal: the request entry
filter (automatic, request path) and TenantScope.run(...) (explicit,
background/manual path). Any code that calls getSession() outside one of
those two paths fails immediately with IllegalStateException.
2. The three classes you need to know
RequestContext (tqcommon/.../util/RequestContext.java)
A plain immutable carrier of (userId, userName, userEmail, correlationId,
tenantId) stored in a ThreadLocal<RequestContext>. Only three static
methods: set(), current(), clear(). No business logic. The class is
deliberately dumb — its job is just to be reachable from any code on the
current thread.
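As a rough illustration of the shape described above, a ThreadLocal carrier with only set(), current(), and clear() might look like the sketch below. The class name RequestContextSketch and every getter other than getTenantId() are assumptions made for illustration; the real class may differ.

```java
// Hypothetical sketch of an immutable, ThreadLocal-backed context carrier.
// Field names follow the tuple listed above; everything else is assumed.
final class RequestContextSketch {
    private static final ThreadLocal<RequestContextSketch> CURRENT = new ThreadLocal<>();

    private final String userId;
    private final String userName;
    private final String userEmail;
    private final String correlationId;
    private final String tenantId;

    RequestContextSketch(String userId, String userName, String userEmail,
                         String correlationId, String tenantId) {
        this.userId = userId;
        this.userName = userName;
        this.userEmail = userEmail;
        this.correlationId = correlationId;
        this.tenantId = tenantId;
    }

    String getUserId() { return userId; }
    String getUserName() { return userName; }
    String getUserEmail() { return userEmail; }
    String getCorrelationId() { return correlationId; }
    String getTenantId() { return tenantId; }

    // The only three static entry points: bind, read, unbind.
    static void set(RequestContextSketch ctx) { CURRENT.set(ctx); }
    static RequestContextSketch current() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}
```

Because the value lives in a ThreadLocal, current() returns null on any thread where set() has not been called, which is exactly the situation background threads start in.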
TenantScope (tqcommon/.../tenant/TenantScope.java)
The only sanctioned API for setting RequestContext outside the JAX-RS
filter chain. Wraps a block of code with a save/restore pair:
    TenantScope.run("tenant-uuid", () -> {
        // RequestContext.current().getTenantId() == "tenant-uuid"
        Session s = NTSDBSession.getSession(); // picks the right tenant DB
        // ...
    });
Three flavours:
- run(tenantId, Runnable): no return value.
- call(tenantId, Callable<T>): returns T, propagates checked exceptions.
- callUnchecked(tenantId, Callable<T>): same, but wraps checked exceptions in RuntimeException so callers don't have to declare throws.
Important invariants:
- Save and restore the previous context (try/finally). Safe to nest.
- When nested inside an existing context, keeps the outer user identity and only swaps tenantId, so a fan-out inside a user request stays attributed to that user, not "system".
- When outside any context, synthesizes userId="system", userName="system". Use this for scheduled work that has no human caller.
- null tenantId → IllegalArgumentException. Fails loud.
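The invariants above can be sketched as a small save/restore wrapper. This is not the real TenantScope: ScopeCtx is a hypothetical two-field stand-in for RequestContext so the example compiles on its own, and the method bodies are one plausible way to satisfy the listed behaviour.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for RequestContext (user + tenant only).
final class ScopeCtx {
    private static final ThreadLocal<ScopeCtx> CURRENT = new ThreadLocal<>();
    final String userId;
    final String tenantId;

    ScopeCtx(String userId, String tenantId) {
        this.userId = userId;
        this.tenantId = tenantId;
    }

    static void set(ScopeCtx c) { CURRENT.set(c); }
    static ScopeCtx current() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Sketch of the three flavours; names mirror the list above.
final class TenantScopeSketch {
    private TenantScopeSketch() {}

    static <T> T call(String tenantId, Callable<T> body) throws Exception {
        if (tenantId == null) {
            throw new IllegalArgumentException("tenantId must not be null");
        }
        ScopeCtx previous = ScopeCtx.current();
        // Nested call: keep the outer user. Top-level call: attribute to "system".
        String userId = (previous != null) ? previous.userId : "system";
        ScopeCtx.set(new ScopeCtx(userId, tenantId));
        try {
            return body.call();
        } finally {
            // Restore whatever was bound before, even if the body threw.
            if (previous != null) {
                ScopeCtx.set(previous);
            } else {
                ScopeCtx.clear();
            }
        }
    }

    static <T> T callUnchecked(String tenantId, Callable<T> body) {
        try {
            return call(tenantId, body);
        } catch (RuntimeException e) {
            throw e; // already unchecked, pass through untouched
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void run(String tenantId, Runnable body) {
        callUnchecked(tenantId, () -> {
            body.run();
            return null;
        });
    }
}
```

Restoring the previous context in finally, rather than unconditionally clearing, is what makes nesting safe: an inner scope for another tenant hands the thread back to the outer scope unchanged.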
TenantAwareDBSession (tqcommon/.../tenant/TenantAwareDBSession.java)
Abstract base class extended by every per-schema session helper
(NTSDBSession, RaynaDBSession, GoGlobalDBSession, TiqetsDBSession,
AirportCacheDBSession, ...). Each subclass declares its JPA-annotated
entity classes via annotatedClasses() and a short prefix via
poolPrefix(). The base class holds a ConcurrentHashMap<tenantId,
SessionFactory> and does everything else.
On every openSession() call:
- Read RequestContext.current(). If null → IllegalStateException.
- Pull tenantId from the context. If null → IllegalStateException.
- factories.computeIfAbsent(tenantId, this::buildFactory): the first call for a tenant builds its SessionFactory; subsequent calls reuse it.
- factory.openSession() and return.
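The routing steps above reduce to a guarded computeIfAbsent lookup. The sketch below is deliberately generic so it runs without Hibernate: the type parameter F stands in for SessionFactory, the Supplier stands in for reading tenantId from RequestContext, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of lazy, per-tenant factory caching.
final class PerTenantFactoryCache<F> {
    private final Map<String, F> factories = new ConcurrentHashMap<>();
    private final Supplier<String> currentTenantId; // stand-in for the ThreadLocal read
    private final Function<String, F> buildFactory; // stand-in for buildFactory(tenantId)

    PerTenantFactoryCache(Supplier<String> currentTenantId, Function<String, F> buildFactory) {
        this.currentTenantId = currentTenantId;
        this.buildFactory = buildFactory;
    }

    /** Returns the factory for the tenant bound to the current thread. */
    F currentFactory() {
        String tenantId = currentTenantId.get();
        if (tenantId == null) {
            // Fail immediately instead of guessing a tenant.
            throw new IllegalStateException(
                "No tenant in context; wrap the call in TenantScope.run(...)");
        }
        // Built at most once per tenant, then reused on every later call.
        return factories.computeIfAbsent(tenantId, buildFactory);
    }
}
```

ConcurrentHashMap.computeIfAbsent runs the mapping function at most once per key even under concurrent first calls, so two threads that hit a new tenant at the same moment should not end up with two pools.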
buildFactory(tenantId):
- Looks up the tenant row from TenantRegistry.instance().requireById().
- Decrypts db_pass via TenantConfig.decrypt().
- Assembles jdbc:postgresql://${tenant.db.host}:${tenant.db.port}/${tenant.db_name}; host/port come from tourlinq.properties, DB name from the tenant row.
- Spins up a Hibernate Configuration with the subclass's entity list and a dedicated HikariCP pool (minIdle=2, maxPoolSize=5).
- Pool name: <prefix>-<tenantCode>-<dbName>, visible in metrics.
Net result: N tenants × M plugin schemas = N × M pools, each ~2–5
connections. Postgres max_connections must be sized accordingly — see
multitenancy-setup.md §4.
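To make the sizing concrete, the arithmetic can be written down as a small helper. This is a back-of-the-envelope sketch, not part of the codebase: the class name is hypothetical, the tenant pool figures come from this section, the platform pool figures come from §4, and it assumes the platform DB sits on the same Postgres server as the tenant DBs.

```java
// Hypothetical helper for estimating Postgres connection demand.
final class ConnectionBudget {
    static final int TENANT_MIN_IDLE = 2;   // per-tenant pool, from this section
    static final int TENANT_MAX_POOL = 5;
    static final int PLATFORM_MIN_IDLE = 1; // platform pool, from section 4
    static final int PLATFORM_MAX_POOL = 2;

    private ConnectionBudget() {}

    /** Upper bound per node once every tenant x schema pool is full. */
    static int worstCasePerNode(int tenants, int schemas) {
        return tenants * schemas * TENANT_MAX_POOL + PLATFORM_MAX_POOL;
    }

    /** Idle connections per node once every pool has been built. */
    static int idleFloorPerNode(int tenants, int schemas) {
        return tenants * schemas * TENANT_MIN_IDLE + PLATFORM_MIN_IDLE;
    }

    /** Upper bound across the whole cluster. */
    static int worstCaseCluster(int nodes, int tenants, int schemas) {
        return nodes * worstCasePerNode(tenants, schemas);
    }

    public static void main(String[] args) {
        System.out.println("per node, worst case: " + worstCasePerNode(10, 5));
        System.out.println("per node, idle floor: " + idleFloorPerNode(10, 5));
        System.out.println("3 nodes, worst case:  " + worstCaseCluster(3, 10, 5));
    }
}
```

For example, 10 tenants and 5 plugin schemas give 50 pools per node: up to 252 connections at full load and 101 idle. Because factories are built lazily, a node only reaches these figures after it has served every tenant on every schema.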
3. The two flows
Flow A: Request flow (automatic)
┌─────────────────────────────────────────────────────────────────────┐
│ HTTP request arrives at Jetty / Jersey │
│ │
│ AuthenticationFilter (tqapi/.../AuthenticationFilter.java:214): │
│ - Validates JWT or dev-mode credentials │
│ - Extracts tenantId from the Keycloak realm / Host header │
│ - RequestContext.set(new RequestContext( │
│ userId, userName, userEmail, correlationId, tenantId)) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Jersey resource → Facade → Service │ │
│ │ ... │ │
│ │ XxxDBSession.getSession() ← reads ThreadLocal │ │
│ │ → returns a Session bound to the tenant's pool │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ CORSResponseFilter (tqapi/.../CORSResponseFilter.java:79): │
│ - RequestContext.clear() │
│ (Jetty worker threads are pooled — a leaked context would │
│ attribute the next request to the wrong tenant) │
│ │
│ HTTP response goes out │
└─────────────────────────────────────────────────────────────────────┘
Request-path developers don't need to think about any of this. As long as
they obtain a session through XxxDBSession.getSession() — never via a
side-channel — the right tenant is selected automatically.
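The clear() step in the diagram matters because worker threads are reused. The following self-contained demo shows the failure mode with plain JDK classes: a single-thread executor plays the pooled Jetty worker and a ThreadLocal of String plays RequestContext. All names here are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: what a second request sees on a reused worker thread.
final class LeakedContextDemo {
    // Stand-in for the RequestContext ThreadLocal; holds only a tenant id.
    static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    private LeakedContextDemo() {}

    static String tenantSeenBySecondRequest(boolean clearAfterFirst) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // First request: the entry filter binds the tenant.
            Runnable firstRequest = () -> {
                TENANT.set("tenant-a");
                // ... request handling would happen here ...
                if (clearAfterFirst) {
                    TENANT.remove(); // what the response filter does
                }
            };
            worker.submit(firstRequest).get();

            // Second request lands on the same thread and binds nothing itself.
            Callable<String> secondRequest = TENANT::get;
            return worker.submit(secondRequest).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without clear(): " + tenantSeenBySecondRequest(false));
        System.out.println("with clear():    " + tenantSeenBySecondRequest(true));
    }
}
```

Without the cleanup, the second task observes tenant-a even though it never set anything; with it, the thread comes back clean and a missing context would be caught by the hard-fail check instead.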
Flow B: Manual flow (background)
Used by:
- Scheduled tasks (ScheduledExecutorService runners: Rayna's SDRefreshRunner, NTS's PackageRetirementRunner / OptionExpiryRunner, GoGlobal's GGRefreshRunner, Google Flights' AirportRefreshRunner, Tiqets' refresh task)
- Hazelcast topic listeners (cache invalidations, registry refreshes)
- Plugin initialization that touches the per-tenant DB (e.g. RaynaCacheManager.instance() triggering eager cache load)
- Startup / smoke checks
- Apache Commons Daemon (init/start) hooks before any request has been served
These run on threads that the JAX-RS filter chain never touches, so
RequestContext is unset. They must establish it themselves:
┌─────────────────────────────────────────────────────────────────────┐
│ Background thread wakes up (executor tick, listener callback, ...) │
│ │
│ Collection<TenantInfo> tenants = │
│ TenantRegistry.instance().listActive(); │
│ │
│ if (tenants.isEmpty()) { log "skipping"; return; } │
│ │
│ for (TenantInfo t : tenants) { │
│ try { │
│ TenantScope.run(t.getTenantId(), () -> { │
│ ┌──────────────────────────────────────────────────┐ │
│ │ XxxDBSession.getSession() ← reads ThreadLocal │ │
│ │ → returns a Session for tenant t │ │
│ │ ... do per-tenant work ... │ │
│ └──────────────────────────────────────────────────┘ │
│ }); │
│ } catch (Exception ex) { │
│ logger.log(WARNING, "failed for tenant " + t.getTenantCode()); │
│ // continue with next tenant │
│ } │
│ } │
└─────────────────────────────────────────────────────────────────────┘
Three rules for background tasks:
- Always check the registry first. Empty registry → log "no active tenants — skipping" and return cleanly. This is the documented greenfield state (see multitenancy-setup.md §8.0).
- One tenant per TenantScope. Wrap each tenant's work in its own TenantScope.run(...). A failure for one tenant must not abort the others: log it and move on.
- Distributed locks (Hazelcast) stay global. When a runner uses a Hazelcast lock to prevent duplicate execution across cluster nodes, acquire the lock once for the whole tick and fan out tenants inside the locked section. Per-tenant locks are an option but not required; keep the existing lock structure unless there's a reason to change.
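Put together, the three rules suggest a tick shaped roughly like the sketch below. It uses only JDK types so it can run standalone: a java.util.concurrent Lock stands in for the Hazelcast lock, a Supplier of tenant ids stands in for TenantRegistry.instance().listActive(), and a Consumer stands in for the TenantScope.run(...) body. The class name and the choice of tryLock() over a blocking lock() are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of one background tick that fans out over tenants.
final class FanOutRunnerSketch {
    private final Lock tickLock;                          // stand-in for the cluster-wide lock
    private final Supplier<List<String>> activeTenantIds; // stand-in for the tenant registry
    private final Consumer<String> perTenantWork;         // stand-in for the scoped work

    FanOutRunnerSketch(Lock tickLock,
                       Supplier<List<String>> activeTenantIds,
                       Consumer<String> perTenantWork) {
        this.tickLock = tickLock;
        this.activeTenantIds = activeTenantIds;
        this.perTenantWork = perTenantWork;
    }

    /** Runs one tick and returns the ids of tenants whose work failed. */
    List<String> tick() {
        List<String> failed = new ArrayList<>();
        List<String> tenants = activeTenantIds.get();
        if (tenants.isEmpty()) {
            return failed; // rule 1: empty registry, log and return cleanly
        }
        if (!tickLock.tryLock()) {
            return failed; // another holder is already running this tick
        }
        try {
            // Rule 3: one lock for the whole tick, fan out inside it.
            for (String tenantId : tenants) {
                try {
                    perTenantWork.accept(tenantId); // rule 2: one scope per tenant
                } catch (RuntimeException ex) {
                    failed.add(tenantId); // log and continue with the next tenant
                }
            }
        } finally {
            tickLock.unlock();
        }
        return failed;
    }
}
```

Returning the failed ids is only a convenience for the example; a real runner would more likely log each failure as shown in the diagram above.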
4. The platform-DB carve-out
There is one other DB connection path that does not go through
TenantAwareDBSession: the platform database (tqplatform).
tqplatform is a separate Postgres DB that holds:
- The tenant table: registry of all tenants (DB credentials, KC realm, status, etc.)
- The wa_phone_routing table: WhatsApp phone-number-to-tenant routing
- The platform's own schema_migrations ledger
It is not part of any tenant's data. There is no per-tenant view of
it — every node sees the same tqplatform DB. Tenants don't talk to it
directly; only platform-management code does:
| Caller | File | Purpose |
|---|---|---|
| TenantRegistry | tqcommon/.../tenant/TenantRegistry.java:174 | Loads tenant rows into memory at startup and on refresh |
| TenantProvisioningFacade | tqapp/.../entity/tenant/TenantProvisioningFacade.java:142, 157 | Inserts / updates tenant rows during onboarding |
| PlatformAdminApi | tqapi/.../api/PlatformAdminApi.java:257, 283 | Admin endpoints that act on the registry directly |
These callers obtain a java.sql.Connection from
PlatformDbConfig.instance().getDataSource() — a small, separate HikariCP
pool (minIdle=1, maxPoolSize=2) defined in
tqcommon/.../tenant/PlatformDbConfig.java. Its connection details come
from platform.db.url/user/pass in tourlinq.properties, not from any
tenant row.
Do not call PlatformDbConfig.instance() from request-path code or
from per-tenant business logic. It exists specifically for tenant
management and is intentionally cordoned off from the tenant flow.
5. Enforcement: what stops you from getting it wrong
Three safety nets stack:
- Compile-time: RequestContext.set() and RequestContext.clear() are normal public methods, so nothing forbids you from calling them directly. This is a deliberate trade-off; the discipline is "use TenantScope, not raw set()," documented here and enforced by code review.
- Runtime, hard fail: TenantAwareDBSession.openSession() throws IllegalStateException immediately when RequestContext.current() is null or has no tenant. The exception message names the missing class and tells you to wrap in TenantScope.run(...). This is why any background task that forgets the wrap blows up on first DB call instead of silently writing to the wrong DB.
- Runtime, soft fail: scheduled runners that probe TenantRegistry first log "no active tenants — skipping" and return cleanly when the registry is empty (e.g., greenfield first start before any tenant is provisioned). This is not a safety check; it's a correctness shortcut so the JVM can boot before the first tenant exists.
6. Decision guide for developers
| Situation | What to do |
|---|---|
| Writing a Jersey resource, facade, or service called from a Jersey resource | Nothing special: just call XxxDBSession.getSession(). The filter set the context for you. |
| Writing a Runnable for scheduledExecutor.scheduleAtFixedRate(...) | Inside the runnable, iterate TenantRegistry.instance().listActive() and wrap each iteration in TenantScope.run(...). Skip if registry empty. |
| Writing a Hazelcast topic listener | The listener fires on a Hazelcast thread, so wrap the body in TenantScope.run(tenantId, ...) using the tenantId from the message payload. |
| Plugin initializePlugin() that needs to load per-tenant data | Fan out over TenantRegistry.instance().listActive() exactly like a scheduled task. Empty registry → log "no active tenants — deferring" and return. |
| Singleton Manager.instance() whose constructor eagerly loads from DB | Wrap the first instance() call in TenantScope.run(...). The constructor runs once on first call; the tenant whose scope is active at that moment determines what gets loaded. Currently "last-tenant-wins" for these singletons (RaynaCacheManager, SupplierCache, StaticMapCache.MEALPLAN_CACHE, TiqetsCacheManager); a shared-cache refactor is planned. |
| Reading or writing the tqplatform.tenant table itself | Use PlatformDbConfig.instance().getDataSource(). Do not route this through TenantAwareDBSession, because tqplatform is not a tenant. |
| Anything else that obtains a DB connection | Don't. If you find yourself needing a third path, stop and discuss; this document is intentionally exhaustive. |
7. "No third path": how this was verified
Both claims below were verified by exhaustive grep of the non-test, non-build source tree:
Claim 1: RequestContext is set in exactly two places
    grep -rn "RequestContext\.set\b\|new RequestContext\b" \
        --include='*.java' --exclude-dir={build,test,.idea}
Returns only:
- AuthenticationFilter.java:214 (the request entry point)
- TenantScope.java:30, 40 (the manual/background helper; TenantScope constructs new RequestContext(...) at lines 65 and 72)
No other code constructs or stores a RequestContext. Test fixtures
(tqcommon/src/test/...) do, but they don't run in production.
Claim 2: Tenant DB sessions are obtained in exactly one place
    grep -rn "DriverManager\.getConnection\|buildSessionFactory" \
        --include='*.java' --exclude-dir={build,test,.idea}
buildSessionFactory appears only inside TenantAwareDBSession.buildFactory().
DriverManager.getConnection does not appear in production source at all.
The platform-DB callers (PlatformDbConfig.instance().getDataSource().getConnection()
in TenantRegistry, TenantProvisioningFacade, PlatformAdminApi) are
the only DataSource.getConnection() sites, and they all target
tqplatform, not any tenant DB. See §4.
Re-running the audit
If a future change adds a new path, the same two greps will surface it
immediately. Run them as part of any review that touches DB or filter
code. Any new RequestContext.set(...) or buildSessionFactory() call
outside the files the two claims above name for those calls
(AuthenticationFilter, TenantScope, TenantAwareDBSession) is, by definition, a bug.
8. References
- Operator runbook: doc/operations/multitenancy-setup.md
- Per-tenant onboarding: doc/operations/tenant-provisioning.md
- TQ-115 implementation plan: doc/plans/multitenancy.md
- Phase-by-phase execution: doc/plans/multitenancy-execution.md
- Code: tqcommon/src/main/java/com/perun/tlinq/tenant/{RequestContext,TenantScope,TenantAwareDBSession,TenantRegistry,PlatformDbConfig}.java
- Filter wiring: tqapi/src/main/java/com/perun/tlinq/{AuthenticationFilter,CORSResponseFilter}.java