Multi-tenant runtime architecture

How tenant identity is propagated through the JVM and how per-tenant DB connections are obtained. Read this before writing any code that touches a DB, schedules a background task, or runs outside the request lifecycle.

This document covers the runtime model — the operator-side multi-tenancy runbook lives in doc/operations/multitenancy-setup.md, and the implementation history is in doc/plans/multitenancy.md and multitenancy-execution.md.


1. The model in one paragraph

TQPro runs one JVM process per node, serving N tenants concurrently. Every request must execute against exactly one tenant's database, picked at runtime. Rather than threading a tenantId parameter through every method signature, the framework stashes it in a ThreadLocal (RequestContext). Per-tenant Hibernate session helpers read the ThreadLocal on every getSession() call and route to the correct per-tenant SessionFactory — which is built lazily, cached per tenant, and backed by its own HikariCP pool. Two and only two code paths set the ThreadLocal: the request entry filter (automatic, request path) and TenantScope.run(...) (explicit, background/manual path). Any code that calls getSession() outside one of those two paths fails immediately with IllegalStateException.


2. The three classes you need to know

RequestContext (tqcommon/.../util/RequestContext.java)

A plain immutable carrier of (userId, userName, userEmail, correlationId, tenantId) stored in a ThreadLocal<RequestContext>. Only three static methods: set(), current(), clear(). No business logic. The class is deliberately dumb — its job is just to be reachable from any code on the current thread.
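
As an illustration of what such a carrier can look like, here is a minimal sketch. It is not the actual class: the name RequestContextSketch, the package-private visibility, and the getter names other than getTenantId() are assumptions; only the five fields and the three static methods come from the description above.

```java
// Hypothetical sketch of an immutable, ThreadLocal-backed context carrier.
final class RequestContextSketch {
    // One slot per thread; other threads never see this thread's value.
    private static final ThreadLocal<RequestContextSketch> CURRENT = new ThreadLocal<>();

    private final String userId;
    private final String userName;
    private final String userEmail;
    private final String correlationId;
    private final String tenantId;

    RequestContextSketch(String userId, String userName, String userEmail,
                         String correlationId, String tenantId) {
        this.userId = userId;
        this.userName = userName;
        this.userEmail = userEmail;
        this.correlationId = correlationId;
        this.tenantId = tenantId;
    }

    String getUserId() { return userId; }
    String getUserName() { return userName; }
    String getUserEmail() { return userEmail; }
    String getCorrelationId() { return correlationId; }
    String getTenantId() { return tenantId; }

    static void set(RequestContextSketch ctx) { CURRENT.set(ctx); }
    static RequestContextSketch current() { return CURRENT.get(); }
    // remove() rather than set(null), so pooled threads keep no stale entry.
    static void clear() { CURRENT.remove(); }
}

class RequestContextSketchDemo {
    public static void main(String[] args) throws InterruptedException {
        RequestContextSketch.set(new RequestContextSketch(
                "u1", "Ada", "ada@example.com", "c-1", "tenant-a"));
        System.out.println(RequestContextSketch.current().getTenantId()); // tenant-a

        // A plain ThreadLocal is not inherited: a new thread starts with no context.
        Thread other = new Thread(() -> System.out.println(RequestContextSketch.current())); // null
        other.start();
        other.join();

        RequestContextSketch.clear();
        System.out.println(RequestContextSketch.current()); // null
    }
}
```

The second print is the reason background threads need TenantScope: a context set on one thread is simply absent on every other thread.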

TenantScope (tqcommon/.../tenant/TenantScope.java)

The only sanctioned API for setting RequestContext outside the JAX-RS filter chain. Wraps a block of code with a save/restore pair:

TenantScope.run("tenant-uuid", () -> {
    // RequestContext.current().getTenantId() == "tenant-uuid"
    Session s = NTSDBSession.getSession();      // picks the right tenant DB
    // ...
});

Three flavours:

  • run(tenantId, Runnable) — no return value.
  • call(tenantId, Callable<T>) — returns T, propagates checked exceptions.
  • callUnchecked(tenantId, Callable<T>) — same but wraps checked exceptions in RuntimeException so callers don't have to declare throws.

Important invariants:

  • Save and restore the previous context (try/finally). Safe to nest.
  • When nested inside an existing context, keeps the outer user identity and only swaps tenantId — so a fan-out inside a user request stays attributed to that user, not "system".
  • When outside any context, synthesizes userId="system", userName="system". Use this for scheduled work that has no human caller.
  • null tenantId → IllegalArgumentException. Fails loud.
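
The invariants above can be sketched as follows. This is a simplified model under stated assumptions, not the real TenantScope: Ctx is a two-field stand-in for RequestContext, and the class name TenantScopeSketch is hypothetical. It shows the save/restore pair, the nested-versus-outside identity rule, and the null check.

```java
import java.util.concurrent.Callable;

// Stand-in for RequestContext, reduced to the two fields this sketch needs.
final class Ctx {
    private static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();
    final String userId;
    final String tenantId;
    Ctx(String userId, String tenantId) { this.userId = userId; this.tenantId = tenantId; }
    static void set(Ctx c) { CURRENT.set(c); }
    static Ctx current() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

final class TenantScopeSketch {
    private TenantScopeSketch() {}

    static <T> T call(String tenantId, Callable<T> body) throws Exception {
        if (tenantId == null) {
            throw new IllegalArgumentException("tenantId must not be null");
        }
        Ctx previous = Ctx.current();
        // Nested: keep the outer user. Outside any context: synthesize "system".
        String userId = (previous != null) ? previous.userId : "system";
        Ctx.set(new Ctx(userId, tenantId));
        try {
            return body.call();
        } finally {
            // Restore exactly what was there before, even if body threw.
            if (previous != null) Ctx.set(previous); else Ctx.clear();
        }
    }

    static <T> T callUnchecked(String tenantId, Callable<T> body) {
        try {
            return call(tenantId, body);
        } catch (RuntimeException e) {
            throw e; // already unchecked, pass through unchanged
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void run(String tenantId, Runnable body) {
        callUnchecked(tenantId, () -> { body.run(); return null; });
    }
}

class TenantScopeSketchDemo {
    public static void main(String[] args) {
        TenantScopeSketch.run("tenant-a", () -> {
            System.out.println(Ctx.current().userId + " @ " + Ctx.current().tenantId); // system @ tenant-a
            TenantScopeSketch.run("tenant-b", () ->
                System.out.println(Ctx.current().userId + " @ " + Ctx.current().tenantId)); // system @ tenant-b
            System.out.println(Ctx.current().tenantId); // tenant-a
        });
        System.out.println(Ctx.current()); // null
    }
}
```

Restoring in a finally block is what makes nesting safe: the inner scope cannot leave its tenant behind for the outer scope, whether it returns normally or throws.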

TenantAwareDBSession (tqcommon/.../tenant/TenantAwareDBSession.java)

Abstract base class extended by every per-schema session helper (NTSDBSession, RaynaDBSession, GoGlobalDBSession, TiqetsDBSession, AirportCacheDBSession, ...). Each subclass declares its JPA-annotated entity classes via annotatedClasses() and a short prefix via poolPrefix(). The base class holds a ConcurrentHashMap<tenantId, SessionFactory> and does everything else.

On every openSession() call:

  1. Read RequestContext.current(). If null → IllegalStateException.
  2. Pull tenantId from the context. If null → IllegalStateException.
  3. factories.computeIfAbsent(tenantId, this::buildFactory) — first call for a tenant builds its SessionFactory; subsequent calls reuse it.
  4. factory.openSession() and return.
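
The four steps above can be modelled without Hibernate. The sketch below is an assumption-laden stand-in: FakeFactory replaces SessionFactory, a bare ThreadLocal<String> replaces RequestContext (so steps 1 and 2 collapse into one null check), and the builds counter exists only to make the caching visible.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TenantFactoryCacheSketch {
    // Stand-in for the tenantId that the real code reads from RequestContext.
    static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    // Stand-in for a Hibernate SessionFactory bound to one tenant's pool.
    static final class FakeFactory {
        final String tenantId;
        FakeFactory(String tenantId) { this.tenantId = tenantId; }
        String openSession() { return "session@" + tenantId; }
    }

    private final ConcurrentMap<String, FakeFactory> factories = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger(); // counts buildFactory calls

    String openSession() {
        String tenantId = CURRENT_TENANT.get();
        if (tenantId == null) {
            // Hard fail: never guess a tenant.
            throw new IllegalStateException(
                    "No tenant on this thread; wrap the call in TenantScope.run(...)");
        }
        // Built at most once per tenant, then reused by every later call.
        FakeFactory factory = factories.computeIfAbsent(tenantId, this::buildFactory);
        return factory.openSession();
    }

    private FakeFactory buildFactory(String tenantId) {
        builds.incrementAndGet();
        return new FakeFactory(tenantId);
    }
}

class TenantFactoryCacheSketchDemo {
    public static void main(String[] args) {
        TenantFactoryCacheSketch cache = new TenantFactoryCacheSketch();
        TenantFactoryCacheSketch.CURRENT_TENANT.set("tenant-a");
        cache.openSession();
        cache.openSession();
        System.out.println(cache.builds.get()); // 1
        TenantFactoryCacheSketch.CURRENT_TENANT.remove();
        try {
            cache.openSession();
        } catch (IllegalStateException e) {
            System.out.println("rejected"); // rejected
        }
    }
}
```

ConcurrentHashMap.computeIfAbsent runs the mapping function at most once per key, so two threads racing on a tenant's first call still end up sharing one factory.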

buildFactory(tenantId):

  • Looks up the tenant row from TenantRegistry.instance().requireById().
  • Decrypts db_pass via TenantConfig.decrypt().
  • Assembles jdbc:postgresql://${tenant.db.host}:${tenant.db.port}/${tenant.db_name} — host/port come from tourlinq.properties, DB name from the tenant row.
  • Spins up a Hibernate Configuration with the subclass's entity list and a dedicated HikariCP pool (minIdle=2, maxPoolSize=5).
  • Pool name: <prefix>-<tenantCode>-<dbName>, visible in metrics.

Net result: N tenants × M plugin schemas = N × M pools, each ~2–5 connections. Postgres max_connections must be sized accordingly — see multitenancy-setup.md §4.
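
To make the sizing concrete, here is a small sketch of the arithmetic. The pool limits (minIdle=2, maxPoolSize=5) come from the text above; the tenant, schema, and node counts are made-up inputs, and the class name is hypothetical. It assumes every node builds its own pools and ignores the platform pool. Because factories are built lazily, the first figure is an upper bound rather than a steady-state measurement.

```java
final class PoolBudgetSketch {
    static final int MIN_IDLE = 2;       // per pool, from the text above
    static final int MAX_POOL_SIZE = 5;  // per pool, from the text above

    private PoolBudgetSketch() {}

    // Upper bound on tenant-DB connections once every pool is full.
    static int worstCase(int tenants, int schemas, int nodes) {
        return tenants * schemas * MAX_POOL_SIZE * nodes;
    }

    // Connections held open even when idle, once every pool has been built.
    static int idleFloor(int tenants, int schemas, int nodes) {
        return tenants * schemas * MIN_IDLE * nodes;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 10 tenants, 5 plugin schemas, 2 nodes.
        System.out.println(worstCase(10, 5, 2)); // 500
        System.out.println(idleFloor(10, 5, 2)); // 200
    }
}
```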


3. The two flows

Flow A — Request flow (automatic)

┌─────────────────────────────────────────────────────────────────────┐
│  HTTP request arrives at Jetty / Jersey                             │
│                                                                     │
│  AuthenticationFilter (tqapi/.../AuthenticationFilter.java:214):    │
│    - Validates JWT or dev-mode credentials                          │
│    - Extracts tenantId from the Keycloak realm / Host header        │
│    - RequestContext.set(new RequestContext(                         │
│            userId, userName, userEmail, correlationId, tenantId))   │
│                                                                     │
│      ┌─────────────────────────────────────────────────────────┐    │
│      │  Jersey resource → Facade → Service                     │    │
│      │     ...                                                 │    │
│      │     XxxDBSession.getSession()  ← reads ThreadLocal      │    │
│      │       → returns a Session bound to the tenant's pool    │    │
│      │     ...                                                 │    │
│      └─────────────────────────────────────────────────────────┘    │
│                                                                     │
│  CORSResponseFilter (tqapi/.../CORSResponseFilter.java:79):         │
│    - RequestContext.clear()                                         │
│      (Jetty worker threads are pooled — a leaked context would      │
│       attribute the next request to the wrong tenant)               │
│                                                                     │
│  HTTP response goes out                                             │
└─────────────────────────────────────────────────────────────────────┘

Request-path developers don't need to think about any of this. As long as they obtain a session through XxxDBSession.getSession() — never via a side-channel — the right tenant is selected automatically.

Flow B — Manual flow (background)

Used by:

  • Scheduled tasks (ScheduledExecutorService runners — Rayna's SDRefreshRunner, NTS's PackageRetirementRunner / OptionExpiryRunner, GoGlobal's GGRefreshRunner, Google Flights' AirportRefreshRunner, Tiqets' refresh task)
  • Hazelcast topic listeners (cache invalidations, registry refreshes)
  • Plugin initialization that touches the per-tenant DB (e.g. RaynaCacheManager.instance() triggering eager cache load)
  • Startup / smoke checks
  • Apache Commons Daemon (init / start) hooks before any request has been served

These run on threads that the JAX-RS filter chain never touches, so RequestContext is unset. They must establish it themselves:

┌─────────────────────────────────────────────────────────────────────┐
│  Background thread wakes up (executor tick, listener callback, ...) │
│                                                                     │
│  Collection<TenantInfo> tenants =                                   │
│      TenantRegistry.instance().listActive();                        │
│                                                                     │
│  if (tenants.isEmpty()) { log "skipping"; return; }                 │
│                                                                     │
│  for (TenantInfo t : tenants) {                                     │
│    try {                                                            │
│      TenantScope.run(t.getTenantId(), () -> {                       │
│          ┌──────────────────────────────────────────────────┐       │
│          │  XxxDBSession.getSession()  ← reads ThreadLocal  │       │
│          │    → returns a Session for tenant t              │       │
│          │  ... do per-tenant work ...                      │       │
│          └──────────────────────────────────────────────────┘       │
│      });                                                            │
│    } catch (Exception ex) {                                         │
│      logger.log(WARNING, "failed for tenant " + t.getTenantCode()); │
│      // continue with next tenant                                   │
│    }                                                                │
│  }                                                                  │
└─────────────────────────────────────────────────────────────────────┘

Three rules for background tasks:

  1. Always check the registry first. Empty registry → log "no active tenants — skipping" and return cleanly. This is the documented greenfield state (see multitenancy-setup.md §8.0).
  2. One tenant per TenantScope. Wrap each tenant's work in its own TenantScope.run(...). A failure for one tenant must not abort the others — log it and move on.
  3. Distributed locks (Hazelcast) stay global. When a runner uses a Hazelcast lock to prevent duplicate execution across cluster nodes, acquire the lock once for the whole tick and fan out tenants inside the locked section. Per-tenant locks are an option but not required — keep the existing lock structure unless there's a reason to change.
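
Rules 1 and 2 can be sketched as a single tick method. Everything here is a simplified stand-in: runInScope plays the role of TenantScope.run, the tenant list is passed in instead of coming from TenantRegistry, and the returned list of failed tenants is an addition made so the behaviour can be checked. The Hazelcast lock from rule 3 is omitted; in a real runner it would wrap the whole call to tick.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class TenantFanOutSketch {
    static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    private TenantFanOutSketch() {}

    // Stand-in for TenantScope.run: set, run, always restore.
    static void runInScope(String tenantId, Runnable body) {
        String previous = CURRENT_TENANT.get();
        CURRENT_TENANT.set(tenantId);
        try {
            body.run();
        } finally {
            if (previous != null) CURRENT_TENANT.set(previous); else CURRENT_TENANT.remove();
        }
    }

    // Runs the work once per tenant and returns the tenants whose work failed.
    static List<String> tick(List<String> activeTenants, Consumer<String> perTenantWork) {
        List<String> failed = new ArrayList<>();
        if (activeTenants.isEmpty()) {
            System.out.println("no active tenants, skipping"); // rule 1
            return failed;
        }
        for (String tenantId : activeTenants) {
            try {
                // Rule 2: one scope per tenant.
                runInScope(tenantId, () -> perTenantWork.accept(CURRENT_TENANT.get()));
            } catch (Exception ex) {
                // A failure for one tenant must not abort the others.
                System.out.println("failed for tenant " + tenantId + ": " + ex.getMessage());
                failed.add(tenantId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = tick(Arrays.asList("a", "b", "c"), tenant -> {
            if (tenant.equals("b")) throw new IllegalStateException("boom");
            System.out.println("refreshed " + tenant);
        });
        System.out.println(failed); // [b]
    }
}
```

Placing the try/catch inside the loop, not around it, is the design choice that matters: tenant "c" is still processed after "b" fails.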

4. The platform-DB carve-out

There is one other DB connection path that does not go through TenantAwareDBSession: the platform database (tqplatform).

tqplatform is a separate Postgres DB that holds:

  • The tenant table — registry of all tenants (DB credentials, KC realm, status, etc.)
  • The wa_phone_routing table — WhatsApp phone-number-to-tenant routing
  • The platform's own schema_migrations ledger

It is not part of any tenant's data. There is no per-tenant view of it — every node sees the same tqplatform DB. Tenants don't talk to it directly; only platform-management code does:

| Caller | File | Purpose |
| --- | --- | --- |
| TenantRegistry | tqcommon/.../tenant/TenantRegistry.java:174 | Loads tenant rows into memory at startup and on refresh |
| TenantProvisioningFacade | tqapp/.../entity/tenant/TenantProvisioningFacade.java:142, 157 | Inserts / updates tenant rows during onboarding |
| PlatformAdminApi | tqapi/.../api/PlatformAdminApi.java:257, 283 | Admin endpoints that act on the registry directly |

These callers obtain a java.sql.Connection from PlatformDbConfig.instance().getDataSource() — a small, separate HikariCP pool (minIdle=1, maxPoolSize=2) defined in tqcommon/.../tenant/PlatformDbConfig.java. Its connection details come from platform.db.url/user/pass in tourlinq.properties, not from any tenant row.

Do not call PlatformDbConfig.instance() from request-path code or from per-tenant business logic. It exists specifically for tenant management and is intentionally cordoned off from the tenant flow.


5. Enforcement — what stops you from getting it wrong

Three safety nets stack:

  1. Compile-time: no compiler enforcement, only convention. RequestContext.set() and RequestContext.clear() are ordinary public methods, so nothing prevents code from calling them directly. This is a deliberate trade-off; the discipline is "use TenantScope, not raw set()," documented here and enforced by code review.

  2. Runtime, hard fail: TenantAwareDBSession.openSession() throws IllegalStateException immediately when RequestContext.current() is null or has no tenant. The exception message names the missing class and tells you to wrap in TenantScope.run(...). This is why any background task that forgets the wrap blows up on first DB call instead of silently writing to the wrong DB.

  3. Runtime, soft fail: scheduled runners that probe TenantRegistry first log "no active tenants — skipping" and return cleanly when the registry is empty (e.g., greenfield first start before any tenant is provisioned). This is not a safety check — it's a correctness shortcut so the JVM can boot before the first tenant exists.


6. Decision guide for developers

| Situation | What to do |
| --- | --- |
| Writing a Jersey resource, facade, or service called from a Jersey resource | Nothing special: just call XxxDBSession.getSession(). The filter set the context for you. |
| Writing a Runnable for scheduledExecutor.scheduleAtFixedRate(...) | Inside the runnable, iterate TenantRegistry.instance().listActive() and wrap each iteration in TenantScope.run(...). Skip if registry empty. |
| Writing a Hazelcast topic listener | Listener fires on a Hazelcast thread, so wrap the body in TenantScope.run(tenantId, ...) using the tenantId from the message payload. |
| Plugin initializePlugin() that needs to load per-tenant data | Fan out over TenantRegistry.instance().listActive() exactly like a scheduled task. Empty registry → log "no active tenants, deferring" and return. |
| Singleton Manager.instance() whose constructor eagerly loads from DB | Wrap the first instance() call in TenantScope.run(...). The constructor runs once on first call; the tenant whose scope is active at that moment determines what gets loaded. Currently "last-tenant-wins" for these singletons (RaynaCacheManager, SupplierCache, StaticMapCache.MEALPLAN_CACHE, TiqetsCacheManager); a shared-cache refactor is planned. |
| Reading or writing the tqplatform.tenant table itself | Use PlatformDbConfig.instance().getDataSource(). Do not route this through TenantAwareDBSession; tqplatform is not a tenant. |
| Anything else that obtains a DB connection | Don't. If you find yourself needing a third path, stop and discuss. This document is intentionally exhaustive. |

7. "No third path" — how this was verified

Both claims below were verified by exhaustive grep of the non-test, non-build source tree:

Claim 1: RequestContext is set in exactly two places

grep -rn "RequestContext\.set\b\|new RequestContext\b" \
    --include='*.java' --exclude-dir={build,test,.idea}

Returns only:

  • AuthenticationFilter.java:214 — the request entry point
  • TenantScope.java:30, 40 — the manual/background helper (TenantScope constructs new RequestContext(...) at lines 65 and 72)

No other code constructs or stores a RequestContext. Test fixtures (tqcommon/src/test/...) do, but they don't run in production.

Claim 2: Tenant DB sessions are obtained in exactly one place

grep -rn "DriverManager\.getConnection\|buildSessionFactory" \
    --include='*.java' --exclude-dir={build,test,.idea}

buildSessionFactory appears only inside TenantAwareDBSession.buildFactory(). DriverManager.getConnection does not appear in production source at all.

The platform-DB callers (PlatformDbConfig.getDataSource().getConnection() in TenantRegistry, TenantProvisioningFacade, PlatformAdminApi) are the only DataSource.getConnection() sites, and they all target tqplatform, not any tenant DB. See §4.

Re-running the audit

If a future change adds a new path, the same two greps will surface it immediately. Run them as part of any review that touches DB or filter code. Any new RequestContext.set(...) or buildSessionFactory() call outside the three files named above (AuthenticationFilter, TenantScope, TenantAwareDBSession) is, by definition, a bug.


8. References