Frappe v16 bench new-site stalls on AKS then reports 'tabDefaultValue' missing

Hello all,

I’m attempting to deploy Frappe v16 on AKS using Azure Managed Redis and an external MariaDB cluster, also hosted on Azure. The site setup command bench new-site stalls for a long time, and when it continues, Frappe reports that tabDefaultValue is missing, even though the table exists when queried manually from the same container.

Problem

  1. The bench new-site command stalls for a very long time during runtime deployment on AKS.

  2. When it stops stalling, I get the following message:

Table 'tabDefaultValue' missing in the restored site.
This happens when the backup fails to restore. Please check that the file is valid
Do go through the above output to check the exact error message from MariaDB

Deployment setup

  • Application is deployed on Azure AKS

  • Redis is an Azure managed Redis service

  • MariaDB is an external MariaDB cluster

  • The build pipeline runs bench init, which produces a Docker image with all bench requirements installed

  • At runtime, the container runs bench new-site

  • To avoid race conditions while using new-site, I tested with only one pod running

Command being used

I’m using --no-setup-db since the user and table have already been created on the MariaDB cluster, and the user has all the permissions required to run bench new-site. Before running bench new-site, I mount the app into the apps directory and set the global configs using bench set-config -g.

bench new-site "$SITE_NAME" \
  --db-type mariadb \
  --db-name "$DB_NAME" \
  --db-user "$DB_USER" \
  --db-password "$DB_PASSWORD" \
  --db-host "$DB_HOST" \
  --db-port "$DB_PORT" \
  --admin-password "$ADMIN_PASSWORD" \
  --set-default \
  --no-setup-db
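Since the command above takes every value from the environment, an unset or blank variable would silently change what bench receives. A minimal pre-flight sketch, assuming the variable names used in the command; `missing_vars` is a hypothetical helper and not part of bench:

```python
import os

# Variable names taken from the bench new-site command above.
REQUIRED_VARS = [
    "SITE_NAME", "DB_NAME", "DB_USER", "DB_PASSWORD",
    "DB_HOST", "DB_PORT", "ADMIN_PASSWORD",
]

def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]

# Demonstration with a sample mapping instead of the real environment.
sample_env = {"SITE_NAME": "site1.local", "DB_HOST": "db.example", "DB_PORT": " "}
print("Missing or empty:", missing_vars(sample_env))
```

Calling `missing_vars()` with no arguments checks the real environment of the container before bench new-site runs.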

Debugging steps I followed

  1. Verified MariaDB connectivity from inside the same AKS container using the mysql CLI. The connection works, and I can query the database successfully.

  2. I also noticed that the database tables have been created.

  3. Verified MariaDB connectivity using the underlying Python DB library. That also works correctly.

  4. Queried information_schema.tables manually from the same container and confirmed that tabDefaultValue exists.

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'frappe'
  AND table_name = 'tabDefaultValue';

This returns:

tabDefaultValue

  1. Verified Redis connectivity using redis-cli. The connection works.

  2. Verified Redis connectivity using the Python Redis client. That also works.

  3. Tested Frappe’s Redis client/cache wrapper separately from inside the container. Direct Redis works, but Frappe’s ClientCache().get_value() stalls for around 20 seconds.

source /home/frappe/frappe-bench/env/bin/activate && \
cd /home/frappe/frappe-bench && \
python -c "
import os
import time
import frappe
from frappe.types import _dict
from frappe.utils.redis_wrapper import setup_cache, ClientCache

frappe.local.conf = _dict({
    'redis_cache': os.environ['REDIS_CACHE'],
    'redis_cache_sentinel_enabled': 0,
})

frappe.local.cache = {}

frappe.cache = setup_cache()
frappe.client_cache = ClientCache()

start = time.time()
print(frappe.cache.get('client_cache:db_tables'))
print('REDIS GET:', time.time() - start)

start = time.time()
print(frappe.client_cache.get_value('db_tables'))
print('FRAPPE CLIENT CACHE GET:', time.time() - start)"

Output:

None
REDIS GET: 0.03649544715881348
None
FRAPPE CLIENT CACHE GET: 19.74853205680847
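A stall of roughly 20 seconds with no error suggests the call is blocked waiting on something, and a stack dump taken during the stall would show where. The following is a minimal sketch using the standard library `faulthandler` module. `dump_stack_if_slow` is a hypothetical helper name, and the 5 second default threshold is an arbitrary assumption:

```python
import faulthandler
import sys
import time

def dump_stack_if_slow(func, *args, threshold=5.0, file=sys.stderr, **kwargs):
    """Call func; if it is still running after `threshold` seconds, write
    the stack of every thread to `file` while the call is blocked."""
    faulthandler.dump_traceback_later(threshold, file=file)
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Cancel the pending dump if the call returned before the threshold.
        faulthandler.cancel_dump_traceback_later()
    return result, time.perf_counter() - start
```

In the script above, this could wrap the slow call as `dump_stack_if_slow(frappe.client_cache.get_value, 'db_tables')`, so the traceback printed to stderr points at the line that is waiting.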

  1. I looked at the bench new-site source code to find where this message comes from and narrow down the issue:

    • This is where the message appears, after it stalls for a long time

  • Inside the get_tables function:

    I tested both methods: retrieving the tables from Redis, which returns None, and retrieving the tables from information_schema, which returns the actual values.
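For reference, the two lookups I tested correspond to the two halves of a generic cache-aside pattern, where a miss in the cache falls back to the database. The sketch below only illustrates that pattern and is not Frappe's get_tables implementation; `get_tables_with_fallback`, `cache_get`, `cache_set`, and `query_db` are hypothetical names:

```python
def get_tables_with_fallback(cache_get, cache_set, query_db, key="db_tables"):
    """Cache-aside lookup: use the cached table list if present,
    otherwise query the database and store the result."""
    tables = cache_get(key)
    if tables is None:
        # Cache miss: read the authoritative list from the database.
        tables = query_db()
        cache_set(key, tables)
    return tables

# Stand-ins for Redis and MariaDB so the sketch runs without either.
fake_cache = {}
tables = get_tables_with_fallback(
    fake_cache.get,
    fake_cache.__setitem__,
    lambda: ["tabDefaultValue", "tabDocType"],
)
print(tables)
```

If the real lookup behaves like this, a None from Redis alone should not make a table look missing, so the slow cache read itself may be the more useful thing to investigate.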

  1. I attempted to run bench doctor inside the container, and it stalls as well.

Questions

  1. Could Frappe’s Redis client wrapper behave differently from direct redis-cli or plain Python Redis, given that I had to patch the code to be able to run it?

  2. Since bench doctor also stalls, could this indicate an issue during Frappe context initialization rather than only during bench new-site?

  3. During bootstrap_database, is there any reason get_tables(cached=True) would not see a table that exists when queried manually from the same container?

Any guidance on whether this could be related to Frappe initialization, Redis caching, MariaDB, or something else I should check would be appreciated.

Thank you