Site deployment job fails with pymysql 'unknown (generic) error'

joshr · May 31, 2021, 5:32pm

I’ve got erpnext deployed via helm. All pods are in running state. I’m trying to add a new site using the ‘new site’ job. mysql is configured per these instructions. However, the site deployment fails with the following…

pymysql.err.OperationalError: (1005, ‘Can't create table _3e8ea0dcafb368d3.tabCash Flow Mapping Accounts (errno: 168 “Unknown (generic) error from engine”)’)

Looking around, I see some mention of mariadb needing python3-dev or libmysqlclient-dev libraries but I’m not finding anything solid. Does anyone have any insight insight to offer?

Thanks in advance.

Edit: I’ve tried using bitnami mariadb chart (10.3 and 10.5), same issue.

Edit: mariadb config

|-
[mysqld]
character-set-client-handshake=FALSE
skip-name-resolve explicit_defaults_for_timestamp
basedir=/opt/bitnami/mariadb
plugin_dir=/opt/bitnami/mariadb/plugin port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
tmpdir=/opt/bitnami/mariadb/tmp max_allowed_packet=16M
bind-address=0.0.0.0
pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid
log-error=/opt/bitnami/mariadb/logs/mysqld.log
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
[client] port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
default-character-set=utf8mb4
plugin_dir=/opt/bitnami/mariadb/plugin
[manager] port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
pid-file=/opt/bitnami/mariadb/tmp/mysqld.pidPreformatted text

Edit: “grant all privileges on db_name.* TO ‘db_name’@‘%’ identified by ‘db_password’;” works just fine (with proper values substituted). I’d like to add python3-dev to see if that makes a difference but I’m not sure where to do it? The worker image, used by site-creator seems like a reasonable candidate. However, I’d expect the native frappe images to be ironed out.

revant_one · June 1, 2021, 2:57pm

I think the error is related to db configuration. Check the configuration in the frappe/helm repo tests directory. It is used to run tests and it is successfully completing the site creation job.

“dev” in production images felt odd. I didn’t add.

if you need to add it for custom requirement base your image on official image and add it in your image.

Even if you wish to try it for official image try it like this. Create your image substitute in values.yaml and try, if it works we need to fix it in official.

I use k3d.io as a local kubernetes cluster to develop these things.
If you want to speed it up even faster check the k3d docs for using local container registries.

What I’m doing in case of containerized installations is, I maintain state (NFS Server/Redis/MariaDB) in standard VPS/VM and then deploy rest of the the stateless things in containers. I have setup MariaDB with Frappe specific config on a VM in private / firewalled network, Same VM has NFS and Redis. Gist of Data Server setup.

joshr · June 1, 2021, 7:35pm

Thanks. I checked your config against mine. There are no substantial differences. Also, I tried starting over with mariadb-galera. After debugging all of the issues related to the db swap, I’m back to almost the same thing:

pymysql.err.OperationalError: (1005, ‘Can't create table _3e8ea0dcafb368d3.tabC-Form Invoice Detail (errno: 168 “Unknown (generic) error from engine”)’)

Now, the referenced table is different but the same generic error is thrown. This time, I found the following, too:

2021-06-01 18:58:34 12 [Warning] Aborted connection 12 to db: ‘unconnected’ user: ‘unauthenticated’ host: ‘172.31.16.163’ (This connection closed normally without authentication)

and

2021-06-01 18:59:35 122 [Warning] Access denied for user ‘_3e8ea0dcafb368d3’@‘172.31.21.183’ (using password: YES)

It looks like variables aren’t being populated. If I explicitly set the user / passwords (rootUser, db, replication) in the mariadb values, then the erpnext pod doesn’t deploy properly (gets ‘access denied’). If I leave them unset, everything deploys but the db user doesn’t seem to get set properly (when running the ‘add site’ job). I think I’m getting closer here but thought maybe the issue would seem obvious to someone. This is in AWS EKS, btw.

joshr · June 1, 2021, 8:10pm

Interesting point…in the log, I see this…

2021-06-01 19:51:05 2552 [Warning] Access denied for user ‘_3e8ea0dcafb368d3’@‘172.31.31.86’

However, when looking at that user in the database, the only entry that I see for that user in mysql.user is:

| 172.31.17.130 | _3e8ea0dcafb368d3 |

How is this user created? Why might it be failing to include access for all cluster hosts?

joshr · June 1, 2021, 8:47pm

I’ve granted access for all hosts with that username but now the job errors out indicating that the site already exists no matter what SITE_NAME I pass it. I’ve deleted previous instances of the job. Is there some other key that’s used for site creation?

revant_one · June 2, 2021, 12:36am

Somehow I’m able to install all this.

joshr · June 2, 2021, 12:54am

Thank you for that.

revant_one · June 2, 2021, 1:07am

This happens to prevent overwrite of site if it already exists.

Deleted the site dir from vol or use FORCE=1 as env var to new job pod

joshr · June 4, 2021, 6:37pm

Figured it out. The problem here was tied to the fact that my database lived on an AWS EFS volume. EFS volumes only support 256 concurrent open files. The database would begin to populate and, since each table gets its own file, the limit would be reached part of the way through.

In the process of troubleshooting I switched over to mysql from mariadb. This isn’t necessary but if anyone else should find themselves in this situation, you need to add the following options to the [mysqld] sections for primary and secondary (in addition to the other character-set changes required by erpnext)…

innodb_file_per_table = OFF innodb_strict_mode = off sql_mode=‘’