By default, SQLAlchemy keeps a db.Engine's pooled connections open forever (pool_recycle=-1). This is a problem in MySQL because of a config value called wait_timeout, which sets the MySQL server to kill idle connections after 8 hours. Therefore, it was possible for code to check out a connection from SQLAlchemy (via self.connect() on one of the storages) where the underlying DB connection had been removed. To rectify this, I am setting the pooled connections to recycle after 1 hour.
Details
Details
bk and Integration
Local testing involved setting the wait_timeout variable on the MySQL DB to 2 seconds & then sleeping for 3 or more seconds. Before this fix in Dagit we'd encounter an OperationalError as the MySQL connection was dead; however, after this fix (setting MYSQL_POOL_RECYCLE to instead be 1 second), this issue was not present in a long-running (15 minute+) session with frequent reloading.
The issue repro code looked something like this:
with self.connect() as conn: conn.execute('SET SESSION wait_timeout = 2;') time.sleep(3)
Diff Detail
Diff Detail
- Repository
- R1 dagster
- Branch
- mysql-issue (branched from master)
- Lint
Lint Passed - Unit
No Test Coverage
Event Timeline
Comment Actions
We should definitely add a test for this.
python_modules/libraries/dagster-mysql/dagster_mysql/run_storage/run_storage.py | ||
---|---|---|
70 | might eventually want to make this configurable, but this seems fine for now. |
python_modules/libraries/dagster-mysql/dagster_mysql/event_log/event_log.py | ||
---|---|---|
59 | not sure this matters for NullPool |
Comment Actions
Added tests & removed the NullPool wait_timeout fix - there's no pool for that case so there's nothing to recycle
python_modules/libraries/dagster-mysql/dagster_mysql_tests/test_wait_timeout.py | ||
---|---|---|
26–28 | some assert here would be good |