By default, SQLAlchemy keeps a db.Engine's pooled connections open forever (pool_recycle=-1). This is a problem in MySQL because of a config value called wait_timeout, which sets the MySQL server to kill idle connections after 8 hours. Therefore, it was possible for code to check out a connection from SQLAlchemy (via self.connect() on one of the storages) where the underlying DB connection had been removed. To rectify this, I am setting the pooled connections to recycle after 1 hour.
bk and Integration
Local testing involved setting the wait_timeout variable on the MySQL DB to 2 seconds & then sleeping for 3 or more seconds. Before this fix in Dagit we'd encounter an OperationalError as the MySQL connection was dead; however, after this fix (setting MYSQL_POOL_RECYCLE to instead be 1 second), this issue was not present in a long-running (15 minute+) session with frequent reloading.
The issue repro code looked something like this:
with self.connect() as conn: conn.execute('SET SESSION wait_timeout = 2;') time.sleep(3)
We should definitely add a test for this.
might eventually want to make this configurable, but this seems fine for now.
not sure this matters for NullPool
Added tests & removed the NullPool wait_timeout fix - there's no pool for that case so there's nothing to recycle
some assert here would be good