Page MenuHomeElementl

[core] schema changes for MySQL compatability
ClosedPublic

Authored by sidkmenon on Feb 10 2021, 4:51 PM.

Details

Summary

Schema modifications to allow for MySQL compatability -

1: Adding changes to db.DateTime and CUSTOM_TIMESTAMP to allow for decimal precision on MySQL as well (default behavior on Postgres & SQLite)
2: Mapping db.Text + a UNIQUE constraint to VARCHAR(512) instead of TEXT on MySQL (as a stopgap until a schema migration is made)

Since these changes only modify the compiler's behavior for mysql, they are backwards compatible.

Depends on D6399, D6519

Test Plan

Integration + bk

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 10 2021, 5:09 PM
Harbormaster failed remote builds in B25551: Diff 31182!

Adding datetime precision to MySQL event log schemas

This is so that the MySQL event log schema acts the same as PG or SQLite (both support arbitrary datetime precision by default, while MySQL rounds to the nearest second by default)

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 12 2021, 3:47 AM
Harbormaster failed remote builds in B25702: Diff 31367!

Modifying schema st. no alembic migrations needed until 0.11.0 release.

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 16 2021, 9:10 PM
Harbormaster failed remote builds in B25898: Diff 31614!
sidkmenon added inline comments.
python_modules/dagster/dagster/core/storage/event_log/schema.py
6

probably should move all these sqlalchemy hacks to another file, but not sure what yet. All these things basically ensure that the MySQL db has the same semantics as postgres or sqlite

Modifying runs schema and schedules schema for MySQL compatability

forgot to correct secondary index table in runs::schema

sidkmenon retitled this revision from WIP - schema changes & alembic migrations for MySQL compatability to Schema changes & alembic migrations for MySQL compatability.Feb 17 2021, 2:00 AM
sidkmenon edited the summary of this revision. (Show Details)
sidkmenon edited the summary of this revision. (Show Details)
python_modules/dagster/dagster/core/storage/event_log/schema.py
3

I think this probably does not need to be called custom_compilation or be considered a migration.

maybe in ..sql, or ..utils?

python_modules/dagster/dagster/core/storage/migration/custom_compilation.py
10–14 ↗(On Diff #31676)

it doesn't seem like we'd actually want this to be configurable, or to even change it. would prefer to just inline this.

49–64 ↗(On Diff #31676)

Are the custom classes required? Why are we not just changing the mysql compilation of the existing constructs?

My preference is to not require schema authors to have to be aware that this custom compilation even exists if we can help it...

Does the compiler have access to column fields like unique?

python_modules/dagster/dagster/core/storage/runs/schema.py
20

we should have a test/lint check against this if it's going to break mysql.

prha requested changes to this revision.Feb 18 2021, 5:00 PM

returning to your queue

This revision now requires changes to proceed.Feb 18 2021, 5:00 PM
python_modules/dagster/dagster/core/storage/runs/schema.py
3

style nit:

can we instead do

from ..sql import get_current_timestamp

Aside from external libraries, we generally don't import modules.

python_modules/dagster/dagster/core/storage/sql.py
147

feels a little weird that this is named/structured like a function call rather than a class...

should this actually be a function call that instantiates the DateTime class?

Are the semantics actually the same? If I construct the query, then sleep, then execute on the connection, does it behave the same as CURRENT_TIMESTAMP (which I think is a command to the engine?)

175–181

ahhhh, this is so much better!!

niceeeeeee

sidkmenon marked 3 inline comments as done.

responding to @prha's comments

okay, this is looking pretty good.

I would consider adding a lint check so that we're not reliant on waiting for build failures, but you can look in to doing that in a separate diff (see python_modules/dagster/dagster/utils/linter.py).

Please mind the comment about whether the timestamp gets resolved at execution time on the sql engine or at statement definition time. We should re-evaluate the implementation if it's the latter.

This revision is now accepted and ready to land.Feb 20 2021, 12:57 AM
python_modules/dagster/dagster/core/storage/event_log/schema.py
3

ah makes sense to me

python_modules/dagster/dagster/core/storage/migration/custom_compilation.py
49–64 ↗(On Diff #31676)

That's a really good point - I didn't think it would be possible to do on plain db.Text at first, but you're totally right. Next diff update address these comments. Thanks Phil!

python_modules/dagster/dagster/core/storage/sql.py
147

I think this follows an example laid out in the docs more of less - https://docs.sqlalchemy.org/en/13/core/compiler.html#utc-timestamp-function is where I got the inspiration. As for semantics, my understanding is that the generated sql is identical.

as an example, for the asset_keys table this would be the default generated table

CREATE TABLE asset_keys (
	id INTEGER NOT NULL,
	asset_key TEXT,
	create_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
	PRIMARY KEY (id),
	UNIQUE (asset_key)
)

and here is the MySQL statement:

CREATE TABLE asset_keys (
	id INTEGER NOT NULL AUTO_INCREMENT,
	asset_key VARCHAR(512),
	create_timestamp DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6),
	PRIMARY KEY (id),
	UNIQUE (asset_key)
)
sidkmenon retitled this revision from Schema changes & alembic migrations for MySQL compatability to [core] schema changes for MySQL compatability.
sidkmenon edited the summary of this revision. (Show Details)