Page MenuHomePhabricator

create run partition data migration, tie to `dagster instance migrate`
ClosedPublic

Authored by prha on Mon, Jan 11, 9:34 PM.

Details

Summary

Sets up migrations for run_storage, to be run upon dagster instance migrate.

Currently, we have a CLI command dagster instance migrate which performs schema migrations, and dagster instance reindex which performs optional data migrations (over the event log).

https://dagster.phacility.com/D5840 adds a partition column to the runs table, which would require a data migration in order to begin doing reads off of that column.

This diff sets up the machinery to track that migration status, and also notably performs a data migration over the runs table upon dagster instance migrate.

The upside is that with 0.10.0, we have some schema changes that must be performed and so we know that people will run dagster instance migrate. The downside is that this is definitely a data migration and its latency depends on the size of the runs table.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha requested review of this revision.Mon, Jan 11, 9:56 PM
prha retitled this revision from create run partition data migration to RFC: create run partition data migration, tie to `dagster instance migrate`.Mon, Jan 11, 10:02 PM
prha edited the summary of this revision. (Show Details)
prha added reviewers: dgibson, alangenfeld, schrockn.

probably makes sense, lets just get a test in place before landing

python_modules/dagster/dagster/core/instance/__init__.py
489–493

nit: reindex feels wrong given the behavior, build_missing_indexes with a force_rebuild_all param would feel better i think

python_modules/dagster/dagster/core/storage/runs/migration.py
7

hm this lambda looks off, doesn't it need () or to drop the lambda (by moving this dict after the fn decl)

python_modules/dagster/dagster/core/storage/runs/sql_run_storage.py
64–76

i think more verbose names would be nice given how infrequent these should be called, at least maybe some comments

python_modules/dagster/dagster/core/storage/runs/migration.py
7

yeah, the lambda is just there to get around the fn decl ordering.

It's invoked by calling migration_fn()(instance, print_fn)

  • rename index methods, add migration test

i don't love this complexity burden but i think its the right call for end user experience

This revision is now accepted and ready to land.Tue, Jan 12, 9:54 PM

What's your take on how long we'd need to maintain this migration behavior? I've self-justified the complexity by thinking that we'd eventually consider it safe to remove this migration behavior.

The current migration path is set up so that anyone who has run dagster instance migrate or started using dagster post-0.10.0 will be in a good state.

Is it safe to remove with 0.11.0? In other words, do we need to handle the case where people jump from 0.9.x to > 0.11.0?

i dont know, I think building missing indexes on migration seems reasonable, the tracking of indexes in each storage also makes sense. I think dropping all this stuff will have to come with some big change to the instance set up

prha retitled this revision from RFC: create run partition data migration, tie to `dagster instance migrate` to create run partition data migration, tie to `dagster instance migrate`.Tue, Jan 12, 10:29 PM