Page MenuHomeElementl

[RFC] Add Helm job to migrate Dagster instance
ClosedPublic

Authored by rexledesma on Feb 24 2021, 12:11 AM.

Details

Summary

Resolves https://github.com/dagster-io/dagster/issues/2378.

Here, we write a Helm hook to initiate dagster instance migrate in a kubernetes
job, to occur after all resources have been loaded or updated in kubernetes via
helm install or helm upgrade.

See:
https://helm.sh/docs/topics/charts_hooks/
https://itnext.io/database-migrations-on-kubernetes-using-helm-hooks-fb80c0d97805
https://medium.com/flant-com/custom-commands-deploying-in-kubernetes-df4ea4487f18

Test Plan

helm install -g helm/dagster, inspected via k9s that hook ran successfully.

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Awesome. Couple quqestions: Does this mean that whenever you, say, update code in a user deployment via helm, that it will run a migration check? I think generally 'dagster instance migrate' is idempotent, so that may be fine or even desirable, but just checking - one thing i'd potentially be worried in that case is other parts of the deployment interacting with the DB while its in the middle of migrating though.

If we did want to make it more of a manually-initiated thing (but still more streamlined than it is now in master), would there be a path for that?

Another option is to ask users to connect to their cluster and manually run kubectl apply -f dagster-instance-migrate.yaml to create the migration job so that users aren't caught off guard? for example, some users may want to be extra safe and backup their database before migration in case something unexpected happens

dgibson requested changes to this revision.Feb 24 2021, 5:26 PM

Yeah, I think the backup thing is a good point. I'd love to say that we are 100% confident in our migrations but in practice there's still some risk with each one.

Having a single kubectl command they can run would still be a huge improvement over the status quo.

This revision now requires changes to proceed.Feb 24 2021, 5:26 PM

Also wanted to second daniel's concern about running the migration while dagit is in use.

In redash's upgrade process, they stop the service before migrating which we may need to do too

Does this mean that whenever you, say, update code in a user deployment via helm, that it will run a migration check?

Yes. However, once we split the helm chart to support isolated releases of user code and dagster infrastructure, this migration job will only be kicked off if the release for dagster infrastructure is updated.

If we did want to make it more of a manually-initiated thing (but still more streamlined than it is now in master), would there be a path for that?

We can put this automatic migration behind a flag dagit.migrationHook.enabled if that's of interest!

I'm not sure we should offer it as a flag if we're worried about the 'running other stuff during a migration' issue though - since if its an option in the helm chart we're implicitly endorsing it as safe.

@dgibson @catherinewu

I've made migration a more manual process. Rather than it being a hook, they'll have to manually kick off the job. I've also added steps for them to scale down their deployments so that no runs will be launched during the migration process.

rexledesma retitled this revision from [RFC] Add Helm hook to migrate dagster instance to [RFC] Add Helm job to migrate Dagster instance.Mar 3 2021, 8:01 PM

this makes sense to me! @catherinewu or @johann is there anything else you all think should happen here? I put a couple of small comments about the docs inline.

docs/next/src/pages/deploying/kubernetes/how-to-migrate.mdx
13 ↗(On Diff #32713)

suggestion "you may also need to migrate your Dagster instance"

14 ↗(On Diff #32713)

this is true but it also might just crash if you don't :) So I think you could leave the part from "so that" out

49 ↗(On Diff #32713)

Migration doesn't always wipe out your schedules - just the 0.10.0 migration did. Maybe "Check the migration document for any additional steps you may need to take as part of the migration"? Maybe that should be first actually, since there might be stuff that's best to do while the system is down

docs/next/src/pages/deploying/kubernetes/how-to-migrate.mdx
13 ↗(On Diff #32713)

It would be nice to reassure users that migration is only needed between minor versions and not between dot releases

31 ↗(On Diff #32713)

Could add how to check whether the migration succeeded

45 ↗(On Diff #32713)

users' original replicas arg might not be 1

46 ↗(On Diff #32713)

users' original replicas arg might not be 1

helm/dagster/templates/job-instance-migrate.yaml
15

do we need to have a check-db-ready container to make sure the db is available

FYI: we've landed D6835 later today - which affected the docs content structure. I'm happy to hop on a call to go over the new info hierarchy, especially the k8s guides!

Nice! Existing feedback looks good to me

just back to your queue for the various docs changes - generally i think this looks great

This revision now requires changes to proceed.Mar 11 2021, 11:10 PM
This revision was not accepted when it landed; it landed in state Needs Review.Mar 18 2021, 7:59 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.