Page MenuHomePhabricator

Split RepositoryOrigin/PipelineOrigin hierarchy into ExternalOrigins and PythonOrigins
ClosedPublic

Authored by dgibson on Wed, Oct 28, 5:26 PM.

Details

Summary

Now that we are going to be persisting pipeline origins in the run database forever, it's worth making sure that what we persist actually reflects what we want to store forever.

right now we are using origins for two different purposes:

  • PipelinePythonOrigin is used in user processes to identify the code pointer needed to execute a pipeline
  • RepositoryOrigins are used in host processes to recreate an ExternalRepository/Pipeline/Schedule

The things that make the first thing possible (including the code pointer in the origin rather than the repository name) means that changing the code pointer breaks all of your schedule origins in the database, even if the underlying repository doesn't change. It's also leaky to pass around a CodePointer in a host process.

So this diff splits those two purposes out into two different classes.

ExternalRepositoryOrigin / PipelineOrigin / ScheduleOrigin is used when you want to identify an ExternalRepository/Pipeline/Schedule in a host process, e.g. the scheduler. That's what we persist in our databases too.

RepositoryPythonOrigin / PipelinePythonOrigin is used for user processes when you need to pass around a code pointer for execution (for example, the k8s run launcher gets a code pointer and passes it into the user code execution command, and the gRPC server creates a PipelinePythonOrigin from the passes in ExternalPipelineOrigin

Test Plan

BK+Azure

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Oct 28, 7:37 PM
Harbormaster failed remote builds in B20329: Diff 24662!
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Oct 28, 10:24 PM
Harbormaster failed remote builds in B20358: Diff 24693!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 29, 3:06 AM
Harbormaster failed remote builds in B20378: Diff 24715!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 29, 4:57 AM
Harbormaster failed remote builds in B20382: Diff 24719!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 29, 2:04 PM
Harbormaster failed remote builds in B20383: Diff 24720!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 29, 5:33 PM
Harbormaster failed remote builds in B20414: Diff 24757!
dgibson retitled this revision from Refactor repo/pipeline/schedule origins so that they are fully reconstructable using the original repository location to Split RepositoryOrigin/PipelineOrigin hierarchy into ExternalOrigins and PythonOrigins.Thu, Oct 29, 6:16 PM
dgibson added a reviewer: prha.
prha requested changes to this revision.Fri, Oct 30, 2:30 PM

This makes a lot of sense to me.

This will shift all the stored origin ids in schedule storage, right? Is the plan to land this before 0.10.0?

python_modules/dagster-graphql/dagster_graphql/schema/external.py
122–123

I think this will break the repository information in dagit?

We'll need to handle nullable values...

(would be caught if we had codePointer as a requested field in a dagster-graphql test that I don't think we have)

python_modules/dagster/dagster/core/host_representation/external.py
252–253

should we put a deprecation warning on this?

python_modules/dagster/dagster/grpc/server.py
357 ↗(On Diff #24777)

thankyou

374 ↗(On Diff #24777)

sorry

This revision now requires changes to proceed.Fri, Oct 30, 2:30 PM

Split out the part of this that adds the new origin (not breaking, D4994) into the part that eliminates the old origin hierarchy other than the XXPythonOrigin classes (breaking, this diff, can land closer to 0.10.0?)

revert unintentional changes

to your queue for now

This revision now requires changes to proceed.Mon, Nov 2, 6:38 PM

rebase in case @prha wants to layer on top of this for his job storage changes

rebase onto non broken master

prha requested changes to this revision.Thu, Nov 19, 9:03 PM
prha added inline comments.
python_modules/dagster-graphql/dagster_graphql/schema/schedules/schedule_state.py
127

I'm getting errors in the dauphin constructor here, when the location origin is a ManagedGrpcPythonEnvRepositoryLocationOrigin

This revision now requires changes to proceed.Thu, Nov 19, 9:03 PM

roll in prha's diff into this diff (since it was built on top of this diff, and so that this won't break anything when it lands - unfortunately it also makes the diff even larger)

incorporate latest origin graphql changes

This revision is now accepted and ready to land.Fri, Nov 20, 7:28 PM