Page MenuHomePhabricator

Move launch_pipeline CLI commands to not run any user code
ClosedPublic

Authored by dgibson on Aug 2 2020, 12:42 AM.

Details

Summary

Remove all use of PipelineDefinition / ReconstructablePipeline in the CLI path, so that we can use it with GRPC servers too.

Test Plan

BK

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 2 2020, 12:58 AM
Harbormaster failed remote builds in B16325: Diff 19927!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 2 2020, 1:43 AM
Harbormaster failed remote builds in B16327: Diff 19929!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 3 2020, 4:28 AM
Harbormaster failed remote builds in B16331: Diff 19933!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 3 2020, 2:47 PM
Harbormaster failed remote builds in B16334: Diff 19937!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 3 2020, 3:18 PM
Harbormaster failed remote builds in B16339: Diff 19942!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 3 2020, 3:38 PM
Harbormaster failed remote builds in B16341: Diff 19944!
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 3 2020, 4:22 PM
Harbormaster failed remote builds in B16352: Diff 19955!
dgibson updated this revision to Diff 19962.Aug 3 2020, 5:04 PM
dgibson retitled this revision from execute_pipeline and launch_pipeline to extrnal repo/pipeline to Move execute_pipeline and launch_pipeline CLI commands to not run any user code.
dgibson edited the summary of this revision. (Show Details)
dgibson edited the test plan for this revision. (Show Details)
dgibson added reviewers: schrockn, alangenfeld, max, sashank.

up, ready for review now

dgibson published this revision for review.Aug 3 2020, 5:33 PM

couple straggler tests to sort out but this is generally complete now

I think the only last test failure here indicates that something is not working correctly with the ComputeLogManager in the BK test environment (it's not picking up stdout/stderr logs during step execution)

dgibson updated this revision to Diff 19970.Aug 3 2020, 7:00 PM

hopeful test fix

hmm dagster pipeline execute firing a pipeline execution in an un-managed grpc server is a bit worrisome. In our deployed cases the assumption is the grpc server acts as metadata-reads-only and all executions happen via run launcher. I'm not certain that dagster pipeline execute shouldn't boot up in a user process context directly and execute right there. I guess its a little weird if you are choosing out of a workspace that declares a different python env for that pipeline

curious if @schrockn has thoughts here

we could also add a method on RunLauncher that executes the run synchronously? (instead of the repo location, I just put it there since that's where it was before - there's even still a lingering broken callsite in dagster-graphql)

I assumed that's what the semantics were here - launch starts it asynchronously, execute runs it synchronously, both are host processes.

ya in my head dagster pipeline execute is for "just execute this pipeline right here in my terminal so i can debug it / see what happens", which might not be *right* but is what gives me pause here

alangenfeld requested changes to this revision.Aug 3 2020, 10:04 PM

can you split out the execute pipeline change to its own diff? would be good to land the launch pipeline fix ASAP

python_modules/dagster/dagster/cli/pipeline.py
479–483

accidental tuple, classic

This revision now requires changes to proceed.Aug 3 2020, 10:04 PM
dgibson updated this revision to Diff 20013.Aug 4 2020, 12:50 AM

keep the execute command running through user code for now (will send that out separately - left in the infra for it though, like the new method on repositorylocation)

dgibson retitled this revision from Move execute_pipeline and launch_pipeline CLI commands to not run any user code to Move launch_pipeline CLI commands to not run any user code.Aug 4 2020, 12:54 AM
dgibson updated this revision to Diff 20040.Aug 4 2020, 4:00 PM

test fix

schrockn requested changes to this revision.Aug 5 2020, 4:23 PM

This looks very good overall. I just wanted to make sure you saw the feedback around getting rid of DagsterInstance.get() calls. I could be missing something, but seems possible to eliminate them here

python_modules/dagster/dagster/cli/pipeline.py
306–310

can we not make the dagster instance required for all of these while you are in here? I would really, really like to remove most calls to DagsterInstance.get() in the codebase (ideally just at the top of tool entry points). Seems like it shouldn't be too hard just to ensure it is threaded by all callsites?

334

oh man

425–435

it seems we have the opportunity here to have shared logic between this and other calls sites (for example in dagster-graphql). not blocking for this diff, but something to investigate

468–470

can we name this external_pipeline_subset or something to distinguish it from the original

This revision now requires changes to proceed.Aug 5 2020, 4:23 PM
dgibson added inline comments.Aug 5 2020, 4:37 PM
python_modules/dagster/dagster/cli/pipeline.py
306–310

the only reason to pass this in is for tests, but there are other ways to stub in DagsterInstance values (like writing to dagster.yaml in the test filesystem).

dgibson added inline comments.Aug 5 2020, 4:44 PM
python_modules/dagster/dagster/cli/pipeline.py
306–310

oh, wait, I misread what you said. Yeah, that's no problem.

dgibson updated this revision to Diff 20152.Aug 5 2020, 4:59 PM

feedback - re:DagsterInstance.get(), my last piece of work for 0.9 task is a big refactor of the test that calls execute_execute_command in a bunch of places, so I'll do that, but roll it into that change coming later today to avoid conflict hell, if that's alright

schrockn accepted this revision.Aug 5 2020, 9:16 PM

startreknod

makes sense to me

This revision is now accepted and ready to land.Aug 5 2020, 9:28 PM

https://dagster.phacility.com/D4112 removes the optional DagsterInstance params