
Submit API
Abandoned · Public

Authored by cdecarolis on Mar 4 2021, 9:44 PM.

Details

Summary

This diff adds a submit API, effectively performing the functionality of the dagster pipeline launch CLI.
Things included in this diff as of now:

  • Workspace.get API: resolves the closest workspace.yaml relative to the working directory of the callsite.
  • Workspace.find_external_target API: a context manager that searches locations within workspace for a specific target (right now this is a pipeline name), and returns external + location information pointing to that target.
  • submit - the actual submit API itself. This API launches a run, and then returns a SubmitResult object for monitoring the run.
  • wait_for_result: a context manager allowing outputs to be viewed once the run has completed (see the usage sketch after this list).
  • A mypy error fix - incorrect typing for SolidHandle.with_ancestor
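
A minimal usage sketch based on the names in this summary (Workspace.get, submit, SubmitResult, wait_for_result). Module paths are taken from the files touched in this diff; the call shapes and where wait_for_result lives are assumptions, not the final API.

```python
# Sketch only: names come from this diff's summary, not a released API.
from dagster import DagsterInstance
from dagster.cli.workspace.workspace import Workspace  # path per this diff
from dagster.core.execution.submit import submit       # path per this diff

workspace = Workspace.get()   # resolves the closest workspace.yaml
instance = DagsterInstance.get()

# submit launches a run and returns a SubmitResult for monitoring it;
# the exact keyword arguments here are a guess
result = submit("my_pipeline", workspace=workspace, instance=instance)

# wait_for_result is described as a context manager for viewing outputs once
# the run has completed; attaching it to SubmitResult is an assumption
with result.wait_for_result() as run_result:
    print(run_result)
```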

Known issues:

  • cannot resolve a workspace from a file if the repo uses relative imports. I think this issue might actually pre-date this diff, because I couldn't find any tests exercising this functionality in the codebase. It is also possible that this isn't actually an issue, but just something that we can't handle.
  • the execution results are a tangled mess. Planning on doing a refactor here.
Test Plan

Tests for new workspace methods and submit API.

Diff Detail

Repository
R1 dagster
Branch
launch
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

python_modules/dagster/dagster/core/execution/launch.py
66 ↗(On Diff #32855)

pipeline_def will be replaced by target or something, and we'll be able to ingest other things (just pipeline name, repo, etc). The idea is that regardless of the target, we always want to resolve to a workspace, and then from the workspace back into an executable thing (pipeline). Workspace -> pipeline is always gonna be the same, target -> workspace will change depending on the target.
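
A rough sketch of the two-step resolution being described: target -> workspace changes per target kind, workspace -> pipeline stays the same. The helper names are illustrative, and the (external, location) unpacking of find_external_target is an assumption.

```python
import contextlib

from dagster.cli.workspace.workspace import Workspace  # path per this diff


def resolve_target_to_workspace(target):
    # target -> workspace: varies with the kind of target we were given
    if isinstance(target, Workspace):
        return target                 # already a workspace
    if isinstance(target, str):
        return Workspace.get()        # bare pipeline name: nearest workspace.yaml
    raise TypeError("Unsupported target type: {}".format(type(target)))


@contextlib.contextmanager
def resolve_workspace_to_pipeline(workspace, pipeline_name):
    # workspace -> pipeline: always the same step, via the
    # find_external_target context manager added in this diff
    with workspace.find_external_target(pipeline_name) as (external, location):
        yield external, location
```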

86 ↗(On Diff #32855)

add support

87 ↗(On Diff #32855)

^

88 ↗(On Diff #32855)

^

89 ↗(On Diff #32855)

I think we probably don't want to support this

92 ↗(On Diff #32855)

I actually want this to be a context manager, where at the end of spin up, we also spin up io managers and are able to query for results before the end of the run.
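
A sketch of the context-manager shape being suggested, assuming the submit() function from this diff; the io-manager spin-up is left as a comment since that machinery isn't shown here.

```python
import contextlib

from dagster import DagsterInstance
from dagster.core.execution.submit import submit  # path per this diff


@contextlib.contextmanager
def submitted_run(target, workspace, instance=None):
    # Illustrative shape only: launch the run, then make results queryable
    # before the run finishes.
    instance = instance or DagsterInstance.get()
    result = submit(target, workspace=workspace, instance=instance)
    try:
        # (io manager / resource spin-up would happen here so outputs can be
        # queried while the run is still in progress)
        yield result
    finally:
        # tear down anything opened for result access
        pass
```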

95 ↗(On Diff #32855)

as discussed in https://dagster.phacility.com/D6727, we can probably move a lot of this resolution to Workspace itself.

python_modules/dagster/dagster_tests/cli_tests/command_tests/basic_repo/workspace.yaml
3

test multi repo, from module, from package, etc

One thing that I'm concerned about is communicating the failure modes of the resolution mechanism that we're doing here. It's a bit of magic, and there are ways that each can fail.

python_modules/dagster/dagster/core/execution/launch.py
35–47 ↗(On Diff #32855)

this del prune stuff feels real bad - feels like there's gotta be a better way

55–63 ↗(On Diff #32855)

did you find WorkspaceRequestContext / WorkspaceProcessContext? I think a lot of this code is partial re-implementations of stuff we have elsewhere.

python_modules/dagster/dagster/core/execution/launch.py
35–47 ↗(On Diff #32855)

Yea I'm def gonna go back through this to find something cleaner.

55–63 ↗(On Diff #32855)

yea I went through that while doing the reconstructable_from_workspace stuff. I think you're right, the only issue is that there seem to be two branching paths from the location origins - one to ReconstructableRepository, and the other to ExternalRepository. I'm still a bit unclear on how those interact, and my impression is that I could probably figure out a way of reusing much more than I am here.

python_modules/dagster/dagster/core/execution/launch.py
66 ↗(On Diff #32855)

ya, ideally any gaps in functionality make sense to attach to the existing things that are close

Execution Results, Launch -> Submit

cdecarolis retitled this revision from Launch API to Submit API. Mar 9 2021, 2:26 PM
cdecarolis added inline comments.
python_modules/dagster/dagster/core/execution/execution_results.py
95

Am aware of the volume of copy pasta here, gonna work on that.

Pushed workspace resolution onto workspace, simplified repo resolution greatly

python_modules/dagster/dagster/core/definitions/dependency.py
273

should probably pull this into its own change, but this is an incorrect typing

Split out tests for workspace and submit. Expanded tests to include modules and packages. Greatly expanded testing coverage of submit API.

cdecarolis edited the test plan for this revision. (Show Details)
cdecarolis added reviewers: schrockn, sandyryza.

Pipe through resource config to IO managers

python_modules/dagster/dagster/cli/workspace/workspace.py
217

I wouldn't expect this to be a public API.

231

the yield here is definitely rough. Maybe wrapping it in a context object or something?
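
One possible shape for that context object, so callers get named fields rather than unpacking a bare tuple; the field names here are illustrative.

```python
from typing import Any, NamedTuple


# Illustrative: what find_external_target could yield instead of a tuple.
class ExternalTarget(NamedTuple):
    external_pipeline: Any
    repository_location: Any

# inside Workspace.find_external_target, the yield would then read:
#     yield ExternalTarget(external_pipeline, repository_location)
```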

python_modules/dagster/dagster/core/execution/execution_results.py
95

The execution results stuff is a tangled mess and I'm not sure how to resolve it. I tried to refactor into a few base abstract classes and couldn't get it to work. Probably gonna have to spend a good amount of time digging through this at some point.

200

This verbosity is a super unfortunate result of us not having access to the execution context. We need to rederive and pass in a ton of information to Input/Output Contexts.

241

Use the external execution plan to retrieve the mapping key for dynamic outputs. cc @alangenfeld I'm not 100% sure if this is necessary, don't have full context here, but it mimics the behavior of execute_pipeline's output results.

python_modules/dagster/dagster/core/execution/submit.py
67

The API currently targets a workspace and a target upon that workspace. Right now, I assume it's a pipeline_name, but that should fail under collisions and so we need more specific targets as well (a la repo.pipeline_name)

67

Instance is currently mandatory, but you can imagine making this default to DagsterInstance.get().
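
A sketch combining the two points above: accept a repo_name.pipeline_name target to disambiguate collisions, and default the instance to DagsterInstance.get(). The signature is illustrative, not the diff's actual code.

```python
from typing import Optional, Tuple

from dagster import DagsterInstance


def _parse_target(target: str) -> Tuple[Optional[str], str]:
    # "repo_name.pipeline_name" disambiguates when two repos in the workspace
    # define pipelines with the same name; a bare name searches every repo.
    if "." in target:
        repo_name, pipeline_name = target.split(".", 1)
        return repo_name, pipeline_name
    return None, target


def submit(target: str, workspace, instance: Optional[DagsterInstance] = None):
    instance = instance or DagsterInstance.get()  # proposed default
    repo_name, pipeline_name = _parse_target(target)
    ...
```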

alangenfeld added inline comments.
python_modules/dagster/dagster/cli/pipeline.py
440–441

the refactor to go for here is that the cli just calls your new function instead of importing anything from cli into core
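
A sketch of that direction: the click command in dagster/cli/pipeline.py delegates to the core function, so nothing in core imports from dagster.cli. The command and argument names here are illustrative.

```python
import click

from dagster import DagsterInstance
from dagster.cli.workspace.workspace import Workspace  # cli layer
from dagster.core.execution.submit import submit       # core owns the logic


@click.command(name="launch", help="Launch a pipeline run (sketch).")
@click.argument("pipeline_name")
def launch_command(pipeline_name):
    workspace = Workspace.get()
    instance = DagsterInstance.get()
    submit(pipeline_name, workspace=workspace, instance=instance)
```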

python_modules/dagster/dagster/core/execution/execution_results.py
216–223

nit: probably should have a single log manager that we re-use for the whole result tree instead of newing them up

241

I think the right thing to do is revisit the output value fetching API instead of using the one from execute_pipeline/solid - we could probably handle these cases more elegantly if we designed for it

python_modules/dagster/dagster/core/execution/submit.py
5

this is a dubious import

10–33

i would expect this to be a stateful object so that it can do cursor fetches for events instead of refetching the entire event log over and over again

47–48

it's debatable - but I think it makes more sense to open just for output access instead of the whole result - something to get feedback from others on

49–50

even with cursored fetching we want some sleeping or something here so we don't hit the DB as fast as possible
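
A sketch of the stateful, cursored fetch with a sleep between polls, per the two comments above. `fetch_events` stands in for whatever cursored event-log accessor ends up being used; that indirection is an assumption, not an existing dagster API.

```python
import time
from typing import Callable, List


class EventPoller:
    """Illustrative: track a cursor so each poll only fetches new events,
    and sleep between polls so we don't hammer the event-log DB."""

    def __init__(self, fetch_events: Callable[[int], List], interval: float = 1.0):
        self._fetch_events = fetch_events  # cursor -> events after that cursor
        self._interval = interval
        self._cursor = 0                   # number of events already seen

    def poll(self) -> List:
        new_events = self._fetch_events(self._cursor)
        self._cursor += len(new_events)
        time.sleep(self._interval)
        return new_events
```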

python_modules/dagster/dagster_tests/cli_tests/workspace_tests/workspace_get_tests/workspace.yaml
7–14

these are basically the same thing - don't think we need both

feel a bit off about adding a repo to the public package - it is small so maybe not worth worrying about

This revision now requires changes to proceed. Mar 12 2021, 4:32 PM
python_modules/dagster/dagster/cli/workspace/workspace.py
28

You shouldn't need to care about the PipelineDefinition for this API - you want to be able to run it in environments where you don't have direct access to the Pipeline code and thus are unable to load the PipelineDefinition (as the 'dagster pipeline launch' command works today)

218–221

the workspace shouldn't need to create any handles or locations - they've all already been created by the time you have a workspace, you should be able to iterate through the existing locations
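
A sketch of what iterating the already-created locations could look like; the accessor names on the workspace, the location, and the external repository are assumptions about the code in this diff, not verified APIs.

```python
# Illustrative only: accessor names are assumptions, not the diff's code.
def find_pipeline_location(workspace, pipeline_name):
    for location in workspace.repository_locations:              # assumed accessor
        for repository in location.get_repositories().values():  # assumed accessor
            if pipeline_name in repository.get_pipeline_names():  # assumed accessor
                return location, repository
    return None, None
```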

227–236

as above you can ditch the pipeline definition for this command

python_modules/dagster/dagster/core/execution/execution_results.py
95

talked a bit offline about this - but I'm a little confused why 'the result of a submitted run' is in scope for this command, that seems like a separate thing

98–101

if we did this and are thinking of it as a host process operation, it probably shouldn't include user process information like the resources (anything that ends in definition)

python_modules/dagster/dagster/core/execution/submit.py
67

It's not clear to me why this is a method on Workspace - submitting runs is much more of an instance-level thing I think?

69

as above i don't think we want pipeline_def here