Page MenuHomeElementl

Refactor step handler to pass a context object
ClosedPublic

Authored by johann on May 13 2021, 8:05 PM.

Details

Summary

Previously I think I gave too much weight to getting the step handler abstraction to work with the existing multiprocess executor machinery. I think the abstraction is actually more useful if we limit it's parameters, for example in a case where we want to call a step handler across a process boundary. This refactors the arguments to a context object that can be serialized and deserialized if necessary, but doesn't add overhead otherwise.

Test Plan

integration

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

johann added reviewers: dgibson, alangenfeld.
johann retitled this revision from Refactor step handler to Refactor step handler to only use ExecuteStepArgs.
Harbormaster returned this revision to the author for changes because remote builds failed.May 13 2021, 8:43 PM
Harbormaster failed remote builds in B30536: Diff 37538!
Harbormaster returned this revision to the author for changes because remote builds failed.May 13 2021, 9:10 PM
Harbormaster failed remote builds in B30542: Diff 37545!
Harbormaster returned this revision to the author for changes because remote builds failed.May 13 2021, 9:34 PM
Harbormaster failed remote builds in B30550: Diff 37556!

I was under the impression that the main goal of ExecuteStepArgs was to have a serializable input to the execute_steps CLI command (something that you can JSON serialize), so its a little surprising to me to see it given this new job as well (the primary interface into step handlers in the new executor). Is the idea that this same class will be usable on the other side of a process boundary that takes in an ExecuteStepArgs?

"I think the abstraction is actually more useful if we limit it's parameters." - maybe this is the part I'm not following - what becomes more useful here? Is the idea that lots of step handlers will involve executing an execute_step CLI call in some environment, so it reduces boilerplate to have that all packaged up already?

python_modules/libraries/dagster-k8s/dagster_k8s/executor.py
63

if there were ever, completely hypothetically of course, a StepDelegatingExecutor that didn't want to include the instance ref in any ExecuteStepArgs they wanted to pass around, how would that work with this new API? make a new copy with instance_ref None?

dgibson requested changes to this revision.May 17 2021, 1:51 PM

to you for q - might make more sense to me after i work through the other non-step-isolated diff

This revision now requires changes to proceed.May 17 2021, 1:51 PM

Good points. My thinking was that step handlers exist to call the dagster api execute_step call, and that they should be able to do it with these args.

pass step handler context

johann retitled this revision from Refactor step handler to only use ExecuteStepArgs to Refactor step handler to pass a context object.May 18 2021, 9:41 PM
johann edited the summary of this revision. (Show Details)
johann edited the summary of this revision. (Show Details)
johann edited the summary of this revision. (Show Details)

up

python_modules/dagster/dagster/core/executor/step_delegating/step_handler/base.py
49

naming/other convention to follow?

dgibson added a subscriber: prha.
dgibson added inline comments.
python_modules/dagster/dagster/core/executor/step_delegating/step_delegating_executor.py
37

i wonder if is_dagster_event should be an index in the DB

cc @prha relevant to our discussion today around logs vs. events

48

sorry if I asked this earlier - is instance_ref=None here correct in general?

83–90

are we confident that steps will always be terminated individually?

python_modules/dagster/dagster/core/executor/step_delegating/step_handler/base.py
11

Handler not Hander

49

InstanceRef is the closest thing I can think of

def serialize(self)?

I feel like 'tuple' doesn't necc. imply 'serializable'

This revision is now accepted and ready to land.May 19 2021, 6:39 PM
python_modules/dagster/dagster/core/executor/step_delegating/step_delegating_executor.py
37

ah...

I guess I don't have any idea what the ratio of structured vs unstructured events in the event log are. Seems like this could be an optimization that we do later on though.