Previously I think I gave too much weight to getting the step handler abstraction to work with the existing multiprocess executor machinery. The abstraction is actually more useful if we limit its parameters, for example when we want to call a step handler across a process boundary. This refactors the arguments into a context object that can be serialized and deserialized when necessary, but adds no overhead otherwise.
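For concreteness, a minimal sketch of the shape I have in mind (names here are illustrative, not the actual dagster classes):

```python
from typing import NamedTuple, Optional, Sequence

# Illustrative sketch only: the handler's inputs collapse into one flat
# value object instead of a long parameter list. Because every field is
# itself serializable, the whole context can cross a process boundary.
class StepHandlerContext(NamedTuple):
    run_id: str
    step_keys_to_execute: Sequence[str]
    instance_ref: Optional[object]  # stand-in for a serializable InstanceRef

def launch_step(context: StepHandlerContext) -> None:
    # An in-process handler reads fields directly off the context; a
    # remote handler deserializes the same object first. The in-process
    # case pays no serialization cost.
    for step_key in context.step_keys_to_execute:
        print(f"launching {step_key} for run {context.run_id}")
```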
I was under the impression that the main goal of ExecuteStepArgs was to be a serializable input to the execute_steps CLI command (something you can JSON-serialize), so it's a little surprising to see it take on this new job as well (the primary interface into step handlers in the new executor). Is the idea that this same class will be usable on the other side of a process boundary, i.e., that the receiving process takes in an ExecuteStepArgs?
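i.e., my mental model was roughly this round trip (a sketch, assuming the serdes helpers keep their current names):

```python
from dagster.serdes import (
    deserialize_json_to_dagster_namedtuple,
    serialize_dagster_namedtuple,
)

def round_trip(execute_step_args):
    # parent process: pack the args into JSON to hand to the
    # execute_steps CLI
    args_json = serialize_dagster_namedtuple(execute_step_args)
    # child process: the CLI deserializes an identical value object
    return deserialize_json_to_dagster_namedtuple(args_json)
```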
"I think the abstraction is actually more useful if we limit it's parameters." - maybe this is the part I'm not following - what becomes more useful here? Is the idea that lots of step handlers will involve executing an execute_step CLI call in some environment, so it reduces boilerplate to have that all packaged up already?
If there were ever, completely hypothetically of course, a StepDelegatingExecutor that didn't want to include the instance ref in any ExecuteStepArgs it passed around, how would that work with this new API? Make a new copy with instance_ref=None?
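e.g., would it just be this (assuming ExecuteStepArgs stays a plain NamedTuple)?

```python
def strip_instance_ref(execute_step_args):
    # NamedTuple._replace returns a copy with the field swapped out; the
    # receiving process would then need to locate its instance some
    # other way (e.g. from DAGSTER_HOME)
    return execute_step_args._replace(instance_ref=None)
```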
I wonder if is_dagster_event should be an index in the DB.
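Something like this in a migration, if we went that route (assumes the event_logs table keeps NULL in dagster_event_type for unstructured log lines; names are from memory):

```python
from alembic import op

def upgrade():
    # index the discriminator column so "structured events only" queries
    # can skip the raw log rows instead of scanning the whole table
    op.create_index(
        "idx_event_logs_dagster_event_type",
        "event_logs",
        ["dagster_event_type"],
    )

def downgrade():
    op.drop_index("idx_event_logs_dagster_event_type", "event_logs")
```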
cc @prha, relevant to our discussion today around logs vs. events
Sorry if I asked this earlier, but is instance_ref=None here correct in general?
are we confident that steps will always be terminated individually?
Handler, not Hander.
InstanceRef is the closest thing I can think of
I feel like 'tuple' doesn't necessarily imply 'serializable'.
I guess I don't have any idea what the ratio of structured vs. unstructured events in the event log is. Seems like this could be an optimization we do later on, though.
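If we wanted that number before committing to an index, a rough count would settle it (same schema assumption as above: dagster_event_type is NULL for unstructured entries):

```python
from sqlalchemy import create_engine, text

def structured_event_ratio(conn_string: str) -> float:
    engine = create_engine(conn_string)
    with engine.connect() as conn:
        # COUNT(col) skips NULLs, so this counts structured events only
        structured, total = conn.execute(
            text("SELECT COUNT(dagster_event_type), COUNT(*) FROM event_logs")
        ).first()
    return structured / total
```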