Page MenuHomeElementl

RFC solid-hook context.solid_exception
ClosedPublic

Authored by yuhan on Apr 7 2021, 7:33 AM.

Details

Summary

user problem: https://github.com/dagster-io/dagster/issues/3631
users want to get detailed info about an error in failure_hook

nullable context.solid_exception

@failure_hook
def slack_on_failure(context):
    traceback.print_tb(context.solid_exception.__traceback__)

Pros:

  • No breaking change for existing hook users, as it only adds an additional attribute to HookContext.
  • It’s explicitly called “error” so users don’t need to validate the event
  • Users get the actual exception thrown in the body of the solid

Cons:

  • Nullable yield so need to handle cases when accessing in success_hook.

Alternatives : https://elementl.quip.com/uDMaAoa8Yx5O/Better-Hooks#fVNACAJDyPT

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 7 2021, 7:50 AM
Harbormaster failed remote builds in B28534: Diff 35017!
yuhan requested review of this revision.Apr 7 2021, 11:17 PM
yuhan added reviewers: sashank, prha.

quick drive by thoughts

  • since hooks happen in process I think we should have the actual exception object available on failure
  • I think we should probably have all the events from the step available, then maybe utilities for finding the success/failure hook from the full list

hold until we figure out a plan for the pipeline hook

python_modules/dagster/dagster/core/execution/context/system.py
585

solid hooks are event based and we invoke hooks after we wrap sys.exc_info() into SerializableErrorInfo - code, so it's easier to return SerializableErrorInfo directly.

alternatively, we could 1) deserialize it back to the real exception which ideally needs to replicate the serialization logic, or 2) pipe through sys.exc_info() from the hook invocation, e.g. here

pros:

  • the name is more obvious than event or failure_event
  • users dont have to do event.event_specific_data.error
  • it gives error info about the step failure and is exactly the same as what users would get in the logs.

cons: SerializableErrorInfo isn't really public

per offline convo with @alangenfeld, we think it's important to make the actual exception object available on the failure hook

yuhan retitled this revision from RFC solid-hook context.failure_event to RFC solid-hook failure_hook provide access to the error.Tue, Apr 20, 6:12 PM

I like the idea of just exposing error_info. Even if we were to provide a more generic API, error_info would still be useful as a convenience.

The question of what to do with SerializableErrorInfo is tough. I don't think there's any kind of generic way that we can turn a SerializableErrorInfo into an Error, is there? I.e. we can't be sure that the error class has a constructor that supports any particular arguments. So my instinct would just be to expose it. What do you think @alangenfeld?

context.error returns the actual exception object

yuhan retitled this revision from RFC solid-hook failure_hook provide access to the error to RFC solid-hook failure_hook provide access to the original error.Thu, Apr 22, 11:01 PM
yuhan edited the summary of this revision. (Show Details)
yuhan added reviewers: sandyryza, alangenfeld, prha.
python_modules/dagster/dagster/core/execution/context/system.py
346

this would only be used by solid hooks (failure_hook in particular) for passing the original exception object from [1] to [2] into the HookContext. bc solid hooks will always execute in the same process as the step execution does, so this step_exception is step-scoped. so it'd make sense to just put on StepExecutionContext as opposed to IStepContext or IPipelineContext.

463–464

@alangenfeld how do you feel about this assignment.. 😅 - can make it private so users wont use or any other suggestions?

besides, I leave it as the original exception here. we can also return a nicer-formatted TrackbackException

python_modules/dagster/dagster/core/execution/plan/execute_plan.py
94

[2]

203

[1]

python_modules/dagster/dagster/core/execution/context/system.py
585

def exception ? since its not always an error object

595–598

hm maybe just let it be None? not sure if it matters to throw here

python_modules/dagster/dagster/core/execution/plan/execute_plan.py
203–217

nit: move these two to just be one right under the except

232–283

hm sucks to have to dupe this but not the worst in the world

leaving open for others inputs - I like this

python_modules/dagster/dagster/core/execution/context/system.py
585

maybe solid_exception ? I guess if we were for example going to make outputs available for success what would it be

def outputs(self) -> Dict[str, Any]

def solid_outputs(self) -> Dict[str, Any]

The step_exception trick wouldn't work if we ever want to execute the hook in a different process than the solid, right? Is that an avenue we want to keep open?

yes and i think it's ok because solid hooks being in-process could be one of the key distinction between solid hooks and "pipeline hooks" (naming pending)

python_modules/dagster/dagster/core/execution/context/system.py
585

def solid_exception(self) -> BaseException
def solid_outputs(self) -> Dict[str, Any]
?

besides, i think technically soild_outputs could also be available for failure hook, as you can yield multiple outputs and the failure happens in between - in this case certain outputs have been yielded.

595–598

bc mentioned above, outputs can also be on failure hook, i think it'd make sense to have all these new property on HookContext rather splitting it. so if a user is trying to do context.solid_exception in a success solid, either we error like this, or return None which indicates there's no exception? im open to either

python_modules/dagster/dagster/core/execution/plan/execute_plan.py
232–283

def sucks but also *a little nice* as in we wont blindly pass all exception, like in case (3) we would only capture the dagster_user_error.user_exception

python_modules/dagster/dagster/core/execution/context/system.py
595–598

nullable for now. open to feedback

python_modules/dagster/dagster/core/execution/plan/execute_plan.py
203–217

good call out and it raises a semantic question bc the last else won't fire a failure event:
if we think all exceptions should be captured and pass via context.solid_exception, we'll move it up.
if we think context.solid_exception is not None == solid emits a failure event, we should skip the last else which handles entry.

python_modules/dagster/dagster/core/execution/context/system.py
585

solid_output_values here: D7591 - again, open to rename

yuhan retitled this revision from RFC solid-hook failure_hook provide access to the original error to RFC solid-hook context.solid_exception.Sat, Apr 24, 2:53 AM
yuhan edited the summary of this revision. (Show Details)

I'm ok with including the step exception if Alex is. Otherwise this LGTM. Will leave to @alangenfeld for final accept.

This revision is now accepted and ready to land.Tue, Apr 27, 4:46 PM
This revision was automatically updated to reflect the committed changes.