
Add an error boundary within get_pipeline_run_observable to prevent dagit hanging on message parsing failures

Authored by dgibson on Aug 2 2021, 2:39 PM.



For reasons we honestly don't fully understand, raising an exception within this event parsing code doesn't give the Postgres event listener a chance to clean up, which leaves it hanging. This change adds an error boundary so that uncaught exceptions no longer escape the Observable callback.
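As a rough illustration of the error-boundary idea, here is a minimal sketch and not the code in this diff. The names `with_error_boundary` and `parse_and_collect` are hypothetical, and the "parsing" is just `int()` standing in for the real event deserialization. The wrapper catches anything raised by the callback and logs it, so the caller never sees the exception.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)

BatchCallback = Callable[[List[str]], None]


def with_error_boundary(callback: BatchCallback) -> BatchCallback:
    """Wrap a batch callback so exceptions are logged instead of propagating.

    Hypothetical helper for illustration only.
    """

    def guarded(batch: List[str]) -> None:
        try:
            callback(batch)
        except Exception:
            # Swallow the error so it never reaches the caller (in the real
            # code, the thread driving the event listener). The whole batch
            # is dropped, which is the trade-off of this approach.
            logger.exception(
                "Skipping batch of %d events after a parsing failure", len(batch)
            )

    return guarded


def parse_and_collect(sink: List[int]) -> BatchCallback:
    """Build a stand-in callback that parses each raw message with int()."""

    def callback(batch: List[str]) -> None:
        # Parse the full batch first so a failure leaves the sink untouched.
        parsed = [int(raw) for raw in batch]
        sink.extend(parsed)

    return callback
```

Feeding the wrapped callback a batch containing an unparseable message drops that batch and lets later batches through, which is roughly the behavior the test plan checks for by hand.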

Test Plan

Simulate an error within the event parsing code and confirm that the failure no longer bricks dagit.

Diff Detail

Repository: R1 dagster
Branch: errorboundary (branched from master)
Lint: Lint Passed
Unit: No Test Coverage

Event Timeline

dgibson published this revision for review. Aug 2 2021, 3:05 PM

I'm not actually sure this is right, even though it fixes the hangs: it will cause batches of logs to just be skipped. Ideally there would be some way to still raise the exception but give the observable time to clean up through its standard mechanisms.
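For contrast, the "still raise, but clean up first" shape described in the comment above could look something like the sketch below. Everything here is an assumption for illustration: `deliver_then_cleanup`, its parameters, and the idea that cleanup is a plain callable are all hypothetical, and this says nothing about the actual root cause of the hang.

```python
from typing import Callable, Iterable, List


def deliver_then_cleanup(
    batches: Iterable[List[str]],
    callback: Callable[[List[str]], None],
    cleanup: Callable[[], None],
) -> None:
    """Deliver batches to the callback; always run cleanup before any error escapes.

    Hypothetical helper for illustration only.
    """
    try:
        for batch in batches:
            callback(batch)
    finally:
        # Runs on both the success and failure paths. If the callback
        # raised, the exception continues to propagate after cleanup.
        cleanup()
```

Unlike the error boundary, nothing is silently skipped here: the first failing batch stops delivery, cleanup runs, and the caller still sees the original exception.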

I think @alangenfeld is in the process of unearthing a better fix here (fixing the root cause of the deadlock vs. preventing errors from getting thrown)