Page MenuHomeElementl

allow step failure events to have no serialized error
ClosedPublic

Authored by dgibson on Aug 2 2021, 1:06 PM.

Details

Summary

StepFailureData lets the error passed in be empty, but then when the event log query goes to create the event, it throws if there is no error set.

This diff makes the server more resilient to empty errors, and changes the frontend to handle step failure events with no serialized error data (falling back to rendering the message of the event instead of the full error, which we have pretty good frontend support for already)

However, before we should land this, we should gain understanding of why this serverside graphql error seems to *completely break the dagit server and stop it from serving any more request until the server is restarted*, at least when it is thrown in certain situations.

Test Plan

BK, raise a step failed error using the k8s executor, dagit no longer bricks, logs from that run render reasonably

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

js_modules/dagit/packages/core/src/runs/LogsRow.tsx
82

@alangenfeld i looked over https://sourcegraph.com/github.com/dagster-io/dagster/-/commit/b57b91ea7666da41a3dbc610116aed500c647d84 and it wasn't immediately clear to me what the consequences were of removing this line - the reason I did was that it looked bad for step failure events that were just a single line (due to having no stack trace to fill up space - you couldn't even see the context, it was just a big "Click to expand").

what condition did we add where we have a step failure with no exception?

js_modules/dagit/packages/core/src/runs/LogsRow.tsx
82

I believe the thinking was that the context header based on the source type is hidden information if the message isn't long enough to trigger the dialog window

step failure event with no serialized error info: https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/libraries/dagster-k8s/dagster_k8s/executor.py?L233:42

The data classes don't yell and allow an empty error to be passed in (and such events are now in user DBs for better or for worse)

make sure we update the mypy type on StepFailureData which is also not set as Optional

alangenfeld added inline comments.
js_modules/dagit/packages/core/src/runs/LogsRow.tsx
82

we should have @dish do a follow up pass for cleanup on these to be safe

This revision is now accepted and ready to land.Aug 2 2021, 5:02 PM
js_modules/dagit/packages/core/src/runs/LogsRow.tsx
82

Do you have before/after screenshots?

This revision was landed with ongoing or failed builds.Aug 2 2021, 5:39 PM
This revision was automatically updated to reflect the committed changes.