Page MenuHomeElementl

Surface multiple errors in daemon
ClosedPublic

Authored by johann on Jan 6 2021, 8:10 PM.

Details

Summary

Allow daemons to yield non-fatal errors to be reported in daemon heartbeats.

Each heartbeat includes the full list of errors from the last iteration. If it used errors from the current generation, it could randomly exclude errors based on if the daemon had reached them yet.

TODO: graphql +dagit side

Test Plan

Unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 6 2021, 8:27 PM
Harbormaster failed remote builds in B23765: Diff 28900!
Harbormaster returned this revision to the author for changes because remote builds failed.Jan 13 2021, 9:09 PM
Harbormaster failed remote builds in B24310: Diff 29578!
johann edited the test plan for this revision. (Show Details)
johann added reviewers: dgibson, prha.
johann published this revision for review.Jan 13 2021, 9:16 PM

Publishing early, tests will fail due to missing gql changes

python_modules/dagster/dagster/daemon/controller.py
134

This whole block is rather messy, happy to take suggestions. Maybe Dan's earlier proposal to iterate through daemons and remove them as they finish would help, I just need to remember python's rules about mutating the data under an iterator

wait for first iteration to report heartbeat

prha requested changes to this revision.Jan 21 2021, 5:08 PM

Returning to your queue, also the bk errors look legit.

python_modules/dagster/dagster/daemon/controller.py
118

is there a reason we're doing this mutation of the daemon data structure, rather than having errors be a field on the controller itself?

Maybe we could just build up a simple data structure, like a dict mapping daemon to a list of the last iteration errors?

This revision now requires changes to proceed.Jan 21 2021, 5:08 PM
js_modules/dagit/src/instance/DaemonHealth.tsx
66

Leaving how to render the list up to Dish

some clean-up nits, but seems reasonable to me.

js_modules/dagit/src/instance/DaemonHealth.tsx
45

in JS, an empty array evaluates as true.... do we need to also check daemon.lastHeartbeatErrors.length here?

python_modules/dagster/dagster/daemon/controller.py
107

style nit: just create a variable daemon_type? Or maybe change to a property so it doesn't look as much like a method call?

style nit: consider helper methods (e.g. initialize_iteration_for_type, log_exception_for_type)? not sure if it would make this a bit easier to read...

113

remove?

120

should we do a type check here?

python_modules/dagster/dagster/daemon/daemon.py
48–49

typo

51

This seems misleading... It's not generating Exception, it's generating the serializable error info.

python_modules/dagster/dagster_tests/daemon_tests/test_dagster_daemon.py
109–111

should we just remove these, or expose a method we can check on the controller?

This revision is now accepted and ready to land.Jan 28 2021, 5:22 PM
This revision was automatically updated to reflect the committed changes.