
[RFC] Instance health query
Needs Revision, Public

Authored by johann on Feb 27 2021, 1:24 AM.

Details

Summary

A GraphQL query that reports overall instance health. It has two use cases:

  • the status indicator in Dagit nav
  • a monitoring tool that would alert when this query fails
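For the monitoring use case, a minimal checker might look like the following sketch. The query text and the `instance { health { isHealthy } }` field path are hypothetical (this revision's actual schema fields are not shown here), and `fetch` is an assumed, injected transport so the decision logic can be exercised without a running Dagit.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical query: the real field names in this revision's schema may differ.
HEALTH_QUERY = """
query InstanceHealthQuery {
  instance {
    health {
      isHealthy
    }
  }
}
"""


def should_alert(response: Dict[str, Any]) -> bool:
    """Return True if a GraphQL response means the instance needs attention.

    Alerts when the query returned errors, when the expected data is missing,
    or when the (hypothetical) isHealthy flag is false.
    """
    if response.get("errors"):
        return True
    data = response.get("data") or {}
    health = (data.get("instance") or {}).get("health")
    if health is None:
        return True
    return not health.get("isHealthy", False)


def check_once(fetch: Callable[[str], str]) -> bool:
    """Run the query through an injected transport and decide whether to alert.

    `fetch` takes the query text and returns the raw JSON body. A transport
    exception or an unparseable body is treated as an alert as well.
    """
    try:
        response = json.loads(fetch(HEALTH_QUERY))
    except Exception:
        return True
    return should_alert(response)


if __name__ == "__main__":
    healthy_body = '{"data": {"instance": {"health": {"isHealthy": true}}}}'
    print(check_once(lambda query: healthy_body))
```

Treating an unreachable endpoint or malformed body as an alert is a deliberate choice in this sketch: from a monitor's point of view, a health query that cannot be answered is itself a failure.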
Test Plan

unit

Not sure how to mock repository location failures; I tried to copy the approach in python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_reload_repository_location.py without success.

Diff Detail

Repository
R1 dagster
Branch
instance-alert (branched from master)
Lint
Lint Skipped
Unit
No Test Coverage

Event Timeline

johann retitled this revision from Instance health query to [RFC] Instance health query. (Feb 27 2021, 1:24 AM)
johann edited the test plan for this revision.
johann added reviewers: dgibson, alangenfeld.
johann edited the summary of this revision.
python_modules/dagster-graphql/dagster_graphql/schema/instance.py, line 88

Do the more specific fields belong here? It seems natural to want to expose them along with an unhealthy alert, but you could instead query the other endpoints for them.
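To make the question concrete, here is a sketch of the "include them here" option: a single summary flag returned together with the details that produced it. It assumes repository location load errors are the "more specific fields"; `LocationStatus`, `InstanceHealth`, and `summarize` are hypothetical names, not part of this revision or of dagster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocationStatus:
    """Hypothetical per-location result: a name plus an error message if loading failed."""
    name: str
    error: Optional[str] = None


@dataclass
class InstanceHealth:
    """Hypothetical payload: one summary flag plus the specific details behind it."""
    is_healthy: bool
    location_errors: Dict[str, str] = field(default_factory=dict)


def summarize(locations: List[LocationStatus]) -> InstanceHealth:
    # Keep only the failing locations, so a client can show what is wrong
    # without a second query against the per-location endpoints.
    errors = {loc.name: loc.error for loc in locations if loc.error is not None}
    return InstanceHealth(is_healthy=not errors, location_errors=errors)
```

The alternative raised in the comment, exposing only `is_healthy` here and querying the other endpoints for detail, keeps this type small at the cost of a second round trip whenever something is wrong.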

johann published this revision for review. (Feb 27 2021, 1:29 AM)

Is there anything specific you are looking for feedback on here, given it's an RFC?

not sure how to mock repo location failures, tried to copy python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_reload_repository_location.py without success

I'd definitely worry about this going stale or breaking if it's not effectively under test. You might need to manually create a context with a test or mock Workspace (or repository location manager, or whatever fits) to trigger the failure behavior.
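A minimal sketch of the suggested approach, assuming health is derived by trying to load each repository location: hand the health logic stand-in locations built with `unittest.mock`, and use `side_effect` to make one of them raise. The `RepositoryLocation` protocol, `location_errors`, and `make_location` are hypothetical; dagster's real workspace and repository location classes have different interfaces, so an actual test would mock whichever object the resolver reads from.

```python
from typing import Dict, Iterable, Optional, Protocol
from unittest import mock


class RepositoryLocation(Protocol):
    """Hypothetical interface; dagster's real repository location classes differ."""
    name: str

    def load(self) -> None: ...


def location_errors(locations: Iterable[RepositoryLocation]) -> Dict[str, str]:
    """Try to load each location and record the ones that raise."""
    errors: Dict[str, str] = {}
    for loc in locations:
        try:
            loc.load()
        except Exception as exc:  # any load failure counts against health
            errors[loc.name] = str(exc)
    return errors


def make_location(name: str, failure: Optional[Exception] = None) -> mock.Mock:
    """Build a stand-in location whose load() succeeds, or raises `failure`."""
    loc = mock.Mock()
    # `name` is a special Mock constructor argument, so set it as an attribute afterwards.
    loc.name = name
    loc.load.side_effect = failure
    return loc


if __name__ == "__main__":
    healthy = make_location("ok")
    broken = make_location("user_code", ConnectionError("unreachable"))
    print(location_errors([healthy, broken]))
```

Because the failure is injected through `side_effect`, the test does not depend on a real gRPC server or subprocess actually failing, which is what made the reload-test approach hard to reuse.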

python_modules/dagster-graphql/dagster_graphql/schema/instance.py, lines 103–108

If one of my Dagit replicas has a (possibly transient) network failure reaching a user deployment, is my instance unhealthy?
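One possible policy, sketched only to make the question concrete (it is not what this revision implements): count consecutive failures per location and report unhealthy only once a threshold is reached, so a single transient error does not flip the status. `ConsecutiveFailureTracker` and its `threshold` default are hypothetical. State like this lives in each Dagit process, so two replicas could still disagree, which is part of what the comment asks.

```python
from typing import Dict


class ConsecutiveFailureTracker:
    """Hypothetical policy: a location counts as unhealthy only after
    `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self._failures: Dict[str, int] = {}

    def record(self, location: str, ok: bool) -> bool:
        """Record one check result and return whether the location is now unhealthy."""
        # A success resets the streak; a failure extends it.
        if ok:
            self._failures[location] = 0
        else:
            self._failures[location] = self._failures.get(location, 0) + 1
        return self.is_unhealthy(location)

    def is_unhealthy(self, location: str) -> bool:
        return self._failures.get(location, 0) >= self.threshold
```

A threshold of 1 reproduces "any failure is unhealthy"; larger values trade detection latency for fewer alerts on blips.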

This revision now requires changes to proceed. (Mar 12 2021, 4:08 PM)