Page MenuHomeElementl

[dagit] Make Non-Subscription GraphQL ops use HTTP
ClosedPublic

Authored by sidkmenon on Apr 30 2021, 8:42 PM.

Details

Summary

As part of an ask by a user (and generally for best practice), we are moving non-subscription GraphQL operations to use HTTP instead of a websocket connection.

The removal of the GEventExecutor is covered in GH issue #4143

Test Plan

bk

Also checked that the app behaves similarly to dagit built off of using only websockets (i.e. static JS built from master, dev JS built from branch).

When the webserver is killed, dagit does not immediately crash or anything. However, a few new behaviors are seen:

  1. As in the picture below, we can see that a red error bar is shown in the top right-hand corner, and the 'Runs' page renders a bit differently when the backing GraphQL server is offline:

Screen Shot 2021-05-06 at 1.53.33 AM.png (1×3 px, 404 KB)

These behaviors are not seen on the production dagit, which looks like this - the only indication that the server is down,besides the gray circle in the corner is console print statements

Screen Shot 2021-05-06 at 2.00.17 AM.png (1×3 px, 890 KB)

This is behavior that is due to the difference between websockets & http - the websocket response does not have an 'error' field on the JS side so the components render differently. Not sure if throwing away the error in this case (to conform to the old behavior) makes sense though -- feels like errors are important to pay attention to!

  1. (related) Some code paths that were not utilized before are now exercised. As detailed in https://github.com/apollographql/apollo-link/issues/547, the onError link does not run properly with a WebSocketLink - therefore, the logic set up here that creates the red banner in the corner in this version of Dagit upon network failure does not render at all / is dead code in the websocket-only Dagit.
  1. Checked that moving HTTP to the SyncExecutor still works fine - triggered a run using the toys/many_events pipeline, kept the Gantt chart open for that in one tab (which requires a subscription), and then clicked around as many panes in dagit as I could. Everything was still snappy and loaded fine.

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

make websocket construction eager

Fixing webserver lock up by changing HTTP executor to be default SyncExecutor

sidkmenon edited the test plan for this revision. (Show Details)
sidkmenon edited the test plan for this revision. (Show Details)
sidkmenon edited the test plan for this revision. (Show Details)
sidkmenon edited the test plan for this revision. (Show Details)
sidkmenon edited the test plan for this revision. (Show Details)

This is behavior that is due to the difference between websockets & http - the websocket response does not have an 'error' field on the JS side so the components render differently. Not sure if throwing away the error in this case (to conform to the old behavior) makes sense though -- feels like errors are important to pay attention to!

Hmm - in the screenshot shown we show "GraphQLError - see console for details" but the issue is just that there is no connectivity right? I personally think thats a rough regression. Replacing the contents of the page a big error warning when a polling-refresh query fails due to network connectivity seems bad to me.

Im not sure the exact fix - but I do think we should distinguish "the server sent back an error because something bad happened" from "you couldn't reach the server"

The removal of the GEventExecutor

I am pretty sure this is fine to do - and is effectively replicating the way the all-websocket approach worked - but we should add a test cases section specifically about making sure htttp requests can go through while the active run event log subscription is firing data back. You can do this by navigating around while another tab is actively watching a run of many_events from the toyes repo or something.

This is behavior that is due to the difference between websockets & http - the websocket response does not have an 'error' field on the JS side so the components render differently. Not sure if throwing away the error in this case (to conform to the old behavior) makes sense though -- feels like errors are important to pay attention to!

Hmm - in the screenshot shown we show "GraphQLError - see console for details" but the issue is just that there is no connectivity right? I personally think thats a rough regression. Replacing the contents of the page a big error warning when a polling-refresh query fails due to network connectivity seems bad to me.

Im not sure the exact fix - but I do think we should distinguish "the server sent back an error because something bad happened" from "you couldn't reach the server"

Made a small fix to the loading screen to check for network errors vs. GraphQL errors. This is the new UI when the server dies:

Screen Shot 2021-05-06 at 11.46.07 AM.png (1×3 px, 760 KB)

sidkmenon edited the test plan for this revision. (Show Details)

No longer overrwriting runs page on network errors

The removal of the GEventExecutor

I am pretty sure this is fine to do - and is effectively replicating the way the all-websocket approach worked - but we should add a test cases section specifically about making sure htttp requests can go through while the active run event log subscription is firing data back. You can do this by navigating around while another tab is actively watching a run of many_events from the toyes repo or something.

Just did this - everything seemed snappy, I clicked around in all of the panes I could fine & everything loaded fine while a run of many_events was open in another tab.

JS looks fine to me.

pikachu

js_modules/dagit/packages/core/src/app/AppProvider.tsx
90

You can remove the React.useMemo usage here, we didn't /really/ even need it before. :)

sidkmenon marked an inline comment as done.

Removing React.useMemo on rootServerURI as per @dish's suggestion

I haven't been following the full saga of this diff closely, but what's here seems reasonable - any blockers left here now that the release has shipped?

This revision is now accepted and ready to land.May 7 2021, 1:42 PM

I support landing this to maximize soak time til next release

push_n_pray

js_modules/dagit/packages/core/src/ui/Loading.tsx
41–43

add a comment here

sidkmenon edited the test plan for this revision. (Show Details)

adding comment