Page MenuHomePhabricator

Fix race condition in windows compute log tail process
ClosedPublic

Authored by dgibson on Tue, Nov 10, 3:49 PM.

Details

Summary

If the tail process termination call here happened before the compute log process started, it would never terminate. Address that by having the tail process signal to the calling process once it's started, and waiting until that happens before you terminate.

Also during termination, never actually raise a KeyboardInterrupt (that will pollute the logs) - just use it as a signal to return once there are no more logs.

Test Plan

Run test_ephemeral_event_log 50 times on Azure, now passing reliably

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha added inline comments.
python_modules/dagster/dagster/core/execution/poll_compute_logs.py
38–42

this is nice... so much better.

was really struggling to figure out how to defer killing the thread until all the output had been printed.

62

We're relying on tail_polling to return normally here ... .what happens if you raise afterwards?

This revision is now accepted and ready to land.Tue, Nov 10, 4:18 PM
python_modules/dagster/dagster/core/execution/poll_compute_logs.py
62

Hmm yeah I don't think we need this actually

nit: using the ipc machinery feels heavy here and get_system_temp_directory doesn't get automatically cleaned up, one option would be to use TemporaryDirectory + uuid and just touch a file / wait for it to exist.

Related - do we want the same failure semantics or latency as the ipc setup for timeouts?

very fair - do that instead