Page MenuHomePhabricator

Naive watcher for filesystem log storage
AbandonedPublic

Authored by max on Fri, Aug 9, 6:39 PM.

Details

Reviewers
alangenfeld
Group Reviewers
Restricted Project
Summary

This can & should be optimized a little (so as not to scan the entirety of a log file when it changes), and should probably be quieter about transient errors/do retries when a file can't be read/implement a file locking system.

Test Plan

Unit

To demo: Launch dagit two different ports with --log. Execute the log spew pipeline from one dagirt. Navigate to the run in the second dagit and observe logs streaming.

Diff Detail

Repository
R1 dagster
Branch
logsink
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

max created this revision.Fri, Aug 9, 6:39 PM
max retitled this revision from Wip to Naive watcher for filesystem log storage.Fri, Aug 16, 10:08 PM
max edited the summary of this revision. (Show Details)
max edited the test plan for this revision. (Show Details)
max added reviewers: alangenfeld, Restricted Project.

we could land this after a bit of cleanup - but i think im tempted to just have this feed in to a more systemic refactor of all this PipelineRun stuff.

just spit balling -

  • I wonder if we could simplify things if we operated over a temporary directory in the "in memory" case instead of holding on to everything in memory all the time. This could leave us with basically one implementation that is either fed $DAGSTER_HOME or a temp dir.
  • i could imagine a taxonomy of something like
    • CompletedPipelineRun - whether it came from my process or another a completed it should be treated the same
    • ActivePipelineRun - a run being filled out by the current process
    • ExternalPipelineRun - a run we are watching the FS for updates on - tracks the file seek position so we only process new

      I think a setup like this could help
python_modules/dagster-graphql/dagster_graphql/implementation/pipeline_run_storage.py
416–417

we should at minimum keep track of seek position or something and not re-process the whole file every time

max updated this revision to Diff 3794.Fri, Aug 16, 11:14 PM

Fixup

ideally we also wouldn't even watch the log files for external runs unless we had an active subscription on them

alangenfeld requested changes to this revision.Mon, Aug 19, 6:40 PM

to your queue for restacking on some lower level clean ups

This revision now requires changes to proceed.Mon, Aug 19, 6:40 PM
max abandoned this revision.Wed, Aug 21, 9:12 PM