Page MenuHomePhabricator

Naive watcher for filesystem log storage
AbandonedPublic

Authored by max on Aug 9 2019, 6:39 PM.

Details

Reviewers
alangenfeld
Group Reviewers
Restricted Project
Summary

This can & should be optimized a little (so as not to scan the entirety of a log file when it changes), and should probably be quieter about transient errors/do retries when a file can't be read/implement a file locking system.

Test Plan

Unit

To demo: Launch dagit two different ports with --log. Execute the log spew pipeline from one dagirt. Navigate to the run in the second dagit and observe logs streaming.

Diff Detail

Repository
R1 dagster
Branch
logsink
Lint
Lint WarningsExcuse: wip
Unit
No Unit Test Coverage

Event Timeline

max created this revision.Aug 9 2019, 6:39 PM
max retitled this revision from Wip to Naive watcher for filesystem log storage.Aug 16 2019, 10:08 PM
max edited the summary of this revision. (Show Details)
max edited the test plan for this revision. (Show Details)
max added reviewers: alangenfeld, Restricted Project.

we could land this after a bit of cleanup - but i think im tempted to just have this feed in to a more systemic refactor of all this PipelineRun stuff.

just spit balling -

  • I wonder if we could simplify things if we operated over a temporary directory in the "in memory" case instead of holding on to everything in memory all the time. This could leave us with basically one implementation that is either fed $DAGSTER_HOME or a temp dir.
  • i could imagine a taxonomy of something like
    • CompletedPipelineRun - whether it came from my process or another a completed it should be treated the same
    • ActivePipelineRun - a run being filled out by the current process
    • ExternalPipelineRun - a run we are watching the FS for updates on - tracks the file seek position so we only process new

      I think a setup like this could help
python_modules/dagster-graphql/dagster_graphql/implementation/pipeline_run_storage.py
371–372

we should at minimum keep track of seek position or something and not re-process the whole file every time

max updated this revision to Diff 3794.Aug 16 2019, 11:14 PM

Fixup

ideally we also wouldn't even watch the log files for external runs unless we had an active subscription on them

alangenfeld requested changes to this revision.Aug 19 2019, 6:40 PM

to your queue for restacking on some lower level clean ups

This revision now requires changes to proceed.Aug 19 2019, 6:40 PM
max abandoned this revision.Aug 21 2019, 9:12 PM