This can and should be optimized a bit (so that we don't scan the entire log file every time it changes). It should probably also be quieter about transient errors, retry when a file can't be read, and maybe implement a file locking scheme.
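For the transient-error piece, even a small retry wrapper might be enough. A rough sketch (the helper name and backoff parameters are made up for illustration):

```python
import time


def read_with_retries(path, attempts=3, base_delay=0.1):
    """Retry transient read failures a few times before giving up,
    rather than surfacing an error on every blip."""
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the error
            time.sleep(base_delay * (2 ** attempt))  # simple exponential backoff
```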
To demo: launch dagit on two different ports with --log. Execute the log spew pipeline from one dagit. Navigate to the run in the second dagit and observe the logs streaming.
we could land this after a bit of cleanup - but I think I'm tempted to just have this feed into a more systemic refactor of all this PipelineRun stuff.
just spitballing -
- I wonder if we could simplify things by operating over a temporary directory in the "in memory" case instead of holding everything in memory all the time. That could leave us with basically one implementation that is fed either $DAGSTER_HOME or a temp dir (see the sketch after this list).
- I could imagine a taxonomy of something like:
  - CompletedPipelineRun - whether it was completed by my process or another one, it should be treated the same
  - ActivePipelineRun - a run being filled out by the current process
  - ExternalPipelineRun - a run we are watching the FS for updates on - tracks the file seek position so we only process new data
I think a setup like this could help
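Very roughly, that might look like the following (the class names come from the list above; everything else is placeholder):

```python
import os
import tempfile


class PipelineRunStorage:
    """One storage implementation fed either $DAGSTER_HOME or a temp
    dir, replacing the separate in-memory variant."""

    def __init__(self, base_dir=None):
        self._base_dir = (
            base_dir or os.environ.get("DAGSTER_HOME") or tempfile.mkdtemp()
        )


class CompletedPipelineRun:
    """A finished run; treated identically whether this process or
    another one completed it."""


class ActivePipelineRun:
    """A run being filled out by the current process."""


class ExternalPipelineRun:
    """A run owned by another process; we watch its log file on the
    filesystem and track a seek position so only new bytes are read."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._seek_pos = 0
```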
we should at minimum keep track of the seek position or something so we don't re-process the whole file every time
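e.g., building on the ExternalPipelineRun sketch above (process_new_events and handle_log_line are stand-in names):

```python
def handle_log_line(line):
    # stand-in for the real per-event processing
    print(line, end="")


def process_new_events(run):
    """Read only the bytes appended since the last poll, using the
    run's stored seek position, instead of re-scanning the whole file."""
    with open(run._log_path, "r") as f:
        f.seek(run._seek_pos)        # jump past everything already processed
        new_lines = f.readlines()
        run._seek_pos = f.tell()     # remember where this poll left off
    for line in new_lines:
        handle_log_line(line)
```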