Page MenuHomePhabricator

add hooks to route standard out/err from a compute step to a file
ClosedPublic

Authored by prha on Tue, Aug 27, 11:16 PM.

Details

Summary

Creates a contextmanager to wrap any compute function execution and
routes all streams to the filesystem if configured. By default, we store all
stdout/stderr logs to a subdirectory within the $DAGSTER_HOME directory. All of
the existing behavior is maintained if $DAGSTER_HOME is not set.

Test Plan

Adds a toy example that allows both the multiprocess and inprocess
execution engines to pipe stdout/stderr to disk.

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha created this revision.Tue, Aug 27, 11:16 PM
prha updated this revision to Diff 4054.EditedTue, Aug 27, 11:37 PM

reorganize

prha added a reviewer: Restricted Project.Tue, Aug 27, 11:43 PM
prha updated this revision to Diff 4060.Wed, Aug 28, 1:05 AM

add test

prha updated this revision to Diff 4061.Wed, Aug 28, 3:44 PM

add missing mock dep

prha updated this revision to Diff 4062.Wed, Aug 28, 4:10 PM

sync mock dependency

prha updated this revision to Diff 4064.Wed, Aug 28, 4:17 PM

sync mock version with dagster-cron lib

prha updated this revision to Diff 4067.Wed, Aug 28, 4:30 PM

fix dep

natekupp accepted this revision.Wed, Aug 28, 4:32 PM
natekupp added a subscriber: natekupp.

A few comments, otherwise lgtm!

python_modules/dagster/dagster/core/execution/logs.py
23

do we want to log anything in this case? seems like a debug log at least would be good

32

probably clearer to leave as-is, but otherwise contextlib2.ExitStack might be interesting here

51

os.path.join because we love windows

python_modules/dagster/dagster/utils/__init__.py
352

does this work on py2? do we have a test case covering this function?

This revision is now accepted and ready to land.Wed, Aug 28, 4:32 PM
prha added inline comments.
python_modules/dagster/dagster/core/execution/logs.py
23

I think logging here would be pretty spammy since this will be executed once for every solid. Maybe when @alangenfeld comes up with a backup plan for the filesystem abstraction, we would just fall back on that?

51

good catch, will fix

python_modules/dagster/dagster/utils/__init__.py
352

Yeah, eerno.EEXIST is the py2 sanctioned way of handling this... FileExistsError was introduced in py3.3.

I'll add an explicit test for this

prha updated this revision to Diff 4068.Wed, Aug 28, 5:30 PM

add ensure_dir test, use os.path.join

prha added a comment.EditedWed, Aug 28, 5:45 PM

One main piece of feedback I'm looking for is: does this directory structure make sense?

$DAGSTER_HOME/
  logs/
    compute/
      <run_id>/
        <step_key>.out
        <step_key>.err
prha added a comment.Wed, Aug 28, 7:58 PM

the latest build failure was spurious (connection reset). kicking buildkite made it succeed (https://buildkite.com/dagster/dagster/builds/4402)

In D893#21127, @prha wrote:

One main piece of feedback I'm looking for is: does this directory structure make sense?

$DAGSTER_HOME/
  logs/
    compute/
      <run_id>/
        <step_key>.out
        <step_key>.err

Should we swap run_id and compute? Imagine most times you'd want to explore based on run_id at the top level, but I guess it depends on what other things we'd have besides compute

alangenfeld added inline comments.Wed, Aug 28, 8:13 PM
examples/dagster_examples/__init__.py
40

it would be nice to have a name that more clearly differentiates from log_spew, maybe this should be stdout_spew or something along those lines

python_modules/dagster/dagster/core/execution/logs.py
23

maybe tag this with a github issue or something so we dont lose track of it, but ya the StorageEngine will be able to give you a dagster home dir or temp dir location to put stuff

66–79

should we mirror instead of redirect?

python_modules/dagster/dagster/core/storage/event_log.py
136

;)

directory structure make sense?

this is another thing that might be best to defer worrying too much about until we have a single place that controls all of it

prha updated this revision to Diff 4074.Wed, Aug 28, 9:21 PM

update

prha updated this revision to Diff 4078.Wed, Aug 28, 10:12 PM

gate stdout redirect behind feature flag

prha added a comment.Wed, Aug 28, 10:14 PM

as discussed IRL, we're gating behind a feature flag and looking into mirroring solutions (https://github.com/dagster-io/dagster/issues/1699)

prha updated this revision to Diff 4092.Wed, Aug 28, 11:28 PM

mock out feature flags

prha retitled this revision from RFC: add hooks to route standard out/err from a compute step to a file to add hooks to route standard out/err from a compute step to a file.Thu, Aug 29, 12:00 AM