Page MenuHomeElementl

add captured log manager interface
Needs ReviewPublic

Authored by prha on Jun 15 2021, 6:00 PM.

Details

Summary

Rather than add a legacy flag onto the existing ComputeLogManager (D7945),
this diff creates a separate interface for a CapturedLogManager class. The plan is to
do inheritance checks on the instance compute log manager and then begin issuing
warnings on instance init if the compute log manager does not implement the newer
interface.

The new API explicitly separates out the log file data and the log file metadata.

Test Plan

none

Diff Detail

Repository
R1 dagster
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

prha requested review of this revision.Jun 15 2021, 6:25 PM

Does this change imply that daemon output streams should get handled by the same InstanceLogManager as the step process output streams? Or will they be separately configurable?

I won't push on this beyond this suggestion, but thoughts on using this as an opportunity to change the name to something that captures how DagsterLogManager and this thing do different things? Ideas:

  • CapturedOutputManager
  • StandardStreamManager
  • StreamCaptureManager
  • CapturedStreamManager
python_modules/dagster/dagster/core/storage/instance_log_manager.py
11 ↗(On Diff #39575)

Should this be bytes type?

20 ↗(On Diff #39575)

Consider using the condensed NamedTuple syntax if users won't be instantiating this

class CapturedLogMetadata(NamedTuple):
    location: Optional[str]
    download_url: Optional[str]
42 ↗(On Diff #39575)

Add bool return type annotation?

81 ↗(On Diff #39575)

I'm not 100% grasping why this should go on the instance log manager. If I, for example, decide to implement an InstanceLogManager that stores the output streams to S3, how would I implement this method?

For the in-process logging proposal [1], here's the DAG I'm imagining:

image.png (1×1 px, 267 KB)

In that proposal, all logs go through the Python logging, so there's no choice about how to route them aside from the set of handlers that are included in the instance-level Python logging config.

[1] https://elementl.quip.com/BtFRAY5cWKWq/Logging-Requirements-Proposal

I think the biggest feedback I'm looking for is, is this a better approach than D7945?

The main difference is in how we approach backcompat. This diff introduces a new top-level API class (which we have to name either InstanceLogManager or CapturedLogManager). D7945 puts all the functionality on the existing ComputeLogManager but adds a is_legacy_api property so that we can migrate the API.

python_modules/dagster/dagster/core/storage/instance_log_manager.py
11 ↗(On Diff #39575)

Yep, good idea to change to bytes.

81 ↗(On Diff #39575)

Yeah, we don't need this if we're going through the Python logging route, will remove

update based on sryza comments

Going to defer to Alex on this one

a tough call, but by focusing on end user experience I think the seemless-for-most-users option of changing ComputeLogManager in place is the way to go

This revision now requires changes to proceed.Jul 2 2021, 9:06 PM
prha retitled this revision from RFC: add instance log manager interface to add captured log manager interface.Jul 8 2021, 8:56 PM
prha edited the summary of this revision. (Show Details)
python_modules/dagster-graphql/dagster_graphql/implementation/execution/__init__.py
173–219

+ comments

python_modules/dagster/dagster/core/events/__init__.py
269–339

not for this diff - but we should rename these for clarity

  • report_and_log_from_xxx
  • log_and_report_from_xxx
  • some better name
1121–1153

these first two log directly, and this last case doesn't, should probably report directly here instead of doing it at the callsite [1]

python_modules/dagster/dagster/core/execution/compute_logs.py
172–175

mypy & comment

python_modules/dagster/dagster/core/executor/multiprocess.py
93

kwargs for clarity (mostly for what is None)

93–94

[1]

python_modules/dagster/dagster/core/instance/__init__.py
1217

mypy

python_modules/dagster/dagster/core/storage/captured_log_manager.py
35–36

from the compute steps of pipeline solids

this should be changed right?

to your queue for bunch of thoughts

python_modules/dagster/dagster/core/storage/captured_log_manager.py
10–19

maybe "chunk" or "slice" to indicate its a piece of the full logs

29

what exactly is location?

43–83

more docs here could be useful

47–65

do we anticipate this ever working except when the file is local ? should the names communicate as much?

67–77

is there a better name than _metadata ?

79–80

what is this exactly?

This revision now requires changes to proceed.Jul 13 2021, 10:25 PM

added comments, renamed some methods, dropped is_enabled

i think docstrings on the base class would make this easier to navigate

e.g. describing the lifecycle