Page MenuHomeElementl

RFC: add new api hooks for compute log manager to capture logs outside of a pipeline context, support process-based capture
AbandonedPublic

Authored by prha on May 17 2021, 8:32 PM.
Tags
None
Referenced Files
Unknown Object (File)
May 1 2023, 4:53 AM
Unknown Object (File)
Apr 23 2023, 2:03 AM
Unknown Object (File)
Apr 12 2023, 3:07 PM
Unknown Object (File)
Mar 20 2023, 5:54 AM
Unknown Object (File)
Mar 10 2023, 7:20 PM
Unknown Object (File)
Feb 21 2023, 7:56 PM
Unknown Object (File)
Feb 17 2023, 6:12 AM
Unknown Object (File)
Feb 9 2023, 1:51 PM
Subscribers
None

Details

Summary

Right now the compute log manager has an API that's tied to a pipeline run. This feels a little awkward, and closes us off from capturing stdout/stderr for other things (like daemon processes).

This diff introduces a handful of more generic APIs (capture_logs, get_logs) that the existing APIs (watch, read_logs_file) could be based on.

This diff also adds some call sites / flags to the API to toggle between step-based capture and process-based capture. The default implementation maintains existing behavior, which captures stdout/stderr for each individual step. To fully support process-based capture, we would need to modify all the executors and all the existing compute log managers.

I could decouple the two changes (step vs process based run capture, run-based API vs generic API), but wanted to get early thoughts.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

fix subscription properties, lint

prha retitled this revision from add new api hooks for compute log manager to capture logs outside of a pipeline context to RFC: add new api hooks for compute log manager to capture logs outside of a pipeline context, support process-based capture.May 18 2021, 4:27 PM
prha edited the summary of this revision. (Show Details)
python_modules/dagster/dagster/core/events/__init__.py
1133

mypy

I think its worth bias-ing towards not =Noneing stuff out

python_modules/dagster/dagster/core/execution/compute_logs.py
177–190

hmm this feels bad - gotta be something better we can do [1]

python_modules/dagster/dagster/core/executor/multiprocess.py
87–91

[1]

python_modules/dagster/dagster/core/storage/compute_log_manager.py
34–41

use nice typing.NamedTuple style if this isnt a serdes / user constructed object

class ComputeLogData(NamedTuple):
        stdout: ComputeLogFileData
        stderr:  ComputeLogFileData
        cursor: Optional[str]
60–61

yikes

python_modules/dagster/dagster/core/execution/compute_logs.py
177–190

I'm not sure what else to do... this was the part of the executor that I felt least comfortable with.

python_modules/dagster/dagster/core/storage/compute_log_manager.py
60–61

Yeah, I don't know the best way of introducing this without immediately breaking anyone who's written their own implementation (not even sure if anyone has done this).

My goal was to first introduce this, start setting warnings in 0.12.0 and remove in 0.13.0.

python_modules/dagster/dagster/core/execution/compute_logs.py
177–190

Is the part that feels bad that we're manually constructing the event record? DagsterInstance.report_engine_event does the same thing, but I opted for doing it here rather than introducing a new API method on DagsterInstance.

python_modules/dagster/dagster/core/storage/compute_log_manager.py
34–41

or.... dataclasses?

60

@property might be clearer

60–61

could break compat by deprecating ComputeLogManager and introducing a new abc instead

87

vs leaving it as an abstract method and raising the NotImplementedError in concrete subclasses that in fact don't implement its semantics?

153

actually find it confusing which parts of the API are legacy and which are new in this -- we should at least have comments clearly marking them

python_modules/dagster/dagster/core/execution/compute_logs.py
178

why the deferred import?

rebase and take a pass on initial feedback?

This revision now requires changes to proceed.Jun 8 2021, 8:17 PM