Page MenuHomePhabricator

object managers docs overview
ClosedPublic

Authored by sandyryza on Dec 10 2020, 12:30 AM.

Details

Summary

This is mostly just renaming from Asset Store -> Object Manager

I added a section on how to pass per-output config to ObjectManagers.

I also added high-level explanation of the relationship between the different manager types. Here's an issue to track explaining InputManagers with examples like we have for ObjectManager: https://github.com/dagster-io/dagster/issues/3383.







Test Plan

manual inspection

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Dec 10 2020, 12:46 AM
Harbormaster failed remote builds in B22538: Diff 27399!
Harbormaster returned this revision to the author for changes because remote builds failed.Dec 10 2020, 1:51 AM
Harbormaster failed remote builds in B22539: Diff 27400!
Harbormaster returned this revision to the author for changes because remote builds failed.Dec 10 2020, 4:12 PM
Harbormaster failed remote builds in B22552: Diff 27414!
sandyryza edited the summary of this revision. (Show Details)

My comments are focused on content order and the info flow.

  1. when I was at the last section "Providing per-output config to the object manager", I had to go back to the previous section "Providing a custom object manager" to understand if I can only set config on my custom object manager - also it kinda feels magic that @object_manager can take the config schema without explaining the decorator [1].
  2. "Selecting an object manager per output" doesn't necessarily require custom object managers. So maybe we could re-org it to be:
    1. Setting a pipeline-wide object manager
    2. Selecting an object manager per output (switched with C) - and see suggestion in [2]
    3. Providing a custom object manager
    4. Providing per-output metadata to the object manager
    5. Providing per-output config to the object manager
  3. [3] is an appealing value prop. maybe we can write some pseudo code snippet to highlight it
docs/next/src/pages/overview/object-managers/object-managers.mdx
27

[3]

56

[1] did not mention @object_manager decorator anywhere but it is in code snippet

66

[2] the scenario could be "Maybe you only want to persist some of the outputs to your local filesystem, and leave others as in memory", then we could pass in the built-in mem and fs object manager to avoid introducing custom object manager before this section.

examples/docs_snippets/docs_snippets/overview/object_managers/metadata.py
42

other two places are using @object_manager instead of @resource

What is your plan to introducing things like root inputs or subselection cases? Interested to hear what your proposed rhetoric will be there too

docs/next/src/pages/overview/asset-materializations/index.mdx
19โ€“21

This is interesting. These descriptions are accurate but I think we need to come up with better language to describe this. For example, "restrict" I think has a subtext that the pure business logic variant is worse, when it is our idealized model for writing these things.

docs/next/src/pages/overview/object-managers/object-managers.mdx
11

Similar point to above, I think we can make the point that if solids were designed to be logical operations this problem needs to be solved.

25

We might want to take the opportunity to make this default different btw. That one has to opt-in config to make basic features of dagit work is a huge bummer.

31โ€“39

I think this needs some lead in text explaining that an object manager are really just a combined input manager and an output manager, in addition to the venn diagram.

Something like: "Object managers control the behavior of both loading inputs and handling outputs in one resource. One can also write a manager that only controls inputs or only controls outputs in independent resources."

then at the end to drive it home after the venn diagram.

"Code reflects this diagram directly: An object manager is an interface that implements the input manager and output manager interface."

class ObjectManager(InputManager, OutputManager):
   pass
78โ€“80

I think this might be an opportunity to explain why this wouldn't go on config or even configured

"In this case the table name will never change at run-time, so you do not want to specify it via configuration. You want to attach the name to the definition itself."

This ties into the per-output config section as well, so maybe its worth listing out all the use cases.

"Selecting an object manager per output" doesn't necessarily require custom object managers. So maybe we could re-org it to be:

๐Ÿ‘

What is your plan to introducing things like root inputs or subselection cases? Interested to hear what your proposed rhetoric will be there too

Added examples for these in the latest rev.

docs/next/src/pages/overview/asset-materializations/index.mdx
19โ€“21

Good point. I'm changing "restrict to" to "focus on". Let me know if you had something else in mind.

docs/next/src/pages/overview/object-managers/object-managers.mdx
11

I added a short paragraph on this.

25

Which basic dagit features do you have in mind?

31โ€“39

๐Ÿ‘

56

Good point - I'll add.

examples/docs_snippets/docs_snippets/overview/object_managers/metadata.py
42

oops - thanks for catching

sandyryza marked an inline comment as done.

up

This is looking great from an overall read. I'd love to be able to read a live previewing. Can we do that by pushing up a PR? Or is that only from the dagster-website repo?

docs/next/src/pages/overview/object-managers/object-managers.mdx
25

I should say execution features, of which dagit is one surface area. Re-execution, incremental execution is the most notable.

I'd love to be able to read a live previewing.

I chatted about this with @sashank a few weeks back - https://dagster.slack.com/archives/C012YCX5P8X/p1606328937007400. I believe that right now, the best you can do is arc patch / cd docs / make buildnext / cd next / yarn dev.

docs/next/src/pages/overview/object-managers/object-managers.mdx
25

Made a thread for this: https://threads.com/34390195611

I'd love to be able to read a live previewing.

Yeah easiest thing to do is arc patch for now. We'll have this functionality in our 0.10.0 docs site

I think this is excellent. I have some fairly minor comment which I'd love to see considered. Please check them out prior to landing.

docs/next/src/pages/overview/object-managers/object-managers.mdx
29

No space between S3 and ObjectManager

39

could be at the end of subselection here. not sure how you want to handle that case throughout this guide

54

all the inputs and outputs

64

consider: object store vs data warehouse might be more sense

or: parquet versus json in an object store?

78โ€“80

name more descriptive than MyObjectManager might be helpful

84

which inherits from resource definition? just to be clear/specific

91

maybe add: you don't want to ever change the metadata, so it makes more sense to attach it to a definition rather than expose it as a configuration option

95

i wonder if we are going to regret having a top-level metadata grabbag. thinking about reusable managers and collisions here

133

So I think we should emphasize input managers that handle both upstreams and the root case as well. I would incline towards introduce those first

168

Ah you get there here. I would have expected this first, but its all a qualitative judgement of what you think is more common

This revision is now accepted and ready to land.Dec 14 2020, 8:34 PM
sandyryza added inline comments.
docs/next/src/pages/overview/object-managers/object-managers.mdx
64

The prior version actually had that. @yuhan suggested this route, because it enables introducing "different object managers on different outputs" separately from "custom object manager". Agree that your suggestions and reflect more common use cases, but I am receptive to the value of separating the explanations of these capabilities, so am overall on the fence - open to reverting if you feel strongly.

95
168

My gut is that the root case is likely to be a little more common, but open to being convinced.

Separately though, a nice thing about introducing the root case first is that there's less going on that the reader needs to grok. It's fewer lines of code, and the reader doesn't need to yet reach the enlightenment that the the same definition can be used for both InputDefinition.manager_key and OutputDefinition.manager_key.

docs/next/src/pages/overview/object-managers/object-managers.mdx
64

maybe s3_object_manager vs gcs_object_manager for a pipeline that is pulling data from different sources. these two are system provided so don't need to explain "custom object manager"

docs/next/src/pages/overview/object-managers/object-managers.mdx
64

@yuhan I think that is a pretty cool idea!

This revision was automatically updated to reflect the committed changes.