Page MenuHomePhabricator

RFC: composable type loaders and materializers
Needs ReviewPublic

Authored by schrockn on Tue, Nov 17, 1:30 PM.

Details

Summary

This is not urgent and not even a real RFC. More of an
exploration. I wanted to put something out there to solve what I deemed
the "composability problem" in the type system. The thesis is that in
reality our types and all the attached artitfacts are actually just
business logic and quite application-specific. Therefore we should
providde APIs to build types rather than to have prefabricated types.

The first step is making a layer and allows one to compose type loaders
and materializers out of individual entries. This is cleaner as you no
longer have to manage the construction of the Selector and a monolithic
config processing function. More interestingly I think this will allow
things like cross-cutting changies to loaders and materializer
behaviors. E.g. having pre-built entries for interacting with asset
stores or filtering out entries based on mode.

This also plays with the idea of having a "builder" api that builds
types by annotating the core typecheck fn.

Test Plan

Bk

Diff Detail

Repository
R1 dagster
Branch
composable-types
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Nov 17, 1:48 PM
Harbormaster failed remote builds in B21256: Diff 25784!
schrockn edited the summary of this revision. (Show Details)
schrockn edited the test plan for this revision. (Show Details)
schrockn added reviewers: alangenfeld, sandyryza.

up

I think the thrust of this is purely good, exposing this capability of building up from a list of loader entries.
Open questions

  • TypeBuilder as separate versus just adding loader_entries on DagsterType
  • making it easy to build ScalarUnion[T, Selector] in the case where a direct value also makes sense

I strongly agree with the thinking behind this. Agree with Alex that this is a strict improvement over what we have.

A few thoughts:

  • I think it's worth thinking through whether AssetStores can subsume loaders and materializers. (Not something I have developed an opinion on.)
  • I think a lot of the need for selectors in materializers and loaders comes from the fact that we try to provide out-of-the-box materializers and loaders, which necessarily need to be flexible. I would suspect that, if users were expected to write their own application-specific, those often would not need selectors.
  • Did you consider implementing this via composite materializers / loaders? I.e. something like my_materializer = composite_materializer({"parquet": parquet_materializer, "csv": csv_materializer}). Would avoid introducing new concepts. Misses out on the magic of using the entry function names as the selector keys, but less magic can be a good thing.