Page MenuHomeElementl

Python API for specifying Asset/Output dependencies
ClosedPublic

Authored by owen on Feb 16 2021, 10:42 PM.

Details

Summary

just the python API part of D6369

Test Plan

pytest

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
python_modules/dagster/dagster/core/events/__init__.py
1030

Is the type of parent_assets List[AssetPartitions]? It could be helpful to be extremely explicit and have this variable called parent_asset_keys_and_partitions.

python_modules/dagster/dagster/core/execution/plan/execute_step.py
464

This function is getting pretty busy. It could be clearer to factor out all the asset stuff into a a separate function.

499

Nitpick: we prefer to avoid these kinds of abbreviations.

507

Nitpick: use f-string instead of .format

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 23 2021, 11:32 PM
Harbormaster failed remote builds in B26349: Diff 32202!
owen marked 6 inline comments as done.
  • renamed AssetPartitions to AssetRelation
  • partitions specified by users are now specified as sets instead of lists
  • moved AssetRelation derivation into the StepInputSource, hiding AssetRelations from the public eye a bit better
owen marked 2 inline comments as done.Feb 24 2021, 1:16 AM
owen added inline comments.
python_modules/dagster/dagster/core/events/__init__.py
1030

along with renaming AssetPartitions to AssetRelation (in anticipation of adding more information to the AssetRelation class in the near future, such as what solid it came from, what IO Manager if any, etc.), I renamed this to parent_asset_relations.

python_modules/dagster/dagster/core/execution/plan/execute_step.py
372–373

I can delete this function

python_modules/dagster/dagster/core/execution/plan/inputs.py
124

now that I think about this, I don't see why this needs to be defined within the class itself (could go outside)

python_modules/dagster/dagster/core/definitions/events.py
554

will DynamicOutput not take in PartitionSpecificMetadataEntry also?

python_modules/dagster/dagster/core/definitions/input.py
162

I think this feels harder to reason about than I'd like...

We're returning a callable as a property so that it acts like a method call?

Is this to avoid actually exposing a method that curries the argument to the parameterized function?

I think my preference would be to rename the property to be asset_key_fn so it doesn't feel like a method.

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 24 2021, 2:17 AM
Harbormaster failed remote builds in B26371: Diff 32226!
owen marked 2 inline comments as done.
  • removed some confusion around return types of get_asset_key() functions in input/output definitions
  • allow DynamicOutputs to have PartitionSpecificMetadataEntries
  • fixed some return types
Harbormaster returned this revision to the author for changes because remote builds failed.Feb 24 2021, 8:29 PM
Harbormaster failed remote builds in B26395: Diff 32254!

I started going through this with a finer-tooth comb. Thanks for bearing with me on all the nitpicks.

python_modules/dagster-test/dagster_test/toys/asset_lineage.py
2

The most common use cases for assets will be tables. Also, we'll probably end up wanting to use the toy examples in docs screenshots explaining the asset catalog and asset lineage.

In light of that, I'd like to advocate for making the asset toy examples straightforward table examples. I.e. the asset keys should be standard table names like "orders", "items", "events", or "users", and the solid names should be in the vein of "make_events_table" etc.

python_modules/dagster/dagster/check/__init__.py
205 ↗(On Diff #32254)

It took me a little while to parse these names. It would aid clarity to do at least one of:

  • Add docstring.
  • Change the invocation to something like coerce_callable(callable_or_inst_param(arg)).
python_modules/dagster/dagster/core/definitions/events.py
136

I think this is an improvement, but I'm not entirely sold on this name. Something I learned recently is that, at least in databases, a relation doesn't mean "a relationship between two tables" or "a pointer", but rather just means "a table". https://stackoverflow.com/questions/4045744/what-is-a-relation-in-database-terminology.

Of course, also very possible that I'm missing the meaning. What do you have in mind?

479–495

I'm trying to think of ways we can make this name shorter. Maybe PartitionMetadataEntry or PartitionEventMetadataEntry?

Also, did you explore making this a subclass of EventMetadataEntry? Or just including an optional partition field on EventMetadataEntry?

514–516

We should make metadata_entries experimental for now. Also, would be helpful to include docstring for it.

python_modules/dagster/dagster/core/definitions/input.py
70

Nitpick: add space before open paren

112

Nitpick: can do check.invariant(asset_key, <message>) instead of if / failed

159

Do we need a separate variable for this? Could we just do return self._asset_key_fn is not None?

172

Not high priority for now, but we might want to consider more specific language along the lines of "Get the set of partitions that the solid will read for this input". I.e. users might want to understand a little more concretely what it means for partitions (or assets) to be "associated" with an input.

176

InputContext

177

Indent

python_modules/dagster/dagster/core/execution/plan/execute_step.py
318

nit: avoid first person in comments

384

Use f-string

472

f-string is preferred

python_modules/dagster/dagster/core/execution/plan/execute_step.py
499

Hmm, looks like this comment got displaced, but was referring to "for p in relation.partitions:" etc.

python_modules/dagster/dagster/core/definitions/input.py
159

self._asset_key_fn will actually never be None as currently defined (that check.coerce_callable stuff really is confusing 😛 ). It can either be a function that returns an AssetKey or a function that returns None, but it will never be None itself. I'll refactor things to make that more clear.

162

I thought about it a bit more and there really isn't a great reason to do it like this. I can just turn these into regular functions that take in context.

owen marked 15 inline comments as done.
  • added support for transitive lineage
  • linted
  • cleaned up some parts of the code, extended interface ideas to InputDefinition
  • renamed AssetPartitions to AssetRelation
  • partitions specified by users are now specified as sets instead of lists
  • moved AssetRelation derivation into the StepInputSource, hiding AssetRelations from the public eye a bit better
  • removed some confusion around return types of get_asset_key() functions in input/output definitions
  • allow DynamicOutputs to have PartitionSpecificMetadataEntries
  • fixed some return types
  • cleaned up/renamed in response to review
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 2:01 AM
Harbormaster failed remote builds in B26889: Diff 32869!
  • debugging airline demo
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 6:26 PM
Harbormaster failed remote builds in B26915: Diff 32908!
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 7:57 PM
Harbormaster failed remote builds in B26926: Diff 32919!
Harbormaster failed remote builds in B26927: Diff 32920!
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 8:15 PM
Harbormaster failed remote builds in B26929: Diff 32922!
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 9:31 PM
Harbormaster failed remote builds in B26936: Diff 32932!
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 10:01 PM
Harbormaster failed remote builds in B26941: Diff 32939!
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 10:18 PM
Harbormaster failed remote builds in B26942: Diff 32940!
  • removed transitive lineage, updated examples
Harbormaster returned this revision to the author for changes because remote builds failed.Mar 9 2021, 9:37 PM
Harbormaster failed remote builds in B27064: Diff 33091!
  • added support for transitive lineage
  • linted
  • cleaned up some parts of the code, extended interface ideas to InputDefinition
  • renamed AssetPartitions to AssetRelation
  • partitions specified by users are now specified as sets instead of lists
  • moved AssetRelation derivation into the StepInputSource, hiding AssetRelations from the public eye a bit better
  • removed some confusion around return types of get_asset_key() functions in input/output definitions
  • allow DynamicOutputs to have PartitionSpecificMetadataEntries
  • fixed some return types
  • cleaned up/renamed in response to review
  • debugging airline demo
  • still debugging
  • removed transitive lineage, updated examples
  • fixed tests
owen requested review of this revision.Mar 9 2021, 10:00 PM
This revision is now accepted and ready to land.Mar 10 2021, 12:14 AM