Page MenuHomeElementl

add tags schema to store, fetch tags
ClosedPublic

Authored by prha on Mar 5 2021, 1:54 AM.

Details

Summary

Changes the event log schema to add an asset_tags table.

Each tag is tied to the asset_key, as well as a "source" column, (just in case we want to distinguish between materialization-based tags and dagit-based tags).

Also includes some refactoring of the storage and instance APIs to take in tags.

Note: this change will break ursula, so after it clears review, it should be paired with a change to the base implementation there

Not included in this diff:

  • an experimental API on the asset materialization event to take tag values
  • graphql arg changes to query assets by tags
  • Any UI
Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 5 2021, 2:26 AM
Harbormaster failed remote builds in B26893: Diff 32878!

handle foreign keys in mysql

prha requested review of this revision.Mar 5 2021, 6:41 PM
prha added reviewers: owen, sandyryza.

This will issue a delete whenever someone logs an asset materialization - right? Should we be concerned about that from a perf perspective? Simultaneous runs might be competing for database locks?

python_modules/dagster/dagster/core/storage/event_log/base.py
108–116

Can these type annotations include the key and value types? E.g. Dict[str, str]?

update types, add index to table to limit table locks on delete

spoke async about this approach - seems like there was general agreement in not adding a new table and instead just adding a latest_tags column to the existing asset key table. Motivations being:

  • less tables - less problems
  • pretty sure runs tags table is problematic so dont want to mimic that structure without resolving
  • migration out of using existing table v straight forward
This revision now requires changes to proceed.Mar 15 2021, 11:36 PM

rebase, update to mutate asset_keys table instead of creating a new asset tags table

added a column to store last materialization instead of just tags....

python_modules/dagster/dagster/core/storage/event_log/schema.py
34

the run id is not in the materialization is it? would it be useful to have that too?

python_modules/dagster/dagster/core/storage/event_log/sql_event_log.py
571–632

this function is wild

python_modules/libraries/dagster-postgres/dagster_postgres_tests/compat_tests/test_back_compat.py
217

redfordnod

python_modules/dagster/dagster/core/storage/event_log/sql_event_log.py
571–632

yeah, I think I'm going to kill the secondary migration code (according to https://github.com/dagster-io/dagster/issues/3507), but will do that in a separate diff.

I'm probably going to kill the prefix code also (and do it client side), which was added in panic back in Sep/Oct when the asset catalog perf was really bad (before the asset_keys table).

rename, add last_run_id column

This revision is now accepted and ready to land.Mar 16 2021, 7:35 PM
This revision was automatically updated to reflect the committed changes.