Page MenuHomePhabricator

RFC: Make default sharded sqlite implementation asset-aware
Needs ReviewPublic

Authored by prha on Tue, Jan 12, 10:28 PM.



Was going to try to do a migration of the default sqlite storage to the
ConsolidatedSqliteEventLogStorage, but then thought about this approach, which seems like it might
work and is way less complicated.

Different approaches weighed here:

It essentially mirrors all asset materialization events into a central assets.db sqlite file and
performs asset queries off of that db.

Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Jan 12, 10:45 PM
Harbormaster failed remote builds in B24199: Diff 29438!
prha edited the summary of this revision. (Show Details)

clear out mirrored event rows in asset shard on asset wipe

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Jan 12, 11:57 PM
Harbormaster failed remote builds in B24204: Diff 29445!
prha requested review of this revision.Wed, Jan 13, 12:34 AM

Mechanically this is a pretty clever way to get this feature working in the sharded sqlite scenario but the implementation is some real "inheritance is the root of all evil" shit. I think it makes it clear that the way the run sharding was managed between the child/parent class is pretty fragile foundation.

I think if we can model it more cleanly this will be fine. I just want to give future us a real shot at being able to refactor all of this, and as it stands this is a ties nasty knot in the class hierarchy


this works by taking advantage of the fact that the postgres connect call ignores the parameter right? I think this feels a bit worse than the run_id shenanigans and warrants an explicitly named method i think



  • make two explicit connection calls in sql event log storage
prha planned changes to this revision.Fri, Jan 15, 6:40 PM