Page MenuHomePhabricator

Refactor SQL-based event log storages to use SQLAlchemy
ClosedPublic

Authored by max on Nov 23 2019, 12:50 AM.

Details

Summary

This resolves https://github.com/dagster-io/dagster/issues/1798 as well as performance issues stemming from calls to the default implementation of get_stats_for_run. Replaces D1457, stacked on D1470, D1472, D1473, D1474.

Test Plan

Unit

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

max created this revision.Nov 23 2019, 12:50 AM
max updated this revision to Diff 6823.Nov 23 2019, 1:19 AM

Rebase

max edited the summary of this revision. (Show Details)Nov 23 2019, 1:19 AM
max updated this revision to Diff 6824.Nov 23 2019, 1:20 AM

Rebase

Harbormaster completed remote builds in B5509: Diff 6824.
max updated this revision to Diff 6827.Nov 23 2019, 1:47 AM

Rebase

max edited the summary of this revision. (Show Details)Nov 23 2019, 1:53 AM
max updated this revision to Diff 6829.Nov 23 2019, 1:53 AM
max edited the summary of this revision. (Show Details)

Rebase

max updated this revision to Diff 6830.Nov 23 2019, 1:55 AM

Reorg for clarity

max edited the summary of this revision. (Show Details)Nov 23 2019, 2:02 AM
max updated this revision to Diff 6832.Nov 23 2019, 2:02 AM
max edited the summary of this revision. (Show Details)

Rebase

max edited the summary of this revision. (Show Details)Nov 23 2019, 2:02 AM
Harbormaster completed remote builds in B5515: Diff 6830.

I need another pair of eyes on this one -

python_modules/dagster/dagster/core/errors.py
265–270

still unclear to me how useful this info is - i guess we can use it if someone reports a bug to us?

python_modules/dagster/dagster/core/storage/event_log/base.py
65–215

separate file maybe

python_modules/dagster/dagster/core/storage/sql.py
52–59

using a finally to "catch" these errors feels a little off - what's the reason to avoid except over the errors explicitly?

71–73

should this be a raise_from so we can surface the original exception still?

python_modules/dagster/dagster_tests/compat_tests/test_back_compat.py
24

should dagster instance migrate be in this diff? Seems like its going to be just calling upgrade on the instances stores so i assume not much additional code

python_modules/libraries/dagster-postgres/dagster_postgres_tests/test_event_log.py
355

rm

max added inline comments.Nov 25 2019, 8:12 PM
python_modules/dagster/dagster/core/errors.py
265–270

That's my thought -- I would really like to have this info when we break something.

python_modules/dagster/dagster/core/storage/event_log/base.py
65–215

Definitely, what's better is that with the declarative base, you get a nice models.py file because all of this gets consolidated to one class and will make automigrations easier later on.

67

We can consolidate all of this into one class if we use declarative_base factory. This is a bit weird as written because you are trying to use the connection level execute while also trying to use ORM features which might lead to transactional consistency issues with alembic as we add more tables and get into distributed territory. The best way to think about it is that SQLAlchemy makes available three major objects for executing queries: Engines, Connections, and Sessions. Engines/Connections are the lowest level abstraction which are great for executing raw SQL Queries. However, the moment you start using ORM features, best practice is to use Session execute objects. The reasoning for this is because they handle a bunch of transactional consistency issues for you. I fear we will end up trying to eventually redo all this one bug at a time. This blog post outlines how this can all come together:

https://leportella.com/english/2019/01/10/sqlalchemy-basics-tutorial.html

You can also use sessionmakers to do automated session management, but we can get there later.

110

This is effectively re-implementing sessions, it might be worth exploring all of that.

nate added inline comments.Nov 25 2019, 8:48 PM
python_modules/dagster/dagster/core/storage/event_log/base.py
67

what do you think of putting all of our db.Table defs in one file?

It'll be really helpful later on when we have N tables and want to have one spot to look at our schema; this chunk of code is also nice documentation of our storage layer, sort of how schema.graphql is for the GraphQL API

schrockn resigned from this revision.Nov 25 2019, 10:39 PM

looks like there are enough chefs in the kitchen on this one

prha resigned from this revision.Nov 25 2019, 11:03 PM
alangenfeld requested changes to this revision.Dec 2 2019, 5:24 PM

to your queue

This revision now requires changes to proceed.Dec 2 2019, 5:24 PM
max added inline comments.Dec 3 2019, 7:08 PM
python_modules/dagster/dagster/core/storage/event_log/base.py
67

i'm not sure what ORM features we're using?

max added inline comments.Dec 3 2019, 7:51 PM
python_modules/dagster/dagster/core/storage/event_log/base.py
67

re: all our tables in one file, i feel fine about putting the schema for each individual storage in its own file, i feel very uneasy about mixing the run storage schema with the event log storage schema, etc. -- there is no reason to suppose that in general these will live in the same database. we've separated the storages in part because event log storage is generally >> run storage

max added inline comments.Dec 3 2019, 7:57 PM
python_modules/dagster/dagster/core/storage/sql.py
52–59

i am not sure what i was thinking

max updated this revision to Diff 7076.Dec 3 2019, 9:12 PM

Reorganize

alangenfeld accepted this revision.Dec 10 2019, 10:15 PM

add an entry to changes.md then this should be good to go

This revision is now accepted and ready to land.Dec 10 2019, 10:15 PM
max updated this revision to Diff 7539.Dec 10 2019, 11:37 PM

Changelog

max updated this revision to Diff 7543.Dec 10 2019, 11:45 PM

Rebase

max updated this revision to Diff 7598.Dec 11 2019, 10:26 PM

Rebase

max updated this revision to Diff 7661.Dec 13 2019, 12:48 AM

Rebase

max updated this revision to Diff 7667.Dec 13 2019, 5:10 AM

Rebase

alangenfeld added inline comments.Feb 19 2020, 9:32 PM
python_modules/dagster/dagster/core/storage/event_log/sqlite/sqlite_event_log.py
116

🙃