Page MenuHomeElementl

add index to run_tags
ClosedPublic

Authored by prha on Dec 1 2020, 8:40 PM.

Details

Test Plan

ran dagster instance migrate, saw index created on table

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Dec 1 2020, 8:59 PM
Harbormaster failed remote builds in B21962: Diff 26667!
prha requested review of this revision.Dec 1 2020, 9:27 PM

this seems good - is there additional work after this to make sure that the index is actually used when tags are included in a run query? It does a join but I'm not sure if its set up to actually apply the index in an efficient way

What's the convention around testing for migrations? I do see some tests in there, is that generally the plan for all alembic changes?

I generally view the migration tests as necessary for backwards compatibility. That's harder to test with the index, which affects performance.

I did capture the raw sql and did an explain to show that the index was being used:

BEFORE:

                                                                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=47.38..47.40 rows=1 width=1561)
   Group Key: runs.id
   Filter: (count(runs.run_id) = 2)
   ->  Sort  (cost=47.38..47.39 rows=1 width=1598)
         Sort Key: runs.id DESC
         ->  Nested Loop  (cost=0.27..47.37 rows=1 width=1598)
               ->  Seq Scan on run_tags  (cost=0.00..39.04 rows=1 width=37)
                     Filter: ((((key)::text = 'dagster/sensor_name'::text) AND ((value)::text = 'toy_file_sensor'::text)) OR (((key)::text = 'dagster/run_key'::text) AND ((value)::text = 'show_run_key_3:1606868874.916707'::text)))
               ->  Index Scan using runs_run_id_key on runs  (cost=0.27..8.29 rows=1 width=1598)
                     Index Cond: ((run_id)::text = (run_tags.run_id)::text)

AFTER:

                                                                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=20.94..20.96 rows=1 width=1561)
   Group Key: runs.id
   Filter: (count(runs.run_id) = 2)
   ->  Sort  (cost=20.94..20.94 rows=1 width=1598)
         Sort Key: runs.id DESC
         ->  Nested Loop  (cost=8.85..20.93 rows=1 width=1598)
               ->  Bitmap Heap Scan on run_tags  (cost=8.58..12.60 rows=1 width=37)
                     Recheck Cond: ((((key)::text = 'dagster/sensor_name'::text) AND ((value)::text = 'toy_file_sensor'::text)) OR (((key)::text = 'dagster/run_key'::text) AND ((value)::text = 'show_run_key_3:1606868874.916707'::text)))
                     ->  BitmapOr  (cost=8.58..8.58 rows=1 width=0)
                           ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..4.29 rows=1 width=0)
                                 Index Cond: (((key)::text = 'dagster/sensor_name'::text) AND ((value)::text = 'toy_file_sensor'::text))
                           ->  Bitmap Index Scan on idx_run_tags  (cost=0.00..4.29 rows=1 width=0)
                                 Index Cond: (((key)::text = 'dagster/run_key'::text) AND ((value)::text = 'show_run_key_3:1606868874.916707'::text))
               ->  Index Scan using runs_run_id_key on runs  (cost=0.27..8.29 rows=1 width=1598)
                     Index Cond: ((run_id)::text = (run_tags.run_id)::text)
This revision is now accepted and ready to land.Dec 2 2020, 3:05 PM
This revision was automatically updated to reflect the committed changes.