I realized that to get the full benefits of https://dagster.phacility.com/D6641 (avoiding the situation where a bunch of schedules/sensors cause an iteration to take more than 2 minutes and trigger a heartbeat failure), we also need to heartbeat more often during the first iteration. To still avoid incorrectly reporting the daemon as healthy, I added logic to ensure we log a heartbeat with the error the first time one comes up. This could still lead us to incorrectly report the first iteration as healthy, but I think that's better than the daemon crashing due to a long first iteration.
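A rough sketch of the heartbeat policy described above, for illustration only. `HeartbeatTracker`, `record_error`, and `maybe_heartbeat` are hypothetical names and not the actual daemon code; the sketch assumes the daemon calls `maybe_heartbeat` after each schedule/sensor it processes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeartbeatTracker:
    """Hypothetical helper: decides when a heartbeat should be written."""

    interval_seconds: float
    last_heartbeat_time: Optional[float] = None
    pending_errors: List[str] = field(default_factory=list)
    has_reported_error: bool = False
    # Each entry is the list of errors attached to one written heartbeat.
    heartbeats: List[List[str]] = field(default_factory=list)

    def record_error(self, message: str) -> None:
        # Errors accumulate until the next heartbeat is written.
        self.pending_errors.append(message)

    def maybe_heartbeat(self, now: float) -> bool:
        interval_elapsed = (
            self.last_heartbeat_time is None
            or now - self.last_heartbeat_time >= self.interval_seconds
        )
        # The first error ever seen forces an immediate heartbeat, so the
        # daemon is not shown as healthy while an error is waiting.
        first_error = bool(self.pending_errors) and not self.has_reported_error
        if not (interval_elapsed or first_error):
            return False

        if self.pending_errors:
            self.has_reported_error = True
        self.heartbeats.append(list(self.pending_errors))
        self.pending_errors.clear()
        self.last_heartbeat_time = now
        return True
```

With this shape, a long first iteration still heartbeats on the regular interval, and only the first error bypasses the interval; later errors are grouped into the next scheduled heartbeat.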
Integration, BK (see changes to error test)
I'm confused by this...
should this be:
if status.healthy == False and status.last_heartbeat.errors: assert len(status.last_heartbeat.errors) == 2 ...
This should test that any errors get grouped with the iteration, right?