Right now there are two checks for whether a daemon has gone bad. The first looks for dead threads, and the second looks for missing heartbeats. The heartbeat check has to wait much longer before it can declare a failure, but if a thread dies we can shut down the daemon process much sooner. This diff runs the two checks on separate intervals.
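For illustration, running the two checks on separate intervals could look roughly like the sketch below. This is not the code in this diff: `IntervalCheck`, `run_due_checks`, and the interval values used in the usage note are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IntervalCheck:
    """A health check that runs at most once every `interval_seconds` (hypothetical helper)."""

    name: str
    interval_seconds: float
    check: Callable[[], bool]  # returns True when the daemon looks healthy
    last_run: float = field(default=float("-inf"))  # -inf means "never run yet"

    def is_due(self, now: float) -> bool:
        return now - self.last_run >= self.interval_seconds


def run_due_checks(checks: List[IntervalCheck], now: float) -> List[str]:
    """Run every check whose interval has elapsed and return the names of the failed ones."""
    failed = []
    for c in checks:
        if c.is_due(now):
            c.last_run = now
            if not c.check():
                failed.append(c.name)
    return failed
```

With a short interval on the dead-thread check (say 5 seconds) and a long one on the heartbeat check (say 60 seconds), a dead thread is noticed quickly while the heartbeat check still gets its longer wait.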
Also add a guard around the heartbeat add function. Previously, a transient heartbeat add failure would bring down the whole thread; now we log an error instead. (The process will still shut down eventually if heartbeats are permanently failing; it will just take longer.)
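A minimal sketch of the guard idea, assuming the heartbeat write is an ordinary callable that may raise. The names `add_heartbeat_guarded` and `add_heartbeat`, and the logger name, are hypothetical and not taken from this diff.

```python
import logging
from typing import Callable

logger = logging.getLogger("daemon_heartbeat_sketch")  # hypothetical logger name


def add_heartbeat_guarded(add_heartbeat: Callable[[], None]) -> bool:
    """Call `add_heartbeat`, logging and swallowing any failure instead of raising.

    Returns True if the heartbeat was added and False if the attempt failed.
    """
    try:
        add_heartbeat()
        return True
    except Exception:
        # Log with traceback and keep the calling thread alive. If heartbeats
        # keep failing, the missing-heartbeat check is what eventually shuts
        # the process down.
        logger.exception("Failed to add heartbeat; will try again on the next iteration")
        return False
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` free to propagate, so the guard does not get in the way of a deliberate shutdown.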
Reviewers: johann, alangenfeld