Last tested: Feb 4, 2020
In 7.2+, the custodian is designed to automatically find schedules that are stuck in a state they cannot recover from and mark them as complete with an error message. It does not affect schedules that are actually executing.
The custodian runs as a background process every 5 minutes on all nodes and does two things:
First, it looks up every schedule marked as currently executing or delivering in the backend database, along with the thread each one is marked as running on, and checks that the thread's current state shows it is actually working on that job.
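The first check can be pictured with a small Python sketch. This is only an illustration of the idea, not the custodian's actual code: the Job class, the thread_current_job mapping, the status strings, and the error text are all hypothetical names chosen for the example, and the step of finalizing a mismatched job is assumed from the custodian's stated purpose.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    # Hypothetical record for a scheduled job as stored in the backend database.
    job_id: int
    status: str                      # e.g. "executing", "delivering", "queued", "complete"
    thread_id: Optional[int] = None  # thread the database says the job is running on
    error: Optional[str] = None


def finalize_orphaned_jobs(
    jobs: List[Job],
    thread_current_job: Dict[int, Optional[int]],
) -> List[int]:
    """Mark jobs complete with an error when their thread is not working on them.

    thread_current_job maps a thread id to the id of the job that thread
    reports it is working on right now (None if the thread is idle).
    Returns the ids of the jobs that were finalized.
    """
    finalized = []
    for job in jobs:
        if job.status not in ("executing", "delivering"):
            continue  # only jobs marked as in progress are inspected
        if thread_current_job.get(job.thread_id) == job.job_id:
            continue  # the thread really is working on this job; leave it alone
        job.status = "complete"
        job.error = "Job was marked as running, but no thread was working on it."
        finalized.append(job.job_id)
    return finalized
```

In this model, a job whose thread is idle, missing, or busy with a different job is treated as stuck, while a job whose thread confirms it is left untouched.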
Second, it finds the most recent job that started executing over an hour ago and looks for any jobs still waiting in the queue that should have started before that one. Any such jobs are considered stuck in the queue, and the custodian finalizes them with an error message.
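The second check can be sketched the same way. Again, this is a hedged illustration rather than the real implementation: the QueuedJob class, its field names, and the error text are hypothetical, and the sketch assumes that "should have started before" means the job entered the queue earlier than the reference job did.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class QueuedJob:
    # Hypothetical record; enqueued_at stands in for "when the job should have started".
    job_id: int
    enqueued_at: datetime
    started_at: Optional[datetime] = None
    status: str = "queued"
    error: Optional[str] = None


def finalize_stuck_queued_jobs(
    jobs: List[QueuedJob],
    now: datetime,
    min_age: timedelta = timedelta(hours=1),
) -> List[int]:
    """Finalize queued jobs that were skipped over by a later job.

    The reference job is the most recent job that started executing more
    than min_age ago. Any job still queued that entered the queue before
    the reference job is marked complete with an error.
    """
    cutoff = now - min_age
    started = [j for j in jobs if j.started_at is not None and j.started_at < cutoff]
    if not started:
        return []  # nothing started over an hour ago, so there is no reference point
    reference = max(started, key=lambda j: j.started_at)

    finalized = []
    for job in jobs:
        still_waiting = job.status == "queued" and job.started_at is None
        if still_waiting and job.enqueued_at < reference.enqueued_at:
            job.status = "complete"
            job.error = "Job was stuck in the queue and never started."
            finalized.append(job.job_id)
    return finalized
```

Waiting for a reference job that started over an hour ago gives normally queued jobs time to be picked up, so only jobs that were clearly passed over are finalized.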