Yevhenii Kurtov (3)
#1
Hello, Saša

In section `8.3.5 Restart frequency` you note that a supervisor will terminate itself if the maximum restart frequency is exceeded:

It’s important to keep in mind that a supervisor won’t restart a child process forever. The supervisor relies on the maximum restart frequency, which defines how many restarts are allowed in a given time period. By default, the maximum restart frequency is five restarts in five seconds. If this frequency is exceeded, the supervisor gives up and terminates itself.


Does that apply to GenServers that shut themselves down by returning {:stop, reason, reply, new_state}?
What if :normal or :shutdown is specified as the stop reason?
Will it happen if those GenServers were started with the temporary restart strategy?

The use case I have in mind is the following: there are a bunch of workers doing document export, and if they fail to do their job after N attempts (mostly due to network errors), they should exit in a way that is logged and doesn't affect the parent supervisor.
sjuric (86)
#2
The maximum restart frequency is the number of restarts per time interval after which the supervisor will terminate all of its children (together with itself). This can be tuned through the max_restarts and max_seconds options to supervise/2.
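
As a rough illustration of those two options, here is a minimal sketch. It assumes a newer Elixir version where the options are passed directly to Supervisor.start_link/2 rather than to supervise/2. The :demo_worker child and the chosen limits are made up for the example.

```elixir
# Hypothetical child: an Agent that holds no meaningful state.
children = [
  %{id: :demo_worker, start: {Agent, :start_link, [fn -> :ok end]}}
]

# Allow at most 2 restarts within any 10-second window (illustrative values).
# If that limit is exceeded, the supervisor stops its children and itself.
{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    max_restarts: 2,
    max_seconds: 10
  )
```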

Whether a child process (e.g. a GenServer) is restarted after it terminates depends on the restart option given to the worker/supervisor functions. If this is not specified, the default value is permanent, meaning the child is restarted regardless of the exit reason (even normal or shutdown).
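
To make the restart option concrete, the sketch below builds three child specs that differ only in their :restart value. It assumes the map-based child specs of newer Elixir versions. The RestartDemo module and the child ids are hypothetical. The :transient value is not discussed above, and is included only for comparison.

```elixir
defmodule RestartDemo do
  # Builds a child spec map for a hypothetical Agent-based worker.
  def child(id, restart) do
    %{id: id, start: {Agent, :start_link, [fn -> :ok end]}, restart: restart}
  end
end

children = [
  # Restarted after any exit, including :normal and :shutdown.
  RestartDemo.child(:always, :permanent),
  # Restarted only after an abnormal exit.
  RestartDemo.child(:on_crash, :transient),
  # Never restarted, whatever the exit reason.
  RestartDemo.child(:never, :temporary)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
```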

If you explicitly want to stop a permanent worker without restarting it, you could use Supervisor.terminate_child. However, I can't recall a single time I used that function in practice.
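
For completeness, a small sketch of what that looks like, assuming a child registered under the hypothetical id :exporter:

```elixir
children = [
  %{id: :exporter, start: {Agent, :start_link, [fn -> :ok end]}, restart: :permanent}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Stops the child. The supervisor keeps its spec but does not restart it.
:ok = Supervisor.terminate_child(sup, :exporter)

# The spec is still listed, with :undefined in place of a pid.
[{:exporter, :undefined, :worker, _modules}] = Supervisor.which_children(sup)

# The child can be started again explicitly if needed.
{:ok, _pid} = Supervisor.restart_child(sup, :exporter)
```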

For your use case, I'd recommend using temporary workers with explicit error control. Retrying with supervisors is meant to help with unexpected bugs. A network failure is a very expected situation, so IMO it should be handled explicitly. In your GenServer you could issue a network call (preferably an asynchronous one). Then if the call fails or times out, you can retry with a (possibly growing) delay. Once you've reached the max number of attempts, the worker can just stop itself by returning a :stop tuple. Using a non-normal reason will also ensure the termination is logged as an error.
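
A rough sketch of that idea follows. It is not a definitive implementation. The ExportWorker module, the export_fun argument (a zero-arity function returning :ok or {:error, reason}), the attempt limit, and the delays are all assumptions made for the example. To keep it short, the export is called synchronously inside the worker rather than asynchronously.

```elixir
defmodule ExportWorker do
  # :temporary means the supervisor never restarts this worker.
  use GenServer, restart: :temporary

  @max_attempts 3
  @base_delay_ms 100

  # export_fun is a hypothetical zero-arity function
  # returning :ok or {:error, reason}.
  def start_link(export_fun), do: GenServer.start_link(__MODULE__, export_fun)

  @impl true
  def init(export_fun) do
    # Trigger the first attempt right after init returns.
    send(self(), :attempt)
    {:ok, %{export_fun: export_fun, attempt: 1}}
  end

  @impl true
  def handle_info(:attempt, %{export_fun: export_fun, attempt: attempt} = state) do
    case export_fun.() do
      :ok ->
        {:stop, :normal, state}

      {:error, reason} when attempt >= @max_attempts ->
        # Non-normal reason, so the termination is logged as an error.
        {:stop, {:export_failed, reason}, state}

      {:error, _reason} ->
        # Linearly growing delay: 100 ms, 200 ms, ...
        Process.send_after(self(), :attempt, @base_delay_ms * attempt)
        {:noreply, %{state | attempt: attempt + 1}}
    end
  end
end

# Usage: an export that always fails stops the worker after three attempts,
# while the supervisor itself keeps running.
{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
{:ok, worker} = Supervisor.start_child(sup, {ExportWorker, fn -> {:error, :timeout} end})
```

Because the worker is temporary, its exit triggers no restart, so it does not count toward the supervisor's maximum restart frequency.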