Mirror of https://github.com/nextcloud/all-in-one.git, synced 2026-05-21 10:50:10 +00:00
fix: prevent 502 Bad Gateway via PHP-FPM worker pool exhaustion and cold-start latency
- Add request_terminate_timeout = PHP_MAX_TIME in start.sh: without this (default 0 = disabled), workers blocked on a slow DB query, a stalled Redis connection, or a hung syscall are never reaped. Over time they fill pm.max_children, and Apache returns 502 Bad Gateway to the reverse proxy.
- Set pm.process_idle_timeout = 300s in the Dockerfile: the upstream default of 10 s kills all idle workers after a brief quiet period. The next request burst must then wait for fresh PHP-FPM forks; on a loaded host that spawn latency can push Apache past its FastCGI deadline and produce a 502. 300 s keeps a warm pool through normal desktop-sync polling cycles.
- Add a dedicated 502 troubleshooting subsection to reverse-proxy.md documenting the six most common causes (proxy timeout, worker exhaustion, stuck workers, Redis session lock contention, container cold start, Caddy certificate renewal) with actionable diagnostics.

Agent-Logs-Url: https://github.com/nextcloud/all-in-one/sessions/2fd7a6d1-bfdb-4f26-a8d0-cd54a7307999
Co-authored-by: szaimen <42591237+szaimen@users.noreply.github.com>
Committed by GitHub
Parent: 119f68b6ee
Commit: 46eb2dfc7d
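For the worker-exhaustion and stuck-worker causes listed in the commit message, PHP-FPM writes recognizable warnings: a "server reached ... max_children setting" line when the pool is full, and an "execution timed out ..., terminating" line when request_terminate_timeout kills a worker. The helper below is hypothetical (`fpm_502_hints` is not part of AIO) and simply counts both kinds of line in log text read from stdin. The exact wording may differ between PHP versions and process managers, so the patterns are deliberately loose.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIO): count the PHP-FPM warnings that
# point at two of the 502 causes, worker exhaustion and stuck workers.
set -eu

fpm_502_hints() {
    # Reads PHP-FPM log lines on stdin and prints one counter per line.
    # Loose patterns, because the max_children wording may vary between
    # PHP-FPM versions and process managers.
    awk '
        /max_children setting/ { exhausted++ }
        /execution timed out/  { timed_out++ }
        END {
            printf "max_children_reached=%d\n", exhausted
            printf "execution_timed_out=%d\n", timed_out
        }
    '
}
```

For example, `docker logs --since 1h nextcloud-aio-nextcloud 2>&1 | fpm_502_hints` would summarize the last hour, assuming PHP-FPM's error log goes to the container's stderr as configured in the official php images' docker.conf. A non-zero first counter suggests pool exhaustion; a non-zero second counter means the new request_terminate_timeout is actively killing stuck requests.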
@@ -250,6 +250,14 @@ RUN set -ex; \
# We don't actually expect so many children but don't want to limit it artificially because people will report issues otherwise.
# Also children will usually be terminated again after the process is done due to the ondemand setting
sed -i 's/^pm.max_children =.*/pm.max_children = 5000/' /usr/local/etc/php-fpm.d/www.conf; \
# With pm = ondemand, workers are killed after pm.process_idle_timeout seconds
# of inactivity. The upstream default of 10 s is aggressive: after a brief
# quiet period (e.g. the gap between desktop-sync client polls), all workers
# are reaped and the next request burst must wait for fresh forks. On a
# loaded host that spawn latency can push Apache past its FastCGI timeout and
# produce a 502. 300 s (5 min) keeps a warm pool through normal sync-client
# polling cycles while still reclaiming memory during genuinely idle periods.
sed -i 's/^;*pm.process_idle_timeout.*/pm.process_idle_timeout = 300s/' /usr/local/etc/php-fpm.d/www.conf; \
sed -i 's|access.log = /proc/self/fd/2|access.log = /proc/self/fd/1|' /usr/local/etc/php-fpm.d/docker.conf; \
\
echo "[ -n \"\$TERM\" ] && [ -f /root.motd ] && cat /root.motd" >> /root/.bashrc; \
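The two sed expressions in the Dockerfile hunk above can be checked in isolation. This is a minimal sketch, assuming a sed whose `-i` takes no suffix argument (GNU or BusyBox sed) and a made-up www.conf fragment. The commented-out `;pm.process_idle_timeout = 10s;` line is meant to resemble the upstream template, not reproduce it exactly.

```shell
#!/bin/sh
# Minimal sketch: replay the Dockerfile's sed edits on a made-up www.conf
# fragment (illustrative only, not the exact upstream file).
# Assumes a sed whose -i takes no suffix argument (GNU or BusyBox sed).
set -eu

conf=$(mktemp)
trap 'rm -f "$conf"' EXIT

cat > "$conf" <<'EOF'
[www]
pm = ondemand
pm.max_children = 5
;pm.process_idle_timeout = 10s;
EOF

# Same expressions as the RUN step: pin max_children, then uncomment and
# override the idle timeout. ";*" matches zero or more leading semicolons,
# so the second command also rewrites an already-active line.
sed -i 's/^pm.max_children =.*/pm.max_children = 5000/' "$conf"
sed -i 's/^;*pm.process_idle_timeout.*/pm.process_idle_timeout = 300s/' "$conf"

grep '^pm' "$conf"
# pm = ondemand
# pm.max_children = 5000
# pm.process_idle_timeout = 300s
```

Because `.*` consumes the rest of the line, the trailing `;` of the commented default disappears along with the old value. On a running container, `php-fpm -tt` (test the configuration and dump it) is one way to confirm the effective value.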
@@ -156,6 +156,15 @@ while [ "$THIS_IS_AIO" = "true" ] && [ -z "$(dig nextcloud-aio-apache A +short +
sleep 5
done

# Set request_terminate_timeout so that PHP-FPM forcibly kills a worker whose
# current request exceeds this wall-clock limit. Without it (default 0 =
# disabled), a worker stuck on a slow DB query, a stalled Redis connection, or
# a hung syscall is never killed. max_execution_time does not cover these
# cases, because on non-Windows systems it ignores time spent in system calls
# and stream operations. Over time such stuck workers fill pm.max_children,
# leaving no free slots for legitimate requests and causing Apache to return
# 502 Bad Gateway upstream. Setting it equal to PHP_MAX_TIME caps the time a
# single request can hold a worker at the runtime already allowed for a PHP
# script, which keeps the pool healthy.
sed -i "s|^;*request_terminate_timeout = .*|request_terminate_timeout = ${PHP_MAX_TIME}|" /usr/local/etc/php-fpm.d/www.conf

set -x
# shellcheck disable=SC2235
if [ "$THIS_IS_AIO" = "true" ] && [ "$APACHE_PORT" = 443 ]; then
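A similar sketch for the start.sh change, under the same sed assumption. `PHP_MAX_TIME=3600` is an assumed example value, and the `${PHP_MAX_TIME:?...}` guard is an addition not present in start.sh: it makes the shell abort instead of writing an empty `request_terminate_timeout = ` line when the variable is unset. The sample also contains `request_terminate_timeout_track_finished`, which newer www.conf templates include, to show that requiring ` = ` right after the directive name keeps the pattern from touching it.

```shell
#!/bin/sh
# Minimal sketch: replay the start.sh sed edit on a made-up www.conf fragment.
# PHP_MAX_TIME=3600 is an assumed example value, not read from AIO.
set -eu

PHP_MAX_TIME=3600
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT

cat > "$conf" <<'EOF'
[www]
;request_terminate_timeout = 0
;request_terminate_timeout_track_finished = no
EOF

# Same pattern as start.sh. The literal " = " after the directive name means
# request_terminate_timeout_track_finished is left alone.
# ${PHP_MAX_TIME:?...} is an extra guard (not in start.sh): it aborts when
# the variable is unset or empty instead of writing an invalid value.
sed -i "s|^;*request_terminate_timeout = .*|request_terminate_timeout = ${PHP_MAX_TIME:?PHP_MAX_TIME must be set}|" "$conf"

cat "$conf"
# [www]
# request_terminate_timeout = 3600
# ;request_terminate_timeout_track_finished = no
```

If the script runs again against an already-edited file, the `;*` prefix still matches the now-active line, so the directive is simply rewritten with the current PHP_MAX_TIME rather than duplicated.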