Friday | 29 MAR 2024

Nginx Crashing Randomly

Title: Nginx Crashing Randomly
Date: 2023-04-25
Tags:  nginx, sysadmin

A strange issue recently started up with my web server. Randomly throughout the day the nginx portion of my server would stop working. The rest of the server was trucking along, but my sites would stop responding. I wouldn't even get a 502 error, which is what nginx returns when an app behind it is down, so nginx itself had to be the thing crashing. This was pretty unexpected. I did have memory issues before, but those stopped after I added some swap.

I first took a look at the nginx error log at /var/log/nginx/error.log and found nothing out of the ordinary. The access logs were similarly fine. I then watched top for a bit to see if there were any clues there. My node applications were the most resource-intensive processes, but even they weren't eating up enough memory for nginx to get killed by the OOM killer.
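Watching top only shows the current moment, so the kernel log is a more direct way to rule out the OOM killer. The following is a minimal sketch, not something from my original debugging session: the function name oom_events is made up, and the patterns assume the usual wording of Linux kernel OOM messages, which can vary between kernel versions.

```shell
# Hypothetical helper: print only the lines that look like OOM killer activity.
# Reads log text on stdin so it works with journalctl, dmesg, or a saved file.
# The patterns are an assumption about typical kernel message wording.
oom_events() {
    grep -iE 'out of memory|oom-killer' || true
}

# Typical use on a systemd host (-k limits the journal to kernel messages):
#   journalctl -k --no-pager | oom_events
```

If this prints nothing for the period when nginx went down, memory pressure is probably not the cause.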

I finally checked journalctl and now I know that should have been my first stop.

journalctl -u nginx.service

That showed:

nginx.service: Main process exited, code=dumped, status=11/SEGV
nginx.service: Killing process 43771 (nginx) with signal SIGKILL.
nginx.service: Killing process 43771 (nginx) with signal SIGKILL.
nginx.service: Failed with result 'core-dump'.

These entries were scattered all through the journal.
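To get a feel for how often this was happening, the segfault exits can be counted instead of scrolled through. This is a rough sketch under the assumption that the message text matches the lines shown above exactly. The function name count_segv_exits is made up for illustration.

```shell
# Hypothetical helper: count main-process exits caused by a segfault.
# Reads journal text on stdin. grep -c prints 0 and exits non-zero when
# nothing matches, so "|| true" keeps the function from reporting failure.
count_segv_exits() {
    grep -c 'Main process exited, code=dumped, status=11/SEGV' || true
}

# Typical use on a systemd host with an nginx unit:
#   journalctl -u nginx.service --no-pager | count_segv_exits
#   journalctl -u nginx.service --since today --no-pager | count_segv_exits
```

Comparing the timestamps of those entries against when other scheduled jobs run is one way to spot a pattern.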

Status 11/SEGV means nginx was dying from a segmentation fault (signal 11), but it wasn't obvious how or why.

Luckily the first result on Google gave me the answer.

https://community.letsencrypt.org/t/nginx-server-fails-after-certbot-renew/142660

Certbot was trying to renew certificates, and somehow that was causing nginx to crash. The other ingredient was a set of extra nginx modules that I had recently installed with apt and loaded for use in a future project.

This combination was causing nginx to crash specifically when certbot ran. I'm not sure what the interplay was between certbot, nginx, and the extra modules, but after I disabled the modules I was able to renew my certificates with certbot without a crash.
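For anyone retracing this, here is a hedged sketch of how to see which dynamic modules nginx is loading and then check a renewal safely. I haven't recorded exactly how my modules were loaded, so this assumes they come in through load_module directives, which Debian and Ubuntu packages typically place in files under /etc/nginx/modules-enabled/. The function name list_load_module_lines is made up.

```shell
# Hypothetical helper: show the load_module directives in an nginx config dump.
# Reads config text on stdin.
list_load_module_lines() {
    grep -E '^[[:space:]]*load_module[[:space:]]' || true
}

# Typical use (nginx -T prints the full configuration, includes and all):
#   sudo nginx -T 2>/dev/null | list_load_module_lines
#
# After disabling a module, check the config and try a renewal without
# touching the real certificates:
#   sudo nginx -t
#   sudo certbot renew --dry-run
```

Disabling the modules one at a time and repeating the dry run would narrow down which one is involved.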

Weird error, but all part of the fun. The big lesson here is that I should really be separating my production work from my dev work.