From 839a40c20c39c1777bc118610e97d26841d79082 Mon Sep 17 00:00:00 2001
From: Maximilian Friedersdorff
Date: Sun, 3 Mar 2019 14:02:43 +0000
Subject: [PATCH] content: Nightly apache crashes + cert renewal

---
 content/apache_restart.rst | 143 +++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)
 create mode 100644 content/apache_restart.rst

diff --git a/content/apache_restart.rst b/content/apache_restart.rst
new file mode 100644
index 0000000..3312a8a
--- /dev/null
+++ b/content/apache_restart.rst
@@ -0,0 +1,143 @@
+Fixing nightly Apache crashes and improving the letsencrypt renewal configuration
+=================================================================================
+
+:date: 2019-03-03 10:21
+:category: System Administration
+:tags: apache, system administration, letsencrypt
+:authors: Maximilian Friedersdorff
+:summary: My Apache web server is stopping every night. I investigate why and fix it.
+
+Over the last few days the Apache web server that runs on my home
+server has been acting up again. Every morning I noticed that it
+had stopped running at some point in the night.
+
+This is not the first time this has happened. In the past, I
+just restarted the server in the morning and did not think about
+it too much. After a week or so the issue would typically sort
+itself out. It's time to fix it properly.
+
+Since the behaviour is intermittent, I'm guessing that Apache is
+crashing, so let's take a look at the error log at
+``/var/log/httpd/error_log``. I'm only really interested in
+events that happen overnight, since that is when the
+server is crashing. There are ways to `filter a log file by a
+date range`_, but since the number of lines to go through is
+small, I didn't think it was worth the effort.
+Here are the lines of interest for two consecutive days::
+
+    [Tue Feb 26 04:20:04.029627 2019] [core:error] [pid 5539:tid 140104264849280] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
+    [Tue Feb 26 04:20:04.076544 2019] [mpm_event:notice] [pid 5539:tid 140104264849280] AH00491: caught SIGTERM, shutting down
+    [Wed Feb 27 04:20:02.324497 2019] [core:error] [pid 11281:tid 140662696130432] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
+    [Wed Feb 27 04:20:02.324674 2019] [mpm_event:notice] [pid 11281:tid 140662696130432] AH00491: caught SIGTERM, shutting down
+
+On both days, Apache receives a SIGTERM signal, tries (and fails) to delete
+a PID file, and then shuts down. In both cases this happens within seconds of
+04:20. This is clearly a shutdown triggered by some external process, rather
+than a crash. It's also happening at a similar time every night, close to a
+round number. I suspect that this is caused by some cronjob. Let's take a
+look::
+
+    # Run hourly cron jobs at 47 minutes after the hour:
+    47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
+    #
+    # Run daily cron jobs at 4:40 every day:
+    40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
+    #
+    # Run weekly cron jobs at 4:30 on the first day of the week:
+    30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
+    #
+    # Run monthly cron jobs at 4:20 on the first day of the month:
+    20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
+
+    # Renew ssl certificates
+    20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew && /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
+
+This looks promising: there is a single cronjob running nightly at 04:20 that
+attempts to renew letsencrypt SSL certificates, and it is shutting down Apache
+in order to do so. Unfortunately I've been optimistic and redirected all output
+from that cronjob to ``/dev/null``.
+Fortunately, letsencrypt keeps a log of all renewal attempts at
+``/var/log/letsencrypt``. Here is the relevant line::
+
+    StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.
+
+That's a bit strange. Apache is being stopped before the renewal attempt, so
+there shouldn't be anything still bound to port 80. I can use ``netstat`` to
+take a look at what is bound to port 80:
+
+.. code-block:: bash
+
+    # netstat -nlp | grep ':80' | grep -v tcp6
+    tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 11525/nginx: master
+
+I'm using netstat to list listening (``-l``) ports numerically (``-n``), along
+with the process that owns them (``-p``). I'm grepping for port 80 and
+excluding any IPv6 results.
+
+Why is nginx running? I need to have a word with my past self.
+
+Nginx is only listening on port 80 and is configured to always respond with a
+redirect to https::
+
+    worker_processes 1;
+
+    events {
+        worker_connections 1024;
+    }
+
+    http {
+        include mime.types;
+        default_type application/octet-stream;
+
+        keepalive_timeout 65;
+
+        server {
+            listen 80 default_server;
+            listen [::]:80 default_server;
+            server_name _;
+            return 301 https://$host$request_uri;
+        }
+    }
+
+I'm not sure what my thought process was when I set this up. It would be much
+better to configure Apache to perform this redirect instead. I'm using Slackware
+on this server, which doesn't even package nginx, so I'm compiling it with a
+slackbuild from https://slackbuilds.org. Uninstalling it would be desirable.
+
+To perform the same redirect in Apache instead, I've added the following lines
+to the configuration file (thanks to `Gordon on Stackoverflow`_)::
+
+    Listen 80
+
+    <VirtualHost *:80>
+        RewriteEngine On
+        RewriteCond %{HTTPS} !=on
+        RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
+    </VirtualHost>
+
+This allows Apache to respond to requests on port 80 and adds a default
+VirtualHost (there are no others for port 80) that responds with a permanent
+redirect to the https version of the same URL.
+
+The cronjob can now renew the SSL certificates and successfully restart Apache
+afterwards. For additional robustness, the cronjob should restart Apache whether
+or not the actual renewal was successful::
+
+    # Renew ssl certificates
+    20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew; /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
+
+I actually think that I can do one better than that. Certbot has a mature Apache
+plugin that should be able to handle the renewal process using Apache. I wasn't
+actually expecting this to work. I changed the value of the ``authenticator``
+configuration option from ``standalone`` to ``apache`` in the renewal
+configuration of letsencrypt. Running ``certbot renew --dry-run`` confirms that
+this works successfully.
+
+I can now make a final change to the cronjob::
+
+    # Renew ssl certificates
+    20 4 * * * certbot renew 1> /dev/null 2>&1
+
+
+.. _filter a log file by a date range: https://stackoverflow.com/questions/7706095/filter-log-file-entries-based-on-date-range
+.. _Gordon on Stackoverflow: https://stackoverflow.com/a/4399158
-- 
2.46.2
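
The difference between the two ``/bin/sh -c`` cron lines in the post (``&&`` before the final start versus ``;``) can be checked in isolation. The following is only a minimal sketch: ``stop_httpd``, ``renew_certs`` and ``start_httpd`` are hypothetical stand-ins for ``/etc/rc.d/rc.httpd stop``, ``letsencrypt renew`` and ``/etc/rc.d/rc.httpd start``, and the renewal is forced to fail to mimic the port 80 bind error.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real commands; the renewal always fails here.
stop_httpd()  { echo "stop"; }
renew_certs() { echo "renew (fails)"; return 1; }
start_httpd() { echo "start"; }

# Original chain: the failed renewal short-circuits the last "&&",
# so the start command never runs and the server stays down.
and_chain() { stop_httpd && renew_certs && start_httpd; }

# Revised chain: ";" runs the start command whatever the renewal returned.
semicolon_chain() { stop_httpd && renew_certs; start_httpd; }

and_chain_output=$(and_chain)
semicolon_chain_output=$(semicolon_chain)

printf 'with &&:\n%s\n\n' "$and_chain_output"
printf 'with ;:\n%s\n' "$semicolon_chain_output"
```

Only the second chain prints ``start``. Note that with ``;`` the start command also runs if the stop command itself fails, which seems harmless for this use.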