--- /dev/null
+Fixing nightly Apache crashes and improving the letsencrypt renewal configuration
+=================================================================================
+
+:date: 2019-03-03 10:21
+:category: System Administration
+:tags: apache, system administration, letsencrypt
+:authors: Maximilian Friedersdorff
+:summary: My Apache web server stops every night. I investigate why and fix it.
+
+Over the last few days the Apache web server that runs on my home
+server has been acting up again. Every morning I noticed that it
+had stopped running at some point in the night.
+
+This is not the first time this has happened. In the past, I
+just restarted the server in the morning and did not think about
+it too much. After a week or so the issue would typically sort
+itself out. It's time to fix it properly.
+
+Since the behaviour is intermittent I'm guessing that Apache is
+crashing, so let's take a look at the error log at
+``/var/log/httpd/error_log``. I'm only really interested in
+events that happen overnight, since that is when the
+server is crashing. There are ways to `filter a log file by a
+date range`_, but since the number of lines to go through is
+small, I didn't think it was worth the effort. Here are the
+lines of interest for two consecutive days::
+
+ [Tue Feb 26 04:20:04.029627 2019] [core:error] [pid 5539:tid 140104264849280] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
+ [Tue Feb 26 04:20:04.076544 2019] [mpm_event:notice] [pid 5539:tid 140104264849280] AH00491: caught SIGTERM, shutting down
+ [Wed Feb 27 04:20:02.324497 2019] [core:error] [pid 11281:tid 140662696130432] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
+ [Wed Feb 27 04:20:02.324674 2019] [mpm_event:notice] [pid 11281:tid 140662696130432] AH00491: caught SIGTERM, shutting down
+
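+Something like the following would have pulled just these events out of the
+log instead of reading it by eye. It is only a sketch: it greps for the two
+message codes that appear above and assumes the log has not been rotated
+since:
+
+.. code-block:: bash
+
+ # AH00491 is logged on SIGTERM, AH00095 when the PID file cannot be removed
+ grep -E 'AH00491|AH00095' /var/log/httpd/error_log
+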
+On both days, Apache receives a SIGTERM signal, tries (and fails) to delete
+a PID file and then shuts down. In both cases this happens within seconds of
+04:20. This is clearly a shutdown triggered by some external process, rather
+than a crash. It's also happening at a similar time every night, close to a
+round number. I suspect that this is caused by some cronjob. Let's take a
+look::
+
+ # Run hourly cron jobs at 47 minutes after the hour:
+ 47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
+ #
+ # Run daily cron jobs at 4:40 every day:
+ 40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
+ #
+ # Run weekly cron jobs at 4:30 on the first day of the week:
+ 30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
+ #
+ # Run monthly cron jobs at 4:20 on the first day of the month:
+ 20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
+
+ # Renew ssl certificates
+ 20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew && /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
+
+This looks promising: there is a single cronjob running nightly at 04:20 that
+attempts to renew letsencrypt SSL certificates, and it is shutting down Apache
+in order to do so. Unfortunately I've been optimistic and redirected all output
+from that cronjob to ``/dev/null``. Fortunately, letsencrypt is keeping a log
+of all renewal attempts at ``/var/log/letsencrypt``. Here is the relevant line::
+
+ StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.
+
+That's a bit strange. Apache is being stopped before the renewal attempt, so
+there shouldn't be anything still bound to port 80. I can use ``netstat`` to
+take a look at what is bound to port 80:
+
+.. code-block:: bash
+
+ # netstat -nlp | grep ':80' | grep -v tcp6
+ tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 11525/nginx: master
+
+I'm using netstat to list listening (``-l``) ports numerically (``-n``), along
+with the process that owns them (``-p``). I'm grepping for port 80 and
+excluding any IPv6 results.
+
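+One caveat: ``grep ':80'`` also matches ports such as 8080 or 8000. A
+slightly stricter variant, sketched here on the assumption that the local
+address is the fourth column of the ``netstat`` output, anchors the match to
+the end of that column. Unlike the command above it also shows IPv6
+listeners:
+
+.. code-block:: bash
+
+ # Limit to listening TCP sockets and keep only local addresses ending in :80
+ netstat -nlpt | awk '$4 ~ /:80$/'
+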
+Why is nginx running? I need to have a word with my past self.
+
+Nginx is only listening on port 80 and is configured to always respond with a
+redirect to https::
+
+ worker_processes 1;
+
+ events {
+     worker_connections 1024;
+ }
+
+ http {
+     include mime.types;
+     default_type application/octet-stream;
+
+     keepalive_timeout 65;
+
+     server {
+         listen 80 default_server;
+         listen [::]:80 default_server;
+         server_name _;
+         return 301 https://$host$request_uri;
+     }
+ }
+
+I'm not sure what my thought process was when I set this up. It would be much
+better to configure Apache to perform this redirect instead. I'm using
+Slackware on this server, which doesn't even package nginx, so I'm compiling it
+with a SlackBuild from https://slackbuilds.org. Uninstalling it would be
+desirable.
+
+To perform the same redirect in Apache instead, I've added the following lines
+to the configuration file (thanks to `Gordon on Stackoverflow`_)::
+
+ Listen 80
+
+ <VirtualHost *:80>
+     RewriteEngine On
+     RewriteCond %{HTTPS} !=on
+     RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
+ </VirtualHost>
+
+This allows Apache to respond to requests on port 80 and adds a default
+VirtualHost (there are no others for port 80) that responds with a permanent
+redirect to the https version of the same URL.
+
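+To check the redirect from another machine, something like the following
+should do. ``example.com`` and the path are placeholders for a real hostname
+and page:
+
+.. code-block:: bash
+
+ # Fetch only the response headers over plain http, show status and target
+ curl -sI http://example.com/some/page | grep -iE '^(HTTP|Location)'
+
+I would expect a ``301`` status and a ``Location`` header pointing at the
+https version of the same URL.
+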
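+With port 80 no longer held by nginx while Apache is stopped, the cronjob can
+renew the SSL certificates and restart Apache afterwards. The original job
+chained all three commands with ``&&``, so a failed renewal meant that Apache
+was never started again. For additional robustness, the cronjob should restart
+Apache whether or not the actual renewal was successful::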
+
+ # Renew ssl certificates
+ 20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew; /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
+
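+The difference between the two operators is easy to see in isolation. In
+this small sketch ``false`` stands in for a failed renewal and ``echo`` for
+the restart:
+
+.. code-block:: bash
+
+ # With &&, a failing command stops the rest of the chain
+ sh -c 'true && false && echo restarted'   # prints nothing
+
+ # With ;, the last command runs regardless of what came before
+ sh -c 'true && false; echo restarted'     # prints "restarted"
+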
+I actually think that I can do one better than restarting Apache
+unconditionally. Certbot has a mature Apache plugin that should be able to
+handle the renewal through the running Apache, without stopping it at all. I
+changed the value of the ``authenticator`` configuration option from
+``standalone`` to ``apache`` in the renewal configuration of letsencrypt. I
+wasn't actually expecting this to work, but running ``certbot renew --dry-run``
+confirms that it does.
+
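+For reference, the change can be made with a one-line ``sed``. This is a
+sketch: I'm assuming the usual certbot layout of one file per certificate
+under ``/etc/letsencrypt/renewal/``, and ``example.com.conf`` is a
+placeholder name:
+
+.. code-block:: bash
+
+ # Switch the authenticator for one certificate (edits the file in place)
+ sed -i 's/^authenticator = standalone$/authenticator = apache/' \
+     /etc/letsencrypt/renewal/example.com.conf
+
+ # Simulate a renewal without touching the live certificates
+ certbot renew --dry-run
+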
+I can now make a final change to the cronjob::
+
+ # Renew ssl certificates
+ 20 4 * * * certbot renew 1> /dev/null 2>&1
+
+
+.. _filter a log file by a date range: https://stackoverflow.com/questions/7706095/filter-log-file-entries-based-on-date-range
+.. _Gordon on Stackoverflow: https://stackoverflow.com/a/4399158