content/apache_restart.rst

   1 Fixing nightly Apache crashes and improving the letsencrypt renewal configuration
   2 =================================================================================
   3
   4 :date: 2019-03-03 10:21
   5 :category: System Administration
   6 :tags: apache, system administration, letsencrypt
   7 :authors: Maximilian Friedersdorff
   8 :summary: My apache web server is stopping every night.  I investigate why and fix it.
   9 :status: published
  10
  11 Over the last few days the Apache web server that runs on my home
  12 server has been acting up again.  Every morning I noticed that it
  13 had stopped running at some point in the night.
  14
  15 This is not the first time this has happened.  In the past, I
  16 just restarted the server in the morning and did not think about
  17 it too much.  After a week or so the issue would typically sort
  18 itself out.   It's time to fix it properly.
  19
  20 Since the behaviour is intermittent I'm guessing that Apache is
  21 crashing, so let's take a look at the error log at
  22 ``/var/log/httpd/error_log``.  I'm only really interested in
  23 events that are happening overnight, since that is when the
  24 server is crashing.  There are ways to `filter a log file by a
  25 date range`_, but since the number of lines to go through is
  26 small, I didn't think it was worth the effort.  Here are the
  27 lines of interest for two consecutive days::
  28
  29    [Tue Feb 26 04:20:04.029627 2019] [core:error] [pid 5539:tid 140104264849280] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
  30    [Tue Feb 26 04:20:04.076544 2019] [mpm_event:notice] [pid 5539:tid 140104264849280] AH00491: caught SIGTERM, shutting down
  31    [Wed Feb 27 04:20:02.324497 2019] [core:error] [pid 11281:tid 140662696130432] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
  32    [Wed Feb 27 04:20:02.324674 2019] [mpm_event:notice] [pid 11281:tid 140662696130432] AH00491: caught SIGTERM, shutting down
  33
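If the log were larger, a small filter would save some scrolling.  The following is only a sketch: ``filter_by_hour`` is a name made up for this example, and it assumes the Apache 2.4 timestamp layout shown above, where the time is the fourth whitespace-separated field.

```shell
# Print error_log lines whose timestamp falls between two hours (inclusive).
# Usage: filter_by_hour START_HOUR END_HOUR < /var/log/httpd/error_log
# Assumes lines start like: [Tue Feb 26 04:20:04.029627 2019] ...
filter_by_hour() {
    awk -v start="$1" -v end="$2" '
        {
            # Field 4 is the time of day, e.g. 04:20:04.029627
            hour = substr($4, 1, 2) + 0
            if (hour >= start && hour <= end) print
        }
    '
}
```

Running ``filter_by_hour 3 5 < /var/log/httpd/error_log`` would then print only the entries logged between 03:00 and 05:59.  It does not look at the date, so it shows that window for every day in the file.
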
  34 On both days, Apache receives a SIGTERM signal, tries (and fails) to delete
  35 a PID file, and then shuts down.  In both cases this happens within seconds of
  36 04:20.  This is clearly a shutdown triggered by some external process, rather
  37 than a crash. It's also happening at a similar time every night, close to a
  38 round number.  I suspect that this is caused by some cronjob.  Let's take a
  39 look::
  40
  41    # Run hourly cron jobs at 47 minutes after the hour:
  42    47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
  43    #
  44    # Run daily cron jobs at 4:40 every day:
  45    40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
  46    #
  47    # Run weekly cron jobs at 4:30 on the first day of the week:
  48    30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
  49    #
  50    # Run monthly cron jobs at 4:20 on the first day of the month:
  51    20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
  52
  53    # Renew ssl certificates
  54    20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew && /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
  55
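With a longer crontab, picking out the entries that can fire at a given time by eye gets tedious.  Here is a rough sketch of a filter for that.  ``jobs_at`` is a hypothetical helper name, and it only understands plain numbers and ``*`` in the minute and hour fields (no lists, ranges or steps).  It also ignores the day and month fields.

```shell
# List crontab entries (read from stdin) that can fire at MINUTE past HOUR.
# Usage: crontab -l | jobs_at 20 4
# Simplification: only literal numbers and "*" are recognised in fields 1 and 2.
jobs_at() {
    awk -v m="$1" -v h="$2" '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        ($1 == m || $1 == "*") && ($2 == h || $2 == "*")
    '
}
```

Against the crontab above, ``crontab -l | jobs_at 20 4`` should print both the monthly ``run-parts`` entry and the certificate renewal entry, since the day-of-month field is not checked.
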
  56 This looks promising: there is a single cronjob running nightly at 04:20 that
  57 attempts to renew letsencrypt SSL certificates, and it is shutting down Apache
  58 in order to do so.  Unfortunately I've been optimistic and redirected all output
  59 from that cronjob to ``/dev/null``.  Fortunately, letsencrypt is keeping a log
  60 of all renewal attempts at ``/var/log/letsencrypt``.  Here is the relevant line::
  61
  62    StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.
  63
  64 That's a bit strange.  Apache is being stopped before the renewal attempt, so
  65 there shouldn't be anything still bound to port 80.  I can use ``netstat`` to
  66 take a look at what is bound to port 80:
  67
  68 .. code-block:: bash
  69
  70    # netstat -nlp | grep ':80' | grep -v tcp6
  71    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      11525/nginx: master
  72
  73 I'm using netstat to list listening (``-l``) ports numerically (``-n``), along
  74 with the process that owns them (``-p``).  I'm grepping for port 80 and
  75 excluding any IPv6 results.
  76
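One caveat with ``grep ':80'`` is that it also matches ports such as 8080, or a remote address that happens to contain ``:80``.  A slightly stricter variant, as a sketch: ``listening_on`` is a name invented here, and it assumes the column layout of ``netstat -nlp`` shown above, with the local address in field 4 and the state in field 6.

```shell
# Print netstat lines for TCP sockets listening on exactly PORT.
# Usage: netstat -nlp | listening_on 80
listening_on() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The local address looks like 0.0.0.0:80 or :::80;
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) print
        }
    '
}
```

Unlike the ``grep`` pipeline, this keeps IPv6 listeners in the output, which would have shown that nginx was also bound to ``[::]:80``.
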
  77 Why is nginx running?  I need to have a word with my past self.
  78
  79 Nginx is only listening on port 80 and is configured to always respond with a
  80 redirect to https::
  81
  82    worker_processes  1;
  83
  84    events {
  85            worker_connections  1024;
  86    }
  87
  88    http {
  89            include       mime.types;
  90            default_type  application/octet-stream;
  91
  92            keepalive_timeout  65;
  93
  94            server {
  95                    listen 80 default_server;
  96                    listen [::]:80 default_server;
  97                    server_name _;
  98                    return 301 https://$host$request_uri;
  99            }
 100    }
 101
 102 I'm not sure what my thought process was when I set this up.  It would be much
 103 better to configure Apache to perform this redirect instead.  I'm using
 104 Slackware on this server, which doesn't even package nginx, so I'm compiling it
 105 with a slackbuild from https://slackbuilds.org.  Uninstalling it would be
 106 desirable.
 107
 108 To perform the same redirect in Apache instead, I've added the following lines
 109 to the configuration file (thanks to `Gordon on Stackoverflow`_)::
 110
 111    Listen 80
 112
 113    <VirtualHost *:80>
 114    RewriteEngine On
 115    RewriteCond %{HTTPS} !=on
 116    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
 117    </VirtualHost>
 118
 119 This allows Apache to respond to requests on port 80 and adds a default
 120 VirtualHost (there are no others for port 80) that responds with a permanent
 121 redirect to the https version of the same URL.
 122
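A quick way to check the redirect is to look at the response headers for a plain http request.  The helper below is a sketch, and ``is_https_redirect`` is a hypothetical name.  It reads headers on stdin (for example from ``curl -sI``) and succeeds only if the status is 301 and the ``Location`` header points at an ``https://`` URL.

```shell
# Succeed if the HTTP response headers on stdin describe a 301 redirect
# to an https:// URL.
# Usage: curl -sI http://localhost/ | is_https_redirect && echo "redirect OK"
is_https_redirect() {
    awk '
        NR == 1 && $2 == 301 { status = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { location = 1 }
        END { exit !(status && location) }
    '
}
```

``http://localhost/`` is only a placeholder; any hostname served by this Apache instance would do.
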
 123 The cronjob can now renew the SSL certificates and successfully restart Apache
 124 afterwards.  For additional robustness, the cronjob should restart Apache whether
 125 or not the actual renewal was successful::
 126
 127    # Renew ssl certificates
 128    20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew; /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
 129
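The difference between the two versions comes down to ``&&`` versus ``;``.  The sketch below uses stand-in functions with made-up names instead of the real commands, so it is safe to run anywhere.  ``renew`` is written to fail on purpose.

```shell
# Stand-ins for the real commands (hypothetical names, no side effects).
stop_httpd()  { echo "stop"; }
renew()       { echo "renew failed"; return 1; }
start_httpd() { echo "start"; }

# Original chain: "&&" skips start_httpd when renew fails.
chained()   { stop_httpd && renew && start_httpd; }

# Revised chain: ";" runs start_httpd whatever happened before it.
separated() { stop_httpd && renew; start_httpd; }
```

Calling ``chained`` prints ``stop`` and ``renew failed`` and leaves the server down, while ``separated`` goes on to print ``start`` as well.
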
 130 I actually think that I can do one better than that.  Certbot has a mature Apache
 131 plugin that should be able to handle the renewal without stopping Apache at
 132 all.  I changed the value of the ``authenticator``
 133 configuration option from ``standalone`` to ``apache`` in the renewal
 134 configuration of letsencrypt. Running ``certbot renew --dry-run`` confirms that
 135 this works, which I wasn't actually expecting.
 136
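For reference, the edit could be scripted roughly as follows.  This is a sketch under assumptions: ``use_apache_authenticator`` is a made-up name, and the path in the usage line is a guess at where the renewal configuration lives, so check it on your own system.  A ``.bak`` copy of the file is kept next to it.

```shell
# Switch the authenticator in a certbot renewal config from standalone to apache.
# Usage (path is an assumption, adjust to your setup):
#   use_apache_authenticator /etc/letsencrypt/renewal/example.org.conf
use_apache_authenticator() {
    # Only rewrite a line that currently reads "authenticator = standalone".
    sed -i.bak \
        's/^authenticator[[:space:]]*=[[:space:]]*standalone[[:space:]]*$/authenticator = apache/' \
        "$1"
}
```

Following it with ``certbot renew --dry-run`` is a reasonable way to confirm the change before the next nightly run.
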
 137 I can now make a final change to the cronjob::
 138
 139    # Renew ssl certificates
 140    20 4 * * * certbot renew 1> /dev/null 2>&1
 141
 142
 143 .. _filter a log file by a date range: https://stackoverflow.com/questions/7706095/filter-log-file-entries-based-on-date-range
 144 .. _Gordon on Stackoverflow: https://stackoverflow.com/a/4399158