23 Apr 2013

No downtime deploy with capistrano, Thin and nginx

As I mentioned a couple of weeks ago, I’ve been working on a tutorial about thinking through problems in graphs, and since it’s a Sinatra application I thought thin would be a decent choice of web server.

In my initial setup I had the following nginx config file which was used to proxy requests on to thin:

/etc/nginx/sites-available/thinkingingraphs.conf

upstream thin {
  server 127.0.0.1:3000;
}

server {
  listen       80 default;
  server_name _;
  charset utf-8;

  rewrite  ^\/status(.*)$  $1 last;

  gzip  on;
  gzip_disable "MSIE [1-6]\.(?!.*SV1)";
  gzip_types       text/plain application/xml text/xml text/css application/x-javascript application/xml+rss text/javascript application/json;

  gzip_vary on;

  access_log  /var/www/thinkingingraphs/shared/log/nginx_access.log;
  error_log  /var/www/thinkingingraphs/shared/log/nginx_error.log;

  root   /var/www/thinkingingraphs/current/public;

  location / {
    proxy_pass http://thin;
  }

  error_page  404              /404.html;
  error_page   500 502 503 504  /500.html;
}

I had an upstart script which started the thin server...

/etc/init/thinkingingraphs.conf

script
  export RACK_ENV=production
  export RUBY=ruby

  cd /var/www/thinkingingraphs/current
  exec su -s /bin/sh vagrant -c '$RUBY -S bundle exec thin -p 3000 start >> /var/www/thinkingingraphs/current/log/production.log 2>&1'
end script

... and then I used the following capistrano script to stop and start the server whenever I was deploying a new version of the application:

config/deploy.rb

namespace :deploy do
  task(:start) {}
  task(:stop) {}

  desc "Restart Application"
  task :restart do
    sudo "stop thinkingingraphs || echo 0"
    sudo "start thinkingingraphs"
  end
end

The problem with this approach is that some requests receive a 502 response code while thin is restarting, because nginx has no running instance to proxy to:

$ bundle exec cap deploy

$ while true; do curl -w %{http_code}:%{time_total} http://localhost/ -o /dev/null -s; printf "\n"; sleep 0.5; done

200:0.076
200:0.074
200:0.095
502:0.003
200:0.696
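
To put a number on the failures rather than eyeballing the output, the lines printed by the loop could be tallied. This is a minimal Ruby sketch which assumes the `code:seconds` format produced by the curl command above; the helper name `tally_statuses` is hypothetical:

```ruby
# Hypothetical helper: counts responses per HTTP status code from lines
# in the "code:seconds" format printed by the curl loop.
def tally_statuses(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    code, _time = line.strip.split(":", 2)
    next if code.nil? || code.empty? # skip blank lines
    counts[code.to_i] += 1
  end
end

# The five responses captured during the deploy above.
sample = ["200:0.076", "200:0.074", "200:0.095", "502:0.003", "200:0.696"]
counts = tally_statuses(sample)
counts.each { |code, count| puts "#{code}: #{count}" }
```

Anything other than 200 showing up in the tally means a request was dropped during the deploy.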

I wanted to try and make a no downtime deploy script and I came across a couple of posts which helped me work out how to do it.

The first step was to make sure that I had more than one thin instance running so that requests could be sent to one of the other ones while a restart was in progress.

I created the following config file:

/etc/thin/thinkingingraphs.yml

chdir: /var/www/thinkingingraphs/current
environment: production
address: 0.0.0.0
port: 3000
timeout: 30
log: log/thin.log
pid: tmp/pids/thin.pid
max_conns: 1024
max_persistent_conns: 100
require: []
wait: 30
servers: 3
daemonize: true
onebyone: true

We’ve set the number of servers to 3, which will spin up 3 instances on ports 3000, 3001 and 3002.

The other property that we need to set is 'onebyone', which means that when you restart thin it will take down the instances one at a time. This means the other two can handle incoming requests while one is restarting.
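
The port numbering and the effect of 'onebyone' can be sketched in a few lines of Ruby. This is only an illustration of the idea, not thin's own code; the method names `cluster_ports` and `rolling_restart_steps` are hypothetical:

```ruby
# Ports used by a cluster: the base port plus one per additional server.
def cluster_ports(base_port, servers)
  (0...servers).map { |i| base_port + i }
end

# Models a one-by-one restart: each step records the single instance that
# is down and the ports that are still able to serve requests.
def rolling_restart_steps(ports)
  ports.map { |down| { restarting: down, serving: ports - [down] } }
end

ports = cluster_ports(3000, 3)
steps = rolling_restart_steps(ports)

steps.each do |step|
  puts "restarting #{step[:restarting]}, serving on #{step[:serving].join(', ')}"
end
```

At every step two of the three ports are still serving, which is what gives nginx somewhere to send requests during the restart.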

I changed my upstart script to look like this:

/etc/init/thinkingingraphs.conf

script
  export RACK_ENV=production
  export RUBY=ruby

  cd /var/www/thinkingingraphs/current
  exec su -s /bin/sh vagrant -c '$RUBY -S bundle exec thin -C /etc/thin/thinkingingraphs.yml start >> /var/www/thinkingingraphs/current/log/production.log 2>&1'
end script

I also had to change the capistrano script to call 'thin restart' instead of stopping and starting the upstart script:

config/deploy.rb

namespace :deploy do
  task(:start) {}
  task(:stop) {}

  desc "Restart Application"
  task :restart do
    run "cd #{current_path} && bundle exec thin restart -C /etc/thin/thinkingingraphs.yml"
  end
end

Finally I had to make some changes to the nginx config file so that a request is passed on to another thin instance if the first attempt fails (because that instance is being restarted), using the proxy_next_upstream directive:

/etc/nginx/sites-available/thinkingingraphs.conf

upstream thin {
  server 127.0.0.1:3000 max_fails=1 fail_timeout=15s;
  server 127.0.0.1:3001 max_fails=1 fail_timeout=15s;
  server 127.0.0.1:3002 max_fails=1 fail_timeout=15s;
}

server {
  listen       80 default;
  server_name _;
  charset utf-8;

  rewrite  ^\/status(.*)$  $1 last;

  gzip  on;
  gzip_disable "MSIE [1-6]\.(?!.*SV1)";
  gzip_types       text/plain application/xml text/xml text/css application/x-javascript application/xml+rss text/javascript application/json;

  gzip_vary on;

  access_log  /var/www/thinkingingraphs/shared/log/nginx_access.log;
  error_log  /var/www/thinkingingraphs/shared/log/nginx_error.log;

  root   /var/www/thinkingingraphs/current/public;

  location / {
    proxy_pass http://thin;
    proxy_next_upstream error timeout http_502 http_503;
  }

  error_page  404              /404.html;
  error_page   500 502 503 504  /500.html;
}

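We’ve also changed our upstream definition to list all three thin instances, so that nginx can proxy each request to one which is running. Setting max_fails=1 and fail_timeout=15s means that after a single failed attempt nginx treats an instance as unavailable for the next 15 seconds.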
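
The failover behaviour that proxy_next_upstream gives us can be modelled roughly as follows. This is a simplified sketch rather than a description of nginx internals: it ignores load balancing and the max_fails bookkeeping, and the `proxy` method and `backends` structure are hypothetical names used only for illustration:

```ruby
# Conditions on which the request is passed to the next upstream, mirroring
# "proxy_next_upstream error timeout http_502 http_503".
RETRYABLE = [:error, :timeout, 502, 503].freeze

# `backends` maps a port to a callable returning either an HTTP status code
# or a symbol such as :error (e.g. connection refused) or :timeout.
def proxy(backends)
  backends.each do |port, handler|
    result = handler.call
    return { port: port, status: result } unless RETRYABLE.include?(result)
  end
  # Every upstream failed; this model answers with a 502 gateway error.
  { port: nil, status: 502 }
end

backends = {
  3000 => -> { :error }, # instance currently being restarted
  3001 => -> { 200 },
  3002 => -> { 200 }
}

response = proxy(backends)
puts "served by #{response[:port]} with status #{response[:status]}"
```

The client only sees the 502 when every instance fails at once, which the one-at-a-time restart is meant to avoid.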

When I deploy the application now there is no downtime:

$ bundle exec cap deploy

$ while true; do curl -w %{http_code}:%{time_total} http://localhost/ -o /dev/null -s; printf "\n"; sleep 0.5; done

200:0.094
200:0.095
200:0.082
200:0.102
200:0.080
200:0.081
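
Since nginx hides a missing instance by failing over, it can also be worth checking that all three ports came back after the restart. This is a minimal sketch which assumes the instances listen on 127.0.0.1 ports 3000 to 3002 as configured above; the helper name `port_open?` is hypothetical:

```ruby
require "socket"

# Hypothetical helper: returns true if something accepts TCP connections
# on the given host and port within the timeout, false otherwise.
def port_open?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, IOError
  false
end

[3000, 3001, 3002].each do |port|
  state = port_open?("127.0.0.1", port) ? "up" : "down"
  puts "#{port}: #{state}"
end
```

This only shows that a port accepts connections, not that the application behind it is healthy.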

The only problem is that upstart now seems to have lost its handle on the thin processes. From what I can tell there isn’t a master process which upstart could track, so I’m not sure how to wire this up.

Any ideas welcome!
