Mark Needham

Thoughts on Software Development

Upstart: Job getting stuck in the start/killed state


We’re using upstart to handle the processes running on our machines, and since the haproxy package only came packaged with an init.d script, we wanted to make it upstartified.

When defining an upstart script you use the expect stanza to tell upstart whether, and how many times, the process you’re launching is going to fork.

If you do not specify the expect stanza, Upstart will track the life cycle of the first PID that it executes in the exec or script stanzas.

However, most Unix services will “daemonize”, meaning that they will create a new process (using fork(2)) which is a child of the initial process.

Often services will “double fork” to ensure they have no association whatsoever with the initial process.
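To make the fork counting concrete, here is a minimal Ruby sketch of a "double fork". It is only an illustration, not how haproxy itself daemonizes, and the method name double_fork_pids is made up for this example. With no expect stanza upstart tracks the first PID, with expect fork it follows one fork, and with expect daemon it follows two.

```ruby
# Sketch: show the three PIDs involved when a process "double forks".
# Returns [initial_pid, first_fork_pid, second_fork_pid].
def double_fork_pids
  reader, writer = IO.pipe

  first = fork do
    reader.close
    Process.setsid # start a new session, away from the caller's terminal

    fork do
      # A real daemon would do its long-running work here.
      writer.puts Process.pid
      writer.close
      exit!(0)
    end

    # The intermediate process exits straight away, orphaning the grandchild.
    writer.close
    exit!(0)
  end

  writer.close
  Process.wait first
  second = Integer(reader.read.strip) # read blocks until the grandchild has written
  reader.close

  [Process.pid, first, second]
end

initial, after_one_fork, after_two_forks = double_fork_pids
puts "no expect     -> upstart tracks #{initial}"
puts "expect fork   -> upstart tracks #{after_one_fork}"
puts "expect daemon -> upstart tracks #{after_two_forks}"
```

If the stanza does not match the number of forks the service really performs, upstart ends up following a PID that is not the long-running process.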

There is a table in the upstart cookbook, under the ‘Implications of Misspecifying expect’ section, which explains what will happen if we specify this incorrectly:

Expect Stanza Behaviour

      | Specification of Expect Stanza
Forks | no expect           | expect fork       | expect daemon
0     | Correct             | start hangs       | start hangs
1     | Wrong pid tracked † | Correct           | start hangs
2     | Wrong pid tracked   | Wrong pid tracked | Correct
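The table can also be read as a simple lookup. The sketch below just encodes the cells above in Ruby; the constant EXPECT_OUTCOMES and the method expect_outcome are names invented for this example and are not part of upstart.

```ruby
# The cookbook table as data: number of forks => expect stanza => outcome.
# :none means the job file has no expect stanza at all.
EXPECT_OUTCOMES = {
  0 => { :none => "correct",           :fork => "start hangs",       :daemon => "start hangs" },
  1 => { :none => "wrong pid tracked", :fork => "correct",           :daemon => "start hangs" },
  2 => { :none => "wrong pid tracked", :fork => "wrong pid tracked", :daemon => "correct" }
}.freeze

# Look up what the table says for a service that forks `forks` times
# when the job specifies `stanza` (:none, :fork or :daemon).
def expect_outcome(forks, stanza)
  EXPECT_OUTCOMES.fetch(forks).fetch(stanza)
end

puts expect_outcome(1, :daemon)
```

A service that forks once but is declared with expect daemon lands in the "start hangs" cell.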

When we were defining our script we went for expect daemon instead of expect fork, and we had also mistyped the arguments to haproxy. As a result it failed to start and the job ended up in the start/killed state.

From what we could tell, upstart had a handle on a PID which didn’t actually exist, and when we ran stop haproxy the command seemed to succeed but didn’t actually do anything.
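One way to confirm that the tracked PID really is gone is to send it signal 0, which performs the existence and permission checks without delivering anything. This is a small sketch; pid_exists? is a helper name made up for this example, and the PID you pass is whichever one upstart reports for the job.

```ruby
# Returns true if a process with this PID currently exists.
def pid_exists?(pid)
  Process.kill(0, pid) # signal 0: nothing is delivered, only the checks run
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

puts pid_exists?(Process.pid)
```

If this returns false for the PID upstart is holding on to, the job is waiting on a process that will never exit.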

Phil pointed us to a neat script written by Clint Byrum which spins up and then kills loads of processes in order to exhaust the PID space until a process with the PID upstart is tracking exists and can be re-attached and killed.

It’s available on his website but that wasn’t responding for a period of time yesterday so I’ll repeat it here just in case:

#!/usr/bin/env ruby1.8
 
class Workaround
  def initialize target_pid
    @target_pid = target_pid
 
    first_child
  end
 
  def first_child
    pid = fork do
      Process.setsid
 
      rio, wio = IO.pipe
 
      # Keep rio open: a child that lands on the target PID blocks reading
      # it until this process exits
      until second_child rio, wio
        print "\e[A"
      end
    end
 
    Process.wait pid
  end
 
  def second_child parent_rio, parent_wio
    rio, wio = IO.pipe
 
    pid = fork do
      rio.close
      parent_wio.close
 
      puts "%20.20s" % Process.pid
 
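      if Process.pid == @target_pid
        # Tell the parent that we got the PID upstart is tracking
        wio << 'a'
        wio.close

        # Block until the first child exits and the write end of its pipe closes
        parent_rio.read
      end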
    end
    wio.close
 
    begin
      if rio.read == 'a'
        true
      else
        Process.wait pid
        false
      end
    ensure
      rio.close
    end
  end
end
 
if $0 == __FILE__
  pid = ARGV.shift
  raise "USAGE: #{$0} pid" if pid.nil?
  Workaround.new Integer pid
end

We can save that into a file and run it with the PID upstart is tracking as its only argument, and the world of upstart will get back into a good place again!


Written by Mark Needham

September 29th, 2012 at 9:56 am

Posted in Shell Scripting


  • http://johan.kiviniemi.name/ ion

    Hi,

    I wrote the script in 2009 as a quick workaround, too bad it’s still needed. :-P

    I moved the script to GitHub and added a README and a copyright notice. The original URL redirects there now. https://github.com/ion1/workaround-upstart-snafu

  • Calum

    tried this, the script starts spinning through the pids but when it finishes (goes through all pids) the upstart process still seems to be in start/killed state

  • Calum

    i take that back, used incorrect pid