Upstart: Job getting stuck in the start/killed state
We’re using Upstart to manage the processes running on our machines, and since the haproxy package only came with an init.d script we wanted to upstartify it.
When defining an Upstart job you need to get the expect stanza right: it tells Upstart whether the process you’re launching is going to fork, and how many times.
If you do not specify the expect stanza, Upstart will track the life cycle of the first PID that it executes in the exec or script stanzas.
However, most Unix services will “daemonize”, meaning that they will create a new process (using fork(2)) which is a child of the initial process.
Often services will “double fork” to ensure they have no association whatsoever with the initial process.
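To make the fork counting concrete, here is a minimal Ruby sketch of a process that double forks. It is our own illustration, not code from haproxy or Upstart, and the helper name `double_fork_pids` is made up. It shows that the PID of the process which ends up doing the work differs from the PID of the first child, which is why the supervisor has to be told how many forks to follow.

```ruby
# Illustrative only: mimics a service that "double forks".
# Returns [first_child_pid, daemon_pid].
def double_fork_pids
  reader, writer = IO.pipe

  first_child = fork do
    reader.close
    Process.setsid              # detach from the original session

    fork do                     # second fork: this is the "daemon"
      writer.puts Process.pid   # report our PID back to the original process
      writer.close
      exit! 0
    end

    exit! 0                     # first child exits straight away
  end

  writer.close
  Process.wait first_child
  daemon_pid = Integer(reader.read.strip)
  reader.close

  [first_child, daemon_pid]
end

if $0 == __FILE__
  first, daemon = double_fork_pids
  puts "first child: #{first}, daemon: #{daemon}"
end
```

A supervisor that only watched the first child would see it exit almost immediately, while the real work carries on under a different PID.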
The Upstart Cookbook has a table, in the ‘Implications of Misspecifying expect’ section, which explains what will happen if we specify this incorrectly:
Specification of expect stanza:

| Forks | no expect | expect fork | expect daemon |
|---|---|---|---|
| 0 | Correct | start hangs | start hangs |
| 1 | Wrong pid tracked † | Correct | start hangs |
| 2 | Wrong pid tracked | Wrong pid tracked | Correct |
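The pattern in the table boils down to comparing how many forks Upstart was told to expect with how many actually happen. Here is a small Ruby sketch of that rule. The names `EXPECTED_FORKS` and `expect_outcome` are our own, and it ignores the † special case.

```ruby
# Number of forks each expect setting tells Upstart to wait for.
EXPECTED_FORKS = { none: 0, fork: 1, daemon: 2 }.freeze

# Returns the table entry for a process that forks `actual_forks` times
# under the given expect setting (:none, :fork or :daemon).
def expect_outcome(actual_forks, expect)
  expected = EXPECTED_FORKS.fetch(expect)

  if actual_forks == expected
    "Correct"
  elsif actual_forks < expected
    "start hangs"         # Upstart keeps waiting for a fork that never comes
  else
    "Wrong pid tracked"   # Upstart stops following before the final process
  end
end
```

For example, `expect_outcome(1, :daemon)` gives "start hangs", which is the combination we ended up with.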
When we were defining our script we went for expect daemon instead of expect fork. We had also mistyped the arguments to the haproxy script, which meant it failed to start, and the job ended up in the start/killed state.
From what we could tell, upstart had a handle on a PID which didn’t actually exist, and when we tried stop haproxy the command seemed to succeed but didn’t actually do anything.
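One way to confirm that the tracked PID really doesn’t exist is to send it signal 0, which checks for the process without delivering anything. The sketch below is a minimal Ruby version of that check. The helper name `pid_alive?` is ours, and we’re assuming you pass it the PID that status haproxy reports.

```ruby
# Returns true if a process with this PID currently exists.
# Signal 0 performs the existence/permission check without sending a signal.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists, but we aren't allowed to signal it
end

if $0 == __FILE__
  pid = Integer(ARGV.shift || Process.pid)
  puts "#{pid} alive? #{pid_alive?(pid)}"
end
```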
Phil pointed us to a neat script written by Clint Byrum which spins up and then kills loads of processes in order to exhaust the PID space until a process with the PID upstart is tracking exists and can be re-attached and killed.
It’s available on his website but that wasn’t responding for a period of time yesterday so I’ll repeat it here just in case:
    #!/usr/bin/env ruby1.8

    class Workaround
      def initialize target_pid
        @target_pid = target_pid

        first_child
      end

      def first_child
        pid = fork do
          Process.setsid

          rio, wio = IO.pipe

          # Keep rio open
          until second_child rio, wio
            print "\e[A"
          end
        end

        Process.wait pid
      end

      def second_child parent_rio, parent_wio
        rio, wio = IO.pipe

        pid = fork do
          rio.close
          parent_wio.close

          puts "%20.20s" % Process.pid

          if Process.pid == @target_pid
            wio << 'a'
            wio.close

            parent_rio.read
          end
        end

        wio.close

        begin
          if rio.read == 'a'
            true
          else
            Process.wait pid
            false
          end
        ensure
          rio.close
        end
      end
    end

    if $0 == __FILE__
      pid = ARGV.shift
      raise "USAGE: #{$0} pid" if pid.nil?
      Workaround.new Integer pid
    end
We can save that as a Ruby script, run it with the PID that upstart is tracking as its only argument, and the world of upstart will get back into a good place again!