Mark Needham

Thoughts on Software Development

Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state

with 7 comments

On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.

As a result we delegate from Scala code to the system unzip command like so:

def extract(): Unit = {
  val command = "unzip %s -d %s".format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // Blocks until unzip exits; nothing reads unzip's output in this version
    val exitCode = process.waitFor()
  } catch {
    case e: Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We ran into a problem where the unzipping process was hanging. Running ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state, which suggested that it was waiting for some sort of input.

After a bit of googling, Duncan found this blog, which explained that we needed to process the output stream from our process, otherwise it might end up hanging.

a.k.a. RTFM:

The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.

The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).

The parent process uses these streams to feed input to and get output from the subprocess.

Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.

For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that long.

In order to get around the problem we needed to gobble up the output stream from unzip like so:

import org.apache.commons.io.IOUtils

def extract(): Unit = {
  val command = "unzip %s -d %s".format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // Reading to end-of-stream drains unzip's STDOUT so that its buffer cannot fill up
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor()
  } catch {
    case e: Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We need to do the same thing with the error stream in case ‘unzip’ ends up filling that buffer too.
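One way to cover both streams with a single read, sketched here under assumptions: java.lang.ProcessBuilder can fold STDERR into STDOUT via redirectErrorStream(true), so one loop drains everything the child writes. The names DrainMerged and runAndDrain are hypothetical, and this is not what our code actually does.

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable.ListBuffer

object DrainMerged {
  // Hypothetical helper: runs a command, drains its merged STDOUT/STDERR,
  // and returns the exit code together with the captured lines.
  def runAndDrain(command: String*): (Int, List[String]) = {
    val builder = new ProcessBuilder(command: _*)
    builder.redirectErrorStream(true) // STDERR is folded into STDOUT
    val process = builder.start()

    val lines = ListBuffer[String]()
    val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
    try {
      // readLine returns null only at end-of-stream,
      // i.e. once the child has closed its output
      var line = reader.readLine()
      while (line != null) {
        lines += line
        line = reader.readLine()
      }
    } finally {
      reader.close()
    }
    // The output is fully drained by now, so waitFor cannot block on a full buffer
    (process.waitFor(), lines.toList)
  }
}
```

With this sketch, something like DrainMerged.runAndDrain("unzip", "/file/to/unzip.zip", "-d", "/place/to/unzip/to") would return the exit code and everything unzip printed. The trade-off is that STDOUT and STDERR can no longer be told apart.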

On a couple of blog posts that we came across, it was suggested that we should ‘gobble up’ the output and error streams on separate threads, but we weren’t sure exactly why that was considered necessary…

If anyone knows then please let me know in the comments.
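For what it is worth, one explanation can be sketched (our reading, not something those posts spelled out). If we read STDOUT to the end before touching STDERR, the child could fill the STDERR buffer and block while we are still waiting for STDOUT to close, and neither side would make progress. Draining each stream on its own thread avoids that. A minimal sketch follows, assuming a hypothetical Gobbler class and StreamGobblers.run helper, neither of which comes from the posts:

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import scala.collection.mutable.ListBuffer

// Hypothetical helper: drains one stream to end-of-stream on its own thread.
final class Gobbler(stream: InputStream) extends Thread {
  private val buffer = ListBuffer[String]()

  override def run(): Unit = {
    val reader = new BufferedReader(new InputStreamReader(stream))
    try {
      var line = reader.readLine()
      while (line != null) {
        buffer += line
        line = reader.readLine()
      }
    } finally {
      reader.close()
    }
  }

  // Only call this after join(), once the thread has finished writing
  def lines: List[String] = buffer.toList
}

object StreamGobblers {
  // Returns (exit code, STDOUT lines, STDERR lines)
  def run(command: String*): (Int, List[String], List[String]) = {
    val process = new ProcessBuilder(command: _*).start()
    val out = new Gobbler(process.getInputStream)
    val err = new Gobbler(process.getErrorStream)
    out.start()
    err.start()

    val exitCode = process.waitFor()
    // Wait for both readers to reach end-of-stream before looking at the results
    out.join()
    err.join()
    (exitCode, out.lines, err.lines)
  }
}
```

Because both streams are read concurrently, it doesn't matter which one the child fills first. Unlike merging the streams, this keeps STDOUT and STDERR separate.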

Written by Mark Needham

November 20th, 2011 at 8:20 pm

Posted in Java,Scala


  • Carfield Yim

    I guess you always need to start a new thread to get output from stdout and stderr if you need to use a native process.

    BTW, why don’t you use the Java library for unzip? Is there really a huge difference in performance?

  • Guest

    I suppose this solution is still not sufficient. Imagine it takes a couple of seconds for process to start producing output. readLines() finishes without reading anything and the subsequent process output still fills the buffers. You should try to read in a loop until the process has finished.

  • Christian Schlichtherle

    As said on DZone, I recommend using TrueZIP instead. Here’s your use case: http://truezip.java.net/usecases/sbt.html

  • Ido

    Don’t use scala its EJB2, invented by crazy scientist not getting work done! But if you really need to:

    import scala.sys.process._
    val exitcode = "unzip timings.zip unzipplace" ! ProcessLogger(s => println("out: " + s), s => println("err: " + s))

    But you really should RTFM a little bit at least.

  • Ido

    btw. this example abuses operator overloading and implicits and type systems. It’s completely incomprehensible!
    (And there’s a -d missing in the command)

  • http://www.markhneedham.com/blog Mark Needham

    I checked with one of my colleagues about this and he said we had originally started off using the Java unzip library but had a problem with it at some stage. Sadly we couldn’t remember exactly what the problem was :(

  • Christian Schlichtherle

    I’d review this. There are many Java implementations for ZIP available and calling out to a separate process is one of the worst options.