Mark Needham

Thoughts on Software Development

Unix parallel: Populating all the USB sticks

The day before Graph Connect Europe 2016 we needed to create a bunch of USB sticks containing Neo4j and the training materials. We eventually iterated our way to a half-decent approach that made use of the GNU parallel command, which I’ve always wanted to use!

But first I needed to get a USB hub so I could do lots of them at the same time. I bought the EasyAcc USB 3.0 but there are lots of other ones that do the same job.

Next I mounted all the USB sticks and then renamed the volumes NEO4J1 through NEO4J7:

for i in 1 2 3 4 5 6 7; do diskutil renameVolume "USB DISK" NEO4J${i}; done
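Before copying anything it is worth confirming that every renamed volume is actually present, since rsync would otherwise attempt to create a plain directory at the missing path rather than write to a stick. A minimal sketch of such a check follows; the missing_volumes helper is hypothetical (not part of the original setup) and assumes the sticks appear under /Volumes, as they do on macOS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every expected volume that is
# not present under a base directory.
# Arguments: base directory (assumed to be /Volumes on macOS), number of sticks.
missing_volumes() {
  local base=$1 count=$2 i
  for i in $(seq 1 "$count"); do
    [ -d "${base}/NEO4J${i}" ] || echo "NEO4J${i}"
  done
}

# Report missing sticks; prints nothing when all seven are mounted.
missing_volumes /Volumes 7
```

Any name it prints is a stick that still needs mounting or renaming.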

I then created a bash function called ‘duplicate’ to do the copying work:

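function duplicate() {
  i=$1  # GNU parallel passes each input line as the first argument
  echo ${i}
  time rsync -avP --size-only --delete --exclude '.*' --omit-dir-times /Users/markneedham/Downloads/graph-connect-europe-2016/ /Volumes/NEO4J${i}/
}
export -f duplicate  # export so the bash shells started by parallel can see the function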

We can now call this function in parallel like so:

seq 1 7 | parallel duplicate
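To see what parallel does with the piped numbers without touching any sticks, here is a minimal sketch using a stand-in function, duplicate_demo (hypothetical, not part of the original setup), that only prints its target path. It assumes GNU parallel is installed and that jobs run under bash, since exported functions are a bash feature; if GNU parallel isn't found, it falls back to a plain sequential loop. The -j 7 flag requests seven simultaneous jobs (by default GNU parallel runs one job per CPU core), and -k keeps the output in input order.

```shell
#!/usr/bin/env bash
# Toy stand-in for the real rsync-based function: it only reports its target.
duplicate_demo() {
  echo "would copy to /Volumes/NEO4J$1/"
}
export -f duplicate_demo  # shells started by parallel need to see the function

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # Each number from seq becomes the first argument of one job.
  output=$(seq 1 7 | parallel -j 7 -k duplicate_demo)
else
  # Fallback so the sketch still runs where GNU parallel is not installed.
  output=$(for i in $(seq 1 7); do duplicate_demo "$i"; done)
fi
echo "$output"
```

On a machine with fewer than seven cores, leaving out -j 7 would mean that not all of the copies start at once.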

And that’s it. We didn’t get a 7x improvement in USB creation throughput from doing 7 in parallel, but it took ~9 minutes to complete all 7, compared to 5 minutes for each one done individually. That’s about 35 minutes one after another, so roughly a 4x speedup. Presumably there’s still some part of the copying that is sequential further down: Amdahl’s law #ftw.
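Taking those rough timings at face value, Amdahl’s law, S = 1 / ((1 - p) + p/n), can be turned around to estimate the fraction p of the job that actually ran in parallel. The sketch below does that arithmetic with awk, assuming n = 7 simultaneous copies; since the timings are approximate, the result is only indicative.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope Amdahl's law estimate from the rough timings in the post.
n=7            # copies running at once
t_single=5     # minutes for one stick on its own (approximate)
t_parallel=9   # minutes for all seven in parallel (approximate)

estimate=$(awk -v n="$n" -v t1="$t_single" -v tp="$t_parallel" 'BEGIN {
  speedup = (n * t1) / tp                # sequential time / parallel time
  p = (1 - 1 / speedup) / (1 - 1 / n)    # 1/S = (1 - p) + p/n, solved for p
  printf "speedup: %.1fx, parallel fraction: %.2f", speedup, p
}')
echo "$estimate"
```

With these inputs the speedup is 35/9, and p works out to 13/15, so a little under nine tenths of the work would have had to overlap.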

I want to go and find other things that I can pipe into parallel now!

Written by Mark Needham

June 1st, 2016 at 5:53 am

Posted in Shell Scripting
