Mark Needham

Thoughts on Software Development

Archive for the ‘Hadoop’ Category

Mahout/Hadoop: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

without comments

I’ve been working my way through Dragan Milcevski’s mini tutorial on using Mahout to do content based filtering on documents and reached the final step where I needed to read in the generated item-similarity files.

I got the example compiling by using the following Maven dependency:

<dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.9</version>
</dependency>

Unfortunately when I ran the code I ran into a version incompatibility problem:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
	at org.apache.hadoop.ipc.Client.call(Client.java:1113)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
	at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
	at com.markhneedham.mahout.Similarity.getDocIndex(Similarity.java:86)
	at com.markhneedham.mahout.Similarity.main(Similarity.java:25)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Version 0.9.0 of mahout-core was published in early 2014 so I expect it was built against an earlier version of Hadoop than I’m using (2.7.2).

I tried updating the Hadoop dependencies that were being called in the stack trace to no avail.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.2</version>
</dependency>
 
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.2</version>
</dependency>

When stepping through the stack trace I noticed that my program was still using an old version of hadoop-core, so with one last throw of the dice I decided to try explicitly excluding that:

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
 
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>

And amazingly it worked. Now, finally, I can see how similar my documents are!

Written by Mark Needham

July 22nd, 2016 at 1:55 pm

Posted in Hadoop

Tagged with ,

Hadoop: DataNode not starting

without comments

In my continued playing with Mahout I eventually decided to give up using my local file system and use a local Hadoop instead since that seems to have much less friction when following any examples.

Unfortunately all my attempts to upload any files from my local file system to HDFS were being met with the following exception:

java.io.IOException: File /user/markneedham/book2.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
 
at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:928)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:811)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

I eventually realised, from looking at the output of jps, that the DataNode wasn’t actually starting up which explains the error message I was seeing.

A quick look at the log files showed what was going wrong:


/usr/local/Cellar/hadoop/2.7.1/libexec/logs/hadoop-markneedham-datanode-marks-mbp-4.zte.com.cn.log

2016-07-21 18:58:00,496 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /usr/local/Cellar/hadoop/hdfs/tmp/dfs/data: namenode clusterID = CID-c2e0b896-34a6-4dde-b6cd-99f36d613e6a; datanode clusterID = CID-403dde8b-bdc8-41d9-8a30-fe2dc951575c
2016-07-21 18:58:00,496 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to /0.0.0.0:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
        at java.lang.Thread.run(Thread.java:745)
2016-07-21 18:58:00,497 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /0.0.0.0:8020
2016-07-21 18:58:00,602 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2016-07-21 18:58:02,607 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-07-21 18:58:02,608 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-07-21 18:58:02,610 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

I’m not sure how my clusterIDs got out of sync, although I expect it’s because I reformatted HDFS without realising at some stage. There are other ways of solving this problem but the quickest for me was to just nuke the DataNode’s data directory which the log file told me sits here:

sudo rm -r /usr/local/Cellar/hadoop/hdfs/tmp/dfs/data/current

I then re-ran the hstart script that I stole from this tutorial and everything, including the DataNode this time, started up correctly:

$ jps
26736 NodeManager
26392 DataNode
26297 NameNode
26635 ResourceManager
26510 SecondaryNameNode

And now I can upload local files to HDFS again. #win!

Written by Mark Needham

July 22nd, 2016 at 1:31 pm

Posted in Hadoop

Tagged with

Mahout: Exception in thread “main” java.lang.IllegalArgumentException: Wrong FS: file:/… expected: hdfs://

without comments

I’ve been playing around with Mahout over the last couple of days to see how well it works for content based filtering.

I started following a mini tutorial from Stack Overflow but ran into trouble on the first step:

bin/mahout seqdirectory \
--input file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo \
--output file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo-out \
-c UTF-8 \
-chunk 64 \
-prefix mah
16/07/21 21:19:20 INFO AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo], --keyPrefix=[mah], --method=[mapreduce], --output=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo-out], --startPhase=[0], --tempDir=[temp]}
16/07/21 21:19:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/21 21:19:20 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/07/21 21:19:20 INFO deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
16/07/21 21:19:20 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo, expected: hdfs://localhost:8020
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
	at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:156)
	at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:90)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I was trying to run the command against the local file system on my laptop which should have been possible according to the instructions. I couldn’t find any flag I could pass any flag that I could pass to Mahout to tell it not to use HDFS but I eventually stumbled on someone else experiencing a similar problem.

It turns out the last time I was playing around with Hadoop, in late 2015, I’d actually configured that and had completely forgotten. I needed to comment out the following config:

/usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
</property>

I commented that property out and all was happy with the (Hadoop) world again.

Written by Mark Needham

July 21st, 2016 at 5:57 pm

Posted in Hadoop

Tagged with ,