Apache Pinot: Deleting instances in a bad state

Sometimes when I start up a local Pinot cluster after a hard shutdown (by restarting my computer), I notice that the Pinot Data Explorer shows controllers, brokers, or servers in a bad state. In this blog post we’ll see how to get rid of those bad instances.

bad state banner
Figure 1. Apache Pinot: Deleting instances in a bad state

The screenshot below shows several instances in the bad state.

bad instances
Figure 2. Instances in a bad state

I’m not entirely sure why this happens, but I assume it’s something to do with my local IP address changing as far as Pinot’s concerned. Having these instances in a bad state doesn’t actually cause me any problems; they’re more of an irritation.

Luckily we can get rid of that irritation with help from the REST API’s drop instance endpoint, shown in the screenshot below:

drop instance endpoint
Figure 3. Drop instance endpoint

So, what does calling this endpoint actually do? I had a quick look at the code and learnt that it deletes the following Zookeeper entries:

  • /INSTANCES/<instanceName>

  • /CONFIGS/PARTICIPANT/<instanceName>

We can see an example of what that part of our Zookeeper metadata looks like below:

zookeeper metadata
Figure 4. Zookeeper metadata
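
If you'd rather check those entries from the command line than from the Zookeeper browser, the controller's /zk/get endpoint (which we'll use later in this post) can be pointed at either path. Below is a minimal sketch that only builds the URLs. The zk_get_url and instance_zk_urls helpers are hypothetical names of my own, and the sketch assumes the controller is at localhost:9000 and the cluster is called PinotCluster, as in the rest of this post.

```shell
# Hypothetical helper: build the controller's /zk/get URL for a Zookeeper path.
# Assumes the controller is reachable at localhost:9000.
zk_get_url() {
  # Percent-encode the slashes, matching the %2F form used in this post's curl calls.
  encoded=$(printf '%s' "$1" | sed 's|/|%2F|g')
  printf 'http://localhost:9000/zk/get?path=%s\n' "$encoded"
}

# Hypothetical helper: print the URLs for the two entries that dropping an
# instance removes. Assumes the cluster is named PinotCluster.
instance_zk_urls() {
  zk_get_url "/PinotCluster/INSTANCES/$1"
  zk_get_url "/PinotCluster/CONFIGS/PARTICIPANT/$1"
}
```

Running instance_zk_urls "Server_172.21.0.3_8098" prints two URLs that can be passed to curl, one per line.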

We want to remove the following instances:

  • Controller_172.21.0.4_9000

  • Controller_172.21.0.2_9000

  • Controller_172.21.0.5_9000

  • Server_172.21.0.3_8098

  • Server_172.21.0.4_8098

  • Broker_172.21.0.3_8099

Let’s try to remove those instances by running the following command:

for instance in "Controller_172.21.0.4_9000" "Controller_172.21.0.2_9000" "Controller_172.21.0.5_9000" "Server_172.21.0.3_8098" "Server_172.21.0.4_8098" "Broker_172.21.0.3_8099"; do
  curl -X DELETE "http://localhost:9000/instances/${instance}" \
    -H "accept: application/json" 2>/dev/null;
  printf "\n"
done
Output
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"_code":409,"_error":"Failed to drop instance Server_172.21.0.4_8098 - Instance Server_172.21.0.4_8098 exists in ideal state for races_REALTIME"}
{"_code":409,"_error":"Failed to drop instance Broker_172.21.0.3_8099 - Instance Broker_172.21.0.3_8099 exists in ideal state for brokerResource"}

We were able to remove four of the instances, but the other two are still referenced in an ideal state: Server_172.21.0.4_8098 in races_REALTIME and Broker_172.21.0.3_8099 in brokerResource. Let’s see if we can figure out what’s going on.
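
Each 409 response names the resource whose ideal state still references the instance. Here's a small sketch that pulls that name out of a response body, so we know which ideal state to look at. The blocking_resource helper is hypothetical, and it relies on the exact error wording shown in the output above, which may differ in other Pinot versions.

```shell
# Hypothetical helper: read a drop-instance response body on stdin and print
# the resource named in an "exists in ideal state for ..." error, if any.
# Prints nothing for a successful response.
blocking_resource() {
  sed -n 's/.*exists in ideal state for \([^"]*\)".*/\1/p'
}
```

Piping the output of one of the curl -X DELETE calls above into blocking_resource would print races_REALTIME or brokerResource for the two failing instances, and nothing for the ones that were dropped.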

We can retrieve the ideal state for the brokerResource by running the following command:

curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
  -H "accept: text/plain" 2>/dev/null
Output
{
  "id" : "brokerResource",
  "simpleFields" : {
    "BATCH_MESSAGE_MODE" : "false",
    "IDEAL_STATE_MODE" : "CUSTOMIZED",
    "NUM_PARTITIONS" : "3",
    "REBALANCE_MODE" : "CUSTOMIZED",
    "REPLICAS" : "0",
    "STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  },
  "mapFields" : {
    "courses_OFFLINE" : {
      "Broker_172.21.0.3_8099" : "ONLINE",
      "Broker_172.21.0.5_8099" : "ONLINE"
    },
    "parkrun_REALTIME" : {
      "Broker_172.21.0.3_8099" : "ONLINE",
      "Broker_172.21.0.5_8099" : "ONLINE"
    },
    "races_REALTIME" : {
      "Broker_172.21.0.3_8099" : "ONLINE",
      "Broker_172.21.0.5_8099" : "ONLINE"
    }
  },
  "listFields" : { }
}

Let’s create a file, newBrokerState.json, that removes the Broker_172.21.0.3_8099 entry from each table. The file should look like this:

newBrokerState.json
{
  "id" : "brokerResource",
  "simpleFields" : {
    "BATCH_MESSAGE_MODE" : "false",
    "IDEAL_STATE_MODE" : "CUSTOMIZED",
    "NUM_PARTITIONS" : "3",
    "REBALANCE_MODE" : "CUSTOMIZED",
    "REPLICAS" : "0",
    "STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  },
  "mapFields" : {
    "courses_OFFLINE" : {
      "Broker_172.21.0.5_8099" : "ONLINE"
    },
    "parkrun_REALTIME" : {
      "Broker_172.21.0.5_8099" : "ONLINE"
    },
    "races_REALTIME" : {
      "Broker_172.21.0.5_8099" : "ONLINE"
    }
  },
  "listFields" : { }
}
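
Rather than editing the JSON by hand, the same document could be generated from the /zk/get output. This is only a sketch: it assumes jq is installed (it isn't part of Pinot), and remove_instance_from_ideal_state is a hypothetical helper name. jq prints "key": "value" without the space before the colon, but the JSON is otherwise equivalent.

```shell
# Hypothetical helper: read an ideal state document on stdin and print it with
# the given instance removed from every entry under mapFields.
# Assumes jq is installed.
remove_instance_from_ideal_state() {
  jq --arg instance "$1" 'del(.mapFields[][$instance])'
}

# Example usage against the controller from this post (not run here):
#   curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
#     -H "accept: text/plain" 2>/dev/null |
#     remove_instance_from_ideal_state "Broker_172.21.0.3_8099" > newBrokerState.json
```

The simpleFields and listFields sections pass through untouched, since the filter only deletes keys one level below mapFields.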

Now open the Zookeeper browser at http://localhost:9000/#/zookeeper and navigate down to PinotCluster → IDEALSTATES → brokerResource. Click on the edit button, paste in this JSON document, and click Update.

update ideal state
Figure 5. Updating the brokerResource ideal state

And now let’s do the same thing for races_REALTIME. The ideal state at the moment looks like this:

curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2Fraces_REALTIME" \
  -H "accept: text/plain" 2>/dev/null
Output
{
  "id" : "races_REALTIME",
  "simpleFields" : {
    "BATCH_MESSAGE_MODE" : "false",
    "IDEAL_STATE_MODE" : "CUSTOMIZED",
    "INSTANCE_GROUP_TAG" : "races_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "11",
    "REBALANCE_MODE" : "CUSTOMIZED",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  },
  "mapFields" : {
    "races__0__0__20220127T1647Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__10__20220218T1304Z" : {
      "Server_172.21.0.4_8098" : "CONSUMING"
    },
    "races__0__1__20220202T1635Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__2__20220203T1636Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__3__20220209T1442Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__4__20220210T1442Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__5__20220212T1807Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__6__20220214T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__7__20220215T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__8__20220216T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__9__20220217T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    }
  },
  "listFields" : { }
}

And now let’s replace Server_172.21.0.4_8098 with Server_172.21.0.6_8098, the server that already hosts every other segment, for the consuming segment.

newServerState.json
{
  "id" : "races_REALTIME",
  "simpleFields" : {
    "BATCH_MESSAGE_MODE" : "false",
    "IDEAL_STATE_MODE" : "CUSTOMIZED",
    "INSTANCE_GROUP_TAG" : "races_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "11",
    "REBALANCE_MODE" : "CUSTOMIZED",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  },
  "mapFields" : {
    "races__0__0__20220127T1647Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__10__20220218T1304Z" : {
      "Server_172.21.0.6_8098" : "CONSUMING"
    },
    "races__0__1__20220202T1635Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__2__20220203T1636Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__3__20220209T1442Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__4__20220210T1442Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__5__20220212T1807Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__6__20220214T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__7__20220215T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__8__20220216T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    },
    "races__0__9__20220217T1304Z" : {
      "Server_172.21.0.6_8098" : "ONLINE"
    }
  },
  "listFields" : { }
}
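
Since this change is a straight rename, it could also be scripted with sed instead of edited by hand. A minimal sketch, with replace_instance_in_ideal_state as a hypothetical helper name: it assumes the new server isn't already listed for the same segment (true here, with one replica per segment), otherwise the result would contain duplicate keys.

```shell
# Hypothetical helper: read an ideal state document on stdin and print it with
# every quoted occurrence of the old instance name ($1) replaced by the new one ($2).
# Assumes the new instance is not already listed alongside the old one.
replace_instance_in_ideal_state() {
  # Escape the dots so sed treats the IP address literally.
  old=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed "s/\"$old\"/\"$2\"/g"
}
```

Piping the races_REALTIME ideal state through replace_instance_in_ideal_state "Server_172.21.0.4_8098" "Server_172.21.0.6_8098" and redirecting to newServerState.json should produce the document above.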

We’ll update that node in Zookeeper the same way that we did with the brokerResource, and now we can re-run our command to delete these two instances:

for instance in "Controller_172.21.0.4_9000" "Controller_172.21.0.2_9000" "Controller_172.21.0.5_9000" "Server_172.21.0.3_8098" "Server_172.21.0.4_8098" "Broker_172.21.0.3_8099"; do
  curl -X DELETE "http://localhost:9000/instances/${instance}" \
    -H "accept: application/json" 2>/dev/null;
  printf "\n"
done
Output
{"_code":404,"_error":"Instance Controller_172.21.0.4_9000 not found"}
{"_code":404,"_error":"Instance Controller_172.21.0.2_9000 not found"}
{"_code":404,"_error":"Instance Controller_172.21.0.5_9000 not found"}
{"_code":404,"_error":"Instance Server_172.21.0.3_8098 not found"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}

The first four instances return a 404 status since we’ve already deleted them, but the last two have now been deleted!
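
To double check from the command line, we could list the cluster's instances and confirm that none of the dropped names still appear. The sketch below assumes the controller's GET /instances endpoint returns a JSON body containing the instance names as quoted strings. The still_registered helper is hypothetical.

```shell
# Hypothetical helper: read the body of GET /instances on stdin and print
# whichever of the given instance names still appear in it.
still_registered() {
  body=$(cat)
  for name in "$@"; do
    # Match the quoted name so that one name cannot match inside another.
    if printf '%s' "$body" | grep -qF "\"$name\""; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, curl "http://localhost:9000/instances" 2>/dev/null | still_registered "Server_172.21.0.4_8098" "Broker_172.21.0.3_8099" should print nothing once both instances are gone.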

Now if we navigate back to the home page of the Pinot Data Explorer, we’ll see that there are no bad instances anymore:

no more bad
Figure 6. No instances in a bad state