16 Apr 2015

R: Think Bayes - More posterior probability calculations

As I mentioned in a post last week I’ve been reading through Think Bayes and translating some of the examples form Python to R.

After my first post Antonios suggested a more idiomatic way of writing the function in R so I thought I’d give it a try to calculate the probability that combinations of cookies had come from each bowl.

In the simplest case we have this function which takes in the names of the bowls and the likelihood scores:

R f = function(names,likelihoods) {
  # Assume each option has an equal prior
  priors = rep(1, length(names)) / length(names)

  # create a data frame with all info you have
  dt = data.frame(names,priors,likelihoods)

  # calculate posterior probabilities
  dt$post = dt$priors*dt$likelihoods / sum(dt$priors*dt$likelihoods)

  # specify what you want the function to return
  list(names=dt$names, priors=dt$priors, likelihoods=dt$likelihoods, posteriors=dt$post)
}

We assume a prior probability of 0.5 for each bowl.

Given the following probabilities of of different cookies being in each bowl...

PYTHON mixes = {
  'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
  'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
}

...we can simulate taking one vanilla cookie with the following parameters:

R Likelihoods = c(0.75,0.5)
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)

> res$posteriors[res$name == "Bowl 1"]
[1] 0.6
> res$posteriors[res$name == "Bowl 2"]
[1] 0.4

If we want to simulate taking 3 vanilla cookies and 1 chocolate one we’d have the following:

R Likelihoods = c((0.75 ** 3) * (0.25 ** 1), (0.5 ** 3) * (0.5 ** 1))
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)

> res$posteriors[res$name == "Bowl 1"]
[1] 0.627907
> res$posteriors[res$name == "Bowl 2"]
[1] 0.372093

That’s a bit clunky and the intent of '3 vanilla cookies and 1 chocolate' has been lost. I decided to refactor the code to take in a vector of cookies and calculate the likelihoods internally.

First we need to create a data structure to store the mixes of cookies in each bowl that we defined above. It turns out we can do this using a nested list:

R bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)

> Mixes
$`Bowl 1`
  vanilla chocolate
     0.75      0.25

$`Bowl 2`
  vanilla chocolate
      0.5       0.5

Now let’s tweak our function to take in observations rather than likelihoods and then calculate those likelihoods internally:

_~r likelihoods = function(names, observations) { scores = c(1,1) names(scores) = names for(name in names) { for(observation in observations) { scores[name] = scores[name] * mixes[observation] } } return(scores) } f = function(names,mixes,observations) { # Assume each option has an equal prior priors = rep(1, length(names)) / length(names) # create a data frame with all info you have dt = data.frame(names,priors) dt$likelihoods = likelihoods(Names, Observations) # calculate posterior probabilities dt$post = dt$priors*dt$likelihoods / sum(dt$priors*dt$likelihoods) # specify what you want the function to return list(names=dt$names, priors=dt$priors, likelihoods=dt$likelihoods, posteriors=dt$post) } _~

And if we call that function: _~r Names = c("Bowl 1", "Bowl 2") bowl1Mix = c(0.75, 0.25) names(bowl1Mix) = c("vanilla", "chocolate") bowl2Mix = c(0.5, 0.5) names(bowl2Mix) = c("vanilla", "chocolate") Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix) Mixes Observations = c("vanilla", "vanilla", "vanilla", "chocolate") res=f(Names,Mixes,Observations) > res$posteriors[res$names == "Bowl 1"] [1] 0.627907 > res$posteriors[res$names == "Bowl 2"] [1] 0.372093 _~

Exactly the same result as before! #win

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.