The cross-entropy method is a general approach to optimization that relies on two simple ideas. In the context of finding the maximum of a scalar-valued function, these are:

- Generate some random parameters and evaluate the function there.
- Use the best-performing parameters to generate new candidates.

One simple way of generating such random parameters is to fit a normal distribution at every iteration: we choose a subset of “elite parameters” (a fraction of those we tried), calculate their mean and covariance, and use them to generate a new population of parameters.
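The update step above can be sketched in a few lines. This is an illustrative fragment, not part of the algorithm below: it assumes a matrix `thetas` with one candidate parameter vector per row and a vector `rewards` with the corresponding function values.

```r
# One cross-entropy update (sketch): refit a normal distribution to the elite set.
# Assumes `thetas` (one candidate per row) and `rewards` (one value per candidate).
n_elite  <- as.integer(nrow(thetas) * 0.2)                     # keep the top 20%
elite    <- thetas[order(rewards, decreasing = TRUE)[1:n_elite], ]
new_mean <- colMeans(elite)   # mean of the elite parameters
new_cov  <- cov(elite)        # covariance of the elite parameters
```

The next population is then sampled from a normal distribution with this mean and covariance.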

This kind of optimization method is quite useful, for instance, in reinforcement learning.

As an example, suppose we want to maximize the function:

```r
# the target vector we want to recover
solution <- c(0.5, 0.1, -0.3)

# the function we want to optimize: the (negative) squared distance to `solution`
f <- function(theta){
  reward <- -sum((solution - theta)**2)
  return(reward)
}
```

Then the cross-entropy algorithm in this case is:

```r
library(MASS)

cem <- function(f, n_iter, theta_mean, theta_std, batch_size=25, elite_frac=0.2){
  dim_theta <- length(theta_mean)
  for(it in 1:n_iter){
    # Sample a batch of parameter vectors from the current distribution
    thetas <- mvrnorm(n=batch_size, mu=theta_mean, Sigma=theta_std)
    rewards <- apply(thetas, 1, f)
    # Keep the elite fraction with the highest rewards
    n_elite <- as.integer(batch_size * elite_frac)
    elite_inds <- sort(rewards, decreasing=TRUE, index.return=TRUE)$ix[1:n_elite]
    elite_thetas <- thetas[elite_inds, ]
    # Refit the sampling distribution to the elite set;
    # the small diagonal term keeps the covariance from collapsing
    theta_mean <- apply(elite_thetas, 2, mean)
    theta_std <- 0.01*diag(dim_theta) + 0.99*cov(elite_thetas)
  }
  return(theta_mean)
}
```

We then call it like this:

```r
dim_theta <- length(solution)
theta_mean <- rep(0, dim_theta)   # initial mean of the sampling distribution
theta_std <- diag(dim_theta)      # initial covariance: the identity matrix
batch_size <- 25                  # number of samples per batch
elite_frac <- 0.2                 # fraction of samples used as elite set

cem(f, 300, theta_mean=theta_mean, theta_std=theta_std,
    batch_size=batch_size, elite_frac=elite_frac)
```
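With enough iterations, the returned mean should land close to `solution`. A quick sanity check (the variable name `theta_hat` is illustrative):

```r
theta_hat <- cem(f, 300, theta_mean=theta_mean, theta_std=theta_std,
                 batch_size=batch_size, elite_frac=elite_frac)
# the estimate should be close to c(0.5, 0.1, -0.3)
max(abs(theta_hat - solution))
```

Since the covariance of the elite set shrinks as the population concentrates around the optimum, the error typically decreases rapidly over the iterations.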