Friday, 17 April 2020

DP #2 : TopCoder SRM 782

DIV 2C - Empty The Box

Problem Statement - here

Solution: We need to find the expected value of the total penalty. The expected value of a penalty score is nothing but - 
 E(X)= ∑ xp(x). here X denotes all the possible subsets of [1....T] whose sum equals X. 
Let's randomly choose any given subset of [1.....T] whose sum equals X. Now, since we have 2 D-dice, we can easily calculate the probability of getting sum as X. 

For example: if X = 12 and the value of D is  6, then the probability of getting 12 when 2 dice (D = 6) are thrown is nothing but 1/36.  

So, once we loop over all the possible subsets whose sum equals X, we can easily calculate X.P(X), once we do that we can add it to our expected value of X. 

The next thing to consider is - it is said in the question that we can only choose a particular subset (whose sum equals X) if the tokens are available to us. So, to ensure this we will loop over all the possible subsets of T. 

Concepts Involved - BITMASKING, DYNAMIC PROGRAMMING.

We know the value of T is at most 50. We can easily mask each subset of T in O(2^T) time. 
Given this subset - consider it as the set of available tokens on the desk. We can compute what is the value of the sum of tokens which are available (this corresponds to the set bits of our number). In the worst case, it might happen that we could not pick up any tokens in the subset. In this case, the value of X becomes this total and we will add X.P(X) to our expected value table. Also, it may happen that we can pick up some tokens from this subset and then the remaining subset reduces to a lesser value whose answer (best response) is already computed. So we can take the minimum of all possible options and update the answer for the current subset of available tokens.

Implementation : 

we will precompute the probability of getting the sum as X, given the dice value D.

Here is the snippet for that part - 
double* rollprob = new double[2*D+1](); for(int i = 1; i<=D;++i){ for (int j = 1; j<=D;++j){ rollprob[i+j] += 1./(D*D); } }

Let's create the DP array which will store the minimum expected value for each subset of T. 
double* dp = new double[1<<T]();

Now let's loop over all the possible subsets of tokens in the range [1......T]. In this - we will loop from the subset which has no tokens in it and will gradually increase to the subset which has all the tokens in it. 
for(long long int subset = 0; subset<(1<<T);++subset){

}
Once we have the subset, we will first calculate the value of tokens present in it - we can create a separate function that will give us this value.
double Calculate(long long int subset){ double t = 1,sum=0; while(subset > 0){ if (subset&1) sum += t; subset >>=1; t++; } return sum; }

Once our function is created, we will use it to compute the value of our subset. We will try to find another smaller subset that can be picked up by throwing a dice. For example, suppose our subset has [1,3,5,7] and it's sum = 16. If I can get a score of 9,4,etc.. I can remove [1,3,5] or [1.3] respectively. So that's what I am searching for. And, what will I remove then? This will be the minimum of what is remaining after i remove each. So, I will compute the X(random value) as minimum at each step and finally update it.

Complete Code : 

class EmptyTheBox{ public: double Calculate(long long int subset){ double t = 1,sum=0; while(subset > 0){ if (subset&1) sum += t; subset >>=1; t++; } return sum; } double minExpectedPenalty(int D, int T){ double* rollprob = new double[2*D+1](); for(int i = 1; i<=D;++i){ for (int j = 1; j<=D;++j){ rollprob[i+j] += 1./(D*D); } } int excess = 0; while(T > 2*D) excess+=T,T--; T = min(2*D,T); double* dp = new double[1<<T](); for(long long int subset = 0; subset<(1<<T);++subset){ for (int roll = 2; roll <= 2*D; ++roll){ double X = Calculate(subset); for(long long int reduceset = 1;reduceset < (1<<T);++reduceset){   if (((subset&reduceset) == reduceset) && (Calculate(reduceset) == roll)){ X = min(X,dp[subset^reduceset]); } } dp[subset] += rollprob[roll]*X; } } return (dp[(1<<T)-1] + excess); } };