Parallelize msmf corr coeff.m
Parallelize msmf corr coeff.m | |
---|---|
Project Information | |
Has title | Parallelize msmf corr coeff.m |
Has owner | Wei Wu |
Has start date | 2018-07-09 |
Has deadline date | |
Has keywords | Matlab, parallel computing, CPU |
Has project status | Active |
Is dependent on | Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code |
Dependent(s): | NOTS Computing for Matching Entrepreneurs to VCs |
Has sponsor | McNair Center |
Copyright © 2019 edegan.com. All Rights Reserved. |
This page documents all changes that Wei made to the Matlab code for Matching Entrepreneurs to VC. It also explains the decisions Wei made, and analyzes where further improvement might be achieved in the future.
Contents
Changes made to the code
To speed up the valuation of the objective function used by ga, Wei parallelized a couple of lines in msmf_corr_coeff.m.
Around line 80 of msmf_corr_coeff
Original code:
for m = 1 : M cholA = chol(exchsig(N1(m)-1, N2(m)-1, th)); epsim12 = zeros(N1(m), N2(m), R); epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1: psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]); epsim12(2:end, 2:end, :) = epsimR12; f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - reshape(epsim12, [N1(m)*N2(m) R]) end
For R=200, monte_M = 70 (hence large M), and mktsize = 30, each call to msmf_corr_coeff takes ~140 seconds. Computing epsimR12 is taking more than 50% time of msmf_corr_coeff.m. Each computation of epsimR12 is taking no more than 2 seconds, but for a large M it will take a long time in total. To improve this, we use a parfor to paralellize this part.
New code:
epsim_cell = cell(M,1); parfor m = 1 : M cholA = chol(exchsig(N1(m)-1, N2(m)-1, th)); epsim12 = zeros(N1(m), N2(m), R); epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1 : psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]); epsim12(2:end, 2:end, :) = epsimR12; epsim_cell{m} = reshape(epsim12, [N1(m)*N2(m) R]); end for m = 1 : M f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - epsim_cell{m}; end
To avoid data-racing, in each iteration m, we stored epsim12 into epsim_cell, and compute f using another for loop. Using parfor on a 12-cores machine gives a four time speed up for computing msmf_corr_coeff.m. Note that the actual speedup depends how many cores you are using. For the same R, monte_M, and mktsize, it now takes ~ 35 seconds to finish one call to msmf_corr_coeff.
Around line 120 of msmf_corr_coeff solving LP problems
For a large R, e.g. R = 200, solving 200 linear programming problems in parallel should be beneficial, especially when the each LP itself is easy to solve (takes less than 0.1 seconds each). Initially we managed to use gurobi inside a parfor:
% Gurobi inisde parfor. For large R (>100). By Wei. model.rhs = [full(constrB); full(constrM)]; model.A = [sparse(B);sparse(-B)]; model.ub = ones(L,1); model.lb = zeros(L,1); model.sense = '<'; for r = 1 : R % solve_LP is a function wrapper. It calls gurobi to solve a single LP. asgr(:,r) = solve_LP(r,f,model); end
% The function wrapper to use Gurobi function X = solve_LP(r,f,model) model.obj = f(:,r); parameters.outputflag = 0; result = gurobi(model, parameters); X = result.x; end
However using Gurobi actually did not help with the code performance. This is possibly due to the cost to create models for each Gurobi object. Gurobi is indeed the best commercial solver in the market, but it should be used for solving large LPs (hundreds of thousands of constraints), where as our Lp has only a few thousand constraints. We reversed back to using Matlab's native linprog:
% Solve LPs in parallel with Matlab's native linprog parfor r = 1 : R options = optimset('Display', 'off'); [asgr(:, r), ~, eflag(r)] = linprog(f(:, r), [B;-B], [constrB; constrM], [], [], zeros(L, 1), ones(L,1),options); end
Possible Further Improvements
At this point I believe I have done everything I could for code performance. However if you look at the profiler, you will find that the function moments also takes a good percentage of time. I tried to parallelise moments with parfor, but it did not give any improvements.