Difference between revisions of "Matlab, CUDA, and GPU Computing"

From edegan.com
 
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
+ {{Project
+ |Has title=Matlab, CUDA, and GPU Computing
+ |Has owner=Wei Wu
+ |Has start date=2018/06/22
+ |Has keywords=Matlab, parallel computing, CPU
+ |Has project output=Content
+ |Has project status=Active
+ |Has sponsor=McNair Center
+ |Depends upon it=Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code
+ }}
 
Main Project here: [[Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code]]
 
  
- ==Synposis==
+ ==Synopsis==
- On July 2nd of 2018, Chenyu Yang, the original code author of the above main project, requested me to have "an implementation of LP on GPU", and Ed seconded this idea. I henceforth started exploring the possibility of such an endeavor. What is more important, such "an implementation of LP on GPU" should beat our current solver (Gurobi or Matlab's linprog). After some research, my conclusion is that we cannot beat performance of Gurobi on solving a single LP, and we should continued to use Gurobi instead of linprog. However, we might be still able to speed up our code by task-parallelising on CPU cores.
+ On July 2nd of 2018, Chenyu Yang, the original code author of the above main project, requested me to have "an implementation of LP (linear programming) on GPU", and Ed seconded this idea. I henceforth started exploring the possibility of such an endeavor. What is more important, such "an implementation of LP on GPU" should beat our current solver (Gurobi or Matlab's linprog). <br>
+ After some research, my conclusion is that we cannot beat performance of Gurobi on solving a single LP, and we should continue to use Gurobi instead of linprog. However, we might be still able to speed up our code by task-parallelising on CPU cores.
  
 
==Getting Started with our GPU==
 
Line 30: Line 41:
 
* Matlab '''does''' have an option to [https://www.mathworks.com/help/gads/how-to-use-parallel-processing.html run ga in parallel].
  
- * Gurobi '''does''' support CPU parallelism.
+ * Gurobi '''does''' support [https://www.gurobi.com/documentation/8.0/refman/continuous_models.html CPU parallelism]. In fact, it will use the concurrent solver for LP by default.
  
 
* In general, individual threads of GPU only perform well with simple arithmetic tasks. Task-parallelizing toolboxes/libraries/packages onto threads of GPU is '''NO GO'''.
  
- ==Serial vs Parallel solving many LPs on CPU==
+ ==Serial vs Parallel: solving many LPs on CPU==
 
===Using a function wrapper for Gurobi===
 
- I manage to call Gurobi solvers on many LP problems in a parallel manner. To accomplish this, one need to wrap the LP solving part into a function, and call this function inside the parfor. See [[File:parallel gurobi.pdf]] for a (lazy) example.
+ I manage to call Gurobi solvers on many LP problems in a parallel manner. To accomplish this, one needs to wrap the LP solving part into a function, and call this function inside the parfor. See [[File:parallel gurobi.pdf]] for a (lazy) example.
  
 
===Testing performance===
 
Line 56: Line 67:
 
Serial: 9000+ s [[File:serial_gurobi_R=1000.png]] <br>
Parallel: 2873 s [[File:parallel_call_gurobi_R=1000.png]] <br>
+
+ ====Testing instance 3====
+ We used the following parameters:
+ '''R = 200''', K = 1, S = 2, industry = 4, monte_N = 1, monte_M = 2, quota_per_up = 1, mktsize = 3, n_trail = 2, ini_w_M = 10, n_cores = 12.
+ <br>
+
+ Result:
+ Serial: 3231 s [[File:serial_gurobi_R=200.png]] <br>
+ Parallel: 1605 s [[File:parallel_gurobi_R=200.png]] <br>
+
+ ==Further parallelize msmf_corr_coeff==
+ At this point I decided to make a new project page to document all the changes I made to the code for parallelism. Please refer to this page: [[Parallelize msmf corr coeff.m]]
  
 
==Deliverable==
 
 
===Week July 2 - 6===
 
- <s>In Matlab, use parfor for both CPU and GPU to solve a set of LPs. Profile and compare.</s> Just try to do CPU instead. Figure out how to do CPU-based parallel computing with Gurobi. I cannot find a way to run Gurobi solvers inside a parfor. I believe Matlab's linprog can, but linprog is much slower than Gurobi. There will be some trade off. I need to test this. <br>
+ <s>In Matlab, use parfor for both CPU and GPU to solve a set of LPs. Profile and compare.</s>
+ <br>Just try to do CPU instead. Figure out how to do CPU-based parallel computing with Gurobi. I cannot find a way to run Gurobi solvers inside a parfor. I believe Matlab's linprog can, but linprog is much slower than Gurobi. There will be some trade off. I need to test this. <br>
 
It is too hard to test this using the code written by Chenyu. Instead I will write my small piece of code to test this.
 
  

Latest revision as of 11:06, 13 November 2020


Project
Matlab, CUDA, and GPU Computing
Project Information
Has title: Matlab, CUDA, and GPU Computing
Has owner: Wei Wu
Has start date: 2018/06/22
Has deadline date:
Has keywords: Matlab, parallel computing, CPU
Has project status: Active
Has sponsor: McNair Center
Has project output: Content
Copyright © 2019 edegan.com. All Rights Reserved.

Main Project here: Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code

Synopsis

On July 2nd, 2018, Chenyu Yang, the original author of the code for the main project above, asked me for "an implementation of LP (linear programming) on GPU", and Ed seconded the idea. I then started exploring whether this was feasible. More importantly, such an implementation should beat our current solver (Gurobi or Matlab's linprog).
After some research, my conclusion is that we cannot beat Gurobi's performance on solving a single LP, and we should continue to use Gurobi instead of linprog. However, we may still be able to speed up our code by task-parallelising across CPU cores.

Getting Started with our GPU

We run Matlab remotely on the Database Server (DB Server) via VNC. The VNC service on DB Server was configured by Wei during Summer 2018. Matlab, CUDA, and the Titan GPU were installed and configured by the Office of Information Technology (OIT).

  • To start/configure the VNC service on DB Server and to get connected remotely, see the documentation here.
  • Once you are connected to DB Server through VNC, open a terminal on DB Server and type
matlab

This will bring up the Matlab GUI.

  • To check if Matlab is working with our Nvidia graphics card, type the following in the Matlab command window:
gpuDevice

(Screenshot: GpuDevice.png)
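
As a slightly fuller check than gpuDevice alone, the sketch below reports the selected device and runs one simple element-wise operation on it. It assumes the Parallel Computing Toolbox is installed; the test array and variable names are illustrative only.

```matlab
% Sketch: confirm that Matlab can see the GPU and run simple arithmetic on it.
% Assumes the Parallel Computing Toolbox; variable names are illustrative.
if gpuDeviceCount == 0
    disp('No supported GPU detected by Matlab.');
else
    g = gpuDevice;                    % selects and returns the current GPU device
    fprintf('GPU: %s (compute capability %s, %.1f GB total memory)\n', ...
        g.Name, g.ComputeCapability, g.TotalMemory / 2^30);

    x_cpu = rand(1000, 'single');     % data on the host
    x_gpu = gpuArray(x_cpu);          % copy to the device
    y_gpu = 2 * x_gpu;                % element-wise work runs on the GPU
    y_cpu = gather(y_gpu);            % copy the result back to the host
    fprintf('Max abs difference vs CPU: %g\n', ...
        max(abs(y_cpu(:) - 2 * x_cpu(:))));
end
```

If the device name printed here is not the Titan card, Matlab may have selected a different device or the driver may not be visible in the VNC session.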

What Works, and What Doesn't

  • For the above reason, Gurobi does not support GPU computing.
  • Matlab does not have GPU-based linprog.
  • Matlab does have an option to run ga in parallel.
  • Gurobi does support CPU parallelism. In fact, it will use the concurrent solver for LP by default.
  • In general, individual GPU threads perform well only on simple arithmetic tasks. Task-parallelizing toolboxes/libraries/packages onto GPU threads is a NO GO.
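
For the CPU parallelism point, Gurobi's behaviour on a single LP can be steered through its parameter struct. The fragment below is a minimal sketch, assuming the Gurobi Matlab interface is on the path; the values are illustrative and the model struct is hypothetical.

```matlab
% Sketch: controlling Gurobi's LP algorithm and thread count from Matlab.
% Assumes the Gurobi Matlab interface; all values here are illustrative.
params = struct();
params.Method     = 3;    % 3 = concurrent; the default -1 lets Gurobi choose
params.Threads    = 12;   % cap on CPU threads (12 matches n_cores on this page)
params.OutputFlag = 0;    % suppress the solver log

% With a model struct built elsewhere (hypothetical here), the call would be:
% result = gurobi(model, params);
```

When many LPs are solved at the same time on separate workers, setting Threads to 1 per solve may avoid oversubscribing the cores; this is a suggestion to test, not a measured result.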

Serial vs Parallel: solving many LPs on CPU

Using a function wrapper for Gurobi

I managed to call Gurobi solvers on many LP problems in parallel. To accomplish this, one needs to wrap the LP-solving part in a function and call this function inside the parfor. See File:Parallel gurobi.pdf for a (lazy) example.
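
The wrapper approach can be sketched as follows. This is not the code in the PDF above: the function name solve_one_lp, the toy constraint matrix, and the right-hand sides are all hypothetical, and the sketch assumes the Gurobi Matlab interface and the Parallel Computing Toolbox are available.

```matlab
% Sketch: solve several small LPs in parallel by calling a wrapper inside parfor.
% Each LP is: min c'*x  subject to  A*x <= b, x >= 0 (toy data, illustrative only).
n_lps = 8;
A = [1 2; 3 1];
c = [-1; -1];
objvals = nan(n_lps, 1);

parfor k = 1:n_lps
    b = [4; 6] + k;                       % a different right-hand side per LP
    [~, fval] = solve_one_lp(A, c, b);    % all Gurobi-specific code is in the wrapper
    objvals(k) = fval;
end
disp(objvals);

function [x, objval] = solve_one_lp(A, c, b)
    % Build the Gurobi model struct and solve one LP.
    model.A          = sparse(A);         % Gurobi expects a sparse matrix
    model.obj        = c;
    model.rhs        = b;
    model.sense      = '<';               % every constraint is A*x <= b
    model.modelsense = 'min';             % lower bounds default to 0
    params.OutputFlag = 0;                % keep worker output quiet
    params.Threads    = 1;                % one thread per solve (an assumption)
    result = gurobi(model, params);
    if strcmp(result.status, 'OPTIMAL')
        x = result.x;
        objval = result.objval;
    else
        x = [];
        objval = NaN;                     % flag LPs that did not solve to optimality
    end
end
```

Keeping the struct construction inside the function means the parfor body only contains a function call and a sliced assignment, which is the pattern described above.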

Testing performance

Testing instance 1

We used the following parameters for testing:

R = 10, K = 1, S = 2, industry = 4, monte_N = 1, monte_M = 2, quota_per_up = 1, mktsize = 3, n_trail = 2, ini_w_M = 10, n_cores = 12.


We ran the serial and parallel (using the method above) versions of msmf_corr_coeff.m. Profiling shows that the parallel version takes much longer than the serial one. On closer inspection, a single call to gurobi takes very little time, so at this problem size parfor will probably not give us any benefit. However, I suspect that for large R (~1000) we will see improvements from parallelism.

Serial: 356 s (see Serial gurobi R=10.png)
Parallel: 874 s (see Parallel call gurobi R=10.png)
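
The effect of very short tasks on parfor can be explored with a small timing experiment. The sketch below assumes the Parallel Computing Toolbox; the task is a stand-in for a quick solver call and is not an LP solve.

```matlab
% Sketch: time many small tasks in a plain for loop and in a parfor loop.
% The task below is a hypothetical stand-in for a very short solver call.
n_tasks = 200;
cheap_task = @(k) sum(sin((1:1e4) * k));

out_serial = zeros(n_tasks, 1);
tic;
for k = 1:n_tasks
    out_serial(k) = cheap_task(k);
end
t_serial = toc;

if isempty(gcp('nocreate'))
    parpool;                      % start the default pool if none is open
end
out_parallel = zeros(n_tasks, 1);
tic;
parfor k = 1:n_tasks
    out_parallel(k) = cheap_task(k);
end
t_parallel = toc;

fprintf('serial: %.3f s, parfor: %.3f s\n', t_serial, t_parallel);
```

Whether parfor wins depends on how the cost of one task compares with the scheduling and data-transfer overhead per iteration, so increasing the work inside cheap_task should shift the balance towards the parallel loop.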

Testing instance 2

We used the following parameters:

R = 1000, K = 1, S = 2, industry = 4, monte_N = 1, monte_M = 2, quota_per_up = 1, mktsize = 3, n_trail = 2, ini_w_M = 10, n_cores = 12.


The parallel version is now ~3.5 times faster than the serial version. I believe it might be even faster with more cores. However, I do not completely understand how the computation time scales with R (and other parameters), so I will run another test with R = 200.
Serial: 9000+ s (see Serial gurobi R=1000.png)
Parallel: 2873 s (see Parallel call gurobi R=1000.png)

Testing instance 3

We used the following parameters:

R = 200, K = 1, S = 2, industry = 4, monte_N = 1, monte_M = 2, quota_per_up = 1, mktsize = 3, n_trail = 2, ini_w_M = 10, n_cores = 12.


Result:
Serial: 3231 s (see Serial gurobi R=200.png)
Parallel: 1605 s (see Parallel gurobi R=200.png)
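
To compare the three test instances on one scale, the reported wall-clock times can be turned into speedup and parallel efficiency (speedup divided by the number of cores). The snippet below is a small sketch using only the timings listed on this page; the R = 1000 serial time was recorded as "9000+", so 9000 is used as a lower bound.

```matlab
% Sketch: speedup and parallel efficiency from the timings reported above.
labels     = {'R = 10', 'R = 200', 'R = 1000'};
t_serial   = [356, 3231, 9000];    % seconds; 9000 is a lower bound ("9000+")
t_parallel = [874, 1605, 2873];    % seconds
n_cores    = 12;

speedup    = t_serial ./ t_parallel;
efficiency = speedup / n_cores;

for i = 1:numel(labels)
    fprintf('%-9s speedup %.2fx, efficiency %.0f%%\n', ...
        labels{i}, speedup(i), 100 * efficiency(i));
end
```

On these numbers the R = 10 run is slower in parallel (speedup below 1), R = 200 gives roughly a 2x speedup, and R = 1000 gives at least about 3.1x, all well short of the 12x that perfect scaling on 12 cores would give.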

Further parallelize msmf_corr_coeff

At this point I decided to make a new project page to document all the changes I made to the code for parallelism. Please refer to this page: Parallelize msmf corr coeff.m

Deliverable

Week July 2 - 6

Original plan (struck out): In Matlab, use parfor for both CPU and GPU to solve a set of LPs. Profile and compare.
Just try to do CPU instead. Figure out how to do CPU-based parallel computing with Gurobi. I cannot find a way to run Gurobi solvers inside a parfor. I believe Matlab's linprog can, but linprog is much slower than Gurobi. There will be some trade-off. I need to test this.
It is too hard to test this using the code written by Chenyu. Instead, I will write my own small piece of code to test this.

Week July 9 - 13

  • Continue experimenting with calling gurobi inside parfor.
  • Look at ways to speed up moments.m. I was wrong about the computational complexity of moments.m. It might still benefit us to speed it up, but this is a lower priority now.
