There are many different ways to work with GPUs using Python. This page explores them!

==Foundations==

===CUDA vs. OpenCL===

At a fundamental level, using a GPU for computing means using [https://en.wikipedia.org/wiki/CUDA CUDA], [https://en.wikipedia.org/wiki/OpenCL OpenCL], or some other interface (OpenGL compute, Microsoft's DirectCompute, etc.). The big trade-off between CUDA and OpenCL is proprietary performance vs. open-source generality. Usually, I favour the latter. However, at this point, nVIDIA chipsets dominate the market, and CUDA (which only runs on nVIDIA hardware) seems to be the obvious choice. There have also been some attempts to make CUDA code run on top of OpenCL.

===CUDA for C++ or Fortran===

If you are coding in C/C++, then you can compile CUDA code into PTX (a low-level virtual machine language that runs on GPUs) with the [https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#introduction nVIDIA CUDA Compiler (nvcc)]. nvcc separates out the CUDA code that will run on the GPU, compiles it to PTX, and leaves the rest to be compiled with your regular compiler (likely GCC or the Microsoft Visual C compiler). Likewise, nVIDIA provides a dedicated CUDA Fortran compiler; it acquired the Portland Group, Inc. (PGI) to this end.

===Approaches for other languages===

However, if you want to write GPU compute code in Python, Perl, Java, Matlab, or a host of other languages, you'll need to think carefully about which of the available approaches is right for you. There are broadly four classes of approaches:
# Getting a language-specific compiler that isn't made by nVIDIA! (For Python, [https://numba.pydata.org/ Numba] is the canonical example; see the sketch just after this list.)
# Wrapping CUDA C++ (or Fortran) code directly into your code
# Using the low-level CUDA Driver API
# Using the higher-level CUDA Runtime API, which sits on top of the low-level Driver API (i.e., using both APIs)
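
As a taste of the first approach, here is a minimal sketch using Numba (assuming a recent Numba and a CUDA toolkit are installed): its compile_ptx function performs, for a Python function, the same source-to-PTX step that nvcc performs for CUDA C++.

<syntaxhighlight lang="python">
# Minimal sketch, assuming a recent Numba and a CUDA toolkit are installed.
# compile_ptx() lowers a Python function to PTX, the same virtual ISA
# that nvcc emits for CUDA C++ device code.
from numba import cuda, float32

def axpy(out, a, x, y):
    i = cuda.grid(1)              # global thread index
    if i < out.size:
        out[i] = a * x[i] + y[i]

# Compile for a concrete signature; returns the PTX text and return type.
ptx, resty = cuda.compile_ptx(axpy, (float32[:], float32, float32[:], float32[:]))
print(ptx[:500])                  # inspect the generated PTX assembly
</syntaxhighlight>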

The distinction between the last two is that only the Runtime API gives you access to the full set of libraries, including:
*[https://docs.nvidia.com/cuda/cublas/index.html cuBLAS] – CUDA Basic Linear Algebra Subroutines library
*[https://docs.nvidia.com/cuda/cusolver/index.html cuSOLVER] – a CUDA-based collection of dense and sparse direct solvers
*[https://docs.nvidia.com/cuda/cusparse/index.html cuSPARSE] – CUDA Sparse Matrix library

If you're an economist, then these libraries are very likely what you're going to want! (If you're a physicist, or doing signal processing, then you'll probably want [https://docs.nvidia.com/cuda/cufft/index.html cuFFT] and other libraries that are also in the Runtime API).
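
To make that concrete, here is a minimal sketch using [https://cupy.dev/ CuPy], a NumPy-workalike built on the Runtime API (assuming a CuPy build matching your CUDA toolkit is installed); the calls below are routed to cuBLAS and cuSOLVER behind the scenes.

<syntaxhighlight lang="python">
# Minimal sketch, assuming CuPy is installed (e.g., pip install cupy-cuda11x).
# CuPy mirrors the NumPy API but executes on the GPU, routing dense
# linear algebra to the Runtime API libraries (cuBLAS, cuSOLVER).
import cupy as cp

A = cp.random.rand(1000, 1000, dtype=cp.float64)
b = cp.random.rand(1000, dtype=cp.float64)

C = A @ A.T                      # matrix multiply -> cuBLAS (gemm)
x = cp.linalg.solve(C, b)        # dense solve     -> cuSOLVER (LU factorization)

print(cp.allclose(C @ x, b))     # verify the solution on the GPU
</syntaxhighlight>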

For the second option, you'll need to use [http://www.swig.org/ SWIG (Simplified Wrapper and Interface Generator)] or something that gives you equivalent functionality for your language, such as [https://cython.org/ Cython] for Python. ([https://github.com/rmcgibbo/npcuda-example NPCUDA] is a simple project to demo both of these methods in Python.) The major advantage of this option is that you aren't hitching your wagon to the continued support of a whole series of intermediate APIs.
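
For a sense of what direct wrapping looks like without SWIG or Cython, here is a hedged sketch using Python's standard-library ctypes module. The shared library name (libsaxpy.so) and the exported function (gpu_saxpy) are hypothetical, standing in for CUDA C++ code you would compile yourself with nvcc.

<syntaxhighlight lang="python">
# Hedged sketch of option 2 using only the standard library (ctypes).
# Hypothetical setup: a CUDA file compiled with
#   nvcc -shared -Xcompiler -fPIC saxpy.cu -o libsaxpy.so
# exposing  extern "C" void gpu_saxpy(int n, float a, float *x, float *y);
import ctypes
import numpy as np

lib = ctypes.CDLL("./libsaxpy.so")   # hypothetical library name
lib.gpu_saxpy.restype = None
lib.gpu_saxpy.argtypes = [
    ctypes.c_int, ctypes.c_float,
    np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),
    np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),
]

x = np.arange(8, dtype=np.float32)
y = np.ones(8, dtype=np.float32)
lib.gpu_saxpy(x.size, 2.0, x, y)     # y = 2*x + y, computed on the GPU
print(y)
</syntaxhighlight>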

{{Colored Box|Title=Notice for Perl Programmers|Content=A quick look at the GPU support for Perl suggests that SWIG is the way to go!}}

===Compiling a Kernel===
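
A minimal sketch of what kernel compilation looks like from Python, assuming [https://documen.tician.de/pycuda/ PyCUDA] is installed: SourceModule hands the CUDA C source to nvcc at runtime and loads the resulting module through the Driver API.

<syntaxhighlight lang="python">
# Minimal sketch, assuming PyCUDA is installed: SourceModule passes the
# CUDA C source to nvcc when this script runs and loads the compiled
# module via the Driver API.
import numpy as np
import pycuda.autoinit                  # creates a context on the default GPU
from pycuda import gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void double_them(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}
""")
double_them = mod.get_function("double_them")

a = gpuarray.to_gpu(np.arange(16, dtype=np.float32))
double_them(a, np.int32(a.size), block=(16, 1, 1), grid=(1, 1))
print(a.get())                          # back on the host: 0, 2, 4, ...
</syntaxhighlight>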


==CUDA and Python==
