Hello GPU
In [3]:
import pyopencl as cl
import numpy as np
import numpy.linalg as la
mf = cl.mem_flags
This notebook demonstrates a simple GPU workflow that touches all essential pieces:
- Data transfer
- Kernel compilation
- Execution
In [4]:
a = np.random.rand(50000).astype(np.float32)
Now create a context ctx and a command queue queue:
In [5]:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
Now allocate a device buffer. The constructor signature is Buffer(context, flags, size=None, hostbuf=None); here we pass an explicit size in bytes.
In [6]:
a_buf = cl.Buffer(ctx, mf.READ_WRITE, size=a.nbytes)
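The size argument is in bytes, not elements. For a float32 array, that is 4 bytes per element, which NumPy reports directly as nbytes. A quick check, independent of OpenCL:

```python
import numpy as np

a = np.random.rand(50000).astype(np.float32)

# nbytes = number of elements * bytes per element (float32 -> 4 bytes)
print(a.nbytes)                          # 200000
print(a.nbytes == a.size * a.itemsize)   # True
```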
Then transfer the data from the host array into the buffer:
In [7]:
cl.enqueue_copy(queue, a_buf, a)
Out[7]:
<pyopencl._cl.NannyEvent at 0x7f722c1a9888>
Here's our kernel source code:
In [8]:
prg = cl.Program(ctx, """
    __kernel void twice(__global float *a)
    {
        int gid = get_global_id(0);
        a[gid] = 2*a[gid];
    }
    """).build()
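To see what the kernel computes, here is a plain-Python sketch of the same operation. OpenCL runs the kernel body once per work-item, with get_global_id(0) playing the role of the loop index; the difference is that the GPU performs these iterations in parallel rather than sequentially:

```python
import numpy as np

def twice_cpu(a):
    # Each loop iteration corresponds to one OpenCL work-item;
    # gid plays the role of get_global_id(0).
    for gid in range(a.shape[0]):
        a[gid] = 2 * a[gid]
    return a

a = np.arange(4, dtype=np.float32)   # [0., 1., 2., 3.]
print(twice_cpu(a))                  # [0. 2. 4. 6.]
```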
Run the kernel. The global work size is a.shape; passing None for the local work size lets the implementation choose one.
In [9]:
prg.twice(queue, a.shape, None, a_buf)
Out[9]:
<pyopencl._cl.Event at 0x7f72249540f8>
Copy the data back.
In [10]:
result = np.empty_like(a)
cl.enqueue_copy(queue, result, a_buf)
Out[10]:
<pyopencl._cl.NannyEvent at 0x7f722c916e08>
Check the result.
In [11]:
print(la.norm(result - 2*a), la.norm(a))
0.0 128.81612
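The check uses the Euclidean norm: if every entry of result equals 2*a, the difference vector is all zeros and its norm is exactly 0.0. The second number, la.norm(a), is printed only as a scale reference, so we can see the error is zero relative to data of magnitude ~100. A NumPy-only illustration of the same check:

```python
import numpy as np
import numpy.linalg as la

a = np.random.rand(1000).astype(np.float32)
result = 2 * a                   # what a correct kernel run produces

# Norm of the elementwise difference is 0 iff result matches exactly
print(la.norm(result - 2 * a))   # 0.0
# la.norm(a) gives the overall scale of the data for comparison
print(la.norm(a) > 0)            # True
```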