RAPIDS is a collection of open-source libraries for writing, deploying and managing end-to-end data pipelines on GPUs. It uses NVIDIA CUDA® to optimize compute resources, but exposes that parallelism through well-known Python interfaces.
The focus of this post is not to cover RAPIDS in detail but to walk through the steps to get started with it without much difficulty. The RAPIDS team has done a great job compiling the Startup Guide. But if you are like me, very new to the world of GPUs yet with decent experience designing data pipelines, this post should help you get up and running on AWS.
These are the prerequisites listed in the Startup Guide.
Container Host Prerequisites
- NVIDIA Pascal™ GPU architecture or better
- CUDA 9.2 or 10.0 compatible NVIDIA driver
- Ubuntu 16.04 or 18.04
- Docker CE v18+
- nvidia-docker v2+
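If you already have a shell on a candidate host, a rough way to see which of these prerequisites are in place is to check for the relevant tools. This is only a sketch: check_cmd is a helper name made up for this post, and it assumes that nvidia-docker v2 shows up as a runtime called "nvidia" in the output of docker info.

```shell
# Report whether a command is available on PATH.
# check_cmd is a hypothetical helper, not part of any RAPIDS tooling.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_cmd nvidia-smi   # ships with the NVIDIA driver
check_cmd docker       # Docker CE

# If the driver is installed, print the GPU model and driver version.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv \
    || echo "nvidia-smi is installed but could not talk to the driver"
fi

# Assumption: nvidia-docker v2 is listed as an "nvidia" runtime by docker info.
if command -v docker >/dev/null 2>&1; then
  docker info 2>/dev/null | grep -i 'runtimes' \
    || echo "could not read the Docker runtimes (is the daemon running?)"
fi
```

A "missing" line tells you which piece still has to be installed before the container can start.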
Well, I was not aware of most of these requirements, or what they mean, when I first started working with RAPIDS. To make it easy, I have compiled the steps below.
Note: You need an AWS account for this. GCP and other cloud providers also offer the machines RAPIDS requires, but for the sake of this post I have chosen AWS.
Step 1) In the AWS console, launch an instance with the AMI Deep Learning Base AMI (Amazon Linux) Version 16.2 (ami-038f5aa6f8673b785).
Step 2) Once you have selected the AMI, make sure to choose GPU instances in filter-by; otherwise you won't be able to run the RAPIDS Docker image, because NVIDIA CUDA® requires a GPU.
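If you prefer the command line to the console, Steps 1 and 2 can be sketched with the AWS CLI. Treat this as a sketch under assumptions, not what I actually ran: the instance type p3.2xlarge (V100 GPUs, which are newer than Pascal) and the key pair name my-key are placeholders, and AMI IDs are region-specific, so the ID from Step 1 only resolves in the region it was published in. The script only prints the command so you can review it first.

```shell
# Values for Steps 1 and 2. AMI_ID comes from this post; the rest are assumptions.
AMI_ID="ami-038f5aa6f8673b785"   # Deep Learning Base AMI (Amazon Linux) Version 16.2
INSTANCE_TYPE="p3.2xlarge"       # assumed GPU instance type; use any GPU type you have access to
KEY_NAME="my-key"                # hypothetical EC2 key pair name

# Print the launch command so it can be reviewed before running it.
# launch_cmd is a helper name made up for this post.
launch_cmd() {
  echo "aws ec2 run-instances --image-id $AMI_ID --instance-type $INSTANCE_TYPE --key-name $KEY_NAME --count 1"
}

launch_cmd
```

Once the values look right, copy the printed command into a shell that has AWS credentials configured.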
Step 3) Once the instance is up and running:
- Install Docker (prefix the command with sudo if you are not logged in as root)
$ yum install docker
- Once Docker is installed, download the RAPIDS image from a Docker registry. The image is hosted in several places; see the Startup Guide for details. Run the command below to pull it:
$ docker pull nvcr.io/nvidia/rapidsai/rapidsai:latest
- Once the image is downloaded, run the command below to start the RAPIDS container:
$ docker run --runtime=nvidia \
    --rm -it \
    -p 8888:8888 \
    -p 8787:8787 \
    -p 8786:8786 \
    -p 8889:8889 \
    nvcr.io/nvidia/rapidsai/rapidsai:latest
Note the ports published in this command. Jupyter has to start on one of them, otherwise you won't be able to reach the notebook. When Jupyter could not bind its address inside the container, I got this error:
File "/conda/envs/gdf/lib/python3.5/site-packages/tornado/netutil.py", line 168, in bind_sockets
  sock.bind(sockaddr)
OSError: [Errno 99] Cannot assign requested address
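The Startup Guide says the above command will start Jupyter, but that was not the case for me; I had to start Jupyter separately. If this happens to you too, run the command below inside the container. It makes Jupyter listen on all interfaces (--ip=0.0.0.0) on port 8889, which is one of the ports published by the docker run command: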
jupyter notebook --ip=0.0.0.0 --port=8889 --allow-root
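To make the port requirement concrete, here is a small sketch that checks whether the port you plan to give Jupyter is one of the ports published by the docker run command. The names PUBLISHED_PORTS, JUPYTER_PORT and is_published are made up for this post; the port numbers are the ones used in the commands above.

```shell
# Ports published by the docker run command (-p host:container).
PUBLISHED_PORTS="8888 8787 8786 8889"

# Succeeds (exit status 0) if the given port is in PUBLISHED_PORTS.
is_published() {
  for p in $PUBLISHED_PORTS; do
    [ "$p" = "$1" ] && return 0
  done
  return 1
}

JUPYTER_PORT=8889   # the port passed to jupyter notebook --port
if is_published "$JUPYTER_PORT"; then
  echo "port $JUPYTER_PORT is published; Jupyter should be reachable on it"
else
  echo "port $JUPYTER_PORT is not published; pick one of: $PUBLISHED_PORTS"
fi
```

If you access the notebook from your own machine rather than from the instance, the instance's security group most likely also needs to allow inbound traffic on that port.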
And yay! You now have a Jupyter notebook running that can use RAPIDS for ETL and many other transformations. Follow the Startup Guide for the ETL operations themselves; the intent of this post was just to get RAPIDS up and running from the Docker image.
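Before opening a notebook, a quick smoke test inside the container can confirm that the RAPIDS Python packages import at all. This is a sketch under assumptions: it assumes the image exposes the cuDF library as a Python module named cudf and that a python executable is on PATH; try_import and PYTHON_BIN are names made up for this post.

```shell
# Interpreter to test with; override PYTHON_BIN if yours has a different name.
PYTHON_BIN="${PYTHON_BIN:-python}"

# Try to import a Python module and report the result.
# try_import is a hypothetical helper, not part of RAPIDS.
try_import() {
  if "$PYTHON_BIN" -c "import $1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "failed: $1"
  fi
}

try_import cudf   # assumed module name for the RAPIDS DataFrame library
```

If the import fails, the conda environment may need to be activated first; the traceback above suggests the image uses one named gdf.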
For reference, here is the cheat sheet for RAPIDS.