Building a tiny R package with CUDA and Rcpp

I'm working on a tiny R package that uses CUDA and Rcpp, adapted from the output of Rcpp.package.skeleton(). I will first describe what happens on the master branch for the commit entitled "fixed namespace". The package installs successfully if I leave CUDA out (i.e., if I remove src/Makefile, change src/rcppcuda.cu to src/rcppcuda.cpp, and comment out the code that defines and calls kernels). But as is, the compilation fails.

I would also like to know how to compile with a Makevars or Makevars.in instead of a Makefile, and in general, how to make this as platform independent as is realistic. I've read about Makevars in the R extensions manual, but I still haven't been able to make it work.

Some of you may suggest rCUDA, but what I'm really after is improving a big package I have already been developing for some time, and I'm not sure switching would be worth starting over from scratch.

Anyway, here's what happens when I run R CMD build and R CMD INSTALL on this package (master branch, commit entitled "fixed namespace").

* installing to library ‘/home/landau/.R/library’ 
* installing *source* package ‘rcppcuda’ ... 
** libs 
** arch - 
/usr/local/cuda/bin/nvcc -c rcppcuda.cu -o rcppcuda.o --shared -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/apps/R-3.2.0/include -I/usr/local/cuda/include 
rcppcuda.cu:1:18: error: Rcpp.h: No such file or directory 
make: *** [rcppcuda.o] Error 1 
ERROR: compilation failed for package ‘rcppcuda’ 
* removing ‘/home/landau/.R/library/rcppcuda’

... which is strange, because I do include Rcpp.h, and Rcpp is installed.

$ R 

R version 3.2.0 (2015-04-16) -- "Full of Ingredients" 
Copyright (C) 2015 The R Foundation for Statistical Computing 
Platform: x86_64-unknown-linux-gnu (64-bit)

...

> library(Rcpp) 
> sessionInfo() 
R version 3.2.0 (2015-04-16) 
Platform: x86_64-unknown-linux-gnu (64-bit) 
Running under: CentOS release 6.6 (Final) 

locale: 
[1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C    
[3] LC_TIME=en_US.UTF-8  LC_COLLATE=en_US.UTF-8  
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
[7] LC_PAPER=en_US.UTF-8  LC_NAME=C     
[9] LC_ADDRESS=C    LC_TELEPHONE=C    
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] Rcpp_0.11.6 
>

I'm using CentOS,

$ cat /etc/*-release 
CentOS release 6.6 (Final) 
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch 
CentOS release 6.6 (Final) 
CentOS release 6.6 (Final)

CUDA version 6,

$ nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver 
Copyright (c) 2005-2013 NVIDIA Corporation 
Built on Thu_Mar_13_11:58:58_PDT_2014 
Cuda compilation tools, release 6.0, V6.0.1

and I have access to 4 GPUs of the same make and model.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery 
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting... 

CUDA Device Query (Runtime API) version (CUDART static linking) 

Detected 4 CUDA Capable device(s) 

Device 0: "Tesla M2070" 
    CUDA Driver Version/Runtime Version   6.0/6.0 
    CUDA Capability Major/Minor version number: 2.0 
    Total amount of global memory:     5375 MBytes (5636554752 bytes) 
    (14) Multiprocessors, (32) CUDA Cores/MP:  448 CUDA Cores 
    GPU Clock rate:        1147 MHz (1.15 GHz) 
    Memory Clock rate:        1566 Mhz 
    Memory Bus Width:        384-bit 
    L2 Cache Size:         786432 bytes 
    Maximum Texture Dimension Size (x,y,z)   1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048) 
    Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers 
    Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers 
    Total amount of constant memory:    65536 bytes 
    Total amount of shared memory per block:  49152 bytes 
    Total number of registers available per block: 32768 
    Warp size:          32 
    Maximum number of threads per multiprocessor: 1536 
    Maximum number of threads per block:   1024 
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64) 
    Max dimension size of a grid size (x,y,z): (65535, 65535, 65535) 
    Maximum memory pitch:       2147483647 bytes 
    Texture alignment:        512 bytes 
    Concurrent copy and kernel execution:   Yes with 2 copy engine(s) 
    Run time limit on kernels:      No 
    Integrated GPU sharing Host Memory:   No 
    Support host page-locked memory mapping:  Yes 
    Alignment requirement for Surfaces:   Yes 
    Device has ECC support:      Enabled 
    Device supports Unified Addressing (UVA):  Yes 
    Device PCI Bus ID/PCI location ID:   11/0 
    Compute Mode: 
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

...

> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU1) : Yes 
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU2) : Yes 
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU3) : Yes 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU3) : Yes 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU3) : Yes 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU0) : Yes 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No 
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU0) : Yes 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes 
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No 
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU0) : Yes 
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU1) : Yes 
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU2) : Yes 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2070, Device1 = Tesla M2070, Device2 = Tesla M2070, Device3 = Tesla M2070 
Result = PASS

Edit: it compiles for any commit after "fixed namespace" on either branch, but there are still problems combining Rcpp and CUDA

To make the package compile, it turns out that I just needed to separate my C++ and CUDA code into separate *.cpp and *.cu files. However, when I try the "compiling cpp and cu separately" commit on the master branch, I get

> library(rcppcuda) 
> hello() 
An object of class "MyClass" 
Slot "x": 
[1] 1 2 3 4 5 6 7 8 9 10 

Slot "y": 
[1] 1 2 3 4 5 6 7 8 9 10 

Error in .Call("someCPPcode", r) : 
    "someCPPcode" not resolved from current namespace (rcppcuda) 
>

The error goes away in the withoutCUDA branch, in the commit entitled "adding branch withoutCUDA".

> library(rcppcuda) 
> hello() 
An object of class "MyClass" 
Slot "x": 
[1] 1 2 3 4 5 6 7 8 9 10 

Slot "y": 
[1] 1 2 3 4 5 6 7 8 9 10 

[1] "Object changed." 
An object of class "MyClass" 
Slot "x": 
[1] 500 2 3 4 5 6 7 8 9 10 

Slot "y": 
[1] 1 1000 3 4 5 6 7 8 9 10 

>

The only differences between the "compiling cpp and cu separately" commit on master and the "adding branch withoutCUDA" commit on withoutCUDA are

The Makefile and someCUDAcode.cu are gone from withoutCUDA.
In withoutCUDA, all references to someCUDAcode() are gone from someCPPcode.cpp.

Also, it would still be convenient to be able to use CUDA and Rcpp in the same *.cu file. I would still really like to know how to fix the "fixed namespace" commit on the master branch.


2015-06-03 landau

Going through your package, there are multiple aspects that need to be changed.

You shouldn't use a 'Makefile' but a 'Makevars' file instead, to improve compatibility for builds on multiple architectures.
Try to follow the standard variable names (e.g. CPPC should be CXX); this makes everything play together much better.
Don't try to compile the shared object yourself; there are good macros within the base R makefile that make this much simpler (e.g. PKG_LIBS, OBJECTS, etc.).
With multiple compilers, you will want to use the OBJECTS macro. Here you override R's base attempt to set the object files to be linked (make sure you include them all).
You also need (AFAIK) to make the CUDA functions available with extern "C". You prefix both the function definition in the .cu file and its declaration at the start of your cpp file.
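
To illustrate the last point, here is a minimal host-only sketch of the extern "C" pattern. The names launch_stub and call_from_cpp are hypothetical and not part of the package, and the definition is a plain CPU stub standing in for the function that nvcc would compile from the .cu file. Both halves sit in one file here only so the sketch is self-contained; in the package the declaration would live in the .cpp file and the definition in the .cu file.

```cpp
#include <cassert>

// Declaration, as it would appear at the top of the .cpp file.
// extern "C" requests C linkage, so the symbol name is not C++-mangled
// and can match a definition compiled separately (by nvcc in the package).
extern "C" int launch_stub(int n);

// Hypothetical caller, playing the role of someCPPcode().
int call_from_cpp(int n) {
    assert(n >= 0);  // illustrative precondition only
    return launch_stub(n);
}

// Definition, as it would appear in the .cu file. This is a CPU stub
// (an assumption for this sketch); the real one would launch a kernel.
extern "C" int launch_stub(int n) {
    return n + 1;
}
```

The point of the design is that the declaration and the definition carry the same linkage specifier, so the linker sees one unmangled symbol regardless of which compiler produced each object file.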

The following Makevars worked for me after I modified CUDA_HOME, R_HOME, and RCPP_INC for my machine (switched back for you here). Note that this is where a configure file is recommended, to make the package as portable as possible.

CUDA_HOME = /usr/local/cuda
R_HOME = /apps/R-3.2.0
CXX = /usr/bin/g++

# This defines what the shared object libraries will be
# (CUDA_LIB is defined below; make expands it lazily)
PKG_LIBS = -L$(CUDA_LIB) -Wl,-rpath,$(CUDA_LIB) -lcudart -d


#########################################

# Header locations for R and Rcpp, derived from R_HOME
R_INC = $(R_HOME)/include
RCPP_INC = $(R_HOME)/library/Rcpp/include

NVCC = $(CUDA_HOME)/bin/nvcc
CUDA_INC = $(CUDA_HOME)/include
CUDA_LIB = $(CUDA_HOME)/lib64

LIBS = -lcudart -d
NVCC_FLAGS = -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I$(R_INC)

### Define objects
cu_sources := $(wildcard *.cu)
cu_sharedlibs := $(patsubst %.cu, %.o,$(cu_sources))

cpp_sources := $(wildcard *.cpp)
cpp_sharedlibs := $(patsubst %.cpp, %.o, $(cpp_sources))

# Override R's default object list so both .cu and .cpp objects are linked
OBJECTS = $(cu_sharedlibs) $(cpp_sharedlibs)

all : rcppcuda.so

rcppcuda.so: $(OBJECTS)

# Recipe lines below must start with a tab character
%.o: %.cpp $(cpp_sources)
	$(CXX) $< -c -fPIC -I$(R_INC) -I$(RCPP_INC)

%.o: %.cu $(cu_sources)
	$(NVCC) $(NVCC_FLAGS) -I$(CUDA_INC) $< -c

One follow-up point (as you say this is a learning exercise):

A. You aren't using one of the parts of Rcpp that make it such a wonderful package, namely 'attributes'. Here is how your cpp file should look:

#include <Rcpp.h> 
using namespace Rcpp; 

extern "C" 
void someCUDAcode(); 

//[[Rcpp::export]] 
SEXP someCPPcode(SEXP r) { 
    S4 c(r); 
    double *x = REAL(c.slot("x")); 
    int *y = INTEGER(c.slot("y")); 
    x[0] = 500.0; 
    y[1] = 1000; 
    someCUDAcode(); 
    return R_NilValue; 
}

This will automatically generate the corresponding RcppExports.cpp and RcppExports.R files, and you no longer need to write the .Call yourself. You just call the function: .Call('someCPPcode', r) becomes someCPPcode(r) :)

For completeness, here is the updated version of your someCUDAcode.cu file:

__global__ void mykernel(int a){ 
    int id = threadIdx.x; 
    int b = a; 
    b++; 
    id++; 
} 


extern "C" 
void someCUDAcode() { 
    mykernel<<<1, 1>>>(1); 
}

With respect to a configure file (using autoconf), you are welcome to check out my gpuRcuda package, which uses Rcpp, CUDA, and ViennaCL (a C++ GPU computing library).


2015-06-08 14:20:28 cdeterman

This solves both problems I was looking into: using Makevars properly and using Rcpp with CUDA. Thanks! – landau

Happy to help :-) – cdeterman

I should say, though, that making the changes to allow 'someCPPcode(r)' instead of '.Call('someCPPcode', r)' did not work for me. The 'someCPPcode' function could not be accessed within R. – landau

Several packages on CRAN use GPUs via CUDA:

I would start with these.


2015-06-03 21:17:02

I agree that these are good examples of combining R and CUDA, but none uses 'Rcpp', and each relies on a Makefile or Makefile.in rather than a Makevars. ['WideLM'](http://cran.r-project.org/web/packages/WideLM/index.html) uses both 'Rcpp' and CUDA, but [it won't install on the machine I'm using](http://stackoverflow.com/questions/30631612/widelm-installation-error), so I don't know how much good it will do to dig into the source at this point. – landau

Shucks. Good point. I forgot WideLM. Its author has a new package for Rcpp, RcppArmadillo and CUDA coming, so you could get in touch. Tell Mark I said Hi :) –
