R’s C interface

from Advanced R (1st ed.)

An incomplete review

C data structures*

* Organized in groups according to how they use memory

Atomic Vectors

  • Subtypes:
    • REALSXP: numeric vector
    • INTSXP: integer vector
    • LGLSXP: logical vector

Atomic Vectors

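  • allocVector() → creates a new R vector; requires the SEXPTYPE (e.g. INTSXP) and the length
  • asInteger() → coerces an R vector (e.g. an INTSXP) into a C int, using its first element
  • memset() sets every byte of a block of memory to a constant; here it sets all n * sizeof(int) bytes of the vector to 0, so every element is 0.

  • PROTECT() → tells R that the object is in use and shouldn't be deleted by the garbage collector.
  • UNPROTECT(n) → unprotects the last n protected objects; every PROTECT() must be matched by an unprotect before the function returns.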

library(inline)

zeroes <- cfunction(c(n_ = "integer"), '
  int n = asInteger(n_);
  SEXP out = PROTECT(allocVector(INTSXP, n));
  memset(INTEGER(out), 0, n * sizeof(int));
  UNPROTECT(1);
  return out;
')

zeroes(10)
#> [1] 0 0 0 0 0 0 0 0 0 0

Better PROTECT() than sorry: this applies to all SEXP types.
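memset() works on bytes, not on vector elements. That is why it is safe in zeroes(): an int whose bytes are all 0 is the integer 0, whereas any other constant needs an explicit loop. A minimal plain-C sketch outside R; fill_zero() and fill_int() are hypothetical helpers, and the comment assumes a 4-byte int:

```c
#include <stddef.h>
#include <string.h>

/* Zero n ints in one call: every byte becomes 0, so every int is 0. */
void fill_zero(int *x, size_t n) {
    memset(x, 0, n * sizeof(int));
}

/* Any other constant needs a loop: memset(x, 1, ...) would set each
   byte to 1, giving 0x01010101 (16843009) per 4-byte int, not 1. */
void fill_int(int *x, size_t n, int value) {
    for (size_t i = 0; i < n; i++) {
        x[i] = value;
    }
}
```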

Atomic Vectors

  • REAL(), INTEGER(), LOGICAL(), COMPLEX(), RAW() → return a pointer to the data stored in the vector
  • Calling the accessor once and keeping the pointer (e.g. px = REAL(x)) is more efficient than calling REAL(x)[i] inside the loop.

add_two <- cfunction(c(x = "numeric"), "
  int n = length(x);
  double *px, *pout;
  SEXP out = PROTECT(allocVector(REALSXP, n));
  px = REAL(x);        /* look up the data pointers once... */
  pout = REAL(out);
  for (int i = 0; i < n; i++) {
    pout[i] = px[i] + 2;   /* ...then index them directly */
  }
  UNPROTECT(1);
  return out;
")

add_two(as.numeric(1:10))
#> [1] 3 4 5 6 7 8 9 10 11 12
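Outside R's own sources, REAL() is an actual function call that, among other things, checks the vector's type, so fetching the pointer once keeps that work out of the loop. The plain-C sketch below imitates this pattern; the vec struct, its type tags, and vec_real() are hypothetical stand-ins for a SEXP and REAL(), not R's API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type tag and vector struct, loosely standing in for a SEXP. */
typedef enum { T_REAL, T_INT } vec_type;

typedef struct {
    vec_type type;
    size_t n;
    double *data;
} vec;

/* Hypothetical accessor playing the role of REAL(): checks the type on every call. */
double *vec_real(const vec *v) {
    if (v->type != T_REAL)
        abort();          /* R's REAL() reports an error instead */
    return v->data;
}

/* Same shape as add_two(): fetch each data pointer once, outside the loop. */
void add_two(const vec *x, vec *out) {
    double *px = vec_real(x);
    double *pout = vec_real(out);
    for (size_t i = 0; i < x->n; i++) {
        pout[i] = px[i] + 2;
    }
}
```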

Character vectors and lists

  • Subtypes:
    • STRSXP: character vector
    • VECSXP: lists
  • Built from other SEXP types:
    • each element of a STRSXP is a CHARSXP (an object that holds a C string)
    • each element of a VECSXP can be any SEXP
  • “hard to work with in C”

Coercing scalars

R → C (each takes the first element of x):

  • asLogical(x): LGLSXP -> int
  • asInteger(x): INTSXP -> int
  • asReal(x): REALSXP -> double
  • CHAR(asChar(x)): STRSXP -> const char*

C → R:

  • ScalarLogical(x): int -> LGLSXP
  • ScalarInteger(x): int -> INTSXP
  • ScalarReal(x): double -> REALSXP
  • mkString(x): const char* -> STRSXP

Use translateChar(asChar(x)) instead of CHAR(asChar(x)) when you need the string in the native encoding.

Character vectors and lists

  • SET_STRING_ELT() → sets an element (a CHARSXP)
  • STRING_ELT() → accesses an element (a CHARSXP)
  • CHAR(STRING_ELT(x, i)) → gets the C string

  • mkChar() → turns a C string into a CHARSXP, to use with SET_STRING_ELT()
  • mkString() → turns a C string into a STRSXP

  • SET_VECTOR_ELT() → sets an element of a list (VECSXP)
  • VECTOR_ELT() → accesses an element of a list

abc <- cfunction(NULL, '
  SEXP out = PROTECT(allocVector(STRSXP, 3));
  SET_STRING_ELT(out, 0, mkChar("a"));
  SET_STRING_ELT(out, 1, mkChar("b"));
  SET_STRING_ELT(out, 2, mkChar("c"));
  UNPROTECT(1);
  return out;
')

abc()
#> [1] "a" "b" "c"

Pairlists

  • Subtypes:
    • LISTSXP: pairlists
  • Used for calls, unevaluated arguments, attributes, and in ... (dots).
  • R has helpers to navigate a pairlist:
    • CAR() extracts the first element
    • CDR() extracts the rest of the list
    • A pairlist always ends with R_NilValue
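A pairlist is a singly linked list of cons cells. The plain-C sketch below models one; cell, car(), cdr(), and the NIL sentinel are hypothetical stand-ins for a LISTSXP node, CAR(), CDR(), and R_NilValue. With the real API the traversal loop has the same shape: for (SEXP p = x; p != R_NilValue; p = CDR(p)).

```c
#include <stddef.h>

/* Hypothetical cons cell standing in for a LISTSXP node. */
typedef struct cell {
    int value;            /* payload (CAR); real pairlists hold any SEXP */
    struct cell *next;    /* rest of the list (CDR) */
} cell;

/* Sentinel playing the role of R_NilValue: every list ends here. */
static cell nil_cell = {0, NULL};
#define NIL (&nil_cell)

int car(const cell *c) { return c->value; }
cell *cdr(const cell *c) { return c->next; }

/* Follow CDRs until the NIL sentinel, counting cells. */
int list_length(const cell *c) {
    int n = 0;
    for (; c != NIL; c = cdr(c)) n++;
    return n;
}

/* Same walk, summing each CAR. */
int list_sum(const cell *c) {
    int s = 0;
    for (; c != NIL; c = cdr(c)) s += car(c);
    return s;
}
```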

Missing values

is_na <- cfunction(c(x = "ANY"), '
  int n = length(x);
  SEXP out = PROTECT(allocVector(LGLSXP, n));
  for (int i = 0; i < n; i++) {
    switch(TYPEOF(x)) {
      case LGLSXP:
        LOGICAL(out)[i] = (LOGICAL(x)[i] == NA_LOGICAL);
        break;
      case INTSXP:
        LOGICAL(out)[i] = (INTEGER(x)[i] == NA_INTEGER);
        break;
      case REALSXP:
        LOGICAL(out)[i] = ISNA(REAL(x)[i]);
        break;
      case STRSXP:
        LOGICAL(out)[i] = (STRING_ELT(x, i) == NA_STRING);
        break;
      default:
        LOGICAL(out)[i] = NA_LOGICAL;
    }
  }
  UNPROTECT(1);
  return out;
')

is_na(c(NA, 1L))
#> [1] TRUE FALSE

is_na(c(NA, 1))
#> [1] TRUE FALSE

is_na(c(NA, "a"))
#> [1] TRUE FALSE

is_na(c(NA, TRUE))
#> [1] TRUE FALSE
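The NA constants are ordinary C values with special bit patterns: NA_INTEGER (and NA_LOGICAL) is INT_MIN, and NA_REAL is a NaN whose low 32-bit word is 1954. This is also why ISNA() in is_na() returns FALSE for NaN, whereas ISNAN() is TRUE for both NA and NaN (as R's is.na() is). A plain-C sketch, assuming IEEE 754 doubles; MY_NA_INTEGER, make_na_real(), and is_na_real() are hypothetical names that mimic, rather than call, R's code:

```c
#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: R's NA_INTEGER is the most negative int. */
#define MY_NA_INTEGER INT_MIN

/* Build R's NA_real_ bit pattern: high word 0x7FF00000, low word 1954 (0x7A2). */
double make_na_real(void) {
    uint64_t bits = 0x7FF00000000007A2ULL;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Like R's ISNA(): true only for a NaN whose low word is 1954. */
int is_na_real(double x) {
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}
```

Any NaN, including NA, passes isnan(); only the payload check separates NA from an ordinary NaN.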

Input validation

Usually done on the R side of the function.

If done on the C side, use TYPEOF() or one of these predicates:

  • For atomic vectors: isInteger(), isReal(), isComplex(), isLogical(), isString().
  • For combinations of atomic vectors: isNumeric() (integer, logical, real), isNumber() (integer, logical, real, complex), isVectorAtomic() (logical, integer, numeric, complex, string, raw).
  • For matrices and arrays: isMatrix(), isArray().
  • For more esoteric objects: isEnvironment(), isExpression(), isList() (a pairlist), isNewList() (a list), isSymbol(), isNull(), isObject() (objects with a class attribute, S3 or S4), isVector() (atomic vectors, lists, expressions).

Exercise:

Finding the C source code for a function

Exercise

  • Use pryr::show_c_source() to find the C source for an R function that uses .Internal().
  • Use inline::cfunction() to write equivalent code that uses .Call() instead.
  • Convert the C routine to pure R and write some R code to test that your R version is equivalent.

> pryr::show_c_source(.Internal(mean()))

mean is implemented by do_summary with op = 1

integer_mean <- inline::cfunction(c(x = "SEXP"), '
  R_xlen_t n = XLENGTH(x);
  double s = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    int xi = INTEGER_ELT(x, i);
    /* like mean(), return NA as soon as a missing value is found */
    if (xi == NA_INTEGER)
      return ScalarReal(R_NaReal);
    s += xi;
  }
  return ScalarReal(s / n);
')

> integer_mean(as.integer(c(2, 2, 4)))
[1] 2.666667

> mean(c(2, 2, 4))
[1] 2.666667
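To check the logic of integer_mean() outside R, the same loop can be written against a plain int array. A sketch under stated assumptions: INT_MIN stands in for NA_INTEGER and NaN for NA_real_ (plain C has no NA), int_mean() is a hypothetical name, and the long double accumulator is a choice for extra precision, not something the slide's code does:

```c
#include <limits.h>
#include <math.h>
#include <stddef.h>

/* Plain-C version of integer_mean(): INT_MIN plays the role of NA_INTEGER
   and NaN stands in for NA_real_ in the result. */
double int_mean(const int *x, size_t n) {
    long double s = 0.0L;   /* wider accumulator: a design choice, not required */
    for (size_t i = 0; i < n; i++) {
        if (x[i] == INT_MIN)
            return NAN;     /* stop at the first missing value */
        s += x[i];
    }
    return (double)(s / n); /* n == 0 gives NaN, as mean(integer(0)) does */
}
```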