Before optimizing, identify where time and memory are actually spent. Guessing leads to wasted effort on code paths that are not the bottleneck.
Base R provides a sampling profiler via Rprof(). It records the call stack at fixed intervals and writes results to a file.
Rprof("profile.out", interval = 0.01, memory.profiling = TRUE)
result <- slow_function(data)
Rprof(NULL)
summaryRprof("profile.out")
summaryRprof() returns two tables: by.self (time spent in a function excluding callees) and by.total (including callees). by.self is the more actionable view for locating hotspots.
The profvis package wraps Rprof with an interactive flame graph and source-line annotation.
library(profvis)
profvis({
result <- vapply(1:1e5, function(i) sqrt(i) + log(i), numeric(1))
})
The flame graph's x-axis is sample count (proxy for time), y-axis is call depth. Wide blocks at the top of the stack are self-time hotspots. The memory column (in MB) shows allocations and GC events per line — useful for spotting unnecessary copies.
For comparing alternative implementations of a small code unit, use bench::mark() or microbenchmark::microbenchmark(). Both run expressions many times and report distributional statistics rather than a single timing.
library(bench)
mark(
loop = { x <- numeric(n); for(i in 1:n) x[i] <- i^2; x },
vec = (1:n)^2,
sapply = sapply(1:n, function(i) i^2),
iterations = 100,
check = FALSE
)
bench::mark() additionally reports memory allocation (mem_alloc) and garbage collections per iteration (n_gc), both relevant for allocation-heavy code.
bench::mark() checks that all expressions return equal results by default (check = TRUE). Set to FALSE when comparing implementations with different output types (e.g. matrix vs vector), or results that differ by floating-point tolerance.
lobstr::obj_size() reports object size including shared references. tracemem() identifies when an object is copied — critical for diagnosing unexpected duplication from R's copy-on-modify semantics.
x <- 1:1e6
tracemem(x)
# [1] "<0x55d2a1b4c9e0>"
y <- x # no copy yet (shared reference)
y[1] <- 0L # tracemem[0x55d2a1b4c9e0 -> 0x55d2a3f1b220]: triggers copy
| Pattern | Symptom | Fix |
|---|---|---|
Growing a vector/list in a loop with c() or append() | O(n²) time, high n_gc | Preallocate with vector(mode, length) |
rbind() inside a loop | Quadratic copy cost | Collect rows in a list, do.call(rbind, list) once, or use data.table::rbindlist |
Repeated data.frame column access in a loop | High self-time in [.data.frame | Extract columns to vectors before looping, or use matrices/data.table |
Nested apply family with closures capturing large environments | High memory, slow GC | Avoid capturing large objects; pass as arguments |
ifelse() over large vectors | Evaluates both branches fully for all elements | Use direct logical indexing or dplyr::case_when (still vectorized but clearer) |
R's core arithmetic, comparison, and many mathematical operations are implemented as vectorized C loops. Replacing element-wise R-level iteration with calls to these primitives removes interpreter overhead per element.
Each iteration of a for loop over R-level code involves: bytecode dispatch, S3/S4 method lookup for generic operators, argument matching, and (frequently) object copying due to copy-on-modify. A vectorized call to +, sqrt, or %*% performs this dispatch once and then runs a tight C loop over the full buffer.
# Scalar loop: ~50ms for n = 1e6
result <- numeric(n)
for (i in 1:n) {
result[i] <- sqrt(x[i]) * 2 + 1
}
# Vectorized: ~5ms for n = 1e6
result <- sqrt(x) * 2 + 1
The apply family (sapply, vapply, lapply, mapply) does not avoid R-level iteration — it iterates in C but still calls an R closure per element. They are convenience tools for code clarity, not performance, unless the underlying operation cannot be vectorized.
| Idiom | Vectorized alternative |
|---|---|
sapply(x, function(v) v^2) | x^2 |
sapply(seq_along(x), function(i) x[i] + y[i]) | x + y |
sapply(x, function(v) if (v > 0) "pos" else "neg") | ifelse(x > 0, "pos", "neg") |
for (i in seq_len(n)) total <- total + x[i] | sum(x) |
| Nested loop for outer-product-like computation | outer(x, y, FUN) or matrix algebra |
Row-wise apply(m, 1, sum) | rowSums(m) / colSums(m) / rowMeans(m) |
Vector recycling lets shorter vectors be reused element-wise against longer ones without explicit replication — but silent recycling of mismatched lengths is a frequent source of bugs.
x <- 1:6
x * c(1, 0) # recycles c(1,0) to c(1,0,1,0,1,0): zeroes out even-indexed elements
# Conditional assignment via logical indexing (faster than ifelse for assignment)
x[x %% 2 == 0] <- 0
length() when vectors should align.
Operations expressible as matrix algebra benefit from BLAS-backed routines (%*%, solve(), crossprod(), tcrossprod()), which are orders of magnitude faster than equivalent loop-based implementations and benefit further from an optimized BLAS (OpenBLAS, MKL) linked at the R installation level.
# Sum of squares per row, loop vs vectorized vs matrix algebra
rowSums(m^2) # vectorized
diag(m %*% t(m)) # matrix algebra (avoid for large m: computes full n x n matrix)
For tabular workloads, data.table performs grouped aggregation and joins using compiled C routines with radix sorting, avoiding both R-level loops and the row-by-row overhead of data.frame indexing.
library(data.table)
dt[, .(total = sum(value), n = .N), by = group]
dt[other_dt, on = "id"] # vectorized merge join
Rcpp provides a C++ API that maps R types (vectors, lists, data frames, environments) to C++ classes with near-zero-cost interop, and handles the .Call boilerplate, memory protection, and type conversion required for a standard R/C extension.
| Scenario | Recommendation |
|---|---|
| Computation expressible as vector/matrix ops on existing data | Vectorize in R first; Rcpp unlikely to help |
| Sequential algorithm with per-iteration state dependency (e.g. recursive filters, MCMC, simulations) | Rcpp — cannot be vectorized in R |
| Tight loop over millions of elements with simple per-element logic | Rcpp, or check if a vectorized package primitive already exists |
| String processing / regex over large character vectors | Rcpp with std::string, or stringi (already C-backed) |
| Calling an existing C/C++ library | Rcpp (wraps the call) or .Call with manual SEXP handling |
library(Rcpp)
cppFunction('
NumericVector cumsum_custom(NumericVector x) {
int n = x.size();
NumericVector out(n);
double acc = 0;
for (int i = 0; i < n; i++) {
acc += x[i];
out[i] = acc;
}
return out;
}
')
cumsum_custom(c(1, 2, 3, 4)) # 1 3 6 10
For larger functions, use a .cpp file with sourceCpp("file.cpp"). The // [[Rcpp::export]] attribute marks functions to be made available in R.
// [[Rcpp::depends(Rcpp)]]
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector running_mean(NumericVector x, int window) {
int n = x.size();
NumericVector out(n);
double sum = 0;
for (int i = 0; i < n; i++) {
sum += x[i];
if (i >= window) sum -= x[i - window];
out[i] = (i >= window - 1) ? sum / window : NA_REAL;
}
return out;
}
| R type | Rcpp type | Notes |
|---|---|---|
| numeric vector | NumericVector | double precision; backed by R's SEXP |
| integer vector | IntegerVector | NA_INTEGER for missing |
| character vector | CharacterVector | elements are String / SEXP CHARSXP |
| logical vector | LogicalVector | three-valued: TRUE, FALSE, NA_LOGICAL |
| list | List | heterogeneous; access via as<T>() |
| data.frame | DataFrame | column access via df["colname"] |
| matrix | NumericMatrix / IntegerMatrix | (i,j) indexing, column-major like R |
| environment | Environment | for accessing/modifying R environments |
| function | Function | callback into R — has call overhead, avoid in hot loops |
NumericVector x(5) has valid indices 0 through 4.
// Checking for NA in a NumericVector element
if (NumericVector::is_na(x[i])) {
continue;
}
// Integer NA differs from double NA — use NA_INTEGER, not NA_REAL, for IntegerVector
// [[Rcpp::export]]
List summary_stats(NumericVector x) {
return List::create(
Named("mean") = mean(x),
Named("sd") = sd(x),
Named("n") = x.size()
);
}
"Sugar" expressions provide R-like vectorized syntax that compiles down to efficient loops, avoiding manual element-wise iteration while retaining C++ performance.
NumericVector x = ...;
NumericVector y = Rcpp::sqrt(x) + 1; // vectorized sugar
LogicalVector mask = (x > 0); // element-wise comparison
double total = Rcpp::sum(x[mask]); // logical subsetting + sum
For linear algebra-heavy code, RcppArmadillo exposes the Armadillo C++ library (MATLAB-like syntax: mat, vec, solve(), inv()) and RcppEigen exposes Eigen. Both integrate with Rcpp's type conversions and link against the same BLAS/LAPACK as R.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
vec ols_coef(mat X, vec y) {
return solve(X, y); // equivalent to lm.fit, no formula overhead
}
| Context | Mechanism |
|---|---|
| Interactive / script | sourceCpp() — recompiles on file change, caches output |
| Package | usethis::use_rcpp(), source in src/, LinkingTo: Rcpp + Imports: Rcpp in DESCRIPTION, run compileAttributes() to regenerate exports |
| Compiler flags | src/Makevars (Linux/macOS) and src/Makevars.win for PKG_CXXFLAGS, -O2, OpenMP flags |
| OpenMP parallelism | // [[Rcpp::plugins(openmp)]], #pragma omp parallel for — must not call R API functions inside parallel regions |
SEXPs, calling back into R via Function, anything touching R's memory manager) are not thread-safe. Inside OpenMP or other parallel regions, operate only on raw C++ containers (e.g. copy to std::vector first) and convert back to Rcpp types after the parallel region completes.
| Step | Action |
|---|---|
| 1 | Profile with profvis or Rprof to confirm the actual hotspot — do not optimize unmeasured code |
| 2 | Check for an existing vectorized primitive or package function (rowSums, data.table, matrixStats, collapse) that replaces the loop entirely |
| 3 | If no vectorized equivalent exists and the loop has sequential state dependencies, write the inner loop in Rcpp |
| 4 | Benchmark the Rcpp version against the R version with bench::mark() on representative data sizes — compilation and call overhead can make Rcpp slower for small n |
| 5 | Re-profile the full pipeline; the new bottleneck is often elsewhere (I/O, data.frame coercion, plotting) |