big.matrix, shared.big.matrix, filebacked.big.matrix, is.big.matrix, as.big.matrix, is.shared, is.separated, is.filebacked, remove.backing, nebytes {bigmemory} | R Documentation |
Create a big.matrix (or check to see if an object is a big.matrix, or create a big.matrix from a matrix, and so on). The big.matrix may be in-memory (shared or not), or file-backed (which is always shareable).
big.matrix(nrow, ncol, type = "integer", init = NULL, dimnames = NULL,
           separated = FALSE, shared = FALSE, backingfile = NULL,
           backingpath = NULL, descriptorfile = NULL, preserve = TRUE,
           nebytes = 0)
shared.big.matrix(nrow, ncol, type = "integer", init = NULL, dimnames = NULL,
                  separated = FALSE, backingfile = NULL, backingpath = NULL,
                  descriptorfile = NULL, preserve = TRUE, nebytes = 0)
filebacked.big.matrix(nrow, ncol, type = "integer", init = NULL,
                      dimnames = NULL, separated = FALSE, backingfile = NULL,
                      backingpath = NULL, descriptorfile = NULL,
                      preserve = TRUE, nebytes = 0)
as.big.matrix(x, type = NULL, separated = FALSE, shared = FALSE,
              backingfile = NULL, backingpath = NULL, descriptorfile = NULL,
              preserve = TRUE)
is.big.matrix(x)
is.separated(x)
is.shared(x)
is.filebacked(x)
nebytes(x)
backingpath(x)
x | a matrix or vector; if a vector, a one-column big.matrix is created by as.big.matrix. |
nrow | number of rows. |
ncol | number of columns. |
type | the type of the atomic element ("integer" by default). |
init | a scalar value for initializing the matrix (NULL by default, to avoid spending unnecessary time on initialization). |
dimnames | a list of the row and column names. |
separated | use separated column organization of the data; see details. |
shared | TRUE if the big.matrix should be allocated to shared memory. |
backingfile | the root name for the file(s) for the cache of x. |
backingpath | the path to the directory containing the file backing cache. |
descriptorfile | the name of the file to hold the filebacked description, for subsequent use with attach.big.matrix; if NULL, the backingfile is used as the root. The descriptor file is placed in the same directory as the backing files. |
preserve | if this is a filebacked big.matrix, it is preserved, by default, even after the end of the R session unless this option is set to FALSE. |
nebytes | undocumented option to support internal experimentation. |
A big.matrix consists of an object in R that does nothing more than point to the data structure implemented in C++. The object acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames.
There are three big.matrix types, which manage data in different ways. The basic (or default) big.matrix is not shared across processes and is limited to available RAM. A shared big.matrix has the same size constraints as the basic big.matrix, but may be shared across separate R processes. A file-backed big.matrix may exceed available RAM by using hard drive space, and may also be shared across processes. The atomic types of these matrices may be double, integer, short, or char (8, 4, 2, and 1 bytes, respectively).
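The byte sizes listed above allow a rough estimate of how much memory (or disk space) the data of a big.matrix will need. The following is a minimal sketch in base R; estimate_bytes is a hypothetical helper introduced here for illustration, not a bigmemory function, and it ignores any per-column or bookkeeping overhead.

```r
# Bytes per element for each atomic type, as listed in the text above.
bytes_per_element <- c(double = 8, integer = 4, short = 2, char = 1)

# Hypothetical helper (not part of bigmemory): approximate size of the
# matrix data in bytes, ignoring any overhead.
estimate_bytes <- function(nrow, ncol, type = "integer") {
  stopifnot(type %in% names(bytes_per_element))
  nrow * ncol * bytes_per_element[[type]]
}

estimate_bytes(1e6, 10, "double")  # 8e+07 bytes
estimate_bytes(1e6, 10, "char")    # 1e+07 bytes
```

Comparing such an estimate against available RAM may help in deciding between an in-memory and a file-backed big.matrix.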
If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x. If x is of type double, then the result will be numeric; otherwise, the result will be an integer R matrix. The expression x alone will display information about the R object (e.g. the external pointer) rather than evaluating the matrix itself (the user should try x[,] with extreme caution, recognizing that a huge R matrix will be created).
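The extraction behavior described above can be checked on a small matrix. This is a sketch that assumes the bigmemory package is installed; the variable names are illustrative only.

```r
library(bigmemory)

xd <- big.matrix(10, 2, type = "double", init = 0)
xi <- big.matrix(10, 2, type = "integer", init = 0)

# Extracting rows yields an ordinary R matrix, not a big.matrix.
sub_d <- xd[1:5, ]
sub_i <- xi[1:5, ]

is.matrix(sub_d)    # an ordinary R matrix
dim(sub_d)          # first five rows, both columns
is.double(sub_d)    # numeric result for a double big.matrix
is.integer(sub_i)   # integer result for an integer big.matrix
```

Extracting only the rows or columns that are needed keeps the temporary R matrix small, which is the point of avoiding x[,] on a large object.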
If x has a huge number of rows, then the use of rownames will be extremely memory-intensive and should be avoided. If x has a huge number of columns, the user might want to store the transpose, as there is overhead of a pointer (and possibly mutexes) for each column in the matrix.
If separated is TRUE, then the memory is allocated into separate vectors for each column. If separated is FALSE, the matrix is stored in traditional column-major format. The function is.separated() reports the separation type of the big.matrix (TRUE if separated, FALSE otherwise).
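As a small illustration of the separated option, the sketch below creates one matrix of each layout and queries it with is.separated(). It assumes bigmemory is installed; shared = FALSE is spelled out (it is the default in the usage shown on this page) so that the example stays in ordinary, non-shared memory.

```r
library(bigmemory)

# One vector per column.
s <- big.matrix(5, 3, type = "integer", init = 1,
                separated = TRUE, shared = FALSE)

# Traditional column-major storage (separated = FALSE is the default).
m <- big.matrix(5, 3, type = "integer", init = 1, shared = FALSE)

is.separated(s)
is.separated(m)

# Indexing works the same way for either layout.
s[2, 3] <- 42L
s[2, 3]
```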
When a big.matrix, x, is passed as an argument to a function, it is essentially providing call-by-reference rather than call-by-value behavior. If the function modifies any of the values of x, the changes are not limited in scope to a local copy within the function; they are visible to the caller as well.
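The difference from ordinary R matrices can be seen by passing both kinds of object to the same function. This is a minimal sketch assuming bigmemory is installed; set_first is a hypothetical helper written for this illustration.

```r
library(bigmemory)

# Hypothetical helper: overwrite the top-left element of its argument.
set_first <- function(m, value) {
  m[1, 1] <- value
  invisible(NULL)
}

x <- big.matrix(3, 2, type = "integer", init = 0)
r <- matrix(0L, 3, 2)

set_first(x, 99L)
set_first(r, 99L)

x[1, 1]   # changed: the function wrote to the data x points to
r[1, 1]   # unchanged: the function modified a local copy
```

A function that must not alter its big.matrix argument therefore needs to work on an explicit copy or avoid assignments into it.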
A shared big.matrix object is essentially the same as a non-shared big.matrix object except the memory being managed may be shared across R sessions.
A file-backed big.matrix may exceed available RAM in size by using a file cache (or possibly multiple file caches, if separated is TRUE). This can incur a substantial performance penalty for large matrices, but could be useful nonetheless. Creating a filebacked object produces not only the backing file(s) but also a descriptor file (in the same directory) that can be used for subsequent attachments (see attach.big.matrix).
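The sketch below shows the files created for a small file-backed matrix and a second attachment made through the descriptor file. It assumes bigmemory is installed; the file names demo.bin and demo.desc and the use of a temporary directory are choices made for this illustration.

```r
library(bigmemory)

# Work in a scratch directory so the example leaves no files elsewhere.
scratch <- tempdir()
old_wd <- setwd(scratch)

fb <- filebacked.big.matrix(3, 3, type = "integer", init = 0,
                            backingfile = "demo.bin",
                            backingpath = scratch,
                            descriptorfile = "demo.desc")

# Both the backing file and the descriptor file now exist.
has_backing <- file.exists("demo.bin")
has_descriptor <- file.exists("demo.desc")

# Attach a second handle to the same data via the descriptor file.
fb2 <- attach.big.matrix("demo.desc")
fb2[2, 2] <- 7L
value_seen <- fb[2, 2]   # the write through fb2 is visible through fb

setwd(old_wd)
```

Another R process could attach to the same data in the same way, provided it can read the descriptor and backing files.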
A big.matrix is returned (for big.matrix, shared.big.matrix, filebacked.big.matrix, and as.big.matrix), and TRUE or FALSE for is.big.matrix and the other functions.
John W. Emerson and Michael J. Kane
bigmemory, and perhaps the class documentation of big.matrix; attach.big.matrix and describe.
x <- big.matrix(10, 2, type='integer', init=-5)
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[,]
x[1:8,1] <- 11:18
x[,]
colmin(x)
colmax(x)
colrange(x)
colsum(x)
colprod(x)
colmean(x)
colvar(x)
summary(x)
gc()

x <- as.big.matrix(matrix(-5, 10, 2))
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[1:8,1] <- 11:18
x[,]
gc()

# The following shared memory example is quite silly, as you wouldn't likely do
# this in a single R session. But if zdescription were passed to another R session
# via SNOW, NetWorkSpaces, or even by a simple file read/write,
# then the attach.big.matrix() within the second R process would give access to the
# same object in memory. Please see the package vignette for real examples.
z <- shared.big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
y[1,1] <- -100
y[,]
z[,]
gc()

# A short filebacked example, showing the creation of associated files and mutexes:
files <- dir()
files[grep("example.bin", files)]
z <- filebacked.big.matrix(3, 3, type='integer', init=123,
                           backingfile="example.bin",
                           descriptorfile="example.desc",
                           dimnames=list(c('a','b','c'), c('d', 'e', 'f')))
z[,]
files <- dir()
files[grep("example.bin", files)]
zz <- attach.big.matrix("example.desc")
zz[,]
zz[1,1] <- 0
zzz <- attach.big.matrix(describe(z))
zzz[,]