Title: | Spatiotemporal Autoregression Analyses for Large Data Sets |
---|---|
Description: | These tools were created to test map-scale hypotheses about trends in large remotely sensed data sets but any data with spatial and temporal variation can be analyzed. Tests are conducted using the PARTS method for analyzing spatially autocorrelated time series (Ives et al., 2021: <doi:10.1016/j.rse.2021.112678>). The method's unique approach can handle extremely large data sets that other spatiotemporal models cannot, while still appropriately accounting for spatial and temporal autocorrelation. This is done by partitioning the data into smaller chunks, analyzing chunks separately and then combining the separate analyses into a single, correlated test of the map-scale hypotheses. |
Authors: | Clay Morrow [aut, cre] , Anthony Ives [aut] |
Maintainer: | Clay Morrow <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.4 |
Built: | 2024-11-21 04:37:53 UTC |
Source: | https://github.com/morrowcj/remoteparts |
calculate degrees of freedom for partitioned GLS
calc_dfpart(partsize, p, p0)
calc_dfpart(partsize, p, p0)
partsize |
number of pixels in each partition |
p |
number of predictors in alternate model |
p0 |
number of parameters in null model |
a named vector containing the first and second degrees of freedom ("df1" and "df2", respectively)
Check if a matrix is positive definite
check_posdef(M)
check_posdef(M)
M |
numeric matrix |
check if a matrix is 1) square, 2) symmetric, and 3) positive definite
returns a named logical vector with the following elements:
logical: indicating whether M
is square
logical: indicating whether M
is symmetric
logical: indicating whether M
is positive-definitive
# distance matrix M = distm_scaled(expand.grid(x = 1:3, y = 1:3)) # check if it is positive definitive check_posdef(M) # check if the covariance matrix is positive definitive check_posdef(covar_exp(M, .1)) # non-symmetric matrix check_posdef(matrix(1:9, 3, 3)) # non-square matrix check_posdef(matrix(1:6, 3, 2))
# distance matrix M = distm_scaled(expand.grid(x = 1:3, y = 1:3)) # check if it is positive definitive check_posdef(M) # check if the covariance matrix is positive definitive check_posdef(covar_exp(M, .1)) # non-symmetric matrix check_posdef(matrix(1:9, 3, 3)) # non-square matrix check_posdef(matrix(1:6, 3, 2))
generic S3 method for a chi-squared test
chisqr(x, ...)
chisqr(x, ...)
x |
object on which to conduct the test |
... |
additional arguments |
results of the chi-squared test (generic)
Conduct a correlated chi-square test on a partitioned GLS
## S3 method for class 'partGLS' chisqr(x, ...)
## S3 method for class 'partGLS' chisqr(x, ...)
x |
"remoteGLS" object |
... |
additional arguments passed to print |
a p-value for the correlated chisqr test
Tapered-spherical distance-based covariance function
Exponential distance-based covariance function
Exponential-power distance-based covariance function
covar_taper(d, theta, cor = NULL) covar_exp(d, range) covar_exppow(d, range, shape)
covar_taper(d, theta, cor = NULL) covar_exp(d, range) covar_exppow(d, range, shape)
d |
a numeric vector or matrix of distances |
theta |
distance beyond which covariances are forced to 0. |
cor |
optional correlation parameter. If included, the covariance is
subtracted from |
range |
range parameter |
shape |
shape parameter |
covar_taper
calculates covariance v as follows:
if d <= theta
, then v = ((1 - (d/theta))^2) * (1 + (d/(2 * theta)))
if d > theta
, then v = 0
covar_exp
calculates covariance v as follows:
v = exp(-d/range)
covar_exppow
calculates covariance v as follows:
v = exp(-(d/range)^2)
Note that covar_exppow(..., shape = 1)
is equivalent to
covar_exp()
but is needed as a separate function for use with fitCor
.
a tapered-spherical transformation of d is returned.
the exponential covariance (v)
exponential-power covariance (v)
# simulate dummy data map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix # calculate distance D = geosphere::distm(coords) # distance matrix # visualize covariance matrix image(covar_taper(D, theta = .5*max(D))) # plot tapered covariance function curve(covar_taper(x, theta = .5), from = 0, to = 1);abline(v = 0.5, lty = 2, col = "grey80") # visualize covariance matrix image(covar_exp(D, range = .2*max(D))) # plot exponential function with different ranges curve(covar_exp(x, range = .2), from = 0, to = 1) curve(covar_exp(x, range = .1), from = 0, to = 1, col = "blue", add = TRUE) legend("topright", legend = c("range = 0.2", "range = 0.1"), col = c("black", "blue"), lty = 1) # visualize Exponential covariance matrix image(covar_exppow(D, range = .2*max(D), shape = 1)) # visualize Exponential-power covariance matrix image(covar_exppow(D, range = .2*max(D), shape = .5)) # plot exponential power function with different shapes curve(covar_exppow(x, range = .2, shape = 1), from = 0, to = 1) curve(covar_exppow(x, range = .2, shape = .5), from = 0, to = 1, col = "blue", add = TRUE) legend("topright", legend = c("shape = 1.0", "shape = 0.5"), col = c("black", "blue"), lty = 1)
# simulate dummy data map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix # calculate distance D = geosphere::distm(coords) # distance matrix # visualize covariance matrix image(covar_taper(D, theta = .5*max(D))) # plot tapered covariance function curve(covar_taper(x, theta = .5), from = 0, to = 1);abline(v = 0.5, lty = 2, col = "grey80") # visualize covariance matrix image(covar_exp(D, range = .2*max(D))) # plot exponential function with different ranges curve(covar_exp(x, range = .2), from = 0, to = 1) curve(covar_exp(x, range = .1), from = 0, to = 1, col = "blue", add = TRUE) legend("topright", legend = c("range = 0.2", "range = 0.1"), col = c("black", "blue"), lty = 1) # visualize Exponential covariance matrix image(covar_exppow(D, range = .2*max(D), shape = 1)) # visualize Exponential-power covariance matrix image(covar_exppow(D, range = .2*max(D), shape = .5)) # plot exponential power function with different shapes curve(covar_exppow(x, range = .2, shape = 1), from = 0, to = 1) curve(covar_exppow(x, range = .2, shape = .5), from = 0, to = 1, col = "blue", add = TRUE) legend("topright", legend = c("shape = 1.0", "shape = 0.5"), col = c("black", "blue"), lty = 1)
Calculate cross-partition statistics between two GLS partitions
crosspart_GLS( xxi, xxj, xxi0, xxj0, invChol_i, invChol_j, Vsub, nug_i, nug_j, df1, df2, small = TRUE, ncores = NA )
crosspart_GLS( xxi, xxj, xxi0, xxj0, invChol_i, invChol_j, Vsub, nug_i, nug_j, df1, df2, small = TRUE, ncores = NA )
xxi |
numeric matrix xx from partition i |
xxj |
numeric matrix xx from partition j |
xxi0 |
numeric matrix xx0 from partition i |
xxj0 |
numeric matrix xx0 from partition j |
invChol_i |
numeric matrix invcholV from partition i |
invChol_j |
numeric matrix invcholV from partition j |
Vsub |
numeric variance matrix for Xij (upper block) |
nug_i |
nugget from partition i |
nug_j |
nugget from partition j |
df1 |
first degree of freedom |
df2 |
second degree of freedom |
small |
logical: if |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
crosspart_GLS
returns a list of cross-partition statistics.
If small = FALSE
, the list contains the following elements
If small = FALSE
, the list only contains the necessary elements
rcoefij
, rSSRij
, and rSSEij
.
Other partitionedGLS:
MC_GLSpart()
,
sample_partitions()
Calculate the distances among points from a single coordinate matrix or
distm_km(coords, coords2 = NULL) distm_scaled(coords, coords2 = NULL, distm_FUN = "distm_km")
distm_km(coords, coords2 = NULL) distm_scaled(coords, coords2 = NULL, distm_FUN = "distm_km")
coords |
a coordinate matrix with 2 columns and rows corresponding to each location. |
coords2 |
an optional coordinate matrix |
distm_FUN |
function used to calculate the distance matrix. This function dictates the units of "max.dist" |
distm_km
is simply a wrapper for geosphere::distm()
distm_km
returns a distance matrix in km
A distance matrix is returned.
If coords2 = NULL
, then distances among points in coords
are
calculated. Otherwise, distances are calculated between points in coords
and coords2
distm_km
returns a distance matrix in km and distm_scaled
returns
relative distances (between 0 and 1). The resulting matrix has the attribute
"max.dist" which stores the maximum distance of the map. "max.dist" is in
km for distm_km
and in the units of distm_FUN
for distm_scaled
.
?geosphere::distm()
map.width = 3 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix distm_scaled(coords) # calculate relative distance matrix
map.width = 3 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix distm_scaled(coords) # calculate relative distance matrix
fitAR
is used to fit AR(1) time series regression
analysis using restricted maximum likelihood
fitAR(formula, data = NULL) AR_fun(par, y, X, logLik.only = TRUE)
fitAR(formula, data = NULL) AR_fun(par, y, X, logLik.only = TRUE)
formula |
a model formula, as used by |
data |
optional data environment to search for variables in |
par |
AR parameter value |
y |
vector of time series (response) |
X |
model matrix (predictors) |
logLik.only |
logical: should only the partial log-likelihood be computed |
This function finds the restricted maximum likelihood (REML) to estimate parameters for the regression model with AR(1) random error terms
where is the response at time
;
is a model matrix containing covariates;
is a vector of effects of
;
is the autocorrelated random error;
is a temporally independent
Gaussian random variable with mean zero and standard deviation
;
and is the AR(1) autoregression parameter
fitAR
estimates the parameter via mathematical optimization
of the restricted log-likelihood function.
AR_fun
is the work horse behind fitAR
that is called
by optim
to estimate the autoregression parameter .
fitAR
returns a list object of class "remoteTS", which contains
the following elements.
the function call
a named vector of coefficients
the standard errors of parameter estimates
the t-statistics for coefficients
the p-values corresponding to t-tests of coefficients
the model mean squared error
the log-likelihood of the model fit
the residuals: response minus fitted values
the fitted mean values
The AR parameter, determined via REML
the numeric rank of the fitted model
the residual degrees of freedom
the stats::terms
object used
Output is structured similarly to an "lm" object.
When logLik.only == F
, AR_fun
returns the output described in
?fitAR
. When logLik.only == T
, it returns a quantity that is
linearly and negatively related to the restricted log likelihood
(i.e., partial log-likelihood).
Ives, A. R., K. C. Abbott, and N. L. Ziebarth. 2010. Analysis of ecological
time series with ARMA(p,q) models. Ecology 91:858-871.
fitAR_map
to easily apply fit_AR
to many pixels;
fitCLS
and fitCLS_map
for conditional least squares
time series analyses.
Other remoteTS:
fitAR_map()
,
fitCLS_map()
,
fitCLS()
Other remoteTS:
fitAR_map()
,
fitCLS_map()
,
fitCLS()
# simulate dummy data t = 1:30 # times series Z = rnorm(30) # random independent variable x = .2*Z + (.05*t) # generate dependent effects x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation # fit the AR model, using Z as a covariate (AR = fitAR(x ~ Z)) # get specific components AR$residuals AR$coefficients AR$pval # now using time as a covariate (AR.time <- fitAR(x ~ t)) # source variable from a dataframe df = data.frame(y = x, t.scaled = t/30, Z = Z) fitAR(y ~ t.scaled + Z, data = df) ## Methods summary(AR) residuals(AR) coefficients(AR)
# simulate dummy data t = 1:30 # times series Z = rnorm(30) # random independent variable x = .2*Z + (.05*t) # generate dependent effects x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation # fit the AR model, using Z as a covariate (AR = fitAR(x ~ Z)) # get specific components AR$residuals AR$coefficients AR$pval # now using time as a covariate (AR.time <- fitAR(x ~ t)) # source variable from a dataframe df = data.frame(y = x, t.scaled = t/30, Z = Z) fitAR(y ~ t.scaled + Z, data = df) ## Methods summary(AR) residuals(AR) coefficients(AR)
fitAR_map
is used to fit AR REML regression to each spatial
location (pixel) within spatiotemporal data.
fitAR_map( Y, coords, formula = "y ~ t", X.list = list(t = 1:ncol(Y)), resids.only = FALSE )
fitAR_map( Y, coords, formula = "y ~ t", X.list = list(t = 1:ncol(Y)), resids.only = FALSE )
Y |
a spatiotemporal response variable: a numeric matrix or data frame where columns correspond to time points and rows correspond to pixels. |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
formula |
a model formula, passed to |
X.list |
a named list of temporal or spatiotemporal predictor variables:
elements must be either numeric vectors with one element for each time point
or a matrix/data frame with rows corresponding to pixels and columns
corresponding to time point. These elements must be named and referred to
in |
resids.only |
logical: should output beyond coordinates and residuals be
withheld? Useful when passing output to |
fitAR_map
is a wrapper function that applies fitAR
to
many pixels.
The function loops through the rows of Y
, matched with rows of
spatiotemporal predictor matrices. Purely temporal predictors, given by
vectors, are used for all pixels. These predictor variables, given by the
right side of formula
are sourced from named elements in X.list
.
fitCLS_map
returns a list object of class "mapTS".
The output will always contain at least these elements:
the function call
the coordinate matrix or dataframe
time series residuals: rows correspond to pixels
(coords
)
When resids.only = FALSE
, the output will also contain the following
components. Matrices have rows that correspond to pixels and columns that
correspond to time points and vector elements correspond to pixels.
a numeric matrix of coefficeints
a numeric matrix of coefficient standard errors
a numeric matrix of t-statistics for coefficients
a numeric matrix of p-values for coefficients t-tests
a vector of rho values for each pixel
a numeric vector of MSEs
a numeric vector of log-likelihoods
a numeric matrix of fitted values
An attribute called "resids.only" is also set to match the value of
resids.only
fitAR
for fitting AR REML to individual time series and fitCLS
& fitCLS_map
for time series analysis based on conditional least squares.
Other remoteTS:
fitAR()
,
fitCLS_map()
,
fitCLS()
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } # visualize dummy data (NOT RUN) library(ggplot2);library(dplyr) data.frame(coords, X) %>% reshape2::melt(id.vars = c("x", "y")) %>% ggplot(aes(x = x, y = y, fill = value)) + geom_tile() + facet_wrap(~variable) # fit AR, showing all output fitAR_map(X, coords, formula = y ~ t, resids.only = TRUE) # fit AR with temporal and spatiotemporal predictors (AR.map <- fitAR_map(X, coords, formula = y ~ t + Z, X.list = list(t = 1:ncol(X), Z = Z), resids.only = FALSE)) ## extract some values AR.map$coefficients # coefficients AR.map$logLik # log-likelihoods ## Methods summary(AR.map) residuals(AR.map) coefficients(AR.map)
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } # visualize dummy data (NOT RUN) library(ggplot2);library(dplyr) data.frame(coords, X) %>% reshape2::melt(id.vars = c("x", "y")) %>% ggplot(aes(x = x, y = y, fill = value)) + geom_tile() + facet_wrap(~variable) # fit AR, showing all output fitAR_map(X, coords, formula = y ~ t, resids.only = TRUE) # fit AR with temporal and spatiotemporal predictors (AR.map <- fitAR_map(X, coords, formula = y ~ t + Z, X.list = list(t = 1:ncol(X), Z = Z), resids.only = FALSE)) ## extract some values AR.map$coefficients # coefficients AR.map$logLik # log-likelihoods ## Methods summary(AR.map) residuals(AR.map) coefficients(AR.map)
fitCLS
is used to fit conditional least squares regression
to time series data.
fitCLS( formula, data = NULL, lag.y = 1, lag.x = 1, debug = FALSE, model = FALSE, y = FALSE )
fitCLS( formula, data = NULL, lag.y = 1, lag.x = 1, debug = FALSE, model = FALSE, y = FALSE )
formula |
a model formula, as used by |
data |
optional data environment to search for variables in |
lag.y |
an integer indicating the lag (in time steps) between y and y.0 |
lag.x |
an integer indicating the lag (in time steps) between y and the independent variables (except y.0). |
debug |
logical debug mode |
model |
logical, should the used model matrix be returned? As used by
|
y |
logical, should the used response variable be returned? As used by
|
This function regresses the response variable (y) at time t, conditional on the
response at time t-lag.y
and the specified dependent variables (X) at
time t-lag.x
:
where is the response at time
;
is a model matrix containing covariates;
is a vector of effects of
;
and is a temporally independent Gaussian random
variable with mean zero and standard deviation
stats::lm()
is then called, using the above equation.
fitCLS
returns a list object of class "remoteTS", which
inherits from "lm". In addition to the default "lm" components, the output
contains these additional list elements:
the t-statistics for coefficients
the p-values corresponding to t-tests of coefficients
the model mean squared error
the log-likelihood of the model fit
fitCLS_map
to easily apply fitCLS
to many pixels;
fitAR
and fitAR_map
for AR time series analyses.
Other remoteTS:
fitAR_map()
,
fitAR()
,
fitCLS_map()
# simulate dummy data t = 1:30 # times series Z = rnorm(30) # random independent variable x = .2*Z + (.05*t) # generate dependent effects x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation x = x + rnorm(30, 0, .01) df = data.frame(x, t, Z) # collect in data frame # fit a CLS model with previous x, t, and Z as predictors ## note, this model does not follow the underlying process. ### See below for a better fit. (CLS <- fitCLS(x ~ t + Z, data = df)) # extract other values CLS$MSE #MSE CLS$logLik #log-likelihood # fit with no lag in independent variables (as simulated): (CLS2 <- fitCLS(x ~ t + Z, df, lag.x = 0)) summary(CLS2) # no lag in x fitCLS(x ~ t + Z, df, lag.y = 0) # visualize the lag ## large lag in x fitCLS(x ~ t + Z, df, lag.y = 2, lag.x = 0, debug = TRUE)$lag ## large lag in Z fitCLS(x ~ t + Z, df, lag.y = 0, lag.x = 2, debug = TRUE)$lag # # throws errors (NOT RUN) # fitCLS(x ~ t + Z, df, lag.y = 28) # longer lag than time # fitCLS(cbind(x, rnorm(30)) ~ t + Z, df) # matrix response ## Methods summary(CLS) residuals(CLS)
# simulate dummy data t = 1:30 # times series Z = rnorm(30) # random independent variable x = .2*Z + (.05*t) # generate dependent effects x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation x = x + rnorm(30, 0, .01) df = data.frame(x, t, Z) # collect in data frame # fit a CLS model with previous x, t, and Z as predictors ## note, this model does not follow the underlying process. ### See below for a better fit. (CLS <- fitCLS(x ~ t + Z, data = df)) # extract other values CLS$MSE #MSE CLS$logLik #log-likelihood # fit with no lag in independent variables (as simulated): (CLS2 <- fitCLS(x ~ t + Z, df, lag.x = 0)) summary(CLS2) # no lag in x fitCLS(x ~ t + Z, df, lag.y = 0) # visualize the lag ## large lag in x fitCLS(x ~ t + Z, df, lag.y = 2, lag.x = 0, debug = TRUE)$lag ## large lag in Z fitCLS(x ~ t + Z, df, lag.y = 0, lag.x = 2, debug = TRUE)$lag # # throws errors (NOT RUN) # fitCLS(x ~ t + Z, df, lag.y = 28) # longer lag than time # fitCLS(cbind(x, rnorm(30)) ~ t + Z, df) # matrix response ## Methods summary(CLS) residuals(CLS)
fitCLS_map
is used to fit conditional least squares
regression to each spatial location (pixel) within spatiotemporal data.
fitCLS_map( Y, coords, formula = "y ~ t", X.list = list(t = 1:ncol(Y)), lag.y = 1, lag.x = 0, resids.only = FALSE )
fitCLS_map( Y, coords, formula = "y ~ t", X.list = list(t = 1:ncol(Y)), lag.y = 1, lag.x = 0, resids.only = FALSE )
Y |
a spatiotemporal response variable: a numeric matrix or data frame where columns correspond to time points and rows correspond to pixels. |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
formula |
a model formula, passed to |
X.list |
a named list of temporal or spatiotemporal predictor variables:
elements must be either numeric vectors with one element for each time point
or a matrix/data frame with rows corresponding to pixels and columns
corresponding to time point. These elements must be named and referred to
in |
lag.y |
the lag between y and y.0, passed to |
lag.x |
the lag between y and predictor variables, passed to
|
resids.only |
logical: should output beyond coordinates and residuals be
withheld? Useful when passing output to |
fitCLS_map
is a wrapper function that applies
fitCLS()
to many pixels.
The function loops through the rows of Y
, matched with rows of
spatiotemporal predictor matrices. Purely temporal predictors, given by
vectors, are used for all pixels. These predictor variables, given by the
right side of formula
are sourced from named elements in X.list
.
fitCLS_map
returns a list object of class "mapTS".
The output will always contain at least these elements:
the function call
the coordinate matrix or dataframe
time series residuals: rows correspond to pixels (coords
)
When resids.only = FALSE
, the output will also contain the following
components. Matrices have rows that correspond to pixels and columns that
correspond to time points and vector elements correspond to pixels.
a numeric matrix of coefficeints
a numeric matrix of coefficient standard errors
a numeric matrix of t-statistics for coefficients
a numeric matrix of p-values for coefficients t-tests
a numeric vector of MSEs
a numeric vector of log-likelihoods
a numeric matrix of fitted values
An attribute called "resids.only" is also set to match the value of
resids.only
fitCLS
for fitting CLS on individual time series and
fitAR
and fitAR_map
for AR REML time series analysis.
Other remoteTS:
fitAR_map()
,
fitAR()
,
fitCLS()
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } # # visualize dummy data (NOT RUN) # library(ggplot2);library(dplyr) # data.frame(coords, X) %>% # reshape2::melt(id.vars = c("x", "y")) %>% # ggplot(aes(x = x, y = y, fill = value)) + # geom_tile() + # facet_wrap(~variable) # fit CLS, showing all output fitCLS_map(X, coords, formula = y ~ t, resids.only = TRUE) # fit CLS with temporal and spatiotemporal predictors (CLS.map <- fitCLS_map(X, coords, formula = y ~ t + Z, X.list = list(t = 1:ncol(X), Z = Z), resids.only = FALSE)) ## extract some values CLS.map$coefficients # coefficients CLS.map$logLik # log-likelihoods ## Methods summary(CLS.map) residuals(CLS.map) coefficients(CLS.map)
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } # # visualize dummy data (NOT RUN) # library(ggplot2);library(dplyr) # data.frame(coords, X) %>% # reshape2::melt(id.vars = c("x", "y")) %>% # ggplot(aes(x = x, y = y, fill = value)) + # geom_tile() + # facet_wrap(~variable) # fit CLS, showing all output fitCLS_map(X, coords, formula = y ~ t, resids.only = TRUE) # fit CLS with temporal and spatiotemporal predictors (CLS.map <- fitCLS_map(X, coords, formula = y ~ t + Z, X.list = list(t = 1:ncol(X), Z = Z), resids.only = FALSE)) ## extract some values CLS.map$coefficients # coefficients CLS.map$logLik # log-likelihoods ## Methods summary(CLS.map) residuals(CLS.map) coefficients(CLS.map)
fitCor()
estimates parameter values of a distance-based
variance function from the pixel-wise correlations among time series residuals.
fitCor( resids, coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", start = list(r = 0.1), fit.n = 1000, index, save_mod = TRUE, ... )
fitCor( resids, coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", start = list(r = 0.1), fit.n = 1000, index, save_mod = TRUE, ... )
resids |
a matrix of time series residuals, with rows corresponding to pixels and columns to time points |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
distm_FUN |
a function to calculate a distance matrix from |
covar_FUN |
a function to estimate distance-based covariances |
start |
a named list of starting parameter values for |
fit.n |
an integer indicating how many pixels should be used to estimate parameters. |
index |
an optional index of pixels to use for parameter estimation |
save_mod |
logical: should the nls model be saved in the output? |
... |
additional arguments passed to |
For accurate results, resids
and coords
must be paired matrices.
Rows of both matrices should correspond to the same pixels.
Distances between sapmled pixels are calculated with the function specified by
distm_FUN
. This function can be any that takes a coordinate
matrix as input and returns a distance matrix between points. Some options
provided by remotePARTS
are distm_km()
, which returns distances
in kilometers and distm_scaled()
, which returns distances scaled between
0 and 1.
covar_FUN
can be any function that takes a vector of distances as its
first argument, and at least one parameter as additional arguments. remotePARTS
provides three suitable functions: covar_exp
, covar_exppow
, and
covar_taper
; but user-defined functions are also possible.
Parameters are estimated with stats::nls()
by regressing correlations
among time series residuals on a function of distances specified by covar_FUN
.
start
is used by nls
to determine how many parameters need
estimating, and starting values for those parameters. As such, it is important
that start
has named elements for each parameter in covar_FUN
.
The fit will be performed for all pixels specified in index
, if provided.
Otherwise, a random sample of length fit.n
is used. If fit.n
exceeds the number of pixels, all pixels are used. When random pixels are used,
parameter estimates will be different for each call of the function. For reproducible
results, we recommend taking a random sample of pixels manually and passing in
those values as index
.
Caution: Note that a distance matrix, of size must be fit to the
sampled data where
is either
fit.n
or length(index)
.
Take your computer's memory and processing time into consideration when choosing
this size.
Parameter estimates are always returned in the same scale of distances
calculated by distm_FUN
. It is very important that these estimates
are re-scaled by users if output of distm_FUN
use units different from
the desired scale. For example,
if the function covar_FUN = function(d, r, a){-(d/r)^a}
is used
with distm_FUN = "distm_scaled"
, the estimated range parameter r
will be based on a unit-map. Users will likely want to re-scaled it to map
units by multiplying r
by the maximum distance among points on your map.
If the distm_FUN
is on the scale of your map (e.g., "distm_km"),
re-scaling is not needed but may be preferable, since it is scaled to the
maximum distance among the sampled data rather than the true maximum
distance. For example, dividing the range parameter by max.distance
and then multiplying it by the true max distance may provide a better range
estimate.
fitCor
returns a list object of class "remoteCor", which contains
these elements:
the function call
the nls
fit object, if save_mod=TRUE
a vector of the estimated spatial correlation parameters
the maximum distance among the sampled pixels, as calculated by dist_FUN
.
the log-likelihood of the fit
# simulate dummy data set.seed(19) time.points = 30 # time series length map.width = 8 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor ## setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } AR.map = fitAR_map(X, coords, formula = y ~ Z, X.list = list(Z = Z), resids.only = FALSE) # using pre-defined covariance function ## exponential covariance fitCor(AR.map$residuals, coords, covar_FUN = "covar_exp", start = list(range = .1)) ## exponential-power covariance fitCor(AR.map$residuals, coords, covar_FUN = "covar_exppow", start = list(range = .1, shape = .2)) # user-specified covariance function fitCor(AR.map$residuals, coords, covar_FUN = function(d, r){d^r}, start = list(r = .1)) # un-scaled distances: fitCor(AR.map$residuals, coords, distm_FUN = "distm_km", start = list(r = 106)) # specify which pixels to use, for reproducibility fitCor(AR.map$residuals, coords, index = 1:64)$spcor #all fitCor(AR.map$residuals, coords, index = 1:20)$spcor #first 20 fitCor(AR.map$residuals, coords, index = 21:64)$spcor # last 43 # randomly select pixels fitCor(AR.map$residuals, coords, fit.n = 20)$spcor #random 20 fitCor(AR.map$residuals, coords, fit.n = 20)$spcor # different random 20
# simulate dummy data set.seed(19) time.points = 30 # time series length map.width = 8 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor ## setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } AR.map = fitAR_map(X, coords, formula = y ~ Z, X.list = list(Z = Z), resids.only = FALSE) # using pre-defined covariance function ## exponential covariance fitCor(AR.map$residuals, coords, covar_FUN = "covar_exp", start = list(range = .1)) ## exponential-power covariance fitCor(AR.map$residuals, coords, covar_FUN = "covar_exppow", start = list(range = .1, shape = .2)) # user-specified covariance function fitCor(AR.map$residuals, coords, covar_FUN = function(d, r){d^r}, start = list(r = .1)) # un-scaled distances: fitCor(AR.map$residuals, coords, distm_FUN = "distm_km", start = list(r = 106)) # specify which pixels to use, for reproducibility fitCor(AR.map$residuals, coords, index = 1:64)$spcor #all fitCor(AR.map$residuals, coords, index = 1:20)$spcor #first 20 fitCor(AR.map$residuals, coords, index = 21:64)$spcor # last 43 # randomly select pixels fitCor(AR.map$residuals, coords, fit.n = 20)$spcor #random 20 fitCor(AR.map$residuals, coords, fit.n = 20)$spcor # different random 20
Fit a PARTS GLS model.
fitGLS( formula, data, V, nugget = 0, formula0 = NULL, save.xx = FALSE, save.invchol = FALSE, logLik.only = FALSE, no.F = FALSE, coords, distm_FUN, covar_FUN, covar.pars, invCholV, ncores = NA, suppress_compare_warning = FALSE, ... )
fitGLS( formula, data, V, nugget = 0, formula0 = NULL, save.xx = FALSE, save.invchol = FALSE, logLik.only = FALSE, no.F = FALSE, coords, distm_FUN, covar_FUN, covar.pars, invCholV, ncores = NA, suppress_compare_warning = FALSE, ... )
formula |
a model formula |
data |
an optional data frame environment in which to search for
variables given by |
V |
a covariance matrix, which must be positive definitive. This argument
is optional if |
nugget |
an optional numeric nugget, must be positive |
formula0 |
an optional formula for the null model to be compared with
|
save.xx |
logical: should information needed for cross-partition comparisons be returned? |
save.invchol |
logical: should the inverse of the Cholesky matrix be returned? |
logLik.only |
logical: should calculations stop after calculating parital log-likelihood? |
no.F |
logical: should F-test calculations be made? |
coords |
optional coordinate matrix for calculating |
distm_FUN |
optional function for calculating a distance matrix from
|
covar_FUN |
optional distance-based covariance function for calculating
|
covar.pars |
an optional named list of parameters passed to |
invCholV |
optional pre-calculated inverse cholesky matrix to use in place
of |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
suppress_compare_warning |
an optional variable to suppress warning that
arises from identical |
... |
additional arguments passed to |
conduct generalized least-squares regression of spatiotemporal trends
fitGLS
fits a GLS model, using terms specified in formula
.
In the PARTS method, generally the left side of formula
should be
pixel-level trend estimates and the right side should be spatial predictors.
The errors of the GLS are correlated according to covariance matrix V
.
If nugget = NA
, an ML nugget is estimated from the data using the
optimize_nugget()
function. Arguments additional arguments (...
)
are passed to optimize_nugget
in this case. V
must be provided
for nugget optimization.
If formula0
is not specified, the default is to fit an intercept-only
null model.
save.xx
is included to allow for manually conducting a partitioned
GLS analyses. Because most users will not need this feature, opting instead
to use fitGLS_parition()
, save.xx = FALSE
by default.
Similarly, save.invchol
is included to allow for recycling of the
inverse cholesky matrix. Often, inverting the large cholesky matrix
(i.e., invert_chol(V)
) is the slowest part of GLS. This argument exists
to allow users to recycle this process, though no remotePARTS
function
currently exists that can use invert_chol(V)
to fit the GLS.
logLik.only = TRUE
will return only the partial log-likelihood, which can
minimized to obtain the maximum likelihood for a given set of data.
If no.F = TRUE
, then the model given by formula
is not compared
to the model given by formula0
.
If V
is not provided, it can be fit internally by specifying all of
coords
, distm_FUN
, covar_FUN
, and covar.pars
.
The function given by distm_FUN
will calculate a distance matrix from
coords
, which is then transformed into a distance-based covariance
matrix with covar_FUN
and parameters given by covar.pars
.
This function uses C++ code that uses the Eigen matrix library (RcppEigen package) to fit models as efficiently as possible. As such, all available CPU cores are used for matrix calculations on systems with OpenMP support.
ncores
is passed to the C++ code Eigen::setNpThreads() which sets
the number of cores used for compatible Eigen matrix operations (when OpenMP
is used).
fitGLS
returns a list object of class "remoteGLS", if
logLik.only = FALSE
. The list contains at least the following elements:
coefficient estimates for predictor variables
sum of squares error
mean squared error
standard errors
degrees of freedom for the t-test
log-determinant of V
t-test statistic
p-value of the t-statistic
the Log-likelihood of the model
the nugget used in fitting
the covariance matrix of the coefficients
If no.F = FALSE
, the following elements, corresponding to the null
model and F-test are also calculated:
coefficient estimates for the null model
sum of squares error for the null model
mean squared error for the null model
the standard errors for null coefficients
the regression mean square
the null model F-test degrees of freedom
the log-likelihood of the null model
the F-test degrees of freedom, for the main model
the F-statistic
the F-test p-value
the alternate formula used
the null formula used
An attribute called also set to "no.F"
is set to the value of
argument no.F
, which signals to generic methods how to handle the output.
If save.invchol = TRUE
, output also includes
the inverse of the Cholesky decomposition of the covariance
matrix obtained with invert_chol(V, nugget)
If save.xx = TRUE
, output also includes the following elements
the predictor variables X
, from the right side of formula
,
transformed by the inverse cholesky matrix: xx = invcholV %*% X
the predictor variables X0
, from the right side of formula0
,
transformed by the inverse cholesky matrix: xx0 = invcholV %*% X0
The primary use of xx
and xx0
is for use with fitGLS_partition()
.
If logLik.only = TRUE
, a single numeric output containing the
log-likelihood is returned.
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(200), ] # first 200 rows ## fit covariance matrix V = covar_exp(distm_scaled(cbind(df$lng, df$lat)), range = .01) ## run GLS (GLS = fitGLS(CLS_coef ~ 0 + land, data = df, V = V)) ## with F-test calculations to compare with the NULL model (GLS.F = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE)) ## find ML nugget fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE, nugget = NA) ## calculate V internally coords = cbind(df$lng, df$lat) fitGLS(CLS_coef ~ 0 + land, data = df, logLik.only = FALSE, coords = coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = list(range = .01)) ## use inverse cholesky fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invert_chol(V)) ## save inverse cholesky matrix invchol = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, save.invchol = TRUE)$invcholV ## re-use inverse cholesky instead of V fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invchol) ## Log-likelihood (fast) fitGLS(CLS_coef ~ 0 + land, data = df, V = V, logLik.only = TRUE)
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(200), ] # first 200 rows ## fit covariance matrix V = covar_exp(distm_scaled(cbind(df$lng, df$lat)), range = .01) ## run GLS (GLS = fitGLS(CLS_coef ~ 0 + land, data = df, V = V)) ## with F-test calculations to compare with the NULL model (GLS.F = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE)) ## find ML nugget fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE, nugget = NA) ## calculate V internally coords = cbind(df$lng, df$lat) fitGLS(CLS_coef ~ 0 + land, data = df, logLik.only = FALSE, coords = coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = list(range = .01)) ## use inverse cholesky fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invert_chol(V)) ## save inverse cholesky matrix invchol = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, save.invchol = TRUE)$invcholV ## re-use inverse cholesky instead of V fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invchol) ## Log-likelihood (fast) fitGLS(CLS_coef ~ 0 + land, data = df, V = V, logLik.only = TRUE)
Fit a PARTS GLS model, with maximum likelihood spatial parameters
fitGLS_opt( formula, data = NULL, coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", start = c(range = 0.01, nugget = 0), fixed = c(), opt.only = FALSE, formula0 = NULL, save.xx = FALSE, save.invchol = FALSE, no.F = TRUE, trans = list(), backtrans = list(), debug = TRUE, ncores = NA, ... )
fitGLS_opt( formula, data = NULL, coords, distm_FUN = "distm_scaled", covar_FUN = "covar_exp", start = c(range = 0.01, nugget = 0), fixed = c(), opt.only = FALSE, formula0 = NULL, save.xx = FALSE, save.invchol = FALSE, no.F = TRUE, trans = list(), backtrans = list(), debug = TRUE, ncores = NA, ... )
formula |
a model formula, passed to |
data |
an optional data frame environment in which to search for
variables given by |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
distm_FUN |
a function to calculate a distance matrix from |
covar_FUN |
a function to estimate distance-based covariances |
start |
a named vector of starting values for each parameter to be estimated;
names must match the names of arguments in |
fixed |
an optional named vector of fixed parameter values; names
must match the names of arguments in |
opt.only |
logical: if TRUE, execution will halt after estimating the parameters; a final GLS will not be fit with the estimated parameters |
formula0 , save.xx , save.invchol , no.F
|
arguments passed to |
trans |
optional list of functions for transforming the values in
|
backtrans |
optional list of functions for back-transforming parameters
to their correct scale (for use with |
debug |
logical: debug mode (for use with |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
... |
additional arguments passed to |
Estimate spatial parameters, via maximum likelihood, from data rather than from time series residuals; Fit a GLS with these specifications.
fitGLS_opt
fits a GLS by estimating spatial parameters from
data. fitCor
, combined with fitGLS(nugget = NA)
,
gives better estimates of spatial parameters, but time-series residuals may
not be available in all cases. In these cases, spatial parameters can be
estimated from distances among points and a response vector. Mathematical
optimization of the log likelihood of different GLS models are computed by
calling optim()
on fitGLS
.
Distances are calculated with distm_FUN
and a covariance matrix is
calculated from these distances with covar_FUN
. Arguments to to
covar_FUN
, except distances, are given by start
and fixed
.
Parameters specified in start
will be be estimated while those given
by fixed
will remain constant throughout fitting. Parameter names in
start
and fixed
should exactly match the names of arguments in
covar_FUN
and should not overlap (though, fixed
takes precedence).
In addition to arguments of covar_FUN
a "nugget" component can
also be occur in start
or fixed
. If "nugget" does not occur
in either vector, the GLS are fit with nugget = 0
. A zero nugget also
allows much faster computation, through recycling the common
inverse cholesky matrix in each GLS computation. A non-zero nugget requires
inversion of a different matrix at each iteration, which can be
substantially slower.
If opt.only = FALSE
, the estimated parameters are used to fit the final
maximum likelihood GLS solution with fitGLS()
and arguments
formula0
, save.xx
, save.invchol
, and no.F
.
Some parameter combinations may not produce valid covariance matrices. During
the optimization step messages about non-positive definitive V may result on
some iterations. These warnings are produced by fitGLS
and NA
log-likelihoods are returned in those cases.
Note that fitGLS_opt
fits multiple GLS models, which requires
inverting a large matrix for each one (unless a fixed 0 nugget is used).
This process is very computationally intensive and may take a long time to
finish depending upon your machine and the size of the data.
Sometimes optim
can have a difficult time finding a reasonable solution
and without any constraits on parameter space (with certain algorithms), results
may even be nonsensical. To combat this, fitGLS_opt
has the arguments
trans
and backtrans
which allow you to transform
(and back-transform) parameters to a different scale. For example, you may
want to force the 'range' parameter between 0 and 1. The logit function can
do just that, as its limits are -Inf and Inf as x approaches 0 and 1,
respectively. So, we can set trans
to the logit function:
trans = list(range = function(x)log(x/(1-x)))
. Then we need to set
backtrans
to the inverse logit function to return a parameter value
between 0 and 1: backtrans = list(range = function(x)1/(1+exp(-x)))
.
This will force the optimizer to only search for the range parameter in the
space from 0 to 1. Any other constraint function can be used for trans
provided that there is a matching back-transformation.
If opt.only = TRUE
, fitGLS_opt
returns the
output from stats::optim()
: see it's documentation for more details.
Otherwise, a list with two elements is returned:
output from optim
, as above
a "remoteGLS" object. See fitGLS
for more details.
fitCor
for estimating spatial parameters from time
series residuals; fitGLS
for fitting GLS and with the option
of estimating the maximum-likelihood nugget component only.
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(200), ] # first 200 rows ## estimate nugget and range (very slow) fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1, nugget = 0), opt.only = TRUE) ## estimate range only, fixed nugget at 0, and fit full GLS (slow) fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1), fixed = c("nugget" = 0), method = "Brent", lower = 0, upper = 1) ## constrain nugget to 0 and 1 logit <- function(p) {log(p / (1 - p))} inv_logit <- function(l) {1 / (1 + exp(-l))} fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1, nugget = 1e-10), trans = list(nugget = logit), backtrans = list(nugget = inv_logit), opt.only = TRUE)
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(200), ] # first 200 rows ## estimate nugget and range (very slow) fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1, nugget = 0), opt.only = TRUE) ## estimate range only, fixed nugget at 0, and fit full GLS (slow) fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1), fixed = c("nugget" = 0), method = "Brent", lower = 0, upper = 1) ## constrain nugget to 0 and 1 logit <- function(p) {log(p / (1 - p))} inv_logit <- function(l) {1 / (1 + exp(-l))} fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df, coords = df[, c("lng", "lat")], start = c(range = .1, nugget = 1e-10), trans = list(nugget = logit), backtrans = list(nugget = inv_logit), opt.only = TRUE)
Function that fitGLS_opt optimizes over
fitGLS_opt_FUN( op, fp, formula, data = NULL, coords, covar_FUN = "covar_exp", distm_FUN = "distm_scaled", is.trans = FALSE, backtrans = list(), ncores = NA )
fitGLS_opt_FUN( op, fp, formula, data = NULL, coords, covar_FUN = "covar_exp", distm_FUN = "distm_scaled", is.trans = FALSE, backtrans = list(), ncores = NA )
op |
a named vector of parameters to be optimized |
fp |
a named vector of fixed parameters |
formula |
GLS model formula |
data |
data source |
coords |
a coordinate matrix |
covar_FUN |
a covariance function |
distm_FUN |
a distm function |
is.trans |
logical: are any of the values in |
backtrans |
optional: a named list of functions used to backtransform any element
of |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
fitGLS_opt_FUN
returns the negative log likelihood of a GLS,
given the parameters in op
and fp
Invert the cholesky decomposition of V
invert_chol(M, nugget = 0, ncores = NA)
invert_chol(M, nugget = 0, ncores = NA)
M |
numeric (double), positive definite matrix |
nugget |
numeric (double) nugget to add to M |
ncores |
optional integer indicating how many cores to use during the inversion calculation |
Calculates the inverse of the Cholesky decomposition of M which should not be confused with the inverse of M *derived* from the Cholesky decomposition (i.e. 'chol2inv(M)').
numeric matrix: inverse of the Cholesky decomposition (lower triangle)
M <- crossprod(matrix(1:6, 3)) # without a nugget: invert_chol(M) # with a nugget: invert_chol(M, nugget = 0.2)
M <- crossprod(matrix(1:6, 3)) # without a nugget: invert_chol(M) # with a nugget: invert_chol(M, nugget = 0.2)
calculate maximum distance among a table of coordinates
max_dist(coords, dist_FUN = "distm_km")
max_dist(coords, dist_FUN = "distm_km")
coords |
the coordinate matrix (or dataframe) from which a maximum distance is desired. |
dist_FUN |
the distance function used to calculate distances |
First the outermost points are found by fitting a convex hull in
Euclidean space. Then, the distances between these outer points is calculated
with dist_FUN
, and the maximum of these distances is returned
This is a fast, simple way of determining the maximum distance.
The maximum distance between two points (units determined by
dist_FUN
)
fit a GLS model to a large data set by partitioning the data into smaller pieces (partitions) and processing these pieces individually and summarizing output across partitions to conduct hypothesis tests.
MC_GLSpart( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, ncores = parallel::detectCores(logical = FALSE) - 1, debug = FALSE, ... ) MCGLS_partsummary( MCpartGLS, covar.pars = c(range = 0.1), save.GLS = FALSE, partsize ) multicore_fitGLS_partition( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, ncores = parallel::detectCores(logical = FALSE) - 1, do.t.test = TRUE, do.chisqr.test = TRUE, debug = FALSE, ... ) fitGLS_partition( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, do.t.test = TRUE, do.chisqr.test = TRUE, progressbar = TRUE, debug = FALSE, ncores = NA, parallel = TRUE, ... ) part_data(index, formula, data, formula0 = NULL, coord.names = c("lng", "lat")) part_csv(index, formula, file, formula0 = NULL, coord.names = c("lng", "lat"))
MC_GLSpart( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, ncores = parallel::detectCores(logical = FALSE) - 1, debug = FALSE, ... ) MCGLS_partsummary( MCpartGLS, covar.pars = c(range = 0.1), save.GLS = FALSE, partsize ) multicore_fitGLS_partition( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, ncores = parallel::detectCores(logical = FALSE) - 1, do.t.test = TRUE, do.chisqr.test = TRUE, debug = FALSE, ... ) fitGLS_partition( formula, partmat, formula0 = NULL, part_FUN = "part_data", distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = c(range = 0.1), nugget = NA, ncross = 6, save.GLS = FALSE, do.t.test = TRUE, do.chisqr.test = TRUE, progressbar = TRUE, debug = FALSE, ncores = NA, parallel = TRUE, ... ) part_data(index, formula, data, formula0 = NULL, coord.names = c("lng", "lat")) part_csv(index, formula, file, formula0 = NULL, coord.names = c("lng", "lat"))
formula |
a formula for the GLS model |
partmat |
a numeric partition matrix, with values containing indices of locations. |
formula0 |
an optional formula for the null GLS model |
part_FUN |
a function to partition individual data. See details for more information about requirements for this function. |
distm_FUN |
a function to calculate distances from a coordinate matrix |
covar_FUN |
a function to calculate covariances from a distance matrix |
covar.pars |
a named list of parameters passed to |
nugget |
a numeric fixed nugget component: if NA, the nugget is estimated for each partition |
ncross |
an integer indicating the number of partitions used to calculate cross-partition statistics |
save.GLS |
logical: should full GLS output be saved for each partition? |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
debug |
logical debug mode |
... |
arguments passed to |
MCpartGLS |
object resulting from MC_partGLS() |
partsize |
number of locations per partition |
do.t.test |
logical: should a t-test of the GLS coefficients be conducted? |
do.chisqr.test |
logical: should a correlated chi-squared test of the model fit be conducted? |
progressbar |
logical: should progress be tracked with a progress bar? |
parallel |
logical: should all calculations be done in parallel? See details for more information |
index |
a vector of pixels with which to subset the data |
data |
a data frame |
coord.names |
a vector containing names of spatial coordinate variables (x and y, respectively) |
file |
a text string indicating the csv file from which to read data |
The function specified by part_FUN
is called internally to obtain
properly formatted subsets of the full data (i.e., partitions). Two functions
are provided in the remotePARTs
package for this purpose: part_data
and part_csv
. Both of these functions have required arguments that
must be specified through the call to fitGLS_partition
(via ...
).
Check each function's argument list and see "part_FUN
details" below
for more information.
partmat
is used to partition the data. partmat
must be a complete
matrix, without any missing or non-finite values. Columns of partmat
are
passed as the first argument part_FUN
to obtain data, which is then
passed to fitGLS
. Users are encouraged to use sample_partitions()
to obtain a valid partmat
.
The specific dimensions of partmat
can have a substantial effect on the
efficiency of fitGLS_partition
. For most systems, we do not recommend
fitting with partitions exceeding 3000 locations or pixels
(i.e., partmat(partsize = 3000, ...)
). Any larger, and the covariance
matrix inversions may become quite slow (or impossible for some machines).
It may help performance to use smaller even partitions of around 1000-2000
locations.
ncross
determines how many partitions are used to estimate cross-partition
statistics. All partitions, up to ncross
are compared with all others
in a pairwise fashion. There is no hard rule for setting mincross
. More
crosses will ensure convergence, but we believe that the default of 6
(10 total comparisons) should be sufficient for most moderate-sized maps
if 1500-3000 pixel partitions are used. This may require testing with each
individual dataset to determine at what point convergence occurs.
Covariance matrices for each partition are calculated with covar_FUN
from distances among points within the partition. Parameter values for
covar_FUN
are given by covar.pars
.
The distances among points are calculated with distm_FUN
.
distm_FUN
can be any function, modeled after geosphere::distm()
,
that satisfies both: 1) returns a distance matrix among points when a single
coordinate matrix is given as first argument; and 2) returns a matrix
containing distances between two coordinate matrices if given as the first and
second arguments.
If nugget = NA
, a ML nugget is obtained for each partition. Otherwise,
a fixed nugget is used for all partitions.
It is not required to use all partitions for cross-partition calculations, nor is it recommended to do so for most large data sets.
If progressbar = TRUE
a text progress bar shows the current status
of the calculations in the console.
a "MC_partGLS", which is a precursor to a "partGLS" object
a "partGLS" object
"partGLS" object
fitGLS_partition
returns a list object of class "partGLS" which
contains at least the following elements:
the function call
an optional list of "remoteGLS" objects, one for each partition
statistics calculated from each partition: see below for further details
statistics calculated from each pair of crossed partitions,
determined by ncross
: see below for further details
summary statistics of the overall model: see below for further details
part
is a sub-list containing the following elements
a numeric matrix of GLS coefficients for each partition
a numeric matrix of coefficient standard errors
a numeric matrix of coefficient t-statstitics
a numeric matrix of t-test pvalues
a numeric vector of nuggets for each partition
covar.pars
input vector
a numeric matrix with rows corresponding to partitions and
columns corresponding to log-likelihoods (logLik
),
sum of square error (SSE
), mean-squared error (MSE
),
regression mean-square (MSR
), F-statistics (Fstat
),
and p-values from F-tests (pval_F
)
cross
is a sub-list containing the following elements, which are use
to calculate the combined (across partitions) standard errors of the coefficient
estimates and statistical tests. See Ives et al. (2022).
a numeric matrix of cross-partition correlations in the estimates of the coefficients
a numeric vector of cross-partition correlations in the regression sum of squares
a numeric vector of cross-partition correlations in the sum of squared errors
and overall
is a sub-list containing the elements
a numeric vector of the average coefficient estimates across all partitions
a numeric vector of the average cross-partition coefficient from across all crosses
the average cross-partition correlation in the regression sum of squares
the average cross-partition correlation in the sum of squared errors
the average f-statistic across partitions
degrees of freedom to be used with partitioned GLS f-test
dimensions of partmat
if chisqr.test = TRUE
, a p-value for the correlated
chi-squared test
if do.t.test = TRUE
, a table with t-test results, including
the coefficient estimates, standard errors, t-statistics, and p-values
part_data
and part_csv
both return a list with two elements:
a dataframe, containing the data subset
a coordinate matrix for the subset
In order to be efficient and account for different user situations, parallel
processing is available natively in fitGLS_partition
. There are a few
different specifications that will result in different behavior:
When parallel = TRUE
and ncores > 1
, all calculations are done
completely in parallel (via multicore_fitGLS_partition()
).
In this case, parallelization is implemented with the
parallel
, doParallel
, and foreach
packages. In this version,
all matrix operations are serialized on each worker but multiple operations
can occur simultaneously..
When parallel = FALSE
and ncores > 1
, then most calculations
are done on a single core but matrix opperations use multiple cores. In this
case, ncores
is passed to fitGLS. In this option, it is suggested
to not exceed the number of physical cores (not threads).
When ncores <= 1
, then the calculations are completely serialized
When ncores = NA
(the default), only one core is used.
In the parallel implementation of this function, a progress bar is not possible,
so progressbar
is ignored.
part_FUN
detailspart_FUN
can be any function that satisfies the following criteria
1. the first argument of part_FUN
must accept an index of pixels by which
to subset the data;
2. part_FUN
must also accept formula
and formula0
from
fitGLS_partition
; and
3. the output of part_FUN
must be a list with at least the
following elements, which are passed to fitGLS
;
a data frame containing all variables given by formula
.
Rows should correspond to pixels specified by the first argument
a coordinate matrix or data frame. Rows should correspond to pixels specified by the first argument
Two functions that satisfy these criteria are provided by remotePARTS
:
part_data
and part_csv
.
part_data
uses an in-memory data frame (data
)
as a data source. part_csv
, instead reads data from a
csv file (file
), one partition at a time, for efficient memory usage.
part_csv
internally calls sqldf::read.csv.sql()
for fast and
efficient row extraction.
Both functions use index
to subset rows of data and formula
and
formula0
(optional) to determine which variables to select.
Both functions also use coord.names
to indicate which variables contain
spatial coordinates. The name of the x-coordinate column should always preceed
the y-coordinate column: c("x", "y")
.
Users are encouraged to write their own part_FUN
functions to meet their
needs. For example, one might be interested in using data stored in a raster
stack or any other file type. In this case, a user-defined part_FUN
function allows access to fitGLS_partition
without saving reformatted
copies of data.
Ives, A. R., L. Zhu, F. Wang, J. Zhu, C. J. Morrow, and V. C. Radeloff. in review. Statistical tests for non-independent partitions of large autocorrelated datasets. MethodsX.
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(1000), ] # first 1000 rows ## create partition matrix pm = sample_partitions(nrow(df), npart = 3) ## fit GLS with fixed nugget partGLS = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, nugget = 0, do.t.test = TRUE) ## hypothesis tests chisqr(partGLS) # explanatory power of model t.test(partGLS) # significance of predictors ## now with a numeric predictor fitGLS_partition(formula = CLS_coef ~ lat, partmat = pm, data = df, nugget = 0) ## fit ML nugget for each partition (slow) (partGLS.opt = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, nugget = NA)) partGLS.opt$part$nuggets # ML nuggets # Certain model structures may not be useful: ## 0 intercept with numeric predictor (produces NAs) and gives a warning in statistical tests fitGLS_partition(formula = CLS_coef ~ 0 + lat, partmat = pm, data = df, nugget = 0) ## intercept-only, gives warning fitGLS_partition(formula = CLS_coef ~ 1, partmat = pm, data = df, nugget = 0, do.chisqr.test = FALSE) ## part_data examples part_data(1:20, CLS_coef ~ 0 + land, data = ndvi_AK10000) ## part_csv examples - ## CAUTION: examples for part_csv() include manipulation side-effects: # first, create a .csv file from ndviAK data(ndvi_AK10000) file.path = file.path(tempdir(), "ndviAK10000-remotePARTS.csv") write.csv(ndvi_AK10000, file = file.path) # build a partition from the first 30 pixels in the file part_csv(1:20, formula = CLS_coef ~ 0 + land, file = file.path) # now with a random 20 pixels part_csv(sample(3000, 20), formula = CLS_coef ~ 0 + land, file = file.path) # remove the example csv file from disk file.remove(file.path)
## read data data(ndvi_AK10000) df = ndvi_AK10000[seq_len(1000), ] # first 1000 rows ## create partition matrix pm = sample_partitions(nrow(df), npart = 3) ## fit GLS with fixed nugget partGLS = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, nugget = 0, do.t.test = TRUE) ## hypothesis tests chisqr(partGLS) # explanatory power of model t.test(partGLS) # significance of predictors ## now with a numeric predictor fitGLS_partition(formula = CLS_coef ~ lat, partmat = pm, data = df, nugget = 0) ## fit ML nugget for each partition (slow) (partGLS.opt = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, nugget = NA)) partGLS.opt$part$nuggets # ML nuggets # Certain model structures may not be useful: ## 0 intercept with numeric predictor (produces NAs) and gives a warning in statistical tests fitGLS_partition(formula = CLS_coef ~ 0 + lat, partmat = pm, data = df, nugget = 0) ## intercept-only, gives warning fitGLS_partition(formula = CLS_coef ~ 1, partmat = pm, data = df, nugget = 0, do.chisqr.test = FALSE) ## part_data examples part_data(1:20, CLS_coef ~ 0 + land, data = ndvi_AK10000) ## part_csv examples - ## CAUTION: examples for part_csv() include manipulation side-effects: # first, create a .csv file from ndviAK data(ndvi_AK10000) file.path = file.path(tempdir(), "ndviAK10000-remotePARTS.csv") write.csv(ndvi_AK10000, file = file.path) # build a partition from the first 30 pixels in the file part_csv(1:20, formula = CLS_coef ~ 0 + land, file = file.path) # now with a random 20 pixels part_csv(sample(3000, 20), formula = CLS_coef ~ 0 + land, file = file.path) # remove the example csv file from disk file.remove(file.path)
NDVI remote sensing data for 10,000 random pixels from Alaska, with rare land classes removed.
ndvi_AK10000
ndvi_AK10000
data frame with 10,000 rows corresponding to sites and 37 columns:
longitude of the pixel
latitude of the pixel
pre-calculated AR REML coefficient standardized by mean ndvi values for each pixel
pre-calculated CLS coefficient standardized by mean ndvi values for each pixel
dominant land class of the pixel
logical: is this land class rare?
ndvi value of the pixel during the year <t>
Find the maximum likelihood estimate of the nugget
optimize_nugget( X, y, V, lower = 0.001, upper = 0.999, tol = .Machine$double.eps^0.25, debug = FALSE, ncores = NA )
optimize_nugget( X, y, V, lower = 0.001, upper = 0.999, tol = .Machine$double.eps^0.25, debug = FALSE, ncores = NA )
X |
numeric (double) nxp matrix |
y |
numeric (double) nx1 column vector |
V |
numeric (double) nxn matrix |
lower |
lower boundary for nugget search |
upper |
upper boundary for nugget search |
tol |
desired accuracy of nugget search |
debug |
logical: debug mode? |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
Finds the maximum likelihood nugget estimate via mathematical optimization.
To maximize efficiency, optimize_nugget()
is implemented entirely
in C++. Optimization takes place via a C++ version of the fmin
routine
(Forsythe et al 1977). Translated from http://www.netlib.org/fmm/fmin.f
The function LogLikGLS()
is optimized for nugget
. Once the
LogLikGLS()
functionality is absorbed by fitGLS()
, it will
be used instead.
maximum likelihood nugget estimate
?stats::optimize()
Chisqr test for partitioned GLS
part_chisqr(Fmean, rSSR, df1, npart)
part_chisqr(Fmean, rSSR, df1, npart)
Fmean |
mean value of F-statistic from correlated F-tests |
rSSR |
correlation among partition regression sum of squares |
df1 |
first degree of freedom for F-tests |
npart |
number of partitions |
a p-value for the correlated chisqr test
Correlated t-test for paritioned GLS
part_ttest(coefs, part.covar_coef, rcoefficients, df2, npart)
part_ttest(coefs, part.covar_coef, rcoefficients, df2, npart)
coefs |
vector average GLS coefficients |
part.covar_coef |
an array of covar_coef from each partition |
rcoefficients |
an rcoefficeints array, one for each partition |
df2 |
second degree of freedom from partitioned GLS |
npart |
number of partitions |
a list whose first element is a coefficient table with estimates, standard errors, t-statistics, and p-values and whose second element is a matrix of correlations among coefficients.
Example output from fitGLS_partition() fit to the ndvi_AK
data set
partGLS_ndviAK
partGLS_ndviAK
an S3 class "partGLS" object. See ?fitGLS_partition() for further details
S3 print method for "partGLS" objects
## S3 method for class 'partGLS' print(x, ...)
## S3 method for class 'partGLS' print(x, ...)
x |
"partGLS" object |
... |
additional arguments passed to print |
a print-formatted version of key elements of the "partGLS" object.
S3 print method for "remoteCor" class
## S3 method for class 'remoteCor' print(x, ...)
## S3 method for class 'remoteCor' print(x, ...)
x |
remoteCor object to print |
... |
additional arguments passed to print() |
a print-formatted version of key elements of the "remoteCor" object.
print method for remoteGLS
## S3 method for class 'remoteGLS' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'remoteGLS' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
remoteGLS object |
digits |
digits to print |
... |
additional arguments |
formatted output for remoteGLS object
S3 print method for remoteTS class
S3 summary method for remoteTS class
S3 print method for mapTS class
S3 summary method for mapTS class
helper summary function (matrix)
helper summary function (vector)
## S3 method for class 'remoteTS' print( x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'remoteTS' summary( object, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'mapTS' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'mapTS' summary( object, digits = max(3L, getOption("digits") - 3L), CL = 0.95, na.rm = TRUE, ... ) smry_funM(x, CL = 0.95, na.rm = TRUE) smry_funV(x, CL = 0.95, na.rm = TRUE)
## S3 method for class 'remoteTS' print( x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'remoteTS' summary( object, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... ) ## S3 method for class 'mapTS' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'mapTS' summary( object, digits = max(3L, getOption("digits") - 3L), CL = 0.95, na.rm = TRUE, ... ) smry_funM(x, CL = 0.95, na.rm = TRUE) smry_funV(x, CL = 0.95, na.rm = TRUE)
x |
numeric matrix |
digits |
significant digits to show |
signif.stars |
logical, passed to |
... |
additional parameters passed to further print methods |
object |
mapTS object |
CL |
confidence level (default = .95) |
na.rm |
logical, should observations with NA be removed? |
returns formatted output
returns formatted output, including summary stats
returns formatted output
returns formatted summary stats
summary statistics for each column including quartiles, mean, and upper and lower confidence levels (given by CL)
summary statistics including quartiles, mean, and upper and lower confidence levels (given by CL)
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } ## Pixel CLS tmp.df = data.frame(x = X[1, ], t = nrow(X), z = Z[1, ]) CLS <- fitCLS(x ~ z, data = tmp.df) print(CLS) summary(CLS) residuals(CLS) coef(CLS) logLik(CLS) fitted(CLS) # plot(CLS) # doesn't work ## Pixel AR AR <- fitAR(x ~ z, data = tmp.df) print(AR) summary(AR) coef(AR) residuals(AR) logLik(AR) fitted(AR) # plot(AR) # doesn't work ## Map CLS CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = TRUE) print(CLS.map) summary(CLS.map) residuals(CLS.map) # plot(CLS.map)# doesn't work CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = FALSE) print(CLS.map) summary(CLS.map) coef(CLS.map) residuals(CLS.map) # logLik(CLS.map) # doesn't work fitted(CLS.map) # plot(CLS.map) # doesn't work ## Map AR AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = TRUE) print(AR.map) summary(AR.map) residuals(AR.map) # plot(AR.map)# doesn't work AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = FALSE) print(AR.map) summary(AR.map) coef(AR.map) residuals(AR.map) # logLik(AR.map) # doesn't work fitted(AR.map) # plot(AR.map) # doesn't work
# simulate dummy data time.points = 9 # time series length map.width = 5 # square map width coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix ## create empty spatiotemporal variables: X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor # setup first time point: Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"] X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t ## project through time: for(t in 2:time.points){ Z[, t] <- Z[, t-1] + rnorm(map.width^2) X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25) } ## Pixel CLS tmp.df = data.frame(x = X[1, ], t = nrow(X), z = Z[1, ]) CLS <- fitCLS(x ~ z, data = tmp.df) print(CLS) summary(CLS) residuals(CLS) coef(CLS) logLik(CLS) fitted(CLS) # plot(CLS) # doesn't work ## Pixel AR AR <- fitAR(x ~ z, data = tmp.df) print(AR) summary(AR) coef(AR) residuals(AR) logLik(AR) fitted(AR) # plot(AR) # doesn't work ## Map CLS CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = TRUE) print(CLS.map) summary(CLS.map) residuals(CLS.map) # plot(CLS.map)# doesn't work CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = FALSE) print(CLS.map) summary(CLS.map) coef(CLS.map) residuals(CLS.map) # logLik(CLS.map) # doesn't work fitted(CLS.map) # plot(CLS.map) # doesn't work ## Map AR AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = TRUE) print(AR.map) summary(AR.map) residuals(AR.map) # plot(AR.map)# doesn't work AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = FALSE) print(AR.map) summary(AR.map) coef(AR.map) residuals(AR.map) # logLik(AR.map) # doesn't work fitted(AR.map) # plot(AR.map) # doesn't work
remoteGLS constructor (S3)
remoteGLS(formula, formula0, no.F = FALSE)
remoteGLS(formula, formula0, no.F = FALSE)
formula |
optional argument specifying the GLS formula |
formula0 |
optional argument specifying the null GLS formula |
no.F |
optional argument specifying the no.F attribute |
an empty S3 object of class "remoteGLS"
Create a matrix whose columns contain indices of non-overlapping random samples.
sample_partitions( npix, npart = 10, partsize = NA, pixels = NA, verbose = FALSE )
sample_partitions( npix, npart = 10, partsize = NA, pixels = NA, verbose = FALSE )
npix |
number of pixels in full dataset |
npart |
number of partitions to create |
partsize |
size of each partition |
pixels |
vector of pixel indexes to sample from |
verbose |
logical: TRUE prints additional info |
If both npart
and partsize
is specified, a partition matrix with
these dimensions is returned. If only npart
, is specified,
partsize
is selected as the largest integer possible that creates
equal sized partitions. Similarly, if only npart = NA
, then npart
is selected to obtain as many partitions as possible.
sample_partitions
returns a matrix with partsize
rows and npart
columns. Columns contain random, non-overlapping samples
from 1:npix
Other partitionedGLS:
MC_GLSpart()
,
crosspart_GLS()
# dummy data with 100 pixels and 20 time points dat.M <- matrix(rnorm(100*20), ncol = 20) # 4 partitions (exhaustive) sample_partitions(npix = nrow(dat.M), npart = 4) # partitions with 10 pixels each (exhaustive) sample_partitions(npix = nrow(dat.M), partsize = 10) # 4 partitions each with 10 pixels (non-exhaustive, produces warning) sample_partitions(npix = nrow(dat.M), npart = 4, partsize = 10) # index of 50 pixels to use as subset sub.indx <- c(1:10, 21:25, 30:62, 70:71) # 5 partitions (exhaustive) from only the specified pixel subset sample_partitions(npix = nrow(dat.M), npart = 5, pixels = sub.indx)
# dummy data with 100 pixels and 20 time points dat.M <- matrix(rnorm(100*20), ncol = 20) # 4 partitions (exhaustive) sample_partitions(npix = nrow(dat.M), npart = 4) # partitions with 10 pixels each (exhaustive) sample_partitions(npix = nrow(dat.M), partsize = 10) # 4 partitions each with 10 pixels (non-exhaustive, produces warning) sample_partitions(npix = nrow(dat.M), npart = 4, partsize = 10) # index of 50 pixels to use as subset sub.indx <- c(1:10, 21:25, 30:62, 70:71) # 5 partitions (exhaustive) from only the specified pixel subset sample_partitions(npix = nrow(dat.M), npart = 5, pixels = sub.indx)
Conduct a correlated t-test of a partitioned GLS
## S3 method for class 'partGLS' t.test(x, ...)
## S3 method for class 'partGLS' t.test(x, ...)
x |
"partGLS" object |
... |
additional arguments passed to print |
a list whose first element is a coefficient table with estimates, standard errors, t-statistics, and p-values and whose second element is a matrix of correlations among coefficients.
Test passing a covariance function and arguments
test_covar_fun(d, covar_FUN = "covar_exppow", covar.pars = list(range = 0.5))
test_covar_fun(d, covar_FUN = "covar_exppow", covar.pars = list(range = 0.5))
d |
numeric vector or matrix of distances |
covar_FUN |
distance-based covariance function to use,
which must take |
covar.pars |
vector or list of parameters (other than d) passed to the covar function |