Title: | The Shuffle Estimator for Explainable Variance |
---|---|
Description: | Implementation of the shuffle estimator, a non-parametric estimator for signal and noise variance under mild noise correlations. |
Authors: | Yuval Benjamini |
Maintainer: | Yuval Benjamini <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 1.0.1 |
Built: | 2024-11-17 03:38:48 UTC |
Source: | https://github.com/cran/shuffle |
This package implements the algorithms underlying the shuffle estimators, variance estimators for one-way analysis of variance designs. The estimators can overcome correlated noise by recomputing the mean-square-between statistics on a permuted version of the data. The permutations should preserve the noise covariance matrix, but a parametric model for the noise covariance is not necessary. For more details see Benjamini and Yu, and here http://statweb.stanford.edu/~yuvalben.
Two functions implement the important stages of estimation:
prepareShuffle(design_vec, premutation), which preprocess the design and computes the normalization constant for a given permutation.
estimateShuffle(response_vec, prepare), which estimates variances and effect sizes for a specific data vector.
Package: | shuffle |
Type: | Package |
Version: | 1.0.1 |
Date: | 2013-4-24 |
License: | GPL (>= 2) |
Yuval Benjamini <[email protected]>
Benjamini and Yu (2013), "The shuffle estimator for explainable variance in fMRI experiments", Annals of Applied Statistics 7 (4) http://projecteuclid.org/euclid.aoas/1387823308
data(design_vec,fMRI_responses,prediction_res) # Make example shorter - for paper example use T = ncol(fMRI_responses) T = 156*4 fMRI_responses_sm = fMRI_responses[,1:T] design_sm = design_vec[1:T] permutation = rev(1:T) prep_shuffle = prepareShuffle(design_sm,permutation) var_explained = numeric(nrow(fMRI_responses_sm)) for (i in 1:nrow(fMRI_responses_sm)) { var_explained[i] = estimateShuffle(fMRI_responses_sm[i,],prep_shuffle)$effect } plot(var_explained, pmax(prediction_res,0)^2, xlim = c(0,0.7), ylim = c(0,0.7), xlab = "Explainable variance", ylab = "Corr^2") abline(0,1,col=4)
data(design_vec,fMRI_responses,prediction_res) # Make example shorter - for paper example use T = ncol(fMRI_responses) T = 156*4 fMRI_responses_sm = fMRI_responses[,1:T] design_sm = design_vec[1:T] permutation = rev(1:T) prep_shuffle = prepareShuffle(design_sm,permutation) var_explained = numeric(nrow(fMRI_responses_sm)) for (i in 1:nrow(fMRI_responses_sm)) { var_explained[i] = estimateShuffle(fMRI_responses_sm[i,],prep_shuffle)$effect } plot(var_explained, pmax(prediction_res,0)^2, xlim = c(0,0.7), ylim = c(0,0.7), xlab = "Explainable variance", ylab = "Corr^2") abline(0,1,col=4)
The design vector for the validation data in an fMRI experiment. At time t the image design_vec[t] was shown.
data(design_vec)
data(design_vec)
The format is: num [1:1560] 2 3 6 1 11 2 2 7 2 10 ...
The design vector for the validation data in an fMRI experiment. The experiment consisted of 1560 timeframes, 120 images each repeated 13 times. The imshrd were organized into 10 separate blocks, each repeating 12 images.
Kay, Naselaris, Prenger and Gallant (2008), "Identifying natural images from human brain activity"
Benjamini and Yu (2013), "The shuffle estimator for explainable variance"
data(design_vec) plot(design_vec,xlab = "event", ylab = "treatment", main="Design of the full experiment" ) plot(design_vec[1:120],xlab = "event",ylab= "treatment",main="Design of a single block")
data(design_vec) plot(design_vec,xlab = "event", ylab = "treatment", main="Design of the full experiment" ) plot(design_vec[1:120],xlab = "event",ylab= "treatment",main="Design of a single block")
estimateShuffle estimates the following quantities for a response vector: the signal variance (signalVar), the noise variance (noiseVar), the total variance (YVar), and the explainable variance (effect). Inputs to the function are the response vector, and a preprocessing structure (the output of prepareShuffle) which holds the design, the shuffle permutation, and the calculated normalizer.
estimateShuffle(dat, prep, neg = FALSE)
estimateShuffle(dat, prep, neg = FALSE)
dat |
A vector of reponses - should be of the same size as the design vector and the shuffle permutation. |
prep |
The output of prepareShuffle; includes the design, the shuffling permuation, and a normalizer. |
neg |
If neg=FALSE does not allow the signal variance to get arbitrary negative values, but instead sets signal variance to -1e-05. |
estimateShuffle compares the mean-square-between of the data to the mean-square-between of the permuted data, the difference being the scaled noise variance. Effect size is the ratio between the estimated signal data and the estimated total variance.
signalVar |
The estimated variance of the signal |
noiseVar |
The estimated variance of the noise |
YVar |
The estimated total variance |
effect |
The proportion of explainable variance (signalVar/Yvar) |
Yuval Benjamini
Benjamini and Yu (2013), "The shuffle estimator for explainable variance in fMRI experiments".
A 30x1560 data matrix consisting of a sample of 30 fMRI responses from the visual cortex V1 of a human subject, to a sequence of natural images. The treatment index is found in design_vec.
data(fMRI_responses)
data(fMRI_responses)
A 30x1560 matrix of real values. Each row corresponds to a different voxel.
Measurement were recorded as a person was watching a sequence of natural images. Each column corresponds to a displayed image; each row corresponds to the response of a single voxel in the fMRI scan. This data consists of a small subset of the voxels in V1 from the original scans. The data was recorded in the Gallant lab at UC Berkeley.
Kay, Naselaris, Prenger and Gallant (2008), "Identifying natural images from human brain activity"
Benjamini and Yu (2013), "The shuffle estimator for explainable variance in fMRI experiments"
getAveraging(des) converts a design (either a vector or a matrix) into averaging matrix notation (from the paper). For a response vector Y, (B Y)[t] is the mean of all responses corresponding to the treatment at time t, and (B-G)Y [t] is the averaged-removed treatment mean.
getAveraging(des)
getAveraging(des)
des |
Either a vector or a matrix representation of design (see designVec2Mat and designMat2Vec). |
A list with components
m |
The number of treatments |
ns |
An m-length vector with the number of repeats for each treatment. For balanced designs with n repeats, ns=rep(n,m) |
B |
The averaging matrix according to the design |
G |
1/T for T the number of measurements |
Yuval Benjamini
Benjamini and Yu (2013).
data(design_vec) design_avg = getAveraging(design_vec) rand_resp = rnorm(length(design_vec)) global_mean = mean(rand_resp[design_vec != 0 ]) first_treatment_mean= mean(rand_resp[design_vec == design_vec[1]]) cat((design_avg$B %*% rand_resp)[1], first_treatment_mean ) cat(((design_avg$B-design_avg$G) %*% rand_resp)[1], first_treatment_mean- global_mean)
data(design_vec) design_avg = getAveraging(design_vec) rand_resp = rnorm(length(design_vec)) global_mean = mean(rand_resp[design_vec != 0 ]) first_treatment_mean= mean(rand_resp[design_vec == design_vec[1]]) cat((design_avg$B %*% rand_resp)[1], first_treatment_mean ) cat(((design_avg$B-design_avg$G) %*% rand_resp)[1], first_treatment_mean- global_mean)
Computes the normalizer 1/(1-alpha) for a given design and permutation. The shuffle estimator is [MSbet(Y) - MSbet(PY)]*normalizer. \ We prefer the normalizer to be close to 1.
getNormalizer(avgmat, perm)
getNormalizer(avgmat, perm)
avgmat |
The output of getAveraging. |
perm |
The shuffling permutation. |
Under balanced designs, the normalizer = 1/[1-alpha]. More generally, we call facA = 1 and facB = alpha(design, permutation).
norm |
The value by which to correct the difference of variances [1/(facA-facB)] |
facA |
The signal coefficient of the original design, should be 1 |
facB |
The signal variance coefficient of the permuted design |
Yuval Benjamini
Benjamini and Yu (2013).
data(design_vec) # Make example shorter - for paper example use T = ncol(fMRI_responses) = 156*10 T = 156*4 design_sm = design_vec[1:T] identity_perm = 1:T reverse_perm = rev(identity_perm) shift_perm = c(2:T, 1) design_avg = getAveraging(design_sm) identity_norm = getNormalizer(design_avg, identity_perm) print('For the identity, we cannot get an estimator') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', identity_norm$facA, identity_norm$facB, identity_norm$norm)) reverse_norm = getNormalizer(design_avg, reverse_perm) print('For the reverse, we get a normalizer close to 1 ') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', reverse_norm$facA, reverse_norm$facB, reverse_norm$norm)) shift_norm = getNormalizer(design_avg, shift_perm) print('The shift mixes across blocks. The normalizer is smaller, but assumptions may not hold') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', shift_norm$facA, shift_norm$facB, shift_norm$norm))
data(design_vec) # Make example shorter - for paper example use T = ncol(fMRI_responses) = 156*10 T = 156*4 design_sm = design_vec[1:T] identity_perm = 1:T reverse_perm = rev(identity_perm) shift_perm = c(2:T, 1) design_avg = getAveraging(design_sm) identity_norm = getNormalizer(design_avg, identity_perm) print('For the identity, we cannot get an estimator') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', identity_norm$facA, identity_norm$facB, identity_norm$norm)) reverse_norm = getNormalizer(design_avg, reverse_perm) print('For the reverse, we get a normalizer close to 1 ') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', reverse_norm$facA, reverse_norm$facB, reverse_norm$norm)) shift_norm = getNormalizer(design_avg, shift_perm) print('The shift mixes across blocks. The normalizer is smaller, but assumptions may not hold') print(sprintf('facA(1) %.3f, facB(alpha) %.3f, normalizer %.3f ', shift_norm$facA, shift_norm$facB, shift_norm$norm))
MSbetAvg calculates the mean-square-between contrast according to the design vector. Responses for each condition are averaged, and the sample variance is calculated for these averages.
MSbetAvg(dat, avgmat)
MSbetAvg(dat, avgmat)
dat |
The vector of measurements on which the constrast is computed. |
avgmat |
The design parameters, as extracted by getAveraging(). |
The value of the quadratic contrast computed on the data vector.
Yuval Benjamini
data(fMRI_responses,design_vec) msbet = MSbetAvg(fMRI_responses[1,], getAveraging(design_vec)) # Compute same value using "aov" when design is balanced ... ns =tapply(design_vec,design_vec, length) # (check that design is balanced) stopifnot(length(unique(ns))==1) m = length(unique(design_vec)) aov_sum = summary(aov(fMRI_responses[1,] ~ factor(design_vec))) ss_bet = aov_sum[[1]][1,2] # In unbalanced designs, each example should require more care... msbet_aov = (ss_bet / ns[1] )/(m-1) cat(msbet, msbet_aov)
data(fMRI_responses,design_vec) msbet = MSbetAvg(fMRI_responses[1,], getAveraging(design_vec)) # Compute same value using "aov" when design is balanced ... ns =tapply(design_vec,design_vec, length) # (check that design is balanced) stopifnot(length(unique(ns))==1) m = length(unique(design_vec)) aov_sum = summary(aov(fMRI_responses[1,] ~ factor(design_vec))) ss_bet = aov_sum[[1]][1,2] # In unbalanced designs, each example should require more care... msbet_aov = (ss_bet / ns[1] )/(m-1) cat(msbet, msbet_aov)
The correlation between measured response and predicted response on validation data for 1250 V1 voxels. The prediction algorithms are described in detail in Kay et al, 2008. We compare these prediction results to the explainable variance estimated with the shuffle estimator.
data(prediction_res)
data(prediction_res)
A numerical vector of correlation coefficients.
Kay, Naselaris, Prenger and Gallant (2008), "Identifying natural images from human brain activity"
Benjamini and Yu (2013)
prepareShuffle computes the averaging matrices and normalizing constants for the shuffle estimator. It can be run once for all data vectors sharing the design.
prepareShuffle(des, perm)
prepareShuffle(des, perm)
des |
A design vector or matrix |
perm |
The shuffling permutation |
m |
The number of treatments |
ns |
An m-length vector with the number of repeats for each treatment. For balanced designs with n repeats, ns=rep(n,m) |
B |
The averaging matrix according to the design |
G |
1/T for T the number of measurements |
norm |
The value by which to correct the difference of variances [1/(facA-facB)] |
facA |
The signal coefficient of the original design |
facB |
The signal variance coefficient of the permuted design |
Yuval Benjamini
Benjamini and Yu (2013)
getAverage getNormalizer