| Version: | 1.1.0 |
| Date: | 2025-10-21 |
| Title: | Mixed-Effects Models Enhancing Functions |
| Maintainer: | Giovanny Covarrubias-Pazaran <cova_ruber@live.com.mx> |
| Description: | Special functions that enhance other mixed effect model packages by creating overlayed, reduced rank, and reduced model matrices together with multiple data sets to practice the use of these models. For more details see Covarrubias-Pazaran (2016) <doi:10.1371/journal.pone.0156744>. |
| Depends: | R(≥ 3.5.0), Matrix (≥ 1.0), MASS, methods, crayon |
| LazyLoad: | yes |
| LazyData: | yes |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Author: | Giovanny Covarrubias-Pazaran
|
| Repository: | CRAN |
| Suggests: | rmarkdown, knitr, orthopolynom, RSpectra, evola, lme4breeding, sommer |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-11-12 02:38:20 UTC; giovannycovarrubias |
| Date/Publication: | 2025-11-17 09:00:02 UTC |
Additive relationship matrix
Description
Calculates the realized additive relationship matrix (R implementation).
Usage
A.matr(X,min.MAF=NULL)
Arguments
X |
Matrix ( |
min.MAF |
Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed. |
Details
For vanraden method: the marker matrix is centered by subtracting column means M= X - ms where ms is the coumn means. Then A=M M'/c, where c = \sum_k{d_k}/k, the mean value of the diagonal values of the M M' portion.
Value
If return.imputed = FALSE, the n \times n additive relationship matrix is returned.
If return.imputed = TRUE, the function returns a list containing
- $A
the A matrix
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
## random population of 200 lines with 1000 markers
X <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
X[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
A <- A.matr(X)
## take a look at the Genomic relationship matrix
colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow"))
hv <- heatmap(A[1:15,1:15], col = colfunc(100),Colv = "Rowv")
str(hv)
DT_augment design example.
Description
This dataset contains phenotpic data for one trait evaluated in the experimental design known as augmented design. This model allows to obtain BLUPs for genotypes that are unreplicated by dividing the field in blocks and replicating 'check genotypes' in the blocks and unreplicated genotypes randomly within the blocks. The presence of check genotypes (usually cultivars) allows the adjustment of unreplicated genotypes.
Usage
data("DT_augment")
Format
The format is: chr "DT_augment"
Source
This data was generated by a potato study.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
## AUGMENTED DESIGN EXAMPLE
data(DT_augment)
DT <- DT_augment
head(DT)
############## sommer ##################
if(requireNamespace("sommer")){
library(sommer)
mix1 <- mmes(TSW ~ Check.Gen,
random = ~ Block + Genotype:Check,
data=DT)
summary(mix1)$varcomp
}
############## lme4breeding ##################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## fit the mixed model and check summary
mix <- lmeb(TSW ~ Check.Gen + (1|Block) + (1|Genotype:Check),
data=DT)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
sigma(mix)^2 # error variance
BLUP <- ranef(mix, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
############## end ##################
Big GxE Quantitative Genetic Simulation
Description
A data frame and genetic markers for a population of 1000 individuals evaluated in 25 environments and 2 replicates per environment giving a total of 50 K records to show how to fit big GxE models with genomic information.
Usage
data("DT_big")
Format
The format is: chr "DT_big"
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_big)
DT = DT_big
M = apply(M_big,2,as.numeric)
rownames(M) <- rownames(M_big)
# compute the relationship matrix
MMT <- tcrossprod(M)
m <- sum(diag(MMT))/nrow(MMT)
MMT <- MMT/m
MMT <- MMT + diag(1e-05, ncol(MMT), ncol(MMT))
# let's fit the GxE model (diagonal first)
DT[,"envf_repf"] = paste(DT[,"envf"],DT[,"repf"],sep="_")
DT=DT[with(DT, order(envf_repf)), ]
if(requireNamespace("lme4breeding")){
library(lme4breeding)
# fit the models
mixm2 <- lmeb(value ~ (1|envf_repf) + (0+envf_repf||id),
data = DT,
relmat = list(id = MMT),
control = lmerControl(
restart_edge = FALSE
), nIters=1000,
verbose=1L, rotation=TRUE
)
vc <- VarCorr(mixm2); print(vc,comp=c("Variance"))
ran2 = ranef(mixm2, condVar=FALSE)
H0 <- ran2$id
}
Blue Tit Data for a Quantitative Genetic Experiment
Description
a data frame with 828 rows and 7 columns, with variables tarsus length (tarsus) and colour (back) measured on 828 individuals (animal). The mother of each is also recorded (dam) together with the foster nest (fosternest) in which the chicks were reared. The date on which the first egg in each nest hatched (hatchdate) is recorded together with the sex (sex) of the individuals.
Usage
data("DT_btdata")
Format
The format is: chr "DT_btdata"
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_btdata)
DT <- DT_btdata
head(DT)
############# sommer ################
if(requireNamespace("sommer")){
library(sommer)
mix4 <- mmes(tarsus ~ sex,
random = ~ dam + fosternest,
rcov=~units,
data = DT)
summary(mix4)$varcomp
# MULTI-TRAIT EXAMPLE
traits <- c("tarsus","back","hatchdate")
DT[,traits] <- apply(DT[,traits],2,scale)
DTL <- reshape(DT[,c("animal","sex","dam","fosternest", traits)],
idvar = c("animal","sex","dam","fosternest"),
varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait,animal)), ]
head(DTL)
mix3 <- mmes(value ~ trait:sex - 1, henderson=TRUE,
random = ~ vsm(usm(trait),ism(dam)) +
vsm(usm(trait), ism(fosternest)),
rcov= ~ vsm(dsm(trait),ism(units)),
data = DTL)
summary(mix3)$varcomp
#### calculate the genetic correlation
cov2cor(mix3$theta[[1]])
cov2cor(mix3$theta[[2]])
}
############# lme4breeding ################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
mix4 <- lmeb(tarsus ~ sex + (1|dam) + (1|fosternest),
data = DT)
vc <- VarCorr(mix4); print(vc,comp=c("Variance"))
sigma(mix4)^2 # error variance
BLUP <- ranef(mix4, condVar=TRUE)
PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
### multi-trait model
traits <- c("tarsus" ,"back", "hatchdate")
for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])}
DTL <- reshape(DT[,c("animal", traits)], idvar = "animal", varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- merge(DTL, unique(DT[,c("animal","dam","fosternest","sex")]),
by="animal", all.x = TRUE)
DTL <- DTL[with(DTL, order(trait)), ]
head(DTL)
system.time(
mix <- lmeb(value ~ (0+trait|dam),
# relmat = list(geno=A),
# rotation = TRUE,
data=DTL)
)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
cov2cor(vc$dam)
sigma(mix)^2 # error variance
}
############# end ################
Corn crosses and markers
Description
This dataset contains phenotpic data for plant height and grain yield for 100 out of 400 possible hybrids originated from 40 inbred lines belonging to 2 heterotic groups, 20 lines in each, 1600 rows exist for the 400 possible hybrids evaluated in 4 locations but only 100 crosses have phenotypic information. The purpose of this data is to show how to predict the other 300 crosses.
The data contains 3 elements. The first is the phenotypic data and the parent information for each cross evaluated in the 4 locations. 1200 rows should have missing data but the 100 crosses performed were chosen to be able to estimate the GCA and SCA effects of everything.
The second element of the data set is the phenotypic data and other relevant information for the 40.
The third element is the genomic relationship matrix for the 40 inbred lines originated from 511 SNP markers and calculated using the A.mat function.
Usage
data("DT_cornhybrids")
Format
The format is: chr "DT_cornhybrids"
Source
This data was generated by a corn study.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_cornhybrids)
DT <- DT_cornhybrids
GT <- GT_cornhybrids
A <- GT
K1 <- A[levels(DT$GCA1), levels(DT$GCA1)]; dim(K1)
K2 <- A[levels(DT$GCA2), levels(DT$GCA2)]; dim(K2)
S <- kronecker(K1, K2) ; dim(S)
rownames(S) <- colnames(S) <- levels(DT$SCA)
################ sommer ###################
if(requireNamespace("sommer")){
library(sommer)
ans <- mmes(Yield ~ Location,
random = ~ vsm(ism(GCA1),Gu=K1) + vsm(ism(GCA2),Gu=K2), # + vsm(ism(SCA),Gu=S),
rcov=~units,
data=DT)
summary(ans)$varcomp
## mmec uses the inverse of the relationship matrix
K1i <- solve(K1 + diag(1e-4,ncol(K1),ncol(K1)))
K1i <- as(as(as( K1i, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(K1i, 'inverse')=TRUE
K2i <- solve(K2 + diag(1e-4,ncol(K2),ncol(K2)))
K2i <- as(as(as( K2i, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(K2i, 'inverse')=TRUE
Si <- solve(S + diag(1e-4,ncol(S),ncol(S)))
Si <- as(as(as( Si, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Si, 'inverse')=TRUE
ans2 <- mmes(Yield ~ Location,
random = ~ vsm(ism(GCA1),Gu=K1i) + vsm(ism(GCA2),Gu=K2i), # + vsm(ism(SCA),Gu=Si),
henderson=TRUE,
rcov=~units,
data=DT)
summary(ans2)$varcomp
}
################ lme4breeding ###################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
ans <- lmeb(Yield ~ Location + (1| GCA1) + (1|GCA2),
relmat = list(GCA1=K1,
GCA2=K2),
data=DT)
vc <- VarCorr(ans); print(vc,comp=c("Variance"))
sigma(ans)^2 # error variance
BLUP <- ranef(ans, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Genotypic and Phenotypic data for a CP population
Description
A CP population or F1 cross is the designation for a cross between 2 highly heterozygote individuals; i.e. humans, fruit crops, bredding populations in recurrent selection.
This dataset contains phenotpic data for 363 siblings for an F1 cross. These are averages over 2 environments evaluated for 4 traits; color, yield, fruit average weight, and firmness. The columns in the CPgeno file are the markers whereas the rows are the individuals. The CPpheno data frame contains the measurements for the 363 siblings, and as mentioned before are averages over 2 environments.
Usage
data("DT_cpdata")
Format
The format is: chr "DT_cpdata"
Source
This data was simulated for fruit breeding applications.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
Examples
data(DT_cpdata)
DT <- DT_cpdata
GT <- GT_cpdata
MP <- MP_cpdata
## create the variance-covariance matrix
A <- A.matr(GT) # additive relationship matrix
A <- A + diag(1e-4, ncol(A), ncol(A))
## look at the data and fit the model
head(DT)
######## sommer #########
if(requireNamespace("sommer")){
library(sommer)
mix1 <- mmes(Yield~1, henderson=FALSE,
random=~vsm(ism(id),Gu=A)
+ Rowf + Colf,
rcov=~units,
data=DT)
summary(mix1)$varcomp
## mmec uses the inverse of the relationship matrix
Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
mix2 <- mmes(Yield~1, henderson=TRUE,
random=~vsm(ism(id),Gu=Ai)
+ Rowf + Colf,
rcov=~units,
data=DT)
summary(mix2)$varcomp
vg <- summary(mix2)$varcomp[1,1] # genetic variance
G <- A*vg # genetic variance-covariance
Ci <- mix2$Ci # coefficient matrix
ind <- as.vector(mix2$partitions$`vsm(ism(id), Gu = Ai)`)
ind <- seq(ind[1],ind[2])
Ctt <- Ci[ind,ind] # portion of Ci for genotypes
R2 <- (G - Ctt)/G # reliability matrix
mean(diag(R2)) # average reliability of the trial
####====================####
#### multivariate model ####
#### 2 traits ####
####====================####
traits <- c("color","Yield")
DT[,traits] <- apply(DT[,traits],2,scale)
DTL <- reshape(DT[,c("id", traits)],
idvar = c("id"),
varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait)), ]
head(DTL)
# if using mmes=TRUE you need to provide the inverse
Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
#### be patient take some time
ansm <- mmes( value ~ trait, # henderson=TRUE,
random=~ vsm(usm(trait), ism(id), Gu=A), # Ai if henderson
rcov=~ vsm(dsm(trait), ism(units)),
data=DTL)
cov2cor(ansm$theta[[1]])
}
######## lme4breeding #########
if(requireNamespace("lme4breeding")){
library(lme4breeding)
mix1 <- lmeb(Yield~ (1|id) + (1|Rowf) + (1|Colf),
relmat=list(id=A),
data=DT)
vc <- VarCorr(mix1); print(vc,comp=c("Variance"))
sigma(mix1)^2 # error variance
# run one last iteration with imputed data
# to make sure you get predictions for every level
DT2 <- DT
DT2$Yield <- imputev(DT2$Yield)
mix2 <- update(mix1, suppressOpt = TRUE,
start=getME(mix1, "theta"),
data=DT2)
predsMix2 <- ranef(mix2)
condVAR <- lapply(predsMix2, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
# if you don't want the imputed vector to have an effect in
# the predictions you can use the getMME function to use
# the extended model and get predictions without including the
# imputed data (I know is a bit messy)
preds <- getMME(object=mix2, # extended model
vc=VarCorr(mix1), # variance components
recordsToUse = which(!is.na(DT$Yield)) # records to use for MME
)
# now you could compare between both types of predictions, the last ones are in
# theory the correct ones.
plot(preds$bu[2:364,], predsMix2$id[,1])
}
######## evola #########
if(requireNamespace("evola")){
library(evola)
DT$occ <- 1
DT$Yield <- imputev(DT$Yield)
A <- A[DT$id,DT$id]
# get best 20 individuals weighting variance by ~0.5=(30*pi)/180
res<-evolafit(formula=cbind(Yield, occ)~id, dt= DT,
# constraints: if sum is greater than this ignore
constraintsUB = c(Inf,20),
# constraints: if sum is smaller than this ignore
constraintsLB= c(-Inf,-Inf),
# weight the traits for the selection
b = c(1,0),
# population parameters
nCrosses = 100, nProgeny = 10,
recombGens=1, nChr=1, mutRateAllele=0,
# coancestry parameters
D=A, lambda= (30*pi)/180 , nQtlStart = 20,
# selection parameters
propSelBetween = 0.5, propSelWithin =0.5,
nGenerations = 40)
Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2
best = bestSol(res$pop)[,"fitness"];best
qa = (Q %*% DT$Yield)[best,]; qa
qDq = Q[best,] %*% A %*% Q[best,]; qDq
sum(Q[best,]) # total # of inds selected
evolmonitor(res)
plot(DT$Yield, col=as.factor(Q[best,]),
pch=(Q[best,]*19)+1)
pareto(res)
}
############ end ############
Simulated phenotypic records for a set of 100 environments and 200 entries.
Description
This is a simulated dataset that aims to show how to pick the most representative locations and how to optimize sparse testing allocation.
Usage
data("DT_envs")
Format
The format is: chr "DT_envs"
Source
This data was simulated for experimental design applications.
References
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
Examples
data(DT_envs)
# scale passport data for environments
S=scale(W)
# build a relationship matrix between environments
G=tcrossprod(S)/ncol(S)
G[1:4,1:4]
# build the dataset for the genetic algorithm
DT <- data.frame(cov=rnorm(nrow(G)), occ=1, id=rownames(G))
nLocs=10 # desired number of representative locations
if(requireNamespace("evola")){
library(evola)
res<-evolafit(formula=cbind(cov, occ)~id, dt= DT,
# constraints: if sum is greater than this ignore
constraintsUB = c(Inf, Inf),
# constraints: if sum is smaller than this ignore
constraintsLB= c(-Inf, nLocs),
# weight the traits for the selection
b = c(1,0),
# population parameters
nCrosses = 100, nProgeny = 20,
recombGens=1, nChr=1, mutRateAllele=0,
# coancestry parameters
D=G, lambda= 0 , nQtlStart = nLocs,
# selection parameters
fitnessf = inbFun, selectTop = FALSE,
# fixNumQtlPerInd = TRUE,
# propSelBetween = 0.5, propSelWithin =0.5,
nGenerations = 50, verbose = TRUE)
Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2
best <- bestSol(res$pop, selectTop = FALSE)[,"fitness"]
solution <- Q[best,]
sum(solution) # total # of inds selected
names(solution[which(solution>0)])
evolmonitor(res)
pareto(res)
svdW <- svd(G, nu = 2, nv = 2)
PCW <- G %*% svdW$v
rownames(PCW) <- rownames(G)
plot(PCW[,1], PCW[,2], col = (solution*2)+2,
pch=(solution*14)+4, xlab = "pc1", ylab = "pc2")
labs <- rownames(PCW)
labs[which(solution==0)]=""
text(x=PCW[,1], y=PCW[,2], labels=labs, cex=0.5, pos=3)
}
Broad sense heritability calculation.
Description
This dataset contains phenotpic data for 41 potato lines evaluated in 3 environments in an RCBD design. The phenotypic trait is tuber quality and we show how to obtain an estimate of DT_example for the trait.
Usage
data("DT_example")
Format
The format is: chr "DT_example"
Source
This data was generated by a potato study.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_example)
DT <- DT_example
A <- A_example
head(DT)
################## sommer #####################
if(requireNamespace("sommer")){
library(sommer)
## Compound simmetry (CS) model
ans1 <- mmes(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
data=DT)
summary(ans1)$varcomp
####===========================================####
#### Univariate heterogeneous variance models ####
####===========================================####
## Compound simmetry (CS) + Diagonal (DIAG) model
DT=DT[with(DT, order(Env)), ]
ans2 <- mmes(Yield~Env,
random= ~Name + vsm(dsm(Env),ism(Name)),
rcov= ~ vsm(dsm(Env),ism(units)),
data=DT)
summary(ans2)
####===========================================####
#### Multi-trait variance models ####
####===========================================####
# stack traits
traits <- c("Yield","Weight")
DT[,traits] <- apply(DT[,traits],2,scale)
DTL <- reshape(DT[,c("Name","Env","Block", traits)],
idvar = c("Name","Env","Block"),
varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait,Env)), ]
head(DTL)
## model
ans1 <- mmes(value~ trait,
random= ~ vsm(usm(trait), ism(Name)),
rcov= ~ vsm(dsm(trait), ism(units)),
data=DTL)
summary(ans1)$varcomp
cov2cor(ans1$theta$`vsm(usm(trait), ism(Name`)
}
################## lme4breeding #####################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## Compound simmetry (CS) model
ans1 <- lmeb(Yield~Env + (1|Name) + (1|Env:Name),
data=DT)
vc <- VarCorr(ans1); print(vc,comp=c("Variance"))
BLUP <- ranef(ans1, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
## Main (M) + Diagonal (DIAG) model
## with relationship matrix
ansCSDG <- lmeb(Yield ~ Env + (Env||Name),
relmat = list(Name = A ),
data=DT)
vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance"))
## Main (M) + Diagonal (DIAG) model
## with diagonal residuals
## with relationship matrix
ansCSDG <- lmeb(Yield ~ Env + (Env||Name) + (0+Env||unitsR),
relmat = list(Name = A ),
data=DT)
vc <- VarCorr(ansCSDG); print(vc,comp=c("Variance"))
sigma(ansCSDG)^2 # error variance
}
Data for different experimental designs
Description
The following data is a list containing data frames for different type of experimental designs relevant in plant breeding:
1) Augmented designs (2 examples)
2) Incomplete block designs (1 example)
3) Split plot design (2 examples)
4) Latin square designs (1 example)
5) North Carolina designs I,II and III
How to fit each is shown at the Examples section. This may help you get introduced to experimental designs relevant to plant breeding. Good luck.
Format
Different based on the design.
Source
Datasets and more detail about them can be found in the agricolae package. Here we just show the datasets and how to analyze them using mixed model packages.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_expdesigns)
DT <- DT_expdesigns
names(DT)
data1 <- DT$au1
head(data1)
## response variable: "yield"
## check indicator: "entryc" ('nc' for all unreplicated, but personal.name for checks)
## blocking factor: "block"
## treatments, personal names for replicated and non-replicated: "trt"
## check no check indicator: "new"
############ sommer ##############
if(requireNamespace("sommer")){
library(sommer)
mix1 <- mmes(yield~entryc,
random=~block+trt,
rcov=~units,
data=data1)
summary(mix1)$varcomp
}
############ lme4breeding ##############
if(requireNamespace("lme4breeding")){
library(lme4breeding)
mix1 <- lmeb(yield~entryc + (1|block)+(1|trt),
data=data1)
vc <- VarCorr(mix1); print(vc,comp=c("Variance"))
sigma(mix1)^2 # error variance
BLUP <- ranef(mix1, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Full diallel data for corn hybrids
Description
This dataset contains phenotpic data for 36 winter bean hybrids, coming from a full diallel design and evaluated for 9 traits. The column male and female origin columns are included as well.
Usage
data("DT_fulldiallel")
Format
The format is: chr "DT_fulldiallel"
Source
This data was generated by a winter bean study and originally included in the agridat package.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_fulldiallel)
DT <- DT_fulldiallel
head(DT)
############## sommer ################
if(requireNamespace("sommer")){
library(sommer)
mix <- mmes(stems~1, random=~female+male, data=DT)
summary(mix)
}
############## lme4breeding ################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
mix <- lmeb(stems~1 + (1|female)+(1|male),
data=DT)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
sigma(mix)^2 # error variance
BLUP <- ranef(mix, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Gryphon data from the Journal of Animal Ecology
Description
This is a dataset that was included in the Journal of animal ecology by Wilson et al. (2010; see references) to help users understand how to use mixed models with animal datasets with pedigree data.
The dataset contains 3 elements:
gryphon; variables indicating the animal, the mother of the animal, sex of the animal, and two quantitative traits named 'BWT' and 'TARSUS'.
pedi; dataset with 2 columns indicating the sire and the dam of the animals contained in the gryphon dataset.
A; additive relationship matrix formed using the 'getA()' function used over the pedi dataframe.
Usage
data("DT_gryphon")
Format
The format is: chr "DT_gryphon"
Source
This data comes from the Journal of Animal Ecology. Please, if using this data cite Wilson et al. publication. If using our mixed model solver please cite Covarrubias' publication.
References
Wilson AJ, et al. (2010) An ecologist's guide to the animal model. Journal of Animal Ecology 79(1): 13-26.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_gryphon)
DT <- DT_gryphon
A <- A_gryphon
P <- P_gryphon
#### look at the data
head(DT)
############## sommer #################
if(requireNamespace("sommer")){
library(sommer)
mix1 <- mmes(BWT~1,
random=~vsm(ism(ANIMAL),Gu=A),
rcov=~units,
data=DT)
summary(mix1)$varcomp
## mmes algorithm uses the inverse of the relationship matrix
## if you select henderson=TRUE
Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
####====================####
#### multivariate model ####
#### 2 traits ####
####====================####
head(DT)
traits <- c("BWT","TARSUS")
DT[,traits] <- apply(DT[,traits],2,scale)
DTL <- reshape(DT[,c("ANIMAL","MOTHER","BYEAR","SEX", traits)],
idvar = c("ANIMAL","MOTHER","BYEAR","SEX"),
varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait,ANIMAL)), ]
head(DTL)
# #### fit the multivariate model with no fixed effects (intercept only)
mix2 <- mmes(value~trait, # henderson=TRUE,
random=~vsm(usm(trait),ism(ANIMAL),Gu=A),
rcov=~vsm(dsm(trait),ism(units)),
data=DTL)
summary(mix2)$varcomp
cov2cor(mix2$theta[[1]])
}
############## lme4breeding #################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## fit the model with no fixed effects (intercept only)
mix1 <- lmeb(BWT~ (1|ANIMAL),
relmat = list(ANIMAL=A),
data=DT)
vc <- VarCorr(mix1); print(vc,comp=c("Variance"))
sigma(mix1)^2 # error variance
BLUP <- ranef(mix1, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
### multi-trait model
traits <- c("BWT","TARSUS")
for(iTrait in traits){DT[,iTrait] <- scale(imputev(DT[,iTrait]))}
DTL <- reshape(DT[,c("ANIMAL", traits)], idvar = "ANIMAL", varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait)), ]
head(DTL)
system.time(
mix <- lmeb(value ~ (0+trait|ANIMAL),
relmat = list(ANIMAL=A),
# rotation = TRUE,
data=DTL)
)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
cov2cor(vc$ANIMAL)
sigma(mix)^2 # error variance
}
Broad sense heritability calculation.
Description
This dataset contains phenotpic data for 41 potato lines evaluated in 5 locations across 3 years in an RCBD design. The phenotypic trait is tuber quality and we show how to obtain an estimate of DT_h2 for the trait.
Usage
data("DT_h2")
Format
The format is: chr "DT_h2"
Source
This data was generated by a potato study.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_h2)
DT <- DT_h2
head(DT)
############### sommer ################
if(requireNamespace("sommer")){
library(sommer)
DT=DT[with(DT, order(Env)), ]
ans1 <- mmes(y~Env, henderson=TRUE,
random=~vsm(dsm(Env),ism(Name)) + vsm(dsm(Env),ism(Block)),
rcov=~vsm(dsm(Env),ism(units)),
data=DT)
summary(ans1)$varcomp
}
############### lme4breeding ################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
# model with:
# Main+Diagonal for Name
# Diagonal for Block
# Diagonal for residuals
DT=DT[with(DT, order(Env)), ]
ans1b <- lmeb(y ~ Env + (Env||Name) + (0+Env||Block) + (0+Env||unitsR),
verbose = 1L,
data=DT)
vc <- VarCorr(ans1b); print(vc,comp=c("Variance"))
sigma(ans1b)^2 # error variance
BLUP <- ranef(ans1b, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
############### end ################
half diallel data for corn hybrids
Description
This dataset contains phenotpic data for 21 corn hybrids, with 2 technical repetitions, coming from a half diallel design and evaluated for sugar content. The column geno indicates the hybrid and male and female origin columns are included as well.
Usage
data("DT_halfdiallel")
Format
The format is: chr "DT_halfdiallel"
Source
This data was generated by a corn study.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data("DT_halfdiallel")
DT <- DT_halfdiallel
head(DT)
DT$femalef <- as.factor(DT$female)
DT$malef <- as.factor(DT$male)
DT$genof <- as.factor(DT$geno)
############# sommer #############
if(requireNamespace("sommer")){
library(sommer)
A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A # if you want to provide a covariance matrix
#### model using overlay
modh <- mmes(sugar~1,
random=~vsm(ism(overlay(femalef,malef, sparse = FALSE)), Gu=A)
+ genof,
data=DT)
summary(modh)$varcomp
# if ussing mmes=TRUE provide Gu with inverses and give more iterations
# Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
# Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
# attr(Ai, 'inverse')=TRUE
}
############# lme4breeding #############
if(requireNamespace("lme4breeding")){
library(lme4breeding)
# overlay matrix to be added to the addmat argument
Z <- with(DT, overlay(femalef,malef) )
## model using overlay without relationship matrix
modh <- lmeb(sugar ~ (1|genof) + (1|fema),
addmat = list(fema=Z),
data=DT)
vc <- VarCorr(modh); print(vc,comp=c("Variance"))
## model using overlay with relationship matrix
## relationship matrix to be added to the relmat argument
A <- diag(7); colnames(A) <- rownames(A) <- 1:7;A
modh <- lmeb(sugar ~ (1|genof) + (1|fema),
addmat = list(fema=Z),
relmat = list(fema=A),
data=DT)
vc <- VarCorr(modh); print(vc,comp=c("Variance"))
sigma(modh)^2 # error variance
BLUP <- ranef(modh, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
############# end #############
Data to fit indirect genetic effects.
Description
This dataset contains phenotpic data for 98 individuals where they are measured with the purpose of identifying the effect of the neighbour in a focal individual.
Usage
data("DT_ige")
Format
The format is: chr "DT_ige"
Source
This data was masked from a shared study.
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_ige)
DT <- DT_ige
########################################
############### sommer #################
########################################
if(requireNamespace("sommer")){
library(sommer)
# Indirect genetic effects model without covariance between DGE and IGE
modIGE <- mmes(trait ~ block, dateWarning = FALSE,
random = ~ focal + neighbour,
rcov = ~ units, nIters=100,
data = DT)
summary(modIGE)$varcomp
pmonitor(modIGE)
# Indirect genetic effects model with covariance between DGE and IGE using relationship matrices
modIGE <- mmes(trait ~ block, dateWarning = FALSE,
random = ~ covm( vsm(ism(focal)), vsm(ism(neighbour)) ),
rcov = ~ units, nIters=100,
data = DT)
summary(modIGE)$varcomp
pmonitor(modIGE)
# form relationship matrix
Ai <- solve(A_ige + diag(1e-5, nrow(A_ige),nrow(A_ige) ))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
# Indirect genetic effects model with covariance between DGE and IGE using relationship matrices
modIGE <- mmes(trait ~ block, dateWarning = FALSE,
random = ~ covm( vsm(ism(focal), Gu=Ai), vsm(ism(neighbour), Gu=Ai) ),
rcov = ~ units, nIters=100,
data = DT)
summary(modIGE)$varcomp
pmonitor(modIGE)
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
# Indirect genetic effects model without covariance between DGE and IGE
modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour),
data = DT)
vc <- VarCorr(modIGE); print(vc,comp=c("Variance"))
## Add relationship matrices
A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) )
modIGE <- lmeb(trait ~ block + (1|focal) + (1|neighbour),
relmat = list(focal=A_ige,
neighbour=A_ige),
data = DT)
vc <- VarCorr(modIGE); print(vc,comp=c("Variance"))
## Indirect genetic effects model with covariance between DGE and IGE using relationship matrices
## Relationship matrix
A_ige <- A_ige + diag(1e-4, ncol(A_ige), ncol(A_ige) )
## Define 2 dummy variables to make a fake covariance
## for two different random effects
DT$fn <- DT$nn <- 1
## Create the incidence matrix for the first random effect
Zf <- Matrix::sparse.model.matrix( ~ focal-1, data=DT )
colnames(Zf) <- gsub("focal","", colnames(Zf))
## Create the incidence matrix for the second random effect
Zn <- Matrix::sparse.model.matrix( ~ neighbour-1, data=DT )
colnames(Zn) <- gsub("neighbour","", colnames(Zn))
## Fit the model
modIGE <- lmeb(trait ~ block + (0+fn+nn|both),
addmat = list(both=list(Zf,Zn)),
relmat = list(both=A_ige),
data = DT)
vc <- VarCorr(modIGE); print(vc,comp=c("Variance"))
sigma(modIGE)^2 # error variance
blups <- ranef(modIGE)
condVAR <- lapply(blups, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
pairs(blups$both)
cov2cor(vc$both)
}
########## end ###########
Simulated data for random regression
Description
A data frame with 4 columns; SUBJECT, X, Xf and Y to show how to use the Legendre polynomials in the lmeb function using a numeric variable X and a response variable Y.
Usage
data("DT_legendre")
Format
The format is: chr "DT_legendre"
Source
This data was simulated for fruit breeding applications.
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_legendre)
DT <- DT_legendre
head(DT)
########################################
############### sommer #################
########################################
if(requireNamespace("sommer")){
library(sommer)
library(orthopolynom)
mRR2<-mmes(Y~ 1 + Xf
, random=~ vsm(usm(leg(X,1)),ism(SUBJECT))
, rcov=~units
, data=DT)
summary(mRR2)$varcomp
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
library(orthopolynom)
Z <- with(DT, smm(leg(X,1)) )
for(i in 1:ncol(Z)){DT[,colnames(Z)[i]] <- Z[,i]}
## diagonal random regression Y ~ Xf + (0+leg0+leg1|| SUBJECT)
## unstructured random regression Y ~ Xf + (0+leg0+leg1| SUBJECT)
mRR2b<-lmeb(Y ~ Xf + (0+leg0+leg1| SUBJECT),
, data=DT)
vc <- VarCorr(mRR2b); print(vc,comp=c("Variance"))
sigma(mRR2b)^2 # error variance
BLUP <- ranef(mRR2b, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Full diallel data for corn hybrids
Description
This dataset contains phenotpic data for 36 winter bean hybrids, coming from a full diallel design and evaluated for 9 traits. The column male and female origin columns are included as well.
Usage
data("DT_mohring")
Format
The format is: chr "DT_mohring"
Source
This data was generated by a winter bean study and originally included in the agridat package.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_mohring)
DT <- DT_mohring
head(DT)
DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2")
head(DT2)
## is.cross denotes a hybrid (1)
## is.self denotes an inbred (1)
## cross.type denotes one way (-1, e.g. AxB) and reciprocal (1, e.g., BxA) and no cross (0)
## cross.id denotes the name of the cross (same name for direct & reciprocal)
##############################################
############### sommer. #################
##############################################
if(requireNamespace("sommer")){
library(sommer)
# GRIFFING MODEL 2 with reciprocal effects ###########################
mod1hb <- mmes(Ftime ~ 1, data=DT2,
random = ~ Block
# GCA male & female overlayed
+ vsm(ism(overlay(Par1, Par2)))
# SCA effects (includes cross and selfs)
+ cross.id
# SCAR reciprocal effects (remaining variance in crosses;
# if zero there's no reciprocal effects)
+ vsm(dsm(cross.type), ism(cross.id)) )
summary(mod1hb)$varcomp
## VarComp VarCompSE Zratio
## Block.Ftime-Ftime 0.00000 9.32181 0.000000
## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916
## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914
## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057
## units.Ftime-Ftime 418.47949 74.56442 5.612321
##
# GRIFFING MODEL 2, no reciprocal effects ##############################
mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 50,
random = ~
# GCA effects for all (hybrids and inbreds)
vsm(ism(overlay(Par1, Par2)))
# GCA effect (calculated only in hybrids; remaining variance)
+ vsm(ism(is.cross),ism(overlay(Par1, Par2)))
# SCA effect (calculated in hybrids only)
+ vsm(ism(is.cross), ism(cross.id))
)
summary(mod1h)$varcomp
## VarComp VarCompSE Zratio
## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347
## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560
## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335
## units.Ftime-Ftime 501.6275 74.36075 6.745864
##
# GRIFFING MODEL 3, no reciprocal effects ###############################
mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100,
random = ~
# GCAC (only for hybrids)
vsm(ism(is.cross),ism(overlay(Par1, Par2)))
# male GCA (only for inbreds)
+ vsm(ism(is.self),ism(Par1))
# SCA (for hybrids only)
+ vsm(ism(is.cross), ism(cross.id))
)
summary(mod1h)$varcomp
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797
## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488
## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804
## units.Ftime-Ftime 498.5974 73.92066 6.745035
##
# GRIFFING MODEL 2, with reciprocal effects #############################
# In Mohring: mixed model 3 reduced
mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2, nIters = 100,
random = ~
# GCAC (for hybrids only)
vsm(ism(is.cross),ism(overlay(Par1, Par2)))
# male GCA (for selfs only)
+ vsm(ism(is.self),ism(Par1))
# SCA (for hybrids only)
+ vsm(ism(is.cross), ism(cross.id))
# SCAR reciprocal effects (remaning SCA variance)
+ vsm(ism(cross.type), ism(cross.id))
)
summary(mod1h)$varcomp
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833
## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013
## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318
## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113
## units.Ftime-Ftime 416.82960 74.27202 5.612203
##
# GRIFFING MODEL 3, with RGCA + RSCA ####################################
# In Mohring: mixed model 3
mod1h <- mmes(Ftime ~ Block + is.cross, data=DT2,nIters = 100,
random = ~
# GCAC (for hybrids only)
vsm(ism(is.cross),ism(overlay(Par1, Par2)))
# RGCA: exclude selfs (to identify reciprocal GCA effects)
+ vsm(ism(cross.type),ism(overlay(Par1, Par2)))
# male GCA (for selfs only)
+ vsm(ism(is.self),ism(Par1))
# SCA (for hybrids only)
+ vsm(ism(is.cross), ism(cross.id))
# SCAR: exclude selfs (if zero there's no reciprocal SCA effects)
+ vsm(ism(cross.type), ism(cross.id))
)
summary(mod1h)$varcomp
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857
## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654
## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068
## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114
## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054
## units.Ftime-Ftime
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## GRIFFING MODEL 2 with reciprocal effects ###########################
## overlay matrix to be added to the addmat argument
Z <- with(DT, overlay(Par1, Par2) )
fema <- (rep(colnames(Z), nrow(Z)))[1:nrow(Z)]
mod1h <- lmeb(Ftime ~ 1 + (1|Block) +
## GCA male & female overlayed
(1|fema) +
## SCA effects (includes cross and selfs)
(cross.type||cross.id),
addmat=list(fema=Z),
data=DT2 )
vc <- VarCorr(mod1h); print(vc,comp=c("Variance"))
BLUP <- ranef(mod1h, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
## VarComp VarCompSE Zratio
## Block.Ftime-Ftime 0.00000 9.32181 0.000000
## overlay(Par1, Par2).Ftime-Ftime 1276.73089 750.17269 1.701916
## cross.id.Ftime-Ftime 1110.99090 330.16921 3.364914
## cross.id:cross.type.Ftime-Ftime 66.02295 49.26876 1.340057
## units.Ftime-Ftime 418.47949 74.56442 5.612321
##
## GRIFFING MODEL 2, no reciprocal effects ##############################
mod1h <- lmeb(Ftime ~ Block + is.cross +
## GCA male & female overlayed main and rr
## GCA effect (calculated only in hybrids; remaining variance)
(is.cross||fema) +
## SCA effect (calculated in hybrids only)
(0+is.cross||cross.id),
addmat=list(fema=Z),
data=DT2 )
vc <- VarCorr(mod1h); print(vc,comp=c("Variance"))
## VarComp VarCompSE Zratio
## overlay(Par1, Par2).Ftime-Ftime 2304.1781 1261.63193 1.826347
## overlay(Par1, Par2):is.cross.Ftime-Ftime 613.6040 402.74347 1.523560
## cross.id:is.cross.Ftime-Ftime 340.7030 148.56225 2.293335
## units.Ftime-Ftime 501.6275 74.36075 6.745864
##
## GRIFFING MODEL 3, no reciprocal effects ###############################
mod1h <- lmeb(Ftime ~ Block + is.cross +
## GCAC (only for hybrids)
(0+is.cross||fema) +
## male GCA (only for inbreds)
(0+is.self||Par1) +
## SCA (for hybrids only)
(0+is.cross||cross.id),
addmat=list(fema=Z),
data=DT2 )
vc <- VarCorr(mod1h); print(vc,comp=c("Variance"))
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7895 537.91218 1.724797
## Par1:is.self.Ftime-Ftime 9960.9247 5456.58188 1.825488
## cross.id:is.cross.Ftime-Ftime 341.4567 148.53667 2.298804
## units.Ftime-Ftime 498.5974 73.92066 6.745035
##
## GRIFFING MODEL 2, with reciprocal effects #############################
## In Mohring: mixed model 3 reduced
mod1h <- lmeb(Ftime ~ Block + is.cross +
## GCAC (for hybrids only)
(0+is.cross||fema) +
## male GCA (for selfs only)
(0+is.self||Par1) +
## SCA (for hybrids only)
(0+is.cross||cross.id)+
## SCAR reciprocal effects (remaning SCA variance)
(0+cross.type||cross.id),
addmat=list(fema=Z),
data=DT2 )
vc <- VarCorr(mod1h); print(vc,comp=c("Variance"))
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.78742 537.89981 1.724833
## Par1:is.self.Ftime-Ftime 10001.78854 5456.47578 1.833013
## cross.id:is.cross.Ftime-Ftime 361.89712 148.54264 2.436318
## cross.id:cross.type.Ftime-Ftime 66.43695 49.24492 1.349113
## units.Ftime-Ftime 416.82960 74.27202 5.612203
##
## GRIFFING MODEL 3, with RGCA + RSCA ####################################
## In Mohring: mixed model 3
mod1h <- lmeb(Ftime ~ Block + is.cross +
## GCAC (for hybrids only)
(0+is.cross||fema) +
## RGCA: exclude selfs (to identify reciprocal GCA effects)
(0+cross.type||fema) +
## male GCA (for selfs only)
(0+is.self||Par1) +
## SCA (for hybrids only)
(0+is.cross||cross.id)+
## SCAR reciprocal effects (remaning SCA variance)
(0+cross.type||cross.id),
addmat=list(fema=Z),
data=DT2 )
vc <- VarCorr(mod1h); print(vc,comp=c("Variance"))
## VarComp VarCompSE Zratio
## overlay(Par1, Par2):is.cross.Ftime-Ftime 927.7843 537.88164 1.7248857
## Par1:is.self.Ftime-Ftime 10001.7570 5456.30125 1.8330654
## cross.id:is.cross.Ftime-Ftime 361.8958 148.53670 2.4364068
## overlay(Par1, Par2):cross.type.Ftime-Ftime 17.9799 19.92428 0.9024114
## cross.id:cross.type.Ftime-Ftime 30.9519 46.43908 0.6665054
## units.Ftime-Ftime 416.09922 447.2101 0.93043333
}
Genotypic and Phenotypic data for a potato polyploid population
Description
This dataset contains phenotpic data for 18 traits measured in 187 individuals from a potato diversity panel. In addition contains genotypic data for 221 individuals genotyped with 3522 SNP markers. Please if using this data for your own research make sure you cite Rosyara's (2015) publication (see References).
Usage
data("DT_polyploid")
Format
The format is: chr "DT_polyploid"
Source
This data was extracted from Rosyara (2016).
References
If using this data for your own research please cite:
Rosyara Umesh R., Walter S. De Jong, David S. Douches, Jeffrey B. Endelman. Software for genome-wide association studies in autopolyploids and its application to potato. The Plant Genome 2015.
Other references:
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_polyploid)
DT <- DT_polyploid
GT <- GT_polyploid
MP <- MP_polyploid
## convert markers to numeric format
numo <- atcg1234(data=GT[,1:100], ploidy=4);
numo$M[1:5,1:5];
numo$ref.allele[,1:5]
## plants with both genotypes and phenotypes
common <- intersect(DT$Name,rownames(numo$M))
## get the markers and phenotypes for such inds
marks <- numo$M[common,]; marks[1:5,1:5]
DT2 <- DT[match(common,DT$Name),];
DT2 <- as.data.frame(DT2)
DT2[1:5,]
## Additive relationship matrix, specify ploidy
A <- A.matr(marks)
##############################################
############### sommer. #################
##############################################
if(requireNamespace("sommer")){
library(sommer)
A <- A.mat(marks)
D <- D.mat(marks)
###=========================================####
### run as mixed model
###=========================================####
ans <- mmes(tuber_shape~1,
random=~vsm(ism(Name), Gu=A),
data=DT2)
summary(ans)$varcomp
# if using mmes=TRUE provide Gu as inverse
Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## run as mixed model
A <- A + diag(1e-4,ncol(A),ncol(A))
ans <- lmeb(tuber_shape~ (1|Name),
relmat = list(Name=A),
data=DT2)
vc <- VarCorr(ans); print(vc,comp=c("Variance"))
sigma(ans)^2 # error variance
BLUP <- ranef(ans, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Rice lines dataset
Description
Information from a collection of 413 rice lines. The DT_rice data set is from Rice Diversity Org. Program. The lines are genotyped with 36,901 SNP markers and phenotyped for more than 30 traits. This data set was included in the package to play with it. If using it for your research make sure you cite the original publication from Zhao et al.(2011).
Usage
data(DT_rice)
Format
RicePheno contains the phenotypes RiceGeno contains genotypes letter code RiceGenoN contains the genotypes in numerical code using atcg1234 converter function
Source
Rice Diversity Organization http://www.ricediversity.org/data/index.cfm.
References
Keyan Zhao, Chih-Wei Tung, Georgia C. Eizenga, Mark H. Wright, M. Liakat Ali, Adam H. Price, Gareth J. Norton, M. Rafiqul Islam, Andy Reynolds, Jason Mezey, Anna M. McClung, Carlos D. Bustamante & Susan R. McCouch (2011). Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Comm 2:467 DOI: 10.1038/ncomms1467, Published Online 13 Sep 2011.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
data(DT_rice)
DT <- DT_rice
GT <- GT_rice
GTn <- GTn_rice
head(DT)
M <- atcg1234(GT)
##############################################
############### sommer. #################
##############################################
if(requireNamespace("sommer")){
library(sommer)
A <- A.mat(M$M)
mix <- mmes(Protein.content~1,
random = ~vsm(ism(geno), Gu=A) + geno,
rcov=~units,
data=DT)
summary(mix)$varcomp
# if using henderson=TRUE provide Gu as inverse
Ai <- solve(A + diag(1e-6,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
## MULTI-TRAIT MODEL
## reshape in long format the dataset
traits <- c("Protein.content","Flag.leaf.length")
DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait)), ]
head(DTL)
M <- DTLM <- atcg1234(GT)
A <- A.mat(M$M)
mix <- mmes(value~trait,
random = ~vsm(usm(trait),ism(geno), Gu=A) ,
rcov=~vsm(dsm(trait), ism(units)),
data=DTL)
summary(mix)$varcomp
cov2cor(mix$theta$`vsm(usm(trait), ism(geno), Gu = A`)
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
### univariate model
A <- A.matr(M$M)
A <- A + diag(1e-4, ncol(A), ncol(A))
mix <- lmeb(Protein.content ~ (1|geno),
relmat = list(geno=A),
data=DT)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
sigma(mix)^2 # error variance
### multi-trait model
traits <- c("Flowering.time.at.Arkansas" ,"Seed.volume", "Protein.content")
for(iTrait in traits){DT[,iTrait] <- scale(DT[,iTrait])}
DTL <- reshape(DT[,c("geno", traits)], idvar = "geno", varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait, geno)), ]
head(DTL)
system.time(
mix <- lmeb(value ~ (0+trait|geno),
relmat = list(geno=A),
rotation = TRUE,
data=DTL)
)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
vc$geno
sigma(mix)^2 # error variance
BLUP <- ranef(mix, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Reaction times in a sleep deprivation study
Description
The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. Data from sleepstudy to see how lme4 models can be translated in sommer.
Usage
data("DT_sleepstudy")
Format
The format is: chr "DT_sleepstudy"
Source
These data are from the study described in Belenky et al. (2003), for the sleep deprived group and for the first 10 days of the study, up to the recovery period.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Gregory Belenky et al. (2003) Patterns of performance degradation and restoration during sleep restrictions and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1-12.
Examples
data(DT_sleepstudy)
DT <- DT_sleepstudy
head(DT)
##############################################
############### sommer. #################
##############################################
if(requireNamespace("sommer")){
library(sommer)
##################################
fm2 <- mmes(Reaction ~ Days,
random= ~ Subject,
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
##################################
fm2 <- mmes(Reaction ~ Days,
random= ~ Subject + vsm(ism(Days), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
##################################
fm2 <- mmes(Reaction ~ Days,
random= ~ Subject + vsm(ism(Days), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
##################################
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(ism(Days), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
##################################
fm1 <- lmeb(Reaction ~ Days + (1 | Subject),
data=DT)
vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
sigma(fm1)^2 # error variance
##################################
fm1 <- lmeb(Reaction ~ Days + (Days || Subject), data=DT)
vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
##################################
fm1 <- lmeb(Reaction ~ Days + (Days | Subject), data=DT)
vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
##################################
fm1 <- lmeb(Reaction ~ Days + (0 + Days | Subject), data=DT)
vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
BLUP <- ranef(fm1, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Genotypic and Phenotypic data from single cross hybrids (Technow et al.,2014)
Description
This dataset contains phenotpic data for 2 traits measured in 1254 single cross hybrids coming from the cross of Flint x Dent heterotic groups. In addition contains the genotipic data (35,478 markers) for each of the 123 Dent lines and 86 Flint lines. The purpose of this data is to demosntrate the prediction of unrealized crosses (9324 unrealized crosses, 1254 evaluated, total 10578 single crosses). We have added the additive relationship matrix (A) but can be easily obtained using the A.matr function on the marker data. Please if using this data for your own research cite Technow et al. (2014) publication (see References).
Usage
data("DT_technow")
Format
The format is: chr "DT_technow"
Source
This data was extracted from Technow et al. (2014).
References
If using this data for your own research please cite:
Technow et al. 2014. Genome properties and prospects of genomic predictions of hybrid performance in a Breeding program of maize. Genetics 197:1343-1355.
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
Examples
data(DT_technow)
DT <- DT_technow
Md <- apply(Md_technow,2,as.numeric)
rownames(Md) <- rownames(Md_technow)
Mf <- apply(Mf_technow,2,as.numeric)
rownames(Mf) <- rownames(Mf_technow)
Md <- (Md*2) - 1
Mf <- (Mf*2) - 1
Ad <- A.matr(Md)
Af <- A.matr(Mf)
Ad <- Ad + diag(1e-4, ncol(Ad), ncol(Ad))
Af <- Af + diag(1e-4, ncol(Af), ncol(Af))
################# sommer ####################
if(requireNamespace("sommer")){
library(sommer)
ans2 <- mmes(GY~1,
random=~vsm(ism(dent),Gu=Ad) + vsm(ism(flint),Gu=Af),
rcov=~units,
data=DT)
summary(ans2)$varcomp
Adi <- solve(Ad + diag(1e-4,ncol(Ad),ncol(Ad)))
Adi <- as(as(as( Adi, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Adi, 'inverse')=TRUE
Afi <- solve(Af + diag(1e-4,ncol(Af),ncol(Af)))
Afi <- as(as(as( Afi, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Afi, 'inverse')=TRUE
####====================####
#### multivariate model ####
#### 2 traits ####
####====================####
head(DT)
traits <- c("GY","GM")
DT[,traits] <- apply(DT[,traits],2,scale)
DTL <- reshape(DT[,c("hybrid","dent","flint", traits)],
idvar = c("hybrid","dent","flint"),
varying = traits,
v.names = "value", direction = "long",
timevar = "trait", times = traits )
DTL <- DTL[with(DTL, order(trait,hybrid)), ]
head(DTL)
M <- rbind(Md,Mf)
A <- A.mat(M)
Ai <- solve(A + diag(1e-4,ncol(A),ncol(A)))
Ai <- as(as(as( Ai, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ai, 'inverse')=TRUE
ans3 <- mmes(value~trait, henderson=TRUE,
random=~vsm(usm(trait),ism(overlay(dent,flint)),Gu=Ai),
rcov=~ vsm(dsm(trait), ism(units)),
data=DTL)
summary(ans3)
cov2cor(ans3$theta[[1]])
}
################# lme4breeding ####################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
## simple model
ans2 <- lmeb(GY ~ (1|dent) + (1|flint),
data=DT)
vc <- VarCorr(ans2); print(vc,comp=c("Variance"))
BLUP <- ranef(ans2, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
### with relationship matrices
ans2 <- lmeb(GY ~ (1|dent) + (1|flint),
relmat = list(dent=Ad,
flint=Af),
data=DT)
vc <- VarCorr(ans2); print(vc,comp=c("Variance"))
### overlayed model
M <- rbind(Md,Mf)
A <- A.matr(M)
A <- A + diag(1e-4,ncol(A), ncol(A))
Z <- with(DT, overlay(dent,flint) )
Z = Z[which(!is.na(DT$GY)),]
#### model using overlay without relationship matrix
ans2 <- lmeb(GY ~ (1|fema),
addmat = list(fema=Z),
relmat = list(fema=A),
data=DT)
vc <- VarCorr(ans2); print(vc,comp=c("Variance"))
sigma(ans2)^2 # error variance
BLUP <- ranef(ans2, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
### rotated model for hybrids
H <- kronecker(Ad,Af, make.dimnames = TRUE)
H <- H[which(colnames(H)%in% DT$hy),which(colnames(H)%in% DT$hy)]
ans3 <- lmeb(GY ~ (1|hy),
relmat = list(hy=H),
rotation=TRUE,
data=DT)
vc <- VarCorr(ans3); print(vc,comp=c("Variance"))
BLUP <- ranef(ans3, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
################# evola ####################
if(requireNamespace("evola")){
library(evola)
DT <- DT_technow
DT$occ <- 1; DT$occ[1]=0
combos <- build.HMM(Md,Mf, return.combos.only = TRUE)
combos <- combos$data.used[which(combos$data.used$hybrid %in%DT$hy),]
M <- build.HMM(Md,Mf, custom.hyb = combos)
A <- A.matr(M$HMM.add)
A <- A[DT$hy,DT$hy]
# run the genetic algorithm
# we assig a weight to x'Dx of (20*pi)/180=0.34
res<-evolafit(formula = c(GY, occ)~hy,
dt= DT,
# constraints: if sum is greater than this ignore
constraintsUB = c(Inf,100),
# constraints: if sum is smaller than this ignore
constraintsLB= c(-Inf,-Inf),
# weight the traits for the selection
b = c(1,0),
# population parameters
nCrosses = 100, nProgeny = 10,
recombGens=1, nChr=1, mutRateAllele=0,
# coancestry parameters
D=A, lambda= (20*pi)/180 , nQtlStart = 90,
# selection parameters
propSelBetween = 0.5, propSelWithin =0.5,
nGenerations = 20)
Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2
best = bestSol(res$pop)[,"fitness"]
qa = (Q %*% DT$GY)[best,]; qa
qAq = Q[best,] %*% A %*% Q[best,]; qAq
sum(Q[best,]) # total # of inds selected
evolmonitor(res)
plot(DT$GY, col=as.factor(Q[best,]),
pch=(Q[best,]*19)+1)
pareto(res)
}
################## end ###################
wheat lines dataset
Description
Information from a collection of 599 historical CIMMYT wheat lines. The wheat data set is from CIMMYT's Global Wheat Program. Historically, this program has conducted numerous international trials across a wide variety of wheat-producing environments. The environments represented in these trials were grouped into four basic target sets of environments comprising four main agroclimatic regions previously defined and widely used by CIMMYT's Global Wheat Breeding Program. The phenotypic trait considered here was the average grain yield (GY) of the 599 wheat lines evaluated in each of these four mega-environments.
A pedigree tracing back many generations was available, and the Browse application of the International Crop Information System (ICIS), as described in (McLaren et al. 2000, 2005) was used for deriving the relationship matrix A among the 599 lines; it accounts for selection and inbreeding.
Wheat lines were recently genotyped using 1447 Diversity Array Technology (DArT) generated by
Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). The DArT markers
may take on two values, denoted by their presence or absence. Markers with a minor allele frequency
lower than 0.05 were removed, and missing genotypes were imputed with samples from the marginal
distribution of marker genotypes, that is, x_{ij}=Bernoulli(\hat p_j), where \hat p_j
is the estimated allele frequency computed from the non-missing genotypes. The number of DArT
MMs after edition was 1279.
Usage
data(DT_wheat)
Format
Matrix Y contains the average grain yield, column 1: Grain yield for environment 1 and so on.
Source
International Maize and Wheat Improvement Center (CIMMYT), Mexico.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
McLaren, C. G., L. Ramos, C. Lopez, and W. Eusebio. 2000. “Applications of the geneaology manegment system.” In International Crop Information System. Technical Development Manual, version VI, edited by McLaren, C. G., J.W. White and P.N. Fox. pp. 5.8-5.13. CIMMyT, Mexico: CIMMyT and IRRI.
Giovanny Covarrubias-Pazaran (2024). evola: a simple evolutionary algorithm for complex problems. To be submitted to Bioinformatics.
Gaynor, R. Chris, Gregor Gorjanc, and John M. Hickey. 2021. AlphaSimR: an R package for breeding program simulations. G3 Gene|Genomes|Genetics 11(2):jkaa017. https://doi.org/10.1093/g3journal/jkaa017.
Chen GK, Marjoram P, Wall JD (2009). Fast and Flexible Simulation of DNA Sequence Data. Genome Research, 19, 136-142. http://genome.cshlp.org/content/19/1/136.
Examples
data(DT_wheat)
DT <- DT_wheat
GT <- apply(GT_wheat,2,as.numeric)
rownames(GT) <- rownames(GT_wheat)
DT <- data.frame(pheno=as.vector(DT),
env=as.factor(paste0("e", sort(rep(1:4,nrow(DT))))),
id=rep(rownames(DT),4))
K <- tcrossprod(GT-1)
m <- sum(diag(K))/nrow(K)
K <- K/m
K <- K + diag(1e-05, ncol(K), ncol(K))
K[1:4,1:4]
##
head(DT)
############## sommer ###############
if(requireNamespace("sommer")){
library(sommer)
mix0 <- mmes(pheno~1,
random = ~vsm(ism(id),Gu=K),
rcov=~units,
data=DT[which(DT$env == "e1"),])
summary(mix0)$varcomp
# if using mmes=TRUE provide Gu as inverse
Ki <- solve(K + diag(1e-4,ncol(K),ncol(K)))
Ki <- as(as(as( Ki, "dMatrix"), "generalMatrix"), "CsparseMatrix")
attr(Ki, 'inverse')=TRUE
}
############## lme4breeding ###############
if(requireNamespace("lme4breeding")){
library(lme4breeding)
#### main effect model
system.time(
mix0 <- lmeb(pheno ~ (1|id),
relmat = list(id=K),
data=DT)
)
vc <- VarCorr(mix0); print(vc,comp=c("Variance"))
sigma(mix0)^2 # error variance
BLUP <- ranef(mix0, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
#### unstructured model
DT <- DT[with(DT, order(env, id)), ] # sort for rotation
system.time(
mix1 <- lmeb(pheno ~ (0 + env | id),
relmat = list(id=K),
rotation = TRUE,
data=DT)
)
vc <- VarCorr(mix1); print(vc,comp=c("Variance"))
sigma(mix1)^2 # error variance
}
############## evola ###############
if(requireNamespace("evola")){
library(evola)
DT <- as.data.frame(DT_wheat)
DT$id <- rownames(DT) # IDs
DT$occ <- 1; DT$occ[1]=0 # to track occurrences
DT$dummy <- 1; DT$dummy[1]=0 # dummy trait
# if genomic
G <- K
# if pedigree
A <- A_wheat
A[1:4,1:4]
##Perform eigenvalue decomposition for clustering
##And select cluster 5 as target set to predict
pcNum=25
svdWheat <- svd(A, nu = pcNum, nv = pcNum)
PCWheat <- A %*% svdWheat$v
rownames(PCWheat) <- rownames(A)
DistWheat <- dist(PCWheat)
TreeWheat <- cutree(hclust(DistWheat), k = 5 )
plot(PCWheat[,1], PCWheat[,2], col = TreeWheat,
pch = as.character(TreeWheat), xlab = "pc1", ylab = "pc2")
vp <- rownames(PCWheat)[TreeWheat == 3]; length(vp)
tp <- setdiff(rownames(PCWheat),vp)
As <- A[tp,tp]
DT2 <- DT[rownames(As),]
DT2$cov <- apply(A[tp,vp],1,mean)
# we assign a weight to x'Dx of (60*pi)/180=1
res<-evolafit(formula=cbind(cov, occ)~id, dt= DT2,
# constraints: if sum is greater than this ignore
constraintsUB = c(Inf, 100),
# constraints: if sum is smaller than this ignore
constraintsLB= c(-Inf, -Inf),
# weight the traits for the selection
b = c(1,0),
# population parameters
nCrosses = 100, nProgeny = 10,
recombGens=1, nChr=1, mutRateAllele=0,
# coancestry parameters
D=As, lambda= (60*pi)/180 , nQtlStart = 90,
# selection parameters
propSelBetween = 0.5, propSelWithin =0.5,
nGenerations = 30, verbose = TRUE)
Q <- pullQtlGeno(res$pop, simParam = res$simParam, trait=1); Q <- Q/2
best <- bestSol(res$pop)[,"fitness"]
sum(Q[best,]) # total # of inds selected
pareto(res)
}
################ end ##################
Yield of oats in a split-block experiment
Description
The yield of oats from a split-plot field trial using three varieties and four levels of manurial treatment. The experiment was laid out in 6 blocks of 3 main plots, each split into 4 sub-plots. The varieties were applied to the main plots and the manurial (nitrogen) treatments to the sub-plots.
Format
blockblock factor with 6 levels
nitronitrogen treatment in hundredweight per acre
Varietygenotype factor, 3 levels
yieldyield in 1/4 lbs per sub-plot, each 1/80 acre.
rowrow location
columncolumn location
Source
Yates, Frank (1935) Complex experiments, Journal of the Royal Statistical Society Suppl. 2, 181–247.
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
Examples
data(DT_yatesoats)
DT <- DT_yatesoats
head(DT)
##############################################
############### sommer. #################
##############################################
if(requireNamespace("sommer")){
library(sommer)
m3 <- mmes(fixed=Y ~ V + N + V:N,
random = ~ B + B:MP,
rcov=~units,
data = DT)
summary(m3)$varcomp
}
##############################################
############### lme4breeding #################
##############################################
if(requireNamespace("lme4breeding")){
library(lme4breeding)
mix <- lmeb(Y ~ V + N + V:N +
(1|B) + (1|B:MP),
data = DT)
vc <- VarCorr(mix); print(vc,comp=c("Variance"))
sigma(mix)^2 # error variance
BLUP <- ranef(mix, condVar=TRUE)
condVAR <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
}
Identity relationship matrix
Description
Shortcut to get an identity relationship matrix.
Usage
I.matr(x)
Arguments
x |
Vector of values to identify the unique values and form an iende. |
Details
Nothing but a shortcut to avoid few lines of code.
Value
If return.imputed = FALSE, the n \times n identity matrix is returned.
- $I
the I matrix
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
## random population of 200 lines with 1000 markers
Calculation of linkage disequilibrium decay
Description
This function calculates the LD decay based on a marker matrix and a map with distances between markers in cM or base pairs.
Usage
LD.decay(markers,map,silent=FALSE,unlinked=FALSE,gamma=0.95)
Arguments
markers |
a numeric matrix of markers (columns) by individuals (rows) in -1, 0, 1 format. |
map |
a data frame with 3 columns "Locus" (name of markers), "LG" (linkage group or chromosome), and "Position" (in cM or base pairs). |
silent |
a TRUE/FALSE value statement indicating if the program should or should not display the progress bar. silent=TRUE means that will not be displayed. |
unlinked |
a TRUE/FALSE value statement indicating if the program should or should not calculate the alpha(see next argument) percentile of interchromosomal LD. |
gamma |
a percentile value for LD breakage to be used in the calculation of interchromosomal LD extent. |
Value
- $resp
a list with 3 elements; "by.LG", "all.LG", "LDM". The first element (by.LG) is a list with as many elements as chromosomes where each contains a matrix with 3 columns, the distance, the r2 value, and the p-value associated to the chi-square test for disequilibrium. The second element (all.LG) has a big matrix with distance, r2 values and p-values, for each point from all chromosomes in a single data.frame. The third element (LDM) is the matrix of linkage disequilibrium between pairs of markers.
If unlinked is selected the program should return the gamma percentile interchromosomal LD (r2) for each chromosome and average.
References
Laido, Giovanni, et al. Linkage disequilibrium and genome-wide association mapping in tetraploid wheat (Triticum turgidum L.). PloS one 9.4 (2014): e95211.
Examples
nInds=300
nMarks=500
#random population of nInds lines with nMarks markers
M <- matrix(rep(0,nInds*nMarks),nInds,nMarks)
for (i in 1:nInds) {
M[i,] <- ifelse(runif(nMarks)<0.5,-1,1)
}
rownames(M) <- paste0("ind",1:nInds)
colnames(M) <- paste0("mark",1:nMarks)
mapM = data.frame(Locus=colnames(M),Position=1:nMarks,LG=1)
res <- LD.decay(M, mapM)
#### subset only markers with significant LD
res$all.LG <- res$all.LG[which(res$all.LG$p < .001),]
#### plot the LD decay
with(res$all.LG, plot(r2~d,col=transp("cadetblue"),
xlim=c(0,55), ylim=c(0,1),
pch=20,cex=0.5,yaxt="n",
xaxt="n",ylab=expression(r^2),
xlab="Distance in cM")
)
axis(1, at=seq(0,55,5), labels=seq(0,55,5))
axis(2,at=seq(0,1,.1), labels=seq(0,1,.1), las=1)
#### if you want to add the loess regression lines
#### this could take a long time!!!
# mod <- loess(r2 ~ d, data=res$all.LG)
# par(new=TRUE)
# lilo <- predict(mod,data.frame(d=1:55))
# plot(lilo, bty="n", xaxt="n", yaxt="n", col="green",
# type="l", ylim=c(0,1),ylab="",xlab="",lwd=2)
# res3 <- LD.decay(markers=CPgeno, map=mapCP,
# unlinked = TRUE,gamma = .95)
# abline(h=res3$all.LG, col="red")
add.diallel.vars
Description
'add.diallel.vars' adds 4 columns to the provided diallel dataset. Specifically, the user provides a dataset with indicator variables for who is the male and female parent and the function returns the same dataset with 4 new dummy variables to allow the model fit of diallel models.
Usage
add.diallel.vars(df, par1="Par1", par2="Par2",sep.cross="-")
Arguments
df |
a dataset with the two indicator variables for who is the male and female parent. |
par1 |
the name of the column indicating who is the first parent (e.g. male). |
par2 |
the name of the column indicating who is the second parent (e.g. female). |
sep.cross |
the character that should be used when creating the column for cross.id. A simple paste of the columns par1 and par2. |
Value
A new data set with the following 4 new dummy variables to allow the fit of complex diallel models:
- is.cross
returns a 0 if is a self and a 1 for a cross.
- is.self
returns a 0 if is a cross and a 1 is is a self.
- cross.type
returns a -1 for a direct cross, a 0 for a self and a 1 for a reciprocal cross.
- cross.id
returns a column psting the par1 and par2 columns.
Author(s)
Giovanny Covarrubias-Pazaran
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
DT = data.frame(Block=c( rep(1,64), rep(2,64) ),
Par1=c( sort(rep(1:8,8)), sort(rep(1:8,8)) ),
Par2=rep(1:8, 16),
Ftime=rnorm(128)
)
DT2 <- add.diallel.vars(DT,par1="Par1", par2="Par2")
head(DT2)
Binds arrays corner-to-corner
Description
Array generalization of blockdiag()
Usage
adiag1(... , pad=as.integer(0), do.dimnames=TRUE)
Arguments
... |
Arrays to be binded together |
pad |
Value to pad array with; note default keeps integer status of arrays |
do.dimnames |
Boolean, with default |
Details
Binds any number of arrays together, corner-to-corner. Because the function is associative provided pad is of length 1, this page discusses the two array case.
If x=adiag1(a,b) and dim(a)=c(a_1,...,a_d),
dim(b)=c(b_1,...,b_d); then all(dim(x)==dim(a)+dim(b)) and
x[1:a_1,...,1:a_d]=a and
x[(a_1+1):(a_1+b_1),...,(a_d+1):(a_d+b_d)]=b.
Dimnames are preserved, if both arrays have non-null dimnames, and
do.dimnames is TRUE.
Argument pad is usually a length-one vector, but any vector is
acceptable; standard recycling is used. Be aware that the output array
(of dimension dim(a)+dim(b)) is filled with (copies of)
pad before a and b are copied. This can be
confusing.
Value
Returns an array of dimensions dim(a)+dim(b) as described above.
Note
In adiag1(a,b), if a is a length-one vector, it is coerced
to an array of dimensions rep(1,length(dim(b))); likewise
b. If both a and b are length-one vectors, return
diag(c(a,b)).
If a and b are arrays, function adiag1() requires
length(dim(a))==length(dim(b)) (the function does not guess which
dimensions have been dropped; see examples section). In particular,
note that vectors are not coerced except if of length one.
adiag1() is used when padding magic hypercubes in the context
of evaluating subarray sums.
Author(s)
Peter Wolf with some additions by Robin Hankin
Examples
a <- array( 1,c(2,2))
b <- array(-1,c(2,2))
adiag1(a,b)
## dropped dimensions can count:
b2 <- b1 <- b
dim(a) <- c(2,1,2)
dim(b1) <- c(2,2,1)
dim(b2) <- c(1,2,2)
dim(adiag1(a,b1))
dim(adiag1(a,b2))
## dimnames are preserved if not null:
a <- matrix(1,2,2,dimnames=list(col=c("red","blue"),size=c("big","small")))
b <- 8
dim(b) <- c(1,1)
dimnames(b) <- list(col=c("green"),size=c("tiny"))
adiag1(a,b) #dimnames preserved
adiag1(a,8) #dimnames lost because second argument has none.
## non scalar values for pad can be confusing:
q <- matrix(0,3,3)
adiag1(q,q,pad=1:4)
## following example should make the pattern clear:
adiag1(q,q,pad=1:36)
# Now, a use for arrays with dimensions of zero extent:
z <- array(dim=c(0,3))
colnames(z) <- c("foo","bar","baz")
adiag1(a,z) # Observe how this has
# added no (ie zero) rows to "a" but
# three extra columns filled with the pad value
adiag1(a,t(z))
adiag1(z,t(z)) # just the pad value
Letter to number converter
Description
This function was designed to help users to transform their data in letter format to numeric format. Details in the format are not complex, just a dataframe with markers in columns and individuals in rows. Only markers, NO extra columns of plant names etc (names of plants can be stored as rownames). The function expects a matrix of only polymorphic markers, please make sure you clean your data before using this function. The apply function can help you identify and separate monomorphic from polymorphic markers.
Usage
atcg1234(data, ploidy=2, format="ATCG", maf=0, multi=TRUE,
silent=FALSE, by.allele=FALSE, imp=TRUE, ref.alleles=NULL)
Arguments
data |
a dataframe with markers in columns and individuals in rows. Preferable the rownames are the ID of the plants so you don't lose track of what is what. |
ploidy |
a numeric value indicating the ploidy level of the specie. The default is 2 which means diploid. |
format |
one of the two possible values allowed by the program "ATCG", which means your calls are in base-pair-letter code, i.e. "AT" in a diploid call, "AATT" tetraploid etc (just example). Therefore possible codes can be "A", "T", "C", "G", "-" (deletion), "+" (insertion). Alternatively "AB" format can be used as well. Commonly this depends from the genotyping technologies used, such as GBS or microarrays. In addition, we have enabled also the use of single-letter code used by Cornell, i.e. A=AA, C=CC, T=TT, G=GG, R=AG, Y=CT, S=CG, W=AT, K=GT, M=AC. In the case of GBS code please make sure that you set the N codes to regular NAs handled by R. The "ATCG" format also works for the bi-allelic marker codes from join map such as "lm", "ll","nn", "np","hh","hk","kk" |
maf |
minor allele frequency used to filter the SNP markers, the default is zero which means all markers are returned in numeric format. |
multi |
a TRUE/FALSE statement indicating if the function should get rid of the markers with more than 2 alleles. If FALSE, which indicates that if markers with multiple alleles are found, the alternate and reference alleles will be the first 2 alleles found. This could be risky since some alleles will be masked, i.e. AA AG AT would take only A and G as reference and alternate alleles, converting to numeric format 2 1 1, giving the same effect to AG and AT which could be a wrong assumption. The default is TRUE, removes markers with more than two alleles. |
silent |
a TRUE/FALSE value indicating if a progress bar should be drawn for each step of the conversion. The default is silent=FALSE, which means that we want progress bar to be drawn. |
by.allele |
a TRUE/FALSE value indicating if the program should transform the data in a zero/one matrix of presence/absense per allele. For example, a marker with 3 alleles A,T,C in a diploid organism will yield 6 possible configurations; AA, AT, AC, TT, TC, CC. Therefore, the program would create 3 columns for this marker indicating the presence/absence of each allele for each genotype. |
imp |
a TRUE/FALSE value indicating if the function should impute the missing data using the median for each marker. If FALSE, then the program will not impute. |
ref.alleles |
a matrix with reference alleles to be used for the conversion. The matrix should have as many columns as markers with reference alleles and with 2 rows, being the first row the alternate allele (Alt) and the second row the reference allele (Ref). Rownames should be "Alt" and "Ref" respectively. If not provided the program will decide the reference allele. |
Value
- $data
a new dataframe of markers in numeric format with markers in columns and individuals in rows.
Author(s)
Giovanny Covarrubias-Pazaran
Examples
genotypes <- rbind(
c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"),
c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"),
c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA),
c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA),
c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA)
)
rownames(genotypes) <- letters[1:5]
colnames(genotypes) <- paste0("m",1:5)
## convert markers to numeric format polyploid potatoes
numo <- atcg1234(data=genotypes, ploidy=4)
numo
## convert markers back to letters
Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles)
Xb
Letter to number converter
Description
This function was designed to help users back transform the numeric marker matrices from the function atcg1234 into letters.
Usage
atcg1234BackTransform(marks, refs)
Arguments
marks |
a centered marker matrix coming from atcg1234. |
refs |
a 2 x m matrix for m markers (columns) and 2 rows where the reference and alternate alleles for each marker are indicated. |
Value
- markers
a new marker matrix leter coded according to the reference allele matrix.
Author(s)
Giovanny Covarrubias-Pazaran
Examples
genotypes <- rbind(
c("AAAG" , "AAGG" , "TTCC" , "TTTT" , "CCCC"),
c("AAGG" , "AGGG" , "TTCC" , "TTTT" , "TCCC"),
c("AAGG" , "GGGG" , "TTCC" , "TTTT" , NA),
c("AAAA" , "GGGG" , "CCCC" , "TTTT" , NA),
c("AAAA" , "GGGG" , "CCCC" , "TTCC" , NA)
)
rownames(genotypes) <- letters[1:5]
colnames(genotypes) <- paste0("m",1:5)
## convert markers to numeric format polyploid potatoes
numo <- atcg1234(data=genotypes, ploidy=4)
numo
## convert markers back to letters
Xb <- atcg1234BackTransform(marks= numo$M, refs= numo$ref.alleles)
Xb
Generate a sequence of colors for plotting bathymetric data.
Description
bathy.colors(n) generates a sequence of n
colors along a linear scale from light grey to pure blue.
Usage
bathy.colors(n, alpha = 1)
Arguments
n |
The number of colors to return. |
alpha |
Alpha values to be passed to |
Value
A vector of blue scale colors.
Examples
{
# Plot a colorbar using bathy.colors
image(matrix(seq(100), 100), col=bathy.colors(100))
}
Function for creating B-spline basis functions (Eilers & Marx, 2010)
Description
Construct a B-spline basis of degree deg
with ndx-1 equally-spaced internal knots (ndx segments) on range [x1,xr].
Code copied from Eilers & Marx 2010, WIR: Comp Stat 2, 637-653.
Usage
bbasis(x, xl, xr, ndx, deg)
Arguments
x |
A vector. Data values for spline. |
xl |
A numeric value. Lower bound for data (lower external knot). |
xr |
A numeric value. Upper bound for data (upper external knot). |
ndx |
A numeric value. Number of divisions for x range (equal to number of segments = number of internal knots + 1) |
deg |
A numeric value. Degree of the polynomial spline. |
Details
Not yet amended to coerce values that should be zero to zero!
Value
A matrix with columns holding the P-spline for each value of x.
Matrix has ndx+deg columns and length(x) rows.
Build a hybrid marker matrix using parental genotypes from inbred individuals
Description
Uses the 2 marker matrices from both sets of inbred or partially inbred parents and creates all possible combinations unless the user specifies which hybrid genotypes to build (custom.hyb argument). It returns the additive and dominance marker matrices (-1,0,1; homo,het,homo in additive and 0,1,0; homo,het,homo for dominance).
Usage
build.HMM(M1,M2, custom.hyb=NULL, return.combos.only=FALSE,
separator=":",n.batch=1000, verbose=TRUE)
Arguments
M1 |
Matrix ( |
M2 |
Matrix ( |
custom.hyb |
A data frame with columns 'Var1' 'Var2', 'hybrid' which specifies which hybrids should be built using the M1 and M2 matrices provided. |
return.combos.only |
A TRUE/FALSE statement inicating if the function should skip building the geotype matrix for hybrids and only return the data frame with all possible combinations to be build. In case the user wants to subset the hybrids before building the marker matrix. |
separator |
Any desired character to be used when pasting the male and female columns to assign the name to the hybrids. |
n.batch |
An optional integer value to indicate how many hybrids should be constructed at once. When the number of hybrids and number of markers is big it is better to partition the problem into multiple matrix products. By default we assume that no more than 1000 hybrids should be computed at once to use the memory more efficiently. |
verbose |
A logical value indicating if progress and warning messages should be printed in the console. |
Details
It returns the marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo) and dominance (0,1,0; homo,het,homo). This function is deviced for building marker matrices for hybrids coming from inbreds. If the parents are close to inbred >F5 you can try deleting the heterozygote calls (0's) and imputing those cells with the most common genotype (1 or -1). The expectation is that for mostly inbred individuals this may not change drastically the result but will make the results more interpretable. For non-inbred parents (F1 to F3) the cross of an F1 x F1 has many possibilities and is not the intention of this function to build genotypes for heterzygote x heterozygote crosses.
Value
It returns the marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo) and dominance (0,1,0; homo,het,homo).
- $HMM.add
marker matrix for hybrids coded as additive (-1,0,1; homo,het,homo)
- $HMM.dom
marker matrix for hybrids coded as dominance (0,1,0; homo,het,homo)
- $data.used
the data frame used to build the hybrid genotypes
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Nishio M and Satoh M. 2014. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. Plos One 9(1), doi:10.1371/journal.pone.0085792
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. 2012. Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS ONE 7(9): e45293. doi:10.1371/journal.pone.0045293
Examples
nInds=3
nMarks=5
# set of lines 1
M1 <- matrix(rep(0,nInds*nMarks),nInds,nMarks)
for (i in 1:nInds) {
M1[i,] <- ifelse(runif(nMarks)<0.5,-1,1)
}
rownames(M1) <- letters[1:nInds]
# set of lines 2
M2 <- matrix(rep(0,nInds*nMarks),nInds,nMarks)
for (i in 1:nInds) {
M2[i,] <- ifelse(runif(nMarks)<0.5,-1,1)
}
rownames(M2) <- letters[(nInds+1):(nInds*2)]
# build additive and dominance matrices for single crosses
build.HMM(M1,M2)
Mixed Effect Model Enhancing Functions
Description
enhancer is a collection of functions useful in the fitting of linear mixed effect models and general data manipulation. As more of my packages use the same special functions I decided to have the special functions as its own package to facilitate the maintenance.
Functions for genetic analysis
The package provides kernels to the estimate additive (A.matr)
relationship matrix for diploid and polyploid organisms. A good converter from
letter code to numeric format is implemented in the function atcg1234
and atcg1234BackTransform,
which supports higher ploidy levels than diploid. Additional functions for genetic
analysis have been included such as build a genotypic hybrid marker matrix
(build.HMM). If you need to use pedigree you need to convert your
pedigree into a relationship matrix (use the 'getA' function from the pedigreemm package).
Also the LD.decay and the neMarker functions can be used
to calculate linkage disequilibrium and effective population size.
Functions for trial analysis
Recently, spatial modeling has been added added to enhancer using the two-dimensional
spline tps function. Based on bbasis
Functions for special mixed models
Available are add.diallel.vars, leg,
overlay, rrm, redmm,
simGECorMat function.
Functions for data manipulation and others
Available are adiag1, imputev, propMissing
replaceValues, stan, logspace
function.
Simple coloring functions available are jet.colors, bathy.colors,
transp
Keeping enhancer updated
The enhancer package is updated on CRAN every 4-months due to CRAN policies but you can find the latest source at https://github.com/covaruber/enhancer. This can be easily installed typing the following in the R console:
library(devtools)
install_github("covaruber/enhancer")
This is recommended if you reported a bug, was fixed and was immediately pushed to GitHub but not in CRAN until the next update.
Bug report and contact
If you have any questions or suggestions please post it in https://stackoverflow.com or https://stats.stackexchange.com
I'll be glad to help or answer any question. I have spent a valuable amount of time developing this package. Please cite this package in your publication. Type 'citation("enhancer")' to know how to cite it.
Author(s)
Giovanny Covarrubias-Pazaran
Imputing a numeric or character vector
Description
This function is a very simple function to impute a numeric or character vector with the mean or median value of the vector.
Usage
imputev(x, method="median", by=NULL)
Arguments
x |
a numeric or character vector. |
method |
the method to choose between mean or median. |
by |
an alternative vector with values declaring the method to be applied by. |
Value
- $x
a numeric or character vector imputed with the method selected.
Author(s)
Giovanny Covarrubias-Pazaran
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
####=========================================####
#### generate your mickey mouse -log10(p-values)
####=========================================####
set.seed(1253)
x <- rnorm(100)
x[sample(1:100,10)] <- NA
imputev(x)
Generate a sequence of colors alog the jet colormap.
Description
jet.colors(n) generates a sequence of n
colors from dark blue to cyan to yellow to dark red. It
is similar to the default color schemes in Python's
matplotlib or MATLAB.
Usage
jet.colors(n, alpha = 1)
Arguments
n |
The number of colors to return. |
alpha |
The transparency value of the colors. See
|
Value
A vector of colors along the jet colorramp.
Examples
{
# Plot a colorbar with jet.colors
image(matrix(seq(100), 100), col=jet.colors(100))
}
Legendre polynomial matrix
Description
Legendre polynomials of order 'n' are created given a vector 'x' and normalized to lay between values u and v.
Usage
leg(x,n=1,u=-1,v=1, intercept=TRUE, intercept1=FALSE)
Arguments
x |
numeric vector to be used for the polynomial. |
n |
order of the Legendre polynomials. |
u |
lower bound for the polynomial. |
v |
upper bound for the polynomial. |
intercept |
a TRUE/FALSE value indicating if the intercept should be included. |
intercept1 |
a TRUE/FALSE value indicating if the intercept should have value 1 (is multiplied by sqrt(2)). |
Value
- $S3
an Legendre polynomial matrix of order n.
Author(s)
Giovanny Covarrubias-Pazaran
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
x <- sort(rep(1:3,100))
library(orthopolynom)
leg(x, n=1)
leg(x, n=2)
Decreasing logarithmic trend
Description
logspace creates a vector with decreasing logaritmic trend.
Usage
logspace(x, p=2)
Arguments
x |
sequence of values to pass through the function. |
p |
power to be applied to the values. |
Value
- $res
a vector of length n with logarithmic decrease trend.
Author(s)
Giovanny Covarrubias-Pazaran
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
plot(logspace(1:100,p=1))
plot(logspace(1:100,p=2))
plot(logspace(1:100,p=3))
Creating a manhattan plot
Description
This function was designed to create a manhattan plot using a data frame with columns "Chrom" (Chromosome), "Position" and "p.val" (significance for the test).
Usage
manhattan(map, col=NULL, fdr.level=0.05, show.fdr=TRUE, PVCN=NULL, ylim=NULL, ...)
Arguments
map |
the data frame with 3 columns with names; "Chrom" (Chromosome), "Position" and "p.val" (significance for the test). |
col |
colors prefered by the user to be used in the manhattan plot. The default is NULL which will use the red-blue palette. |
fdr.level |
false discovery rate to be drawn in the plot. |
show.fdr |
a TRUE/FALSE value indicating if the FDR value should be shown in the manhattan plot or not. By default is TRUE meaning that will be displayed. |
PVCN |
In case the user wants to provide the name of the column that should be treated as the "p.val" column expected by the program in the 'map' argument. |
ylim |
the y axis limits for the manhattan plot if the user wants to customize it. By default the plot will reflect the minimum and maximum values found. |
... |
additional arguments to be passed to the plot function such as pch, cex, etc. |
Value
If all parameters are correctly indicated the program will return:
- $plot.data
a manhattan plot
Author(s)
Giovanny Covarrubias-Pazaran
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),1000,200)
for (i in 1:200) {
M[,i] <- ifelse(runif(1000)<0.5,-1,1)
}
colnames(M) <- 1:200
set.seed(1234)
pp <- abs(rnorm(500,0,3));pp[23:34] <- abs(rnorm(12,0,20))
geno <- data.frame(Locus=paste("m",1:500, sep="."),Chrom=sort(rep(c(1:5),100)),
Position=rep(seq(1,100,1),5),
p.val=pp, check.names=FALSE)
geno$Locus <- as.character(geno$Locus)
## look at the data, 5LGs, 100 markers in each
## -log(p.val) value for simulated trait
head(geno)
tail(geno)
manhattan(geno)
Creating a genetic map plot
Description
This function was designed to create a genetic map plot using a data frame indicating the Linkage Group (LG), Position and marker names (Locus).
Usage
map.plot(data, trait = NULL, trait.scale = "same",
col.chr = NULL, col.trait = NULL, type = "hist", cex = 0.4,
lwd = 1, cex.axis = 0.4, cex.trait=0.8, jump = 5)
Arguments
data |
the data frame with 3 columns with names; Locus, LG and Position |
trait |
if something wants to be plotted next the linkage groups the user must indicate the name of the column containing the values to be ploted, i.e. p-values, LOD scores, X2 segregation distortion values, etc. |
trait.scale |
is trait is not NULL, this is a character value indicating if the y axis limits for the trait plotted next to the chromosomes should be the same or different for each linkage group. The default value is "same", which means that the same y axis limit is conserved across linkage groups. For giving an individual y axis limit for each linkage group write "diff". |
col.chr |
a vector with color names for the chromosomes. If NULL they will be drawn in gray-black scale. |
col.trait |
a vector with color names for the dots, lines or histogram for the trait plotted next to the LG's |
type |
a character value indicating if the trait should be plotted as scatterplot 'dot', histogram 'hist', line 'line' next to the chromosomes. |
cex |
the cex value determining the size of the cM position labels in the LGs |
lwd |
the width of the lines in the plot |
cex.axis |
the cex value for sizing the labels of LGs and traits plotted (top labels) |
cex.trait |
the cex value for sizing the dots or lines of the trait plotted |
jump |
a scalar value indicating how often should be drawn a number next to the LG indicating the position. The default is 5 which means every 5 cM a label will be drawn, i.e. 0,5,10,15,... cM. |
Value
If all parameters are correctly indicated the program will return:
- $plot.data
a plot with the LGs and the information used to create a plot
Author(s)
Giovanny Covarrubias-Pazaran
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
#random population of 200 lines with 1000 markers
M <- matrix(rep(0,200*1000),1000,200)
for (i in 1:200) {
M[,i] <- ifelse(runif(1000)<0.5,-1,1)
}
colnames(M) <- 1:200
set.seed(1234)
geno <- data.frame(Locus=paste("m",1:500, sep="."),LG=sort(rep(c(1:5),100)),
Position=rep(seq(1,100,1),5),
X2=rnorm(500,10,4), check.names=FALSE)
geno$Locus <- as.character(geno$Locus)
## look at the data, 5LGs, 100 markers in each
## X2 value for segregation distortion simulated
head(geno)
tail(geno)
table(geno$LG) # 5 LGs, 100 marks
map.plot(geno, trait="X2", type="line")
map.plot(geno, trait="X2", type="hist")
map.plot(geno, trait="X2", type="dot")
data("DT_cpdata")
MP <- MP_cpdata
colnames(MP)[3] <- c("LG")
head(MP)
map.plot(MP, type="line", cex=0.6)
Effective population size based on marker matrix
Description
'neMarker' uses a marker matrix to approximate the effective population size (Ne) by discovering how many individuals are needed to sample all possible alleles in a population.
Usage
neMarker(M, neExplore=NULL, maxMarker=1000, nSamples=5)
Arguments
M |
marker matrix coded in a numerical faashion (any allele dosage is fine). |
neExplore |
a vector of numbers with the effective population sizes to be explored. |
maxMarker |
maximum number of markers to use for the analysis. |
nSamples |
number of individuals to sample for the Ne calculation. |
Value
- $S3
A vector with allele coverage based on different number of individuals
Author(s)
Giovanny Covarrubias-Pazaran
References
Not based on any theory published yet but in a solid intuition on what is really important for a breeding program when we ask what is the effective population size
Examples
nInds=300
nMarks=1000
#random population of nInds lines with nMarks markers
M <- matrix(rep(0,nInds*nMarks),nInds,nMarks)
for (i in 1:nInds) {
M[i,] <- ifelse(runif(nMarks)<0.5,-1,1)
}
# run the function
ne <- neMarker(M, neExplore = seq(2,300,100), nSamples = 5)
# plot(ne$Ne)
Overlay Matrix
Description
'overlay' adds r times the design matrix for model term t to the existing design matrix. Specifically, if the model up to this point has p effects and t has a effects, the a columns of the design matrix for t are multiplied by the scalar r (default value 1.0). This can be used to force a correlation of 1 between two terms as in a diallel analysis.
Usage
overlay(..., rlist=NULL, prefix=NULL, sparse=FALSE)
Arguments
... |
as many vectors as desired to overlay. |
rlist |
a list of scalar values indicating the times that each incidence matrix overlayed should be multiplied by. By default r=1. |
prefix |
a character name to be added before the column names of the final overlay matrix. This may be useful if you have entries with names starting with numbers which programs such as asreml doesn't like, or for posterior extraction of parameters, that way 'grep'ing is easier. |
sparse |
a TRUE/FALSE statement specifying if the matrices should be built as sparse or regular matrices. |
Value
- $S3
an incidence matrix with as many columns levels in the vectors provided to build the incidence matrix.
Author(s)
Giovanny Covarrubias-Pazaran
References
Fikret Isik. 2009. Analysis of Diallel Mating Designs. North Carolina State University, Raleigh, USA.
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
femalef <- as.factor(c("a","b","c"))
malef <- as.factor(c("d","e","a"))
overlay(femalef,malef, sparse = TRUE)
overlay(femalef,malef, sparse = FALSE)
Proportion of missing data
Description
propMissing quick calculation of the proportion of missing data in a vector.
Usage
propMissing(x)
Arguments
x |
vector of observations. |
Value
- $res
a numeric value with the proportion of missing data.
Author(s)
Giovanny Covarrubias-Pazaran
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommec. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
x <- c(1:10, NA)
propMissing(x)
Reduced Model Matrix
Description
'redmm' reduces a model matrix by performing a singular value decomposition or Cholesky on an incidence matrix.
Usage
redmm(x, M = NULL, Lam=NULL, nPC=50, cholD=FALSE, returnLam=FALSE)
Arguments
x |
as vector with values to form a model matrix or the complete incidence matrix itself for an effect of interest. |
M |
an optional matrix of features explaining the levels of x. If not provided is assumed that the entire incidence matrix has been provided in x. But if provided, the decomposition occurs in the matrix M. |
Lam |
a matrix of loadings in case is already available to avoid recomputing it. |
nPC |
number of principal components to keep from the matrix of loadings to form the model matrix. |
cholD |
should a Cholesky or a Singular value decomposition should be used. The default is the SVD. |
returnLam |
should the function return the loading matrix in addition to the incidence matrix. Default is FALSE. |
Value
- $S3
A list with 3 elements:
1) The model matrix to be used in the mixed modeling.
2) The reduced matrix of loadings (nPC columns).
3) The full matrix of loadings.
Author(s)
Giovanny Covarrubias-Pazaran
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Examples
# data set including the covariate to be reduced
DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2)))
DT
# GxE matrix
H0 <- matrix(rnorm(30), nrow=5, ncol=6) # 5 envs, 6 features
rownames(H0) <- paste("e",1:5,sep="_")
colnames(H0) <- paste("x",1:6,sep="_")
H0
# get compression
Z <- with(DT, redmm(Env, M = H0, nPC = 3))
Z
Replace the values of a vector
Description
Simple function to map a vector of values to another.
Usage
replaceValues(Source, Search, Replace)
Arguments
Source |
A source of values to change. |
Search |
The values to search. |
Replace |
The equivalent values to replace. |
Details
Simple function to map a vector of values another.
Value
- $res
new values from a second vector.
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
x <- letters[1:5]
y <- 1:5
z <- rep(letters[1:5],3)
replaceValues(Source=z, Search=x, Replace=y)
reduced rank matrix
Description
rrm creates a reduced rank factor analytic matrix by selecting the n vectors of the L matrix of the Cholesky decomposition or the U vectors of the SVD decomposition (loadings or latent covariates) to create a new incidence matrix of latent covariates that can be used with the lmeb solver to fit random regressions on the latent covariates.
Usage
rrm(x=NULL, H=NULL, nPC=2, returnGamma=FALSE, cholD=TRUE)
Arguments
x |
vector of the dataset containing the variable to be used to form the incidence matrix. |
H |
two-way table of identifiers (rows; e.g., genotypes) by features (columns; e.g., environments) effects. Row names and column names are required. No missing data is allowed. |
nPC |
number of principal components to keep from the loadings matrix. |
returnGamma |
a TRUE/FALSE argument specifying if the function should return the matrix of loadings used to build the incidence matrix for the model. The default is FALSE so it returns only the incidence matrix. |
cholD |
a TRUE/FALSE argument specifying if the Cholesky decomposition should be calculated or the singular value decomposition should be used instead. |
Details
This implementation of a version of the reduced rank factor analytic models uses the so-called principal component (PC) models (Meyer, 2009) which assumes specific effects (psi) are equal to 0. The model is as follows:
y = Xb + Zu + e
where the variance of u ~ MVN(0, Sigma)
Sigma = (Gamma_t Gamma) + Psi
Extended factor analytic model:
y = Xb + Z(I Gamma)c + Zs + e = Xb + Z*c + Zs + e
where y is the response variable, X and Z are incidence matrices for fixed and random effects respectively, I is a diagonal matrix, Gamma are the factor loadings for c common factor scores, and s are the specific effects, e is the vector of residuals.
Reduced rank model:
y = Xb + Z(I Gamma)c + e = Xb + Z*c + e
which is equal to the one above but assumes specific effects = 0.
The algorithm in rrm is the following:
1) uses a wide-format table of timevar (m columns) by idvar (q rows) named H to form the initial variance-covariance matrix (Sigma) which is calculated as Sigma = H'H of dimensions m x m (column dimensions, e.g., environments x environments).
2) The Sigma matrix is then center and scaled.
3) A Cholesky (L matrix) or SVD decomposition (U D V') is performed in the Sigma matrix.
4) n vectors from L (when Cholesky is used) or U sqrt(D) (when SVD is used) are kept to form Gamma. Gamma = L[,1:nPc] or Gamma = U[,1:nPC]. These are the so-called loadings (L for all loadings, Gamma for the subset of loadings).
4) Gamma is used to form a new incidence matrix as Z* = Z Gamma
5) This matrix is later used for the REML machinery to be used with the usc (unstructured) or smm (diagonal) structures to estimate variance components and factor scores. The resulting BLUPs from the mixed model are the optimized factor scores. Pretty much as a random regression over latent covariates.
This implementation does not update the loadings (latent covariates) during the REML process, only estimates the REML factor scores for fixed loadings. This is different to other software (e.g., asreml) where the loadings are updated during the REML process as well.
BLUPs for genotypes in all locations can be recovered as:
u = Gamma * u_scores
The resulting loadings (Gamma) and factor scores can be thought as an equivalent to the classical factor analysis.
As an additional information, notice that we calculate the factor loadings from BLUPs and the mixed model only calculates the factor scores. This is different to the asreml software where loadings are calculated as variance components through REML. Despite the difference we have run multiple examples and simulations and the BLUPs from both approaches are on average >0.98 correlated so you can be confident that our approach is robust.
Value
- $Z
a incidence matrix Z* = Z Gamma which is the original incidence matrix for the timevar multiplied by the loadings.
- $Gamma
a matrix of loadings or latent covariates.
- $Sigma
the covariance matrix used to calculate Gamma.
Author(s)
Giovanny Covarrubias-Pazaran
References
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted to Bioinformatics.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Meyer K (2009) Factor analytic models for genotype by environment type problems and structured covariance matrices. Genetics Selection Evolution, 41:21
Examples
# data set including the covariate to be reduced
DT <- data.frame(Env=sort(rep(paste("e",1:5,sep="_"), 2)))
DT
# GxE matrix
H0 <- matrix(rnorm(30), nrow=6, ncol=5) # 5 envs, 6 inds
colnames(H0) <- paste("e",1:5,sep="_")
rownames(H0) <- paste("x",1:6,sep="_")
H0
# get latent factors
Z <- with(DT, rrm(Env, H = H0, nPC = 3))
Z
Create a GE correlation matrix for simulation purposes.
Description
Makes a simple correlation matrix based on the number of environments and megaenvironments desired.
Usage
simGECorMat(nEnv,nMegaEnv,mu=0.7, v=0.2, mu2=0, v2=0.3)
Arguments
nEnv |
Number of environments to simulate. Needs to be divisible by the nMegaEnv argument. |
nMegaEnv |
Number of megaenvironments to simulate. |
mu |
Mean value of the genetic correlation within megaenvironments. |
v |
variance in the genetic correlation within megaenvironments. |
mu2 |
Mean value of the genetic correlation between megaenvironments. |
v2 |
variance in the genetic correlation between megaenvironments. |
Details
Simple simulation of a correlation matrix for environments and megaenvironments.
Value
G the correlation matrix
- $G
the correlation matrix
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
simGECorMat(9,3)
Image of sparsity between two variables
Description
image for sparsity.
Usage
simage(data, Var1=NULL, Var2=NULL, ...)
Arguments
data |
model data of class |
Var1 |
variable to set in the x axis |
Var2 |
variable to set in the y axis |
... |
Further arguments to be passed to the image function. |
Value
vector of image
Author(s)
Giovanny Covarrubias
See Also
Examples
# row x column combinations existing
DT <- data.frame(Row=sort(rep(1:10,5)), Col=rep(1:5,10))
DT$Rowf <- as.factor(DT$Row)
DT$Colf <- as.factor(DT$Col)
DT <- DT[-1,] # remove one intentionally
simage(data=DT, Var1 = "Rowf", Var2 = "Colf")
# you can also inpect which cross combinations exist
Shortcut to sparse matrix image
Description
image for sparsity.
Usage
simage2(X, ...)
Arguments
X |
object of class |
... |
Further arguments to be passed to the image function. |
Value
vector of image
Author(s)
Giovanny Covarrubias
See Also
Examples
simage2(matrix(0:8,3,3))
Standardize a vector of values in range 0 to 1
Description
Simple function to map a vector of values to the range of 0 and 1 values to have a better behavior of the algorithm.
Usage
stan(x, lb=0, ub=1)
Arguments
x |
A vector of numeric values. |
lb |
Lower bound value to map the x values. |
ub |
Upper bound value to map the x values. |
Details
Simple function to map a vector of values to the range of 0 and 1 values to have a better behavior of the algorithm.
Value
- $res
new values in range 0 to 1
References
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744
Examples
x <- rnorm(20, 10, 3);x
stan(x)
Get Tensor Product Spline Mixed Model Incidence Matrices
Description
tps is a wrapper of tpsmmb function from the TPSbits package to avoid version dependencies but if you're using this function for your research please cite the TPSbits package. This function is internally used by the spl2Dmatrices function to get Tensor-Product P-Spline Mixed Model Bits
(design matrices) for use with sommer and lme4breeding.
Usage
tps(
columncoordinates,
rowcoordinates,
nsegments=NULL,
minbound=NULL,
maxbound=NULL,
degree = c(3, 3),
penaltyord = c(2, 2),
nestorder = c(1, 1),
asreml = "grp",
eigenvalues = "include",
method = "Lee",
stub = NULL
)
Arguments
columncoordinates |
A string. Gives the name of |
rowcoordinates |
A string. Gives the name of |
nsegments |
A list of length 2. Number of segments to split column and row ranges into, respectively (= number of internal knots + 1). If only one number is specified, that value is used in both dimensions. If not specified, (number of unique values - 1) is used in each dimension; for a grid layout (equal spacing) this gives a knot at each data value. |
minbound |
A list of length 2. The lower bound to be used for column and row dimensions respectively; default calculated as the minimum value for each dimension. |
maxbound |
A list of length 2. The upper bound to be used for column and row dimensions respectively; default calculated as the maximum value for each dimension. |
degree |
A list of length 2. The degree of polynomial spline to be used for column and row dimensions respectively; default=3. |
penaltyord |
A list of length 2. The order of differencing for column and row dimensions, respectively; default=2. |
nestorder |
A list of length 2. The order of nesting for column and row dimensions, respectively; default=1 (no nesting). A value of 2 generates a spline with half the number of segments in that dimension, etc. The number of segments in each direction must be a multiple of the order of nesting. |
asreml |
A string. Indicates the types of structures to be generated
for use in asreml models; default
|
eigenvalues |
A string. Indicates whether eigenvalues should be
included within the Z design matrices |
method |
A string. Method for forming the penalty; default= |
stub |
A string. Stub to be attached to names in the |
Value
List of length 7, 8 or 9 (according to the asreml and
eigenvalues parameter settings).
-
data= the input data frame augmented with structures required to fit tensor product splines inasreml-R. This data frame can be used to fit the TPS model.Added columns:
-
TP.col,TP.row= column and row coordinates -
TP.CxR= combined index for use with smooth x smooth term -
TP.C.nfor n=1:(diff.c) = X parts of column spline for use in random model (where diff.c is the order of column differencing) -
TP.R.nfor n=1:(diff.r) = X parts of row spline for use in random model (where diff.r is the order of row differencing) -
TP.CR.nfor n=1:((diff.c*diff.r)) = interaction between the two X parts for use in fixed model. The first variate is a constant term which should be omitted from the model when the constant (1) is present. If all elements are included in the model then the constant term should be omitted, eg.y ~ -1 + TP.CR.1 + TP.CR.2 + TP.CR.3 + TP.CR.4 + other terms... when
asreml="grp"or"sepgrp", the spline basis functions are also added into the data frame. Column numbers for each term are given in thegrplist structure.
-
-
mbflist= list that can be used in call to asreml (so long as Z matrix data frames extracted with right names, eg BcZ<stub>.df) -
BcZ.df= mbf data frame mapping onto smooth part of column spline, last column (labelledTP.col) gives column index -
BrZ.df= mbf data frame mapping onto smooth part of row spline, last column (labelledTP.row) gives row index -
BcrZ.df= mbf data frame mapping onto smooth x smooth term, last column (labelledTP.CxR) maps onto col x row combined index -
dim= list structure, holding dimension values relating to the model:-
"diff.c"= order of differencing used in column dimension -
"nbc"= number of random basis functions in column dimension -
"nbcn"= number of nested random basis functions in column dimension used in smooth x smooth term -
"diff.r"= order of differencing used in column dimension -
"nbr"= number of random basis functions in column dimension -
"nbrn"= number of nested random basis functions in column dimension used in smooth x smooth term
-
-
trace= list of trace values for ZGZ' for the random TPspline terms, where Z is the design matrix and G is the known diagonal variance matrix derived from eigenvalues. This can be used to rescale the spline design matrix (or equivalently variance components). -
grp= list structure, only added for settingsasreml="grp",asreml="sepgrp"orasreml="own". Forasreml="grp", provides column indexes for each of the 5 random components of the 2D splines. Forasreml="sepgrp", provides column indexes for each of the X and Z component matrices for the 1D splines, plus the composite smooth x smooth interaction term. Forasreml="own", provides column indexes for the composite random model. Dimensions of the components can be derived from the values in thedimitem. The Z terms are scaled by the associated eigenvalues wheneigenvalues="include", but not wheneigenvalues="omit". -
eigen= list structure, only added for option settingeigenvalues="omit". Holds the diagonal elements of the inverse variance matrix for the terms Xc:Zr (calleddiagr), Zc:Xr (calleddiagc) and Zc:Zr (calleddiagcr).
Creating color with transparency
Description
This function takes a color and returns the same with a certain alpha grade transparency.
Usage
transp(col, alpha=0.5)
Arguments
col |
Color to be used for transparency |
alpha |
Grade of transparency desired |
Details
No major details.
Value
If arguments are correctly specified the function returns:
- $res
A new color with certain grade of transparency
References
Robert J. Henry. 2013. Molecular Markers in Plants. Wiley-Blackwell. ISBN 978-0-470-95951-0.
Ben Hui Liu. 1998. Statistical Genomics. CRC Press LLC. ISBN 0-8493-3166-8.
Examples
transp("red", alpha=0.5)
Wald Test for Model Coefficients
Description
Computes a Wald \chi^2 test for 1 or more coefficients, given their variance-covariance matrix.
Usage
wald.test(Sigma, b, Terms = NULL, L = NULL, H0 = NULL,
df = NULL, verbose = FALSE)
## S3 method for class 'wald.test'
print(x, digits = 2, ...)
Arguments
Sigma |
A var-cov matrix, usually extracted from one of the fitting functions (e.g., |
b |
A vector of coefficients with var-cov matrix |
Terms |
An optional integer vector specifying which coefficients should be jointly tested, using a Wald
|
L |
An optional matrix conformable to |
H0 |
A numeric vector giving the null hypothesis for the test. It must be as long as |
df |
A numeric vector giving the degrees of freedom to be used in an |
verbose |
A logical scalar controlling the amount of output information. The default is |
x |
Object of class “wald.test” |
digits |
Number of decimal places for displaying test results. Default to 2. |
... |
Additional arguments to |
Details
The key assumption is that the coefficients asymptotically follow a (multivariate) normal distribution with mean =
model coefficients and variance = their var-cov matrix.
One (and only one) of Terms or L must be given. When L is given, it must have the same number of
columns as the length of b, and the same number of rows as the number of linear combinations of coefficients.
When df is given, the \chi^2 Wald statistic is divided by m = the number of
linear combinations of coefficients to be tested (i.e., length(Terms) or nrow(L)). Under the null
hypothesis H0, this new statistic follows an F(m, df) distribution.
Value
An object of class wald.test, printed with print.wald.test.
References
Diggle, P.J., Liang, K.-Y., Zeger, S.L., 1994. Analysis of longitudinal data. Oxford, Clarendon Press, 253 p.
Draper, N.R., Smith, H., 1998. Applied Regression Analysis. New York, John Wiley & Sons, Inc., 706 p.
Examples
data("mtcars")
DT <- mtcars
head(DT)
m3 <- lm(mpg ~ as.factor(carb),
data = DT)
# extrac coefficients and variance of coefficients
wald.test(b = m3$coefficients, Sigma = vcov(m3), Terms = 2)
# make a test for a specific linear combination
LL <- matrix(0,nrow=1, ncol=6)
LL[1,2] <- 1
LL[1,3] <- -1
LL
wald.test(b = m3$coefficients, Sigma = vcov(m3), L=LL)
# sommer has 'Ci' slot for Sigma and 'bu' for coefficients
# lme4breeding has condVar() function to extract Sigma for random effects and
# vcov() for fixed effect and has fixef() and ranef() to extract fixed and
# random coefficients.