| Version: | 1.0-5 |
| Date: | 2021-11-23 |
| Title: | Datasets, Functions and Scripts for Semiparametric Regression Supporting Harezlak, Ruppert & Wand (2018) |
| Maintainer: | Matt P. Wand <matt.wand@uts.edu.au> |
| Imports: | KernSmooth, grDevices, graphics, splines, stats |
| Suggests: | fields, lattice, mgcv |
| Description: | The book "Semiparametric Regression with R" by J. Harezlak, D. Ruppert & M.P. Wand (2018, Springer; ISBN: 978-1-4939-8851-8) makes use of datasets and scripts to explain semiparametric regression concepts. Each of the book's scripts are contained in this package as well as datasets that are not within other R packages. Functions that aid semiparametric regression analysis are also included. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyData: | TRUE |
| LazyDataCompression: | xz |
| NeedsCompilation: | no |
| Depends: | R (≥ 3.5.0) |
| Packaged: | 2021-11-23 04:45:28 UTC; mwand |
| Author: | Jaroslaw Harezlak [aut], David Ruppert [aut], Matt P. Wand [aut, cre] |
| Repository: | CRAN |
| Date/Publication: | 2021-11-23 07:50:15 UTC |
Berry, Carroll and Ruppert (2002) data
Description
The BCR data frame has data from a 6-week clinical trial of a drug versus a placebo. The data are subject to measurement error and have been transformed and rescaled from their original form. These data are analyzed in the 2002 ‘Journal of the American Statistical Association’ article by Berry, Carroll and Ruppert (full reference below).
Usage
data(BCR)
Format
This data frame contains the following columns:
statuscode for status of patient:
control = patient is in placebo group,
treatment=patient is in treatment group.wphysician-assessed score of patient's mental health at baseline.
yphysician-assessed score of patient's mental health at end of the study.
References
Berry, S.M., Carroll, R.J. and Ruppert, D. (2002). Bayesian smoothing and regression splines for measurement error problems.Journal of the American Statistical Association, 97, 160-169.
Examples
library(HRW) ; data(BCR)
if (require("lattice"))
print(xyplot(y ~ w|status,data = BCR))
Contraception use in Bangladesh
Description
The BanglaContrac data frame has multilevel data from the 1988 Bangladesh Fertility Survey. There are data on contraceptive use, number of children, age and urban-dweller status of 1,934 women grouped in 60 districts.
Usage
data(BanglaContrac)
Format
This data frame contains the following columns:
districtIDdistrict identification number.
usingContraceptionindicator that woman is using contraception:
1 = woman using contraception at time of survey,
0 = woman not using contraception at time of survey.childCodenumerical code for number of living children:
1 = no living children at time of survey,
2 = one living child at time of survey,
3 = two living children at time of survey,
4 = three or more living children at time of survey.ageMinusMeanage in years of woman at time of survey, with mean age subtracted
isUrbanindicator that woman lives in an urban region:
1 = woman an urban region dweller at time of survey,
0 = woman a rural region dweller at time of survey.
Source
Huq, N.M. and Cleland, J. (1990). Bangladesh Fertility Survey 1989 (Main Report). Dhaka, Bangladesh: National Institute of Population Research and Training.
Examples
library(HRW) ; data(BanglaContrac)
if (require("lattice"))
print(xyplot(jitter(usingContraception) ~ ageMinusMean|factor(districtID),
groups = childCode, data = BanglaContrac))
Mortgage applications in Boston
Description
The full dataset is in the data frame 'Hdma' within the R package Ecdat. This data frame is the subset of mortgage applications during the years 1998-1999.
Usage
data(BostonMortgages)
Format
A data frame with 2,380 observations on the following 13 variables:
dirratio of the debt payments to the total income.
hirratio of the housing expenses to the total income.
lvrratio of the loan size to the assessed value of property.
ccsa credit score ranging from 1 to 6, where a low value indicates low credit risk.
mcsa mortgage credit score from 1 to 6, where a low value indicates low credit risk.
pbcrdid the applicant have a public bad credit record?: a factor with levels
noandyes.dmiwas the applicant denied mortgage insurance?: a factor with levels
noandyes.selfwas the applicant self-employed?: a factor with levels
noandyes.singleis the applicant single?: a factor with levels
noandyes.uria1989 Massachusetts unemployment rate in the applicant's industry.
condominiumis the unit a condominium?: a factor with levels
noandyes.blackis the applicant black?: a factor with levels
noandyes.denywas the mortgage denied?: a factor with levels
noandyes.
References
Munnell, A. H., Tootell, G. M. B., Browne, L. E., and McEneaney, J. (1996). Mortgage lending in Boston: Interpreting HDMA data, American Economic Review, 25-53.
Examples
library(HRW) ; data(BostonMortgages)
BostonMortgages$denyBinary <- as.numeric(BostonMortgages$deny == "yes")
fit <- glm(denyBinary ~ black + dir + lvr + pbcr + self + single + as.factor(ccs),
family = binomial,data = BostonMortgages)
summary(fit)
Coronory heart disease
Description
The CHD data frame has data on coronary heart disease status, cholesterol level measurements and age. Further details are given in the 1996 ‘Journal of the American Statistical Association’ article by Roeder, Carroll and Lindsay (full reference below).
Usage
data(CHD)
Format
This data frame contains the following columns:
CHDindicator of coronary heart disease status:
0 = patient does not have coronary heart disease,
1 = patient has coronary heart disease.LDLlow density lipoprotein cholesterol level.
TCtotal cholesterol level.
ageage of patient in years.
Source
Roeder, K., Carroll, R.J. and Lindsay, B.G. (1996). A semiparametric mixture approach to case-control studies with errors in covariables. Journal of the American Statistical Association, 91, 722-732.
References
Richardson, S., Leblond, L., Jaussent, I. and Green, P.J. (2002). Mixture models in measurement error problems, with reference to epidemiological studies. Journal of the Royal Statistical Society, Series A, 163, 549-566.
Examples
library(HRW) ; data(CHD)
pairs(CHD)
Sydney real estate
Description
The SydneyRealEstate data frame has data on several variables concerning houses sold in Sydney, Australia,
during 2001.
Usage
data(SydneyRealEstate)
Format
This data frame contains the following columns:
logSalePricenatural logarithm of sale price in Australian dollars.
lotSizelot size in square meters but with some imputation.
longitudedegrees longitude of location of house.
latitudedegrees latitude of location of house.
saleDatesale date in dd/mm/yyy format.
saleQtrfinancial quarter in which sale took place.
infRateinflation rate measure as a percentage.
postCodefour-digit post code of the suburb in which the house located.
crimeDensitycrime density measure for the suburb in which the house is located.
crimeRatecrime rate measure for the suburb in which the house is located.
incomeaverage weekly income of the suburb in which the house is located.
distToBusStopdistance from house to the nearest bus stop (kilometers).
distToCoastlinedistance from house to the nearest coastline location (kilometers).
distToNatParkdistance from house to the nearest national park (kilometers).
distToParkdistance from house to the nearest park (kilometers).
distToRailLinedistance from house to the nearest railway line (kilometers).
distToRailStationdistance from house to the nearest railway station (kilometers).
distToHighwaydistance from house to the nearest highway (kilometers).
distToFreewaydistance from house to the nearest freeway (kilometers).
distToTunneldistance from house to the Sydney Harbour Tunnel (kilometers).
distToMainRoaddistance from house to the nearest main road (kilometers).
distToSealedRoaddistance from house to the nearest sealed road (kilometers).
distToUnsealedRoaddistance from house to the nearest sealed road (kilometers).
airNoiseaircraft noise exposure measure.
foreignerRatioproportion of foreigners in the suburb in which the house is located.
distToGPOdistance from the house to the General Post Office in Sydney's central business district (kilometers).
NOnitrous oxide level recorded at the air pollution monitoring station nearest to the house.
NO2nitrogen dioxide level recorded at the air pollution monitoring station nearest to the house.
ozoneozone level recorded at the air pollution monitoring station nearest to the house.
nephnephelometer suspended matter measurement recorded at the air pollution monitoring station nearest to the house.
PM10particulate matter with a diameter of under 10 micrometers leve recorded at the air pollution monitoring station nearest to the house.
SO2sulphur dioxide level recorded at the air pollution monitoring station nearest to the house.
distToAmbulancedistance from house to the nearest ambulance station (kilometers).
distToFactorydistance from house to the nearest factory (kilometers).
distToFerrydistance from house to the nearest ferry wharf (kilometers).
distToHospitaldistance from house to the nearest hospital (kilometers).
distToMedicaldistance from house to the nearest medical services (kilometers).
distToSchooldistance from house to the nearest school (kilometers).
distToUniversitydistance from house to the nearest university (kilometers).
References
Chernih, A. and Sherris, M. (2004). Geoadditive hedonic pricing models. Unpublished manuscript. University of New South Wales, Australia.
Examples
library(HRW) ; data(SydneyRealEstate)
## Not run:
for (colNum in setdiff((2:39),c(5,8)))
{
plot(jitter(SydneyRealEstate[,colNum]),SydneyRealEstate$logSalePrice,pch = ".",
xlab = names(SydneyRealEstate)[colNum],ylab = "log(sale price)",col = "blue")
readline("Hit Enter to continue.\n")
}
## End(Not run)
Polygonal boundary of Sydney
Description
A two-column data frame containing the (longitude,latitude) pairs for the vertices of a 202-sided polygon. The polygon was created manually using the HRW package function createBdry(). The polygon tightly encompasses the majority of the (longitude,latitude) data of the HRW package 'SydneyRealEstate' data frame and approximately corresponds to the residential parts of Sydney, Australia.
Usage
data(SydneyRealEstateBdry)
Format
A data frame with 203 observations on the following 2 variables (note that the first vertex is repeated at the end of the data frame):
longitudelongitudinal position of a vertex.
latitudelatitudinal position of the same vertex.
See Also
SydneyRealEstate
Examples
library(HRW) ; data(SydneyRealEstateBdry) ; data(SydneyRealEstate)
plot(SydneyRealEstate$longitude,SydneyRealEstate$latitude,cex = 0.1)
lines(SydneyRealEstateBdry,lwd = 5,col = "red")
U.S. Treasury rate
Description
One-month maturity U.S. Treasury rate during the period 2001-2013.
Usage
data(TreasuryRate)
Format
A data frame with 3,117 observations on each of the following 2 variables:
datedays from July 31, 2001 until July 10, 2013 with the occasional missing values due to holidays.
ratedaily one-month maturity U.S. Treasury rate.
Source
Federal Reserve Bank of St. Louis, U.S.A.
Examples
library(HRW) ; data(TreasuryRate)
TRdate <- as.Date(TreasuryRate$date,"%m/%d/%Y")[!is.na(TreasuryRate$rate)]
TRrate <- TreasuryRate$rate[!is.na(TreasuryRate$rate)]
plot(TRdate,TRrate,type = "l",bty = "l",xlab = "date",ylab = "U.S. Treasury rate")
Peak expiratory flow in Utah, U.S.A.
Description
The UtahPEF data frame data contains longitudinal data on peak expiratory flow, air pollution and temperature for a cohort of 41 schoolchildren in the Utah Valley, U.S.A., during 1990-1991.
Usage
data(UtahPEF)
Format
This data frame contains the following columns:
idnumschoolchild identification number.
devPEFdaily peak expiratory flow measurements for each schoolchild minus the overall average for that schoolchild.
PM10withMA55-day moving average of the concentration of particulate matter 10 micrometers or less in diameter.
lowTemplowest temperature in degrees Fahrenheit on the day of recording.
timeTrendday of the study on which the measurements were made.
Source
Pope, C.A., Dockery, D.W., Spengler, J.D. and Raizenne, M.E. (1991). Respiratory health and PM_10 pollution: a daily time series analysis. American Review of Respiratory Disease, 144, 668-674.
Examples
library(HRW) ; data(UtahPEF)
if (require("lattice"))
print(xyplot(devPEF ~ PM10withMA5|as.factor(idnum),data = UtahPEF))
Apartment prices in Warsaw, Poland
Description
'WarsawApts' is a subset of the data set 'apartments' in the R package PBImisc. This dataset contains the prices of the apartments which were sold in Warsaw, Poland, during the calendar years 2007 to 2009.
Usage
data(WarsawApts)
Format
A data frame with 409 observations on the following 6 variables:
surfacearea of the apartment in square meters.
districta factor corresponding to the district of Warsaw with levels
Mokotow,Srodmiescie,WolaandZoliborz.n.roomsnumber of rooms in the apartment.
floorfloor on which the apartment is located.
construction.dateyear that the apartment was constructed.
areaPerMzlotyarea in square meters per million zloty.
Source
The Polish real estate web-site https://www.oferty.net.
References
Biecek, P. (2016). PBImisc: A set of datasets in My Classes or in the Book 'Modele Liniowe i Mieszane w R, Wraz z Przykladami w Analizie Danych' 1.0.
Examples
library(HRW) ; data(WarsawApts)
x <- WarsawApts$construction.date
y <- WarsawApts$areaPerMzloty
plot(x,y,bty = "l",col = "dodgerblue")
if (require("mgcv"))
{
fitGAMcr <- gam(y ~ s(x,bs = "cr",k = 30))
xg <- seq(min(x),max(x),length = 1001)
fHatgGAMcr <- predict(fitGAMcr,newdata = data.frame(x = xg))
lines(xg,fHatgGAMcr,col = "darkgreen")
}
O'Sullivan spline design matrices
Description
Constructs a design matrix consisting of O'Sullivan cubic spline functions of an array of abscissae. Typicially the array corresponds to either observed values of a predictor or an abscissa grid for plotting purposes.
Usage
ZOSull(x,range.x,intKnots,drv = 0)
Arguments
x |
array of abscissae. |
range.x |
array of length 2 such that range.x[1] <= min(x) and range.x[2] >= max(x). |
intKnots |
ordered array of length smaller than that of x and consisting of unique numbers between min(x) and max(x) that specifies the positions of internal knots, that define the spline basis (see the Wand and Ormerod (2008) reference below for full mathematical details). |
drv |
either 0,1 or 2 with a default value of 0. If drv = 1 then the first derivatives of the O'Sullivan spline basis functions are computed instead. Similarly, if drv = 2 then the second derivatives are computed. |
Value
A matrix of with length(x) rows and (length(intKnots) + 2) columns with each column containing a separate O'Sullivan spline basis function (determined by range.x and intKnots) of the abscissae in x. The values of range.x and intKnots are included as attributes of the fit object.
Author(s)
Matt Wand matt.wand@uts.edu.au
References
O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statistical Science, 1, 505-527.
Wand, M.P. and Ormerod, J.T. (2008). On semiparametric regression with O'Sullivan penalized splines. Australian and New Zealand Journal of Statistics. 50, 179-198.
Examples
library(HRW)
x <- WarsawApts$construction.date
a <- 1.01*min(x) - 0.01*max(x)
b <- 1.01*max(x) - 0.01*min(x) ; numIntKnots <- 23
intKnots <- quantile(unique(x),seq(0,1,length = (numIntKnots + 2))[-c(1,(numIntKnots + 2))])
xg <- seq(a,b,length = 1001)
Zg <- ZOSull(xg,range.x = c(a,b),intKnots = intKnots)
plot(0,type = "n",xlim = range(xg),ylim = range(Zg),
bty = "l",xlab = "construction date (year)",
ylab = "spline basis function")
for (k in 1:ncol(Zg)) lines(xg,Zg[,k],col = k,lwd = 2)
lines(c(min(xg),max(xg)),rep(0,2),col = "darkmagenta")
for (k in 1:numIntKnots)
points(intKnots[k],0,pch = 18,cex = 2,col = "darkmagenta")
Brain image
Description
The brainImage data frame corresponds a functional magnetic image of a coronal slice of a human brain.
The data are brain activity on a pixel array.
Usage
data(brainImage)
Format
This data frame is a 80 by 37 array. The entries correspond to brain activity in each of the corresponding pixels. The columns names are c1-c37. These names have no meaning, and are present to ensure that this data frame conforms with R data frame conventions.
Source
Landman, B.A., Huang, A.J., Gifford, A., Vikram, D.S., Lim, I.A.L, Farrell, J.A.D., Bogovic, J.A., Hua, J., Chen, M., Jarso, S., Smith, S.A., Joel, S., Mori, S., Pekar, J.J., Barker, P.B., Prince, J.L. and van Zijl, P.C.M. (2010). Multi-parametric neuroimaging reproducibility: A 3T resource study. NeuroImage, 54, 2854-2866.
Examples
library(HRW) ; data(brainImage)
image(as.matrix(brainImage))
Stock indices
Description
Daily returns on the Standard & Poors' 500 stock market index, daily rate of the U.S. Treasury bills, and 3 companies' stocks including Microsoft, the General Electric and the Ford Motor Company during the period from November 1, 1993 to March 31, 2003.
Usage
data(capm)
Format
A data frame with 2363 observations on the following 6 variables:
Close.tbillDaily Treasury bill rate expressed as a percentage.
Close.msftDaily closing price of the Microsoft stock.
Close.sp500Daily closing Standard and Poor's 500 index.
Close.geDaily closing price of the General Electric stock.
Close.fordDaily closing price of the Ford Motor Company stock.
DateDates from November 1, 1993 to March 31, 2003 (d-Mon-yy and dd-Mon-yr formats).
Source
Federal Reserve Bank of St. Louis U.S.A. (Treasury bill rates) and Yahoo Finance (stock prices).
Examples
# The Capital Asset Pricing Model (CAPM) states that the excess returns on a stock
# have a linear relationship with the returns on the market. This example investigates
# the CAPM for General Electric stock:
library(HRW) ; data(capm)
n <- dim(capm)[1]
riskfree <- capm$Close.tbill[2:n]/365
elrGE <- diff(log(capm$Close.ge)) - riskfree
elrSP500 <- diff(log(capm$Close.sp500)) - riskfree
plot(elrSP500,elrGE,col = "blue",cex = 0.2)
fitOLS <- lm(elrGE ~ elrSP500)
summary(fitOLS)
par(mfrow = c(2,2)) ; plot(fitOLS)
Cars purchased at auction
Description
The carAuction data frame has data on several variables concerning cars purchased at automobile auctions by automobile dealerships in the United States of America. The origin of these data is a classification competition titled “Don't Get Kicked!” that ran on the ‘kaggle’ platform (https://www.kaggle.com) during 2011-2012. Many of the variables in this data frame have been derived from those in the original data set from https://www.kaggle.com.
Usage
data(carAuction)
Format
This data frame contains the following columns:
RefIdunique number assigned to each vehicles.
IsBadBuyindicator that the vehicle purchased at auction by an automobile dealership has serious problems that hinder or prevent it being sold - a "bad buy":
1 = the vehicle is a "bad buy"
0 = the vehicle is a "good buy".
All other indicator variables are defined in this way.purchIn2010indicator that vehicle was purchased in 2010.
aucEqAdesaindicator that the auction provider at which the vehicle was purchased was Adesa.
aucEqManheimindicator that the auction provider at which the vehicle was purchased was Manheim.
vehYearEq03indicator that the manufacturer's year of the vehicle is 2003.
vehYearEq04indicator that the manufacturer's year of the vehicle is 2004.
vehYearEq05indicator that the manufacturer's year of the vehicle is 2005.
vehYearEq06indicator that the manufacturer's year of the vehicle is 2006.
vehYearEq07indicator that the manufacturer's year of the vehicle is 2007.
ageAtSaleage of the vehicle in years when sold.
makeEqChevroletindicator that the vehicle's manufacturer is Chevrolet.
makeEqFordindicator that the vehicle's manufacturer is Ford.
makeEqDodgeindicator that the vehicle's manufacturer is Dodge.
makeEqChryslerindicator that the vehicle's manufacturer is Chrysler.
trimEqBasindicator that the trim level of the vehicle is 'Bas'.
trimEqLSindicator that the trim level of the vehicle is 'LS'.
trimEqSEindicator that the trim level of the vehicle is 'SE'.
subModelEq4DSEDANindicator that the submodel of the vehicle is '4DSedan'.
subModelEq4DSEDANLSindicator that the submodel of the vehicle is '4DSedanLS'.
subModelEq4DSEDANSEindicator that the submodel of the vehicle is '4DSedanSE'.
colourEqSilverindicator that the vehicle color is silver.
colourEqWhiteindicator that the vehicle color is white.
colourEqBlueindicator that the vehicle color is blue.
colourEqGreyindicator that the vehicle color is grey.
colourEqBlackindicator that the vehicle color is black.
colourEqRedindicator that the vehicle color is red.
colourEqGoldindicator that the vehicle color is gold.
colourEqOrangeindicator that the vehicle color is orange.
transEqManualindicator that the vehicle has manual transmission.
wheelEqAlloyindicator that the vehicle has alloy wheels.
wheelEqCoversindicator that the vehicle has covered wheels.
odomReadthe vehicle's odometer reading in miles.
AmericanMadeindicator that the vehicle was manufactured in the United States of America.
otherAsianMadeindicator that the vehicle was manuctured in an Asian nation other than Japan or South Korea.
sizeEqTruckindicator that the size category of the vehicle is 'truck'.
sizeEqMediumindicator that the size category of the vehicle is 'medium'.
sizeEqSUVindicator that the size category of the vehicle is 'SUV'.
sizeEqCompactindicator that the size category of the vehicle is 'compact'.
sizeEqVanindicator that the size category of the vehicle is 'van'.
priceacquisition price for this vehicle in average condition at time of purchase in U.S. dollars.
purchInTexasindicator that the vehicle was purchased in Texas.
purchInFloridaindicator that the vehicle was purchased in Florida.
purchInCaliforniaindicator that the vehicle was purchased in California.
purchInNorthCarolinaindicator that the vehicle was purchased in North Carolina.
purchInArizonaindicator that the vehicle was purchased in Arizona.
purchInColoradoindicator that the vehicle was purchased in Colorado.
purchInSouthCarolinaindicator that the vehicle was purchased in South Carolina.
costAtPurchacquisition cost paid for the vehicle at time of purchase.
onlineSaleindicator that the vehicle was purchased online.
warrantyCostwarranty cost in U.S. dollars.
Source
The “Don't Get Kicked” competition, https://www.kaggle.com.
Examples
library(HRW) ; data(carAuction)
## Not run:
for (colNum in 3:10)
{
plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".",
xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue")
readline("Hit Enter to continue.\n")
}
for (colNum in 11:51)
{
plot(jitter(carAuction[,colNum]),jitter(carAuction$IsBadBuy),pch = ".",
xlab = names(carAuction)[colNum],ylab = "is car a bad buy?",col = "blue")
readline("Hit Enter to continue.\n")
}
## End(Not run)
Coral organisms in French Polynesia
Description
The coral data frame has data on initial size, taxonomic identity and alive/death status of coral organisms in French Polynesia.
Usage
data(coral)
Format
This data frame contains the following columns:
siteDepthPeriodfactor with levels corresponding to a code for the site, depth and time period concerning where and when coral organisms were measured.
taxonfactor corresponding to an abbreviation for taxonomic identity:
ACR= Acropora,
POC= Pocillopora,
POR= Porites.logInitialSizePlusOneinitial size measurement of coral organism transformed according to the log(initial size + 1).
diedindicator that coral organism has died:
1 = coral organism has died,
0 = coral organism still alive.
Source
Kayal, M., Vercelloni, J., Wand, M.P. and Adjeroud, M. (2015). Searching for the best bet in life-strategy: a quantitative population dynamics approach to life history trade-offs in reef-building corals. Ecological Complexity, 23, 73-84.
Examples
library(HRW) ; data(coral)
if (require("lattice"))
print(xyplot(died ~ logInitialSizePlusOne|siteDepthPeriod*taxon,
data = coral,layout = c(15,5)))
Boundary polygon creation
Description
Create a boundary polygon corresponding nominally to the effective probability density support of a bivariate dataset via an interactive graphical interface and mouse (or, possibly, touchpad) posititionings and button clicks.
Usage
createBoundary(x,y)
Arguments
x |
vector containing the x-coordinates of a bivariate dataset. |
y |
vector containing the y-coordinates of a bivariate dataset. |
Details
After the bivariate dataset is displayed on the screen a boundary polygon is selected by performing left mouse (or, possibly, touchpad) clicks on the screen to specify vertex positions, and then moving around in a clockwise direction until the polygon is completed. Completion is achieved by clicking inside the red octagon surrounding the starting vertex.
Value
A two-column matrix containing the vertices of the selected boundary polygon.
Author(s)
M.P. Wand matt.wand@uts.edu.au
See Also
Examples
library(HRW)
x <- c(4,1,9,8,3,9,7)
y <- c(5,7,5,4,2,1,1)
## Not run: myBoundary <- createBoundary(x,y)
Female spinal bone mineral densities
Description
The femSBMD data frame has longitudinal data on spinal bone mineral, density, age and ethnicity for female youths from a study on bone mineral acquisition.
Usage
data(femSBMD)
Format
This data frame contains the following columns:
idnumidentification number unique to each subject.
spnbmdspinal bone mineral density in grams per square centimeter.
ageage of subject in years.
ethnicityfactor corresponding to the subject's ethnicity with levels
Asian,Black,HispanicandWhite.blackindicator of the subject being black:
0 = subject is black
1 = subject is not black.hispanicindicator of the subject being Hispanic:
0 = subject is Hispanic
1 = subject is not Hispanic.whiteindicator of the subject being white:
0 = subject is white
1 = subject is not white.
Source
Bachrach, L.K., Hastie, T., Wang, M.-C., Narasimhan, B. and Marcus, R. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black, and Caucasian youth: a longitudinal study. Journal of Clinical Endocrinology and Metabolism, 84, 4702-4712.
Examples
library(HRW) ; data(femSBMD)
if (require("lattice"))
print(xyplot(spnbmd ~ age|factor(ethnicity),groups = idnum,
data = femSBMD,type = "b"))
Adolescent somatic growth in Indiana, U.S.A.
Description
Data on adolescent somatic growth obtained from a study of the mechanisms of human hypertension development conducted at the Indiana University School of Medicine, Indianapolis, Indiana, U.S.A. The data are restricted to a subset of 216 adolescents in the original study who had at least 9 height measurements. There are a total of 4,123 height measurements taken approximately every 6 months.
Usage
data(growthIndiana)
Format
A data frame with 4,123 observations on the following 5 variables:
idnumidentification numbers of the 216 adolescents.
heightheight in centimeters.
ageage in years.
maleindicator of the adolescent being male:
1 = adolescent is male,
0 = adolescent is female.blackindicator of the adolescent being black:
1 = adolescent is black,
0 = adolescent is not black.
References
Pratt, J.H., Jones, J.J., Miller, J.Z., Wagner, M.A. and Fineberg, N.S. (1989). Racial differences in aldosterone excretion and plasma aldosterone concentrations in children. New England Journal of Medicine, 321, 1152-1157.
Examples
library(HRW) ; data(growthIndiana)
growthINblackMales <- growthIndiana[(growthIndiana$male == 1) & (growthIndiana$black == 1),]
if (require("lattice"))
xyplot(height ~ age|factor(idnum),data = growthINblackMales)
Respiratory infection in Indonesian children
Description
Indonesian Children's Health Study of respiratory infections for a cohort of 275 Indonesian children. The data are longitudinal with each child having between 1 and 6 repeated measurements.
Usage
data(indonRespir)
Format
A data frame with 1200 observations on the following 12 variables:
idnumchild identification number.
respirInfecindicator of presence of resipiratory infection.
ageage of the child in years.
vitAdeficindicator of Vitamin A deficiency:
1 = the child had Vitamin A deficiency,
0 = the child did not have Vitamin A deficiency.femaleindicator of child being female:
1 = the child is female,
0 = the child is male.heightheight of the child in centimeters.
stuntedindicator of the child being "short for his/her age":
1 = the child is "short for his/her age",
0 = the child is not "short for his/her age"visit2indicator that the child had exactly 2 clinical visits:
1 = the exact number of clinical visits was 2,
0 = the exact number of clinical visits was not 2.visit3indicator that the child had exactly 3 clinical visits:
1 = the exact number of clinical visits was 3,
0 = the exact number of clinical visits was not 3.visit4indicator that the child had exactly 4 clinical visits:
1 = the exact number of clinical visits was 4,
0 = the exact number of clinical visits was not 4.visit5indicator that the child had exactly 5 clinical visits:
1 = the exact number of clinical visits was 5,
0 = the exact number of clinical visits was not 5.visit6indicator that the child had exactly 6 clinical visits:
1 = the exact number of clinical visits was 6,
0 = the exact number of clinical visits was not 6.
Source
Sommer, A. (1982). Nutritional Blindness. New York: Oxford University Press.
References
Diggle, P., Heagerty, P., Liang, K.-L. and Zeger, S. (2002). Analysis of Longitudinal Data (Second Edition). Oxford: Oxford University Press.
Examples
library(HRW) ; data(indonRespir)
if (require("mgcv"))
{
fit <- gamm(respirInfec ~ s(age) + vitAdefic + female + height
+ stunted + visit2 + visit3 + visit4 + visit5 + visit6,
random = list(idnum = ~1),family = binomial,data = indonRespir)
summary(fit$gam)
}
Light detection and ranging
Description
The lidar data frame has 221 pairs from a LIght Detection And Ranging (LIDAR) experiment.
Usage
data(lidar)
Format
This data frame contains the following columns:
rangedistance traveled before the light is reflected back to its source.
logratiologarithm of the ratio of received light from two laser sources.
Source
Sigrist, M. (Ed.) (1994). Air Monitoring by Spectroscopic Techniques (Chemical Analysis Series, vol. 197). New York: Wiley.
Examples
library(HRW) ; data(lidar)
plot(lidar$range,lidar$logratio)
Ozone levels in midwest U.S.A.
Description
This dataset is a subset of the ozone2 dataset in the fields package. It contains the 8-hour average ozone concentration at 147 sites in the midwest region of the U.S.
Usage
data(ozoneSub)
Format
A data frame with 147 observations on the following 3 variables:
longitudeobservation longitude.
latitudeobservation latitude.
ozoneozone level.
Source
Aerometric Information Retrieval System, the U.S. Environmental Protection Agency air quality data base.
References
Nychka, D., Furrer, R., Paige, J. and Sain, S. (2017). fields: Tools for spatial data. R package version 9.0. https://www.r-project.org.
Examples
library(HRW) ; data(ozoneSub)
if (require("mgcv"))
{
fit.ozone.mgcv.tp <- gam(ozone ~ s(longitude,latitude,bs = "tp"),
data = ozoneSub,method = "REML")
plot(fit.ozone.mgcv.tp,scheme = 2,
main = "ozone concentration",bty = "l")
points(ozoneSub$longitude,ozoneSub$latitude)
}
if (require("fields"))
US(add = TRUE,lwd = 2)
Flow cytometric measurements on plankton organisms
Description
The plankton data frame has data on six flow cytometric measurements for 400 plankton organisms categorized into 5 different species. The data are synthetic and were generated to test various machine learning algorithms for plankton species classification. More details are given in the 2001 ‘Cytometry’ article by Boddy, Wilkins and Morris (full reference below).
Usage
data(plankton)
Format
This data frame contains the following columns:
timeFlighttime of flight.
forwScattforward-scatter.
sideScattside-scatter.
redFluorBlueLightred fluorescence under blue light.
greenFluorBlueLightgreen fluorescence under blue light.
redFluorRedLightred fluorescence under red light.
speciesname of the plankton species, which is either Dunaliella, Hemiselmis, Isochrysis, Pavlova or Pyramimonas.
Source
Boddy, L., Wilkins, M.F. and Morris, C.W. (2001). Pattern recognition in flow cytometry. Cytometry, 44, 195-209.
Examples
library(HRW) ; data(plankton)
pointCols <- c("red","blue","green","orange","purple")
pairs(plankton[,1:6],col = pointCols[plankton$species],pch = ".")
Points inside/outside polygon determination
Description
Determination of whether each member of a set of bivariate points are inside a polygon.
Usage
pointsInPoly(pointsCoords,polygon)
Arguments
pointsCoords |
two-column matrix with each row specifying the (x,y) coordinates of a point. |
polygon |
two-column matrix with each row specifying the (x,y) coordinates of a vertex of a polygon. |
Details
Geometric results are used to determine whether each of a set of bivariate points is inside or outside a polygon. A Boolean vector of indicators of whether or not each point is inside the polygon is returned.
Value
A Boolean array with length equal to the number of rows in ‘pointsCoords’ corresponding to whether or not each of the corresponding points in 'pointCoords' are inside 'polygon'.
Author(s)
M.P. Wand matt.wand@uts.edu.au
See Also
Examples
library(HRW)
myPolygon <- rbind(c(1,9),c(8,8),c(9,3),c(3,2),c(1,9))/10
plot(0:1,0:1,type = "n") ; lines(myPolygon)
xyMat <- cbind(runif(10),runif(10))
inPoly <- pointsInPoly(xyMat,myPolygon) ; print(inPoly)
points(xyMat[,1],xyMat[,2],col = as.numeric(inPoly) + 2)
Protein intake dietery study
Description
The protein data frame has longitudinal data on protein intake, body mass index and age of subjects in a dietary study.
Usage
data(protein)
Format
This data frame contains the following columns:
idnumidentification number unique to each subject.
proteinBioMlogarithm of intake of protein as measured by the biomarker urinary.
ageage of subject in years.
BMIbody mass index.
proteinRecalllogarithm of intake of protein as measured by a 24-hour recall instrument.
femaleindicator that subject is female:
1=subject is female,
0=subject is male.
Source
Kipnis, V., Subar, A.F., Midthune, D., Freedman, L.S., Ballard-Barbash, R., Troiano, R., Bingham, S., Schoeller, D.A., Schatzkin, A. and Carroll, R.J. (2003). The structure of dietary measurement error: results of the OPEN biomarker study. American Journal of Epidemiology, 158, 14-21.
Examples
library(HRW) ; data(protein)
if (require("lattice"))
print(xyplot(proteinBioM ~ BMI|factor(female),groups = idnum,
data = protein, type = "b"))
Ragweed pollen in Kalamazoo, U.S.A.
Description
The ragweed data frame has data on ragweed levels and meteorological variables for 334 days in Kalamazoo, Michigan, U.S.A.
Usage
data(ragweed)
Format
This data frame contains the following columns:
pollenCountragweed pollen count (grains per cubic metre).
yearone of 1991, 1992, 1993 or 1994.
dayInSeasonday number in the current ragweed pollen season.
temperaturetemperature for the corresponding day (degrees Fahrenheit).
temperatureResidualresidual from fitting a 5 effective degrees of freedom smoothing splines to temperature versus day number for each annual ragweed pollen season (degrees Fahrenheit).
rainindicator of significant rain on the corresponding day:
1=at least 3 hours of steady or brief but intense rain,
0=otherwise.windSpeedwind speed for the corresponding day (knots).
Source
Stark, P. C., Ryan, L. M., McDonald, J. L. and Burge, H. A. (1997). Using meteorologic data to model and predict daily ragweed pollen levels. Aerobiologia, 13, 177-184.
References
Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression Cambridge University Press.
Examples
library(HRW) ; data(ragweed)
pairs(ragweed,pch = ".")
Scallop abundance off Long Island, U.S.A.
Description
The scallop data frame has 148 triplets concerning scallop abundance; based on a 1990 survey cruise in the Atlantic continental shelf off Long Island, New York, U.S.A.
Usage
data(scallop)
Format
This data frame contains the following columns:
latitudedegrees latitude (north of the Equator).
longitudedegrees longitude (west of Greenwich).
totalCatchtotal size of scallop catch at location specified by 'latitude' and 'longitude'.
Source
Ecker, M.D. and Heltshe, J.F. (1994). Geostatistical estimates of scallop abundance. In Case Studies in Biometry. Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L. and Greenhouse, J. (eds.) New York: John Wiley & Sons, 107-124.
References
Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. Cambridge University Press.
Examples
library(HRW) ; data(scallop)
pairs(scallop)
School results in the United Kingdom
Description
The schoolResults data frame has multilevel data school results and gender for 1,905 school children from 73 schools in United Kingdom.
Usage
data(schoolResults)
Format
This data frame contains the following columns:
schoolIDschool identification number.
studentIDstudent identification number.
- female
indicator that child is female:
1=child is female,
0=child is male.writtenScorescore on traditional written examination paper out of a total of 160.
courseScorescore from projects undertaken during the course and marked by the student's own teacher, out of a total of 108.
Source
Creswell, M. (1991). A multilevel bivariate model. In Data Analysis with ML3 (eds. Prosser, R., Rasbash, J. and Goldstein, H.) London: Institute of Education, London.
Examples
library(HRW) ; data(schoolResults)
if (require("lattice"))
print(xyplot(writtenScore ~ courseScore|factor(schoolID),
groups = female,data = schoolResults))
Summarizes Markov chain Monte Carlo (MCMC) samples both graphically and numerically
Description
Given a set of MCMC for possibly several parameters the following summaries are produced: 1. a trace (time series) plot of each MCMC, 2. a plot of each MCMC sample against the 1-lagged sample, 3. estimated autocorrelatio function (acf), 4. Brooks-Gelman-Rubin (BGR) diagnostic plot in cases where multiple chains are inputted, 5. kernel ddensity estimate of the posterior density function based on the MCMC sample, 6. numerical summary consisting of the MCMC-based estimates of posterior means and credible sets for each parameter.
Usage
summMCMC(xList,EPSfileName,PDFfileName,plotInd = 1,parNames,columnHeadings,
colourVersion = TRUE,credLevel = 0.95,columnHeadCex = NULL,paletteNum = 1,
numerSummCex = NULL,BGRsttPos = 10,BGRyRange = c(0.95,1.25),BGRtickPos = 1.2,
BGRlogTransf,BGRlogitTransf,KDExlim,KDEvertLine = TRUE,KDEvertLineCol = "black",
addTruthToKDE = NULL)
Arguments
xList |
list of matrices, where each matrix corresponds to a different chain, and the columns of each matrix correspond to different parameters. The matrices each have dimension "numMCMC" by "numParms"; where "numMCMC" is the size of the MCMC sample and "numParms" is the number of parameters being summarized. |
EPSfileName |
filename if the summary is to be saved as a (encapsulated) Postscript file. If this argument and 'PDFfileName' are both not specified then the summary is printed to the screen. |
PDFfileName |
filename if the summary is to be saved as a PDF file. If this argument and 'EPSfileName' are both not specified then the summary is printed to the screen. |
plotInd |
if "numChains" exceeds 1 then this indicates which chain is summarized in the non-BGR panels. The BGR panels are Brooks-Gelman-Rubin diagnostic plots are use all chains. The default value is 1. |
parNames |
list containing a vector of character strings for the parameter names. The maximum length of the vector is 3. |
columnHeadings |
vector containing column headings. The default is: c("parameter","trace","lag 1","acf","BGR","density","summary"). |
colourVersion |
logical flag indicating if summary should be in colour. The default is TRUE. |
credLevel |
number between 0 and 1 specifying the credible set level. The default is 0.95. |
columnHeadCex |
positive number specifying character expansion factor for the column headings. |
paletteNum |
either 1 or 2 to specify the colour palette to be used. |
numerSummCex |
positive number specifying character expansion factor for the numerical summary (last column). |
BGRsttPos |
starting position for the Brooks-Gelman-Rubin plots. The default value is 10. |
BGRyRange |
vertical axis limits for the Brooks-Gelman-Rubin plots. The default value is c(0.95,1.25). |
BGRtickPos |
position of tick mark on vertical axis for the Brooks-Gelman-Rubin plots. The default value is 1.2. |
BGRlogTransf |
vector containing indices of those parameters for which the Brooks-Gelman-Rubin plots should be done on a logarithmic scale. |
BGRlogitTransf |
vector containing indices of those parameters for which the Brooks-Gelman-Rubin plots should be done on a logit scale. |
KDExlim |
list of vectors of length 2 specifying the horizontal axis limits for the kernel density estimates. |
KDEvertLine |
logical flag indicating if a vertical line at zero should be added to the kernel density estimates. The default value is TRUE. |
KDEvertLineCol |
Colour of the vertical line at zero for kernel density estimates. The default value is "black". |
addTruthToKDE |
Vector indicating ‘true’ values of parameters. The default value is NULL. If 'addTruthToKDE' is non-NULL then dashed vertical lines corresponding to true values (if known from simulation) are added. |
Author(s)
Matt Wand matt.wand@uts.edu.au
Examples
library(HRW)
xListSingleChain <- list(cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100)))
summMCMC(xListSingleChain,parNames = list("par1","par2","par3","par4"))
xListMultipleChains <- list(chain1 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100)),
chain2 = cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100)))
summMCMC(xListMultipleChains,parNames = list("par1","par2","par3","par4"))
U.S., European and Japanese yield curves
Description
U.S., European and Japanese yield curves. These are functions of maturity. This dataset has 91 columns. The first column is the date, columns 2 to 31 are European yields at maturities from 1 to 30 years, columns 32 to 61 are Japanese yields at these maturities, and columns 62 to 91 are U.S. yields at the same maturities.
Usage
data(yields)
Format
A data frame with 1565 observations on the following 91 variables:
datedate when the yield is measured.
EU01European yield at a maturity of 1 year.
EU02European yield at a maturity of 2 years.
EU03European yield at a maturity of 3 years.
EU04European yield at a maturity of 4 years.
EU05European yield at a maturity of 5 years.
EU06European yield at a maturity of 6 years.
EU07European yield at a maturity of 7 years.
EU08European yield at a maturity of 8 years.
EU09European yield at a maturity of 9 years.
EU10European yield at a maturity of 10 years.
EU11European yield at a maturity of 11 years.
EU12European yield at a maturity of 12 years.
EU13European yield at a maturity of 13 years.
EU14European yield at a maturity of 14 years.
EU15European yield at a maturity of 15 years.
EU16European yield at a maturity of 16 years.
EU17European yield at a maturity of 17 years.
EU18European yield at a maturity of 18 years.
EU19European yield at a maturity of 19 years.
EU20European yield at a maturity of 20 years.
EU21European yield at a maturity of 21 years.
EU22European yield at a maturity of 22 years.
EU23European yield at a maturity of 23 years.
EU24European yield at a maturity of 24 years.
EU25European yield at a maturity of 25 years.
EU26European yield at a maturity of 26 years.
EU27European yield at a maturity of 27 years.
EU28European yield at a maturity of 28 years.
EU29European yield at a maturity of 29 years.
EU30European yield at a maturity of 30 years.
JP01Japanese yield at a maturity of 1 year.
JP02Japanese yield at a maturity of 2 years.
JP03Japanese yield at a maturity of 3 years.
JP04Japanese yield at a maturity of 4 years.
JP05Japanese yield at a maturity of 5 years.
JP06Japanese yield at a maturity of 6 years.
JP07Japanese yield at a maturity of 7 years.
JP08Japanese yield at a maturity of 8 years.
JP09Japanese yield at a maturity of 9 years.
JP10Japanese yield at a maturity of 10 years.
JP11Japanese yield at a maturity of 11 years.
JP12Japanese yield at a maturity of 12 years.
JP13Japanese yield at a maturity of 13 years.
JP14Japanese yield at a maturity of 14 years.
JP15Japanese yield at a maturity of 15 years.
JP16Japanese yield at a maturity of 16 years.
JP17Japanese yield at a maturity of 17 years.
JP18Japanese yield at a maturity of 18 years.
JP19Japanese yield at a maturity of 19 years.
JP20Japanese yield at a maturity of 20 years.
JP21Japanese yield at a maturity of 21 years.
JP22Japanese yield at a maturity of 22 years.
JP23Japanese yield at a maturity of 23 years.
JP24Japanese yield at a maturity of 24 years.
JP25Japanese yield at a maturity of 25 years.
JP26Japanese yield at a maturity of 26 years.
JP27Japanese yield at a maturity of 27 years.
JP28Japanese yield at a maturity of 28 years.
JP29Japanese yield at a maturity of 29 years.
JP30Japanese yield at a maturity of 30 years.
US01U.S. yield at a maturity of 1 year.
US02U.S. yield at a maturity of 2 years.
US03U.S. yield at a maturity of 3 years.
US04U.S. yield at a maturity of 4 years.
US05U.S. yield at a maturity of 5 years.
US06U.S. yield at a maturity of 6 years.
US07U.S. yield at a maturity of 7 years.
US08U.S. yield at a maturity of 8 years.
US09U.S. yield at a maturity of 9 years.
US10U.S. yield at a maturity of 10 years.
US11U.S. yield at a maturity of 11 years.
US12U.S. yield at a maturity of 12 years.
US13U.S. yield at a maturity of 13 years.
US14U.S. yield at a maturity of 14 years.
US15U.S. yield at a maturity of 15 years.
US16U.S. yield at a maturity of 16 years.
US17U.S. yield at a maturity of 17 years.
US18U.S. yield at a maturity of 18 years.
US19U.S. yield at a maturity of 19 years.
US20U.S. yield at a maturity of 20 years.
US21U.S. yield at a maturity of 21 years.
US22U.S. yield at a maturity of 22 years.
US23U.S. yield at a maturity of 23 years.
US24U.S. yield at a maturity of 24 years.
US25U.S. yield at a maturity of 25 years.
US26U.S. yield at a maturity of 26 years.
US27U.S. yield at a maturity of 27 years.
US28U.S. yield at a maturity of 28 years.
US29U.S. yield at a maturity of 29 years.
US30U.S. yield at a maturity of 30 years.
Source
European Central Bank (European yields), Japanese Ministry of Finance (Japanese yields) and U.S. Federal Reserve Board (U.S. yields).
Examples
library(HRW) ; data(yields)
t <- 1:30 ; yieldsCleaned <- na.omit(yields)[,-1]
plot(t,yieldsCleaned[1,61:90], type="l",ylim = c(0,6),lwd = 2,
xlab = "maturity (years)",ylab = "yield",
bty = "l",cex.lab = 1.5,cex.axis = 1.5)
for (i in 2:14) lines(t,yieldsCleaned[100*i+1,61:90],col = i,lwd = 2)