| Type: | Package | 
| Title: | A few Useful Functions for Statisticians | 
| Version: | 2.4 | 
| Date: | 2025-03-24 | 
| Maintainer: | Hugo Varet <varethugo@gmail.com> | 
| Depends: | survival | 
| Imports: | stats, graphics, WriteXLS (≥ 2.3.0) | 
| Description: | Various useful functions for statisticians: describe data, plot Kaplan-Meier curves with numbers of subjects at risk, compare data sets, display spaghetti-plot, build multi-contingency tables... | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-03-24 10:42:54 UTC; hvaret | 
| Author: | Hugo Varet [aut, cre] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-03-24 11:10:02 UTC | 
A few useful functions for statisticians
Description
Various useful functions for statisticians: describe data, plot Kaplan-Meier curves with numbers of subjects at risk, compare data sets, display spaghetti-plot, build multi-contingency tables...
Author(s)
Hugo Varet
OR and their confidence intervals for logistic regressions
Description
Computes odds ratios and their confidence intervals for logistic regressions
Usage
IC_OR_glm(model, alpha = 0.05)
Arguments
model | 
 a   | 
alpha | 
 type I error, 0.05 by default  | 
Value
A matrix with the estimated coefficients of the logistic model, their s.e., z-values, p-values, OR and CI of the OR
Author(s)
Hugo Varet
Examples
IC_OR_glm(glm(inherit~sex+age,data=cgd,family="binomial"))
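As a rough illustration of the quantities listed under Value, the sketch below computes odds ratios and Wald-type confidence intervals by hand with base R. It assumes the interval is exp(estimate ± z × s.e.), which may differ from what IC_OR_glm does internally. The helper name or_wald and the simulated data are hypothetical.

```r
# Hypothetical helper: odds ratios with Wald confidence intervals.
# Assumption: this is NOT necessarily the method used inside IC_OR_glm.
or_wald <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients   # estimate, s.e., z value, p-value
  z <- qnorm(1 - alpha / 2)              # two-sided normal quantile
  est <- coefs[, "Estimate"]
  se <- coefs[, "Std. Error"]
  cbind(coefs,
        OR = exp(est),
        lower = exp(est - z * se),
        upper = exp(est + z * se))
}

# Simulated data, for illustration only
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rbinom(200, 1, 0.5))
d$y <- rbinom(200, 1, plogis(-0.5 + 0.8 * d$x1 + 0.3 * d$x2))
fit <- glm(y ~ x1 + x2, data = d, family = "binomial")
res <- or_wald(fit)
res
```

The result has one row per model coefficient, with the OR and its bounds appended to the usual coefficient table.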
RR and their confidence intervals for Cox models
Description
Computes risk ratios and their confidence intervals for Cox models
Usage
IC_RR_coxph(model, alpha = 0.05, sided = 2)
Arguments
model | 
 a   | 
alpha | 
 type I error, 0.05 by default  | 
sided | 
 1 or 2 for one or two-sided  | 
Value
A matrix with the estimated coefficients of the Cox model, their s.e., z-values, p-values, RR and CI of the RR
Author(s)
Hugo Varet
Examples
cgd$time=cgd$tstop-cgd$tstart
IC_RR_coxph(coxph(Surv(time,status)~sex+age,data=cgd),alpha=0.05,sided=1)
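To make the role of alpha and sided more concrete, here is a minimal sketch of Wald-type intervals for the exponentiated Cox coefficients. It assumes that sided = 1 simply swaps the quantile qnorm(1 - alpha/2) for qnorm(1 - alpha); the source does not state how IC_RR_coxph handles this, so treat the helper rr_wald as hypothetical.

```r
library(survival)

# Hypothetical helper: exp(coef) with Wald confidence intervals.
# Assumption: sided = 1 uses qnorm(1 - alpha), sided = 2 uses qnorm(1 - alpha / 2).
rr_wald <- function(model, alpha = 0.05, sided = 2) {
  coefs <- summary(model)$coefficients
  z <- qnorm(1 - alpha / sided)
  est <- coefs[, "coef"]
  se <- coefs[, "se(coef)"]
  cbind(RR = exp(est),
        lower = exp(est - z * se),
        upper = exp(est + z * se))
}

# Same model as in the example above, fitted on a local copy of cgd
d <- cgd
d$time <- d$tstop - d$tstart
fit <- coxph(Surv(time, status) ~ sex + age, data = d)
two_sided <- rr_wald(fit, alpha = 0.05, sided = 2)
one_sided <- rr_wald(fit, alpha = 0.05, sided = 1)
two_sided
one_sided
```

Under this assumption the one-sided bounds are narrower than the two-sided ones, because the quantile is smaller.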
Comparing two databases assumed to be identical
Description
Compares two data frames assumed to be identical, prints the differences in the console and also returns the results in a data frame
Usage
compare(d1, d2, id, file.export = NULL)
Arguments
d1 | 
 first data frame  | 
d2 | 
 second data frame  | 
id | 
 character string, primary key of the two data bases  | 
file.export | 
 character string, name of the XLS file exported  | 
Value
A data frame containing the differences between the two data bases
Author(s)
Hugo Varet
Examples
N=100
data1=data.frame(id=1:N,a=rnorm(N),
                        b=factor(sample(LETTERS[1:5],N,TRUE)),
                        c=as.character(sample(LETTERS[1:5],N,TRUE)),
                        d=as.Date(32768:(32768+N-1),origin="1900-01-01"))
data1$c=as.character(data1$c)
data2=data1
data2$id[3]=4654
data2$a[30]=NA
data2$a[31]=45
data2$b=as.character(data2$b)
data2$d=as.character(data2$d)
data2$e=rnorm(N)
compare(data1,data2,"id")
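The idea of a key-based comparison can be sketched in base R as follows. This is a simplified illustration, not the algorithm used by compare: it only looks at rows and columns present in both data frames and compares values as text. The helper name diff_frames and the toy data are hypothetical.

```r
# Hypothetical helper: list cell-level differences between two data frames
# that share a key column. Simplified sketch, not the package's algorithm.
diff_frames <- function(d1, d2, id) {
  ids <- intersect(d1[[id]], d2[[id]])                  # keys present in both
  vars <- setdiff(intersect(names(d1), names(d2)), id)  # shared columns
  a <- d1[match(ids, d1[[id]]), , drop = FALSE]
  b <- d2[match(ids, d2[[id]]), , drop = FALSE]
  out <- lapply(vars, function(v) {
    x <- as.character(a[[v]])   # compare as text so column types may differ
    y <- as.character(b[[v]])
    changed <- which(is.na(x) != is.na(y) | (!is.na(x) & !is.na(y) & x != y))
    data.frame(id = ids[changed], variable = rep(v, length(changed)),
               d1 = x[changed], d2 = y[changed], stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Toy data with two deliberate differences
d1 <- data.frame(id = 1:5, a = c(1, 2, 3, 4, 5), b = letters[1:5],
                 stringsAsFactors = FALSE)
d2 <- d1
d2$a[2] <- NA
d2$b[4] <- "z"
diffs <- diff_frames(d1, d2, "id")
diffs
```

Comparing as text means a factor and a character column with the same labels are treated as equal, which mirrors the type changes made to data2 in the example above.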
Convert variables of a data frame into factors
Description
Converts variables of a data frame into factors
Usage
convert_factor(data, vars)
Arguments
data | 
 the data frame in which we can find   | 
vars | 
 character vector of covariate names  | 
Value
The modified data frame
Author(s)
Hugo Varet
Examples
cgd$steroids
cgd$status
cgd=convert_factor(cgd,c("steroids","status"))
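For readers who want to see the operation spelled out, a base-R sketch of the same kind of conversion is given below. The helper name to_factor is hypothetical, and the package function may handle details such as level ordering differently.

```r
# Hypothetical base-R equivalent: turn the listed columns into factors
to_factor <- function(data, vars) {
  data[vars] <- lapply(data[vars], factor)
  data
}

d <- data.frame(g = c(1, 2, 1), s = c(0, 1, 1), x = c(3.2, 1.5, 2.7))
d <- to_factor(d, c("g", "s"))
str(d)
```

Columns not named in vars (here x) are left untouched.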
Convert 0s into NA
Description
Converts 0s into NA
Usage
convert_zero_NA(data, vars)
Arguments
data | 
 the data frame in which we can find   | 
vars | 
 a character vector of covariates for which to transform 0s in   | 
Value
The modified data frame
Author(s)
Hugo Varet
Examples
my.data=data.frame(x=rbinom(20,1,0.5),y=rbinom(20,1,0.5),z=rbinom(20,1,0.5))
my.data=convert_zero_NA(my.data,c("y","z"))
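The recoding can be sketched in base R as below. The helper name zero_to_na is hypothetical; the sketch assumes that existing missing values should simply stay missing.

```r
# Hypothetical base-R equivalent: recode 0 as NA in the listed columns
zero_to_na <- function(data, vars) {
  data[vars] <- lapply(data[vars], function(x) {
    x[!is.na(x) & x == 0] <- NA   # existing NAs are left as they are
    x
  })
  data
}

d <- data.frame(x = c(0, 1, 0), y = c(0, 1, 1), z = c(1, 0, NA))
d <- zero_to_na(d, c("y", "z"))
d
```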
Cut a quantitative variable into n equal parts
Description
Cuts a quantitative variable into n equal parts
Usage
cut_quanti(x, n, ...)
Arguments
x | 
 a numeric vector  | 
n | 
 numeric, the number of parts: 2 to cut according to the median, and so on...  | 
... | 
 other arguments to be passed in   | 
Value
A factor vector
Author(s)
Hugo Varet
Examples
cut_quanti(cgd$height, 3)
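Since n = 2 corresponds to a split at the median, "equal parts" presumably means groups of roughly equal size defined by quantiles. Under that assumption, the sketch below shows one way to build such groups with quantile() and cut(). The helper name cut_quantiles is hypothetical, and passing the extra arguments on to cut() is also an assumption.

```r
# Hypothetical helper: split x into n groups at its empirical quantiles.
# Assumption: "equal parts" means equal-sized groups, as with the median for n = 2.
cut_quantiles <- function(x, n, ...) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  cut(x, breaks = breaks, include.lowest = TRUE, ...)
}

set.seed(42)
x <- rnorm(90)
groups <- cut_quantiles(x, 3)
table(groups)
```

With heavily tied data the quantiles may coincide, in which case cut() will refuse the duplicated breaks; the sketch does not handle that case.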
Making descriptive statistics
Description
Computes descriptive statistics of a data frame, optionally by levels of a group covariate, and can export the results
Usage
desc(
  data,
  vars,
  group = NULL,
  whole = TRUE,
  vars.labels = vars,
  group.labels = NULL,
  type.quanti = "mean",
  test.quanti = "param",
  test = TRUE,
  noquote = TRUE,
  justify = TRUE,
  digits = 2,
  file.export = NULL,
  language = "english"
)
Arguments
data | 
 data frame to describe in which we can find   | 
vars | 
 vector of character strings of the covariates to describe  | 
group | 
 character string, statistics created for each level of this covariate  | 
whole | 
 boolean,   | 
vars.labels | 
 character vector of more readable names for the covariates in the output  | 
group.labels | 
 character vector of more readable column names  | 
type.quanti | 
 character string,   | 
test.quanti | 
 character string,   | 
test | 
 boolean,   | 
noquote | 
 boolean,   | 
justify | 
 boolean,   | 
digits | 
 number of digits of the statistics (mean, sd, median, min, max, Q1, Q3, %), p-values always have 3 digits  | 
file.export | 
 character string, name of the XLS file exported  | 
language | 
 character string,   | 
Value
A matrix of the descriptive statistics
Author(s)
Hugo Varet
Examples
cgd$steroids=factor(cgd$steroids)
cgd$status=factor(cgd$status)
desc(cgd,vars=c("center","sex","age","height","weight","steroids","status"),group="treat")
Plot a histogram with a boxplot below
Description
Plots a histogram with a boxplot below
Usage
hist_boxplot(
  x,
  freq = TRUE,
  density = FALSE,
  main = NULL,
  xlab = NULL,
  ymax = NULL,
  col.hist = "lightblue",
  col.boxplot = "lightblue",
  ...
)
Arguments
x | 
 a numeric vector  | 
freq | 
 boolean,   | 
density | 
 boolean,   | 
main | 
 character string, main title of the histogram  | 
xlab | 
 character string, label of the x axis  | 
ymax | 
 numeric value, maximum of the y axis  | 
col.hist | 
 color of the histogram  | 
col.boxplot | 
 color of the boxplot  | 
... | 
 other arguments to be passed in   | 
Value
None
Author(s)
Hugo Varet
Examples
par(mfrow=c(1,2))
hist_boxplot(rnorm(100),col.hist="lightblue",col.boxplot="red",freq=TRUE)
hist_boxplot(rnorm(100),col.hist="lightblue",col.boxplot="red",freq=FALSE,density=TRUE)
Multi cross table
Description
Builds a big cross table between several covariates
Usage
multi.table(data, vars)
Arguments
data | 
 the data frame in which we can find   | 
vars | 
 character vector of covariate names  | 
Value
A matrix containing all the contingency tables between the covariates
Author(s)
Hugo Varet
See Also
Examples
multi.table(cgd,c("treat","sex","inherit"))
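The building blocks of such a table are the pairwise contingency tables. The sketch below computes them with base R and returns a named list rather than one large matrix, so it is a simplification of what multi.table produces. The helper name pairwise_tables is hypothetical, and the built-in mtcars data are used only for illustration.

```r
# Hypothetical helper: one contingency table per pair of covariates.
# Simplification: returns a list of tables instead of a single big matrix.
pairwise_tables <- function(data, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  tabs <- lapply(pairs, function(p) {
    table(data[[p[1]]], data[[p[2]]], dnn = p)
  })
  names(tabs) <- vapply(pairs, paste, character(1), collapse = " x ")
  tabs
}

tabs <- pairwise_tables(mtcars, c("cyl", "gear", "am"))
names(tabs)
tabs[["cyl x gear"]]
```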
Kaplan-Meier plot with number of subjects at risk below
Description
Kaplan-Meier plot with number of subjects at risk below
Usage
plot_km(
  formula,
  data,
  test = TRUE,
  xy.pvalue = NULL,
  conf.int = FALSE,
  times.print = NULL,
  nrisk.labels = NULL,
  legend = NULL,
  xlab = NULL,
  ylab = NULL,
  ylim = c(0, 1.02),
  left = 4.5,
  bottom = 5,
  cex.mtext = par("cex"),
  lwd = 2,
  lty = 1,
  col = NULL,
  ...
)
Arguments
formula | 
 same formula as in   | 
data | 
 data frame with   | 
test | 
 boolean,   | 
xy.pvalue | 
 numeric vector of length 2, coordinates where to display the p-value of the log-rank test  | 
conf.int | 
 boolean,   | 
times.print | 
 numeric vector, times at which to display the numbers of subjects at risk  | 
nrisk.labels | 
 character vector to modify the levels of   | 
legend | 
 character string (  | 
xlab | 
 character string, label of the time axis  | 
ylab | 
 character string, label of the y axis  | 
ylim | 
 numeric vector of length 2, minimum and maximum of the y-axis  | 
left | 
 integer, size of left margin  | 
bottom | 
 integer, number of lines in addition to the table below the graph  | 
cex.mtext | 
 numeric, size of the numbers of subjects at risk  | 
lwd | 
 width of the Kaplan-Meier curve(s)  | 
lty | 
 type of the Kaplan-Meier curve(s)  | 
col | 
 color(s) of the Kaplan-Meier curve(s)  | 
... | 
 other arguments to be passed in   | 
Value
None
Author(s)
Hugo Varet
Examples
cgd$time=cgd$tstop-cgd$tstart
plot_km(Surv(time,status)~sex,data=cgd,col=c("blue","red"))
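The numbers of subjects at risk shown below the curves can also be obtained directly from the survival package. The sketch below is one way to do it with survfit() and summary(); it is not taken from plot_km itself, and the chosen time points are arbitrary.

```r
library(survival)

# Work on a local copy of cgd, as in the example above
d <- cgd
d$time <- d$tstop - d$tstart

# Kaplan-Meier estimate by sex
km <- survfit(Surv(time, status) ~ sex, data = d)

# Numbers at risk at a few arbitrary time points (extend = TRUE keeps
# time points beyond the last observation of a stratum)
times <- c(0, 100, 200, 300)
s <- summary(km, times = times, extend = TRUE)
at_risk <- data.frame(strata = s$strata, time = s$time, n.risk = s$n.risk)
at_risk

# Plain Kaplan-Meier plot without the table
plot(km, col = c("blue", "red"), lwd = 2, xlab = "Time", ylab = "Survival")
```

Each stratum contributes one row per requested time, which is the information a table of numbers at risk needs.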
Spaghetti plot and plot of the mean at each time
Description
Spaghetti plot and plot of the mean at each time
Usage
plot_mm(
  formula,
  data,
  col.spag = 1,
  col.mean = 1,
  type = "spaghettis",
  tick.times = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = "",
  lwd.spag = 1,
  lwd.mean = 4,
  ...
)
Arguments
formula | 
 
  | 
data | 
 data frame in which we can find   | 
col.spag | 
 vector of length   | 
col.mean | 
 vector of length   | 
type | 
 
  | 
tick.times | 
 boolean,   | 
xlab | 
 character string, label of the time axis  | 
ylab | 
 character string, label of the y axis  | 
main | 
 character string, main title  | 
lwd.spag | 
 numeric, width of the spaghetti lines, 1 by default  | 
lwd.mean | 
 numeric, width of the mean lines, 4 by default  | 
... | 
 Other arguments to be passed in   | 
Value
None
Author(s)
Hugo Varet, based on an idea by Anais Charles-Nelson
Examples
N=10
time=rep(1:4,N)
obs=1.1*time + rep(0:1,each=2*N) + rnorm(4*N)
my.data=data.frame(id=rep(1:N,each=4),time,obs,group=rep(1:2,each=N*2))
par(xaxs="i",yaxs="i")
plot_mm(obs~time+(group|id),my.data,col.spag=my.data$group,
        col.mean=c("blue","red"),type="both",main="Test plot_mm")
Plot a multi cross table
Description
Plots a multi cross table on a graph
Usage
plot_multi.table(data, vars, main = "")
Arguments
data | 
 the data frame in which we can find   | 
vars | 
 character vector of covariate names  | 
main | 
 main title of the plot  | 
Value
None
Author(s)
Hugo Varet
See Also
Examples
plot_multi.table(cgd,c("treat","sex","inherit"))
Plot points with the corresponding linear regression line
Description
Plots points with the corresponding linear regression line
Usage
plot_reg(x, y, pch = 19, xlab = NULL, ylab = NULL, ...)
Arguments
x | 
 numeric vector  | 
y | 
 numeric vector  | 
pch | 
 type of points  | 
xlab | 
 character string, label of the x axis,   | 
ylab | 
 character string, label of the y axis,   | 
... | 
 other arguments to be passed in   | 
Value
None
Author(s)
Hugo Varet
Examples
plot_reg(cgd$age, cgd$height, xlab="Age (years)", ylab="Height")
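A scatter plot with its least-squares line can be reproduced with base graphics as sketched below. The helper name scatter_with_fit is hypothetical, and the package function may add elements that this sketch omits.

```r
# Hypothetical helper: scatter plot plus the fitted simple regression line
scatter_with_fit <- function(x, y, pch = 19, ...) {
  fit <- lm(y ~ x)            # simple linear regression of y on x
  plot(x, y, pch = pch, ...)
  abline(fit)                 # add the fitted line to the current plot
  invisible(fit)              # return the model so it can be inspected
}

# Simulated data, for illustration only
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- scatter_with_fit(x, y, xlab = "x", ylab = "y")
coef(fit)
```

Returning the fitted model invisibly is a design choice of this sketch; plot_reg itself is documented as returning nothing.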