\documentclass{article} \usepackage{hyperref} \usepackage[utf8]{inputenc} \begin{document} \SweaveOpts{concordance=TRUE} \title{survBootOutliers: An R package for outlier detection in survival analysis} \author{Joao Diogo Pinto, joao.pinto@tecnico.ulisboa.pt} \maketitle \tableofcontents %\VignetteIndexEntry{Introduction} \section{Introduction} This package provides three new outlier detection methods to perform outlier detection in a survival analysis context. The first method OSD, for One Step Deletion, is a sequential procedure that maximizes the c-index of a fitted Cox regression using a a greedy one-step-ahead search, in each step the observation that when removed maximizes the concordance increase is permanently deleted from the dataset, the algorithm ends until k observations are removed, these are considered the most outlying ones. The second and third methods are based on bootstrap methods. The second method BHT, for Bootstrap Hypothesis Testing, is based on creating B bootstrap samples for each observation that is removed from the dataset, then an hypothesis test is made for the B concordance variations to be larger than zero, the observations with the lowest p-values are considered the most outlying. The last method DBHT, for Dual Bootstrap Hypothesis Testing, draws 2B bootstrap samples for each observation, B samples with each observation absent, just like with BHT, the other B bootstrap samples are drawn with the observation under test being deliberately inserted in each of the bootstrap samples. The hypothesis test is different, the two histograms are tested for inequality, for non-outlying observations the histograms are expected to be similar but for outlying observtions the histograms drawn when the observation is absent is expected to have higher concordance on average. The package still provides three other methods considered more traditional based on Martingale-based residuals, Deviancre resiudals and Cox likelihood displacement. These methods are based on the Master Thesis at Instituto Superior Técnico, named "Outlier detection in survival analysis" evaluated in May 2015. The link for the full text is left here for more detail: \href{https://fenix.tecnico.ulisboa.pt/downloadFile/844820067124612/dissertacao.pdf}. \section{Example data} The well-known Worcester Heart Attack Study data is given as example and provided within the package; <>= library(survBootOutliers) whas100_data <- get.whas100.dataset() summary(whas100_data) @ %\VignetteIndexEntry{Examples} \section{Examples} \subsection{OSD} <>= whass <- get.whas100.dataset() outliers_osd <- survBootOutliers( surv.object=survival::Surv(time = whass$times,event = whass$status ), covariate.data = whass[,2:5] , sod.method = "osd", max.outliers = 10) print(outliers_osd) @ \subsection{BHT} <>= whas <- get.whas100.dataset() outliers_bht <- survBootOutliers( surv.object=survival::Surv(time = whas$times,event = whas$status ), covariate.data = whas[,2:5], sod.method = "bht", B = 10, B.N = 100) print(outliers_bht) @ \subsection{DBHT} <>= whas <- get.whas100.dataset() outliers_dbht <- survBootOutliers( surv.object=Surv(time = whas$times,event = whas$status ), covariate.data = whas[,2:5], sod.method = "dbht", B = 10, B.N = 100 ) print(outliers_dbht) @ \end{document}