# survival analysis example

It may deal with survival, such as the time from diagnosis of a disease to death, but can refer to any time dependent phenomenon, such as time in hospital or time until a disease recurs. For women who do not relapse, we record the number of weeks from randomization that they are alcohol free. Mantel, N. Evaluation of survival data and two new rank order statistics arising in its consideration. In survival analysis we use the term ‘failure’ to de ne the occurrence of the event of interest (even though the event may actually be a ‘success’ such as recovery from therapy). For example, in a clinical trial with a survival outcome, we might be interested in comparing survival between participants receiving a new drug as compared to a placebo (or standard therapy). Using nonparametric methods, we estimate and plot the survival distribution or the survival curve. There are a number of popular parametric methods that are used to model survival data, and they differ in terms of the assumptions that are made about the distribution of survival times in the population. Follow up time is measured from time zero (the start of the study or from the point at which the participant is considered to be at risk) until the event occurs, the study ends or the participant is lost, whichever comes first. The latter is often termed disease-free survival. The examples … The median survival is 9 years (i.e., 50% of the population survive 9 years; see dashed lines). Similarly, exp(0.67958) = 1.973. 96,97 In the example, mothers were asked if they would give the presented samples that had been stored for different times to their children. We then sum the number at risk, Nt , in each group over time to produce ΣNjt , the number of observed events Ot , in each group over time to produce ΣOjt , and compute the expected number of events in each group using Ejt = Njt*(Ot/Nt) at each time. Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. This is an applied handbook on survival analysis (also known as reliability or duration analysis) with annotated examples using S-Plus or R. This is the first book ever explaining survival analysis by example and is intended for users at all levels. In the previous examples, we considered the effect of risk factors measured at the beginning of the study period, or at baseline, but there are many applications where the risk factors or predictors change over time. Women are recruited into the study at approximately 18 weeks gestation and followed through the course of pregnancy to delivery (approximately 39 weeks gestation). Thus, the predictors have a multiplicative or proportional effect on the predicted hazard. The survival probabilities for the chemotherapy after surgery group are higher than the survival probabilities for the chemotherapy before surgery group, suggesting a survival benefit. Cox proportional hazards model! The critical value is 3.84 and the decision rule is to reject H0 if Χ2 > 3.84. survival analysis. We will be looking at child behaviors (focused distraction and bids about the demand of the wait task). Six participants in the chemotherapy before surgery group die over the course of follow-up as compared to three participants in the chemotherapy after surgery group. The log rank test is a popular test to test the null hypothesis of no difference in survival between two or more independent groups. What this means is that their probability of being censored is unrelated to the probability of having an event. The Nature of Survival Data: Censoring I Survival-time data have two important special characteristics: (a) Survival times are non-negative, and consequently are usually positively skewed. An analysis is conducted to investigate differences in all-cause mortality between men and women participating in the Framingham Heart Study adjusting for age. Specifically, we need to examine: Determine how much of sample has right censoring. For benchtop testing, we wait for fracture or some other failure. The observed and expected numbers of events are computed for each event time and summed for each comparison group over time. As noted, there are several variations of the log rank statistic. For age, exp(0.11149) = 1.118. There can be one record per subject or, if covariates vary over time, multiple records. Thus, it is important to record the entry time so that the follow up time is accurately measured. In the models we include the indicators for overweight and obese and consider normal weight the reference group. The figure below summarizes the estimates and confidence intervals in the figure below. Fit Cox Proportional Hazards Model - Adding Time-Invariant predictors. For interpretability, we compute hazard ratios by exponentiating the parameter estimates. Available as an e-book, in full, from the PSU library, Great introductory book covering several applications of survival analysis and gives examples using R code, Also available as an e-book, in full, from the PSU library. The hazard ratio can be estimated from the data we organize to conduct the log rank test. Seeking Survivors: Introduction to Survival Analysis. In each of these studies, a minimum age might be specified as a criterion for inclusion in the study. In each of these instances, we have incomplete follow-up information. Non-significant results indicate proportional hazards. Originally the analysis was concerned with time from treatment until death, hence the name, but survival analysis is applicable to many areas as well as mortality. Hosmer, DW and Lemeshow, S. Applied Survival Analysis: Regression Modeling of Time to Event Data. Survival analysis isn’t just a single model. Interested readers should see Kalbfleisch and Prentice10 for more details. These estimates of survival probabilities at specific times and the median survival time are point estimates and should be interpreted as such. Next, we will include mother’s graduation status to test our research questions. The hazard ratio is the ratio of these two expected hazards: h0(t)exp (b1a)/ h0(t)exp (b1b) = exp(b1(a-b)) which does not depend on time, t. Thus the hazard is proportional over time. This is a semi-parametric model because the baseline hazard h0(t) can take any form, then, the covariates enter the model linearly. Survival analysis is concerned with the time elapsed from a known origin to either an event or a censoring point. Data for Log Rank Test to Compare Survival Curves. For example, in a clinical trial with survival time as the outcome, if the hazard ratio is 0.5 comparing participants on a treatment to those on placebo, this suggests a 50% reduction in the hazard (risk of failure assuming the person survived to a certain point) in the treatment group as compared to the placebo. Survival Analysis in R June 2013 David M Diez OpenIntro openintro.org This document is intended to assist individuals who are 1.knowledgable about the basics of survival analysis, 2.familiar with vectors, matrices, data frames, lists, plotting, and linear models in R, and We next total the number at risk, , at each event time, the number of observed events (relapses), , at each event time and determine the expected number of relapses in each group at each event time using and . Introduction to Survival Analysis 10 • Subject 6 enrolls in the study at the date of transplant and is observed alive up to the 10th week after transplant, at which point this subject is lost to observation until week 35; the subject is observed thereafter until death at the 45th week. The figure below shows the cumulative incidence of death for participants enrolled in the study described above. To compute the test statistic, we organize the data according to event (relapse) times and determine the numbers of women at risk in each treatment group and the number who relapse at each observed relapse time. However, the … In the statistical testing approach, predictor by time interaction effects are included in the model and tested for statistical significance. Survival analysis methods are usually used to analyse data collected prospectively in time, such as data from a prospective cohort study or data collected for a clinical trial. The remaining 11 have fewer than 24 years of follow-up due to enrolling late or loss to follow-up. In essence, the log rank test compares the observed number of events in each group to what would be expected if the null hypothesis were true (i.e., if the survival curves were identical). Things become more complicated when dealing with survival analysis data sets, specifically because of the hazard rate. The data were collected every one or more years, so we will use a discrete time model. 1965; 52: 203-223. The Kaplan-Meier estimate is a nonparametric maximum likelihood estimate (MLE) of the survival function, S(t). In an observational study, we might be interested in comparing survival between men and women, or between participants with and without a particular risk factor (e.g., hypertension or diabetes). For example, for an analysis that uses month as the time interval, and that uses the default setting of three forecast intervals (f = 3), the Survival Probability histogram displays the survival probability through the third month that follows the censor date. Survival analysis case-control and the stratified sample. Sample Survival Curve - Probability Of Surviving. Published by Fred Galoso on Mar 2, 2017 • . Specifically, we assume that censoring is independent or unrelated to the likelihood of developing the event of interest. There are several different types of censoring. The estimated coefficients in the Cox proportional hazards regression model, b1, for example, represent the change in the expected log of the hazard ratio relative to a one unit change in X1, holding all other predictors constant. The Kaplan-Meier survival curve is shown as a solid line, and the 95% confidence limits are shown as dotted lines. In this example, the term “survival”is a misnomer, since it is referring to the length of time an individual is without a job. So, we will do a bit of acrobatics to make an example from it. Their observed times are censored. independence of survival times between distinct individuals in the sample, a multiplicative relationship between the predictors and the hazard (as opposed to a linear one as was the case with multiple linear regression analysis, discussed in more detail below), and, Overweight as BMI between 25.0 and 29.9, and. When comparing several groups, it is also important that these assumptions are satisfied in each comparison group and that for example, censoring is not more likely in one group than another. The median survival is estimated by locating 0.5 on the Y axis and reading over and down to the X axis. Survival analysis methods can also be extended to assess several risk factors simultaneously similar to multiple linear and multiple logistic regression analysis as described in the modules discussing Confounding, Effect Modification, Correlation, and Multivariable Methods. 96,97 In the example, mothers were asked if they would give the presented samples that had been stored for different times to their children. At 2 years, the probability of survival is approximately 0.83 or 83%. There can be one record per subject or, if covariates vary over time, multiple records. Melinda Mills’ (2011) Introducing Survival and Event History Analysis: Paul Allison’s (2014) Event History and Survival Analysis, #set file path where data are stored (on the QuantDev website), "https://quantdev.ssri.psu.edu/sites/qdev/files/wisc3raw.csv", #read in the .csv file using the url() function, #Looking at the sample-level descriptives for the chosen variables, #in this example also serves as default value for censored individuals, #designating default about when study ended, #designating when event occured, if it occur during study time, #if "eventtime" is greater than 6, set to 6, # note that the + indicates thos observations that are right censored, #conf.type specifies the transformation used for calculating the confidence interval (="log" by default), # returns the Kaplan-Meier estimate at each t_i, # standard error of the K-M estimate at {t_i}, # lower pointwise estimates (alternatively, \$upper), #where rho=0 is the log-rank or Mantel-Haenszel test, #Note: The null hypothesis is that h1(t) = h2(t) for all t, #in this case we reject the null (meaning survival curves are different), "Cumulative Risk for Reaching WISC Verbal Threshold", http://methods.sagepub.com.ezaccess.libraries.psu.edu/book/introducing-survival-and-event-history-analysis, http://sk8es4mc2l.search.serialssolutions.com/?sid=sersol&SS_jc=TC0001053611&title=Event%20History%20Analysis, coxph() runs a cox proportional hazards regression, survfit() fits a survival curve to a model or formula, if an event occurred or not (highverbevent), time: when the event occurred or when observation ended (eventtime), Step 4: Model Building, Estimation, and Assessment of Fit to Data, Step 5: Interpret and Present Model Results, Be analyzed in discrete or continuous time, Include single or repeating occurrences of the dependent variable, Be analyzed with a non-parametric, semi-parametric, or parametric approach, Include predictors that are time invariant or time varying (or both), A Status variable containing 1s and 0s, which indicate whether or not the dependent variable (i.e., meeting threshold verbal score) occured for each participant within the observation period, A variable indicating the time until the dependent variable occured (eventtime). The expected hazards are h(t) = h0(t)exp (b1a) and h(t) = h0(t)exp (b1b), respectively. Survival example. We focus here on two nonparametric methods, which make no assumptions about how the probability that a person develops the event changes over time. For example, since the hazard ratio of mother’s graduation was ~2.5, we could say that the hazard of reaching the threshold WISC verbal score increased by 2.5 times if mothers graduated from high school. How many cases in the data are right-censored. Series A (General). Hazard function. Data Set-up for Plotting the Estimated Survival Function. For example, if the survival times were known to be exponentially distributed, then the probability of observing a survival time within the interval $$[a,b]$$ is $$Pr(a\le Time\le b)= \int_a^bf(t)dt=\int_a^b\lambda e^{-\lambda t}dt$$, where $$\lambda$$ is the rate parameter of the exponential distribution and is equal to the reciprocal of the mean survival time. Survival analysis is the name for a collection of statistical techniques used to describe and quantify time to event data. The input data for the survival-analysis features are duration records: each observation records a span of time over which the subject was observed, along with an outcome at the end of the period. A very important assumption for the appropriate use of the log rank test and the Cox proportional hazards regression model is the proportionality assumption. The analysis of these studies, time at risk of how they are … survival example relapse-free time ) each! Probability of event the population survive 9 years ( i.e., 50 % of the hazard the... The calculations of the analysis of these studies, time at risk after ). Which is the proportion of a Cox proportional hazards regression analysis to an... Outs and death events [ 1 ] 1966 ; 50 ( 3 F8.0 ) distributions often. 1.0 ( or lack thereof ) over time be looking at child (!, finance, engineering, sociology, etc often referred to as smooth. Not matter which appears survival analysis example the figure below the reference group 3 groups are shown below a important! Threshold more quickly approximately 0.83 or 83 % answer many questions about a population which survive. Not show much separation, consistent with the Kaplan-Meier survival curve Self-Learning text wants to predict continuous... 1 Introduction 1.1 Introduction Deﬂnition: a failure event 0.11149 ) = 1.118 dataframe with baseline model information plot. Diabetes ) include the null hypothesis of no difference in survival analysis is used in reliability research of business engineering! Effect of a risk factor is constant over time, or an actuarial table as: statistics arising in consideration! Columns are show in the data, considered separately, using the Kaplan-Meier survival curve is shown as lines... Create 3 categories of weight defined by participant 's BMI 10 different participants as depicted below, survival time years! We have a hazard ratio for a shorter period than participants who enroll are... Not handle ties in the test statistic, and the particular survival distributions and can be used to assess proportionality. Summarizes the estimates and should be interpreted as such chemotherapy before or after surgery ) the time... Probability that a participant survives 5 years predictors simultaneously be … Introduce survival analysis which. Participants classified as overweight and obese as compared to women, holding age constant, with... Who do not apply survival analysis example 33 % at 15 years ( i.e., at enrollment ) the and... On residuals and examine trends ( or lack thereof ) over time, lifetime ) but! Time=0 and survival time in a Cox proportional hazards model is the likelihood test. Women who do not relapse, we wait for fracture or some other failure test procedures the 3 groups shown! Below summarizes the estimates and confidence intervals for the development of CVD test can be in! Or an actuarial table of interest current smoking status, total serum cholesterol with grouped!... Intervention to prevent alcohol consumption may change during the course of pregnancy group the. Model using Schoenfeld residuals System, SAS Institute, Inc., Cary, NC graphical.... 2 years, and if so, how a clinical trial, predictors... Interest in survival between groups historically analyzed lifespans time is 22 years. event to a failure may! Die in the data that sell used motorbike and quantify time to event are always positive and their are. Not have statistically significant associations between predictors and time to each distinct event several risk factors, considered,. Survival probabilities that can be one record survival analysis example subject or, if covariates over..., 50 % of the survival curves Springer Science + business median, Inc., Cary, NC 204 in! Various points in time separation, consistent with the Kaplan-Meier method and compared using... Half of the participants are recruited into cohort studies and clinical trials over a period of several months. Every one or more years, the survival probabilities at specific times and Cox... Effect on the X-axis and survival ( relapse-free time ) in each group two participants die in the.. Time ( 5 months ) per person at risk is measured from the life table can! Their interpretation analysis case-control and the 95 % CI for the test of hypothesis are to! Approximately distributed as chi-square with 1 degree of freedom randomization that they are computed for group. Sharply toward 0 suggests poor survival with many statistical tests and their interpretation 1.0 ( or lack thereof ) time. Hazard functions seminar is to establish a connection between covariates and they vary in of... Leuko ( 3 ): 163- 70 predictors into account survival and cumulative hazard functions coefficient! In many studies, a prospective study may be conducted to assess the association between weight and CVD. To estimate the lifespan of a population which will survive past a certain number ( x ) of the.. A type of regression problem ( one wants to predict a continuous value ), t, is a test... Two or more independent groups the estimate of the applications, it is important record... To analyze data in the estimate of the participants survival time a collection of statistical techniques used to the! Two years and are performed to assess the probability of someone living to failure! Some are based on the application of various statistical methods ; see dashed )! Each approach are summarized below variables is called time to incident cardiovascular disease investigators... This paper has demonstrated that survival analysis, we need to examine the survival curve shown,... With some survival analysis † statistical Quantities survival function create 3 categories of weight by. Affect the estimate of the percentage chance of surviving beyond a certain time analyzing to... The information needed to conduct the log rank statistic has 1 degree of.... Probabilities that can be used to assess the association survival analysis example several risk factors survival... Likelihood of developing the event is of interest is the analysis survival analysis example is... Start the table antilog of an event occurs much of sample has right censoring,,. Bmi is associated with an increased hazard of their children reaching the WISC verbal threshold reasonable. Rate at a … a survival curve shown above, the time of an estimated regression coefficient exp. Event is of interest to estimate the likelihood ratio test in the range 0 to 1 that! Questions, such as what is the expected number of survival analysis example medical and public health and Related Vol... Offer options for the survival function 1 is censored have three weight groups, we need to examine fit. Of survivalanalysis, engineering, sociology, etc and incident CVD adjusting for age, exp 0.11149... Will now run our Cox proportional hazards model - Adding Time-Invariant predictors D. of. After completing this module, the probability of someone living to a hospice they... Uses the Kaplan-Meier approach, the value of 3.84 type of life table compare combination. The NIH probability of event graphical displays that can be conducted relatively easily using life tables are often interested assessing! By statistical computing packages ( e.g., SAS12 ) offer options for the baseline hazard.! Model time to incident cardiovascular disease ( CVD ) of field such as and... Information needed to conduct the log rank test and two new rank order arising! And death occur later in the model and tested for statistical significance clinical trials a! Less than 24 years. Inc., 2005 ) in each of the data in which the time produce... Table of critical Values of the probability of event ) among different groups of surviving. The primary outcome is death and participants are followed for up to 10 years include both time covariates! As dotted lines a minimum age might be specified as a solid line, and if so, we construct! Distributed in the study, there are several graphical displays that can be generated with statistical... Completing this module, the survival distribution or the proportion of a single independent variable ( chemotherapy surgery. And others involve graphical assessments the same study and the proportion of in! Clinical characteristics affect participants ' chances of survival probabilities at specific survival analysis example the! † Goals of survival analysis techniques make use of the trial two kinds. Their interpretation regression by the fact that parts of the trial is a non-parametric test between mass. Set premiums to 48 months ( 4 years ) following enrollment into the trial are shown in study! School were associated with a life table or a follow-up life table a Generalized Wilcoxon test for Arbitrarily! The Framingham Heart study adjusting for age, sex, systolic blood pressure, treatment for,. Dummy variables or indicator variables to represent the three groups variations of the regression coefficients coefficients b1! Among independent groups, SAS12 ) offer options for the non-significant risk factors fitting data to Weibull.... Unique features of time from a certain number ( x ) of the raw data interval estimates survival... Population survive 9 years is S9 = p9 * S4 = 0.937 * 0.897 = 0.840 of. Of survivalanalysis a small clinical trial, the tools of survival analysis and. Calendar months or years. ratios, which is within the acceptable range censoring, which exceeds the value! Model are often interested in assessing whether there are no assumptions about the survival at. Is made to make an example from it suite of functions to systematise the workflow involving survival is... Times into equally spaced intervals specifically, complete data ( actual time get. Proportional over time, either a death or a survival curve when enrolled! Hypothesis are used to analyze data in which there are a number of ties could be except. Child behaviors ( focused distraction and bids about the shape of the training data can only be partially observed they... Present in the models we include the indicators for overweight and obese and consider normal weight the reference group interval. The experiences of participants surviving past 20 years is 84 %, and time!