STAT 4008 Assignment 1
1. Suppose a discrete random variable T takes the values 1, 3, 5, 7, 9, 12 with probabilities 1/6, 1/3, 1/4, 1/8, 1/16 and 1/16, respectively.
(a) Find the mean of T
(b) Find the survival function of T
(c) Find the area under the curve S(t) in the upper right quadrant, i.e., find $\int_0^\infty S(t)\,dt$.
(d) Compare results in (a) and (c).
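As a numerical check on parts (a) to (d), the following Python sketch computes the mean and the area under the survival curve with exact rational arithmetic. It assumes the convention S(t) = Pr(T > t); the function names are illustrative and not part of the assignment.

```python
from fractions import Fraction

# Support points and probabilities from Question 1.
VALUES = [1, 3, 5, 7, 9, 12]
PROBS = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4),
         Fraction(1, 8), Fraction(1, 16), Fraction(1, 16)]


def mean(values, probs):
    """E[T] = sum of t * Pr(T = t)."""
    return sum(v * p for v, p in zip(values, probs))


def survival(t, values, probs):
    """S(t) = Pr(T > t) (assumed convention)."""
    return sum((p for v, p in zip(values, probs) if v > t), Fraction(0))


def area_under_survival(values, probs):
    """Integrate the step function S(t) from 0 to the largest support point."""
    area = Fraction(0)
    left = 0
    for v in sorted(values):
        # S(t) is constant on [left, v), so each piece is a rectangle.
        area += (v - left) * survival(left, values, probs)
        left = v
    return area


print("mean:", mean(VALUES, PROBS))
print("area:", area_under_survival(VALUES, PROBS))
```

Using `Fraction` avoids rounding, so the two printed quantities can be compared exactly.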
2. Suppose the lifetime T of an electronic component is exponentially distributed with
rate θ, with density function
f(t) = θ exp(−θt), t > 0
Find the conditional probability that T > t + s given T ≥ t, where s > 0. Also
find the probability that T > s.
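The two probabilities in Question 2 can be compared numerically. This is a minimal sketch that uses the exponential survival function S(t) = exp(−θt) implied by the density above; the values of θ, s and t are illustrative only.

```python
import math


def exp_survival(t, theta):
    """S(t) = Pr(T > t) = exp(-theta * t) for the exponential density."""
    return math.exp(-theta * t)


def conditional_survival(t, s, theta):
    """Pr(T > t + s | T >= t) = S(t + s) / S(t) for a continuous lifetime."""
    return exp_survival(t + s, theta) / exp_survival(t, theta)


theta, s = 0.5, 2.0  # illustrative values, not from the assignment
for t in (0.0, 1.0, 5.0):
    print(t, conditional_survival(t, s, theta), exp_survival(s, theta))
```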
3. A random variable T is said to be Weibull distributed if its hazard function is
of the form
$$h(t) = \alpha\lambda t^{\alpha-1}, \quad t > 0$$
where α and λ are positive constants. Find the distribution of Y = log T.
4. Assume the lifetime random variable T is continuous, and denote its mean
remaining lifetime given T ≥ t by m(t). Show that the following functions
of T can be expressed in terms of m(t):
(a) The survival function
$$S(t) = \frac{m(0)}{m(t)} \exp\left[-\int_0^t \frac{du}{m(u)}\right]$$
(b) The density function
$$f(t) = \left(m'(t) + 1\right) \left(\frac{m(0)}{m(t)^2}\right) \exp\left[-\int_0^t \frac{du}{m(u)}\right]$$
(c) The hazard function
$$h(t) = -\frac{d}{dt} \log\left[S(t)\right] = \frac{m'(t) + 1}{m(t)}$$
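For reference, the mean residual life in Question 4 is usually defined as follows for a positive continuous lifetime. This is the standard textbook definition, stated here on the assumption that the course uses the same notation.

```latex
m(t) = E[T - t \mid T \ge t] = \frac{\int_t^\infty S(u)\,du}{S(t)},
\qquad m(0) = E[T].
```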
5. For the geometric random variable with probability mass function
$$\Pr(X = j) = (1 - p)^{j-1} p, \quad j = 1, 2, \ldots,$$
find its hazard function.
6. For the Poisson distribution with probability mass function
$$\Pr(X = j) = \frac{e^{-\lambda} \lambda^{j}}{j!}, \quad j = 0, 1, 2, \ldots.$$
Show that the hazard function is monotone increasing.
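Questions 5 and 6 both concern the hazard of a discrete random variable, taken here to be h(j) = Pr(X = j | X ≥ j). The sketch below evaluates it numerically for the two mass functions. It is a check rather than a proof, and the parameter values and the truncation of the tail sum are assumptions made for illustration.

```python
import math


def geometric_pmf(j, p):
    """Pr(X = j) = (1 - p)^(j - 1) * p for j = 1, 2, ..."""
    return (1 - p) ** (j - 1) * p if j >= 1 else 0.0


def poisson_pmf(j, lam):
    """Pr(X = j) = exp(-lam) * lam^j / j!, computed on the log scale."""
    if j < 0:
        return 0.0
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def discrete_hazard(pmf, j, n_terms=500):
    """h(j) = Pr(X = j) / Pr(X >= j), with the tail sum truncated after n_terms terms."""
    tail = sum(pmf(i) for i in range(j, j + n_terms))
    return pmf(j) / tail


p, lam = 0.3, 3.0  # illustrative parameter values
geom_h = [discrete_hazard(lambda i: geometric_pmf(i, p), j) for j in range(1, 11)]
pois_h = [discrete_hazard(lambda i: poisson_pmf(i, lam), j) for j in range(0, 16)]
print(geom_h)
print(pois_h)
```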
7. Suppose that the mean residual life of a continuous survival time T is given by
m(t) = t + 10.
(a) Find the mean of T
(b) Find h(t)
(c) Find S(t)
8. Find the survival function of the Gompertz random variable whose hazard
function is given by
$$h(t) = \theta e^{\alpha t}, \quad t \ge 0; \ \theta, \alpha > 0.$$
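A closed-form answer to Question 8 can be checked against a numerical one. The sketch below uses the relation S(t) = exp(−H(t)), where H is the cumulative hazard, and approximates H with the trapezoidal rule. The parameter values and the step count are illustrative assumptions.

```python
import math


def cumulative_hazard(hazard, t, n_steps=10_000):
    """Approximate H(t), the integral of hazard(u) over [0, t], by the trapezoidal rule."""
    if t == 0:
        return 0.0
    step = t / n_steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * step) for k in range(1, n_steps))
    return total * step


def survival_from_hazard(hazard, t):
    """S(t) = exp(-H(t)) for a continuous lifetime."""
    return math.exp(-cumulative_hazard(hazard, t))


theta, alpha = 0.01, 0.1  # illustrative values only


def gompertz_hazard(t):
    """h(t) = theta * exp(alpha * t)."""
    return theta * math.exp(alpha * t)


print(survival_from_hazard(gompertz_hazard, 10.0))
```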
STAT 4008 Assignment 2
1. Let $\hat{S}(t)$ be the Kaplan-Meier estimator of the survival function and
$$\sigma_S^2(t) = \sum_{j \,|\, t_j < t} \frac{d_j}{n_j (n_j - d_j)}$$
(a) Show that the approximate variance of $\arcsin\sqrt{\hat{S}(t_0)}$ is
$$\frac{1}{4}\, \sigma_S^2(t_0)\, \frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)}$$
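(b) Hence, the 100(1−α)% confidence interval for S(t0), based on this transformation,
is given as
$$\sin^2\left\{\max\left[0,\ \arcsin\left(\hat{S}(t_0)^{1/2}\right) - 0.5\, z_{1-\alpha/2}\, \sigma_S(t_0) \left(\frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)}\right)^{1/2}\right]\right\}
\le S(t_0) \le
\sin^2\left\{\min\left[\frac{\pi}{2},\ \arcsin\left(\hat{S}(t_0)^{1/2}\right) + 0.5\, z_{1-\alpha/2}\, \sigma_S(t_0) \left(\frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)}\right)^{1/2}\right]\right\}$$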
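The interval in part (b) translates directly into code. The following is a minimal sketch of that formula; the function name and the numerical inputs are hypothetical, and the default quantile corresponds to a 95% interval.

```python
import math


def arcsine_sqrt_ci(s_hat, sigma_s, z=1.959964):
    """Arcsine-square-root interval for S(t0), following the display in part (b).

    s_hat   : Kaplan-Meier estimate at t0, strictly between 0 and 1
    sigma_s : square root of sigma_S^2(t0)
    z       : standard normal quantile z_{1 - alpha/2} (default is for 95%)
    """
    if not 0.0 < s_hat < 1.0:
        raise ValueError("s_hat must lie strictly between 0 and 1")
    centre = math.asin(math.sqrt(s_hat))
    half_width = 0.5 * z * sigma_s * math.sqrt(s_hat / (1.0 - s_hat))
    lower = math.sin(max(0.0, centre - half_width)) ** 2
    upper = math.sin(min(math.pi / 2, centre + half_width)) ** 2
    return lower, upper


print(arcsine_sqrt_ci(0.6, 0.15))  # hypothetical inputs, not assignment data
```

The `max` and `min` truncations keep the transformed limits inside [0, π/2], so the back-transformed interval always lies in [0, 1].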
2. Consider the following data set:
Time                               10  11  12  13  14  15  16  17  18  19
Number of items failed              3   2   0   4   5   3   2   1   1   2
Number of items (right) censored    0   1   2   1   1   0   3   2   4   2
Assume the data are from a population with survival function S(t).
(a) Write down the likelihood function of S(t).
(b) Estimate the survival function using the Kaplan-Meier method.
(c) Estimate the survival function using the Nelson-Aalen method.
(d) Estimate the mean survival time and its standard error.
(e) Give all three 95% confidence intervals for S(14) discussed in class.
Please give details in this question, i.e., use no computer package.
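Although Question 2 is to be worked by hand, a short script can be used to check the arithmetic afterwards. This sketch builds the number at risk from the table and accumulates the Kaplan-Meier and Nelson-Aalen estimates. It assumes that items censored at a given time are still at risk at that time; the function name is illustrative.

```python
from fractions import Fraction

# Table from Question 2.
TIMES = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
FAILED = [3, 2, 0, 4, 5, 3, 2, 1, 1, 2]
CENSORED = [0, 1, 2, 1, 1, 0, 3, 2, 4, 2]


def life_table(times, failed, censored):
    """Return rows (t, n_at_risk, d, Kaplan-Meier S(t), Nelson-Aalen H(t)).

    Assumption: items censored at time t are still at risk at t, so failures
    are treated as preceding censorings when they share a time.
    """
    at_risk = sum(failed) + sum(censored)
    km, na = Fraction(1), Fraction(0)
    rows = []
    for t, d, c in zip(times, failed, censored):
        if d > 0:
            km *= Fraction(at_risk - d, at_risk)
            na += Fraction(d, at_risk)
        rows.append((t, at_risk, d, km, na))
        at_risk -= d + c
    return rows


for t, n, d, km, na in life_table(TIMES, FAILED, CENSORED):
    print(t, n, d, float(km), float(na))
```

The Nelson-Aalen column is the cumulative hazard; the corresponding survival estimate is exp(−H(t)).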
3. Consider the following right-censored sample:
2, 4, 4, 4+, 5, 6+, 7, 7+, 8+
Estimate the mean survival time and its standard error.
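Because the largest observation in Question 3 is censored, the mean can only be estimated as the area under the Kaplan-Meier curve up to a limit τ. The sketch below takes τ to be the largest observed time and uses the variance formula Σ A_i² d_i / (n_i (n_i − d_i)), where A_i is the area under the curve to the right of the i-th death time. Both choices are assumptions; the course may use a different limit or a bias-corrected variance.

```python
import math

# Right-censored sample from Question 3: (time, event), event = 1 for a death.
SAMPLE = [(2, 1), (4, 1), (4, 1), (4, 0), (5, 1), (6, 0), (7, 1), (7, 0), (8, 0)]


def km_table(sample):
    """Return (t, n_at_risk, d, S(t)) at each distinct death time."""
    rows, surv = [], 1.0
    for t in sorted({time for time, event in sample if event == 1}):
        n = sum(1 for time, _ in sample if time >= t)
        d = sum(1 for time, event in sample if time == t and event == 1)
        surv *= 1.0 - d / n
        rows.append((t, n, d, surv))
    return rows


def restricted_mean(sample, tau=None):
    """Area under the Kaplan-Meier curve on [0, tau] and its standard error."""
    rows = km_table(sample)
    if tau is None:
        tau = max(time for time, _ in sample)  # assumed choice of tau
    # Breakpoints of the step function and its height on each interval.
    knots = [0.0] + [t for t, *_ in rows if t < tau] + [tau]
    heights = [1.0] + [s for t, _, _, s in rows if t < tau]
    pieces = [(b - a) * h for a, b, h in zip(knots, knots[1:], heights)]
    mean = sum(pieces)
    variance = 0.0
    for i, (t, n, d, _) in enumerate(rows):
        if t >= tau or n == d:
            continue  # no area to the right, or the term is undefined
        area_right = sum(pieces[i + 1:])
        variance += area_right ** 2 * d / (n * (n - d))
    return mean, math.sqrt(variance)


print(restricted_mean(SAMPLE))
```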
4. A study was conducted on the effects of ploidy on the prognosis of patients with
cancers of the mouth. Patients were selected who had a paraffin-embedded sample
of the cancerous tissue taken at the time of surgery. Follow-up survival data was
obtained on each patient. The tissue samples were examined using a flow cytometer to
determine if the tumor had an aneuploid (abnormal) or diploid (normal) DNA profile.
The data are in the following table. Times are in weeks.
Aneuploid Tumors:
Death Times: 1, 3, 3, 4, 10, 13, 13, 16, 16, 24, 26, 27, 28, 30, 30,
32, 41, 51, 65, 67, 70, 72, 73, 77, 91, 93, 96, 100, 104, 157, 167
Censored Observations: 61, 74, 79, 80, 81, 87, 87, 88, 89, 97, 101,
104, 108, 109, 120, 131, 150, 231, 240, 400
Diploid Tumors:
Death Times: 1, 3, 4, 5, 5, 8, 23, 26, 27, 30, 42, 56, 62, 69, 104,
104, 112, 129, 181
Censored Observations: 8, 67, 76, 104, 176, 231
(a) Estimate the survival functions and their standard error for both the diploid and
aneuploid groups.
(b) Estimate the cumulative hazard rates and their standard error for both the diploid
and aneuploid groups.
(c) Provide an estimate of the mean time to death, and find a 95% confidence interval
for the mean survival time for both the diploid and aneuploid groups.
(d) Provide an estimate of the median time to death, and find a 95% confidence
interval for the median survival time for both the diploid and aneuploid groups.
5. A data set is given in “ass2q5” from Blackboard. It contains “Times(in Years)” in
Column 1, “Censor” in Column 2 where 0 indicates death and 1 indicates alive.
(a) Estimate the survival function and its standard error.
(b) Estimate the survival function at t = 5 and its standard error.
(c) Provide an estimate of the median time to death, and find a 95% confidence
interval for the median survival time.
6. Suppose you are given the following data set:
1, 6+, 5−, 3, (7, 9], 2−, (3, 4], 4, 5+
+: right censored; −: left censored
Find the estimate of the survival function.
7. Consider a hypothetical study of the mortality experience of diabetics. Thirty diabetic
subjects are recruited at a clinic and followed until death or the end of study. The
subject’s age at entry into the study and their age at the end of study or death are
given in the table below. Of interest is estimating the survival curve for a 60- or for a
70-year-old diabetic.
(a) Since the diabetics needed to survive long enough from birth until the study began,
the data is left-truncated. Construct a table showing the number of subjects at
risk, Y , as a function of age.
(b) Estimate the conditional survival function for the age of death of a diabetic patient
who has survived to age 60.
(c) Estimate the conditional survival function for the age of death of a diabetic patient
who has survived to age 65.
Entry Age | Exit Age | Death Indicator | Entry Age | Exit Age | Death Indicator
58 60 1 67 70 1
58 63 1 67 77 1
59 69 0 67 69 1
60 62 1 68 72 1
60 65 1 69 79 0
61 72 0 69 72 1
61 69 0 69 70 1
62 73 0 70 76 0
62 66 1 70 71 1
62 65 1 70 78 0
63 68 1 71 79 0
63 74 0 72 76 1
64 71 1 72 73 1
66 68 1 73 80 0
66 69 1 73 74 1
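The risk-set table in part (a) and the conditional survival estimates in parts (b) and (c) can be sketched as follows. The code assumes that a subject is at risk at a given age if entry age < age ≤ exit age, and it uses a product-limit estimator over the death ages at or beyond the conditioning age. The tie convention and the function names are assumptions, so adjust them to match the convention used in class.

```python
# (entry age, exit age, death indicator) for the 30 subjects in the table.
SUBJECTS = [
    (58, 60, 1), (58, 63, 1), (59, 69, 0), (60, 62, 1), (60, 65, 1),
    (61, 72, 0), (61, 69, 0), (62, 73, 0), (62, 66, 1), (62, 65, 1),
    (63, 68, 1), (63, 74, 0), (64, 71, 1), (66, 68, 1), (66, 69, 1),
    (67, 70, 1), (67, 77, 1), (67, 69, 1), (68, 72, 1), (69, 79, 0),
    (69, 72, 1), (69, 70, 1), (70, 76, 0), (70, 71, 1), (70, 78, 0),
    (71, 79, 0), (72, 76, 1), (72, 73, 1), (73, 80, 0), (73, 74, 1),
]


def at_risk(age, subjects):
    """Y(age): subjects with entry < age <= exit (assumed tie convention)."""
    return sum(1 for entry, exit_age, _ in subjects if entry < age <= exit_age)


def deaths(age, subjects):
    """Number of deaths recorded at exactly this age."""
    return sum(1 for _, exit_age, died in subjects if exit_age == age and died == 1)


def conditional_survival(start_age, subjects):
    """Product-limit estimate of Pr(T > t | T >= start_age) at each death age."""
    surv, curve = 1.0, []
    death_ages = sorted({x for _, x, died in subjects if died == 1 and x >= start_age})
    for age in death_ages:
        y, d = at_risk(age, subjects), deaths(age, subjects)
        surv *= 1.0 - d / y
        curve.append((age, y, d, surv))
    return curve


for row in conditional_survival(60, SUBJECTS):
    print(row)
```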
8. A study was performed to estimate the distribution of incubation times of individuals
known to have a sexually transmitted disease (STD). Twenty-five patients with a
confirmed diagnosis of STD at a clinic were identified on June 1, 1996. All subjects
had been sexually active with a partner who had a confirmed diagnosis of an STD at
some point after January 1, 1993 (hence τ = 42 months). For each subject, the date
of the first encounter and the time (in months) from that encounter to the clinical
confirmation of the STD diagnosis are recorded. Based on
this right-truncated sample, compute an estimate of the probability that the infection
period is less than x months, conditional on the infection period being less than 42
months.
Date of First Encounter | Months From 1/93 to Encounter | Time (in months) until STD Diagnosed in Clinic
2/93 2 30
4/93 4 27
7/93 7 25
2/94 14 19
8/94 20 18
6/94 18 17
8/93 8 16
1/94 13 16
5/94 17 15
2/95 26 15
8/94 20 15
3/94 15 13
11/94 23 13
5/93 5 12
4/94 16 11
3/94 15 9
11/93 11 8
6/93 9 8
9/95 33 8
4/93 4 7
8/93 8 6
11/95 35 6
10/93 10 6
12/95 36 4
1/95 25 4
STAT 4008 Assignment 3
1. You are given the following right-censored sample
12, 15+, 17, 17, 18, 19+, 20, 20, 20+, 21+, 24, 27
Test the hypothesis that the hazard function is h(t) = 0.2 for 0 < t < 27.
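One way to approach Question 1 is the one-sample log-rank test, which compares the observed number of deaths O with the expected number E = Σ H0(t_i) under the hypothesised cumulative hazard H0(t) = 0.2t, and refers (O − E)²/E to a chi-square distribution with one degree of freedom. The sketch below assumes this is the intended test; the course may have a different one in mind.

```python
import math

# Sample from Question 1: (time, event), event = 1 for an observed death.
SAMPLE = [(12, 1), (15, 0), (17, 1), (17, 1), (18, 1), (19, 0),
          (20, 1), (20, 1), (20, 0), (21, 0), (24, 1), (27, 1)]


def one_sample_logrank(sample, cumulative_hazard):
    """Compare observed deaths O with E = sum of H0(t_i) over all subjects.

    Returns (O, E, chi-square statistic (O - E)^2 / E, p-value on 1 d.f.).
    """
    observed = sum(event for _, event in sample)
    expected = sum(cumulative_hazard(time) for time, _ in sample)
    chi_sq = (observed - expected) ** 2 / expected
    # Upper tail of chi-square with 1 d.f.: Pr(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return observed, expected, chi_sq, p_value


def h0_cumulative(t):
    """H0(t) = 0.2 * t under the hypothesised constant hazard."""
    return 0.2 * t


print(one_sample_logrank(SAMPLE, h0_cumulative))
```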
2. You are given the following data:
Group 1: 2, 2, 3+, 3, 4, 4+, 5, 5, 6+
Group 2: 2, 3, 3, 3+, 4, 4, 4+, 5, 5+, 5+, 6, 7
+: censored
Test if these two groups of data have the same distribution.
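A common choice for comparing two right-censored samples is the log-rank test. The following sketch implements the standard two-sample version with hypergeometric variance terms, on the assumption that this is the test intended. The Group 2 data are entered on the reading that "5+5+" means two observations censored at 5.

```python
import math

# (time, event) pairs; event = 0 marks a censored time (written with "+").
GROUP1 = [(2, 1), (2, 1), (3, 0), (3, 1), (4, 1), (4, 0), (5, 1), (5, 1), (6, 0)]
# Assumption: "5+5+" in the source is read as two censored times at 5.
GROUP2 = [(2, 1), (3, 1), (3, 1), (3, 0), (4, 1), (4, 1), (4, 0),
          (5, 1), (5, 0), (5, 0), (6, 1), (7, 1)]


def logrank_test(group1, group2):
    """Two-sample log-rank test; returns (chi-square statistic, p-value on 1 d.f.)."""
    pooled = group1 + group2
    death_times = sorted({t for t, e in pooled if e == 1})
    diff, variance = 0.0, 0.0
    for t in death_times:
        n1 = sum(1 for time, _ in group1 if time >= t)   # at risk in group 1
        n = sum(1 for time, _ in pooled if time >= t)    # at risk overall
        d1 = sum(1 for time, e in group1 if time == t and e == 1)
        d = sum(1 for time, e in pooled if time == t and e == 1)
        diff += d1 - n1 * d / n                          # observed minus expected
        if n > 1:
            variance += n1 * (n - n1) * d * (n - d) / (n ** 2 * (n - 1))
    chi_sq = diff ** 2 / variance
    return chi_sq, math.erfc(math.sqrt(chi_sq / 2.0))


print(logrank_test(GROUP1, GROUP2))
```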
3. The following table gives survival data from 30 patients with AML. Two possible
prognostic factors are considered:
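$$x_1 = \begin{cases} 1 & \text{if patient} \ge 50 \text{ years old} \\ 0 & \text{otherwise} \end{cases}$$
$$x_2 = \begin{cases} 1 & \text{if cellularity of marrow clot section is 100\%} \\ 0 & \text{otherwise} \end{cases}$$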
Table. Survival Times and Data of Two Possible Prognostic Factors of 30 AML Patients
Survival Time x1 x2 Survival Time x1 x2
18 0 0 8 1 0
9 0 1 2 1 1
28+ 0 0 26+ 1 0
31 0 1 10 1 1
39+ 0 1 4 1 0
19+ 0 1 3 1 0
45+ 0 1 4 1 0
6 0 1 18 1 1
8 0 1 8 1 1
15 0 1 3 1 1
23 0 0 14 1 1
28+ 0 0 3 1 0
7 0 1 13 1 1
12 1 0 13 1 1
9 1 0 35+ 1 0
Test if the prognostic factors are significant.
4. A data set is given in “ass3data.xls” on the blackboard system. It contains
“Times” in Column 1, “Group” in Column 2 and “Treatment” in Column 3.
Note that there is no censoring in the data set.
(a) Test if there is a difference among groups.
(b) Test if there is a difference among treatments.
5. A data set is given in “ass3q5.csv” on the blackboard. It contains Time, Status
(0: lived; 1: death) and Smoking Status. Using the seed 123457, generate a subsample of size 100 and then test whether there is any difference among the Smoking
Status groups based on this subsample.
STAT 4008 Assignment 4
1. The following table gives survival data from 30 patients with AML. Two possible
prognostic factors are considered:
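$$x_1 = \begin{cases} 1 & \text{if patient} \ge 50 \text{ years old} \\ 0 & \text{otherwise} \end{cases}$$
$$x_2 = \begin{cases} 1 & \text{if cellularity of marrow clot section is 100\%} \\ 0 & \text{otherwise} \end{cases}$$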
Table. Survival Times and Data of Two Possible Prognostic Factors of 30 AML Patients
Survival Time x1 x2 Survival Time x1 x2
18 0 0 8 1 0
9 0 1 2 1 1
28+ 0 0 26+ 1 0
31 0 1 10 1 1
39+ 0 1 4 1 0
19+ 0 1 3 1 0
45+ 0 1 4 1 0
6 0 1 18 1 1
8 0 1 8 1 1
15 0 1 3 1 1
23 0 0 14 1 1
28+ 0 0 3 1 0
7 0 1 13 1 1
12 1 0 13 1 1
9 1 0 35+ 1 0
Assume a Cox proportional hazards model and use the exact likelihood to handle
tied observations. Test if the prognostic factors are significant. Compare the
results with those from the function “survdiff”.
2. You are given a data set in the file “ass4.csv”. It contains the following variables:
lstay Length of stay of a resident
age Age of a resident
trt Nursing home assignment (1: receive treatment,0: control)
gender Gender (1:male,0:female)
marstat Marital status (1: married,0: not married)
hlstat Health status (2: second best, 5: worst)
cens Censoring indicator (1:censored, 0: discharged)
Use 123457 as the seed number to generate a sample of size 1000, and use this
sample to model the nursing home duration times, measured in days, as
a function of patient characteristics.
(a) Test if the patient characteristics have no impact on the home duration
times. State all three test values (Wald, likelihood ratio and score).
(b) Based on the individual z-scores, can we remove all covariates whose p-values are greater than 0.05? State the null hypothesis we want to test.
Give all three tests (Wald, likelihood ratio and score) and their p-values.
(c) Based on the simplified model, give a 95% confidence interval on the hazard
ratio of the patient with status=4 compared with those with status=5.
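For part (c), a Wald interval for a hazard ratio is usually obtained by exponentiating the endpoints of an interval for the difference of two log-hazard coefficients. The sketch below shows that calculation. The function name and the numbers passed to it are hypothetical and are not fitted values from “ass4.csv”.

```python
import math


def hazard_ratio_ci(beta_a, beta_b, var_a, var_b, cov_ab, z=1.959964):
    """Wald interval for exp(beta_a - beta_b).

    beta_a, beta_b : fitted log-hazard coefficients of the two levels compared
    var_a, var_b   : their estimated variances
    cov_ab         : their estimated covariance
    Set beta_b = var_b = cov_ab = 0 when level b is the reference category.
    """
    log_hr = beta_a - beta_b
    se = math.sqrt(var_a + var_b - 2.0 * cov_ab)
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Hypothetical numbers purely for illustration; they are not fitted values.
print(hazard_ratio_ci(0.40, 0.0, 0.04, 0.0, 0.0))
```

When one of the two levels is the reference category of the fitted model, the interval reduces to exp(β ± z · se(β)).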


