| Title: | Introductory Statistics with R |
|---|---|
| Description: | Data sets and scripts for text examples and exercises in P. Dalgaard (2008), "Introductory Statistics with R", 2nd ed., Springer Verlag, ISBN 978-0387790534. |
| Authors: | Peter Dalgaard [aut, cre] |
| Maintainer: | Peter Dalgaard <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.0-12 |
| Built: | 2026-05-21 07:31:22 UTC |
| Source: | https://github.com/cran/ISwR |
Repeated measurements of alkaline phosphatase in a randomized trial of Tamoxifen treatment of breast cancer patients.
alkfos
A data frame with 43 observations on the following 8 variables.
grp: a numeric vector, group code (1=placebo, 2=Tamoxifen).
c0: a numeric vector, concentration at baseline.
c3: a numeric vector, concentration after 3 months.
c6: a numeric vector, concentration after 6 months.
c9: a numeric vector, concentration after 9 months.
c12: a numeric vector, concentration after 12 months.
c18: a numeric vector, concentration after 18 months.
c24: a numeric vector, concentration after 24 months.
Original data.
B. Kristensen et al. (1994), Tamoxifen and bone metabolism in postmenopausal low-risk breast cancer patients: a randomized study. Journal of Clinical Oncology, 12(2):992–997.
The ashina data frame has 16 rows and 3 columns. It contains
data from a crossover trial for the effect of an NO synthase inhibitor
on headaches. Visual analog scale recordings of pain levels were made
at baseline and at five time points after infusion of the drug or
placebo. A score was calculated as the sum of the differences from
baseline. Data were recorded during two sessions for each patient. Six
patients were given treatment on the first occasion and the placebo on
the second. Ten patients had placebo first and then treatment. The
order of treatment and the placebo was randomized.
ashina
This data frame contains the following columns:
vas.active: a numeric vector, summary score when given active substance.
vas.plac: a numeric vector, summary score when given placebo treatment.
grp: a numeric vector code, 1: placebo first, 2: active first.
Original data.
M. Ashina et al. (1999), Effect of inhibition of nitric oxide synthase on chronic tension-type headache: a randomised crossover trial. Lancet 353, 287–289.
plot(vas.active~vas.plac,pch=grp,data=ashina)
abline(0,1)
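The summary score described for this data set (the sum of the differences from baseline over the five post-infusion time points) can be sketched as follows. The VAS readings are invented for illustration and are not taken from the trial; the names `baseline`, `post`, and `vas_score` are hypothetical.

```r
# Hypothetical VAS readings for one patient in one session (not trial data)
baseline <- 52
post <- c(48, 45, 40, 41, 38)   # five time points after infusion

# Summary score: sum of the differences from baseline
vas_score <- function(baseline, post) sum(post - baseline)

score <- vas_score(baseline, post)
score
```

A negative score indicates that pain levels were, on balance, below the baseline level after infusion.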
Danish study on the effect of screening for breast cancer.
bcmort
A data frame with 24 observations on the following 4 variables.
age: a factor with levels 50-54, 55-59, 60-64, 65-69, 70-74, and 75-79.
cohort: a factor with levels Study gr., Nat.ctr., Hist.ctr., and Hist.nat.ctr..
bc.deaths: a numeric vector, number of breast cancer deaths.
p.yr: a numeric vector, person-years under study.
Four cohorts were collected. The “study group” consists of the population of women in the appropriate age range in Copenhagen and Frederiksberg after the introduction of routine mammography screening. The “national control group” consisted of the population in the parts of Denmark in which routine mammography screening was not available. These two groups were both collected in the years 1991–2001. The “historical control group” and the “historical national control group” are similar cohorts from 10 years earlier (1981–1991), before the introduction of screening in Copenhagen and Frederiksberg. The study group comprises the entire population, not just those accepting the invitation to be screened.
A.H. Olsen et al. (2005), Breast cancer mortality in Copenhagen after introduction of mammography screening. British Medical Journal, 330: 220–222.
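Data of this shape (event counts with person-years) are commonly summarized as rates and compared with a Poisson regression using log person-years as an offset. The following is a minimal sketch under that assumption; the counts are made up and are not the bcmort values, and only the column names mirror the data set.

```r
# Made-up counts for two cohorts in one age group (NOT the bcmort values)
d <- data.frame(
  cohort    = factor(c("Study gr.", "Nat.ctr.")),
  bc.deaths = c(20, 60),
  p.yr      = c(50000, 120000)
)

# Crude mortality rate per 100,000 person-years
d$rate <- 1e5 * d$bc.deaths / d$p.yr

# Poisson regression with log person-years as offset;
# exp(coefficient) estimates the rate ratio relative to the reference level
fit <- glm(bc.deaths ~ cohort + offset(log(p.yr)), family = poisson, data = d)
rate.ratio <- unname(exp(coef(fit))[2])
```

Because factor levels are sorted alphabetically, `Nat.ctr.` is the reference level here, so `rate.ratio` compares the study group with the national control group.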
The bp.obese data frame has 102 rows and 3 columns.
It contains data from a random sample of Mexican-American adults in a
small California town.
bp.obese
This data frame contains the following columns:
sex: a numeric vector code, 0: male, 1: female.
obese: a numeric vector, ratio of actual weight to ideal weight from New York Metropolitan Life Tables.
bp: a numeric vector, systolic blood pressure (mm Hg).
B.W. Brown and M. Hollander (1977), Statistics: A Biomedical Introduction, Wiley.
plot(bp~obese,pch = ifelse(sex==1, "F", "M"), data = bp.obese)
The table caesar.shoe contains the relation between caesarean
section and maternal shoe size (UK sizes!).
caesar.shoe
A matrix with two rows and six columns.
D.G. Altman (1991), Practical Statistics for Medical Research, Table 10.1, Chapman & Hall.
prop.trend.test(caesar.shoe["Yes",],margin.table(caesar.shoe,2))
The coking data frame has 18 rows and 3 columns.
It contains the time to coking in an experiment where the oven width
and temperature were varied.
coking
This data frame contains the following columns:
width: a factor with levels 4, 8, and 12, giving the oven width in inches.
temp: a factor with levels 1600 and 1900, giving the temperature in Fahrenheit.
time: a numeric vector, time to coking.
R.A. Johnson (1994), Miller and Freund's Probability and Statistics for Engineers, 5th ed., Prentice-Hall.
attach(coking)
matplot(tapply(time,list(width,temp),mean))
detach(coking)
The cystfibr data frame has 25 rows and 10 columns.
It contains lung function data for cystic fibrosis patients (7–23 years
old).
cystfibr
This data frame contains the following columns:
age: a numeric vector, age in years.
sex: a numeric vector code, 0: male, 1: female.
height: a numeric vector, height (cm).
weight: a numeric vector, weight (kg).
bmp: a numeric vector, body mass (% of normal).
fev1: a numeric vector, forced expiratory volume.
rv: a numeric vector, residual volume.
frc: a numeric vector, functional residual capacity.
tlc: a numeric vector, total lung capacity.
pemax: a numeric vector, maximum expiratory pressure.
D.G. Altman (1991), Practical Statistics for Medical Research, Table 12.11, Chapman & Hall.
O'Neill et al. (1983), The effects of chronic hyperinflation, nutritional status, and posture on respiratory muscle strength in cystic fibrosis, Am. Rev. Respir. Dis., 128:1051–1054.
This data set contains counts of incident lung cancer cases and population size in four neighbouring Danish cities by age group.
eba1977
A data frame with 24 observations on the following 4 variables:
city: a factor with levels Fredericia, Horsens, Kolding, and Vejle.
age: a factor with levels 40-54, 55-59, 60-64, 65-69, 70-74, and 75+.
pop: a numeric vector, number of inhabitants.
cases: a numeric vector, number of lung cancer cases.
These data were “at the center of public interest in Denmark in 1974”, according to Erling Andersen's paper. The city of Fredericia has a substantial petrochemical industry in the harbour area.
E.B. Andersen (1977), Multiplicative Poisson models with unequal cell rates, Scandinavian Journal of Statistics, 4:153–158.
J. Clemmensen et al. (1974), Ugeskrift for Læger, pp. 2260–2268.
The energy data frame has 22 rows and 2 columns.
It contains data on the energy expenditure in groups of lean and obese women.
energy
This data frame contains the following columns:
expend: a numeric vector, 24 hour energy expenditure (MJ).
stature: a factor with levels lean and obese.
D.G. Altman (1991), Practical Statistics for Medical Research, Table 9.4, Chapman & Hall.
plot(expend~stature,data=energy)
England and Wales mortality rates from lung cancer, nasal cancer,
and all causes, 1936–1980. The 1936 rates are repeated as 1931 rates in
order to accommodate follow-up for the nickel study.
ewrates
A data frame with 150 observations on the following 5 variables:
year: calendar period, 1931: 1931–35, 1936: 1936–40, ....
age: age class, 10: 10–14, 15: 15–19, ....
lung: lung cancer mortality rate per 1 million person-years.
nasal: nasal cancer mortality rate per 1 million person-years.
other: all cause mortality rate per 1 million person-years.
Taken from the “Epi” package by Bendix Carstensen et al.
N.E. Breslow and N. Day (1987). Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies, Appendix IX. IARC Scientific Publications, Lyon.
The fake.trypsin data frame has 271 rows and 3 columns.
It contains serum levels of immunoreactive trypsin in healthy volunteers (faked!).
fake.trypsin
This data frame contains the following columns:
trypsin: a numeric vector, serum-trypsin in ng/ml.
grp: a numeric vector, age coding. See below.
grpf: a factor with levels 1: age 10–19, 2: age 20–29, 3: age 30–39, 4: age 40–49, 5: age 50–59, and 6: age 60–69.
Data have been simulated to match given group means and SD.
D.G. Altman (1991), Practical Statistics for Medical Research, Table 9.12, Chapman & Hall.
plot(trypsin~grp, data=fake.trypsin)
The graft.vs.host data frame has 37 rows and 9 columns.
It contains data from patients receiving a nondepleted allogenic bone
marrow transplant with the purpose of finding variables associated with
the development of acute graft-versus-host disease.
graft.vs.host
This data frame contains the following columns:
pnr: a numeric vector, patient number.
rcpage: a numeric vector, age of recipient (years).
donage: a numeric vector, age of donor (years).
type: a numeric vector, type of leukaemia coded 1: AML, 2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia.
preg: a numeric vector code indicating whether donor has been pregnant. 0: no, 1: yes.
index: a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions.
gvhd: a numeric vector code, graft-versus-host disease, 0: no, 1: yes.
time: a numeric vector, follow-up time.
dead: a numeric vector code, 0: no (censored), 1: yes.
D.G. Altman (1991), Practical Statistics for Medical Research, Exercise 12.3, Chapman & Hall.
plot(jitter(gvhd,0.2)~index,data=graft.vs.host)
The heart.rate data frame has 36 rows and 3 columns.
It contains data for nine patients with congestive heart failure before
and shortly after administration of enalaprilat, in a balanced two-way
layout.
heart.rate
This data frame contains the following columns:
hr: a numeric vector, heart rate in beats per minute.
subj: a factor with levels 1 to 9.
time: a factor with levels 0 (before), 30, 60, and 120 (minutes after administration).
D.G. Altman (1991), Practical Statistics for Medical Research, Table 12.2, Chapman & Hall.
evalq(interaction.plot(time,subj,hr), heart.rate)
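A balanced two-way layout means that every subject is observed exactly once at every time point (9 subjects x 4 times = 36 rows). The sketch below builds such a design with `gl()`. The row order is an assumption and need not match the actual data frame, and the heart rate values are omitted.

```r
# Sketch of a balanced two-way layout: 9 subjects x 4 time points = 36 rows.
# Factor names follow the column descriptions; the row order is assumed.
layout <- data.frame(
  subj = gl(9, 1, 36),
  time = gl(4, 9, 36, labels = c(0, 30, 60, 120))
)

# Every subject/time combination occurs exactly once
counts <- table(layout$subj, layout$time)
balanced <- all(counts == 1)
balanced
```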
The hellung data frame has 51 rows and 3 columns.
It contains the diameter and concentration of Tetrahymena cells with and
without glucose added to the growth medium.
hellung
This data frame contains the following columns:
glucose: a numeric vector code, 1: yes, 2: no.
conc: a numeric vector, cell concentration (counts/ml).
diameter: a numeric vector, cell diameter (μm).
D. Kronborg and L.T. Skovgaard (1990), Regressionsanalyse, Table 1.1, FADLs Forlag (in Danish).
plot(diameter~conc,pch=glucose,log="xy",data=hellung)
Serum IgM in 298 children aged 6 months to 6 years.
IgM
A single numeric vector (g/l).
D.G. Altman (1991), Practical Statistics for Medical Research, Table 3.2, Chapman & Hall.
stripchart(IgM,method="stack")
The intake data frame has 11 rows and 2 columns.
It contains paired values of energy intake for 11 women.
intake
This data frame contains the following columns:
pre: a numeric vector, premenstrual intake (kJ).
post: a numeric vector, postmenstrual intake (kJ).
D.G. Altman (1991), Practical Statistics for Medical Research, Table 9.3, Chapman & Hall.
plot(intake$pre, intake$post)
The juul data frame has 1339 rows and 6 columns.
It contains a reference sample of the distribution of insulin-like
growth factor (IGF-I), one observation per subject at various ages, with the
bulk of the data collected in connection with school physical
examinations.
juul
This data frame contains the following columns:
age: a numeric vector (years).
menarche: a numeric vector. Has menarche occurred (code 1: no, 2: yes)?
sex: a numeric vector (1: boy, 2: girl).
igf1: a numeric vector, insulin-like growth factor (μg/l).
tanner: a numeric vector, codes 1–5: Stages of puberty ad modum Tanner.
testvol: a numeric vector, testicular volume (ml).
Original data.
plot(igf1~age, data=juul)
The juul2 data frame has 1339 rows and 8 columns.
It is an extended version of juul.
juul2
This data frame contains the following columns:
age: a numeric vector (years).
height: a numeric vector (cm).
menarche: a numeric vector. Has menarche occurred (code 1: no, 2: yes)?
sex: a numeric vector (1: boy, 2: girl).
igf1: a numeric vector, insulin-like growth factor (μg/l).
tanner: a numeric vector, codes 1–5: Stages of puberty ad modum Tanner.
testvol: a numeric vector, testicular volume (ml).
weight: a numeric vector, weight (kg).
Original data.
plot(igf1~age, data=juul2)
The kfm data frame has 50 rows and 7 columns.
It was collected by Kim Fleischer Michaelsen and contains data for 50
infants of age approximately 2 months. They were weighed immediately
before and after each breast feeding, and the measured intake of breast
milk was registered along with various other data.
kfm
This data frame contains the following columns:
no: a numeric vector, identification number.
dl.milk: a numeric vector, breast-milk intake (dl/24h).
sex: a factor with levels boy and girl.
weight: a numeric vector, weight of child (kg).
ml.suppl: a numeric vector, supplementary milk substitute (ml/24h).
mat.weight: a numeric vector, weight of mother (kg).
mat.height: a numeric vector, height of mother (cm).
The amount of supplementary milk substitute refers to a period before the data collection.
Original data.
plot(dl.milk~mat.height,pch=c(1,2)[sex],data=kfm)
The lung data frame has 18 rows and 3 columns. It contains data
on three different methods of determining human
lung volume.
lung
This data frame contains the following columns:
volume: a numeric vector, measured lung volume.
method: a factor with levels A, B, and C.
subject: a factor with levels 1–6.
Anon. (1977), Exercises in Applied Statistics, Exercise 4.15, Dept. of Theoretical Statistics, Aarhus University.
The malaria data frame has 100 rows and 4 columns.
malaria
This data frame contains the following columns:
subject: subject code.
age: age in years.
ab: antibody level.
mal: a numeric vector code, Malaria: 0: no, 1: yes.
A random sample of 100 children aged 3–15 years from a village in Ghana. The children were followed for a period of 8 months. At the beginning of the study, values of a particular antibody were assessed. Based on observations during the study period, the children were categorized into two groups: individuals with and without symptoms of malaria.
Unpublished data.
summary(malaria)
The melanom data frame has 205 rows and 6 columns.
It contains data relating to the survival of patients after an operation for
malignant melanoma, collected at Odense University Hospital by K.T.
Drzewiecki.
melanom
This data frame contains the following columns:
no: a numeric vector, patient code.
status: a numeric vector code, survival status; 1: dead from melanoma, 2: alive, 3: dead from other cause.
days: a numeric vector, observation time.
ulc: a numeric vector code, ulceration; 1: present, 2: absent.
thick: a numeric vector, tumor thickness (1/100 mm).
sex: a numeric vector code; 1: female, 2: male.
P.K. Andersen, Ø. Borgan, R.D. Gill, and N. Keiding (1991), Statistical Models Based on Counting Processes, Appendix 1, Springer-Verlag.
require(survival)
plot(survfit(Surv(days,status==1)~1,data=melanom))
The data concern a cohort of nickel smelting workers in South Wales, with information on exposure, follow-up period, and cause of death.
nickel
A data frame containing 679 observations of the following 7 variables:
id: subject identifier (numeric).
icd: ICD cause of death if dead, 0 otherwise (numeric).
exposure: exposure index for workplace (numeric).
dob: date of birth (numeric).
age1st: age at first exposure (numeric).
agein: age at start of follow-up (numeric).
ageout: age at end of follow-up (numeric).
Taken from the “Epi” package by Bendix Carstensen et al.
For comparison purposes,
England and Wales mortality rates (per 1,000,000 per annum)
from lung cancer (ICDs 162 and 163),
nasal cancer (ICD 160), and all causes, by age group and calendar period, are
supplied in the data set ewrates.
N.E. Breslow and N. Day (1987). Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies, IARC Scientific Publications, Lyon.
The data concern a cohort of nickel smelting workers in South Wales,
with information on exposure, follow-up period, and cause of death, as
in the nickel data.
This version has follow-up times split according to age groups and is
merged with the mortality rates in ewrates.
nickel.expand
A data frame with 3724 observations on the following 12 variables:
agr: age class: 10: 10–14, 15: 15–19, ....
ygr: calendar period, 1931: 1931–35, 1936: 1936–40, ....
id: subject identifier (numeric).
icd: ICD cause of death if dead, 0 otherwise (numeric).
exposure: exposure index for workplace (numeric).
dob: date of birth (numeric).
age1st: age at first exposure (numeric).
agein: age at start of follow-up (numeric).
ageout: age at end of follow-up (numeric).
lung: lung cancer mortality rate per 1 million person-years.
nasal: nasal cancer mortality rate per 1 million person-years.
other: all cause mortality rate per 1 million person-years.
Computed from nickel and ewrates data sets.
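Splitting follow-up time by age group can be sketched as below for a single subject. This is only an illustration of the idea, assuming 5-year age classes; it is not the code used to build nickel.expand, the helper `split_by_age` is hypothetical, and the ages are invented.

```r
# Sketch: split one subject's follow-up into 5-year age classes.
# split_by_age is a hypothetical helper, not part of ISwR.
split_by_age <- function(agein, ageout, width = 5) {
  breaks <- seq(floor(agein / width) * width,
                ceiling(ageout / width) * width, by = width)
  start <- breaks[-length(breaks)]          # lower limit of each age class
  data.frame(
    agr    = start,
    agein  = pmax(agein, start),            # entry age within the class
    ageout = pmin(ageout, breaks[-1])       # exit age within the class
  )
}

# Invented entry and exit ages
pieces <- split_by_age(agein = 47.3, ageout = 61.8)
pieces
```

Each row covers the part of the follow-up that falls in one age class, so the row durations add up to the subject's total follow-up time. Rates such as those in ewrates could then be matched to each row by age class.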
Four small experiments with the purpose of estimating the EC50 of a biological dose-response relation.
philion
A data frame with 30 observations on the following 3 variables:
experiment: a numeric vector; codes 1 through 4 denote the experiment number.
dose: a numeric vector, the dose.
response: a numeric vector, the response (counts).
These data were discussed on the R mailing lists, initially
suggesting a log-linear Poisson regression, but actually a relation
like $y = y_{max} / (1 + (x/\beta)^\alpha)$ is more suitable.
Original data from Vincent Philion, IRDA, Québec.
https://stat.ethz.ch/pipermail/r-help/2003-July/036828.html (Thread on R-help mailing list: "inverse prediction and Poisson regression", started by Vincent Philion on July 25, 2003.)
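As a hedged illustration of a dose-response curve with an EC50 parameter, the sketch below assumes the form y = ymax / (1 + (x/beta)^alpha). In this form beta plays the role of the EC50: at x = beta the expected response is half of ymax. The function name `dose_response` and all parameter values are made up and are not estimates from these data.

```r
# Assumed dose-response form: y = ymax / (1 + (x / beta)^alpha)
dose_response <- function(x, ymax, beta, alpha) ymax / (1 + (x / beta)^alpha)

# Invented parameter values for illustration only
ymax <- 100; beta <- 2; alpha <- 1.5

# At dose x = beta the expected response is half the maximum
half <- dose_response(beta, ymax, beta, alpha)
half

# The response decreases with increasing dose when alpha > 0
curve.vals <- dose_response(c(0.5, 1, 2, 4, 8), ymax, beta, alpha)
```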
The numeric vector react contains differences between two
nurses' determinations of 334 tuberculin reaction sizes.
react
A single vector, differences between reaction sizes in mm.
Anon. (1977), Exercises in Applied Statistics, Exercise 2.9, Dept. of Theoretical Statistics, Aarhus University.
hist(react) # not good because of discretization effects...
plot(density(react))
The red.cell.folate data frame has 22 rows and 2 columns.
It contains data on red cell folate levels in patients receiving three
different methods of ventilation during anesthesia.
red.cell.folate
This data frame contains the following columns:
folate: a numeric vector, folate concentration (μg/l).
ventilation: a factor with levels
N2O+O2,24h: 50% nitrous oxide and 50% oxygen, continuously for 24 hours;
N2O+O2,op: 50% nitrous oxide and 50% oxygen, only during operation;
O2,24h: no nitrous oxide but 35%–50% oxygen for 24 hours.
D.G. Altman (1991), Practical Statistics for Medical Research, Table 9.10, Chapman & Hall.
plot(folate~ventilation,data=red.cell.folate)
The rmr data frame has 44 rows and 2 columns.
It contains the resting metabolic rate and body weight data for 44 women.
rmr
This data frame contains the following columns:
body.weight: a numeric vector, body weight (kg).
metabolic.rate: a numeric vector, metabolic rate (kcal/24hr).
D.G. Altman (1991), Practical Statistics for Medical Research, Exercise 11.2, Chapman & Hall.
plot(metabolic.rate~body.weight,data=rmr)
The secher data frame has 107 rows and 4 columns. It contains
ultrasonographic measurements of fetuses immediately before birth and
their subsequent
birth weight.
secher
This data frame contains the following columns:
bwt: a numeric vector, birth weight (g).
bpd: a numeric vector, biparietal diameter (mm).
ad: a numeric vector, abdominal diameter (mm).
no: a numeric vector, observation number.
D. Kronborg and L.T. Skovgaard (1990), Regressionsanalyse, Table 3.1, FADLs Forlag (in Danish).
Secher et al. (1987), European Journal of Obstetrics, Gynecology, and Reproductive Biology, 24: 1–11.
plot(bwt~ad, data=secher, log="xy")
The secretin data frame has 50 rows and 6 columns. It contains
data from a glucose response experiment.
secretin
This data frame contains the following columns:
gluc: a numeric vector, blood glucose level.
person: a factor with levels A–E.
time: a factor with levels 20, 30, 60, 90 (minutes since injection), and pre (before injection).
repl: a factor with levels a: 1st sample; b: 2nd sample.
time20plus: a factor with levels 20+: 20 minutes or longer since injection; pre: before injection.
time.comb: a factor with levels 20: 20 minutes since injection; 30+: 30 minutes or longer since injection; pre: before injection.
Secretin is a hormone of the duodenal mucous membrane. An extract was administered to five patients with arterial hypertension. Primary registrations (double determination) of blood glucose were on graph paper and later quantified with the smallest of the two measurements recorded first.
Anon. (1977), Exercises in Applied Statistics, Exercise 5.8, Dept. of Theoretical Statistics, Aarhus University.
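The factors time20plus and time.comb are coarser groupings of the time factor. One way such groupings could be derived is by merging factor levels, as in the sketch below. The short `time` vector is a made-up stand-in for the data frame column, and this is not necessarily how the package itself created the variables.

```r
# Hypothetical vector of sampling times using the levels listed for `time`
time <- factor(c("pre", "20", "30", "60", "90"),
               levels = c("20", "30", "60", "90", "pre"))

# Collapse all post-injection times into one level
time20plus <- time
levels(time20plus) <- list("20+" = c("20", "30", "60", "90"), pre = "pre")

# Keep 20 minutes separate, pool 30 minutes and later
time.comb <- time
levels(time.comb) <- list("20" = "20", "30+" = c("30", "60", "90"), pre = "pre")

data.frame(time, time20plus, time.comb)
```

Assigning a named list to `levels()` maps each old level to the name of the list element that contains it, which makes the merge explicit and easy to check.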
All cases of stroke in Tartu, Estonia, during the period 1991–1993, with follow-up until January 1, 1996.
stroke
A data frame with 829 observations on the following 10 variables.
sex: a factor with levels Female and Male.
died: a Date, date of death.
dstr: a Date, date of stroke.
age: a numeric vector, age at stroke.
dgn: a factor, diagnosis, with levels ICH (intracranial haemorrhage), ID (unidentified), INF (infarction, ischaemic), SAH (subarachnoid haemorrhage).
coma: a factor with levels No and Yes, indicating whether patient was in coma after the stroke.
diab: a factor with levels No and Yes, history of diabetes.
minf: a factor with levels No and Yes, history of myocardial infarction.
han: a factor with levels No and Yes, history of hypertension.
obsmonths: a numeric vector, observation times in months (set to 0.1 for patients dying on the same day as the stroke).
dead: a logical vector, whether patient died during the study.
Original data.
J. Korv, M. Roose, and A.E. Kaasik (1997). Stroke Registry of Tartu, Estonia, from 1991 through 1993. Cerebrovascular Disorders 7:154–162.
The tb.dilute data frame has 18 rows and 3 columns. It contains
data from a drug test involving dilutions of tuberculin.
tb.dilute
This data frame contains the following columns:
reaction: a numeric vector, reaction sizes (average of diameters) for tuberculin skin pricks.
animal: a factor with levels 1–6.
logdose: a factor with levels 0.5, 0, and -0.5.
The actual dilutions were 1:100, $1:100\sqrt{10}$, and 1:1000.
Setting the middle one to 1 and using base-10 logarithms gives
the logdose values.
Anon. (1977), Exercises in Applied Statistics, part of Exercise 4.15, Dept. of Theoretical Statistics, Aarhus University.
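The logdose values can be checked with a few lines of arithmetic. The sketch assumes the three dilutions are equally spaced on the log scale, so that the middle one is 1:100√10 (roughly 1:316); the variable names are illustrative only.

```r
# Concentrations implied by the dilutions; the middle dilution is an
# assumption chosen so that the doses are equally spaced on the log scale
conc <- c(1 / 100, 1 / (100 * sqrt(10)), 1 / 1000)

# Express each dose relative to the middle one, then take base-10 logarithms
logdose <- log10(conc / conc[2])
round(logdose, 10)
```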
The thuesen data frame has 24 rows and 2 columns.
It contains ventricular shortening velocity and blood glucose for type 1
diabetic patients.
thuesen
This data frame contains the following columns:
blood.glucose: a numeric vector, fasting blood glucose (mmol/l).
short.velocity: a numeric vector, mean circumferential shortening velocity (%/s).
D.G. Altman (1991), Practical Statistics for Medical Research, Table 11.6, Chapman & Hall.
plot(short.velocity~blood.glucose, data=thuesen)
The tlc data frame has 32 rows and 4 columns. It contains data on
pretransplant total lung capacity (TLC) for recipients of heart-lung
transplants by whole-body plethysmography.
tlc
This data frame contains the following columns:
age: a numeric vector, age of recipient (years).
sex: a numeric vector code, female: 1, male: 2.
height: a numeric vector, height of recipient (cm).
tlc: a numeric vector, total lung capacity (l).
D.G. Altman (1991), Practical Statistics for Medical Research, Exercise 12.5, 10.1, Chapman & Hall.
plot(tlc~height,data=tlc)
The vitcap data frame has 24 rows and 3 columns.
It contains data on vital capacity for workers in the cadmium industry.
It is a subset of the vitcap2 data set.
vitcap
This data frame contains the following columns:
group: a numeric vector; group codes are 1: exposed > 10 years, 3: not exposed.
age: a numeric vector, age in years.
vital.capacity: a numeric vector, vital capacity (a measure of lung volume) in liters.
P. Armitage and G. Berry (1987), Statistical Methods in Medical Research, 2nd ed., Blackwell, p.286.
plot(vital.capacity~age, pch=group, data=vitcap)
The vitcap2 data frame has 84 rows and 3 columns.
Age and vital capacity for workers in the cadmium industry.
vitcap2
This data frame contains the following columns:
group: a numeric vector; group codes are 1: exposed > 10 years, 2: exposed < 10 years, 3: not exposed.
age: a numeric vector, age in years.
vital.capacity: a numeric vector, vital capacity (a measure of lung volume) (l).
P. Armitage and G. Berry (1987), Statistical Methods in Medical Research, 2nd ed., Blackwell, p.286.
plot(vital.capacity~age, pch=group, data=vitcap2)
The wright data frame has 17 rows and 2 columns.
It contains data on peak expiratory flow rate with two different flow
meters on each of 17 subjects.
wright
This data frame contains the following columns:
std.wright: a numeric vector, data from large flow meter (l/min).
mini.wright: a numeric vector, data from mini flow meter (l/min).
J.M. Bland and D.G. Altman (1986), Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, 1:307–310.
plot(wright)
abline(0,1)
The zelazo object is a list with four components.
zelazo
This is a list containing data on age at walking (in months) for four groups of infants:
active: test group receiving active training; these children had their walking and placing reflexes trained during four three-minute sessions that took place every day from their second to their eighth week of life.
passive: passive training group; these children received the same types of social and gross motor stimulation, but did not have their specific walking and placing reflexes trained.
none: no training; these children had no special training, but were tested along with the children who underwent active or passive training.
ctr.8w: eighth-week controls; these children had no training and were only tested at the age of 8 weeks.
When asked to enter these data from a text source, many students will use one vector per group and will need to reformat data into a data frame for some uses. The rather unusual format of this data set mimics that situation.
P.R. Zelazo, N.A. Zelazo, and S. Kolb (1972), “Walking” in the newborn, Science, 176: 314–315.
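Reformatting a list with one vector per group into a data frame can be sketched with `stack()`. The values below are invented and are not the zelazo data; only the component names follow the list described here, and the column names `age` and `group` are illustrative choices.

```r
# Hypothetical ages at walking (months); NOT the zelazo values
walk <- list(
  active  = c(9.5, 10.0, 9.0),
  passive = c(11.0, 10.5),
  none    = c(11.5, 12.0, 13.0),
  ctr.8w  = c(13.5, 12.5)
)

# stack() turns a named list of vectors into a two-column data frame:
# `values` holds the observations, `ind` is a factor naming the group
walk.df <- stack(walk)
names(walk.df) <- c("age", "group")

# The long format works directly with model formulas
fit <- lm(age ~ group, data = walk.df)
```

The same call should work on the real list, since it only relies on the components being named numeric vectors.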