In this example, we will clean up the code of gastempt1 and add graphical presentation of the fits.
# gastempt2.r # (C) 2005 Dieter Menne, http://www.menne-biomed.de # Adding graphics and diagnostics to the basic fit of # gastric emptying curves # PB: Pinheiro/Bates, Mixed-Effect..., Springer 2000 # See http://www.menne-biomed.de/gastempt1.r for more comments library(nlme) # Mixed effect number crunching library(lattice) # Trellis plots options(digits=3) # print 3 significant digits # Get our standard functions from external file in same folder source("fitfuncs.r")
We have outsourced the bulky self-starting LinExp emptying function into the file fitfuncs.r, and we in-source it here. Please download the file and put it into your working directory. If you peek into it, you will note that we have added definitions of other emptying function, for example the power exponential, which we will use later.
# Get Data from comma-separated (csv) file with header ge = read.table("gastempt.csv",sep=",",header=TRUE) # Make sure subject are not treated as numbers ge$subj = as.factor(ge$subj) # Create new factor <subj>.<treat> ge$subjtreat = as.factor(paste(ge$subj,ge$treat,sep=".")) # See PB 8.1.3. Function SSEmptLinExp is defined in fitfuncs.r gelist = nlsList(v~SSEmptLinExp(t,v0,kappa,tempt)|subjtreat,data=ge) print(gelist) # compute nlme fit contr = nlmeControl(pnlsTol=0.3) ge0.nlme=nlme(gelist,control=contr) ge0.coef = coef(ge0.nlme) # get coefficients
Nothing new till here compared to gastempt1.
ge0.coef$t50 = apply(ge0.coef,1,t50) print(ge0.coef)
We have replaced the for-loop by an R-typical apply row-wise. The function t50 we are going to use in all examples has been outsourced into the file fitfuncs.r.
ge0.pred = augPred(ge0.nlme,primary=~t)
The augmented prediction augPred generates a data frame of the original values, and in addition adds predicted values from the fits. A cut from ge0.pred looks like this
> ge0.pred t .groups v .type 1 3.00 2.alb 417 original 2 6.00 2.alb 419 original 3 9.00 2.alb 443 original 4 12.00 2.alb 459 original ... 1100 3.00 2.alb 430 predicted 2100 4.74 2.alb 431 predicted 3100 6.48 2.alb 432 predicted
The output of augPred knows how to plot itself; internally it calls plot.augPred.
Plot of Fitted and Observed
p = plot(ge0.pred) # Very quick, a bit ugly print(p)
It is best to use trellis graphs in a two-step approach: first generate the concept of the plot, assign it to a variable; then print it to a window by default, or to some graphics file as show below. In interactive mode, writing plot(ge0.pred) would have worked, but there are a few surprising cases where this fails, so better do it stepwise from the start. The default layout produced by this plot does not show the structure of the data nicely, so we give it another try with a forced layout, adding main title, x-label, y-label, and putting low-numbered subjects table-wise at the top.
# A few bells ... p = plot(ge0.pred, layout = c(2,4), # grid of 2 horizontal, 4 vert aspect = 0.7, # aspect of each panel; 1.0 for square main = "LinExp Gastric Emptying", xlab = "time (min)", ylab = "volume (ml)", as.table = T # ordered top to bottom ) print(p)
Default printing goes to a window, but by opening an device, several file formats can be created. Besides the bitmapped png-Format, the scalable postscript format is a good choice for inclusion in more recent version of Microsoft Word; for older Word versions, you must convert postscript to emf fist.
# Output the plot to a png file trellis.device("png",file="gastempt2.png", width=400,height=600,theme="col.whitebg") print(p) dev.off() # don't forget to close the device
Here is a view of the png file. Note that the curves with a value of kappa>1 show a rise after the initial filling of stomach; only 10.alb and 11.alb, the two failures in the nlsList fit, drop monotonously.
Diagnostic Plots of Residuals
Diagnostic plot of the residuals of a fit a statisticians best friend. Contrary to R2 values, they are worth a look both for linear and nonlinear fits. Residuals are the differences between predicted and observed values, normalized in such a way that we should expect about 95% of the data values in a band of plus/minus 2 around zero. The package nlme provides a very large toolkit for plotting of residuals, from which we will show two here.
#trellis.device("png",file="gastempt2_diag.png", # width=400,height=600,theme="col.whitebg")
If you plan to save the graph to a file, remove the two # starting the above lines.
p = plot(ge0.nlme,subjtreat ~resid(.), abline = 0, main= "Boxplot Residuals ge0.nlme",aspect=0.7) print(p,split=c(1,1,1,2),more=TRUE)
The details about the first line are documented under plot.lme. The second line is an interesting variation on the print.trellis subject. We plan to put two graphs on one page, so we split the page into 1 column and 2 rows (third and forth parameter of split), and we plot on grid field (1,1) of it (first and second parameter). By adding more=TRUE, we tell trellis that more plots are coming.
p = plot(ge0.nlme, id=0.05, # mark significant deviations idLabels = paste(ge$subjtreat,ge$t,sep=".t="), # add time main= "Residuals ge0.nlme", ylab = "residuals (norm.)", xlab = "fitted volumes (ml)", aspect=0.7, cex=0.7, # force smaller characters xlim=c(200,1200), # larger x limits ylim = c(-3,3) # symmetric )
The core line above simply reads plot(ge0.nlme), everything else is decoration. The most interesting part are the lines id and idLabels, where we ask that data points with that are probably outliers should be marked by subject, treatment and time.
print(p,split=c(1,2,1,2))
And we the graph into field (1,2) of a (1,2) grid. We could have added more=FALSE because nothing more will be plotted on the page, but this is the default.
#dev.off() # remove first # for png output
The boxplot shows non-normalized residuals, with deviations of measured from predicted are measured in ml. Half of the observed values are are within the boxes. Outliers are marked by circles.
For subject 11, albumin, there is a large spread of values, coming from the systematic hiatus between 15 and 25 minutes after start. This is not a misreading of data that could be removed, so we have to live with this effect.
Subject 10, lipid has two outliers in both directions, because the last data points are off. This could be a wrong reading of the last data point alone, but we cannot be sure about it because it is the last reading, so we have to keep the value.
The second plot shows normalized residuals against fitted values. Here we have marked the data points with values of normalized residuals beyond 2 in both directions as possible outliers, which could help us identify critical cases. In addition to the outliers, we note a slight increase of residuals spread at larger values; chapter 5.2 of PB shows how to handle this rather common case in biological data analysis.
In gastempt3 we will show how the same method can be used to fit the conventional power exponential function, and we will compare the solutions.