Download the program gastempt1.r into RStudio.
# gastempt1.r # Vanilla fit of gastric emptying curves
library(nlme)This loads package nlme, where core function like nlme and lme are defined. For reasons only R-historian can understand, the library() functions loads the package nlme, and not the library nlme. Details of this package are documented in Pinheiro PC, Bates DM (2000) Mixed-Effects Models in S and S-Plus. Springer, ISBN 0-387-98957-0, or PB for short.
Self-Starting Emptying Function
options(digits=3) # print 3 significant digits # Self starting functions used to fit gastric emptying curves EmptInit0= function(mCall,LHS,data) { model=as.character(mCall[[1]]) xy=sortedXyData(mCall[["t"]],LHS,data) if (nrow(xy) < 6) { .Exit("Too few distinct input values to fit tmodel") } # we use the first of the sorted data points for # the initial volume (v0) estimation v0=xy[1,2] xy$y=log(xy$y/v0) tempt=log(0.5)/coef(lm(y~x,data=xy))[2] kappa = 0.8 # a good starting value tempt=max(min(tempt/2,200),20) # empirical correction value = c(v0,kappa,tempt) names(value)=mCall[c("v0","kappa","tempt")] value }
For non-linear function fits, good initial values are required. We define function EmptInit0 that uses a linear fit to the log-transformed data points for a good guess. This function later will be extended to handle other function models, and we will hide it in a separate file because it rarely need changes.
# Standard LinExp model SSEmptLinExp=selfStart(~v0*(1+kappa*t/tempt)*exp(-t/tempt), initial=EmptInit0, parameters= c("v0","kappa","tempt"))
Here we define the model function LinExp for gastric emptying with accomodation we introduced in the publications. Conventionally, the function name is prepended by SS since this is a self-starting function. Three parameters vo, kappa and tempt are to be fitted. Time t is no fit parameter, it is defined by the data values (PB ch. 8.1.2).
# Get Data comma-separated (csv) file with header ge = read.table("gastempt.csv",sep=",",header=TRUE)
The file gastempt.csv can be downloaded from here and should be unzipped into the current directory of R. The first lines of the file read as follows:
subj,treat,t,v
2,alb,3,416.53
2,alb,6,418.97
2,alb,9,442.77
When analyzing your own data, the header line should be exactly the same as given here. The
first number in each row is the ID of the subject or patient; you can also use initials instead of numbers here. The second field is the treatment, it should be a short text, e.g. alb for albumin. If you have one treatment only, put in a dummy text instead; in the example on this page, we do not explicitely use the treatment, but we include it in the more advanced examples. The third number is the time in minutes after start, followed by the volume. The latter does not need to have units of ml, it may also represent counts or voxels. The sort order of the data is arbitrary. Click here if you want to know how to use field separators other than the comma, or retrieve data from a database.
# Make sure subject are not treated as numbers ge$subj = as.factor(ge$subj)This line is only required when your subject IDs are numbers; in this case read.table assumes that subjects are continuous variables, and R would try to fit some linear regression to the data in analysis. By explicitly converting to factors, we make sure that subj is treated as a categorical variable. If the IDs were something like "A01","A02", this line were not needed, because read.table reads in string-like values as factors by default.
ge$subjtreat = as.factor(paste(ge$subj,ge$treat,sep="."))
gelist = nlsList(v~SSEmptLinExp(t,v0,kappa,tempt)|subjtreat,data=ge) print(gelist)
See PB chapter 8.1.3. If we had only one curve, we would call nls, which determines values of v0, kappa and tempt of the nonlinear function SSEmptLinExp such that the sum-of-square deviation of the measured volumes v is minimized. Function nlsList collects the data with common parameter subjtreat from dataframe ge and fits individual curves to each subset, returning results in a list.
Let's look at the output, printed in green here.
nlsList Fit
>gelist = nlsList(v~SSEmptLinExp(t,v0,kappa,tempt)|subjtreat,data=ge) Error in nls(formula = formula, data = data, control = control) : step factor 0.000488281 reduced below 'minFactor' of 0.000976563 Error in nls(formula = formula, data = data, control = control) : step factor 0.000488281 reduced below 'minFactor' of 0.000976563
Our data contained a total of 8 gastric emptying curves, and for two of these es we got error messages. Compared to nonlinear fit software such as Microcal Origin, Sigma Plot, GraphPad Prism or Systat, function nls is very outspoken about results it does not like. It simply reports an error when it cannot find a unique combination of kappa and tempt that minimizes the least square fit. When we tried to fit a data set that gave no result in nls with GraphPad Prism, we got a result of kappa=0.0 and tempt=311, which looked resonable, but looking at the respective standard errors of 55 and 17159 (seventeen-thousand!) told us that the rude refusal of nls probably was a polite hint after all. This is not specific to LinExp model; it will also occur with conventional power-exponential fitting, but has rarely gained attention in the context of gastric emptying.
> print(gelist) Call: Model: v ~ SSEmptLinExp(t, v0, kappa, tempt) | subjtreat Data: ge Coefficients: v0 kappa tempt 10.alb NA NA NA 10.lip 547 2.06 43.7 11.alb NA NA NA 11.lip 818 1.50 67.4 12.alb 433 2.50 45.1 12.lip 607 2.01 39.5 2.alb 431 1.11 97.8 2.lip 694 1.83 39.6 Degrees of freedom: 66 total; 48 residual Residual standard error: 25.4
# compute simple nlme fit # This is similar to the example in PB, page 368 (CO2) contr = nlmeControl(pnlsTol=0.3) ge0.nlme=nlme(gelist,control=contr) # Show summary ge0.nlme # We will do a bit more work with the coeffs, so we store them ge0.coef = coef(ge0.nlme)
nlme Fit
Now we come to the core function. The first line of code is a bit of tweaking. When we call nlme with the result of nlsList passed as start values, we ask for a combined fit of all data sets, asssuming that all curves have some common statistics. This will tame some of the runaway NA results of nlsList as if they had been told: "If in doubt, try to behave like your mainstream peers". Or, somewhat more mathematical: "If several pairs of kappa and tempt give the same quality of fit, take that combination that is closer to the mean set of coefficents for all emptying curves". We will see what this means when we plot the results in the next example.
You cannot be sure that nlme converges for all data sets; sometimes it simply disappears into Nowhere-Land, telling you after 5 minutes that it ran out of patience. But if nlme reports result, you can be quite sure that all for all coefficients reasonable estimates have been found. Well, almost, but it worked for us with quite a few datasets.
> print(coef(ge0.nlme)) v0 kappa tempt 10.alb 545 0.168 151.2 10.lip 557 1.973 44.6 11.alb 1053 0.677 113.7 11.lip 813 1.529 67.0 12.alb 489 1.874 51.4 12.lip 601 2.052 39.3 2.alb 428 1.170 93.8 2.lip 661 2.061 37.8
There is no "NA" entry in this table of coefficients, but the two albumin curves printed in bold that did not converge in nlsList above are still outliers (Details). Both have a kappa < 1, indicating that the gastric volume did not rise above the initial volume as generated by secretion or accomodation,
The task of parametrization of the curves is finished, but medical researchers prefer to quote the half-emtpying time t50. To compute t50 from kappa and tempt, you can use our Add-On for Excel, or a Newton approximation by the R function uniroot as shown below.
# Determine t(50%) from tempt and kappa. # No closed solution known, we use newton approximation t50 = function(x){ value= uniroot(function(t) SSEmptLinExp(t,1,x$kappa,x$tempt) - 0.5, low=10,up=2000) # we expect between 10 and 2000 minutes value$root }
For a nice display, we will append the t50 to the primary estimated coefficients. We do it with a for-loop here to show that R can be used like a normal programming language; in the next example we will apply cryptic-short R code instead.
ge0.coef$t50=NA for (i in 1:nrow(ge0.coef)) { ge0.coef$t50[i]=t50(ge0.coef[i,]) } print(ge0.coef)
> print(ge0.coef) v0 kappa tempt t50 10.alb 545 0.168 151.2 124.4 10.lip 557 1.973 44.6 109.7 11.alb 1053 0.677 113.7 152.1 11.lip 813 1.529 67.0 143.9 12.alb 489 1.874 51.4 123.2 12.lip 601 2.052 39.3 98.5 2.alb 428 1.170 93.8 172.9 2.lip 661 2.061 37.8 95.2
We feel better know about the two fits printed in bold; even if the values for tempt are on the large side, the medically relevant values for t50 are within a reasonable range.
In the next example, we will clean up the code and add graphics for presentation and statistical diagnostic.