Gastric Emptying Analysis: gastempt1

Download the program gastempt1.r into RStudio.

# gastempt1.r
# Vanilla fit of gastric emptying curves
In R, commented lines start with a #
library(nlme)  
This loads package nlme, where core function like nlme and lme are defined. For reasons only R-historian can understand, the library() functions loads the package nlme, and not the library nlme. Details of this package are documented in Pinheiro PC, Bates DM (2000) Mixed-Effects Models in S and S-Plus. Springer, ISBN 0-387-98957-0, or PB for short.

Self-Starting Emptying Function

options(digits=3) # print 3 significant digits

# Self starting functions used to fit gastric emptying curves
EmptInit0= function(mCall,LHS,data)
{
  model=as.character(mCall[[1]])
  xy=sortedXyData(mCall[["t"]],LHS,data)
  if (nrow(xy) < 6) {
    .Exit("Too few distinct input values to fit tmodel")
  }
  # we use the first of the sorted data points for
  # the initial volume (v0) estimation
  v0=xy[1,2]
  xy$y=log(xy$y/v0)
  tempt=log(0.5)/coef(lm(y~x,data=xy))[2]
  kappa = 0.8 # a good starting value
  tempt=max(min(tempt/2,200),20) # empirical correction
  value = c(v0,kappa,tempt)
  names(value)=mCall[c("v0","kappa","tempt")]
  value
}

For non-linear function fits, good initial values are required. We define function EmptInit0 that uses a linear fit to the log-transformed data points for a good guess. This function later will be extended to handle other function models, and we will hide it in a separate file because it rarely need changes.

# Standard LinExp model
SSEmptLinExp=selfStart(~v0*(1+kappa*t/tempt)*exp(-t/tempt),
  initial=EmptInit0, parameters= c("v0","kappa","tempt"))

Here we define the model function LinExp for gastric emptying with accomodation we introduced in the publications. Conventionally, the function name is prepended by SS since this is a self-starting function. Three parameters vo, kappa and tempt are to be fitted. Time t is no fit parameter, it is defined by the data values (PB ch. 8.1.2).

# Get Data comma-separated (csv) file with header
ge = read.table("gastempt.csv",sep=",",header=TRUE)

The file gastempt.csv can be downloaded from here and should be unzipped into the current directory of R. The first lines of the file read as follows:

  subj,treat,t,v
  2,alb,3,416.53
  2,alb,6,418.97
  2,alb,9,442.77
  
When analyzing your own data, the header line should be exactly the same as given here. The first number in each row is the ID of the subject or patient; you can also use initials instead of numbers here. The second field is the treatment, it should be a short text, e.g. alb for albumin. If you have one treatment only, put in a dummy text instead; in the example on this page, we do not explicitely use the treatment, but we include it in the more advanced examples. The third number is the time in minutes after start, followed by the volume. The latter does not need to have units of ml, it may also represent counts or voxels. The sort order of the data is arbitrary. Click here if you want to know how to use field separators other than the comma, or retrieve data from a database.

# Make sure subject are not treated as numbers
ge$subj = as.factor(ge$subj)
This line is only required when your subject IDs are numbers; in this case read.table assumes that subjects are continuous variables, and R would try to fit some linear regression to the data in analysis. By explicitly converting to factors, we make sure that subj is treated as a categorical variable. If the IDs were something like "A01","A02", this line were not needed, because read.table reads in string-like values as factors by default.
ge$subjtreat = as.factor(paste(ge$subj,ge$treat,sep="."))
To compute separate fits for each unique subject/treatment combination, we create a new factor by appending subject name and treatment. While function nlsList itself could handle both factors separately in a formula, we are going to pass the result as an approximation to nlme, whose current implementation can handle only one combined factor.
gelist = nlsList(v~SSEmptLinExp(t,v0,kappa,tempt)|subjtreat,data=ge)
print(gelist)

See PB chapter 8.1.3. If we had only one curve, we would call nls, which determines values of v0, kappa and tempt of the nonlinear function SSEmptLinExp such that the sum-of-square deviation of the measured volumes v is minimized. Function nlsList collects the data with common parameter subjtreat from dataframe ge and fits individual curves to each subset, returning results in a list.

Let's look at the output, printed in green here.

nlsList Fit

>gelist = nlsList(v~SSEmptLinExp(t,v0,kappa,tempt)|subjtreat,data=ge)
Error in nls(formula = formula, data = data, control = control) : 
        step factor 0.000488281 reduced below 'minFactor' of 0.000976563
Error in nls(formula = formula, data = data, control = control) : 
        step factor 0.000488281 reduced below 'minFactor' of 0.000976563

Our data contained a total of 8 gastric emptying curves, and for two of these es we got error messages. Compared to nonlinear fit software such as Microcal Origin, Sigma Plot, GraphPad Prism or Systat, function nls is very outspoken about results it does not like. It simply reports an error when it cannot find a unique combination of kappa and tempt that minimizes the least square fit. When we tried to fit a data set that gave no result in nls with GraphPad Prism, we got a result of kappa=0.0 and tempt=311, which looked resonable, but looking at the respective standard errors of 55 and 17159 (seventeen-thousand!) told us that the rude refusal of nls probably was a polite hint after all. This is not specific to LinExp model; it will also occur with conventional power-exponential fitting, but has rarely gained attention in the context of gastric emptying.

> print(gelist)
Call:
  Model: v ~ SSEmptLinExp(t, v0, kappa, tempt) | subjtreat 
   Data: ge 

Coefficients:
        v0 kappa tempt
10.alb  NA    NA    NA
10.lip 547  2.06  43.7
11.alb  NA    NA    NA
11.lip 818  1.50  67.4
12.alb 433  2.50  45.1
12.lip 607  2.01  39.5
2.alb  431  1.11  97.8
2.lip  694  1.83  39.6

Degrees of freedom: 66 total; 48 residual
Residual standard error: 25.4
We can see that two fits with albumin did give NA (not available) as a result, while the other results look reasonable. We could also check the standard errors of the estimated coefficients, but this is of minor importance as the results only serve as start values for a more sophisticated approach. A few convergence failures are no problem, but if more than half of the coefficients are NA you should consider if the model is over-parametrized for your data set.
# compute simple nlme fit
# This is similar to the example in PB, page 368 (CO2)
contr = nlmeControl(pnlsTol=0.3)
ge0.nlme=nlme(gelist,control=contr)
# Show summary
ge0.nlme
# We will do a bit more work with the coeffs, so we store them 
ge0.coef = coef(ge0.nlme)

nlme Fit

Now we come to the core function. The first line of code is a bit of tweaking. When we call nlme with the result of nlsList passed as start values, we ask for a combined fit of all data sets, asssuming that all curves have some common statistics. This will tame some of the runaway NA results of nlsList as if they had been told: "If in doubt, try to behave like your mainstream peers". Or, somewhat more mathematical: "If several pairs of kappa and tempt give the same quality of fit, take that combination that is closer to the mean set of coefficents for all emptying curves". We will see what this means when we plot the results in the next example.

You cannot be sure that nlme converges for all data sets; sometimes it simply disappears into Nowhere-Land, telling you after 5 minutes that it ran out of patience. But if nlme reports result, you can be quite sure that all for all coefficients reasonable estimates have been found. Well, almost, but it worked for us with quite a few datasets.

> print(coef(ge0.nlme))
         v0 kappa tempt
10.alb  545 0.168 151.2
10.lip  557 1.973  44.6 
11.alb 1053 0.677 113.7 
11.lip  813 1.529  67.0
12.alb  489 1.874  51.4
12.lip  601 2.052  39.3
2.alb   428 1.170  93.8
2.lip   661 2.061  37.8

There is no "NA" entry in this table of coefficients, but the two albumin curves printed in bold that did not converge in nlsList above are still outliers (Details). Both have a kappa < 1, indicating that the gastric volume did not rise above the initial volume as generated by secretion or accomodation,

The task of parametrization of the curves is finished, but medical researchers prefer to quote the half-emtpying time t50. To compute t50 from kappa and tempt, you can use our Add-On for Excel, or a Newton approximation by the R function uniroot as shown below.

# Determine t(50%) from tempt and kappa.
# No closed solution known, we use newton  approximation
t50 = function(x){
  value= uniroot(function(t)
     SSEmptLinExp(t,1,x$kappa,x$tempt) - 0.5,
     low=10,up=2000) # we expect between 10 and 2000 minutes
  value$root
}

For a nice display, we will append the t50 to the primary estimated coefficients. We do it with a for-loop here to show that R can be used like a normal programming language; in the next example we will apply cryptic-short R code instead.

ge0.coef$t50=NA
for (i in 1:nrow(ge0.coef)) {
  ge0.coef$t50[i]=t50(ge0.coef[i,])
}
print(ge0.coef)
> print(ge0.coef)
         v0 kappa tempt   t50
10.alb  545 0.168 151.2 124.4
10.lip  557 1.973  44.6 109.7
11.alb 1053 0.677 113.7 152.1
11.lip  813 1.529  67.0 143.9
12.alb  489 1.874  51.4 123.2
12.lip  601 2.052  39.3  98.5
2.alb   428 1.170  93.8 172.9
2.lip   661 2.061  37.8  95.2

We feel better know about the two fits printed in bold; even if the values for tempt are on the large side, the medically relevant values for t50 are within a reasonable range.

In the next example, we will clean up the code and add graphics for presentation and statistical diagnostic.