Checklist data entry with Excel

Excel
basic
English
data
research methods
clinical studies
Author

Dieter Menne

Published

February 2, 2026

Although Excel is only suitable for statistical analysis of clinical studies to a limited extent, it is useful for data entry and management in medical research projects. Here are a few tips, from Must-haves to The best for the end.

Must-haves…

  • The Excel file has a unique name including the creation date or version number: SchafFreiburgZK_1_0.xls or SchafFreiburgZK_10_10_09.xlsx. However, never use a name with final.xls, otherwise… final.doc

  • All worksheets have an explicitly assigned name; i.e., not just Table1, but Stem Cells.Rename this table To rename: right-click on the tab at the bottom, then Rename.

  • Empty worksheets or test pages are deleted.

  • The data is stored in long form: each row is self-descriptive. A row contains the data for a patient, test subject, or laboratory animal; if measurements were taken at different times and in different locations, the time and location are added to the patient number for a complete description. To test whether all rows are self-descriptive, check whether the meaning remains the same when the rows or columns are rearranged.

  • The initial day of a study is not day 1, but day 0.

  • The first row contains only the short names of the fields (field names), without any further decoration.

  • Field names, i.e. column headers, consist of no more than 8-12 letters, numbers, or underscores. Exceptions are possible, but only for less important fields. Keep in mind that the names are used in table headers or image descriptions, where space should not be wasted, but identification should still be possible.

  • Field names begin with a letter and contain no special characters, spaces, periods, or hyphens; the only useful special character is the underscore (_, bottom right on the keyboard). So not: Field 1, but Field1 or field_1. Umlauts or non-latin characters could be used, but a crossword puzzle spelling such as groesse instead of größecan avoid trouble.

  • Field names follow a uniform scheme, including the use of capital letters and language. Possible examples are (PatId, Age, Size, BioOss0), but not a mixture (pat, age, Size, biooss_start). English field names are preferred for scientific publications. You can only use lowercase letters and separate parts of words with an underscore, for example bloodpressure_1; but please use this scheme consistently. Consistent naming is very important, so make sure you take the time to come up with a good naming scheme.

  • For the id of the patient, use PatId or PatNr or pat_id; if only voluntary subjects are involved, use SubjID or Subj or subject; for animals, use sheep_id or rat_id.

  • Descriptions of the fields, unless they are self-explanatory, are inserted as comments in the header (right mouse button/comment).

  • If several header columns are to be clearly summarized visually (e.g., laboratory values, histology), a background color can be used, but not an additional line of text summarizing several columns.

  • The top row of the data block is reserved for field names, followed by data and nothing else. A column contains only numbers or only text, not a mixture. If data needs to be commented on (“measuring device was unreliable here”), this should be entered as a comment; see the example not readable in CRF above

  • For laboratory values, the unit of measurement is inserted in the header as a comment. If different units are used (Glucose mg/dl, mmol/l), add a separate column with the unit, do not add the unit into the same string as the number.

  • Colors can also be used to highlight rows, for example, if you want to mark certain rows for review.

  • Most often, the first column contains the patient number, followed by columns with details on the location, time, or randomization group (descriptive values). The following columns could therefore be called visit and treatment.

  • Measured values (glucose, blood pressure) are entered in the columns to the right of the descriptive values (gender, year of birth).

  • Individual columns with intermediate sizes, which Excel sometimes requires, can be hidden. Select the column by double-clicking on the header (e.g., A), right-click, and select Hide.

  • All blank rows or blank columns in the data block must be removed. Fields with missing data are left blank, not marked with NA. Even if you learned something different in your statistics course: Do not use substitute values, such as 999 or -9, when a number is missing; instead, leave the field blank. If fields are only occasionally empty, you can add a comment to the field as a reminder, such as “Measuring device defective” or “Recording unreadable.”

  • Today’s programs can handle empty fields well, and even Excel sometimes calculates correctly when a field is empty, for example, with averages. Surprisingly, however, Excel sometimes sees empty fields as zeros (#excelbug). Check this by entering a zero (0) in an empty field; if the value calculated from the field does not change, then Excel has assumed (… or rather decided) that an empty field means zero. In that case, you will have to come up with something else; there is no easy solution.

Wild decimal places in Excel
  • Each column has a uniform number of decimal places, as few as possible. A quick overview is more important than apparent accuracy; more than three digits are rarely relevant in medicine. Hiding superfluous decimal places (Format/Cells) is particularly important for calculated columns. You can and should hide decimal places as you wish, because this does not change the number internally, and statistical programs always read 3.14159, even if only 3.1 is visible. There are no excuses that cleaning up decimal places is “a lot of work”: select the range, then click this button at the top until it fits: Decimal places
  • Columns that contain counts (= number of squats) are never displayed with decimal places; use 23, not 23.00.
  • You can code patients with numbers or codes (AWS02 or P03). Use enough leading zeros so that all patients have the same number of characters, i.e., P03, not P3, if you have more than 9 patients. Reason: if you forget the zero, the sorting will be incorrect: P19, P2, P20.
  • Numeric columns are always displayed right-aligned, text columns always left-aligned; Excel does this automatically in the default setting. Centered text is only allowed in the header. If a column with right-aligned numbers suddenly and inexplicably has left-aligned text, then you have accidentally entered text, such as the letter ‘o’ instead of a zero, or you have used the wrong decimal separator, i.e., a period or comma, depending on the settings on your computer.
  • Treatment groups or patient groups are not coded numerically (1, 2), but with a text field that is not too long, such as pla for placebo, antib for antibiotics. This is different from what some old statistics books recommend.
  • Code gender with m and f, or m and w, not with 0 and 1. Only use numbers for measurements or count values. In pharma industry, which adheres to ancient SOPs, coding with numbers is still common, and the professor may have worked there at some point. However, if you work for big pharma, you do not use Excel for data entry anyway.
  • So use short text labels when the variables are not ordered, such as for gender or treatment. If the variables are ordered (smoke not, 20-39 per day, 40 per day), you can use numerical coding (0, 1, 2). If one of the values stands for “none” or “never,” assign it a zero; if “none” does not exist, as in age groups, start with 1. To make the labels of tables and graphs easier to understand, you can also use a mixed designation, such as: 0_no, 1_exposure_5_mm, 2_exposure_10mm, 3_osteolysis; this labeling enforces the sort order.
  • Likert scales are an important special case of ordered variables. I recommend the following coding: strongly agree (2), agree (1), neutral (0), disagree (-1), strongly disagree (-2), because then, when calculating the mean, agreement is expressed with positive numbers and disagreement with negative numbers. Wikipedia (Likert scales) has a slightly different coding system that is also useful.

Example of study data in ExcelAlignment

Example of data entry. The data is formatted as a Table (note the uppercase), and the drop-down buttons appear automatically. The column headers are arranged vertically so that they take up less width and are aligned so that the text does not collide with the drop-down buttons — see image excerpt on the right. All column headers have a comment, indicated by the red check mark in the upper right corner. Unnecessary columns on the right and bottom are hidden. Gender, treatment group, and diabetes are coded with short names, not number codes. To visually separate the blood pressure values (sys, dia) for the two visits, they have been highlighted in color; you can also highlight only the header in color, which is sometimes clearer. Use unsaturated colors for color highlighting, not bright red, which looks like last millennium.


Makes life easier…

  • All columns are formatted to the smallest possible width by double-clicking on the separator in the header. If there are many columns, select the entire area (Ctrl-A) and then adjust automatically with Format/Column Width.
  • You can get a better overview of the data by arranging the field names vertically in the first column and using a smaller bold font:

Vertical font
  • The top row and the column with patient numbers are fixed so that they always remain visible. Go to field B2, and select View/Freeze Panes.
  • Hiding unnecessary rows and columns at the bottom and on the right-hand side improves the overview. This allows you to scroll faster and avoids ending up in no man’s land. To do this, select the first empty column on the right by clicking on the header; hold down the Shift key (do not release), End, right arrow. This selects the area. Right-click, Hide. Do the same at the bottom: hold down the Shift key, End, down arrow.
  • Try End, right arrow with and without the Shift key pressed if you were previously in the data area. And also End, down arrow. No more fiddling around with the mouse—it’s faster with the keyboard!

Filter Filter

  • With the Filter function, you can quickly identify the largest and smallest values in a column and thus check whether there are any outliers due to typing errors in a column. The most common input errors are those in which the decimal point and the decimal comma are mixed up, depending on the system settings. Excel will accept a mixture of 1.3 and 1.5 (did you notice?) in a column without complaint, but you will get error messages when calculating. At least if you’re lucky; if you’re unlucky, the result will simply be wrong and no one will notice.
  • If you want to avoid making such errors when entering data, you can add data validation to fields; details can be found under this keyword in Excel Help or on the Internet, for example here.

Furthermore, the Chef recommends…

As table
  • You can get an even better overview than with the filter by converting the area into a Table (I will use the uppercase term here). Select the entire data area and then choose Format as Table. It doesn’t matter which color scheme you use, as the color is only decorative. You can now change the order of the rows as you wish by sorting them and then unsort them again without having to worry about individual columns getting mixed up. You also don’t need to copy formulas within a column: simply insert the formula at the top once and it will be copied automatically. * If you are using a Table, you can also make your references into column names. Example: You want to calculate BMI from weight in kg and height in cm, and you have the columns weight (column D) and height (column E). This is how BMI is usually calculated in a formula if the range is not a Table:

Classic formula
  • And this is how it works if you have converted the range into a table. Especially if the fields are not close together, this variant makes it easier to understand what is being done and to find errors. When you write the formula and enter the opening bracket [, a list of suggestions appears from which you can choose.

Formula with column names

Statistics with Excel

The statistical functions in Excel are poorly designed, sometimes give incorrect results, and have not been improved for decades; the terminology used differs from the usual Microsoft terminology. Apple users are also at a disadvantage, as the statistical functions are only available in Excel 2016 and later; Apple’s quality control seems to have thwarted this development. Unfortunately, it is not easy to find anything better; commercial software, such as XLSTAT, is designed more for the business IT market.

In the Windows version, you will find the statistical functions on the Data page, on the far right in the Data Analysis block. If this block is not visible, go to File/Options and activate the analysis functions. You don’t normally need the analysis functions - VBA. Analysis functions are not active Analysis functions are activated

Excel’s worst bug

I recommend keeping all data for a study on the first worksheet in Excel so that you can keep track of everything; let’s call this sheet Data. Unfortunately, the data desert often spreads across quite a few columns, and you want to do the analysis on subpages where only a few important columns are visible that you need for this partial analysis.

Of course, you can use copy/paste to extract the crucial columns and transfer them to a new page. But what happens when you add more data or correct errors on the data page? Do you really remember to transfer all changes to the evaluation as well? I don’t…

There is a solution… but only almost, because Microsoft messed up in the first version of Excel and can’t change it now. Let’s say you have the following on the data page:

Data page

The value in field B3 is missing, so the field is empty. You only want to see these values on the Statistics page, but in such a way that changes to the data page are reflected here. This is easy to do: Write

=Data!A1

in field A1 on the statistics page; Excel will then retrieve the result live from the Data page. Copy the formula to the range A1..B4

Statistics page

Perfect? Almost. Field B3 contains a 0 (zero!) instead of a blank field. Unfortunately, in medicine, empty fields often occur that definitely do not mean zero. There is a not very satisfactory alternative: use the following formula for field A1:

=IF(Data!A1 = “”, “”,Data!A1)

In words: If field A1 on the Data page is empty, fill in the empty string “”; otherwise, take the value from A1. Depending on your computer settings, you may need to use a ; instead of a comma , in the above formula, but you will be notified of this when entering it. Copy this formula into all target fields. The solution is not ideal because, for Excel, the empty string “” is different from an empty field, even if you cannot see it. So far, however, it has not gone wrong; or maybe it has, and I haven’t noticed? * If you want to avoid making such errors when entering data, you can add data validation to fields; details can be found under this keyword in Excel Help or on the Internet, for example here.

Saving the best for the end: Pivot tables

After Excel’s biggest sins, let’s move on to Excel’s best feature, which is unfortunately used far too rarely: Pivot tables and pivot charts can be extremely helpful when creating tables and cross tables. You’ll need a few hours to try out the possibilities using simple examples, but the investment will definitely pay off. You can find documentation on the help pages and on the Internet. Unfortunately, you can’t use them to calculate statistical tests, but if you can easily fill a few pages in your work, that’s nice too.

Here’s an example: Different dosages of a medication and how often its effectiveness was rated as Very good, .. Insufficient. ____ PivotTableA pivot table; if you want to include this in your work, you should make the column and row labels a little more reviewer-friendly, i.e., omit “Number of ID row labels” and “Column label.”

Try creating this table the conventional way!