Checklist data entry with Excel
Although Excel is only suitable for statistical analysis of clinical studies to a limited extent, it is useful for data entry and management in medical research projects. Here are a few tips, from Must-haves to The best for the end.
Mustard-haves…
Unique Excel file name
The Excel file has a unique name including the creation date or version number: SheepFreiburgZK_1_0.xls or SheepFreiburgZK_10_10_09.xlsx. However, never use a name with final.xls, otherwise… 
Worksheet names
All worksheets have an explicitly assigned name; i.e., not just Table1, but Stem Cells.
To rename: right-click on the tab at the bottom, then Rename.
Remove empty worksheets
*Empty worksheets** or test pages are deleted.
Each row must be self-descriptive
The data is stored in wide form: each row is self-descriptive. A row contains the data for a patient, test subject, or laboratory animal; if measurements were taken at different times and in different locations, the time and location are added to the patient number for a complete description. To test whether all rows are self-descriptive, check whether the meaning remains the same when the rows or columns are rearranged.
Field names - column headers
Same data type in each column
The top row of the data block is reserved for field names, followed by data and nothing else. A column contains only numbers or only text, not a mixture. If data needs to be commented on (“measuring device was unreliable here”), this should be entered as a comment; see the example not readable in CRF in the image below.
Row 1 is only headings
The first row contains only the short names of the fields (field names, or column headers), without any further decoration.
Short names
Field names, i.e. column headers, consist of no more than 8-15 letters, numbers, or underscores. Exceptions are possible, but only for less important fields. Keep in mind that the names are used in table headers or image descriptions, where space should not be wasted, but identification should still be possible.
No special characters in field names
Field names begin with a lowercase letter and contain no special characters, spaces, periods, or hyphens; the only useful special character is the underscore (_, bottom right on the keyboard). So not: Field 1, but field_1. Umlauts or non-latin characters should not be used.
No spaces in field names!
Do not uses space in field names/column headings. It is not I take antiretroviral therapy daily, but rather art_daily.
Uniform naming scheme
Field names follow a uniform scheme of only lower case numbers and language. Possible examples are (pat_id, age, weight_0) English field names are preferred for scientific publications. Use lowercase letters and separate parts of words with an underscore, for example bloodpressure_1; but please use this scheme consistently. Consistent naming is very important, so make sure you take the time to come up with a good naming scheme.
Initial day of study is day 0
The initial day of a study is not day 1, but day 0; so use weight_0, not weight_1 for the weight at study start.
All uppercase? If you really must…
Supervisors who had worked in big-pharma before often use all-uppercase names, such as PATID, AGE, WEIGHT0, because long time ago statistical systems only understood these. If the boss wants it, you must comply, but it looks terribly 20th century.
When uppercase letter may be Ok
Sometimes, well-known initializms (spoken letter by letter) like BMI (B-M-I), AST or ALT are required. These are the only exceptions that might be used for uppercase in field names, in case you prefer that. Personally, I also use lowercase in header names (bmi, ast, alt) here.
Patient ID or animal ID in first column
Most often, the first column contains the patient number, followed by columns with details on the location, time, or randomization group (descriptive values). The following columns could therefore be called visit and treatment.
How to code the patient id
For the id of the patient, use PatId or PatNr or pat_id; the last is preferred. If only voluntary subjects are involved, use SubjID or Subj or subject_id; for animals, use sheep_id or rat_id.
You can code patients with codes (aws_02 or p_03). Using integers for patients is sometimes required by design, Use enough leading zeros so that all patients have the same number of characters, i.e., p03, not p3, if you have more than 9 patients. Reason: if you forget the zero, the sorting will be incorrect: p19, p2, p20.
Column sequence
Target values (glucose, blood pressure) are entered in the columns to the right of the independent values (gender, year of birth).
Visual guides and documentation
Descriptions of the fields, unless they are self-explanatory, are inserted as comments in the header (right mouse button/comment).
Use of color to structure the data block
If several header columns are to be clearly summarized visually (e.g., laboratory values, histology), a background color can be used. Do not use an additional line of text spanning several colors.
Colors can also be used to highlight rows, for example, if you want to mark certain rows for review.
Include units in comments
For laboratory values, the unit of measurement is inserted in the header as a comment. If different units are used (Glucose mg/dl, mmol/l), add a separate column with the unit, do not add the unit into the same string as the number.
Hide intermediate values
Individual columns with intermediate content, which Excel sometimes requires, can be hidden. Select the column by double-clicking on the header (e.g., A), right-click, and select Hide.
Remove blank rows
All blank rows or blank columns in the data block must be removed.
Missing data
Fields with missing data are left blank, not marked with NA. Even if you learned something different in your statistics course: Do not use substitute values, such as 999 or -9, when a number is missing; instead, leave the field blank. If fields are only occasionally empty, you can add a comment to the field as a reminder, such as “Measuring device defective” or “Recording unreadable.”
Today’s programs can handle empty fields well, and even Excel sometimes calculates correctly when a field is empty, for example, with averages. Surprisingly, however, Excel sometimes sees empty fields as zeros (Section 3)
Formatting of numbers

As few as possible decimals
Each column has a fixed number of decimals, as few as possible. A quick overview is more important than apparent accuracy; more than three digits are rarely relevant in medicine. Hiding superfluous decimal places (Format/Cells) is particularly important for calculated columns. You can and should hide decimal places as you wish, because this does not change the number internally, and statistical programs always read 3.14159, even if only 3.1 is visible. There are no excuses that cleaning up decimal places is “a lot of work”: select the range, then click this button at the top until it fits: 
Example: BMI is computed as weight/(height * height) *10000, when height is given in centimeters. A typical computed number is 27.6816609; more than one decimal is just nonsense, so truncate this to 27.6.
No decimals for counts
Columns that contain counts (= number of squats) are never displayed with decimal places; use 23, not 23.00.
Numeric columns right aligned
Numeric columns are always displayed right-aligned, text columns always left-aligned; Excel does this automatically in the default setting. Centered text is only allowed in the header. If a column with right-aligned numbers suddenly and inexplicably has left-aligned text, then you have accidentally entered text, such as the letter ‘o’ instead of a zero, or you have used the wrong decimal separator, i.e., a period or comma, depending on the settings on your computer.
Use string for treatment groups, not numbers
Treatment groups or patient groups are not coded numerically (1, 2), but with a text field that is not too long, such as pla for placebo, antib for antibiotics. This is different from what some old statistics books recommend.
Code gender with a string, not 0/1
Code gender with m f, d, not with 0 and 1. Numbers should only be used for measurements or count values. In pharma industry, which adheres to ancient SOPs, coding with numbers is still common, and the professor may have worked there at some point. However, if you work for big pharma, you do not use Excel for data entry anyway.
Ordered and unordered categorical
Use short text labels when the variables are not ordered, such as for gender or treatment. If the variables are ordered (smoke not, 20-39 per day, 40 per day), you can use numerical coding (0, 1, 2). If one of the values stands for “none” or “never,” assign it a zero; if “none” does not exist, as in age groups, start with 1. To make the labels of tables and graphs easier to understand, you can also use a mixed designation, such as: 0_no, 1_exposure_5_mm, 2_exposure_10mm, 3_osteolysis; this labeling enforces the sort order.
Likert scales
Likert scales are an important special case of ordered variables. I recommend the following coding: strongly agree (2), agree (1), neutral (0), disagree (-1), strongly disagree (-2), because then, when calculating the mean, agreement is expressed with positive numbers and disagreement with negative numbers. Wikipedia (Likert scales) has a slightly different coding system that is also useful.

{Example of data entry. The data is formatted as a Table (note the uppercase), and the drop-down buttons appear automatically. The column headers are arranged vertically so that they take up less width and are aligned so that the text does not collide with the drop-down buttons — see image excerpt on the right. All column headers have a comment, indicated by the red check mark in the upper right corner. Unnecessary columns on the right and bottom are hidden. Gender, treatment group, and diabetes are coded with short names, not number codes. To visually separate the blood pressure values (sys, dia) for the two visits, they have been highlighted in color; you can also highlight only the header in color, which is sometimes clearer. Use unsaturated colors for color highlighting, not bright red, which looks like last millennium.}
Makes life easier…
Shrink-fit columns
All columns width widths are shrunken double-clicking on the separator in the header.
If there are many columns, select the entire area (Ctrl-A) and then adjust automatically with Format/Column Width (not for Google Sheets).
Rotate text in header vertically
Shrink-fit works even better when you make the header very narrow by arranging the field names vertically in the first column and using a smaller bold font:

Fix the top row {#sec-fix-toprow
The top row and the column with patient numbers are fixed so that they always remain visible. Go to field B2, and select View/Freeze Panes.
Hide unused columns and rows
Hiding unnecessary rows and columns at the bottom and on the right-hand side improves the overview. This allows you to scroll faster and avoids ending up in no man’s land. To do this, select the first empty column on the right by clicking on the header; hold down the Shift key (do not release), End, right arrow. This selects the area. Right-click, Hide. Do the same at the bottom: hold down the Shift key, End, down arrow.
Use filter to check for invalid data
With the Filter function, you can quickly identify the largest and smallest values in a column and thus check whether there are any outliers due to typing errors in a column. The most common input errors are those in which the decimal point and the decimal comma are mixed up, depending on the system settings. Excel will accept a mixture of 1.3 and 1.5 (did you notice?) in a column without complaint, but you will get error messages when calculating. At least if you’re lucky; if you’re unlucky, the result will simply be wrong and no one will notice.
Add data validation
If you want to avoid making such errors when entering data, you can add data validation to fields; details can be found under this keyword in Excel Help or on the Internet, for example here.
Furthermore, the Chef recommends…

Use Tables (note uppercase)
You can get an even better overview than with the filter by converting the area into a Table (I will use the uppercase term here). Select the entire data area and then choose Format as Table. It doesn’t matter which color scheme you use, as the color is only decorative. You can now change the order of the rows as you wish by sorting them and then unsort them again without having to worry about individual columns getting mixed up. You also don’t need to copy formulas within a column: simply insert the formula at the top once and it will be copied automatically. * If you are using a Table, you can also make your references into column names.
Example: You want to calculate BMI from weight in kg and height in cm, and you have the columns weight (column D) and height (column E). This is how BMI is usually calculated in a formula if the range is not a Table:

And this is how it works if you have converted the range into a table. Especially if the fields are not close together, this variant makes it easier to understand what is being done and to find errors. When you write the formula and enter the opening bracket [, a list of suggestions appears from which you can choose.

Statistics with Excel
The statistical functions in Excel are poorly designed, sometimes give incorrect results, and have not been improved for decades; the terminology used differs from the usual Microsoft terminology. Apple users are also at a disadvantage, as the statistical functions are only available in Excel 2016 and later; Apple’s quality control seems to have thwarted this development. Unfortunately, it is not easy to find anything better; commercial software, such as XLSTAT, is designed more for the business IT market.
In the Windows version, you will find the statistical functions on the Data page, on the far right in the Data Analysis block. If this block is not visible, go to File/Options and activate the analysis functions. You don’t normally need the analysis functions - VBA.

Excel’s worst bug
I recommend keeping all data for a study on the first worksheet in Excel so that you can keep track of everything; let’s call this sheet Data. Unfortunately, the data desert often spreads across quite a few columns, and you want to do the analysis on subpages where only a few important columns are visible that you need for this partial analysis.
Of course, you can use copy/paste to extract the crucial columns and transfer them to a new page. But what happens when you add more data or correct errors on the data page? Do you really remember to transfer all changes to the evaluation as well? I don’t…
There is a solution… but only almost, because Microsoft messed up in the first version of Excel and can’t change it now. Let’s say you have the following on the data page:

The value in field B3 is missing, so the field is empty. You only want to see these values on the Statistics page, but in such a way that changes to the data page are reflected here. This is easy to do: Write
=Data!A1
in field A1 on the statistics page; Excel will then retrieve the result live from the Data page. Copy the formula to the range A1..B4

Perfect? Almost. Field B3 contains a 0 (zero!) instead of a blank field. Unfortunately, in medicine, empty fields often occur that definitely do not mean zero. There is a not very satisfactory alternative: use the following formula for field A1:
=IF(Data!A1 = "", "",Data!A1)
In words: If field A1 on the Data page is empty, fill in the empty string ""; otherwise, take the value from A1. Depending on your computer settings, you may need to use a ; instead of a comma , in the above formula, but you will be notified of this when entering it. Copy this formula into all target fields.
Saving the best for the end: Pivot tables
After Excel’s biggest sins, let’s move on to Excel’s best feature, which is unfortunately used far too rarely: Pivot tables and pivot charts can be extremely helpful when creating tables and cross tables. You’ll need a few hours to try out the possibilities using simple examples, but the investment will definitely pay off. You can find documentation on the help pages and on the Internet. Unfortunately, you can’t use them to calculate statistical tests, but if you can easily fill a few pages in your work, that’s nice too.
Here’s an example: Different dosages of a medication and how often its effectiveness was rated as Very good, .. Insufficient.

A pivot table; if you want to include this in your work, you should make the column and row labels a little more reviewer-friendly, i.e., omit Number of ID row labels and Column label`.
Pivot tables also work in Google Sheets, and even better: You can use the median function, which is missing in Excel.
Try creating this table the conventional way!

