BMR 617: Statistical Techniques for the Biomedical Sciences

Types of Variable

We make an important distinction between categorical and quantitative variables. Generally speaking, a categorical variable is one that takes on one of a fixed set of values, whereas a quantitative variable is one that results from a measurement. We further classify categorical variables as nominal or ordinal, and quantitative variables as interval or ratio.

Types of Data

Remember the NOIR mnemonic:
Nominal
Categorical, no ordering. Examples: Gender, Race, Genotype.
Ordinal
Categorical with an order. Examples: socio-economic status, pain scale.
Interval
Numerical data on a scale (i.e. it has units) but with no meaningful zero. Examples: temperature in Celsius or Fahrenheit, time of day, calendar date.
Ratio
Numerical data with scale and zero. Most measurement data is in this category.

Determining the type of the variable

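To determine the type of variable we are using, we can ask the following questions:

  • Do the values have an ordering?
  • Do the values have a scale (units)?
  • Is there a meaningful zero?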

Nominal Variables

Nominal variables are those whose values have no ordering; they are just qualitative categories. Examples: gender, race, genotype.

Ordinal Variables

Ordinal variables are variables with qualitative categories which have an ordering, but no scale.

Example: Economic status.

Interval Variables

Interval variables are variables with ordering and scale, but with no meaningful zero.

Examples: temperature in Celsius or Fahrenheit, time of day, calendar date.

Operations on Interval Variables

Computing differences of values of interval variables makes sense.

For example, computing a change in temperature (difference between two temperatures) makes sense, because a change of one unit (one degree) always means the same "amount of temperature".

Computing ratios of values of interval variables does not make sense, because there is no meaningful zero.
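As a small illustration of the point above, the following R sketch compares the same two temperatures on the Celsius and Fahrenheit scales. The values 10 and 20 are chosen only for illustration, and the variable names are not part of the course material. The difference describes the same physical change on both scales, but the ratio depends on where each scale happens to put its zero.

```r
# Two temperatures in degrees Celsius (illustrative values only)
temp_c <- c(10, 20)

# The same two temperatures converted to degrees Fahrenheit
temp_f <- temp_c * 9 / 5 + 32

# Differences: 10 Celsius degrees and 18 Fahrenheit degrees are the same change,
# related by the fixed conversion factor 9/5
diff_c <- temp_c[[2]] - temp_c[[1]]
diff_f <- temp_f[[2]] - temp_f[[1]]

# Ratios: the result changes with the scale, so "twice as hot" has no meaning
ratio_c <- temp_c[[2]] / temp_c[[1]]   # 20 / 10
ratio_f <- temp_f[[2]] / temp_f[[1]]   # 68 / 50

print(c(diff_c = diff_c, diff_f = diff_f))
print(c(ratio_c = ratio_c, ratio_f = ratio_f))
```

The two ratios disagree (2 versus 1.36) even though the temperatures are physically identical, which is why ratios of interval variables are not meaningful.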

Ratio Variables

Ratio variables have both scale and a meaningful zero. Most measurements you work with will be ratio variables.

Examples: Length, mass, count data (e.g. number of cells), etc.

It makes sense to compute differences and ratios of ratio variables.

Note that the difference of values of an interval variable is always a ratio variable.

Examples

For each variable, decide whether its type is nominal, ordinal, interval or ratio:

  • Tumor grade
  • Heart rate
  • Color
  • Weight (mass)
  • Disease status
  • Pain scale
  • Age
  • Genotype
  • CT values from RT-PCR

Categorical and Quantitative Variables

Nominal and Ordinal variables are categorical. They take on specific values only.

Interval and ratio variables are quantitative. They measure some value.

When we come to visualize and analyze data, the distinction between categorical and quantitative variables is the most important one. Knowing which variables are categorical and which are quantitative goes a long way to determining the correct presentation and analysis of the data.

Types of variable in R

R supports the notion of types of variable.

Open RStudio and type the following in the console (don't worry about what these functions mean yet):

 	
 	x <- rep(c("a", "b", "c"),  each=2)
 	y <- rnorm(6)
 	
 	

Look in the "Environment" tab. What is the value of x? Can you guess what the function rep does?

What is the value of y?

In R, we can ask what type a variable is using the class function.

Try the following:

 	
 	class(x)
 	class(y)
 	
 	

Look in the "Environment" tab. Can you interpret everything that's displayed there?

Variable Types

Thinking in statistical terms, are x and y categorical or quantitative?

So in R, a character variable is ________, and a numeric variable is ________.

Example

Imagine a genetic study of obesity, in which we want to determine whether the genotype at a particular locus is associated with obesity.

We could recruit a cohort of patients, measure their BMI, categorize them as obese (yes or no), and determine their genotype at the locus of interest.

For each variable, decide whether it is categorical or quantitative:

  • BMI
  • Obese
  • Genotype
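A minimal sketch of how these three variables might look in R is given below. All values are invented for illustration, the variable names are hypothetical, and the cutoff of 30 for obesity is a commonly used convention rather than something stated in the study description above.

```r
# Hypothetical BMI measurements for six subjects (invented values)
bmi <- c(22.4, 31.0, 27.8, 35.2, 24.1, 30.5)

# Obesity status derived from BMI; the cutoff of 30 is an assumption
obese <- ifelse(bmi >= 30, "yes", "no")

# Hypothetical genotypes at the locus of interest
genotype <- c("CC", "CC", "CT", "TT", "CT", "CC")

# Ask R what type each variable has
print(class(bmi))
print(class(obese))
print(class(genotype))
```

Comparing the output of class with your answers above is a quick way to check whether R is storing each variable in a form that matches its statistical type.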

Factors in R

In R, we could represent our genotype variable as a character variable. Try:

 	
    gt <- c("CC", "CC", "CT", "TT", "CT", "CC")
    
    

Look in the environment. What is the type of gt? Is this what you expect?

What does the following give?

	
	gt[[3]]
	
	
What happens if you do:
	
	gt[[2]] <- "CT"
	
	
What about:
	
	gt[[2]] <- "Meaningless"
	
	

When we have a variable that can only take on a fixed set of values, it’s useful to force R to only let it have those values.

  • This is a common feature of many categorical variables.

In R, a factor gives this functionality.

Try the following:

	
	gt <- factor(c("CC","CC","CT","TT","CT","CC"))
	
	
What does the environment tab display for gt now?

What if you display it in the console? (Just type gt in the console.)
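As a hedged sketch of what a factor stores, the functions class, levels and table (all part of base R) can be used to inspect it. The example reuses the genotype values from above.

```r
# The same genotypes as above, stored as a factor
gt <- factor(c("CC", "CC", "CT", "TT", "CT", "CC"))

# The type is now "factor" rather than "character"
print(class(gt))

# The levels are the fixed set of values the variable is allowed to take
print(levels(gt))

# table counts how many observations fall in each level
print(table(gt))
```

By default, factor takes the levels from the distinct values in the data and sorts them, so here the levels are CC, CT and TT.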

Missing values and incorrect values in factors

Many times when working with data, some values are missing.

  • Particularly true for clinical and patient/subject-based studies.

R reserves the special value NA to represent a missing value.

Now that gt is a factor, what happens if you do:

	
	gt[[2]] <- "nonsense"
	gt
    
    

Using a factor in R allows us to force all values to either be meaningful, or missing.
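The sketch below illustrates this behaviour on a separate copy of the genotype data (the name genotype is used only for this example). Assigning an existing level works as usual, whereas assigning a value that is not a level makes R issue a warning and store NA instead. The base R function is.na can then be used to find the missing values.

```r
# A factor with levels CC, CT and TT
genotype <- factor(c("CC", "CC", "CT", "TT", "CT", "CC"))

# "CT" is an existing level, so this assignment works
genotype[[2]] <- "CT"

# "nonsense" is not a level: R warns and stores NA in its place
genotype[[3]] <- "nonsense"

# is.na marks which elements are missing; sum counts them
print(is.na(genotype))
print(sum(is.na(genotype)))

# The set of allowed levels is unchanged
print(levels(genotype))
```

Counting the NA values after data entry or import is a simple way to spot values that did not match any of the allowed levels.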

Summary

There are four types of variable:

  • Nominal
  • Ordinal
  • Interval
  • Ratio

Nominal and ordinal variables are categorical.

Interval and ratio variables are quantitative.

The distinction between categorical and quantitative drives the decision as to how to visualize and analyze data.

Summary of R work

In R, we have learned:

  • The <- combination of symbols assigns a value to a variable.
    • Note you can also use = here: x=c(2,3,5,7,8), but I prefer <-.
  • You can access individual elements of a variable using [[ ]].
    • x[[3]] will give the third element of x.
    • x[[2]] <- 5 will change the second element of x to 5.
  • You can find the type of a variable x using class(x).
  • Use character and factor types for categorical variables, and numeric for quantitative variables.
    • We’ll see other types during the course.
  • The special value NA represents a missing value.