Before you can choose an average, a chart or a statistical test, you have to know what kind of data you are holding. Getting this one step right quietly determines half the statistics questions on the AKT, because the data type dictates everything that follows.
The two families
Every variable you meet is either categorical (it puts each subject into a labelled group) or numerical (it attaches a meaningful number). Almost every classification question on the exam comes down to deciding which family a variable belongs to, then how far down the tree it sits.
Categorical data
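Categorical variables sort people into named groups. They come in two flavours: nominal, where the groups have no natural order (blood group, ethnicity), and ordinal, where the groups are ranked but the gaps between ranks are unequal (NYHA class, a 0–10 pain score).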
The trap with ordinal data is that it looks numerical. A pain score of 8 is worse than 4, but it is not "twice as painful" — the rungs of the ladder are not evenly spaced. That is why a pain score is ordinal, not numerical, and why you cannot meaningfully take its mean.
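One way to see the problem with averaging ordinal scores is to relabel the rungs while keeping their order. The sketch below uses made-up pain scores for two hypothetical groups and an arbitrary, order-preserving relabelling (all names and numbers are illustrative assumptions, not exam data). A comparison based on the median gives the same verdict before and after; one based on the mean flips.

```python
from statistics import mean, median

# Made-up pain scores (0-10) for two hypothetical groups of three patients.
group_a = [1, 1, 9]
group_b = [4, 4, 4]

# An order-preserving relabelling of the rungs: the ranking of the scores is
# unchanged, only the (arbitrary) spacing of the upper rungs is stretched.
relabel = {score: score if score <= 5 else score * 3 for score in range(11)}

a_relabelled = [relabel[s] for s in group_a]
b_relabelled = [relabel[s] for s in group_b]

# Does group A look "worse" than group B?
mean_verdict_before = mean(group_a) > mean(group_b)              # False
mean_verdict_after = mean(a_relabelled) > mean(b_relabelled)     # True
median_verdict_before = median(group_a) > median(group_b)        # False
median_verdict_after = median(a_relabelled) > median(b_relabelled)  # False

print("mean verdict changed:", mean_verdict_before != mean_verdict_after)
print("median verdict changed:", median_verdict_before != median_verdict_after)
```

Because the spacing of an ordinal scale is arbitrary, a conclusion that depends on that spacing (the mean) is not trustworthy, whereas one that depends only on rank order (the median) is.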
Numerical data
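Numerical variables carry a genuine quantity, and they split by whether you count or measure: discrete data are counts (number of A&E visits), while continuous data are measured on a scale (weight, blood pressure).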
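Continuous data is then subdivided by whether zero means "none at all": ratio data has a true zero, so ratios are meaningful (weight, height, Kelvin), whereas interval data has equal gaps but no true zero (°C, calendar year).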
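Temperature makes the zero question concrete. The short sketch below (the helper name `celsius_to_kelvin` is just an illustrative choice) shows that differences survive a change between Celsius and Kelvin, but ratios do not. That is why "20 °C is twice as hot as 10 °C" is not a meaningful statement.

```python
import math


def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin (which has a true zero)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales and agree with each other.
same_difference = math.isclose(warm_c - cool_c, warm_k - cool_k)

# Ratios only make sense where zero means "none at all".
ratio_celsius = warm_c / cool_c   # 2.0, but an artefact of where 0 °C sits
ratio_kelvin = warm_k / cool_k    # about 1.035

print(same_difference, ratio_celsius, round(ratio_kelvin, 3))
```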
Worked examples: classify the variable
Run each one down the tree.
- Blood group → categorical, no order → nominal.
- NYHA class I–IV → ordered but unequal gaps → ordinal.
- Number of A&E visits last year → a count → numerical, discrete.
- Systolic blood pressure (mmHg) → measured, true zero → numerical, continuous, ratio.
- Year of diagnosis (e.g. 2019) → measured, equal gaps, no true zero → numerical, continuous, interval.
- Patient satisfaction on a 5-point Likert scale → ordered, unequal gaps → ordinal.
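The tree used in these examples can be written as a handful of yes/no questions. The function below is a minimal sketch; `classify` and its parameter names are hypothetical labels for those questions, not a standard API.

```python
def classify(
    is_quantity: bool,
    ordered: bool = False,
    counted: bool = False,
    true_zero: bool = False,
) -> str:
    """Walk a variable down the data-type tree and return its full path."""
    if not is_quantity:
        # Categorical branch: the only question left is whether groups are ranked.
        return "categorical, ordinal" if ordered else "categorical, nominal"
    if counted:
        # Numerical and counted rather than measured.
        return "numerical, discrete"
    # Numerical and measured: does zero mean "none at all"?
    if true_zero:
        return "numerical, continuous, ratio"
    return "numerical, continuous, interval"


examples = {
    "Blood group": classify(is_quantity=False),
    "NYHA class": classify(is_quantity=False, ordered=True),
    "A&E visits last year": classify(is_quantity=True, counted=True),
    "Systolic blood pressure": classify(is_quantity=True, true_zero=True),
    "Year of diagnosis": classify(is_quantity=True),
}

for variable, data_type in examples.items():
    print(f"{variable}: {data_type}")
```

The first question does most of the work: a pain score or Likert rating is stored as a number, but it is answered with `is_quantity=False` because the number is only a label for a rank.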
Why this is the foundation of everything
The whole reason the exam cares is that data type dictates which summary statistic and which test you may use. The mode is the only average you can quote for nominal data; the median suits ordinal and skewed data; the mean belongs to symmetrical numerical data. Likewise a chi-squared test compares categorical groups, while a t-test compares means of numerical data. Pick the wrong tool for the data type and the analysis is invalid — a point this module returns to on every later page.
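The pairing of data type and average can be sketched with Python's standard `statistics` module. The function name `summarise`, its parameters, and the sample values are illustrative assumptions. `median_low` is used here as one reasonable choice because it always returns a value that actually occurs in the data, which suits ordinal categories.

```python
from statistics import mean, median_low, mode


def summarise(values, data_type: str, skewed: bool = False):
    """Return the average that suits the data type (a simplified rule of thumb)."""
    if data_type == "nominal":
        return mode(values)          # most common category
    if data_type == "ordinal" or skewed:
        return median_low(values)    # middle value by rank
    if data_type == "numerical":
        return mean(values)          # symmetrical numerical data
    raise ValueError(f"unknown data type: {data_type}")


# Made-up sample values for illustration only.
blood_groups = ["O", "A", "O", "B", "O", "AB"]
nyha_classes = [1, 2, 2, 3, 4]
weights_kg = [70.0, 72.0, 74.0]
length_of_stay_days = [1, 2, 2, 3, 30]   # skewed by one long admission

print(summarise(blood_groups, "nominal"))                         # O
print(summarise(nyha_classes, "ordinal"))                         # 2
print(summarise(weights_kg, "numerical"))                         # 72.0
print(summarise(length_of_stay_days, "numerical", skewed=True))   # 2
```

In the skewed example the mean would be 7.6 days, pulled up by a single long stay, whereas the median of 2 days describes the typical patient.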
High-yield summary
- Categorical = labelled groups: nominal (no order — blood group, ethnicity) and ordinal (ordered, unequal gaps — NYHA class, pain 0–10, Likert).
- Numerical = real quantities: discrete (counts) and continuous (measured — weight, BP).
- Continuous splits into interval (no true zero — °C, calendar year) and ratio (true zero, ratios meaningful — weight, height, Kelvin).
- "Does zero mean none?" → yes = ratio, no = interval.
- Data type dictates the summary statistic and the statistical test — get this right first and the rest follows.
Check your understanding
Q1. A GP audit records each patient's NYHA heart-failure class (I, II, III or IV). What type of data is this?
Q2. Which of the following variables is an example of interval (rather than ratio) data?
Q3. Why does identifying the data type matter before analysing a variable?