Types of Data — Research & Statistics | AKT Prep

Before you can choose an average, a chart or a statistical test, you have to know what kind of data you are holding. Getting this one step right quietly determines half the statistics questions on the AKT, because the data type dictates everything that follows.

The two families

Every variable you meet is either categorical (it puts each subject into a labelled group) or numerical (it attaches a meaningful number). Almost every classification question on the exam comes down to deciding which family a variable belongs to, then how far down the tree it sits.

Figure: the data-type tree. Where a variable sits decides which average and which test are legitimate.

Categorical data

Categorical variables sort people into named groups. They come in two flavours.

Nominal

Categories with no natural order. Blood group (A, B, AB, O), ethnicity, eye colour, alive/dead. You cannot say one category is "more" than another.

Ordinal

Categories that have a logical order, but the gaps between them are not equal or measurable. NYHA heart-failure class (I–IV), a pain score of 0–10, tumour stage, or a Likert scale (strongly disagree → strongly agree).

The trap with ordinal data is that it looks numerical. A pain score of 8 is worse than 4, but it is not "twice as painful" — the rungs of the ladder are not evenly spaced. That is why a pain score is ordinal, not numerical, and why you cannot meaningfully take its mean.
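To make that concrete, here is a minimal sketch that summarises a handful of pain scores with the median and the mode instead of the mean. The scores are invented purely for illustration, and Python's standard `statistics` module is used only as a convenient calculator.

```python
from statistics import median, multimode

# Hypothetical pain scores (0-10) from seven patients; illustrative data only.
pain_scores = [2, 3, 3, 4, 6, 8, 9]

# The median depends only on the ORDER of the scores, so it is a
# legitimate summary for ordinal data.
median_score = median(pain_scores)

# The mode (the most common score) is legitimate too.
most_common = multimode(pain_scores)

print(median_score)  # 4
print(most_common)   # [3]
```

The code could compute a mean just as easily, but the result would assume that the step from 2 to 3 is the same size as the step from 8 to 9, which an ordinal scale does not guarantee.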

Numerical data

Numerical variables carry a genuine quantity, and they split by whether you count or measure.

Discrete

Whole-number counts that cannot take in-between values: number of GP attendances per year, number of children, number of seizures.

Continuous

Measured quantities that can take any value within a range, limited only by the precision of your instrument: weight, height, blood pressure, serum sodium.

Continuous data is then subdivided by whether zero means "none at all".

Interval

Equal gaps between values, but no true zero — zero is just a point on the scale, not an absence. Temperature in °C and the calendar year are the classic examples: 0 °C is not "no temperature", and 20 °C is not twice as hot as 10 °C.

Ratio

Equal gaps and a true zero, so ratios are meaningful. Weight, height, elapsed time (a duration, as opposed to a calendar date) and temperature in kelvin are ratio data: 80 kg really is twice 40 kg, because 0 kg means no mass.
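The interval-versus-ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only; `celsius_to_kelvin` is a helper name made up for this example, and it uses the standard 273.15 offset between the two scales.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to kelvin, where 0 K is a true zero."""
    return celsius + 273.15

# Dividing Celsius readings gives 2.0, but the number is meaningless
# because 0 degrees C is an arbitrary point on the scale.
naive_ratio = 20 / 10

# On the kelvin scale (true zero) the real ratio is only about 1.035.
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Weight is ratio data, so this ratio can be read at face value.
weight_ratio = 80 / 40

print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.035
print(weight_ratio)          # 2.0
```

The same pair of temperatures gives a "ratio" of 2 on one scale and roughly 1.035 on the other, which is why ratios of interval data should not be quoted.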

Worked examples: classify the variable

Run each one down the tree.

Blood group → categorical, no order → nominal.
NYHA class I–IV → ordered but unequal gaps → ordinal.
Number of A&E visits last year → a count → numerical, discrete.
Systolic blood pressure (mmHg) → measured, true zero → numerical, continuous, ratio.
Year of diagnosis (e.g. 2019) → measured, equal gaps, no true zero → numerical, continuous, interval.
Patient satisfaction on a 5-point Likert scale → ordered, unequal gaps → ordinal.
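The walk down the tree in the examples above can be sketched as a short function. This is only one way of encoding the questions; the function name `classify` and its yes/no parameters are hypothetical, chosen for this illustration.

```python
def classify(is_numeric: bool, ordered: bool = False,
             is_count: bool = False, true_zero: bool = False) -> str:
    """Walk the data-type tree using yes/no answers about a variable."""
    if not is_numeric:
        # Categorical branch: the only question is whether there is an order.
        return "ordinal" if ordered else "nominal"
    if is_count:
        # Numerical branch: counts are discrete.
        return "discrete"
    # Measured quantities: does zero mean "none at all"?
    return "continuous, ratio" if true_zero else "continuous, interval"

# The six worked examples, answered the same way as in the text.
worked_examples = {
    "Blood group": classify(is_numeric=False),
    "NYHA class": classify(is_numeric=False, ordered=True),
    "A&E visits last year": classify(is_numeric=True, is_count=True),
    "Systolic blood pressure": classify(is_numeric=True, true_zero=True),
    "Year of diagnosis": classify(is_numeric=True),
    "Likert satisfaction": classify(is_numeric=False, ordered=True),
}

for name, data_type in worked_examples.items():
    print(f"{name}: {data_type}")
```

Note that the first question is about meaning, not appearance: NYHA class and Likert responses are written as numbers, yet they are answered with `is_numeric=False` because the numbers are only labels for ordered groups.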

Why this is the foundation of everything

The whole reason the exam cares is that data type dictates which summary statistic and which test you may use. The mode is the only average you can quote for nominal data; the median suits ordinal and skewed data; the mean belongs to symmetrical numerical data. Likewise, a chi-squared test compares proportions across categorical groups, while a t-test compares the means of numerical data that are roughly normally distributed. Pick the wrong tool for the data type and the analysis is invalid, a point this module returns to on every later page.
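As a rough sketch of the rule "data type first, average second", the function below picks the summary statistic from the data type. The name `summarise`, its `skewed` flag and the sample values are all assumptions made for this illustration.

```python
from statistics import mean, median, mode

def summarise(values, data_type: str, skewed: bool = False):
    """Return the name and value of the average that suits the data type."""
    if data_type == "nominal":
        # Unordered categories: only the most common category makes sense.
        return "mode", mode(values)
    if data_type == "ordinal" or skewed:
        # Ordered categories, or skewed numerical data: use the middle value.
        return "median", median(values)
    # Symmetrical numerical data: the mean is appropriate.
    return "mean", mean(values)

# Illustrative, made-up data for each case.
blood_groups = summarise(["O", "A", "O", "B"], "nominal")
nyha_classes = summarise([1, 2, 2, 3, 4], "ordinal")
weights_kg = summarise([70, 80, 90], "numerical")
stay_days = summarise([1, 2, 3, 4, 50], "numerical", skewed=True)

print(blood_groups)  # ('mode', 'O')
print(nyha_classes)  # ('median', 2)
print(weights_kg)    # ('mean', 80)
print(stay_days)     # ('median', 3)
```

In the last example a single 50-day stay would drag the mean up to 12 days, which describes none of the patients well; the median of 3 days is the more honest summary.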

High-yield summary

Categorical = labelled groups: nominal (no order — blood group, ethnicity) and ordinal (ordered, unequal gaps — NYHA class, pain 0–10, Likert).
Numerical = real quantities: discrete (counts) and continuous (measured — weight, BP).
Continuous splits into interval (no true zero — °C, calendar year) and ratio (true zero, ratios meaningful — weight, height, Kelvin).
"Does zero mean none?" → yes = ratio, no = interval.
Data type dictates the summary statistic and the statistical test — get this right first and the rest follows.

Check your understanding

Q1. A GP audit records each patient's NYHA heart-failure class (I, II, III or IV). What type of data is this?
Q2. Which of the following variables is an example of interval (rather than ratio) data?
Q3. Why does identifying the data type matter before analysing a variable?