Instaskills, n.

one-page quick learning so you can hit the ground running while on the job

Understanding Statistics: Basic Data Analysis

This page introduces some basic data analysis techniques to help you quickly make sense of social science or population data. Any type of data can be represented as a basic data matrix in spreadsheet format: rows represent 'cases' or 'observations', and columns represent 'variables'.
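As a minimal sketch of the data matrix idea, the Python below stores each case as one row and each variable as one column. The variable names and values are hypothetical and only serve as an illustration.

```python
# Hypothetical data matrix: each dict is one case (row),
# each key is one variable (column).
cases = [
    {"id": 1, "sex": "female", "age": 34, "children": 2},
    {"id": 2, "sex": "male", "age": 51, "children": 0},
    {"id": 3, "sex": "female", "age": 27, "children": 1},
]


def column(rows, variable):
    """Return the values of one variable (column) across all cases (rows)."""
    return [row[variable] for row in rows]


n_cases = len(cases)               # number of rows
variables = list(cases[0].keys())  # column names
ages = column(cases, "age")        # one column of the matrix
```

Reading across a dict gives everything known about one case; calling `column` gives one variable for every case.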


There are two kinds of variables - categorical and quantitative (which can be continuous or discrete).  It is important to understand the different types as the type determines the sort of analysis we do.

Binary - categorical - two categories (e.g. male/female)

Nominal - categorical - more than two categories - sometimes called 'named' categories (e.g. social class, region of residence)

Ordinal - categorical - ranked or ordered categories - numeric codes (1, 2, 3, etc.) are used as labels, but the numeric order corresponds to the ordering of the categories (e.g. class of university degree, level of job satisfaction)

Interval/scaled - continuous - differences have the same meaning at different points of the scale - we know the order and the exact differences between the values, but the zero point is arbitrary (e.g. calendar year, temperature in degrees Celsius)

Ratio - continuous - like interval but with a true zero as its point of origin, so ratios of values are meaningful (e.g. income, time, age)

Discrete count - discrete - whole-number counts (e.g. number of children in a family, number of patients per year)
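To illustrate how the variable type determines the analysis, here is a rough sketch that picks a "typical value" summary by measurement level. The function name `summarise`, the level labels and the sample values are assumptions made for this example, and the mapping is one common convention rather than the only valid choice.

```python
from statistics import mean, median_low, mode


def summarise(values, level):
    """Pick a typical-value summary suited to the measurement level."""
    if level in ("binary", "nominal"):
        return mode(values)        # most common category
    if level == "ordinal":
        return median_low(values)  # middle rank, always an actual category code
    if level in ("interval", "ratio", "count"):
        return mean(values)        # arithmetic mean is meaningful here
    raise ValueError(f"unknown level: {level}")


region = ["north", "south", "north", "east"]  # nominal
satisfaction = [1, 2, 2, 3, 5]                # ordinal codes, 1 = lowest
age = [20, 30, 40]                            # ratio

typical_region = summarise(region, "nominal")
typical_satisfaction = summarise(satisfaction, "ordinal")
typical_age = summarise(age, "ratio")
```

A mean of the region labels would be meaningless, and a mean of the satisfaction codes would assume equal gaps between categories, which an ordinal scale does not guarantee.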

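Variable Type - Descriptive Statistics - Graphing data

[The earlier rows of this table are incomplete. Surviving entries: "Descriptive of 0/1 variable" (descriptive statistics) and "Box and whiskers" (graphing data).]

Interval or ratio - Grouped frequencies - Stem & leaf plots; Box and whiskers

Discrete counts - Frequencies (if few values) - Stem & leaf plots; Box and whiskers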
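Grouped frequencies and the numbers behind a box-and-whiskers plot can be sketched as follows. The band width, the sample ages and the quartile convention (`method="inclusive"`) are assumptions for illustration; statistical packages differ slightly in how they compute quartiles.

```python
from collections import Counter
from statistics import quantiles


def grouped_frequencies(values, width):
    """Count values in bands of equal width, keyed by each band's lower bound."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, three quartiles and maximum: what a box-and-whiskers plot draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


ages = [21, 25, 29, 34, 38, 41, 47, 52, 58]  # hypothetical ages
bands = grouped_frequencies(ages, 10)        # keys 20, 30, 40, 50
summary = five_number_summary(ages)
```

Here the key 20 stands for the band 20-29, 30 for 30-39, and so on.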


Categorical independent variable with a categorical dependent variable:


Example: table of counts of households with/without computer

BAR CHART in clustered or stacked form

Categorical independent variable with an interval/ratio dependent variable:


Example: table of mean income by sex

Interval/ratio independent variable with a categorical dependent variable:


Example: table of counts of households with/without computer by age-group of head of household

Interval/ratio independent variable with an interval/ratio dependent variable:

COMPARE MEANS (with grouped variables)

Example: table of mean income by age-group


Example: graph of income by age
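Two of the techniques above, a table of counts for two categorical variables and a comparison of means across categories, can be sketched in a few lines. The household records and the helper names `crosstab` and `compare_means` are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical household records.
households = [
    {"sex": "female", "computer": "yes", "income": 30000},
    {"sex": "female", "computer": "no", "income": 22000},
    {"sex": "male", "computer": "yes", "income": 28000},
    {"sex": "male", "computer": "yes", "income": 36000},
]


def crosstab(rows, independent, dependent):
    """Count cases for each (independent, dependent) category pair."""
    return Counter((r[independent], r[dependent]) for r in rows)


def compare_means(rows, group_var, value_var):
    """Mean of an interval/ratio variable within each category."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_var]].append(r[value_var])
    return {g: mean(vals) for g, vals in groups.items()}


table = crosstab(households, "sex", "computer")
means = compare_means(households, "sex", "income")
```

To use an interval/ratio variable such as age as the independent variable in either function, it would first be grouped into bands (age-groups), as in the examples above.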


The sections below describe five types of data:

  1. Survey
  2. Aggregate
  3. Time series
  4. Experiment
  5. Event based

What is Survey Data?

Survey data come from a questionnaire administered to a number of respondents, usually a random sample of members of a population of interest. The aim is to draw conclusions about that population.

Cases = respondents

Variables = questionnaire responses

Requirements: estimation of population quantities; inference about relationships between variables

Issues to note: non-response (non-contacts) introducing bias; missing data (refusals, not-applicable); reliability and validity of measures; sampling and non-sampling errors; sample design; weighting to deal with complex design and any differential non-response
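As a simple illustration of weighting, the sketch below compares an unweighted and a weighted estimate of a proportion. The responses and weights are invented, and real survey weights come from the sample design and non-response adjustments, not from a rule of thumb like the one assumed here.

```python
def weighted_proportion(responses, weights):
    """Weighted share of 1s in a 0/1 variable."""
    total = sum(weights)
    return sum(r * w for r, w in zip(responses, weights)) / total


# Hypothetical sample: the last two respondents belong to a group that
# responded at half the rate of the others, so each gets twice the weight.
has_computer = [1, 1, 1, 0, 0]
weights = [1.0, 1.0, 1.0, 2.0, 2.0]

unweighted = sum(has_computer) / len(has_computer)     # 3 out of 5
weighted = weighted_proportion(has_computer, weights)  # 3 out of 7
```

The weighted estimate is lower because the under-represented group, which here has no computers, counts for more.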

What is Aggregate Data?

Aggregate data are statistics that summarise a set of administrative, political, social or economic units.

Cases = administrative units (e.g. schools, local authority areas, general practices)

Variables = measures of characteristics of the unit - usually aggregates of individual level data such as counts and percentages

Requirements: description (e.g. ranking) and estimation; inference about relationships between variables

Issues to note: be careful when inferring individual behaviour from aggregate data (the ecological inference problem); Geographical Information Systems (GIS) can be used to map and manipulate spatial data
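The step from individual-level records to aggregate data can be sketched as below. The area names, the unemployment flag and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical individual-level records: (area, unemployed as 0/1).
individuals = [
    ("Area A", 1), ("Area A", 0), ("Area A", 0), ("Area A", 0),
    ("Area B", 1), ("Area B", 1),
]


def aggregate(records):
    """Build one row per unit holding a count and a percentage."""
    totals = defaultdict(lambda: [0, 0])  # unit -> [people, unemployed]
    for unit, flag in records:
        totals[unit][0] += 1
        totals[unit][1] += flag
    return {
        unit: {"people": n, "pct_unemployed": 100 * k / n}
        for unit, (n, k) in totals.items()
    }


areas = aggregate(individuals)
```

Once the data are aggregated, each row describes an area and not a person, which is why conclusions about individuals need care.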

What is Time Series Data?

Time series data involve a set of measurements on an entity of interest over time.

Cases = time points

Variables = measures

Requirements: relationships between series; behaviour of series  (e.g. existence of trends, cycles, lagged relationships, effect of shocks)

Issues to note: time dependence between rows (special techniques, such as those used in econometrics, are needed); changes in the definition of measures over time
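A lagged relationship between two series can be explored with a correlation at a chosen lag. This is only a descriptive sketch with made-up series; it does not handle trends or the time dependence mentioned above, which need proper time series methods.

```python
from statistics import mean


def lag_correlation(x, y, lag):
    """Pearson correlation between x at time t and y at time t + lag."""
    a, b = x[:len(x) - lag], y[lag:]
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical series in which y repeats x one period later.
x = [1, 3, 2, 5, 4, 6]
y = [0, 1, 3, 2, 5, 4]

same_period = lag_correlation(x, y, 0)
one_period_later = lag_correlation(x, y, 1)
```

In this constructed example the correlation is stronger at lag 1 than at lag 0, which is the pattern a lagged relationship produces.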

What is Experiment Data?

Experiment data involve the random allocation of treatments to subjects. Measures of the effects of the treatment are taken together with measures of potential explanatory variables (covariates). If the subjects are randomly selected from a larger population, conclusions can be generalised to that population.

Cases = subjects

Variables = measures of treatment effects and covariates

Requirements: inference about the effect of treatments controlling for covariates and within subgroups

Issues to note: analysis must take account of experimental design
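Random allocation and a basic treatment comparison can be sketched as follows. The subjects, the outcomes and the built-in effect of 2 are invented for illustration, and the simple difference in means ignores covariates and any more complex experimental design.

```python
import random
from statistics import mean


def allocate(subjects, seed=0):
    """Randomly split subjects into equal-sized treatment and control groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def treatment_effect(outcomes, treated, control):
    """Difference in mean outcome between treatment and control groups."""
    return mean(outcomes[s] for s in treated) - mean(outcomes[s] for s in control)


subjects = list(range(8))
treated, control = allocate(subjects)

# Hypothetical outcomes: baseline of 10, plus 2 for treated subjects.
outcomes = {s: 10 + (2 if s in treated else 0) for s in subjects}
effect = treatment_effect(outcomes, treated, control)
```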

What is Event based Data?

Event based data involve observations which are constructed from a set of possible events which could occur (e.g. countries being at war or not; people dead or alive).

Cases = possible events

Variables = measures of the characteristics of the participants in the event and outcomes
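One way to picture this construction is to list every possible event and record whether it occurred. The sketch below does so for pairs of countries by year; the country labels, years and the single recorded war are all hypothetical.

```python
from itertools import combinations

countries = ["A", "B", "C"]  # hypothetical participants
years = [2000, 2001]
wars = {("A", "B", 2001)}    # hypothetical events that did occur

# One case per possible event: every pair of countries in every year.
cases = [
    {"pair": pair, "year": year, "at_war": int((*pair, year) in wars)}
    for year in years
    for pair in combinations(countries, 2)
]

n_possible = len(cases)
n_occurred = sum(case["at_war"] for case in cases)
```

Most cases record an event that did not happen, which is typical of data built this way.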