Instaskills, n,

one-page quick learning so you can hit the ground running on the job


Understanding Statistics: Basic Data Analysis


This page introduces some basic data analysis techniques to help you quickly make sense of social science or population data.  Whatever its type, a data set can be laid out as a basic data matrix in spreadsheet format: rows represent 'cases' (or 'observations') and columns represent 'variables'.
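As a minimal sketch of the data matrix idea, the Python below stores three made-up cases with three made-up variables (the names sex, age and income are illustrative assumptions, not taken from any real data set) and shows how to read one row and one column.

```python
# A tiny, made-up data matrix: each row is a case, each column a variable.
variables = ["sex", "age", "income"]
cases = [
    ["female", 34, 28000],
    ["male", 51, 31500],
    ["female", 27, 24000],
]

# One row = the values of every variable for a single case.
first_case = dict(zip(variables, cases[0]))

# One column = the values of a single variable across all cases.
age_column = [row[variables.index("age")] for row in cases]

print(first_case)
print(age_column)
```

A spreadsheet or statistics package holds data in the same way; the nested list here is only the simplest stand-in.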


VARIABLE TYPES

There are two kinds of variables: categorical and quantitative (which can be continuous or discrete).  It is important to know which type you have, because the type determines the kind of analysis you can do.

Binary - categorical - two categories (e.g. male/female)

Nominal - categorical - more than two categories - sometimes called 'named' categories (e.g. social class, region of residence)

Ordinal - categorical - ranked or ordered categories - numeric codes (1, 2, 3, etc.) are used as labels, and their numeric order matches the ordering of the categories, but the gaps between codes have no fixed meaning (e.g. class of university degree, level of job satisfaction)

Interval/scaled - continuous - differences have the same meaning at different points of the scale - we know the order and the exact differences between the values, but the zero point is arbitrary (e.g. temperature in degrees Celsius, calendar year)

Ratio - continuous - like interval but with a true zero as its point of origin, so ratios between values are meaningful (e.g. income, age, elapsed time)

Discrete count - discrete - whole-number counts (e.g. number of children in a family, number of patients per year)


Variable Type       Descriptive Statistics                                Graphing data
-----------------   ---------------------------------------------------   -------------------------------------------------------------
Binary              Frequencies; Descriptive of 0/1 variable; Crosstabs   Barcharts
Nominal             Frequencies; Crosstabs                                Barcharts; Piecharts
Ordinal             Frequencies; Crosstabs                                Barcharts; Piecharts; Box and whiskers
Interval or ratio   Descriptive; Grouped frequencies                      Scatterplots; Histograms; Stem & leaf plots; Box and whiskers
Discrete counts     Frequencies (if few values); Descriptive              Histograms; Stem & leaf plots; Box and whiskers
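The link between variable type and descriptive statistic can be sketched in a few lines of Python. The function name describe and the sample values are assumptions made for illustration; the rule it applies (frequencies for categorical variables, numerical summaries otherwise) is a simplification of the table above.

```python
from collections import Counter
import statistics


def describe(values, variable_type):
    """Return a basic summary suited to the variable type."""
    if variable_type in ("binary", "nominal", "ordinal"):
        # Categorical variables: report frequencies (a count per category).
        return dict(Counter(values))
    # Interval, ratio or discrete count variables: report numerical summaries.
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


region = ["north", "south", "north", "east"]  # nominal (made-up values)
children = [0, 2, 1, 3, 2]                    # discrete count (made-up values)

print(describe(region, "nominal"))
print(describe(children, "discrete count"))
```

For a discrete count with only a few distinct values, frequencies are also a reasonable choice; the sketch does not try to make that decision for you.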


SHOWING RELATIONSHIPS


Categorical independent variable with a categorical dependent variable:

CROSSTABS

Example: table of counts of households with/without computer

BAR CHART in clustered or stacked form
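A crosstab is simply a count of cases for each combination of two categorical variables. The sketch below uses made-up households; the row variable tenure is a hypothetical choice, since the example above does not name one.

```python
from collections import Counter

# Made-up households as (tenure, has_computer) pairs.
households = [
    ("owner", "yes"), ("owner", "yes"), ("owner", "no"),
    ("renter", "yes"), ("renter", "no"), ("renter", "no"),
]

# Count how many households fall in each (row, column) combination.
counts = Counter(households)

rows = sorted({tenure for tenure, _ in households})
cols = sorted({computer for _, computer in households})

# Build the table as a nested dict: crosstab[row][column] -> count.
crosstab = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, crosstab[r])
```

The same counts are what a clustered or stacked bar chart would display.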

Categorical independent variable with an interval/ratio dependent variable:

COMPARE MEANS

Example: table of mean income by sex
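Comparing means amounts to splitting the cases by the categorical variable and averaging the interval/ratio variable within each group. A minimal sketch with made-up incomes:

```python
from collections import defaultdict
import statistics

# Made-up cases as (sex, income) pairs.
people = [
    ("female", 28000), ("male", 31500),
    ("female", 24000), ("male", 26500),
]

# Collect the incomes belonging to each category of the independent variable.
incomes_by_sex = defaultdict(list)
for sex, income in people:
    incomes_by_sex[sex].append(income)

# Average the dependent variable within each group.
mean_income_by_sex = {
    sex: statistics.mean(incomes) for sex, incomes in incomes_by_sex.items()
}

print(mean_income_by_sex)
```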

Interval/ratio independent variable with a categorical dependent variable:

CROSSTABS (with the independent variable grouped)

Example: table of counts of households with/without computer by age-group of head of household

Interval/ratio independent variable with an interval/ratio dependent variable:

COMPARE MEANS (with grouped variables)

Example: table of mean income by age-group

SCATTERPLOT

Example: graph of income by age
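When both variables are interval/ratio, the independent variable has to be grouped before means can be compared. The sketch below bands age into age-groups and then averages income within each band; the cut-points (30 and 50) and the data are arbitrary assumptions chosen for illustration.

```python
from collections import defaultdict
import statistics


def age_group(age):
    """Place an age in one of three illustrative bands."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


# Made-up cases as (age, income) pairs.
people = [
    (24, 21000), (27, 25000),
    (35, 30000), (44, 34000),
    (52, 33000), (61, 29000),
]

# Group incomes by age band, then take the mean within each band.
incomes_by_group = defaultdict(list)
for age, income in people:
    incomes_by_group[age_group(age)].append(income)

mean_income_by_age_group = {
    group: statistics.mean(incomes) for group, incomes in incomes_by_group.items()
}

print(mean_income_by_age_group)
```

Grouping throws away detail within each band, which is why a scatterplot of the ungrouped values is a useful companion.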


DATA TYPES

  1. Survey
  2. Aggregate
  3. Time series
  4. Experiment
  5. Event-based


What is Survey Data?

Survey data come from a questionnaire administered to a number of respondents, usually a random sample drawn from a population of interest.  The aim is to draw conclusions about that population.

Cases = Respondents

Variables = questionnaire responses

Requirements: estimation of population quantities; inference about relationships between variables

Issues to note: non-response (non-contacts) introducing bias; missing data (refusals, not-applicable); reliability and validity of measures; sampling and non-sampling errors; sample design; weighting to deal with complex design and any differential non-response
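To illustrate why weighting matters, here is a minimal sketch of a weighted mean. The weights are invented; in a real survey they would be supplied with the data set or derived from the sample design, and proper standard errors need more than this.

```python
def weighted_mean(values, weights):
    """Mean of values where each case counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


incomes = [20000, 30000, 40000]  # made-up respondent incomes
weights = [1.0, 1.0, 2.0]        # made-up weights: the third respondent
                                 # stands in for twice as many people

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)

print(unweighted, weighted)
```

Here the weighted estimate is pulled towards the income of the respondent who represents the under-sampled group.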


What is Aggregate Data?

Aggregate data are summary statistics for a set of administrative, political, social or economic units.

Cases = administrative units (e.g. schools, local authority areas, general practices)

Variables = measures of characteristics of the unit - usually aggregates of individual level data such as counts and percentages

Requirements: description (e.g. ranking) and estimation; inference about relationships between variables

Issues to note: be careful about inferring individual behaviour from aggregate patterns (the ecological inference problem); Geographical Information Systems (GIS) can be used to map and manipulate spatial data
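The step from individual-level records to aggregate data can be sketched as follows. The schools and pass results are made up, and the field names are hypothetical.

```python
from collections import defaultdict

# Made-up individual-level records as (school, passed_exam) pairs.
pupils = [
    ("School A", True), ("School A", False), ("School A", True), ("School A", True),
    ("School B", False), ("School B", True),
]

# Tally pupils and passes for each administrative unit (here, each school).
totals = defaultdict(lambda: {"pupils": 0, "passed": 0})
for school, passed in pupils:
    totals[school]["pupils"] += 1
    totals[school]["passed"] += int(passed)

# One row per unit, holding a count and a percentage.
aggregate = {
    school: {
        "pupils": t["pupils"],
        "percent_passed": 100 * t["passed"] / t["pupils"],
    }
    for school, t in totals.items()
}

print(aggregate)
```

Note that the aggregate table no longer says which pupils passed, which is exactly why conclusions about individuals need care.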


What is Time Series Data?

Time series data involve a set of measurements on an entity of interest over time.

Cases = time points

Variables = measures

Requirements: relationships between series; behaviour of series  (e.g. existence of trends, cycles, lagged relationships, effect of shocks)

Issues to note: time dependence between rows (successive observations are not independent, so special techniques such as econometric time series methods are needed); changes in the definition of measures over time
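One simple way to see time dependence is to compare each value with the one before it. The sketch below computes a lag-1 autocorrelation, which is one common measure chosen here for illustration rather than a method this page prescribes; both series are made up.

```python
import statistics


def lag1_autocorrelation(series):
    """Correlation between a series and itself shifted by one time point."""
    mean = statistics.fmean(series)
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant, autocorrelation is undefined")
    # Pair each time point with the one that follows it.
    numerator = sum(a * b for a, b in zip(deviations[:-1], deviations[1:]))
    return numerator / denominator


trending = [1, 2, 3, 4, 5]     # steadily rising: neighbours are alike
alternating = [1, -1, 1, -1]   # flips each period: neighbours differ

print(lag1_autocorrelation(trending))
print(lag1_autocorrelation(alternating))
```

A value well away from zero is a warning that the rows are not independent and that ordinary methods may mislead.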


What is Experiment Data?

Experiment data involve the random allocation of treatments to subjects.  Measures of the effects of the treatment are taken together with measures of potential explanatory variables (covariates).  If the subjects are randomly selected from a larger population, conclusions can be generalised to that population.

Cases = subjects

Variables = measures of treatment effects and covariates

Requirements: inference about the effect of treatments controlling for covariates and within subgroups

Issues to note: analysis must take account of experimental design
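Random allocation and the simplest possible treatment comparison can be sketched as below. The subject labels, the fixed seed and the two-group design are assumptions for illustration; a real analysis would follow the actual experimental design and adjust for covariates.

```python
import random
import statistics


def allocate(subjects, seed=0):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Unadjusted estimate of the treatment effect."""
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


subjects = [f"S{i}" for i in range(1, 9)]  # eight hypothetical subjects
treatment, control = allocate(subjects)

print(treatment, control)
print(difference_in_means([5, 7], [4, 4]))  # made-up outcome scores
```

Because chance alone decides who is treated, differences between the groups can be attributed to the treatment rather than to how subjects were chosen.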


What is Event-based Data?

Event-based data involve observations which are constructed from a set of possible events which could occur (e.g. countries being at war or not; people dead or alive).

Cases = possible events

Variables = measures of the characteristics of the participants in the event and outcomes


Useful Calculator for Mode, Median and Mean

If you want to save yourself some time on manual calculation and don't have access to a statistics software package, this calculator is very useful.

Mean, Median, Mode Calculator
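If Python is to hand, its standard statistics module gives the same three measures. This is a minimal sketch with made-up values, offered as an alternative rather than as the calculator referred to above.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # made-up data

mean = statistics.mean(values)      # sum of the values divided by how many there are
median = statistics.median(values)  # middle value (average of the two middle values here)
mode = statistics.mode(values)      # most frequent value

# multimode returns every value tied for most frequent.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, mode, modes)
```

The median and mode are less affected by extreme values than the mean, so it is worth looking at all three.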