One-page quick learning so you can hit the ground running on the job

# Understanding Statistics: Basic Data Analysis

This page introduces some basic data analysis techniques to help you quickly make sense of social science or population data.  All the data types described here can be represented as a basic data matrix in spreadsheet format, with rows representing 'cases' (or 'observations') and columns representing 'variables'.

VARIABLE TYPES

There are two kinds of variables - categorical and quantitative (which can be continuous or discrete).  It is important to understand the different types as the type determines the sort of analysis we do.

Binary - categorical - two categories (e.g. male/female)

Nominal - categorical - more than two categories - sometimes called 'named' categories (e.g. social class, region of residence)

Ordinal - categorical - ranked or ordered categories - numeric codes 1,2,3 etc are used as labels but the numeric order corresponds to the ordering of categories  (e.g. class of university degree, level of job satisfaction)

Interval/scaled - continuous - differences have the same meaning at different points of the scale - we know the order and the exact differences between the values, but the zero point is arbitrary (e.g. calendar year, temperature in degrees Celsius)

Ratio - continuous - like interval but with a true zero as its point of origin, so ratios of values are meaningful (e.g. income, age, elapsed time)

Discrete count - discrete - (e.g. number of children in a family, number of patients per year)

| Variable type | Descriptive statistics | Graphing data |
|---|---|---|
| Binary | Frequencies; descriptives of 0/1 variable; crosstabs | Bar charts |
| Nominal | Frequencies; crosstabs | Bar charts; pie charts |
| Ordinal | Frequencies; crosstabs | Bar charts; pie charts; box and whiskers |
| Interval or ratio | Descriptives; grouped frequencies | Scatterplots; histograms; stem & leaf plots; box and whiskers |
| Discrete counts | Frequencies (if few values); descriptives | Histograms; stem & leaf plots; box and whiskers |
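As a rough illustration of the two kinds of summary named above, the sketch below computes frequencies for a categorical variable and descriptives for a quantitative one, using only the Python standard library. The data and the function names `frequencies` and `describe` are invented for this example.

```python
from collections import Counter
import statistics

# Hypothetical toy data: one nominal and one ratio variable.
region = ["North", "South", "South", "East", "North", "South"]
income = [21000, 34000, 28000, 45000, 30000, 26000]


def frequencies(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


def describe(values):
    """Return basic descriptive statistics for a quantitative variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


print(frequencies(region))
print(describe(income))
```

Frequencies suit categorical variables because they count cases per category, whereas a mean or standard deviation only makes sense when the values are genuine quantities.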

SHOWING RELATIONSHIPS

Categorical independent variable with a categorical dependent variable:

CROSSTABS

Example: table of counts of households with/without computer

BAR CHART in clustered or stacked form
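A crosstab is simply a table of counts for each combination of two categorical variables. The sketch below assumes a hypothetical independent variable (housing tenure) and made-up households. It also shows row percentages, which are usually easier to compare than raw counts.

```python
from collections import Counter

# Hypothetical households: (tenure, has_computer). Both are categorical.
households = [
    ("owner", "yes"), ("owner", "yes"), ("owner", "no"),
    ("renter", "yes"), ("renter", "no"), ("renter", "no"),
    ("owner", "yes"), ("renter", "no"),
]


def crosstab(pairs):
    """Count cases in each (row category, column category) cell."""
    return Counter(pairs)


def row_percentages(table):
    """Convert cell counts to percentages within each row category."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {
        cell: round(100 * n / row_totals[cell[0]], 1)
        for cell, n in table.items()
    }


table = crosstab(households)
pct = row_percentages(table)
for cell in sorted(table):
    print(cell, table[cell], f"{pct[cell]}%")
```

The same cell counts are what a clustered or stacked bar chart would display, with one cluster or stack per category of the independent variable.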

Categorical independent variable with an interval/ratio dependent variable:

COMPARE MEANS

Example: table of mean income by sex
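Comparing means amounts to splitting the cases by the categorical variable and averaging the quantitative variable within each group. A minimal sketch, with invented data and a hypothetical helper `group_means`:

```python
from collections import defaultdict
import statistics

# Hypothetical respondents: (sex, annual income).
people = [
    ("female", 32000), ("male", 30000), ("female", 28000),
    ("male", 36000), ("female", 30000),
]


def group_means(pairs):
    """Return the mean of the second element within each group."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {group: statistics.mean(values) for group, values in groups.items()}


means = group_means(people)
print(means)
```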

Interval/ratio independent variable with a categorical dependent variable:

CROSSTABS (with the independent variable grouped)

Example: table of counts of households with/without computer by age-group of head of household
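To crosstabulate against an interval/ratio variable such as age, the values first have to be grouped into bands. The sketch below shows one way to do that. The band boundaries in `AGE_BANDS` and the data are arbitrary assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical age bands: (lowest age, highest age, label).
AGE_BANDS = [(16, 34, "16-34"), (35, 54, "35-54"), (55, 120, "55+")]


def age_group(age):
    """Map an exact age onto its band label."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the defined bands")


# Hypothetical heads of household: (age, household has a computer).
heads = [(23, "yes"), (41, "yes"), (67, "no"), (38, "no"), (72, "no"), (29, "yes")]

# Crosstab of age group by computer ownership.
table = Counter((age_group(age), computer) for age, computer in heads)
for cell in sorted(table):
    print(cell, table[cell])
```

The choice of bands affects what the table shows, so it is worth trying more than one grouping before drawing conclusions.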

Interval/ratio independent variable with an interval/ratio dependent variable:

COMPARE MEANS (with grouped variables)

Example: table of mean income by age-group

SCATTERPLOT

Example: graph of income by age
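A scatterplot is often read alongside a single number summarising how closely the points follow a straight line. The text above does not name such a measure. The sketch below uses the Pearson correlation coefficient as one common choice, written out by hand on invented data.

```python
import math

# Hypothetical paired observations: age in years and annual income.
ages = [25, 32, 41, 48, 56]
incomes = [22000, 27000, 35000, 38000, 41000]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ages, incomes)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear pattern. The plot itself is still needed to spot curved relationships and outliers that the coefficient hides.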

DATA TYPES

1. Survey
2. Aggregate
3. Time series
4. Experiment
5. Event-based

What is Survey Data?

Survey data come from a questionnaire administered to a number of respondents, usually a random sample of members of a population of interest.  The aim is to draw conclusions about that population, not just about the respondents.

Cases = Respondents

Variables = questionnaire responses

Requirements: estimation of population quantities; inference about relationships between variables

Issues to note: non-response (non-contacts) introducing bias; missing data (refusals, not-applicable); reliability and validity of measures; sampling and non-sampling errors; sample design; weighting to deal with complex design and any differential non-response
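To illustrate why weighting matters, the sketch below compares an unweighted and a weighted estimate of a proportion. The weights are invented. In a real survey they would come from the sample design and any non-response adjustment.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value*weight divided by the sum of weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: 1 = household has a computer, 0 = it does not.
has_computer = [1, 1, 0, 1, 0]
# Hypothetical survey weights (larger = respondent represents more people).
weights = [1.0, 1.0, 2.0, 0.5, 1.5]

unweighted = sum(has_computer) / len(has_computer)
weighted = weighted_mean(has_computer, weights)
print(unweighted, round(weighted, 3))
```

Here the groups without a computer carry larger weights, so the weighted estimate is lower than the raw sample proportion.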

What is Aggregate Data?

Aggregate data are statistics that summarise a set of administrative, political, social or economic units, with no record for each individual.

Cases = administrative units (e.g. schools, local authority areas, general practices)

Variables = measures of characteristics of the unit - usually aggregates of individual level data such as counts and percentages

Requirements: description (e.g. ranking) and estimation; inference about relationships between variables

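Issues to note: be careful when inferring individual behaviour from aggregate data, as a relationship seen across units need not hold for the individuals within them (the ecological fallacy); Geographical Information Systems (GIS) can be used to map and manipulate spatial data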
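The sketch below shows how individual-level records collapse into aggregate data with one row per unit, holding a count and a percentage. The schools, pupils and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical individual-level records: (school, pupil passed the exam).
pupils = [
    ("School A", True), ("School A", False), ("School A", True),
    ("School B", False), ("School B", False), ("School B", True),
    ("School B", False),
]


def aggregate(records):
    """Collapse individual records to one row per unit (count and % passed)."""
    totals = defaultdict(lambda: {"count": 0, "passed": 0})
    for unit, passed in records:
        totals[unit]["count"] += 1
        totals[unit]["passed"] += int(passed)
    return {
        unit: {
            "count": t["count"],
            "percent_passed": round(100 * t["passed"] / t["count"], 1),
        }
        for unit, t in totals.items()
    }


by_school = aggregate(pupils)
print(by_school)
```

After aggregation the cases are schools, not pupils, which is why conclusions about individual pupils cannot be read directly off the result.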

What is Time Series Data?

Time series data involve a set of measurements on an entity of interest over time.

Cases = time points

Variables = measures

Requirements: relationships between series; behaviour of series  (e.g. existence of trends, cycles, lagged relationships, effect of shocks)

Issues to note: time dependence between rows (successive observations are not independent, so special time series techniques, such as those used in econometrics, are needed); changes over time in the definition of measures
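Two simple building blocks for looking at the behaviour of a series are a lagged copy of the series (for lagged relationships) and a moving average (to smooth out short-term noise and reveal a trend). This is a minimal sketch with invented values, not a substitute for proper time series methods.

```python
def lag(series, k=1):
    """Shift a series back by k time points; the first k values are unknown."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return list(series)
    return [None] * k + list(series[:-k])


def moving_average(series, window=3):
    """Trailing moving average, one value per complete window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical measurements at six consecutive time points.
series = [10, 12, 11, 15, 14, 18]

print(lag(series, 1))
print(moving_average(series, 3))
```

Pairing each value with its lagged value makes the dependence between neighbouring rows visible. For example, the pairs can be plotted against each other or correlated.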

What is Experiment Data?

Experiment data involve the random allocation of treatments to subjects. Measures of the effects of the treatment are taken, together with measures of potential explanatory variables (covariates).  If the subjects are randomly selected from a larger population, conclusions can be generalised to that population.

Cases = subjects

Variables = measures of treatment effects and covariates

Requirements: inference about the effect of treatments controlling for covariates and within subgroups

Issues to note: analysis must take account of experimental design
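As a rough sketch of the two core steps, the code below randomly allocates subjects to a treatment and a control group and then compares mean outcomes. It assumes the simplest possible design (two equal-sized groups, no covariates or blocking). The outcome values are invented.

```python
import random
import statistics


def allocate(subjects, seed=0):
    """Randomly split subjects into (treatment, control) groups."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Simple estimate of the treatment effect: treated mean minus control mean."""
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


treatment, control = allocate(range(10))
print(treatment, control)

# Hypothetical outcome measurements for each group.
effect = difference_in_means([5, 7, 9], [4, 5, 6])
print(effect)
```

A real analysis would also account for the covariates and the experimental design, for example through regression adjustment or analysis within subgroups.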

What is Event-based Data?

Event-based data involve observations constructed from a set of possible events that could occur (e.g. countries being at war or not; people dead or alive).

Cases = possible events

Variables = measures of the characteristics of the participants in the event and of the outcomes

Useful Calculator for Mode, Median and Mean

If you want to save yourself some time on the calculator and don't have access to a statistics software package, this calculator is very useful.
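If Python is available, its standard `statistics` module can compute the same three measures. The values below are invented for illustration.

```python
import statistics

# Hypothetical data: number of children in six families.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data
modes = statistics.multimode(values)      # most frequent value(s), as a list

print(mean_value, median_value, modes)
```

`multimode` is used here instead of `mode` because it returns every tied value when the data have more than one mode.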