Instaskills, n.
One-page quick learning so you can hit the ground running on the job
Understanding Statistics: Basic Data Analysis
This page provides you with some basic data analytics to help you quickly make sense of social science or population data. All data types can be represented in a basic data matrix structure of rows (which represent 'cases' or 'observations') and columns (which represent 'variables') in a spreadsheet format.
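As a rough illustration of the data matrix idea, the sketch below stores each case as one row and each variable as one named column. The variable names and all values are invented for illustration only.

```python
# Hypothetical data matrix: each row is a case, each key is a variable.
# All names and values are made up for illustration.
data_matrix = [
    {"id": 1, "sex": "female", "age": 34, "income": 28000},
    {"id": 2, "sex": "male", "age": 51, "income": 35000},
    {"id": 3, "sex": "female", "age": 27, "income": 22000},
]

n_cases = len(data_matrix)        # number of rows (cases)
variables = list(data_matrix[0])  # column names (variables)

# Reading one column (variable) across all rows (cases)
ages = [row["age"] for row in data_matrix]
```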
VARIABLE TYPES
There are two kinds of variables - categorical and quantitative (which can be continuous or discrete). It is important to understand the different types as the type determines the sort of analysis we do.
Binary - categorical - two categories (e.g. male/female)
Nominal - categorical - more than two categories - sometimes called 'named' categories (e.g. social class, region of residence)
Ordinal - categorical - ranked or ordered categories - numeric codes 1,2,3 etc are used as labels but the numeric order corresponds to the ordering of categories (e.g. class of university degree, level of job satisfaction)
Interval/scaled - continuous - differences have the same meaning at different points of the scale - we know the order and the exact differences between the values, but there is no true zero (e.g. temperature in degrees Celsius, calendar year)
Ratio - continuous - like interval but has a true zero as its point of origin, so ratios of values are meaningful (e.g. income, elapsed time, age)
Discrete count - discrete - (e.g. number of children in a family, number of patients per year)
| Variable Type | Descriptive Statistics | Graphing data |
|---|---|---|
| Binary | Frequencies; Descriptives of 0/1 variable; Crosstabs | Bar charts |
| Nominal | Frequencies; Crosstabs | Bar charts; Pie charts |
| Ordinal | Frequencies; Crosstabs | Bar charts; Pie charts; Box and whiskers |
| Interval or ratio | Descriptives; Grouped frequencies; Scatterplots | Histograms; Stem & leaf plots; Box and whiskers |
| Discrete counts | Frequencies (if few values); Descriptives | Histograms; Stem & leaf plots; Box and whiskers |
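The two most common summaries listed for each variable type, frequencies for categorical variables and descriptives for quantitative variables, can be sketched as follows. The function names and the sample values are assumptions made for this example, not part of any particular statistics package.

```python
from collections import Counter
from statistics import mean, median, stdev


def frequencies(values):
    """Frequency table: (count, percentage) for each category."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


def descriptives(values):
    """Basic summary statistics for a quantitative variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Invented example data
region = ["North", "South", "South", "East", "North", "South"]
income = [21000, 25000, 30000, 27000, 45000, 32000]

region_freq = frequencies(region)   # nominal variable
income_desc = descriptives(income)  # ratio variable
```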
SHOWING RELATIONSHIPS
Categorical independent variable with a categorical dependent variable:
CROSSTABS
Example: table of counts of households with/without computer
BAR CHART in clustered or stacked form
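A crosstab of two categorical variables can be sketched by counting each combination of categories. The independent variable here (urban or rural area) and all counts are assumptions chosen only to make the example concrete.

```python
from collections import Counter

# Invented household records: (area, computer ownership)
households = [
    ("urban", "computer"), ("urban", "computer"), ("urban", "no computer"),
    ("rural", "computer"), ("rural", "no computer"), ("rural", "no computer"),
    ("urban", "computer"), ("rural", "no computer"),
]

cell_counts = Counter(households)
rows = sorted({area for area, _ in households})  # independent variable
cols = sorted({owns for _, owns in households})  # dependent variable

# Table of counts: one row per area, one column per ownership category
crosstab = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

# Row percentages make the groups comparable when they differ in size
row_percent = {
    r: {c: 100 * crosstab[r][c] / sum(crosstab[r].values()) for c in cols}
    for r in rows
}
```

Percentaging within categories of the independent variable (rows here) is one common convention for reading such a table.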
Categorical independent variable with an interval/ratio dependent variable:
COMPARE MEANS
Example: table of mean income by sex
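A compare-means table can be sketched by grouping the dependent variable by category and averaging within each group. The helper name compare_means and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def compare_means(records, group_key, value_key):
    """Mean of value_key within each category of group_key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {group: mean(vals) for group, vals in groups.items()}


# Invented example data
people = [
    {"sex": "female", "income": 30000},
    {"sex": "male", "income": 28000},
    {"sex": "female", "income": 34000},
    {"sex": "male", "income": 32000},
]

mean_income_by_sex = compare_means(people, "sex", "income")
```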
Interval/ratio independent variable with a categorical dependent variable:
CROSSTABS (with the independent variable grouped)
Example: table of counts of households with/without computer by age-group of head of household
Interval/ratio independent variable with an interval/ratio dependent variable:
COMPARE MEANS (with grouped variables)
Example: table of mean income by age-group
SCATTERPLOT
Example: graph of income by age
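Grouping an interval/ratio variable before comparing means can be sketched as below. The age bands and cut points are an illustrative assumption, and the (age, income) pairs are invented; the same pairs are what a scatterplot would display point by point.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Assign an age to a band; the cut points are illustrative only."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


# Invented (age, income) pairs
respondents = [
    (24, 19000), (28, 23000), (35, 31000),
    (47, 37000), (52, 36000), (63, 28000),
]

by_group = defaultdict(list)
for age, income in respondents:
    by_group[age_group(age)].append(income)

mean_income_by_age_group = {g: mean(v) for g, v in by_group.items()}
```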
DATA TYPES
- Survey
- Aggregate
- Time series
- Experiment
- Event based
What is Survey Data?
Survey data come from a questionnaire administered to a number of respondents, usually a random sample of members of a population of interest. Conclusions are then drawn about that population.
Cases = Respondents
Variables = questionnaire responses
Requirements: estimation of population quantities; inference about relationships between variables
Issues to note: non-response (non-contacts) introducing bias; missing data (refusals, not-applicable); reliability and validity of measures; sampling and non-sampling errors; sample design; weighting to deal with complex design and any differential non-response
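The effect of weighting on a population estimate can be sketched with a weighted mean. The weights below are invented; in practice they would come from the survey's sample design and non-response adjustments, and variance estimation for complex designs needs more than this.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented example: the third respondent represents an under-sampled group
incomes = [20000, 30000, 40000]
weights = [1.0, 1.0, 2.0]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
```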
What is Aggregate Data?
Aggregate data involve statistics about a set of administrative, political, social or economic units.
Cases = administrative units (e.g. schools, local authority areas, general practices)
Variables = measures of characteristics of the unit - usually aggregates of individual level data such as counts and percentages
Requirements: description (e.g. ranking) and estimation; inference about relationships between variables
Issues to note: be careful about inference about individual behaviour (ecological inference); can use Geographical Information Systems (GIS) to map and manipulate spatial data
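How individual-level records become aggregate data can be sketched by counting within units and converting to percentages. The schools, the pass/fail measure and all values are hypothetical.

```python
from collections import defaultdict

# Invented individual-level records: (school, passed exam?)
pupils = [
    ("School A", True), ("School A", False), ("School A", True),
    ("School A", True), ("School B", False), ("School B", True),
]

totals = defaultdict(int)
passes = defaultdict(int)
for school, passed in pupils:
    totals[school] += 1
    passes[school] += int(passed)

# One row per administrative unit, with counts and a percentage
aggregate = {
    school: {
        "pupils": totals[school],
        "passed": passes[school],
        "percent_passed": 100 * passes[school] / totals[school],
    }
    for school in totals
}

# Description by ranking units on the aggregate measure
ranking = sorted(
    aggregate, key=lambda s: aggregate[s]["percent_passed"], reverse=True
)
```

The aggregate table describes schools, not pupils: a higher pass rate for a school says nothing certain about any individual pupil in it.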
What is Time Series Data?
Time series data involve a set of measurements on an entity of interest over time.
Cases = time points
Variables = measures
Requirements: relationships between series; behaviour of series (e.g. existence of trends, cycles, lagged relationships, effect of shocks)
Issues to note: time dependence between rows (needs special techniques such as time series econometric methods); changes in the definition of measures over time
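Two simple ways of looking at the behaviour of a series, smoothing to reveal a trend and pairing each value with an earlier one to examine lagged relationships, can be sketched as follows. The function names and the series values are assumptions for this example; a full analysis would use dedicated time series methods.

```python
def moving_average(series, window):
    """Simple moving average; smooths short-term variation to show trend."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def lagged_pairs(series, lag):
    """Pair each value with the value observed `lag` time points later."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return list(zip(series[:-lag], series[lag:]))


# Invented measurements at six consecutive time points
series = [10, 12, 11, 15, 14, 18]

trend = moving_average(series, 3)
pairs = lagged_pairs(series, 1)  # (earlier value, later value)
```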
What is Experiment Data?
Experiment data involve the random allocation of treatments to subjects. Measures of the effects of the treatment are taken together with measures of potential explanatory variables (covariates). If the subjects are randomly selected from a larger population, conclusions can be generalised to the population.
Cases = subjects
Variables = measures of treatment effects and covariates
Requirements: inference about the effect of treatments controlling for covariates and within subgroups
Issues to note: analysis must take account of experimental design
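Random allocation and the simplest treatment-effect estimate, a difference in group means, can be sketched as below. The subject labels, outcome values and function names are hypothetical, and this sketch ignores covariates and any blocking in the design, which a real analysis would need to take into account.

```python
import random
from statistics import mean


def allocate(subjects, seed=None):
    """Randomly split subjects into treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def treatment_effect(treated_outcomes, control_outcomes):
    """Difference in mean outcome between treatment and control."""
    return mean(treated_outcomes) - mean(control_outcomes)


subjects = [f"S{i}" for i in range(1, 9)]
treatment, control = allocate(subjects, seed=42)

# Invented outcome measurements for each group
effect = treatment_effect([7, 9, 8, 8], [6, 5, 7, 6])
```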
What is Event based Data?
Event based data involve observations constructed from a set of possible events that could occur (e.g. countries being at war or not; people dead or alive).
Cases = possible events
Variables = measures of the characteristics of the participants in the event and outcomes
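Constructing cases from possible events can be sketched with the war example: every pair of countries is a possible event, and the outcome records whether it occurred. The country labels and the single recorded war are invented.

```python
from itertools import combinations

# Hypothetical participants and the one event that actually occurred
countries = ["A", "B", "C", "D"]
at_war = {("A", "B")}

# One case per possible event (country pair), with a 0/1 outcome
cases = [
    {"pair": pair, "at_war": int(pair in at_war)}
    for pair in combinations(countries, 2)
]

n_possible = len(cases)
n_occurred = sum(case["at_war"] for case in cases)
```

Most possible events do not occur, so the outcome variable is typically dominated by zeros.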