Document Analysis


What is document analysis?

Document analysis, also called documentary analysis, is a social research method. It is an important research tool in its own right and an invaluable part of most schemes of triangulation. It refers to the various procedures involved in analyzing and interpreting data generated from the examination of documents and records relevant to a particular study. In practice, documentary work involves reading a large amount of written material (it helps to scan the documents onto a computer and use a qualitative analysis package). A document is something that we can read and which relates to some aspect of the social world. Official documents are intended to be read as objective statements of fact, but they are themselves socially produced.

How does document analysis work in public health?

Documentary analysis has become quite popular within public health research, especially for evaluating the impact of an initiative: for example, a committee-led venture to increase immunisation uptake in an area, or a board-led approach to reduce sexual ill-health or increase physical activity during a major event like the Olympics or Rugby World Cup. In this situation, you could take a 'qualitative' approach, utilising what is known as a 'realist viewpoint'. This involves establishing an a priori set of criteria to investigate, whilst allowing the analysis to be guided by the data that emerge from familiarisation with the borough plans' material. Data would be extracted relating to pre-agreed named terms covering the scope and scale of action plans, and perhaps evidence of governance arrangements and minutes of the groups used to deliver the plans. To quantify the process, the number and frequency of meetings and email exchanges may be included. This approach may then be supported with follow-up interviews or surveys of the parties involved in delivering the plans.

Sources of Documents:

  1. Public records
  2. The media
  3. Private papers
  4. Biography
  5. Visual documents
  6. Minutes of meetings (plus emails etc which indicate the frequency of those meetings - that can help to quantify the process - and governance arrangements)
  7. Strategies, policies, action plans by public bodies or organisations

The term 'biography' has two meanings in social research. Firstly, it is a particular style of interviewing, where the informant is encouraged to describe how his or her life (or some aspect of it) has changed and developed over time. In doing so, informants reflect their own conception of self, identity and personal history. Secondly, 'biography' refers to a work that draws on whatever materials are available to an author to present an account of a person's life and achievements. Narrative analysis, a form of analysis used for chronologically told stories, is used to elicit results. It focuses on how elements are sequenced, why some elements are evaluated differently from others, how the past shapes perceptions of the present, how the present shapes perceptions of the past and how both shape perceptions of the future. It is especially used in feminist research.

Types of Analysis


  • Content analysis
  • Semiotics
  • Discourse analysis
  • Interpretative analysis
  • Conversation analysis
  • Grounded theory

Content Analysis

Content analysis is like a social survey but uses a sample of texts or images rather than people. It is a technique for gathering and analyzing the content of text. Generally speaking, it consists of the following steps:

  1. Choose a question which can be measured with variables.
  2. Devise your unit of analysis (amount of text that's assigned a code - e.g. each daily newspaper could be a unit) and design your code book. 
  3. Make a sampling frame, choosing cases to analyse that are representative and unbiased. To get a sampling frame, search for relevant cases in contemporary or historical archives. The sample has to be representative, yet small enough to analyse in depth. You define your population (which can be words, sentences, paragraphs or all articles in a certain period of time) and sampling element. Very often you are counting words - e.g. how many times does the word 'hooligan' appear in articles that sensationalize disturbances at football matches?
  4. Code all the cases and analyze the resulting data.
  5. Produce semi-quantitative results using cross-tabulations, charts or graphs and where there are few cases, use tables.
  6. Report in a standard 'scientific' format.
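
As a rough illustration of steps 2 to 5, the sketch below counts code-book terms in each unit of analysis and totals them per newspaper. The articles, the newspaper names and the code book are all invented for the example, and simple word matching is only one of many possible coding rules.

```python
import re
from collections import Counter

# Hypothetical sample: each unit of analysis is one newspaper article.
articles = [
    {"paper": "Daily A", "text": "Hooligan gangs clashed. Police blamed one hooligan."},
    {"paper": "Daily A", "text": "The match ended quietly and fans went home."},
    {"paper": "Daily B", "text": "A hooligan element spoiled the derby."},
]

# Hypothetical code book: code name -> words that count towards that code.
code_book = {
    "hooligan": {"hooligan", "hooligans"},
    "police": {"police"},
}


def code_unit(text, code_book):
    """Count how often each code's words occur in one unit of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for code, terms in code_book.items():
            if word in terms:
                counts[code] += 1
    return counts


def cross_tabulate(articles, code_book):
    """Total the code counts per newspaper (a simple cross-tabulation)."""
    table = {}
    for article in articles:
        row = table.setdefault(article["paper"], Counter())
        row.update(code_unit(article["text"], code_book))
    return table


table = cross_tabulate(articles, code_book)
for paper in sorted(table):
    print(paper, dict(table[paper]))
```

With a real sample, the resulting table is what you would turn into the charts, graphs or tables described in step 5.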

This coding is sometimes known as 'manifest coding' and measures 4 characteristics:

  1. Frequency  - e.g. how many times is the subject, phrase or word mentioned?
  2. Direction - i.e. the direction of messages in the content along some continuum - e.g. positive, negative.
  3. Intensity - i.e. strength or power of a message in a direction.
  4. Space - i.e. size of space on a newspaper page, time on television, placement in social media.

You can also have 'latent coding' (which is predominantly used in semantic analysis below). This looks at the underlying, implicit meaning in the content of the text. You will need a codebook in advance with rules on what to interpret. It is less reliable than 'manifest coding', as it relies on the coder's knowledge of language and social meaning.
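
One way to record the four manifest characteristics listed above is as one row per coded unit. The sketch below is only an illustration: the field names, the -1/0/+1 direction scale, the 1 to 3 intensity scale and the sample values are all assumptions, not a standard scheme.

```python
from dataclasses import dataclass


@dataclass
class CodedUnit:
    unit_id: str
    frequency: int    # how many times the subject is mentioned
    direction: int    # assumed scale: -1 negative, 0 neutral, +1 positive
    intensity: int    # assumed scale: 1 (weak) to 3 (strong)
    space_cm2: float  # area the item occupies on the page


def summarise(units):
    """Aggregate the four characteristics across all coded units."""
    return {
        "total_mentions": sum(u.frequency for u in units),
        # Direction weighted by intensity: negative means coverage leans negative.
        "net_direction": sum(u.direction * u.intensity for u in units),
        "total_space_cm2": sum(u.space_cm2 for u in units),
    }


# Invented sample data for three newspaper items.
units = [
    CodedUnit("A1", frequency=3, direction=-1, intensity=2, space_cm2=120.0),
    CodedUnit("A2", frequency=1, direction=1, intensity=1, space_cm2=40.0),
    CodedUnit("B1", frequency=2, direction=-1, intensity=3, space_cm2=200.0),
]

summary = summarise(units)
print(summary)
```

Weighting direction by intensity is a design choice made for this sketch; a study could equally report the two characteristics separately.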

Content analysis is formal and systematic. It lends structure to your research. Variables are categorised in a precise manner so that you can count them, and intercoder reliability is commonly reported with the results of content analysis studies. However, content analysis ignores context and multiple meanings.
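
Intercoder reliability can be reported with several statistics. The sketch below uses Cohen's kappa for two coders, which is one common choice and an assumption here, since the text does not name a particular measure. Kappa compares the observed agreement with the agreement expected by chance; the coder labels are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units.")
    n = len(coder_a)
    # Proportion of units on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Invented codes for eight units, e.g. the direction of each article.
coder_a = ["neg", "neg", "pos", "pos", "neg", "pos", "neg", "pos"]
coder_b = ["neg", "neg", "pos", "neg", "neg", "pos", "pos", "pos"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

Here the coders agree on 6 of 8 units (0.75) and chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.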


Semiotics

Semiotics is a science that studies the life of signs in society. It is the opposite of the positivist method of content analysis. It is used a lot in media analysis.

In semiotics, the analyst seeks to connect the signifier (an expression which can be words, a picture or sound) with what is signified (another word, description or image). The use of language is noted as it is considered to be a description of actions. As part of language, certain signs match up with certain meanings. Semiotics seeks to understand the underlying messages in visual texts. It is related to discourse analysis and forms the basis for interpretative analysis.

Discourse Analysis

This is concerned with the production of meaning through talk and texts. Language itself is viewed as the topic of the research: what matters is how people use language to construct their accounts of the social world.

Interpretative Analysis

This aims to capture hidden meaning and ambiguity. It looks at how messages are encoded, latent or hidden. You are also acutely aware of who the audience is.

Conversation Analysis

This is concerned with the underlying structures of talk in interaction and with the achievement of interaction.

Grounded Theory

This is inductive, interpretative and can be social constructionist. The central focus is on inductively generating novel theoretical ideas or hypotheses from the data. These new theories arise out of the data and are supported by the data, so they are said to be grounded.

Evaluation and Interpretation


Is the document genuine, complete, reliable and of unquestioned authorship?


Is the document free from error or distortion?


Can the documents available be said to constitute a representative sample of the documents that originally existed?


What is the surface meaning? Is there a deeper/semiotic meaning?

Further reading

Robson, C. Real World Research. 3rd edition. Chichester, Wiley: 2011.

Ritchie, J, Lewis, J (eds). Qualitative Research Practice. London: 2003.

Berger A. Media Analysis Techniques. The Sage Commtext Series, Newbury Park: 1991.

Bryman A. Social Research Methods. Oxford University Press:2001. See chapters 17-19.

Gibbs G. Qualitative Data Analysis: Explorations with NVivo. Open University Press: 2002.

Leedy, P. Practical Research: Planning and Design. 6th edition. Merrill, New Jersey: 1997.

Seale, C. Researching Society and Culture. Sage:2001. See chapters 18 - 21.

Wimmer, R.D. & Dominick, J. R. Mass Media Research: An Introduction. Belmont:1983.

And an example of where I've used it:

Heffernan C. 2001. "The Irish media and the lack of public debate on new reproductive technologies (NRTs) in Ireland", Health, 5 (3):355-371.