Quality of Life Measurement

What are outcomes?

Traditionally, the effectiveness of a health intervention was measured by professional knowledge, by the progression of the disease or by clinimetric scales. These clinical measures are used for measuring outcomes from Randomised Controlled Trials and in calculating quality-adjusted life years (QALYs). QALYs describe the relative benefits of diverse treatments and are calculated by multiplying the number of years of life by a quality of life weight, scored on a scale from 0 (death) to 1 (full health). Examples of traditional health outcome measures are mortality rates (including hospital deaths and avoidable deaths), morbidity rates, survival periods, clinical assessments, radiological markers, laboratory tests (toxicity, biochemical markers), symptom rates, relapses and easily measured social variables such as days off work, hospital remission rates or number of bed days.
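
The QALY arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name qalys and the two treatment scenarios are hypothetical, and real QALY calculations typically involve further steps (such as discounting) that are not covered here.

```python
def qalys(periods):
    """Sum of (years lived x quality weight) over a list of periods.

    Each period is a (years, weight) pair, where weight lies on the
    0 (death) to 1 (full health) scale.
    """
    total = 0.0
    for years, weight in periods:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("quality weight must lie between 0 and 1")
        total += years * weight
    return total


# Hypothetical comparison of two treatments (made-up figures):
# treatment A gives 4 years at weight 0.75, treatment B 5 years at 0.5.
treatment_a = qalys([(4, 0.75)])
treatment_b = qalys([(5, 0.5)])
print(treatment_a, treatment_b)  # 3.0 2.5
```

In this made-up case treatment A yields more QALYs even though treatment B yields more years of life, which is the kind of comparison the measure is meant to support.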

While traditional outcome measures have a lot to offer, the way they are measured makes it difficult to make comparisons between health areas. It is also difficult to link the standard of health care directly to the outcome for groups of patients. Moreover, they rarely cover the emotional and psychological consequences of disease. The most important outcome measure is to be gained from the patients' experiences. If the aim of an intervention is to improve patients' functioning and well-being, then we need to know its impact on the patients' own perspectives of their health status and health-related quality of life.

What are health status/quality of life instruments?

Health status and quality of life are different concepts, but the terms are often used interchangeably. Health status measures physical and mental health. Quality of life incorporates people's emotional, social and physical well-being and their ability to function in the ordinary tasks of living. Both are assessed with instruments in the form of questionnaires that measure subjective health status in a meaningful and reliable way. They are NOT patient satisfaction surveys! Also note that health status/quality of life instruments are not designed to be used as substitutes for traditional measures of clinical endpoints but are intended to complement existing measures and provide a fuller picture of health.

Applications of health status instruments

Randomised Controlled Trials
Individual patient care - can be used as part of a consultation and is known to improve patient-professional interaction.
Screening patients to evaluate their needs for particular services - for example, cardiac patients have been screened for anxiety and depression and patients who scored above pre-determined cut off points were referred for specialist cardiac rehabilitation services.
Health status profiles - can be used to determine what services are needed, where they are needed and to whom they should be targeted.
Resource allocation - can be used to inform and justify rationing decisions.

Types of health status instruments

Generic

Generic instruments can be used across a wide range of illness conditions and in healthy populations. However, some, like the Nottingham Health Profile, only tap the severe end of ill health and so have limited use for minor ailments. There are generic profile measures (which measure social variations) and generic single-index measures of health status. The latter summarise the information into a single number, which can be used in cost-benefit analysis and to measure individualised quality of life, e.g. EuroQoL and the Index of Health-related Quality of Life (IHQL).

Examples include SF-36, Functional Limitations Profile, Dartmouth COOP Charts and the Nottingham Health Profile.

Dimension Specific

Dimension specific measures focus on particular aspects of health, such as psychological well-being, anxiety or depression. These are most commonly found in psychology or psychiatry for measuring psychological well-being.

Examples include the Caregiver Strain Index.

Disease Specific

Disease specific instruments are designed specifically to target the aspects of health that are relevant to a particular patient group. They are generated from interviews with patients. They have the advantage of measuring the aspects of ill health that are most salient to the disease group. It has been suggested that they are the most appropriate measures of patients' experiences, have greater clinical validity and are the most likely to be sensitive to changes that are important to patients.

Examples include PDQ-39 for Parkinson's Disease, ALSAQ-40 for motor neurone disease, the Headache Scale, DUTCH-AIMS2 for arthritis.

Designing the Instrument

Like survey questionnaires, there are a number of validation and assessment issues that have to be addressed before a tool can be used in a study.

A. Reliability

Reliability is the consistency of a measure of a concept. It deals with an indicator's dependability. It means that the information provided by the indicators (e.g. a questionnaire) does not vary as a result of the characteristics of the indicator, instrument or measurement device. Reliability is necessary for validity and easier to achieve than validity. However, having the same result over and over does not mean that what it is measuring is valid.

1. Internal (Consistency) Reliability

Do the questions in a measure assess the same underlying phenomenon? This is the most commonly used method to test reliability and it is measured using Cronbach's alpha statistic (for items with more than 2 response categories) and the Kuder-Richardson (KR-20) test (for items with 2 response categories, e.g. yes/no). Internal consistency involves testing for homogeneity, which assumes that correlations between items on a scale are not the result of random chance but reflect a real pattern in how the questions are answered. If the alpha statistic is < 0.5, this is regarded as low internal reliability (i.e. the items are not measuring the same phenomenon).
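
As a rough illustration of how Cronbach's alpha is computed, the sketch below uses the usual textbook formula: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name cronbach_alpha and the response data are made up for the example, and sample variances are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents.

    responses: one list of item scores per respondent, all of equal length.
    """
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # scores grouped by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers from 5 respondents to a 3-item scale (1 to 5).
answers = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 2))
print("low internal reliability" if alpha < 0.5 else "acceptable by the 0.5 rule")
```

The items in this made-up data set rise and fall together across respondents, so alpha comes out well above the 0.5 threshold mentioned above.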

2. Test-retest reliability

Does the measure produce the same or similar results from the same respondents if administered at different points in time? Usually the questionnaire is administered on 2 occasions separated by a few days. Ideally, responses shouldn't vary, but in health it is possible that the respondent's health status changes in the interval. This is also known as stability reliability.
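
The text does not name a statistic for test-retest reliability. One common choice, assumed here purely for illustration, is the Pearson correlation between the scores from the two administrations. The helper pearson_r and the scores below are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up scores for the same 5 respondents, a few days apart.
first_visit = [62, 70, 55, 80, 45]
second_visit = [60, 72, 58, 79, 47]
print(round(pearson_r(first_visit, second_visit), 2))
```

A value close to 1 would suggest stable scores; a lower value could reflect either an unreliable instrument or a genuine change in health between the two occasions.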

Other types of reliability include:

Representative reliability (reliability across subpopulations, usually tested in a sub-population analysis)
Equivalence reliability (used where multiple indicators of one construct are used; usually tested with a split-half method, where indicators of the same construct are divided into 2 groups, usually by a random process, and both halves are compared to see if they give similar results)
Interrater or intercoder reliability (used where there are several observers, raters or coders of information; special statistical techniques are needed to measure it)
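
The split-half method mentioned in the list above can be sketched as follows. This is an assumption-laden illustration: the items are shuffled with a fixed seed, split into two halves, and the two half-scores are compared with a Pearson correlation. The function names and the choice of correlation as the comparison are hypothetical, not taken from the text.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_correlation(responses, seed=0):
    """Randomly split the items into 2 halves and correlate the half-scores."""
    k = len(responses[0])
    order = list(range(k))
    random.Random(seed).shuffle(order)          # random allocation of items
    half_a, half_b = order[: k // 2], order[k // 2:]
    score_a = [sum(r[i] for i in half_a) for r in responses]
    score_b = [sum(r[i] for i in half_b) for r in responses]
    return pearson_r(score_a, score_b)


# Hypothetical answers from 5 respondents to a 4-item scale.
answers = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
    [5, 5, 5, 4],
]
print(round(split_half_correlation(answers), 2))
```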

B. Validity

Validity is concerned with whether or not the measurement of a concept really measures the concept. It is worth noting that when an indicator is valid, it is valid for a particular purpose or definition. It may be valid for one purpose but less valid for another.

1. Face Validity

Do the questions make sense and do they appear to be relevant for the population from which the subjects will be drawn?
This is measured by asking experts in the field and patients to read the questions and assess them in terms of ease of completion and relevance.

2. Content Validity

Is the choice of, and relative importance given to, each question appropriate for the phenomenon being measured?
This is measured by asking experts in the field and patients to read the questions and assess them in terms of relevance and whether they reflect their experience.

3. Criterion Validity

Does the measure produce results that correspond with a superior one (gold standard) or predict some future criterion value?
This is measured by comparing the results of one questionnaire with those of another, but a 'gold standard' rarely exists except in cases where the results from a short form of a questionnaire can be compared with the results of an original longer form.

4. Construct Validity

Do the results obtained confirm expected relationships or hypotheses?
This is measured by analysing the results from the questionnaire to determine whether it can differentiate between sub-groups among which one would expect to be able to differentiate (e.g. between those diagnosed with mild as opposed to severe symptoms).

Just think internal validity and external validity!

C. Responsiveness

Is the measure sensitive to change? It is essential that the instrument can detect change and that the level of this change is interpretable in some way. Responsiveness is measured by the effect size statistic. This is usually calculated by subtracting the mean score before treatment from the mean score after treatment and dividing the result by the baseline standard deviation.
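
The effect size calculation described above can be written out as a short sketch. The function name effect_size and the scores are made up, and the sample standard deviation (statistics.stdev) is assumed for the baseline.

```python
from statistics import mean, stdev


def effect_size(before, after):
    """(mean after - mean before) / standard deviation at baseline."""
    return (mean(after) - mean(before)) / stdev(before)


# Hypothetical scores on a 0 to 100 scale before and after treatment.
before = [40, 50, 60]
after = [50, 60, 70]
print(effect_size(before, after))  # 1.0
```

Here the mean rises by 10 points against a baseline standard deviation of 10, giving an effect size of 1.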

Developing a (disease specific) health status instrument

There are usually two stages: (1) development of an instrument and (2) validation

Development stage

1. Exploratory open-ended interviews with a sample of the targeted population to identify salient dimensions. The point at which no new themes emerge from the interviews determines the sample size for this study.

2. Focus groups are then undertaken to further explore the issues raised in stage 1.

3. Interviews and focus groups are transcribed and scrutinised. Statements are re-cast as questionnaire items and the questionnaire is pre-tested with health care researchers and a sample of the population in cognitive interviews.

4. A long form standardised questionnaire is developed from the previous stages and piloted. Face validity is assessed in a small sample before a wider sample is used for the pilot.

5. A shorter form version of the questionnaire is developed (from the responses to stage 4). Factor analysis is usually employed to indicate which items are irrelevant, and Cronbach's alpha statistic is used to assess the internal consistency of the instrument.

Validation

1. Shorter form version is sent out in a survey of the targeted population. Often it is accompanied by a pre-existing generic or other disease specific health status instrument, also to be completed by respondents.

2. Confirmatory factor analysis is usually undertaken to ensure statistical and psychometric qualities of the questionnaire remain constant. Internal reliability is also assessed. Any final minor amendments to the instrument and its scoring algorithms are done at this stage.

3. Results can be reported.

What is the SF-36?

The Short Form 36 (SF-36) is a generic health status instrument. It is a 36-item questionnaire that measures 8 areas of functioning:

Physical Functioning (10 items)
Social Functioning (2 items)
Role limitations due to physical problems (4 items)
Role limitations due to emotional problems (3 items)
Mental health (5 items)
Energy/vitality (4 items)
Pain (2 items)
General health perception (5 items)

It is completed by the individuals themselves rather than by health professionals on their behalf. It takes 5 to 10 minutes to complete. Scores are put on a scale from 0 (poor health) to 100 (good health). Scores can be reported as 8 dimension scores or they can be combined into 2 summary scores (physical component and mental health). The summary scores are relatively simple to compute and using them reduces the likelihood of chance statistical findings. There is a shortened version - the SF-12. Both versions have been widely validated in a range of areas that are affected by illness. They are accurate, reliable, responsive (i.e. sensitive to change) and acceptable to patients (high response rates). They are also easy to interpret and feasible to use.
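
Putting a raw dimension score onto a 0 to 100 scale is usually a linear rescaling of the raw score between its lowest and highest possible values. The sketch below shows only that step. The function name to_0_100 and the example scale are hypothetical, and the official SF-36 scoring (including item recoding) should be taken from the instrument's own manual.

```python
def to_0_100(raw, lowest, highest):
    """Rescale a raw score linearly so that lowest -> 0 and highest -> 100."""
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100


# Hypothetical dimension with 10 items, each scored 1 to 3,
# so raw totals range from 10 to 30.
print(to_0_100(25, 10, 30))  # 75.0
```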

Usefulness of SF-36 for health services

1. Good for measuring outcomes

The SF-36 is a validated tool that measures patients' perspectives. It can be used now and is applicable to a wide range of populations. It can be used to identify groups with worse health and it can be used by public health in health needs assessments. This is because it relates to salient issues for the population such as age, gender, socio-economic status, ethnicity and place of residence. Information from the SF-36 in large regional samples provides comparison data from which to judge difficulties associated with a particular disease or treatment (e.g. comparing women referred for breast reduction surgery with population norms on the SF-36 demonstrated that they experienced significant levels of pain, which was resolved by surgery). The SF-36 has also been used in Health & Lifestyle surveys in the UK.

2. Useful to Providers

Generic health status measures like the SF-36 provide a unique patient perspective on disease and its impact. There is evidence that health professionals are poor judges of how aspects of life quality are affected by illness and treatment for individual patients. The SF-36 can complement clinical evaluation and may improve professional-patient communication. Evaluation of the SF-36 may guide the health professional and patient in selecting the most appropriate option or service for the patient (particularly for chronic conditions). It may even improve treatment compliance. Providers are also encouraged because it helps show that they are making a difference (e.g. achieving outcomes).

3. Useful for Commissioning

The SF-36 can be used to measure the anticipated outcomes from services (i.e. improvement in quality of life). It can also be used to show whether commissioners are getting the outcomes they would like and to predict those most likely to incur high future medical costs. The SF-36 can help with future commissioning decisions (e.g. change provider or change intervention) and can help show whether the commissioner is getting sustainable, high-quality, cost-effective care.
