Seven steps for successful data mining

Dr Shane Gordon offers tips on how to avoid data overload and dig out the most important stats

1. Set out your agenda before you get started

When starting on a data collecting project you need to clarify why you are doing it. Is it to demonstrate commissioning intentions, redesign services or is it for a business case?

The biggest pitfall is getting lost in all the data you accumulate. Without a clear goal you risk becoming confused and not completing the task.

2. Decide who your audience is

The next step is to consider who you are collecting the data for. Is it PBC colleagues, your PCT or your own practice?

A finance director will be less interested in the finer points of disease prevalence and much more interested in bigger financial issues such as risk, while colleagues will want to know more about the clinical aspects.

3. Consider what population you're interested in

Your population data must be relevant to your question.

Factors to consider are:

• geography
• age
• medical conditions
• routes of presentation
• time period.

Make yourself aware of trends and changes in coding. The most common one to be aware of is the change from Finished Consultant Episode to Spells in 2006 – this resulted in significant increases in the number of items appearing in the hospital data (SUS data).

4. Frame your question

You should now know what you are asking, for which population and who your audience is.

When you frame the question you need to consider if it is top-down or bottom-up. For example, 'What's the biggest cause of admission to hospital for my cluster?' is different from 'How many patients could benefit from falls prevention services?' Again, think about your outcomes and audience when deciding your approach.

If you frame the question well it will set you on the path to getting the right data. You might go from the specific to the general, for instance: how many X happened last year? Or the other way: what was the most common preventable admission last year?

5. Break the question down into sub-questions

If the question you are asking is complex you may need to break it down.

You should also consider whether you need complete data or whether a snapshot or subset will illustrate the case in sufficient depth.

The worked example under 'Putting the steps into practice' shows how I mined data to find the most costly preventable cause of admissions in my practice the previous year.

I broke this main question down into the following:

• What was the highest-cost route of admission?
• What was the highest-cost specialty for that route?
• What was the highest-cost diagnosis in that specialty?
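
The three sub-questions above amount to a drill-down: total the cost at one level, pick the largest group, then repeat inside that group. The Python sketch below shows one way this might look. The records, field names and function names are all invented for illustration and are not taken from SUS or any real tool.

```python
from collections import defaultdict

# Hypothetical admission records: (route, specialty, diagnosis, cost in pounds).
# These rows are invented for illustration and are not the practice's data.
ADMISSIONS = [
    ("emergency", "general medicine", "COPD", 1200),
    ("emergency", "general medicine", "MI", 900),
    ("emergency", "trauma and orthopaedics", "fracture of femur", 2500),
    ("emergency", "trauma and orthopaedics", "fracture of femur", 2300),
    ("emergency", "trauma and orthopaedics", "wrist fracture", 800),
    ("elective", "trauma and orthopaedics", "hip replacement", 3000),
    ("elective", "ophthalmology", "cataract", 700),
]

# Position of each field within a record.
FIELDS = {"route": 0, "specialty": 1, "diagnosis": 2}
COST = 3


def highest_cost(records, field):
    """Return (value, total_cost) for the costliest value of `field`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[FIELDS[field]]] += record[COST]
    return max(totals.items(), key=lambda item: item[1])


def drill_down(records, fields=("route", "specialty", "diagnosis")):
    """Answer each sub-question in turn, narrowing the records each time."""
    path = []
    for field in fields:
        value, total = highest_cost(records, field)
        path.append((field, value, total))
        # Keep only the records in the winning group for the next question.
        records = [r for r in records if r[FIELDS[field]] == value]
    return path


for field, value, total in drill_down(ADMISSIONS):
    print(f"Highest-cost {field}: {value} (£{total:,})")
```

A strict drill-down like this only follows the largest group at each level, so it can miss a costly diagnosis that sits in a smaller specialty. It may be worth re-running the totals on a single field, such as diagnosis alone, as a cross-check.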

6. Decide what data is relevant

Once you start to become clear about the outcome you want, consider the scope of the data you should collect – national or local?

Sometimes this will depend on the type of data you need or what is available.

National or regional data

• Benchmarking
• Morbidity and mortality
• Incidence and prevalence
• Demographics

PCT level

• Benchmarking
• Activity and cost
• Waiting times
• Public health
• Staff costs

Practice level

• Patient data

The panel below gives a summary of all data resources available and further information on these different categories.

7. SUS data is the PBC golden egg

SUS data is a rich source of information for practice-based commissioners because it shows how patients reach hospital (out-of-hours, A&E attendance) and what happens to them after they are admitted.

PCTs have SUS data – but you have to rely on them to give it to you.

There are various third party resources for SUS data – for instance, Dr Foster and Sollis – but you have to pay for it.

My consortium has purchased a tool to access SUS data and the licence we have covers an unlimited number of users. This costs 10p per patient in the consortium per year.
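
As a rough guide to what a per-patient licence means for a budget, the arithmetic can be sketched in a few lines of Python. The 10p rate is the one quoted above. The consortium sizes and the function name are hypothetical, chosen only to show the calculation.

```python
PENCE_PER_PATIENT_PER_YEAR = 10  # rate quoted in the article


def annual_licence_cost_pounds(registered_patients):
    """Annual licence cost in pounds for a consortium of the given size."""
    if registered_patients < 0:
        raise ValueError("registered_patients must not be negative")
    return registered_patients * PENCE_PER_PATIENT_PER_YEAR / 100


# Hypothetical consortium sizes, for illustration only.
for patients in (50_000, 100_000, 200_000):
    cost = annual_licence_cost_pounds(patients)
    print(f"{patients:,} patients: £{cost:,.2f} per year")
```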

The box details some of the other data resources available.

With the exception of SUS data, all are in the public domain or on NHSNet, and you don't need to go through your PCT to get them.

Data is vital for your business planning and I would advise you to develop someone in your organisation who specialises in it. Without competency in handling data, you will find it very difficult to make your case for commissioning or providing services. It might be perceived as nerdy, but data can effect real change.

Dr Shane Gordon is a GP in Tiptree, Essex, chief executive of the Colchester PBC Group and national co-lead for the NHS Alliance's PBC Federation

Putting the steps into practice

How Dr Gordon used the seven-step method to identify the best new service in which to invest his practice's freed-up resources

Steps 1, 2 and 3


This year, our practice has some freed-up resources, so we needed to decide how to spend them.

The first three steps were easily tackled by:
• setting out our goal – investing freed-up resources
• identifying our audience – our own practice
• identifying our population – patients from our list admitted to hospital.

Step 4


With the above agenda in mind, I framed my question to be: What is the most costly preventable cause of admissions?

Step 5


The above question was complex, so I broke it down into three parts:

• admission – the route by which patients were admitted, the specialty and who admitted them
• cause – single episode, pathology or aetiology
• whether it was preventable.

Step 6


So what data would be relevant to these questions?

• HRG data (Sollis)
• Benchmarking data
• Prevalence data
• Practice data

Step 7


From HRG admission-by-route data, results for the 2007/8 financial year looked like this:

• Total admissions – £2,860,251 of which £1,679,661 were non-elective (NEL) and £1,180,591 were elective.

I broke it down into HRG admission by type and found emergencies accounted for £1,486,839 of the total. Admissions for general medicine and geriatrics were high but nothing else stood out.

I next looked at the primary diagnosis for general medicine emergency admissions, which showed an even spread of cases in COPD, MI, gangrene, alcoholic liver disease and urinary system disorders. The cost for each diagnosis wasn't more than about £25,000 in total, so it didn't give an obvious target to attack.

I backtracked and looked at all NEL admissions by diagnosis and found, surprisingly, that fracture of the femur was the highest – accounting for £66,074 of the total £1,679,661.
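
To put these cost figures in proportion, it can help to express each as a share of its parent total. The Python sketch below uses only the figures quoted above. The variable and function names are my own, and the note on the £1 difference is an assumption about rounding rather than something the data confirms.

```python
# Figures quoted in the text for the 2007/8 financial year, in pounds.
TOTAL_ADMISSIONS = 2_860_251
NON_ELECTIVE = 1_679_661
ELECTIVE = 1_180_591
EMERGENCY = 1_486_839
FRACTURE_OF_FEMUR = 66_074


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"Non-elective share of all admissions: "
      f"{share(NON_ELECTIVE, TOTAL_ADMISSIONS):.1f}%")
print(f"Emergency share of all admissions: "
      f"{share(EMERGENCY, TOTAL_ADMISSIONS):.1f}%")
print(f"Fracture of femur share of non-elective: "
      f"{share(FRACTURE_OF_FEMUR, NON_ELECTIVE):.1f}%")

# The two components differ from the quoted total by £1,
# presumably because each figure was rounded separately (an assumption).
print(f"Components minus total: £{NON_ELECTIVE + ELECTIVE - TOTAL_ADMISSIONS}")
```

On these figures, even the highest-cost diagnosis is a small percentage of non-elective spend, which is consistent with no single target standing out in the earlier breakdowns.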

I then did a neck of femur search by age, which looked like this:

All age bands £66,074
85+ £26,055
75-84 £21,026
65-74 £7,566
45-64 £6,087
15-44 £5,340


This supported my initial suspicion that these were osteoporosis-related fractures. So I looked at trauma and orthopaedics (T&O) emergency admissions broken down by primary diagnosis. This revealed further fractures possibly related to osteoporosis – neck, wrist and other vertebral. Again the age bands supported this theory.
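
The age-band table for fracture of the femur can be checked and summarised with a short Python sketch. The costs are the ones quoted in the table. The dictionary layout and function name are my own, and grouping the bands at 65 and 75 is an illustrative choice, not a clinical threshold from the source.

```python
# Neck of femur admission costs by age band, as quoted in the text (pounds).
COST_BY_AGE_BAND = {
    "85+": 26_055,
    "75-84": 21_026,
    "65-74": 7_566,
    "45-64": 6_087,
    "15-44": 5_340,
}
ALL_AGE_BANDS_TOTAL = 66_074

# Check that the individual bands add up to the quoted total.
assert sum(COST_BY_AGE_BAND.values()) == ALL_AGE_BANDS_TOTAL


def share_of_total(bands):
    """Percentage of the total cost accounted for by the given age bands."""
    band_cost = sum(COST_BY_AGE_BAND[band] for band in bands)
    return 100 * band_cost / ALL_AGE_BANDS_TOTAL


print(f"Aged 75 and over: {share_of_total(['85+', '75-84']):.0f}% of cost")
print(f"Aged 65 and over: {share_of_total(['85+', '75-84', '65-74']):.0f}% of cost")
```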

I then benchmarked my practice for emergency NEL admissions using NHS comparators. Looking at all NEL per 1,000 patients, my practice was below the national, SHA and PCT benchmark, so we were doing pretty well.

However, when I looked at the benchmark for T&O, we were not so good, hitting the national benchmark and overshooting the PCT standard.
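
Benchmarking of this kind rests on converting raw admission counts into a rate per 1,000 registered patients, so that practices of different sizes can be compared. The Python sketch below shows the idea. Every number in it is hypothetical, as are the function names, and it is not how NHS Comparators itself calculates its figures.

```python
def rate_per_1000(admissions, list_size):
    """Admissions per 1,000 registered patients."""
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    return 1000 * admissions / list_size


def compare_to_benchmarks(practice_rate, benchmarks):
    """Label the practice rate as below, at or above each benchmark."""
    result = {}
    for name, benchmark in benchmarks.items():
        if practice_rate < benchmark:
            result[name] = "below"
        elif practice_rate > benchmark:
            result[name] = "above"
        else:
            result[name] = "at"
    return result


# Hypothetical figures for illustration only; not the practice's data.
practice_rate = rate_per_1000(admissions=90, list_size=10_000)
benchmarks = {"national": 9.0, "SHA": 9.5, "PCT": 8.0}

print(f"Practice rate: {practice_rate:.1f} per 1,000")
print(compare_to_benchmarks(practice_rate, benchmarks))
```

With real data, small differences from a benchmark may simply reflect year-to-year variation in a single practice, so an exact "at" match is less informative than the direction and size of the gap.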

The Office for National Statistics gave me the following information on the trends in hospital admissions for fractures in England, 1989/90 to 1997/8:

• 70% of fractures in females over 40 are osteoporosis related.
• Osteoporosis gives a 40% lifetime risk of hip, wrist or spine fracture.
• One in two women and one in eight men over 50 will suffer an osteoporosis-related fracture.

An audit of practice data showed our coding of fractures and falls was good but that our prescribing of bisphosphonates and calcium with vitamin D could have been better. Our village is a long way from the centrally provided exercise classes run by the Falls Prevention Service and the ONS gave us useful figures highlighting the increased risk of osteoporosis-related fractures in women and older people.

This data mining allowed us to develop a robust business case for local falls prevention services, bone scans in the village for at-risk women and a drive to increase the uptake of bisphosphonates, calcium and vitamin D.
