Canadian Partnership Against Cancer, Toronto, Canada.
There are few topics in medical science that arouse as much controversy – and as much passion – as the role of mammography in breast cancer screening. Part of the reason for continued debate lies within the complexity of the audiences for any recommendations made. The recent emphasis has been upon the individual weighing personal benefit and risk. Public health recommendations, however, are based upon the overall population-based estimates of risk and benefit. In particular, population-based screening programs and public messaging must, by definition, come to conclusions about the best course of action based on a weighting of the risks and benefits for ‘average’ women in a specific population. This overview provides a window into the current analyses of the risks and benefits of mammography screening, and of the impact of these debates on considerations of programmatic screening. It examines current information on the perceived benefits and risks, the recent move towards individualised decisions about risks and benefits, and the role of public health messaging and population-based programs within this context.
There are few topics in medical science that arouse as much controversy – and as much passion – as the role of mammography in breast cancer screening. Population-based recommendations have been in place for over 25 years; population-based programs were launched in the UK in 1988,1 and subsequently in several countries.2 With eight major randomised controlled trials dedicated to studying the efficacy of screening, most of which consistently found benefit in at least a subset of their participants, it is perhaps surprising to find that this degree of controversy still exists.
Part of the reason for continued debate lies within the complexity of the audiences for any recommendations made. The recent emphasis has been upon the individual weighing personal benefit and risk, and several excellent reviews exist that help to inform this.1,3-5 Public health recommendations, however, are based upon the overall population-based estimates of risk and benefit. In particular, population-based screening programs, and public messaging, must by definition come to conclusions about the best course of action, based on a weighting of the risks and benefits for ‘average’ women in a specific population. Thus, there have also been recent reviews designed to assess whether current breast screening programs should be continued, and these have not always come to the same conclusion.1,6
This overview provides a window into the current analyses of the risks and benefits of mammography screening, and of the impact of these debates on considerations of programmatic screening. It will examine current information on the perceived benefits and risks of mammography screening, the recent move towards individualised decisions about risks and benefits, and the role of public health messaging and population-based programs within this context.
Results of mammography screening studies were first published decades ago, and a notable landmark was the publication of the results of the Shapiro study in 1977.7 This US study enrolled women aged 40 to 64, and used annual mammography screening and physical examination as its screening interventions. When the study found a significant reduction in breast cancer mortality, it was understandable that one result was population-based recommendations in the US for annual screening of women over 40.
Subsequent studies, however, began to show a difference in results between women under 50 years of age and those over 50, and several used intervals of two years or more between screens.8 Thus, outside the US, most recommendations for population-based screening put forward in the late 1980s and early 1990s targeted women aged 50 to 69, with screening intervals of two years or more.9 Many population-based programs used the Wilson-Jungner criteria to determine the value of instituting screening at a population level; these criteria were of more interest in constituencies with publicly funded health care systems than in the US.10 In these jurisdictions, the focus was on the sustainability of providing mammography to a large segment of the population, on assuring at least the same technical and interpretive quality as in the positive randomised trials, and on minimising the potential harm of false-positive results. In an effort to maximise the population impact of mammography screening, many programs set goals for participation rates, and used a combination of invitation letters and public health messages to encourage participation.
Several recent meta-analyses and reviews have attempted to quantify the potential benefits of mammography. An independent review of the UK Breast Screening program concentrated on women aged 50 to 70 years, and arrived at a best estimate of a 20% reduction in breast cancer mortality with mammography in this age group.1 The Canadian Task Force on Preventive Health Care arrived at relative risk estimates of breast cancer death of 0.85, 0.79 and 0.68 for women in their forties, fifties and sixties, respectively.11 These are similar to the estimates from a 2009 meta-analysis for the US Preventive Services Task Force (USPSTF), although the latter found slightly less benefit for women in their fifties (point estimate of relative risk 0.86).
All of the recent reviews are limited by the age of the studies (their start dates range from 1963 to 1982), and the meta-analyses have varied in which studies they included, largely because of different interpretations of the robustness of various study designs. Notably, a Cochrane review included only three of the eight available studies, and arrived at a relative risk estimate for breast cancer mortality with screening of 0.90, which was not statistically significant.12 However, when the most recent Cochrane review included the studies that most other review groups found eligible, its estimate of breast cancer mortality benefit was significant and consistent with the results of other meta-analyses (RR 0.81). Alongside this finding, the authors proposed that breast cancer mortality was an unreliable outcome, biased in favour of screening mainly because of differential misclassification of cause of death.13
While the USPSTF found similar relative risks for women in their forties compared to those in their fifties, they arrived at different recommendations for these two age groups. They recommended biennial mammography for women aged 50 to 74, while advising against routine mammography for women in their forties, stating that in this age group, women’s own risks and preferences should be taken into account. This is consistent with the recent Canadian recommendations.
In arriving at these distinctions, the USPSTF noted that although the relative risks were similar in the two age groups, the difference in absolute breast cancer risk produced quite different profiles of overall risk reduction.14 They estimated that 1904 women in their forties would need to be invited to screening to avert one breast cancer death, compared with 1339 women in their fifties and 337 women in their sixties. The Canadian analysis estimated the number needed to screen to avert one breast cancer death at 2108 in the 40-49 year age group and 721 in the 50-69 year age group.11 Again, these results are relatively consistent, and support the current differences in recommendations for the different age groups.
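Why similar relative risks can give very different numbers needed to invite follows from the standard relation NNI = 1 / ARR. Here the absolute risk reduction ARR is the baseline risk of breast cancer death multiplied by (1 − RR). The Python sketch below is illustrative only. The function name `number_needed_to_invite` and the baseline risks are hypothetical. The relative risk of 0.85 is applied to both scenarios purely to isolate the effect of baseline risk. The published figures above come from the task forces' own models, not from this formula.

```python
def number_needed_to_invite(baseline_risk: float, relative_risk: float) -> float:
    """Number of women invited to screening per breast cancer death averted.

    baseline_risk: hypothetical cumulative risk of breast cancer death without
        screening over the follow-up period, as a proportion (e.g. 0.002).
    relative_risk: risk of breast cancer death with screening divided by the
        risk without screening (e.g. 0.85).
    """
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if not 0 < relative_risk < 1:
        raise ValueError("relative_risk must be below 1 for screening to show a benefit")
    # Absolute risk reduction: the part of the baseline risk removed by screening.
    absolute_risk_reduction = baseline_risk * (1 - relative_risk)
    return 1 / absolute_risk_reduction


if __name__ == "__main__":
    # Hypothetical baseline risks: the same relative risk applied to a doubled
    # baseline risk halves the number of women who must be invited.
    for label, risk in [("baseline risk 0.2%", 0.002), ("baseline risk 0.4%", 0.004)]:
        nni = number_needed_to_invite(risk, 0.85)
        print(f"{label}: about {nni:.0f} women invited per death averted")
```

With these made-up inputs the sketch gives about 3333 and 1667 women invited per death averted. This mirrors the pattern in the USPSTF and Canadian estimates: similar relative risks, but very different absolute benefit.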
Studies of the potential harms associated with mammographic screening fall into two principal areas. Early estimates of harms focused on the potential detrimental effects of false positive results, which could generate additional imaging, consultation and potentially unnecessary benign biopsies. More recently, the potential for ‘overdiagnosis’, generally defined as the discovery of a cancer that would not have become symptomatic or problematic in the absence of screening, has been a focus of study. Overdiagnosis has potential to cause harm via the psychosocial issues associated with the diagnosis, the need to undergo further investigation, and the impact of associated overtreatment.
The frequency of false positives is very sensitive to the practice setting in which screening occurs. A study by Hubbard et al. that contributed to risk estimates in the reviews found that, over 10 years of screening, 61.3% of women could expect to be recalled for additional tests if screening was annual, and 41.6% if it was biennial.15 However, these estimates were based on an abnormality recall rate of 16.3% at the first visit and 9.6% at subsequent mammograms. These recall rates are substantially higher than the target and reported abnormality rates within organised screening programs. In the Canadian breast screening programs, reporting on over two million screens done in 2007 and 2008, the abnormal recall rate for women aged 50 to 69 years was 12.6% at the first screening mammogram, and 6.0% for subsequent mammograms.16 In Australia, for women aged 50–69 years, 12.2% of women screened for the first time were recalled to assessment, while 4.0% attending subsequent screens were recalled.17 The UK screening program reports a recall rate of only 4%.18 Thus, in the programmatic context, the risk of an abnormal mammogram can be reduced, which would also dramatically reduce the cumulative risk of being recalled for investigation of an abnormality over 10 years.
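How per-screen recall rates translate into cumulative risk can be approximated by assuming that recall at each screen is independent of the others. Under that assumption, the chance of at least one recall is 1 − (1 − p_first)(1 − p_subsequent)^(n − 1). This independence assumption is a simplification and not the method Hubbard et al. used. The function name is hypothetical. Five biennial screens over 10 years are assumed for both program scenarios for comparability, even though actual program intervals may differ.

```python
def cumulative_recall_probability(first_rate: float, subsequent_rate: float,
                                  n_screens: int) -> float:
    """Chance of at least one recall over n_screens screening rounds.

    Simplifying assumption: recall at each round is independent of the others.
    """
    if n_screens < 1:
        raise ValueError("n_screens must be at least 1")
    # Probability of passing every round without a recall.
    p_never_recalled = (1 - first_rate) * (1 - subsequent_rate) ** (n_screens - 1)
    return 1 - p_never_recalled


if __name__ == "__main__":
    # Per-screen recall rates quoted in the text; screen counts cover 10 years.
    scenarios = [
        ("Hubbard et al. rates, annual", 0.163, 0.096, 10),
        ("Hubbard et al. rates, biennial", 0.163, 0.096, 5),
        ("Canadian programs, biennial (assumed)", 0.126, 0.060, 5),
        ("Australian program, biennial (assumed)", 0.122, 0.040, 5),
    ]
    for label, first, later, n in scenarios:
        risk = cumulative_recall_probability(first, later, n)
        print(f"{label}: {risk:.1%} recalled at least once")
```

Under this approximation, the Hubbard et al. rates give roughly 66% (annual) and 44% (biennial). These are somewhat above their published 61.3% and 41.6%, so the absolute figures should not be over-read. The point is the relative drop, to roughly 32% and 25% with the Canadian and Australian program rates.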
Studies examining the sensitivity and specificity of mammography in relation to reading volume (i.e. the number of mammographic studies a radiologist assesses in a year) have pointed to the need to focus on the optimisation of mammography as one route to maximising benefit while minimising risk,19 and note that variability in recall rates among radiologists must be taken into account when calculating false-positive rates.20 However, this variability is largely ignored in the meta-analyses, and it would be valuable to have more realistic estimates of risk based on current organised program results, for comparison with the commonly used estimates derived from other practice cohorts.
The majority of abnormal screening results are resolved with further imaging, such as ultrasound, but the potential for unnecessary biopsies exists and must be minimised. Again, these rates may vary by practice. In the Hubbard et al. analysis, a false-positive biopsy rate of over 3% was reported for the first visit in women aged 50 to 59.15 However, in Canadian programs in 2007 and 2008, the rate of biopsy with a non-malignant result was 1.83% for women aged 50 to 69,16 while the UK screening program reports a benign biopsy rate of only 0.05% (0.5 per thousand).18 Programmatic attention to reporting these rates, and to acting to minimise them, can help limit the negative impact of screening.
The most provocative issue in recent years has been the potential for overdiagnosis, and consequent overtreatment (i.e. treatment that ultimately does not provide a clinical benefit to the woman). The method used to calculate the overdiagnosis rate is still under debate, and thus current estimates encompass a wide range. An early estimate based on data from one randomised controlled trial found that 16% to 24% of the cancers detected could be considered overdiagnoses.21 An analysis comparing historical and current rates in the US estimated that 31% of breast cancers diagnosed represent overdiagnosis.22 However, based on actual follow-up data from the randomised controlled trials from Canada and Malmo, the UK Independent Review arrived at an estimate of overdiagnosis of 11% from a population perspective (the proportion of all cancers diagnosed in women invited to screening that are overdiagnosed), and 19% from an individual woman’s perspective (the chance that a cancer diagnosed during her screening experience is, in fact, an ‘overdiagnosis’).1 This probably represents the most realistic estimate to date, but further study is required and will need to include data from actual programmatic screening experiences.
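The two perspectives used by the UK Independent Review share a numerator and differ only in the denominator. The same number of overdiagnosed cancers is divided either by all cancers diagnosed in invited women over screening and later follow-up (population perspective), or by the cancers diagnosed during the screening period alone (individual perspective). The sketch below uses hypothetical class and method names and made-up counts, not data from the Canadian or Malmo trials. It shows why the individual-perspective figure is always the larger of the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvitedCohort:
    """Hypothetical cancer counts for a cohort of women invited to screening."""
    overdiagnosed: int              # cancers judged to be overdiagnosed
    cancers_screening_period: int   # cancers diagnosed while screening was offered
    cancers_total_follow_up: int    # cancers over screening plus later follow-up

    def __post_init__(self) -> None:
        if not (0 <= self.overdiagnosed <= self.cancers_screening_period
                <= self.cancers_total_follow_up):
            raise ValueError("expected overdiagnosed <= screening-period <= total counts")
        if self.cancers_screening_period == 0:
            raise ValueError("no cancers diagnosed during the screening period")

    def population_perspective(self) -> float:
        # Share of all cancers in invited women that are overdiagnosed.
        return self.overdiagnosed / self.cancers_total_follow_up

    def individual_perspective(self) -> float:
        # Chance that a cancer found during the screening period is an overdiagnosis.
        return self.overdiagnosed / self.cancers_screening_period


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    cohort = InvitedCohort(overdiagnosed=19, cancers_screening_period=100,
                           cancers_total_follow_up=170)
    print(f"population perspective: {cohort.population_perspective():.1%}")
    print(f"individual perspective: {cohort.individual_perspective():.1%}")
```

With these made-up counts, the sketch reports about 11.2% and 19.0%. The gap between the two figures comes entirely from the choice of denominator.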
With the number of randomised trials available, breast screening does not suffer from a lack of evidence on efficacy. Nevertheless, there are many different implementation decisions that need to be made in the provision of screening services, in order to deliver the maximal impact given constraints on resources and the local context.
One of the primary decisions to be made is whether screening should be offered opportunistically (through referrals from primary care physicians to existing specialists), or through organised programs, which involve centralised invitational and data collation systems. Canada, Australia and the UK have all moved forward with organised programs, although some mammography occurs outside of the programs to varying degrees in all three contexts. The Council of the European Union recommends that mammography occur within the context of cancer screening programs, so that the entire population may be reached and appropriate quality controls are in place.23 In a survey of 27 countries belonging to the International Cancer Screening Network in 2007–2008, all but two (US and Uruguay) reported the existence of programmatic screening.2
The continued debate on the efficacy, monitoring and appropriate targeting of screening points to the need to strike a critical balance between reductions in breast cancer mortality and the risks of overdiagnosis and of follow-up for false positives. As noted above, there is evidence that false positive results have been reduced in the context of existing high volume programs,19 and routine outcome monitoring, as occurs in most organised programs, is key to introducing quality improvement interventions to ensure this balance minimises known risks. On the other hand, the very visibility and transparency of organised programs makes them an easier target when renewed discussions of the harms and benefits of mammography arise. For example, in the companion commentary to the recent revision of the Canadian guidelines (which recommended screening in the 50 to 74 year age group), the following opinion was offered: “The best method we have to reduce the risk of breast cancer is to stop the screening program.”12
In the UK, the ongoing debate led to a full independent review of the screening program, which included a careful consideration of the potential risks of overdiagnosis. While acknowledging that a woman who is screened beginning at age 50 years in the program would probably have an approximate risk of one per cent of having a breast cancer overdiagnosis, the review concluded that the program “…confers significant benefit and should continue”.1
Recently, however, a review of Swiss screening programs resulted in a recommendation that no new systematic programs be implemented, and that existing programs should have a ‘time limit’ imposed upon them.6 While it has been pointed out that breast cancer mortality in Swiss cantons (regions) with breast screening programs decreased at about the same rate as in cantons without such programs,24 it is also acknowledged that active opportunistic screening occurs through private practice in the cantons without programs. The Swiss Medical Board’s additional recommendation, that the quality of all forms of mammography be evaluated, is more difficult to carry out in private practice, especially for the evaluation of false-positive rates and overdiagnosis, and it is precisely these harms that are most under debate. While it is unclear whether the recommendations of the Swiss report will be adopted,25 it will be of interest to see whether any resulting changes allow the evaluation of all mammography (not just programmatic screening), so that risks are minimised in whatever context screening is provided.
Given the ongoing debate, there has been an emphasis in recent consensus processes on the need to refine our understanding of harms and benefits for individual women, and to involve each woman in decision-making around her own participation or non-participation in screening. An immediate aim of this process is to arrive at more quantifiable estimates for women at lower or higher risk of developing breast cancer. For example, in the US, where public messages have targeted women in their forties for decades, modelling has been used to ascertain whether there is an identifiable sub-group of women in this younger age group whose elevated breast cancer risk profile may make the benefit-to-harm ratio of mammography more favorable than for the average woman in her forties. One group determined that if breast cancer risk was double the estimated average or baseline risk, a woman in her forties may have the same benefit-to-harm ratio for screening as a woman in her fifties.26 Based on this, it has been suggested that women in their forties with a first-degree relative with breast cancer, or with extremely dense breasts, may have sufficient excess risk of developing breast cancer.14
Any information on the benefits and risk of harm has to be explained in a way that is both comprehensible and salient to the woman considering screening. This is not straightforward, however. A truly transparent process necessarily involves the use of reasonably complex numbers. A study of numeracy and decisions about mammography found that even though 96% of the study subjects were high school graduates, few could correctly answer three simple numeracy questions, and accuracy on these questions was strongly correlated with the ability to correctly interpret information on mammography and breast cancer risk.27 Thus, one cannot assume that simple presentation of the numbers will be sufficient to engage women in full decision-making.
Nevertheless, considerable effort has gone into developing decision aids, both relatively simple and more complex, to help an individual woman determine her preferences about whether to screen.11,28 One cannot argue with the motivation to provide women with tools to sort through this complex information, although, as others point out, experts have not reached complete agreement on how to interpret the data we do have, and it is acknowledged that the impact of decision aids in screening is not well quantified.29 One study of a decision aid designed to help 70-year-old women decide on mammography found that it increased knowledge but did not change women’s decisions.30
Further, it must be acknowledged that a decision about screening is not a single life event, but will be revisited as a woman’s perceived (and actual) risk changes, or as reports of emergent mammography studies change the available evidence base. Very little is known about the effectiveness of providing decision aid-based information over time. A Cochrane review of decision aids found that even when the individual’s choice was towards a particular treatment course, there was no impact on adherence to that therapy over time.31 Thus, while one cannot argue with the prudence of providing such tools, neither are they a panacea.
Finally, while the concept of shared decision-making implies a clinical context, most women receive much of their information about mammography from public messaging. It is true that we need to shift our efforts toward educating the public, as distinct from earlier efforts to simply encourage women to be screened.32 However, media reporting frequently emphasises the controversy rather than attempting to provide clarity; following the reporting on the USPSTF guideline changes in 2009, more women reported being confused than helped by the information.33 While we cannot change editorial policy or reporting style, as professionals we must make the effort to be informative rather than provocative if we are going to discharge our responsibilities to the public we serve.
Most recent analyses find a favorable benefit to risk ratio for screening mammography in women aged 50 to 74 years. Estimates of the mammography-associated harms in many studies are based on community practice mammography, but it appears that the risk of false positives is much lower within the context of organised programs. Thus, the suggestion to discontinue programs while allowing continued opportunistic screening appears to be ‘throwing the baby out with the bathwater’, and is unlikely to result in reduced risk to women. Consideration should be given to new analyses that reflect the lower false positive rates achieved in programmatic contexts, and to developing ways to explain this information to women and to policymakers, to ensure that the highest quality screening is available for women who choose to be screened.