The design of controlled clinical trials

 

CHARLES WARLOW

 

 

Doctors need to know which treatments are effective. Patients not only expect to be advised what works and what does not but increasingly ask to be told of the risks and benefits of treatment options. Lawyers, economists, accountants, hospital managers, and third-party payers are becoming ever more curious and concerned about the cost-effectiveness of treatments and, if choices must be made, where priorities should lie. Theory alone is not enough to provide the answers, and it never was: ‘the great tragedy of science’, wrote Thomas Huxley, is ‘the slaying of a beautiful hypothesis by an ugly fact’. A treatment suggested by theory, or occasionally by serendipity, must be thoroughly tested in clinical practice since ‘one of the most important things about treatment is that it should be effective—not merely that it ought to be effective. A remedy which is known to work, though nobody knows why, is preferable to a remedy which has the support of theory without the confirmation of practice’ (Richard Asher, Lancet, 1961; ii: 1403-4). This chapter is concerned with Huxley's ‘ugly facts’ and Asher's ‘confirmation of practice’: the proper evaluation of treatment, specifically of surgical operations.

 

The methodology used to assess the benefits, risks and costs of surgical operations is not much different from that used to assess drugs, physiotherapy, psychotherapy, acupuncture and any other intervention to relieve symptoms or influence the natural history of disease, and it is sometimes simpler. The fact that, in general, drugs have been better evaluated than operations is only partly because of the practical problems with trials of the latter; it is at least as much to do with the double standard which demands that drugs but not operations should be licensed before they are introduced into routine clinical practice. The proper evaluation of treatment may be hard work, but it is hardly more dangerous, or costly, than introducing or persevering with ineffective treatments. Many useless and even harmful treatments, both medical and surgical, have been introduced without good evidence of benefit, and the cost of this misplaced enthusiasm must be many times more than the cost of proper evaluation. Indeed, even during a trial there should be a considerable cost saving if half the patients do not receive the treatment under test when in normal clinical practice all the patients might have received it, whether it definitely worked or not.

 

EVIDENCE OF BENEFIT AND RISK: METHODS TO EVALUATE TREATMENT

Unstructured personal experience and anecdote

While these are immensely valuable in diagnosis, they were the traditional methods by which the ancients, barber surgeons, apothecaries and witch doctors evaluated treatment. Sometimes they got it right and sometimes they got it wrong. Although a personal observation, or mere anecdote, may be the start of the trail towards proving a treatment is effective, by itself it is no longer enough: better evidence is required to convince a sceptical and cost-conscious world. Without studies to evaluate benefit we risk the introduction of useless treatments (e.g. extracranial-to-intracranial bypass surgery to prevent stroke in unselected patients with internal carotid artery occlusion), and the near rejection of useful treatments (e.g. thrombolysis for acute myocardial infarction).

 

Literature controls

These are still often used for comparison with a series of operated patients. Although the approach provides papers which embellish the curriculum vitae of keen young surgical (and medical) trainees, as a method of evaluation it is quite hopeless. At best it is hypothesis-generating, since there is never a satisfactory guarantee that the untreated patients reported in the literature are sufficiently similar to the patients in the operated series to make any difference in outcome ascribable to the operation under consideration. The truth of this can easily be seen by examining the so-called natural history series in the literature, which so often report very different outcomes even when the disease under consideration appears to be the same (Table 2). The reasons for this include differences between the series in referral patterns, selection criteria, exclusion criteria, age and other prognostic variables, length of follow-up, definition of outcome events, and treatments given. Sophisticated statistical adjustment for these variables may still not explain any difference in outcome, particularly if there are unknown, and therefore unmatchable, differences between one series and another. Also, prospective series tend to detect more outcome events, particularly the less serious ones, than retrospective series.

 

Historical controls

The outcome of previous patients in an institution, perhaps treated by the same doctors, is often compared with the outcome in patients given the ‘new’ treatment. Such comparison is little better than using literature controls, for much the same reasons. Of course, if the new treatment has a really major effect (e.g. penicillin for meningococcal meningitis) it will overwhelm any bias caused by modest baseline differences between the treated and previously untreated patient groups. Most treatments do not have such a major effect, however; if they did we would not still be arguing about them (Table 3). It is crucial to be able to recognize, and not reject in error, treatments with relatively modest effects, because if they can be applied to large numbers of patients at relatively low cost and risk they will have a major public health impact (e.g. streptokinase for acute myocardial infarction). It is equally important to resist the introduction of useless treatments which may not necessarily be all that dangerous but which are still costly (e.g. bed rest for twin pregnancy). It is impossible to recognize reliably a modest treatment effect in what is basically a ‘before and after’ (the introduction of a treatment) historical comparison. Medicine has the great advantage that randomization can be used to compare like with like. This method is not available to economists and politicians and is rarely used by social scientists, perhaps explaining the general lack of progress in these fields.

 

Concurrent non-randomized controls

These are patients who are selected for the ‘old’ treatment and compared with others who are selected for the ‘new’ treatment during the same period. This method of treatment allocation may be prospective (although the outcome is often analysed retrospectively, and unreliably, from case note review), but it is haphazard rather than random and almost always reflects a bias in allocation to one treatment or the other. This results in the two groups being unequal in some important way (e.g. disease severity or age) which may have a crucial effect on outcome. In addition, if outcome is based on retrospective case note review there are always problems: the hospital diagnostic index is incomplete, case notes are missing (often those of the most interesting patients), information is not recorded in the notes, follow-up is variable or incomplete, and definitions (if any) of outcome are not uniformly applied. Moreover, the outcome may be played up or down in different ways depending on the observer's feelings about the treatment. Such non-randomized series may well exaggerate or completely miss modest but important treatment effects and usually do no more than generate hypotheses. Worse still, false-positive comparisons are more likely to be published than false-negative (or even genuinely negative) ones: surgeons with poor results are not as likely to publish their series as those with good results. The literature itself is undoubtedly biased towards medical and surgical optimism.

 

Cross-over trials

These are not an option in the evaluation of surgical operations, since their design demands that a group of relatively stable patients, or even just one patient, receives first one treatment and then the other in a random order; an operation, once performed, cannot be withdrawn.

 

Randomization

Random allocation of patients to one of two (or more) treatments is the most effective and efficient method of ensuring that the ‘treated’ and ‘untreated’ groups of patients are similar, not only for prognostic variables that are known (e.g. the stage of a cancer) but also for those that remain to be discovered (e.g. until recently it was not known that plasma fibrinogen is a risk factor for stroke). Provided that the randomization is secure, with no foreknowledge of the treatment allocation and, therefore, no potential for cheating, there is no possibility of the ‘best’ patients receiving what is thought to be the ‘best’ treatment—like groups of patients will be compared with like. Randomization must be based on tables of random numbers or some other genuinely chance process; allocation by day of the week, date of admission to hospital, or date of birth is not random.
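As a minimal sketch of what genuinely random allocation involves (the function name, group labels, and patient identifiers below are illustrative assumptions, not part of the chapter), a computer-generated random sequence can stand in for tables of random numbers; allocation by date of birth or day of admission is predictable and therefore open to manipulation.

```python
# Illustrative sketch only: simple random allocation of patients to one of
# two treatments. Names and labels are hypothetical.
import random

def randomize(patient_ids, treatments=("surgery", "no surgery"), seed=None):
    """Allocate each patient to a treatment purely by chance.

    In a real trial the sequence would be generated and concealed centrally,
    so that the recruiting clinician cannot foresee the next allocation.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(treatments) for pid in patient_ids}

if __name__ == "__main__":
    allocation = randomize(["patient_%03d" % i for i in range(1, 11)])
    for pid, arm in allocation.items():
        print(pid, "->", arm)
```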

 

The main objection to randomization, particularly of surgical operations, is said to be ethical. In fact, if one is truly uncertain whether one treatment is better or worse than another (usually because non-randomized comparisons have only generated hypotheses) it is more ethical to randomize patients in well conducted trials than to select one treatment rather than another on the basis of whim, particularly if the patients are not told of the uncertainties, which, in practice, is what may happen. This is not just because randomization will eventually provide better information on which the treatment of future patients can be based; if the new treatment turns out to be dangerous then half of the patients in a randomized trial will have been protected from it, whereas if it turns out to be effective then half of the patients will have received it. Of course, it is unethical to randomize a patient if one is certain that a treatment is effective, or that it is not. It is only ethical to randomize a patient if one is uncertain which treatment to use, and it is this grey area of uncertainty which provides the ethical window to scientific advancement. This uncertainty principle is crucial to the design of randomized (and even non-randomized) trials since it allows doctors with different grey areas of uncertainty to collaborate and randomize a broad range of patients without having to agree on exactly who is eligible or ineligible. All that they need to agree on is the disease to be treated (but not necessarily its severity or extent), the treatment, and the outcome to be measured (Fig. 1). It is perfectly legitimate for some collaborators to randomize only ‘mild’ cases: they may be certain that severe cases must not remain untreated when a new operation which they believe to be effective is available, but may remain uncertain what to do about the mild cases. Other collaborators may randomize only severe cases, because they happen to be certain that mild cases must not be exposed to the risk of the operation but are uncertain about whether it should be used in severe cases. In this way the treatment will be properly assessed across the whole disease severity spectrum, and its risks and benefits can be compared within each severity category. Far more patients will be randomized by allowing collaborators to be uncertain about somewhat different categories of patients than by enforcing rigid eligibility and ineligibility criteria; indeed, the only certainty required is that each collaborator agrees to be uncertain in some cases and randomizes them.

 

It is worth stressing that in trials of surgical procedures the ‘no surgery’ patients are seldom untreated: some kind of medical treatment is usually used in both groups (e.g. aspirin as an antithrombotic drug in patients randomly allocated to carotid endarterectomy or no surgery); surgery is usually being added to, rather than substituting for, a medical treatment. It is often possible to design a trial so that patients are randomized to surgery now (e.g. coronary artery bypass for mild angina) versus no surgery, with the option of later surgery if symptoms persist, recur, or become troublesome (e.g. severe or unstable angina); indeed, such a design reflects clinical reality and the common and reasonable tendency to offer surgery if conservative treatment fails.

 

Any residual ethical concern can be minimized by asking an independent and experienced group of clinicians and statisticians to review the ongoing results of the trial in case a definitely beneficial, or harmful, effect of treatment appears before the scheduled end of the trial. Interim analyses must, however, be appropriately interpreted since spurious statistical significance may appear by chance quite early, and a stringent ‘stopping rule’ must be applied, such as not stopping until the difference in outcome between the two groups reaches three standard deviations under the null hypothesis of no treatment difference. In addition, randomization can be unequal, so that the a priori preferred treatment is allocated to more patients than the alternative treatment (at random, of course); little power is lost in the analysis until the allocation becomes more extreme than about three to two.
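To illustrate the last point, a rough calculation (an illustration added here, not a figure from the chapter) shows how little statistical efficiency is lost with moderately unequal allocation: under the usual approximation that the variance of the estimated treatment difference is proportional to 1/n1 + 1/n2, the efficiency relative to equal allocation is 4f(1 − f), where f is the fraction of patients allocated to the preferred treatment.

```python
# Rough illustration (an assumption of this sketch, not from the text):
# statistical efficiency of unequal randomization relative to 1:1 allocation,
# taken as 4*f*(1-f) where f is the fraction allocated to the preferred arm.

def relative_efficiency(fraction_preferred):
    f = fraction_preferred
    return 4 * f * (1 - f)

for ratio, f in [("1:1", 0.5), ("3:2", 0.6), ("2:1", 2 / 3), ("3:1", 0.75)]:
    print(f"{ratio} allocation -> {relative_efficiency(f):.0%} relative efficiency")
```

Under this approximation a 3:2 split retains about 96 per cent of the efficiency of equal allocation, whereas a 3:1 split retains only about 75 per cent, which is consistent with the rule of thumb given above.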

 

Exactly what information should be given to patients in randomized trials of surgical operations is easy to address: it should be no different from that given to patients who are being treated in normal clinical practice. If full information about the risks, benefits, and uncertainties of all of the treatment options is not provided for patients in normal, and usually haphazard, clinical practice, from which no scientifically useful information is likely to emerge, then it is an unacceptable double standard to demand that such full information is given to patients before randomization. Of course, patients in or out of randomized clinical trials have the right to as much information about the treatments as they wish to know. Sometimes, however, full information may be more than some patients can bear in normal clinical practice, and the same applies in randomized clinical trials. The notion that randomization is unethical because the patient cannot cope with the concept may well be false; it may be the doctor who cannot cope.

 

MEASUREMENT OF OUTCOME

It is important to decide exactly what is to be achieved by giving a particular treatment and then how to measure it in a way which is valid and which really measures the relevant outcome and not something else. The measure must also be sensitive to change, so that any worthwhile effect of treatment is measurable; reliable, with low inter- and intra-observer variation; simple, so that it is actually measured; and communicable to others, so that it is understood. Surgical operations have the advantage that outcomes of interest and relevance usually have to be fairly obvious to make the risk of surgery worth taking, such as death, recurrence of cancer, or risk of amputation. Although highly relevant, quality of life is extraordinarily difficult to measure and depends greatly on an individual patient's (and carer's) life-style and philosophy. This being so, an outcome must be unequivocally enhanced (e.g. 5 extra years of life without recurrence of cancer) and any surgical disadvantage in terms of quality of life lost has to be large enough to be easily observed and allowed for in the application of the trial results. For example, reduced quality of life due to the loss of a leg in the treatment of osteosarcoma may well be worthwhile if cancer-free life can be prolonged for several years, but it is presumably not worthwhile if life is only prolonged a few months. Obvious measures of outcome are also needed since trials of surgery cannot normally be double blind; the patients will at least know that they have had an operation and will no doubt be subject to a placebo effect if nothing else. One is, therefore, much less interested in small treatment differences than in an outcome which is clinically relevant and obvious. Nonetheless, the assessment of outcome, if not the act of surgery, can often be blinded so that at least any bias in the assessment, as a consequence of knowledge of the treatment given, is minimized (e.g. it may be possible to collect outcome data by telephone or from routinely collected statistics such as death certificates). Finally, it is important to ensure that a treatment effect is not submerged in other outcomes which cannot be influenced by the treatment (e.g. non-prostatic cancer deaths in a trial of treatment directed specifically at prostatic cancer). Obviously, overall death rates must be monitored in case a treatment which prevents death from one cause is responsible for death from another cause. However, the effect of treatment must be analysed in terms of what is most likely to be affected, particularly if that occurs in a rather small proportion of the patients (Table 4). Non-fatal adverse effects of surgery, such as discomfort, prolonged hospital stay, complications of preoperative investigation, and deep vein thrombosis must also be accounted for if at all possible.

 

ANALYSIS OF RANDOMIZED TRIALS

In trials of surgical treatments one is not usually dealing with outcomes which are represented as continuous variables (e.g. blood pressure), which may be compared using t-tests, but rather with discontinuous variables (e.g. recurrent cancer versus no recurrence). Such outcomes are best analysed by deriving a risk or an odds ratio, usually from lifetables, using standard techniques found in statistical textbooks. The difference between relative reduction in risk and relative reduction in odds is illustrated in Table 5. The advantage of the odds reduction is that it is much easier to calculate a 95 per cent confidence interval, while the disadvantage is that clinicians usually think in terms of relative reduction in risk. However, numerically both are very similar if the outcome of interest occurs in a fairly small proportion of patients. In the example given in Table 5, the result is not statistically significant at the p < 0.05 level, since the 95 per cent confidence interval embraces 1.0. If the trial had been performed with larger sample sizes and produced the same result (i.e. a relative reduction in odds of 57 per cent) the 95 per cent confidence intervals would have been smaller (Fig. 2). Using a sample size of 200, the upper confidence limit is 0.92, which indicates statistical significance at the p < 0.05 level, but the result of the trial is still imprecise; the 95 per cent confidence interval of the odds ratio ranges from 0.20 to 0.92, suggesting that in the real world the true effect of treatment might be as good as a relative reduction in odds of 80 per cent ((1 − 0.20) × 100), or only 8 per cent ((1 − 0.92) × 100). Far larger sample sizes are required to obtain precise results than are needed for mere statistical significance. In this example, a sample size of 1000 gives reasonable precision, suggesting a treatment effect of a relative reduction in odds of 40 to 70 per cent. To estimate required sample sizes it is best to calculate various possible scenarios (as in Fig. 2) on the basis of the expected result in the control group, various guesses at the expected treatment effect (without being too optimistic), and various sample sizes. It will then be obvious whether one needs tens, hundreds, or thousands of patients for a precise result and, therefore, whether one needs a single centre or multicentre trial. A multicentre trial has considerable advantages (Table 6), except perhaps when exploring the fine detail of a treatment, usually fairly early in its development, or conducting a pilot study to define a protocol for a main study.
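The arithmetic can be sketched as follows. The event counts used here are assumptions chosen to be consistent with the worked example quoted in the text (a 57 per cent relative reduction in odds), not the actual entries of Table 5, and the confidence interval is calculated by the simple logit (Woolf) method rather than the lifetable techniques referred to above.

```python
# Sketch: odds ratio and approximate 95% confidence interval for a single
# trial with a binary outcome, using the simple logit (Woolf) method.
# The counts are hypothetical, chosen to match the 57% odds reduction
# quoted in the text.
from math import exp, log, sqrt

def odds_ratio_ci(events_t, total_t, events_c, total_c, z=1.96):
    a, b = events_t, total_t - events_t            # treated: events, non-events
    c, d = events_c, total_c - events_c            # control: events, non-events
    or_ = (a / b) / (c / d)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se_log_or), exp(log(or_) + z * se_log_or)

# Small trial: the interval crosses 1.0, so not conventionally significant.
print("n = 100:  OR %.2f (95%% CI %.2f to %.2f)" % odds_ratio_ci(6, 50, 12, 50))
# Same proportions with twice the sample size: the interval narrows to
# roughly 0.20-0.92, illustrating the effect described for Fig. 2.
print("n = 200:  OR %.2f (95%% CI %.2f to %.2f)" % odds_ratio_ci(12, 100, 24, 100))
```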

 

"Intention-to-treat" versus "on-treatment" analysis

Even in the most ordered world, some patients in a randomized clinical trial will not receive their allocated treatment—they may refuse surgery or insist on it, they may die before surgery can be performed, or the anaesthetist may veto surgery. Inevitably, therefore, there are ‘cross-overs’: patients allocated to surgery who do not undergo the operation and patients allocated to no surgery who are operated upon. Provided that the proportion of cross-overs is small, this will not seriously distort the analysis of all the patients according to their allocated treatment group (as opposed to an analysis according to the treatment they actually received). This preferred analysis by intention-to-treat is crucial, not just because it reflects real clinical life, but because the reason for a cross-over may well be directly related to the prognosis, so making an on-treatment analysis biased (Fig. 3). In the example shown in Fig. 3, the on-treatment comparison of death rate excludes the patients allocated to coronary artery bypass surgery who were too sick to receive it but does not exclude similar patients in the group randomized to no surgery. An on-treatment comparison is clearly biased in favour of surgery since, for the comparison, patients with a poor prognosis were removed from the surgery group but not from the no-surgery group. It follows, therefore, that all randomized patients must be followed up until the end of the trial, whether or not they receive the allocated treatment, and all outcome events between randomization and the end of the trial must be counted. Although this is the most conservative method of analysis it is unbiased, and if the trial is positive it will reflect a real treatment effect, albeit diminished somewhat in size by the cross-overs.
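The bias can be demonstrated with a simple simulation (an illustration constructed for this section; the proportions are invented and the treatment is assumed to have no effect at all): if the sickest patients allocated to surgery never reach the operating theatre, an on-treatment comparison removes them from the surgery group only, and surgery appears beneficial even though it does nothing.

```python
# Illustrative simulation (invented figures): why an 'on-treatment' analysis
# can flatter surgery even when surgery has no effect whatsoever.
import random

random.seed(1)
N = 100_000                                        # patients per arm

def simulate_arm(allocated_to_surgery):
    deaths_itt = deaths_ot = n_ot = 0
    for _ in range(N):
        too_sick = random.random() < 0.10          # 10% have a very poor prognosis
        risk_of_death = 0.40 if too_sick else 0.05 # the operation itself changes nothing
        operated = allocated_to_surgery and not too_sick
        died = random.random() < risk_of_death
        deaths_itt += died
        # on-treatment analysis: surgery-arm patients count only if operated on
        if not allocated_to_surgery or operated:
            n_ot += 1
            deaths_ot += died
    return deaths_itt / N, deaths_ot / n_ot

itt_surg, ot_surg = simulate_arm(True)
itt_none, ot_none = simulate_arm(False)
print(f"intention-to-treat: surgery {itt_surg:.1%} vs no surgery {itt_none:.1%}")
print(f"on-treatment:       surgery {ot_surg:.1%} vs no surgery {ot_none:.1%}")
```

The intention-to-treat comparison gives essentially the same death rate in both arms, as it should when the treatment has no effect; the on-treatment comparison makes surgery look spuriously better.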

 

Subgroup analysis

In any trial (non-randomized as well as randomized) there is an understandable temptation to look at treatment effects in subgroups of interest, particularly if the overall trial result is ‘negative’; after all, individual patients are different from each other. Unfortunately, chance can play extraordinary tricks so that a treatment can appear to be particularly favourable, or particularly unfavourable, in one subgroup or another even when there is no biologically plausible explanation (Table 7). The more subgroups examined, the more likely such a chance effect is to occur. Post-hoc subgroup analysis cannot, therefore, be regarded as anything more than hypothesis-generating, and any apparent treatment effect must be confirmed in a further trial with the pre-specified (a priori) hypothesis that a particular subgroup of patients will benefit while other subgroups will not. In practice, the most reliable estimate of a treatment effect, even in a subgroup, comes from the whole sample of randomized patients. The assumption can often be made that subgroups of interest will differ in their response to the treatment to some extent (there will be a quantitative difference), but a qualitative difference, such that treatment is harmful for one subgroup of patients yet beneficial for another, is unusual and should not be assumed.
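The multiplicity problem can be made concrete with a small calculation (an illustration added here, and a simplification, since it treats the subgroup tests as independent): if each subgroup comparison is tested at the conventional 5 per cent level, the chance of at least one spuriously ‘significant’ subgroup result grows quickly with the number of subgroups examined.

```python
# Sketch (illustrative assumption, treating subgroup tests as independent):
# the chance of at least one falsely 'significant' subgroup finding when a
# truly ineffective treatment is examined in many subgroups.

def chance_of_false_positive(n_subgroups, alpha=0.05):
    return 1 - (1 - alpha) ** n_subgroups

for n in (1, 5, 10, 20):
    print(f"{n:>2} subgroups examined -> {chance_of_false_positive(n):.0%} "
          "chance of at least one spurious 'significant' finding")
```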

 

Cointerventions

It is obvious that treatments other than the one under consideration, such as radiotherapy and drugs, should be as similar as possible in patients allocated to surgery and in those allocated to no surgery. Major differences in effective (or dangerous) cointerventions should be monitored since they might bias the result either for or against the surgical treatment under trial.

 

Statisticians

A statistician who knows about clinical trials is as important at the design stage of a clinical trial as an anaesthetist who knows how to give a safe anaesthetic is before opening the chest. It is simply no good going to a statistician at the end of a non-randomized trial when many patients have been lost to follow-up, outcome events have been poorly recorded, and the professor wants a quick result. Statisticians are not, contrary to popular imagination, magicians and cannot resuscitate a seriously flawed trial. It is also important to consider the general methodological issues in Table 8, even if answering them all can be difficult.

 

OVERVIEWS (META-ANALYSES)

More than one randomized trial is often performed to test a treatment for a particular disease. The question then arises as to which trial should guide future clinical practice: one with a positive result, one with a negative result, one's own trial, or a trial done by one's friends? Selecting one trial from many similar trials can produce as much bias as not randomizing patients in the first place. The solution to the dilemma is to consider all of the evidence—in other words, all similar unconfounded (truly randomized) trials together in an overview analysis, such that an overall estimate of the treatment effect is obtained (a so-called typical relative reduction in odds). Unpublished as well as published trials must be considered, since trials with negative results (particularly if they are small) tend not to be submitted for publication, and if they are, tend not to be accepted.

 

The overview technique is simple and depends on looking at the results of each trial individually, comparing the observed number of events with the expected number of events on the assumption (or null hypothesis) that there is no treatment effect. For example, from Table 5, the total number of cancer deaths and recurrences is 18, so that the expected number in the treatment group (equal numbers of patients having been randomized to each group) is 18 ÷ 2 = 9. The expected number (E) is then subtracted from the observed number (O) in the treatment group (i.e. 6 − 9). A negative result (−3 in this example) points towards a benefit of treatment. If there is no treatment effect in a series of similar trials, the (O−E) for each trial will sometimes be somewhat negative and sometimes somewhat positive (just because of the play of chance) but the grand total of (O−E) for all of the trials will be close to zero (and certainly not significantly different from zero). On the other hand, if there is a positive treatment effect then the sum of the (O−E) values will be statistically significantly less than zero, even if the result from each individual trial is not statistically significant by itself. Using this technique an estimate of the typical odds ratio and its confidence interval can easily be calculated from the (O−E) values and their variances (Fig. 4).
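A sketch of this arithmetic is given below (the three trials are invented solely to illustrate the calculation; the formulae are those of the one-step, fixed-effect ‘Peto’ method of combining (O−E) values and their variances):

```python
# Sketch of the overview arithmetic described in the text: sum (O - E) and
# its variance across trials, then convert the totals into a 'typical' odds
# ratio (one-step Peto method). The trials below are hypothetical.
from math import exp, sqrt

# each trial: (events_treated, total_treated, events_control, total_control)
trials = [(6, 50, 12, 50), (10, 80, 15, 78), (4, 40, 7, 42)]

sum_o_minus_e = sum_v = 0.0
for a, n1, c, n2 in trials:
    n = n1 + n2                                   # total patients in the trial
    m = a + c                                     # total events in the trial
    e = m * n1 / n                                # expected treated-group events if no effect
    v = m * (n - m) * n1 * n2 / (n * n * (n - 1)) # hypergeometric variance of O
    sum_o_minus_e += a - e
    sum_v += v

typical_or = exp(sum_o_minus_e / sum_v)
lower = exp((sum_o_minus_e - 1.96 * sqrt(sum_v)) / sum_v)
upper = exp((sum_o_minus_e + 1.96 * sqrt(sum_v)) / sum_v)
print(f"typical odds ratio {typical_or:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In this invented example none of the three trials is significant on its own, yet the pooled (O−E) values yield a typical odds ratio whose 95 per cent confidence interval just excludes 1.0.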

 

Another major advantage of overviews is that they can make sense of a series of trials which are each too small (and therefore have wide confidence intervals) to give a precise estimate of the treatment effect; indeed, the individual trials may not even be statistically significant (Fig. 4). A series of trials yielding non-significant results, with 95 per cent confidence intervals overlapping 1.0, and yet which all (or nearly all) show a trend in favour of treatment may well yield a clearly positive result in an overview analysis. Each individual trial may produce a type II error, suggesting no treatment effect when one exists, simply because it is too small to show the treatment effect and the 95 per cent confidence interval overlaps 1.0. A type I error, suggesting that there is a treatment effect when none really exists, is easily dealt with by the p value: a level of p < 0.05 means that, if no true difference exists, a positive result of this size would arise by chance in no more than about one in 20 trials (i.e. it is unlikely). At a level of p < 0.01, a false-positive result is extremely unlikely, arising by chance in no more than about 1 in 100 trials.

 

If individual patient data are obtained from each trial in an overview analysis (preferably undertaken by all of the trialists in collaboration) then subgroup analysis can be attempted in a large enough sample of patients to minimize, but seldom abolish, the play of chance. Such a collaborative process also allows analysis of unpublished data, using up-to-date statistical techniques, and in particular can ensure that the same outcome of interest is assessed, and even reassessed, in each individual trial. For example, non-fatal myocardial infarction may have been recorded in two trials but published for only one of them; with access to the individual patient data the event can be analysed consistently in both.

 

GENERALIZABILITY (SO-CALLED EXTERNAL VALIDITY)

Strictly (and ridiculously) speaking, the results of a randomized trial, or a non-randomized series, apply only to the patients who were in the trial. However, in practice the results have to be generalized and applied to future patients, and the question is, which future patients? The answer is, patients similar to those in the trial. This requires the trial patients to be properly described in terms of presenting clinical features, underlying disease, age, disease severity, and important prognostic variables, if any are known. Future patients will never be exactly the same as the trial patients and one can only hope to use the trial results in approximately similar patients. How different future individual patients are allowed to be from the original trial patients in order to apply the results is a matter of clinical judgement. For example, if the trial patients have an age range of 50 to 70 years, it would seem reasonable to apply the results to future patients aged 45 or 75, unless there is a good reason to suppose that the treatment effect is extremely age-related. On the other hand, just because a trial shows that radiotherapy is an effective treatment for intracerebral lymphoma does not necessarily mean that it is also an effective therapy for an astrocytoma, because they are biologically different. The technical ability of the surgeons and their teams in a trial cannot be generalized to all other surgeons and teams. There is an unfortunate tendency to suppose that, just because the professor has a 1 per cent surgical mortality rate at St Agnes' Memorial Hospital and writes up his series, this figure applies to everyone else who says that he or she can do the operation; this is clearly not necessarily true. Therefore, in any trial, the operation itself must be clearly described along with the perioperative mortality and morbidity. When the results are generalized to one's own institution and patients, it is important that the same (or more or less the same) operation is performed and that one's own mortality and morbidity rates are taken into account when considering whether the operation should be used or not.

 

None of these arguments leads to the conclusion that patients who are eligible for a trial (or even a non-randomized series) but not randomized (for whatever reason, such as inconvenience, laziness, or a daunting protocol) need be described, followed up, or accounted for in the analysis. The results which are generalizable to future patients come from the randomized patients, not from those who were eligible but not randomized. Randomized trials are not trials of a random sample of all eligible patients, or of any other population, but of treatments which are allocated at random. In fact, patients in randomized trials, and in non-randomized series, are usually highly selected, depending on local referral patterns and on many other factors.

 

INTERPRETING THE RESULTS

Most trial results are given as a relative reduction in risk (or odds) of a particular outcome of interest: in the example in Table 4 this is 50 per cent (from 8 per cent cancer death in the control group to 4 per cent in the treated group). Although this is a useful and generalizable result it does need to be seen in context. A large relative reduction in risk does not necessarily imply a large absolute reduction in risk; in this example, although the relative reduction in risk is large, the absolute reduction in risk is only 4 per cent. It is from the absolute reduction in risk that any kind of cost-effectiveness analysis is derived. From the example, it is clear that reducing the cancer death rate from 8 per cent to 4 per cent implies that if you treat 100 patients then four will avoid cancer death; in other words, one has to treat 25 patients to avoid one cancer death. This concept of the number of patients who need to be treated to prevent one particular outcome of interest is crucial when allocating resources: even if the number is large, if the treatment is cheap and safe then it should probably be used, particularly if the disease is common. On the other hand, if the number is large and the treatment expensive and risky, then it should probably not be used, particularly if the disease is rare. If the number needing to be treated is large, a relevant strategy is to try to identify a particular group of high-risk patients since, if the absolute risk of their adverse outcome is high, then even a modest relative reduction in risk will prevent a substantial number of adverse outcomes. If, in Table 5, there were a subgroup of patients likely to have a 20 per cent cancer death rate and the treatment still reduced that by 50 per cent, down to 10 per cent, then only 10 patients would need to be treated to prevent one cancer death. This emphasizes the importance of finding and treating high-risk rather than low-risk patients in situations where the treatment is risky or costly. Unfortunately this means that the low-risk patients are not treated, even if they are in the majority; under these circumstances the treatment, however successful, will not have much impact on the outcome of the whole population of patients.
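As a short worked sketch of this arithmetic (the risks are those quoted above; the group labels are merely illustrative):

```python
# Sketch: absolute risk reduction and the number needed to treat (NNT) to
# prevent one event, using the risks quoted in the text.

def nnt(control_risk, treated_risk):
    arr = control_risk - treated_risk              # absolute risk reduction
    return arr, 1 / arr                            # NNT = 1 / ARR

for label, cr, tr in [("whole trial population", 0.08, 0.04),
                      ("high-risk subgroup", 0.20, 0.10)]:
    arr, n = nnt(cr, tr)
    print(f"{label}: relative reduction {1 - tr / cr:.0%}, "
          f"absolute reduction {arr:.0%}, treat {n:.0f} patients to prevent one death")
```

Both groups show the same 50 per cent relative reduction, but the absolute reduction, and hence the number needed to treat, differs greatly (25 versus 10).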

 

FURTHER READING

Campbell DT. Reforms as experiments. Am Psychologist 1969; 24: 409-29.

Chalmers I. Scientific inquiry and authoritarianism in perinatal care and education. Birth 1983; 10: 151-66.

Chalmers I. Minimizing harm and maximizing benefit during innovation in health care: controlled or uncontrolled experimentation? Birth 1986; 13: 155-64.

Chalmers I. Evaluating the effects of care during pregnancy and childbirth. In: Chalmers I, Enkin M, Keirse MJNC, eds. Effective care in pregnancy and childbirth. Vol. 1: Pregnancy. Oxford: Oxford University Press, 1989: 3-38.

Cochrane AL. Effectiveness and efficiency: random reflections on health services. Nuffield Provincial Hospitals Trust, 1971.

Collins R, Gray R, Godwin J, Peto R. Avoidance of large biases and large random errors in the assessment of moderate treatment effects: the need for systematic overviews. Stat Med 1987; 6: 245-50.

Early Breast Cancer Trialists' Collaborative Group. Treatment of early breast cancer. Vol. 1: worldwide evidence, 1985-1990. Oxford: Oxford University Press, 1990.

Gardner MJ, Altman DG. Statistics with confidence—confidence intervals and statistical guidelines. London: British Medical Journal, 1989.

Hankey GJ, Slattery JM, Warlow CP. The prognosis of hospital-referred transient ischaemic attacks. J Neurol Neurosurg Psychiat 1991; 54: 793-802.

Herxheimer A, Zentler-Munro P, Winn D. Therapeutic trials and society: making the best use of resources. London: Consumers' Association, 1986.

Malleson A. Need your doctor be so useless? London: George Allen & Unwin, 1973.

Peto R. Why do we need systematic overviews of randomised trials? Stat Med 1987; 6: 233-40.

Peto R, et al. Design and analysis of randomised clinical trials requiring prolonged observation of each patient. I. Br J Cancer 1976; 34: 585-612.

Peto R, et al. Design and analysis of randomised clinical trials requiring prolonged observation of each patient. II. Br J Cancer 1977; 35: 1-39.

Sackett DL, Haynes RB, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. Boston: Little, Brown and Company, 1985.

Sandercock P. The odds ratio: a useful tool in neurosciences. J Neurol Neurosurg Psychiat 1989; 52: 817-20.

Warlow C. How to do it: organise a multicentre trial. Br Med J 1990; 300: 180-83.

Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984; 3: 409-20.
