Clustering is a difficult concept to define precisely. It is important to distinguish it from the notion of an individual cluster, which corresponds to an excess number of cases in one small area or around a putative point source. A fundamental problem when trying to assess the significance of a specific cluster is that the analysis is almost always post hoc: the cluster is recognised as being unusual by some uncontrolled process, and a statistical assessment is made only afterwards. This is the exact opposite of the situation for which statistical testing was designed, where a hypothesis is first generated and then tested on new data. As a consequence, the vagaries of the spatial and temporal boundaries of the putative cluster make it very difficult to determine the probability of the event being a chance occurrence.
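The post hoc problem can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions (equal populations in every area, no true clustering, and hypothetical function names and parameter values that do not come from any study): cases are scattered at random, the area with the most cases is singled out after the fact, and a naive binomial test is applied as though that area had been specified in advance.

```python
import math
import random


def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def posthoc_false_alarm_rate(n_areas=100, n_cases=200, n_maps=500, alpha=0.05, seed=0):
    """Fraction of purely random maps in which the worst area looks 'significant'.

    Assumption: every area has the same population, so under the null
    hypothesis each case falls in any given area with probability 1/n_areas.
    """
    rng = random.Random(seed)
    p_area = 1 / n_areas
    alarms = 0
    for _ in range(n_maps):
        # Scatter the cases over the areas completely at random.
        counts = [0] * n_areas
        for _ in range(n_cases):
            counts[rng.randrange(n_areas)] += 1
        # Pick the most extreme area after looking at the data, then test it
        # naively, ignoring the fact that it was selected for being extreme.
        naive_p = binomial_upper_tail(max(counts), n_cases, p_area)
        if naive_p < alpha:
            alarms += 1
    return alarms / n_maps


if __name__ == "__main__":
    print(posthoc_false_alarm_rate())
```

With these assumed settings the naive test should flag a "cluster" in most of the simulated maps, far more often than the nominal 5%, even though no clustering was built in.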
A key aspect of statistical analysis is the concept of replication. If there is a suggestion of clusters at a variety of locations then statistical procedures are more capable of assessing whether this is due to chance. In this sense the analysis of clustering can be viewed as an extension of methods for studying spatial variation to a much smaller scale, where classic mapping procedures are no longer applicable. At this scale, however, the variability is too localised to allow the averaging needed to produce a visually appealing map of smoothly varying disease incidence, so the more abstract concept of excess variance must be relied upon instead.[1]
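One simple way to make the idea of excess variance concrete is to compare observed and expected case counts across many small areas and ask whether the spread is larger than chance allocation would produce. The sketch below is a minimal illustration, not a method taken from the references: it assumes that expected counts per area are available (for example from population figures), uses a Pearson-type dispersion statistic, and obtains a Monte Carlo P value by reallocating the cases at random. The function names are hypothetical.

```python
import random


def dispersion_statistic(observed, expected):
    """Pearson-type statistic: sum of (O - E)^2 / E over areas."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def excess_variance_test(observed, expected, n_sim=999, seed=0):
    """Monte Carlo P value for more between-area variation than chance allows.

    Under the null hypothesis the total number of cases is allocated to
    areas at random, with probabilities proportional to the expected counts.
    """
    rng = random.Random(seed)
    total = sum(observed)
    # Rescale the expected counts so that they sum to the observed total.
    scale = total / sum(expected)
    expected = [e * scale for e in expected]
    areas = range(len(expected))
    stat = dispersion_statistic(observed, expected)
    exceed = 0
    for _ in range(n_sim):
        counts = [0] * len(expected)
        for area in rng.choices(areas, weights=expected, k=total):
            counts[area] += 1
        if dispersion_statistic(counts, expected) >= stat:
            exceed += 1
    # Add one to numerator and denominator so the observed data count as a draw.
    return stat, (exceed + 1) / (n_sim + 1)
```

Because the test pools information over all areas, it draws on replication rather than on any single area chosen after the fact.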
Many new problems arise in attempting to do this. The most fundamental is how to account for the variation in population density at a very fine scale. Where available, a complete population enumeration can be used, but when the scale is very small it is often more accurate to use a sampling scheme for selecting (matched) controls.
A second problem is determining the appropriate metric for establishing closeness. Should it be a fixed distance, as in methods developed by Diggle et al,[2] or should the population density be considered, as in methods developed by Cuzick and Edwards,[3] so that a cluster would encompass a larger area in a low density rural area than in a built up urban area? Other differences relate to whether clusters should be determined by the distance between cases or by the number of cases in predefined geographical areas (eg, wards or postcodes). Distance methods might be expected to be more efficient, but the area based approach is sometimes easier to apply to available data, and simulations suggest that the power of these methods is similar.[4]
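The idea of letting population density set the scale can be sketched with a nearest neighbour statistic computed on cases and controls together: for each case, count how many of its k nearest neighbours are also cases. Because controls stand in for the population at risk, the neighbourhood automatically covers a larger area where the population is sparse. The code below is a rough sketch in that spirit, not the published procedure: the function names are hypothetical, plain Euclidean distance is assumed, ties are broken by input order, and a random relabelling (Monte Carlo) P value is substituted for any analytical null distribution.

```python
import math
import random


def nearest_neighbours(points, k):
    """For each point, the indices of its k nearest other points (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(others[:k])
    return result


def knn_case_count(is_case, neighbours):
    """Over all cases, the number of their k nearest neighbours that are cases."""
    return sum(
        is_case[j]
        for i, nbrs in enumerate(neighbours)
        if is_case[i]
        for j in nbrs
    )


def knn_clustering_test(points, is_case, k=1, n_sim=999, seed=0):
    """Monte Carlo P value obtained by randomly relabelling cases and controls."""
    rng = random.Random(seed)
    neighbours = nearest_neighbours(points, k)  # locations stay fixed
    observed = knn_case_count(is_case, neighbours)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        if knn_case_count(labels, neighbours) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

Here `points` would be a list of (x, y) coordinates for cases and controls combined, and `is_case` a parallel list of booleans. A fixed distance method would instead count pairs of cases lying within a chosen radius of each other, whatever the local population density.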
Clustering methods will always be exploratory, and they leave open the question of what is responsible for the clusters. Their value is to identify clearly when it is worthwhile to search for causative (infectious or environmental) agents. As more small scale geographical information becomes available for different diseases it is likely that clustering methods will be used more widely. Not only will they help to identify when clustering is present but, as in the present example, they can also rule out localised clustering in favour of a simpler explanation in terms of population density.
[1] Elliott P, Cuzick J, English D, Stern R, eds. Small-area studies: purpose and methods. In: Geographical and environmental epidemiology: methods for small area studies. Oxford: Oxford University Press; 1992:14-21.
[2] Diggle PJ, Chetwynd AG. Second-order analysis of spatial clustering for inhomogeneous populations. Biometrics 1991;47:1155-63.
[3] Cuzick J, Edwards R. Spatial clustering for inhomogeneous populations. J Roy Stat Soc B 1990;52:73-104.
[4] Alexander FE, Boyle P, eds. Methods for investigating localized clustering of disease. Lyons: International Agency for Research on Cancer Scientific Publications, 1996. (No 135.)
Department of Mathematics, Statistics, and Epidemiology, Imperial Cancer Research Fund, PO Box 123, Lincoln's Inn Fields, London WC2A 3PX Jack Cuzick, head
j.cuzick@icrf.icnet.uk
COPYRIGHT 1998 British Medical Association