## Abstract

Data transformation in geoscience has typically been motivated by three objectives: (1) creating normally distributed data; (2) creating data that are additive; and (3) making errors constant across the range of the data.

Historically, transformation of geochemical concentrations has been undertaken to achieve normality. Unfortunately, most geochemical distributions are multi-modal and derived from several geological sources. Consequently, no continuous, monotonic transformation exists that can convert these into (even approximately) normal distributions, so transformation for this purpose is neither generally achievable nor justified. Transformations that create additivity are rare in geochemical applications, although they are important in error treatment and lithogeochemical data analysis. These transformations effectively convert data into a form that can be sensibly manipulated, and thus facilitate subsequent data analysis. Transformation to stabilize errors in geochemical data is also not common, although constant error is a useful attribute in subsequent geochemical data analysis.

Another type of data transformation, designed to maximize geochemical contrast (or maximize data variance), may be achieved by raising geochemical concentrations to a power after transforming the data to the 0 ↔ 1 interval. The power that produces the maximum variance in the transformed result creates the maximum geochemical contrast, affording the geochemist an opportunity to extract the most information from the geochemical data.

The ‘maximum data variance’ transformation is based not on the subsequent data analysis result (e.g. recognizable geochemical patterns; circular reasoning where ‘the end justifies the means’), but on an optimal property created by the transform. As a result, this transformation provides significant advantage in subsequent data analysis because results achieved are not subjective.

- geochemistry
- data analysis
- transformation
- geochemical contrast
- constant variance
- normal distribution
- additivity

## Introduction

Geochemical concentration data from soil, drainage sediment, rock, soil gas and till samples collected in mineral exploration or environmental investigations have frequency distributions that differ in a variety of ways (Aubrey 1954, 1956; Jizba 1959; Vistelius 1960; 1964; Oertel 1969*a*,*b*; Sinclair 1976; Meisch & Kork 1978; De Wijs & Meisch 1979; Stanley & Sinclair 1986; Schneider 1988; McLachlan & Basford 1988; Cohen 1991; Stanley 2003*a*). Observed frequency distributions of these data may, among other forms, be positively or negatively skewed, multi-modal, or even truncated by an analytical detection limit. As a result, geochemists sometimes find it advantageous to transform the geochemical data to enhance their characteristics before evaluating them numerically or graphically (Curtis 1943; Bartlett 1947; Tukey 1957, 1958; Kruskal 1968; Draper & Cox 1969; Draper & Hunter 1969).

A variety of transformation functions have been employed in geochemical applications. Some simple transformations include logarithmic (Equation 1; Finney 1941; Razumovsky 1941; Gaddum 1945; Pearce 1945; Ahrens 1954*a*,*b*, 1957; Bennett 1956; 1957; Aitchison & Brown 1957; Laurent 1963; Chapman 1977; Miesch 1977); square root (Equation 2; Bartlett 1936; Anscombe 1948; Freeman & Tukey 1950; Blom 1954); power (Equation 3; Moore 1957; Healy & Taylor 1962); Box–Cox (Equation 4; Box & Cox 1964; Howarth & Earle 1979; Joseph & Bhaumik 1997); angular (Equation 5; Zubin 1936; Eisenhart *et al*. 1947; Anscombe 1948; Ghurye 1949; Freeman & Tukey 1950; Claringfold *et al*. 1953; Stevens 1953; Biggers & Claringfold 1954; Blom 1954; Fisher 1954); and reciprocal (Equation 6; Hoyle 1973) transformations. In these transformation functions, the λ's are commonly unconstrained, and therefore represent arbitrarily chosen parameters.

Historically, many of these transformations have been employed to create normally distributed variables (e.g. Hoyle 1973; Joseph & Bhaumik 1997; Box & Cox 1964); however, the commonly multi-modal nature of geochemical concentration data makes it virtually impossible to transform most geochemical concentrations into new variables with distributions approximating normality. As a result, applying the above transformations to convert a multi-modal distribution into a normal one may not be realizable, and thus may be misguided.

Other reasons exist for transforming geochemical data, and these may impart, to the geochemical data, desirable features that facilitate interpretation. Historically, and generally outside the field of geochemistry, data have been transformed for three different and distinct reasons (Tukey 1958; Dolby 1963; Kruskal 1968; Hoyle 1973): (1) to create a new variable that is ‘additive’ (Elston 1961; Hoyle 1973); (2) to create a new variable that has ‘constant variance’ (i.e. homoscedastic; Kendall & Stuart 1966; Hoyle 1973); and (3) to create a new variable that is normally distributed (Hoyle 1973; Joseph & Bhaumik 1997; Box & Cox 1964).

### Additivity

‘Additivity’ is the linear property of a variable that allows it to be numerically operated on in a linear manner that often has physical meaning (Elston 1961; Hoyle 1973). Additivity is not commonly understood by geologists, because it is a data feature rarely sought, mostly because many geological variables are already ‘additive’. The examples below illustrate geochemical applications where transformations may be used to create additivity.

The first example involves a transformation required to obtain ‘additivity’ in till geochemical exploration, where knowledge of the glacial flow directions is critical to data interpretation. Glacial flow directions can be determined by measuring the azimuths of the long dimensions of clasts within a till. Because elongate clasts line up parallel to the glacial flow direction due to shear forces imparted by the moving ice (Mark 1973), these measurements can be used to determine the ice dispersal direction. As a result, after collecting an adequate number of till clast major axis azimuth measurements (generally >50), the geochemist typically determines the average direction of the major axes of the clasts to deduce the glacial flow direction. This cannot be done by averaging the azimuth measurements (in degrees) because this will produce a result that is inconsistent with the true, average physical direction. Rather, these azimuths must first be transformed into direction cosines, producing a set of easting (*x*) and northing (*y*) measurements corresponding to each azimuth (Fisher *et al*. 1987). These additivity-generating transformations are:

$$x = \sin\theta, \qquad y = \cos\theta \qquad (7)$$

where *θ* is the measured azimuth (clockwise from north). The resulting *x*'s and *y*'s can then be averaged, and the mean *x* and *y* values (x̄ and ȳ) can then be converted back into the average azimuth using:

$$\bar{\theta} = \tan^{-1}\left(\frac{\bar{x}}{\bar{y}}\right) \qquad (8)$$

Equations 7 and 8 are thus transformations that convert the original azimuths into and back from an additive set of variables that can be operated on (averaged) to determine the azimuth of the ice flow direction (Fisher *et al*. 1987).
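The azimuth-averaging procedure can be illustrated with a minimal Python sketch. It assumes azimuths are measured in degrees clockwise from north, so that easting is the sine and northing is the cosine of the azimuth. The function name `mean_azimuth` and the sample values are hypothetical, and `atan2` is used here (an implementation choice, not something stated in the text) so that the back-transformation lands in the correct quadrant.

```python
import math

def mean_azimuth(azimuths_deg):
    """Average azimuths (degrees clockwise from north) via direction cosines."""
    # Forward transformation: easting = sin(azimuth), northing = cos(azimuth)
    xs = [math.sin(math.radians(a)) for a in azimuths_deg]
    ys = [math.cos(math.radians(a)) for a in azimuths_deg]
    # The transformed variables are additive, so they can be averaged
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Back-transformation; atan2 resolves the quadrant (assumed convention)
    return math.degrees(math.atan2(x_bar, y_bar)) % 360.0

# Hypothetical clast azimuths clustered around north
azimuths = [350.0, 20.0, 355.0, 15.0]
naive = sum(azimuths) / len(azimuths)   # 185 degrees: points nearly due south
vector_mean = mean_azimuth(azimuths)    # approximately 5 degrees
print(naive, vector_mean)
```

The arithmetic mean of the raw azimuths points almost opposite to the cluster of measurements, whereas the mean obtained through the additive variables falls within it.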

Another example of additivity involves determination of sampling error in geochemistry. Standard deviations of replicate samples collected in the field are typically used to estimate the combined magnitude of sampling and analytical (i.e. total) error. Standard deviations of replicate analyses of the samples, in contrast, estimate only the magnitude of analytical error. Unfortunately, one may not simply subtract the analytical error standard deviation (*σ*_{a}) from the total error standard deviation (*σ*_{t}) to determine the sampling error standard deviation (*σ*_{s}), because errors are additive as variances (not as standard deviations). As a result, we must square the total error and analytical error standard deviations, to produce the corresponding variances (Davis 1986). Then, by subtraction, we determine the sampling error variance, and by taking the square root we determine the sampling error standard deviation. These power (square and square root) transformations are used to create new variables that are additive (the variances), and can be manipulated using the sum-of-variances equation:

$$\sigma_t^2 = \sigma_s^2 + \sigma_a^2 \qquad (9)$$

to determine sampling error.
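A short Python sketch of this calculation follows. The function name and the numerical values are hypothetical and serve only to show that subtraction must be carried out on variances rather than on standard deviations.

```python
import math

def sampling_sd(total_sd, analytical_sd):
    """Sampling error SD obtained by subtracting variances, not SDs."""
    total_var = total_sd ** 2            # square transformation
    analytical_var = analytical_sd ** 2
    sampling_var = total_var - analytical_var   # variances are additive
    if sampling_var < 0:
        raise ValueError("analytical variance exceeds total variance")
    return math.sqrt(sampling_var)       # square root back-transformation

# Hypothetical error estimates (ppm)
total_sd, analytical_sd = 5.0, 3.0
print(sampling_sd(total_sd, analytical_sd))   # 4.0
print(total_sd - analytical_sd)               # 2.0, the incorrect direct subtraction
```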

A third example of additivity involves the inherent nature of geochemical concentrations in rock samples, which vary because geological material transfer processes (fractional crystallization, diagenesis, metamorphism, weathering, hydrothermal alteration, etc.) have added and removed material to and from the rocks over time. Because the absolute amounts of each element in the rocks vary in proportion to how much of each element is added or removed, these element amounts are already linear (additive). For example, adding 1 g of Na to a 100 g rock containing 10% Na (i.e. containing 10 g of Na) produces a new rock with an amount of Na equal to the amount initially present plus the amount that was added (11 g).

Unfortunately, the absolute amounts of elements in samples are not generally known in geochemistry because when the samples are collected, information is lost about the size of the rock. As a result, we normally describe rock compositions using element concentrations (proportions), and these are not necessarily proportional to the material transfers that take place to cause compositional variability (Stanley & Madeisky 1995). This is because addition or loss of an element can cause the concentrations of other elements to change (via closure), even though these elements were neither added nor removed (i.e. they are conserved; Chayes 1962; Stanley & Madeisky 1995). As a result, a transformation may be used to convert the concentrations into molar Pearce element ratios (PERs; Pearce 1968; Russell & Nicholls 1988; Russell & Stanley 1990*a*,*b*; Stanley & Madeisky 1995), which are molar ratios with a conserved element denominator that acts as a standardizing variable to circumvent the effects of closure. The PERs have values that change linearly with the amount of material transfer that the numerator element of the ratio undergoes during the material transfer process (Pearce 1968; Stanley & Madeisky 1995). As a result, PERs are proportionally ‘additive’, and the equation used to obtain a PER from concentration data is:

$$\mathrm{PER} = \frac{x / g_x}{z / g_z} \qquad (10)$$

where the element concentrations (*x* for the numerator element and *z* for the conserved denominator element) are in mass units (wt%, ppm, ppb, etc.), and the gram formula weights (*g*_{x} and *g*_{z}) are in grams/mole (Stanley & Madeisky 1995). This conversion formula is an ‘additive transformation’. Note that Equation 10 is also a multivariate transformation, because it is a function of two variables (*x* and *z*), and so may not be considered by some to be strictly a transformation. Nevertheless, it does produce a new variable that is additive.
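The immunity of a PER to closure can be illustrated with a small Python sketch. It assumes the ratio takes the form of the numerator element's molar amount divided by the conserved element's molar amount. The choice of Na as numerator and Ti as conserved element, the concentrations, and the rounded formula weights are all hypothetical values chosen for illustration.

```python
def pearce_element_ratio(x, g_x, z, g_z):
    """Molar ratio of a numerator element (x) to a conserved element (z).

    x and z are mass concentrations in the same units; g_x and g_z are
    gram formula weights in grams/mole.
    """
    return (x / g_x) / (z / g_z)

G_NA, G_TI = 22.99, 47.87        # rounded atomic weights (g/mol)

# Hypothetical 100 g rock: 4.598 wt% Na and 0.4787 wt% Ti
na_wt, ti_wt = 4.598, 0.4787
per_before = pearce_element_ratio(na_wt, G_NA, ti_wt, G_TI)

# Add 10 g of some other component: neither Na nor Ti is transferred,
# yet closure dilutes both concentrations by the factor 100/110.
dilution = 100.0 / 110.0
per_after = pearce_element_ratio(na_wt * dilution, G_NA, ti_wt * dilution, G_TI)

print(na_wt, na_wt * dilution)   # the Na concentration changes through closure
print(per_before, per_after)     # the PER is unchanged
```

The Na concentration drops although no Na was removed, while the ratio to the conserved element is unaffected, because the dilution factor cancels between numerator and denominator.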

One should note that although major element concentrations are typically not ‘additive’, due to closure, trace element concentrations commonly are. This can be demonstrated by considering the definition of a concentration:

$$x = \frac{X}{S} \qquad (11)$$

where *X* is the mass of an element, *S* is the mass of a rock, and *x* is the mass concentration of that element. Taking the derivative of Equation 11 yields:

$$dx = \frac{dX}{S} - \frac{X\,dS}{S^2} \qquad (12)$$

and dividing both sides by *x* and rearranging, gives:

$$\frac{dx}{x} = \frac{dX}{X} - \frac{dS}{S} \qquad (13)$$

This indicates that the relative change in an element concentration is a function of the relative change in the amount of the element (*dX*/*X*; the effect of material transfer) and the relative change in the size of the rock (*dS*/*S*; the effect of closure). When material transfer does not involve a change in the size of the rock (i.e. it is a perfect exchange process and *dS*=0), then the relative change in concentration is equal to the relative change in the amount of the element, and closure has no impact. As a result, the magnitude of the *dS*/*S* term in Equation 13 describes the magnitude of the closure effect. This allows Equation 13 to be used to illustrate why closure affects major element concentrations much more than trace element concentrations.

Consider three rocks each containing a major element with a concentration of 10 wt%, a minor element with a concentration of 1 wt%, and a trace element with a concentration of 0.1 wt%. Each of these rocks undergoes a different type of hydrothermal metasomatism; in the first rock the major element is added, in the second rock the minor element is added, and in the third rock the trace element is added. Let us assume that the amount of addition of each element is equal (0.1 g), and that the mass of each rock was initially 100 g, so that the rocks initially contain 10 g, 1 g and 0.1 g of the major, minor and trace elements, respectively. For each rock, Equation 13 can be used to determine the relative change in the element concentration (*dx*/*x*). For the major element:

$$\frac{dx}{x} = \frac{0.1}{10} - \frac{0.1}{100} = 0.01 - 0.001 = 0.009 \qquad (14)$$

for the minor element:

$$\frac{dx}{x} = \frac{0.1}{1} - \frac{0.1}{100} = 0.1 - 0.001 = 0.099 \qquad (15)$$

and for the trace element:

$$\frac{dx}{x} = \frac{0.1}{0.1} - \frac{0.1}{100} = 1 - 0.001 = 0.999 \qquad (16)$$

In all cases, the closure term (*dS*/*S*)=0.001, but this closure effect is proportionally largest, relative to the material transfer effect (*dX*/*X*), for the major element (10%) and smallest for the trace element (0.1%), simply because the material transfer effects differ with concentration level. Clearly, closure introduces significant additional variation to major elements, and has much less impact on elements with lower concentrations. In practice, the effect of closure in trace elements is generally negligible, and use of a transformation to create an additive geochemical variable from a trace element concentration is generally unnecessary.
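The three cases can be checked numerically with the following Python sketch, which evaluates the material transfer and closure terms of Equation 13 to first order. The function name and its default arguments are hypothetical; the input values are those given in the text.

```python
def relative_changes(conc_wt_pct, added_g=0.1, rock_g=100.0):
    """Return (dX/X, dS/S, dx/x) for a small addition of one element."""
    element_g = rock_g * conc_wt_pct / 100.0   # X, initial mass of the element
    transfer = added_g / element_g             # dX/X, material transfer effect
    closure = added_g / rock_g                 # dS/S, closure effect
    return transfer, closure, transfer - closure

for name, conc in [("major", 10.0), ("minor", 1.0), ("trace", 0.1)]:
    transfer, closure, total = relative_changes(conc)
    # closure / transfer shows how large closure is relative to material transfer
    print(name, transfer, closure, total, closure / transfer)
```

The closure term is identical in all three cases, but it amounts to one tenth of the material transfer term for the major element and only one thousandth of it for the trace element.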

Although the above three examples illustrate the relevance of additivity transformations in geochemical data analysis, transformations to create new variables that have ‘additivity’ are otherwise rarely undertaken in most geochemical applications.

### Error variance stabilization

Another possible reason for data transformation is called ‘error variance stabilization’. This has historically not been a traditional motivation for data transformation in geochemical applications, but is a desirable trait for geochemical data that are to be numerically or statistically evaluated. Error variance stabilization is used to treat data that are ‘heteroscedastic’, having magnitudes of variations (measurement errors) that change across the range of the data. Most geochemical concentration data are ‘heteroscedastic’, as analytical errors commonly increase with concentration (Thompson & Howarth 1973, 1976*a*,*b*). This makes data evaluation difficult, because the variations attributable to measurement error vary across the range of the data, and the criteria used to determine whether observed concentration differences are real thus change across the range of concentrations as well.

Transformation to stabilize the error in a geochemical variable creates a new variable that is ‘homoscedastic’ (has constant error variance across the range of the new transformed variable), and this attribute can significantly simplify subsequent statistical tests, numerical procedures and the graphical presentation of the data. The transformation function used is specific to the model describing how measurement error changes with concentration, and propagation of the original heteroscedastic error into the new variable results in new (transformed) error that is homoscedastic. Evaluation of homoscedastic data is far simpler and leads to more compelling conclusions because the data can be evaluated on an ‘even playing field’.

Examples of error stabilizing transformations commonly used in other scientific applications (Hoyle 1973) include: (1) the logarithmic transform (for proportional error; Equation 1); (2) the square root transform (for Poisson error; Equation 2); and (3) the angular transform (for binomial error; Equation 5).

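To illustrate how these error-stabilizing transformations work, consider case (1) above (the logarithmic transform for proportional error). If proportional measurement error is 10%, then *σ*_{c}=0.10 *c*, where *c* is an element concentration. Propagation of this error into the natural logarithm of *c* (*y*=ln *c*) requires use of the generative error propagation equation (Stanley 1990):

$$\sigma_y^2 = \sum_{i=1}^{n} \left(\frac{\partial y}{\partial x_i}\right)^2 \sigma_{x_i}^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\partial y}{\partial x_i}\,\frac{\partial y}{\partial x_j}\,\sigma_{x_i x_j} \qquad (17)$$

In this case, there is only one variable (*c*), which we will assign to *x*_{1}, so Equation 17 simplifies significantly (the second term equals zero). The only required partial derivative is thus:

$$\frac{\partial y}{\partial c} = \frac{1}{c} \qquad (18)$$

and Equation 17 becomes:

$$\sigma_y^2 = \left(\frac{1}{c}\right)^2 (0.10\,c)^2 = 0.01 \qquad (19)$$

Clearly, regardless of the magnitude of *c*, the error variance in *y* (the transformed variable) equals 0.01 (a standard deviation of 0.10) and is constant (homoscedastic) across the range of *y*.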
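The constancy of the transformed error can be confirmed with a minimal Python sketch of the single-variable propagation. It assumes a natural logarithm (so that the derivative is 1/*c*) and first-order error propagation; the function name and the test concentrations are hypothetical.

```python
def log_transform_error_variance(c, relative_sd=0.10):
    """First-order error variance of y = ln(c) under proportional error."""
    sigma_c = relative_sd * c      # proportional error grows with concentration
    dy_dc = 1.0 / c                # derivative of the natural logarithm
    return (dy_dc ** 2) * (sigma_c ** 2)

# Hypothetical concentrations spanning more than three orders of magnitude
for c in (1.0, 50.0, 2500.0):
    print(c, 0.10 * c, log_transform_error_variance(c))
```

The error in the raw concentration varies from 0.1 to 250, whereas the propagated error variance in the log-transformed variable is 0.01 in every case.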

To date, no published geochemistry-related paper is known to the author that transforms geochemical data for this purpose, although ‘homoscedasticity’ would significantly facilitate a large number of efforts to evaluate geochemical data.

### Normality

In many geochemical applications, transformations are commonly undertaken to convert, or attempt to convert, the observed frequency distribution into one that is normal, approximately normal, or at least symmetric (e.g. Bartlett 1947; Draper & Cox 1969; Hoyle 1973). This is typically done for theoretical reasons, because subsequent statistical procedures either require data that are normally distributed, or are substantially simplified if the data are normally distributed (Hoyle 1973). A common transformation that has been used to convert data into a new variable that is at least ‘nearly normal’ is the Box–Cox transform (Equation 4). Typically, *λ* is adjusted until the resulting distribution has a skewness of 0 and a kurtosis of 3 (or nearly so; Joseph & Bhaumik 1997; Box & Cox 1964; 0 and 3 are the expected values of these parameters for a normal distribution). A maximum likelihood objective function (Equation 20) can be maximized to determine the optimal value of *λ* that produces a transformed distribution with skewness and kurtosis closest to 0 and 3, respectively (Joseph & Bhaumik 1997). Computer software in the form of a MATLAB procedure (BOXCOX.M) is available that determines the maximum likelihood value of *λ* for a Box–Cox transform that converts data into a newly transformed variable with a distribution that comes closest to normality (this procedure can be downloaded from the author's website at http://www.acadiau.ca\∼cstanley\software.htm).
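A maximum likelihood search for *λ* can be sketched in Python as follows. This is not the author's BOXCOX.M procedure and may not match the form of Equation 20. It assumes the conventional Box–Cox transform, (x^λ − 1)/λ with ln x at λ = 0, and the conventional profile log-likelihood for normally distributed transformed values, maximized by a simple grid search. The function names, grid limits and sample data are hypothetical.

```python
import math

def box_cox(x, lam):
    """Conventional Box-Cox transform of a positive value (assumed form)."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def profile_log_likelihood(data, lam):
    """Conventional Box-Cox profile log-likelihood (up to a constant)."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n   # ML variance (divide by n)
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(var_y) + log_jacobian

def best_lambda(data, lo=-2.0, hi=2.0, steps=401):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))

# Hypothetical positively skewed concentrations (ppm)
data = [12, 15, 18, 20, 22, 25, 27, 30, 34, 40, 48, 60, 85, 140, 310]
lam_hat = best_lambda(data)
print(lam_hat, profile_log_likelihood(data, lam_hat))
```

A grid search is used only for transparency; any one-dimensional optimizer would serve, and the data must be strictly positive for the transform to be defined.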

Unfortunately, most geochemical data derive from several geological sources. Because multiple rock types or geological materials may be represented in a geochemical survey dataset, the frequency distribution of the geochemical data is commonly multi-modal (Sinclair 1976; Stanley 1988). As a result, no matter what transformation is used, it is generally impossible to obtain a truly normally transformed distribution, unless the transformation is not monotonic (in which case the fundamental structure of the data is destroyed, and the information in the data is significantly altered). Application of the Box–Cox transform to the soil Cu concentration data from the Daisy Creek stratabound Cu–Ag prospect, Montana (Stanley 1984) illustrates this problem (Fig. 1). The frequency distribution of this dataset has been modelled using probability plot analysis (Sinclair 1976; Stanley 1988), and is interpreted to consist of a mixture of three lognormal populations (Fig. 2), with means and mean ± one standard deviation concentrations (in square brackets) of 26 [17, 38], 78 [55, 111] and 313 [198, 496] ppm that comprise 71, 19 and 10% of the data, respectively. These component distributions can be related spatially to background, chalcopyrite-mineralized and chalcocite+bornite-mineralized zones at the prospect, and so have validated geological causes (Stanley 1984; Sinclair 1991).

Application of the maximum likelihood Box–Cox transform to these data significantly reduces the skewness and kurtosis of this soil concentration variable (from 3.830 and 17.595, to 0.161 and −0.564, respectively). However, even though these parameters can be made more consistent with the expected values for a normal distribution (0 and 3, respectively) by adjusting λ to −0.58 using the maximum likelihood approach of Joseph & Bhaumik (1997), the resulting distribution is still multi-modal, has a non-ideal skewness and kurtosis, and exhibits other significant departures from normality (Fig. 3). As a result, the frequency distribution from this transformation in no way approximates a normal distribution, even though the transform employed is designed to create a distribution approximating normality. Nevertheless, the transform radically reduced the skewness of the data, because it stretched the data in concentration ranges where data frequencies are high, and compressed the data in concentration ranges where data frequencies are low (Fig. 4).

## Unitized power transformation to maximize data variance

The above motivations for transforming geochemical data are not the only ones that endow optimal characteristics to the resulting variables. Another reason for transforming data is to increase the amount of information revealed by the data. In geochemical applications, this is analogous to increasing geochemical contrast. Evaluation of geochemical concentration data in mineral exploration or environmental applications typically benefits from results that exhibit high geochemical contrast (Rose *et al*. 1979; Levinson 1980). High geochemical contrast makes patterns more easily and accurately interpreted, and makes results more compelling. High geochemical contrast can primarily be achieved *a priori* through prudent selection of sampling and analytical strategies (determined via orientation surveys) that result in geochemical data that emphasize compositional differences and thus robustly respond to the anomalism of interest (Rose *et al*. 1979; Levinson 1980; Chao 1984; Hall 1992; Hall & Bonham Carter 1998). However, certain numerical procedures (data transformations) can also be used to enhance the contrast of geochemical concentration data that do not exhibit high contrast *a posteriori*, even after prudent selection of appropriate sampling and analytical procedures. These transformations are not typically used in geochemical data evaluation, but are quite common in remote sensing data processing (Barrett & Curtis 1992; Richards 1993; Verbyla 1995). A variety of transformations (e.g. contrast stretching, linear stretching, histogram equalization, etc.) are employed in GIS applications to ‘stretch’ data (increase their variance) so that features not initially evident can be discerned.

Unfortunately, some of these data transformations are not monotonic functions, and thus fundamentally alter the relative relationships between concentrations. Others, with complicated functional forms, are used for purely empirical purposes, and thus may impart undesirable features to the data. As a result, the best transformations for enhancing geochemical contrast are those with simple functional forms, such as the power transformation (Equation 3).

One characteristic of the power transformation is that the variance of the transformed result can be increased through application of a larger power (λ), provided that the dataset contains nominal concentrations greater than one. This is because, under this proviso, the maximum value in the dataset increases as λ increases. This feature makes a power transform a poor choice for enhancing geochemical contrast unless certain constraints can be applied, because no maximum variance exists if any raw data value exceeds unity.

However, if data fall within the interval 0 ↔ 1, then every new variable resulting from transformation by a positive power (within the interval 0 ↔ ∞) will also be within the interval 0 ↔ 1. As a result, if the data are first rescaled to within the interval 0 ↔ 1, the resulting standard deviations of different power-transformed variables derived from data within the interval 0 ↔ 1 can be compared.

Note that rescaling the data to within the 0 ↔ 1 interval can also be accomplished by expressing the concentration as a proportion; however, this may not result in a rescaled variable that exhibits the full dynamic range across that interval (e.g. Fe_{2}O_{3} concentrations from a banded iron formation could exhibit proportions ranging across a relatively small range within the 0 to 1 interval).

For data within the interval 0 ↔ 1 raised to the power λ, it can be shown that as λ approaches 0, the variance of the power-transformed variable also approaches 0, because all power-transformed values approach 1; similarly, as λ approaches ∞, the variance of the power-transformed variable also approaches 0 because all power-transformed values approach 0. Because the variance of the original data is positive (a positive variance exists for the power transform when λ=1, assuming that the original data exhibit more than one value), there must be some value of λ where the maximum variance in the power-transformed variable is obtained (C. Leon, 2004, pers. comm.). As a result, it is a simple process to determine the power (λ) that causes the power-transformed variable to exhibit maximum variance (e.g. using the ‘solver’ macro in Excel®). The power-transformed variable with maximum variance exhibits the maximum geochemical contrast, and consequently potentially reveals the most information embedded within the data.

It should be noted that rescaling, through subtraction of the minimum and division by the range, requires some adjustment to ensure that the minimum and maximum rescaled values do not equal 0 and 1. This is because as λ goes to 0 or ∞, 0^{λ} ≠ 1^{λ}, and thus if any rescaled value equals 0 or 1, the associated standard deviation will not go to 0 because the power-transformed values will converge on 0 and 1. As a result, the rescaling transformation (Equation 21), where *ε* is a small number (say, 0.001), ensures that the rescaled values remain strictly within the 0 ↔ 1 interval. This transformation is ‘affine’ and so retains the original relationships between the variable values.
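One way such a rescaling could be written is sketched below in Python. The exact form of Equation 21 is not reproduced here; the sketch assumes an affine mapping that sends the minimum to *ε* and the maximum to 1 − *ε*. The function name `unitize` and the sample concentrations are hypothetical.

```python
def unitize(values, eps=0.001):
    """Affine rescaling to the open unit interval (assumed form).

    Maps the minimum to eps and the maximum to 1 - eps, so that no
    rescaled value equals exactly 0 or 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("data must contain more than one distinct value")
    span = hi - lo
    return [eps + (1.0 - 2.0 * eps) * (v - lo) / span for v in values]

# Hypothetical concentrations (ppm)
concentrations = [5.0, 20.0, 80.0, 320.0]
unit_values = unitize(concentrations)
print(unit_values)
```

Because the mapping is affine and increasing, the order of the values and the ratios of the differences between them are preserved.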

Furthermore, if the original dataset contains a small number of positive or negative outliers, the power that maximizes the variance of the unit interval rescaled variable will tend to be very large (close to ∞) or very small (close to zero), respectively. This may unduly stretch or compress the data during transformation, resulting in variations in the new variable that are statistically insignificant, or the loss of resolution of statistically significant variations. This problem can be avoided by excising the outliers from the dataset for the purpose of identifying the power that maximizes the geochemical contrast, and then applying the resulting transformation to these excised samples to re-include them in the dataset for subsequent interpretation.

Note that the maximum standard deviation obtainable using a power transform for an initially mono-modal distribution is less than the standard deviation of a uniform distribution on that interval (equal to (1−0)/√12=0.288675), which is the maximum theoretical standard deviation obtainable from a mono-modal frequency distribution on that interval. Unless the initial distribution is significantly multi-modal, this establishes a maximum value that the resulting standard deviation can reach.

An example of the application of the ‘maximum data variance’ transformation is presented in Figures 5, 6 and 7 using the Daisy Creek soil sample Cu concentrations (Stanley 1984). After unitizing the concentrations to within the interval 0 ↔ 1 according to Equation 21, the resulting distribution has a standard deviation of 0.128 (with a mean of 0.068), and a skewness and kurtosis of 3.830 and 17.595, respectively.

After comparing the variances produced by different positive powers (*λ*), the power that maximizes the variance of the resulting power-transformed distribution is 0.44 (Fig. 5). This transformed distribution has a standard deviation of 0.248 (with a mean of 0.049), which is larger than the original, unitized distribution standard deviation (of 0.128).

The resulting power-transformed distribution for the Daisy Creek soil survey is presented in Figure 6, and has a skewness and kurtosis of 2.069 and 4.519, respectively. As a result, it is clearly not normally distributed. However, it has been stretched monotonically within the interval 0 ↔ 1 (Fig. 7) such that its variance is maximized. As a result, this transformed variable exhibits an alternative optimal property, that of maximum geochemical contrast.

Use of the maximum variance transformed values in data analysis and presentation may thus afford substantial advantage. Statistical tests may be more powerful, and geographic trends may be more compelling when the geochemical contrast is enhanced. As a result, use of this transformation before data analysis or presentation may facilitate identification of additional information in the data.

Other examples of transformation for this purpose have been presented in Stanley *et al*. (2003) and Moore *et al*. (2003). Computer software in the form of a MATLAB procedure (MAXVAL.M), that, like the Excel® ‘Solver’ macro, determines the power producing the maximum variance in the transformed variable, but which also produces a number of additional graphs to validate the results, can be downloaded from the author's website (http://www.acadiau.ca\∼cstanley\software.htm).

### Data presentation

Presentation of the above transformed data in geographic space may lead to recognition of spatial patterns that can be associated with geological features of interest. However, like the selection of an appropriate transformation, selection of an appropriate format to present the transformed concentration data is very important. Geochemical data are typically punctual data, collected from finite points in space. Most geochemical samples are collected from small volumes, but the results of geochemical analysis of these samples are typically expected to represent significantly larger regions of a survey area (e.g. on a square soil grid, an actual sample may be derived from a single soil pit of 25 cm × 25 cm area, but the sample spacing may be 50 m × 50 m). As a result, many geochemical concentrations do not represent an integrated signal from across a large sample region, but rather are typically grab samples from specific finite locations.

Nevertheless, experience has shown that geochemical samples collected in this way may exhibit discernible spatial patterns. Unfortunately, there is no theoretical reason or guarantee for any spatial patterns to exist *a priori* (e.g. some surveys result in ‘salt-and-pepper’ type patterns reflecting measurement of a geochemical variable that lacks regional correlation). As a result, because geochemical data are not fundamentally regionalized (which may be due in part to inadequate sample support: either the samples are too small, or they are not truly representative of the areas from which they have been collected; Clark 1979; Sinclair & Blackwell 2002), contouring of geochemical data is generally not justified until spatial examination shows that patterns or trends actually exist. Consequently, a method of displaying geochemical data other than contouring is needed to establish that a spatial trend exists, and thus that contouring is justified. A way to examine geochemical data in space without employing contouring involves the use of bubbleplots.

Bubbleplots are diagrams composed of circles (or bubbles) located at sample sites; the sizes of the bubbles are proportional to the sample values. Typically, the sample values are proportional to the bubble diameters or to the bubble areas (e.g. both of these options are available in the bubbleplot facility in Microsoft Excel^{®}). Figure 8 presents two rows of bubbles; those in the top row have bubble diameters proportional to the values 1, 2, 4 and 8, whereas those in the bottom row have bubble areas proportional to these same values. Unfortunately, if the values are proportional to the bubble areas (bottom row, Fig. 8), the bubble sizes do not appear to reflect the fact that the associated values successively double from left to right. This is because the human eye (and brain) is more sensitive to a length (one dimension) than an area (two dimensions; Treisman & Gormican 1988; Hoffman 1998). The bubbles in the bottom row do not exhibit a size contrast commensurate with the numerical contrast of the associated values. In fact, the bubble diameters in the bottom row actually increase in proportion to the square root of the associated values. Because human perception is length-based rather than area-based, use of bubbleplots with bubble areas proportional to sample values is not recommended, as it will result in plots that lack accurate representations of geochemical contrast (there appears to be less contrast than actually exists).
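The difference between the two scaling conventions can be made concrete with a short Python sketch that computes the bubble diameters implied by each for the values 1, 2, 4 and 8. The function names and the unit scale factor are hypothetical.

```python
import math

def diameters_from_values(values, scale=1.0):
    """Bubble diameters directly proportional to the sample values."""
    return [scale * v for v in values]

def diameters_from_areas(values, scale=1.0):
    """Bubble diameters when bubble AREAS are proportional to the values.

    Area is proportional to diameter squared, so the diameter grows only
    with the square root of the value.
    """
    return [scale * math.sqrt(v) for v in values]

values = [1, 2, 4, 8]
by_diameter = diameters_from_values(values)
by_area = diameters_from_areas(values)

print(by_diameter)   # diameters double from bubble to bubble
print(by_area)       # diameters grow by a factor of sqrt(2) per bubble
print(by_diameter[-1] / by_diameter[0], by_area[-1] / by_area[0])
```

With area scaling, an eight-fold contrast in value is drawn as a diameter contrast of less than three-fold, which is the loss of apparent contrast described above.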

Two bubbleplots, with bubble diameters and bubble areas, respectively, proportional to sample concentrations, are presented in Figures 9 and 10. These display the Cu concentrations from the Daisy Creek soil survey (after rescaling to the interval 0 ↔ 1), which have a highly positively skewed frequency distribution (Stanley 1984). As a result, the *a priori* expectation is that the spatial pattern defined by these data should involve a small number of large bubbles and a large number of small bubbles (reflecting the large skewness in the data). Figure 9, where the concentrations are proportional to bubble diameter, illustrates this feature precisely. In contrast, Figure 10 implies that the original data are much less positively skewed, and so does not appear to be consistent with the underlying distribution of the data (this is only because the human eye is less sensitive to differences in area). This example thus illustrates that human perception is more sensitive to length than to area, and that bubbleplots employed in geochemical applications should have bubble diameters that are proportional to the geochemical variable under investigation.

Additional bubbleplots depicting the transformed results from the Box–Cox transformation (Figs 3 and 4), and the ‘maximum data variance’ transformation (see Figs 6 and 7) for the Daisy Creek soil survey Cu concentrations (with values proportional to bubble diameters) are presented in Figures 11 and 12. These bubbleplots depict the underlying geochemical pattern to greater and lesser degrees, and the differences in the patterns are a function of the effects of the different transformations employed.

All of these bubbleplots (Figs 9, 11 and 12) present both an objective representation of the data (the bubble diameters, which are proportional to the raw or transformed data) and a subjective representation of the data (the light-, medium- and dark-grey bubble colours, which correspond to the concentration ranges defined by thresholds determined from the probability plot analysis depicted in Figure 2). This dual representation provides these bubbleplots with both an interpretation of population membership (the bubble colours) and the means to evaluate the extent to which this interpretation is supported by the data. As a result, these bubbleplots facilitate rigorous scientific inquiry and hypothesis testing in the interpretation of the geochemical patterns under examination.

When comparing these bubbleplots, the reader is reminded to resist basing any judgement regarding the relative success of these transformations on the coherence of the pattern observed. Rather, judgement should be based on the characteristics that the various transformations impart to the transformed data.

## Conclusions

Transformation of concentration data into a new variable that exhibits a normal, near-normal or even symmetric frequency distribution is not generally possible because of the multi-modal character of most geochemical datasets. As a result, transformation for this purpose is generally misguided and unlikely to be successful.

However, transformation of concentration data to obtain new variables that exhibit other optimal properties is warranted on a variety of theoretical grounds. Use of a power transform to maximize the variance of concentration data that have been rescaled to within the interval 0 ↔ 1: (1) is theoretically desirable; (2) can be undertaken using a simple spreadsheet program; and (3) provides a resulting variable that exhibits the maximum geochemical contrast, and thus affords the geochemist the best opportunity to identify previously unrecognized information in the data.
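As a minimal sketch of such a power transform, the Python fragment below rescales a set of concentrations to the interval 0 ↔ 1, raises them to a range of trial powers, and keeps the power that yields the largest variance. Several choices here are assumptions made for illustration only: the min-max form of the rescaling, the grid of trial powers (0.01 to 100), the function names, and the example concentrations, which are hypothetical and are not the Daisy Creek data.

```python
import statistics


def rescale_unit(values):
    """Min-max rescale values onto the interval 0..1 (assumed rescaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def max_variance_power(values, powers=None):
    """Grid-search the power that maximizes the variance of rescaled data.

    Returns (best_power, transformed_values). The default grid of trial
    powers spans 0.01 to 100 in equal logarithmic steps and includes 1.0.
    """
    z = rescale_unit(values)
    if powers is None:
        powers = [10 ** (k / 50) for k in range(-100, 101)]
    best_power = max(
        powers,
        key=lambda p: statistics.pvariance([zi ** p for zi in z]),
    )
    return best_power, [zi ** best_power for zi in z]


# Hypothetical concentrations (ppm), for illustration only
concentrations = [12, 15, 18, 22, 25, 31, 40, 55, 90, 160, 410]
best_power, transformed = max_variance_power(concentrations)
variance_raw = statistics.pvariance(rescale_unit(concentrations))
variance_best = statistics.pvariance(transformed)
```

Because the trial grid includes a power of 1, the variance of the transformed values can never be smaller than that of the rescaled but untransformed values. A spreadsheet solver or a continuous optimizer could replace the grid search; the grid is used here only because it is simple and transparent.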

In the ‘maximum data variance’ transform, the nature of the transformed variable's resulting frequency distribution is not a criterion used to assess whether the transform is appropriate. This is because the characteristic sought, that of ‘maximum data variance’, is achieved *a priori* through use of the transform itself. Thus, geochemists can avoid the temptation of employing circular logic to justify their data analysis. The transformation employed is not justified by the interpretability of the results of the data analysis procedure, and thus is scientific. As a result, use of a ‘maximum data variance’ transform will result in a more objective evaluation of geochemical data, and provide opportunities to recognize characteristics in the data that have not been previously identified. Evaluation of the frequency distributions of geochemical data should routinely include examination of the results of this data transformation procedure. The resulting transformation may be effectively represented using bubbleplots with diameters that are proportional to the transformed values.

## Acknowledgements

This paper benefited from helpful discussions with Dr Carlos Leon, Department of Mathematics and Statistics, and Dr Darlene Brodeur, Department of Psychology, Acadia University. It has been supported by an NSERC Discovery Grant to the author.

- © 2006 AAG/The Geological Society of London