## Abstract

Stream sediment geochemical data are usually subjected to methods of multivariate analysis (e.g. factor analysis) in order to extract an anomalous geochemical signature (factor) of the mineral deposit-type sought. A map of the anomalous geochemical signature can be used as evidence, in combination with other layers of evidence, for mineral prospectivity mapping (MPM). Because factor analysis may yield more than one factor from a stream sediment dataset, the analyst faces the challenge of recognizing the factor that best indicates the presence of the mineral deposit-type sought. In addition, MPM is faced with the challenge of how to assign weights to classes in a geochemical evidence map. Accordingly, this paper discusses a new approach for extracting a significant anomalous geochemical signature of the mineral deposit-type sought and for assigning weights to anomaly classes in a geochemical evidence map. In this approach, we used a staged factor analysis and then applied a logistic function to transform factor scores representing an anomalous geochemical signature into a map of geochemical mineralisation prospectivity indices (GMPI), which serves as a spatial evidence layer for MPM based on the theory of fuzzy sets and fuzzy logic. The GMPI is a fuzzy weight in the [0,1] range. We demonstrate the application of the GMPI for mapping prospectivity for Mississippi Valley-type fluorite deposits in the Mazandaran province, northern Iran, which is a greenfield area.

- Staged factor analysis
- logistic function
- fuzzy weights
- geochemical evidence map
- mineral prospectivity mapping

Mineral prospectivity mapping (MPM) deals with the analysis and integration of spatial evidence layers derived from individual multi-source datasets to map and rank prospective areas for further exploration of the mineral deposit-type sought. The analysis of some spatial evidence layers can be independent of methods for MPM. For example, a map or maps of geochemical anomalies are among the common spatial evidence layers used in MPM, which can be derived independent of methods for MPM. To assign weights to classes of spatial evidence in order to create predictor maps, either knowledge- or data-driven methods for MPM are used (Bonham-Carter 1994; Carranza 2008). With knowledge-driven methods, which are appropriate in greenfield (or poorly explored) areas, subjective judgment of an expert analyst is employed in assigning weights to classes of a spatial evidence layer. The theory of fuzzy sets and fuzzy logic (Zadeh 1965) has been successfully applied in knowledge-driven MPM.

In fuzzy logic MPM, the fuzzy weights assigned to spatial evidence must reflect realistic spatial associations between spatial evidence and mineral deposits of the type sought. Fuzzification, or assignment of fuzzy weights, is the most important stage in fuzzy logic MPM (Carranza 2008). Examples of fuzzy logic MPM are found in D’Ercole *et al*. (2000); Knox-Robinson (2000); Porwal & Sides (2000); Venkataraman *et al*. (2000); Carranza & Hale (2001); Porwal *et al*. (2003, 2004, 2006); Tangestani & Moore (2003); Ranjbar & Honarmand (2004); Eddy *et al*. (2006); Rogge *et al*. (2006) and Nykänen *et al*. (2008). In these studies, fuzzy weights of different classes in an evidential map were defined based on expert judgment, which is inherently subjective. The present paper addresses (a) the analysis of stream sediment geochemical data to derive a multi-element anomalous signature of the mineral deposit-type sought and (b) the objective assignment of fuzzy weights to classes of derivative geochemical values for integration with other spatial evidence layers in MPM.

To derive multi-element anomalous signatures, multivariate analyses are especially useful because the relative importance of the combinations of geochemical variables can be evaluated (e.g. Garrett & Grunsky 2001; Carranza 2004, 2010*c*; Ali *et al*. 2006; Grunsky *et al*. 2009). However, recognizing multi-element anomalous signatures in stream sediment geochemical data for exploration of certain mineral deposit-types is challenging because mineralisation is, aside from being rare, just one of the factors that influence the variability of elements in stream sediments (Bonham-Carter *et al*. 1987; Carranza & Hale 1997). Nevertheless, factor analysis (FA), as one of the methods of multivariate analysis, has been widely used for interpretation of stream sediment geochemical data (e.g. Reimann *et al*. 2002; Kumru & Bakac 2003; Helvoort *et al*. 2005). That is because FA aims to explain variations in a multivariate data-set by a few factors as much as possible and to extract hidden multivariate data associations (Tripathi 1979). Recently, Yousefi *et al*. (2012) demonstrated the efficiency of a staged FA (they called it stepwise FA) over the ordinary FA in extracting significant multi-element geochemical signature of the mineral deposit-type sought.

The next challenge, after extraction of multi-element geochemical evidence, is how to assign objective, rather than subjective, fuzzy weights to values representing that evidence in order to create a geochemical predictor map for integration with other predictor maps for MPM based on the theory of fuzzy sets and fuzzy logic. In this regard, Yousefi *et al*. (2012) used a logistic function to convert the factor scores (FSs) representing the multi-element anomalous signature into what they called geochemical mineralisation probability indices (GMPI). The logistic function has also been used in logistic regression (e.g. Carranza 2002; Daneshfar *et al*. 2006) to transform unbounded values into the [0,1] range. The values of GMPI fall in the [0,1] range, which is consistent with the concept of probability, and can be used as fuzzy weights for individual stream sediment samples with respect to the mineral deposit-type sought. However, because these values are not strictly probabilities, in this paper we take GMPI to stand for geochemical mineralisation prospectivity indices.

Considering the foregoing challenges, we performed staged FA for enhancement and recognition of a multi-element anomalous signature in stream sediment geochemical data. Then, we used the GMPI for generation of a fuzzified geochemical evidence map, which can be combined with other evidential maps for MPM based on the theory of fuzzy sets and fuzzy logic. To demonstrate the procedure for deriving GMPI and to illustrate its usefulness in fuzzy logic MPM in greenfield areas, we chose an area in the Mazandaran province in northern Iran as a case study. We used multi-element (Zn, Pb, Ag, As, Sb, Ba, Sr, Cu, Mo, Sn, W, V, Cr, Co, Mn, Ni, Bi and Au) concentration data from 502 samples of the <80-mesh (<177 μm) fraction of stream sediments, collected, prepared and analysed by the Geological Survey of Iran (GSI) (Korehie 2002). This study focuses on Mississippi Valley-type (MVT) fluorite deposits in the Mazandaran province.

## MVT Mineralisation in the Study Area

The study area, covered by the 1:100 000-scale geological map of Polsefid area of *c*. 2500 km², is part of the eastern district of central Alborz zone in northern Iran. In the eastern district of central Alborz, several known F-Ba-Pb-Zn deposits have been mined or are currently being mined for fluorite. In that district, there are eight important fluorite mining areas, namely, from the largest to the smallest, Pajimiana, Kamarposht, Sheshroodbar, Deraseleh, Era, Emaft, Sarcheleshk and Ashchal. The Pajimiana fluorite mine is the only one located in the Polsefid map (Fig. 1), although there are several minor occurrences of fluorite, Pb and Zn mineralisation in the area.

Rocks that outcrop in the study area consist of different kinds of igneous (plutonic and volcanic) and sedimentary rocks (Vahdatidaneshmand 2003), but the latter are predominant. The ages of lithological units in the area vary from Precambrian to Recent (Fig. 1). The Elika Formation (limestone and dolomite of Triassic age) and the Tizkooh Formation (*Orbitolina*-bearing limestone of Lower Cretaceous age) are the most important lithostratigraphic units hosting fluorite mineralisation in the eastern district of central Alborz zone (Vahabzadeh 2008). The fluorite mineralisation in the eastern district of central Alborz occurs as lenses or veins in one of two fault zones (Ghazban & Moritz 2001; Vahabzadeh 2008).

Several studies indicate that the fluorite deposits in the eastern district of central Alborz zone can be considered MVT deposits (Alirezaee 1988; Gorjizad 1996; Shariatmadar 1999; Ghazban & Moritz 2001; Vahabzadeh 2008). Comparisons of the fluorite deposits in the eastern district of central Alborz with various MVT fluorite deposits worldwide (e.g. Hill *et al*. 2000; Cardellach *et al*. 2002; Pannalal *et al*. 2003; Partida *et al*. 2003; Sasmaz *et al*. 2005; Bouch *et al*. 2006; Daneshfar *et al*. 2006; Sánchez *et al*. 2009; Souissi *et al*. 2010) also indicate that the former can be considered MVT deposits. Among the various characteristics of MVT fluorite deposits, their common F-Ba-Pb-Zn element/mineral associations provide reference for analysis and interpretation of geochemical exploration data to define targets for exploration of such deposit-type.

## Methods and Results

Because distribution maps of element concentrations are generated from point samples, geochemical mapping should make use of an appropriate scale to represent sampling density (Zuo 2012). Hence, a proper cell or pixel size should be used in GIS-based geochemical mapping. Hengl (2006) recommended that an appropriate cell size can be based on SN × 0.0005, where SN is the scale number and is equal to

$$SN = \sqrt{\frac{A}{n}} \times 10^{2}$$

in which *A* is the total area of a map and *n* is the total number of observations. In this study, *A* is 25 × 10⁸ m² (2500 km²) and *n* is 502; with *A* expressed in km², SN is 223 and, accordingly, the cell size is 0.111 km, i.e. 111 m (approximately equal to 100 m). Therefore, we used a pixel size of 100 m × 100 m in all of the maps in this study.
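As a rough check of this arithmetic, the following Python sketch evaluates the cell size. It assumes that the scale number takes the form SN = sqrt(A/n) × 100 and that *A* is entered in km² (2500 km²), which reproduces the reported SN of about 223 and a cell size of about 111 m. The function name and the unit handling are illustrative assumptions, not part of the original workflow.

```python
import math


def cell_size_from_density(area, n_obs, factor=0.0005):
    """Return (scale number, cell size) for a point-sampled map.

    Assumes SN = sqrt(area / n_obs) * 100 and cell size = SN * factor.
    The cell size comes out in the linear unit of `area` (km if area is km^2).
    """
    scale_number = math.sqrt(area / n_obs) * 100.0
    return scale_number, scale_number * factor


# Study-area values quoted in the text: about 2500 km^2 and 502 samples.
sn, cell_km = cell_size_from_density(area=2500.0, n_obs=502)
cell_m = cell_km * 1000.0
print(f"SN = {sn:.1f}, cell size = {cell_m:.1f} m")
```

The result is slightly above 111 m, which is consistent with rounding to a 100 m × 100 m pixel.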

### Staged factor analysis

Like many other statistical techniques, FA assumes that data have a normal (or symmetric) distribution. However, it is now well known that geochemical exploration data almost never show a normal distribution (Reimann & Filzmoser 2000). In addition, stream sediment geochemical data are compositional data, meaning that they represent a closed number system in which individual variables are not independent of each other but are parts of a whole (Filzmoser *et al*. 2009*a*; Carranza 2011). Therefore, a proper transformation must first be applied to bring the data closer to a normal distribution (e.g. Reimann & Filzmoser 2000; Reimann *et al*. 2002). Filzmoser *et al*. (2009*a*) examined both logratio- and log_{e}-transformed geochemical data and showed that logratio-transformation yields more nearly symmetric data distributions than log_{e}-transformation. Several researchers (e.g. Aitchison & Egozcue 2005; Reimann *et al*. 2008; Templ *et al*. 2008; Filzmoser *et al*. 2009*a*) have discussed the theoretical advantages of the isometric logratio (ilr) transformation (Egozcue *et al*. 2003) over other logratio-transformations for the statistical analysis of geochemical data-sets. Carranza (2011) has demonstrated that, compared to log_{e}-transformation, ilr-transformation of stream sediment geochemical data enhances the recognition of anomalous multi-element associations reflecting the presence of mineralisation.
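For readers unfamiliar with the ilr transformation, a minimal Python sketch is given here. It assumes one particular orthonormal basis (a sequential, Helmert-type basis in which each coordinate contrasts the geometric mean of the first *i* parts with part *i*+1); other bases are equally valid, and the basis used in this study is not specified. The function name `ilr` and the synthetic compositions are illustrative, and zero or censored concentrations would need treatment before taking logarithms.

```python
import numpy as np


def ilr(compositions):
    """Isometric logratio coordinates for rows of strictly positive parts.

    Assumed basis: z_i = sqrt(i / (i + 1)) * ln(gmean(x_1..x_i) / x_{i+1}).
    A D-part composition maps to D - 1 real-valued coordinates.
    """
    log_x = np.log(np.asarray(compositions, dtype=float))
    n_samples, n_parts = log_x.shape
    coords = np.empty((n_samples, n_parts - 1))
    for i in range(1, n_parts):
        mean_log_first_i = log_x[:, :i].mean(axis=1)
        coords[:, i - 1] = np.sqrt(i / (i + 1.0)) * (mean_log_first_i - log_x[:, i])
    return coords


# Synthetic four-part compositions (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=(5, 4))
z = ilr(x)
print(z.shape)
```

Two properties are worth checking on any implementation: the coordinates do not change when every part of a sample is multiplied by the same constant, and the Euclidean norm of the ilr vector equals that of the centred logratio vector.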

In this paper, we used the ilr-transformed values of the multi-element stream sediment geochemical data in a classical (non-robust) FA. The transformations and FA analyses, discussed in this paper, can be carried out with a spreadsheet and/or current statistical software. Furthermore, software packages have also been developed (Thió-Henestrosa & Martín-Fernández 2005; Van den Boogaart & Tolosana-Delgado 2008) and have been made available to the public (R Development Core Team 2008; Templ *et al*. 2009; Van den Boogaart *et al*. 2009) for proper statistical analysis of compositional data such as stream sediment geochemical data. For a data-set of ilr-transformed values, standard techniques, such as classical estimation of correlation matrix (classical FA), can also be used to study the relation between all variables in the multivariate space based on correlations (Filzmoser *et al*. 2010). In this regard, we examined both derived FSs and back-transformed FSs to original values for comparing the resulting maps of FSs, although in many publications the back-transform of FSs were not applied on the classical FA (e.g. Borovec 1996; Chandrajith *et al*. 2001; Reimann *et al*. 2002; Kumru & Bakac 2003; Helvoort *et al*. 2005).

The main aim of FA is to extract a few ‘factors’ to enhance interpretability of multivariate data (e.g. Garrett & Grunsky 2001; Reimann *et al*. 2002; Treiblmaier & Filzmoser 2010). Reimann *et al*. (2002) and Treiblmaier & Filzmoser (2010) mentioned that, in FA, the maximum likelihood (ML) method and principal factor analysis (PFA), which works essentially like principal component analysis (PCA) but with a reduced correlation or covariance matrix, are the main methods for extracting the common factors. Treiblmaier & Filzmoser (2010) pointed out that researchers need to make decisions about certain options in performing FA. These options, which analysts are free to change to obtain meaningful output, include robust versus non-robust procedures, data transformations, factor solutions to correlation matrix, factor extraction methods (e.g. PCA, PFA and ML), factor rotation methods (e.g. orthogonal, varimax, and quartimax), a criterion for the number of factors to be extracted (e.g. eigenvalue, scree plot, percentage of variance), and other options like using a correlation or covariance matrix. Details of robust and non-robust (classical or ordinary) FA can be found in Reimann & Filzmoser (2000); Reimann *et al*. (2002); Helvoort *et al*. (2005); Filzmoser & Hron (2009); Filzmoser *et al*. (2009*a*, 2009*b*, 2009*c*, 2010) and Treiblmaier & Filzmoser (2010). In either case, PCA can be used as a method of FA to extract principal components for detecting hidden multivariate data structures and reducing the number of variables (Filzmoser *et al*. 2009*b*).

Furthermore, in FA, a threshold value for the minimum loading criterion for variables should be selected (Helvoort *et al*. 2005). For this purpose, loadings in the range of 0.3 to 0.6 have been used as choices for the threshold value (Treiblmaier & Filzmoser 2010). Because an absolute value of 0.5 is only a medium loading, a threshold at that level allows some variables to show more than one high loading, which makes interpretation difficult (e.g. Borovec 1996; Chandrajith *et al*. 2001; Helvoort *et al*. 2005). For this reason, some researchers have used loadings higher than 0.60 as the threshold (e.g. Borovec 1996; Chandrajith *et al*. 2001; Helvoort *et al*. 2005; Yousefi *et al*. 2012); Helvoort *et al*. (2005), for example, used a minimum loading of 0.6 to distinguish the elements that contribute strongly to each factor. A higher threshold value for loading is considered more reliable for identifying the elements that contribute to each factor (e.g. Treiblmaier & Filzmoser 2010).

Therefore, to perform FA, we used the classical PCA (non-robust) for extracting the common factors, we used the varimax method (Kaiser 1958) for rotation, and we retained factors with eigenvalues of >1 for interpretation. In addition, we used 0.6 as threshold value for loadings in FA to extract significant multi-element geochemical signature of the deposit-type sought. We used these options in both classical (ordinary) FA and staged FA for proper comparison to demonstrate the superiority of the latter over the former. We used a staged FA, up to a fourth-stage FA (Table 1), to extract significant multi-element anomalous geochemical signatures of the MVT-fluorite deposits. The general procedure of the staged FA includes two main phases (Fig. 2): first phase is for extraction of ‘clean’ factors and second phase is for extraction of a significant multi-element anomalous signature of the mineral deposit-type sought to calculate reliable loadings and FSs. Each of the main phases of the staged FA may comprise sub-phases (hereafter referred to as stage) depending on geochemical data and the mineral deposit-type sought. Each of the main phases and the stages they include is explained below to demonstrate their application using the data-set of the study area.

### Extraction of clean factors

In this phase, FA is first carried out on the data-set of all elements. This is first-stage FA. In the results of the first-stage FA, elements that do not contribute significantly to any factor with respect to the selected threshold value for loadings are excluded from the data-set (cf. Reimann *et al*. 2002). These elements can be considered to comprise ‘geochemical noise’. After excluding 'noisy' elements, a second-stage FA is carried out on the remaining data to generate new factors (Helvoort *et al*. 2005). If there are some elements that do not contribute significantly to any factor (i.e. with loadings lower than threshold) in the second-stage FA, they are excluded from the data-set and a third-stage FA is performed using the remaining data, and so on. If there are no 'noisy' elements in any stage of FA, the extracted factors are considered clean factors. For the case study here, the first- and second-stage FA (Table 1) are the first main phase to recognize and filter noise elements for obtaining clean factors.
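The first phase can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used in this study: factors are extracted as principal components of the correlation matrix, retained if their eigenvalue exceeds 1, rotated with a standard varimax iteration, and elements whose largest absolute loading is below 0.6 are dropped before the next stage. The function names, the synthetic data and the element labels (`e1` to `e6`, `noisy`) are all hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loadings matrix."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1.0 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


def rotated_loadings(data):
    """PCA loadings of the correlation matrix (eigenvalue > 1), varimax-rotated."""
    corr = np.corrcoef(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    keep = eigval > 1.0
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    return varimax(loadings) if loadings.shape[1] > 1 else loadings


def extract_clean_factors(data, names, threshold=0.6):
    """Repeat FA, dropping elements with no |loading| >= threshold, until none remain."""
    names = list(names)
    stage = 1
    while data.shape[1] > 1:
        loadings = rotated_loadings(data)
        significant = np.abs(loadings).max(axis=1) >= threshold
        if significant.all():
            return names, loadings, stage
        data = data[:, significant]
        names = [name for name, ok in zip(names, significant) if ok]
        stage += 1
    raise ValueError("fewer than two elements left; no clean factors found")


# Synthetic data: two groups of three correlated variables plus one weakly related variable.
rng = np.random.default_rng(0)
n = 1000
f1, f2 = rng.normal(size=(2, n))
columns = [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
columns += [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
columns.append(0.3 * f1 + 0.3 * f2 + rng.normal(size=n))
data = np.column_stack(columns)
names = ["e1", "e2", "e3", "e4", "e5", "e6", "noisy"]

kept, loadings, stages = extract_clean_factors(data, names)
print(kept, stages)
```

In this synthetic case the weakly related variable is removed after the first stage and the second stage yields clean factors, which mirrors the role of the first- and second-stage FA described in the text.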

For the data-set of this paper, in the first-stage FA (Table 1), factor F1 represents an As-Ba-Cu-W-V-Cr-Co-Mn-Ni-Bi association, factor F2 a Zn-Pb-Ag-Sb-Mo association, and factor F3 a Ni-Sn association. Because of the presence of indicator elements (i.e. Zn, Pb, Ag, Sb), the factor F2 is considered a multi-element geochemical signature reflecting the presence of MVT-fluorite deposits. Hence, locations with high FSs of F2 in the first-stage FA can be used to define exploration targets for MVT-fluorite deposits. However, we consider that the factor F2 in the first-stage FA is not optimal because it excludes Ba; but, this element is an important indicator of MVT-type fluorite deposits (cf. Hill *et al*. 2000; Cardellach *et al*. 2002; Pannalal *et al*. 2003; Partida *et al*. 2003; Sasmaz *et al*. 2005; Bouch *et al*. 2006; Daneshfar *et al*. 2006; Sánchez *et al*. 2009; Souissi *et al*. 2010). In addition, in the first-stage FA, there are two elements, Sr and Au, that do not contribute significantly to any factor; hence, they are not used in the second-stage FA. After omitting Sr and Au, the contribution of Ba to F2 in the second-stage FA is enhanced (Table 1). Because there are no 'noisy' elements, the factors F1, F2, and F3 in the second-stage of FA (Table 1) are considered clean factors.

### Extraction of significant multi-element anomalous signature

In addition to the problem that some elements do not contribute significantly to any factor, other problems remain because of the mathematical basis of FA. In calculating FSs for each sample (here FS_{F1}, FS_{F2} and FS_{F3}), elements with negative loadings on a factor have a negative influence on the FS. That is, although a factor is interpreted from the elements that are positively associated with it, the FS of each sample on that factor is calculated from all input elements, including those that are positively associated with other factors or negatively associated with the factor in question. Such a problem can be noted from the columns of the first- and second-stage FA in Table 1. For example, Cr is positively associated with F1 but is negatively associated with F2; thus, FS_{F2} is negatively influenced by Cr, suggesting that FS_{F2} for every sample carries geochemical noise from Cr (as well as W) and, thus, is not optimal for defining exploration targets for MVT-fluorite deposits. To address this problem, the best indicator factor with respect to the mineral deposit-type sought is selected as a key factor. Then, FA is carried out again using data of only those elements with significant positive loadings on the key factor in the preceding-stage FA (Table 1), because a low number of extracted factors gives optimum results in terms of interpretability (Reimann *et al*. 2002).

Factor F2 in the second-stage FA (Table 1) is considered to reflect the presence of MVT-fluorite deposits better than factor F2 in the first-stage FA because the former is a clean factor whereas the latter is not. Hence, based on geochemical criteria (cf. Hill *et al*. 2000; Cardellach *et al*. 2002; Pannalal *et al*. 2003; Partida *et al*. 2003; Sasmaz *et al*. 2005; Bouch *et al*. 2006; Daneshfar *et al*. 2006; Sánchez *et al*. 2009; Souissi *et al*. 2010), F2 in the second-stage FA is taken as the key factor, and stream sediment samples with high FS_{F2} in the second-stage FA (Table 1) can be used to define exploration targets for MVT-fluorite deposits. However, F1 and F3 in the second-stage FA are not proper multi-element signatures of MVT-type fluorite deposits, yet the elements positively associated with each of these factors have either a negative influence or a small positive influence in the calculation of FS_{F2}. This implies that the second-stage FS_{F2} for every sample still carries ‘geochemical noise’ from non-indicator elements. To filter that geochemical noise further, only elements with positive loadings on F2 in the second-stage FA were used in the third-stage FA (Table 1).
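The selection of input elements for the next stage can be sketched in Python as below. The loadings shown are hypothetical placeholders and are not the values of Table 1; the function name and the default threshold of 0.6 (the threshold adopted in this paper for significant loadings) are assumptions made for illustration.

```python
def select_indicator_elements(names, key_loadings, threshold=0.6):
    """Return elements whose loading on the key factor is positive and >= threshold."""
    return [name for name, value in zip(names, key_loadings) if value >= threshold]


# Hypothetical loadings on a key factor (NOT taken from Table 1).
names = ["Zn", "Pb", "Ba", "Cr", "W"]
key_loadings = [0.81, 0.78, 0.64, -0.35, -0.12]

next_stage_elements = select_indicator_elements(names, key_loadings)
print(next_stage_elements)
```

Only the data columns of the selected elements would then be passed to the next-stage FA, so that elements with negative or weak loadings no longer influence the factor scores of the key factor.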

The third-stage FA shows that Mo is not a good indicator of MVT-type fluorite deposit (since it does not show high loading) although it has significant positive loading on F2 in the second-stage FA. This shows that a staged FA also allows recognition of false indicators. By excluding Mo in the fourth-stage FA, the result is a Pb-Sb-Ba-Zn-Ag association with all elements having significant positive loadings (Table 1). All of the elements that have positive loadings on F1 in the fourth-stage FA are indicators of MVT-type deposits as shown in many studies of this deposit-type (e.g. Hill *et al*. 2000; Cardellach *et al*. 2002; Pannalal *et al*. 2003; Partida *et al*. 2003; Sasmaz *et al*. 2005; Bouch *et al*. 2006; Daneshfar *et al*. 2006; Sánchez *et al*. 2009; Souissi *et al*. 2010). In addition, according to Table 1, the total variance accounted for by the multi-element anomalous signature generally increased from *c*. 19% (F2 in the first-stage FA) to *c*. 69% (F1 in the fourth-stage FA). This comparison was made because a larger total variance should be retained as much as possible (e.g. Helvoort *et al*. 2005; Treiblmaier & Filzmoser 2010). Moreover, we compared a scatter plot of the input element values (before removing 'noisy' elements) versus FS_{F2} of the first-stage FA and a scatter plot of the input element values (after recognition and exclusion of 'noisy' elements) versus FS_{F1} of the fourth-stage FA. This comparison demonstrated that the input element values have poor correlations with the FS_{F2} of the first-stage FA whereas the input element values have nearly perfect correlations with the FS_{F1} of the fourth-stage FA as well as with each other. These illustrate that the staged FA was efficient in filtering geochemical noise, allowing for enhancement of the anomalous geochemical signature of interest. Thus, areas with high FS_{F1} of fourth-stage FA are considered reliable targets for exploration of MVT-fluorite deposits.

In the staged FA, the key elements in F2 of the first-stage FA became the key elements in F1 of the fourth-stage FA, and interpretation of the latter factor was rather straightforward based on geochemical criteria (e.g. Hill *et al*. 2000; Cardellach *et al*. 2002; Pannalal *et al*. 2003; Partida *et al*. 2003; Sasmaz *et al*. 2005; Bouch *et al*. 2006; Daneshfar *et al*. 2006; Sánchez *et al*. 2009; Souissi *et al*. 2010). The results of the staged FA show that the number of input elements also influences the output factors. In a staged FA, FSs representing the likelihood of the presence of the mineral deposit-type sought can be calculated per stream sediment sample in every stage, and further filtering of ‘noisy’ elements can be carried out as necessary. Through a staged FA, loadings and FSs derived in the first stage can be modified in each of the following stages (here, up to four stages) until a reliable multivariate signature is obtained. This is because, in each stage, elements with loadings lower than the selected threshold are excluded from the succeeding stages, so that loadings and FSs for a multivariate signature of interest are derived from data without ‘noisy’ elements.

The spatial distributions of FS_{F2} (Zn-Pb-Ag-Sb-Mo association) derived from the first-stage FA and the FS_{F1} (Zn-Pb-Ag-Sb-Ba association) derived from the fourth-stage FA, for both original derived FSs and back-transformed FSs values, are portrayed as symbol plots in Figure 3. To compare the results of the ordinary FA (i.e. the first-stage FA) and the staged FA, we consider an FS corresponding to the 90th percentile as a threshold separating background and anomalous samples. By comparing the maps, we found that some samples classified as anomalous according to FS_{F1} derived from the fourth-stage FA (Fig. 3b) are classified as background according to FS_{F2} derived from the first-stage FA (Fig. 3a), and vice versa. However, anomaly intensity downstream of the fluorite mine is higher in Figure 3b than in Figure 3a, meaning that the number of adjacent anomalous samples based on FS_{F1} derived from the fourth-stage FA is higher than that based on FS_{F2} derived from the first-stage FA. Hence, anomalous samples based on FS_{F1} derived from the fourth-stage FA are better indicators of target areas for MVT-type fluorite deposits than those based on FS_{F2} derived from the first-stage FA. Similar results are obtained from back-transformed FSs values (Figs. 3c–3d), meaning that the number of adjacent anomalous samples (anomaly intensity downstream of the host lithology portrayed in Fig. 1) based on FS_{F1} derived from the fourth-stage FA (Fig. 3d) is higher than that based on FS_{F2} derived from the first-stage FA (Fig. 3c). Hence, a staged FA is superior to an ordinary FA for the recognition of significant multi-element anomalous signature of the mineral deposit-type sought and for obtaining reliable FSs.
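The percentile-based separation of background and anomalous samples used in this comparison can be sketched in Python as follows. The scores are synthetic random numbers, used only to show the mechanics; the function name and the use of `numpy.percentile` with its default interpolation are assumptions of this sketch.

```python
import numpy as np


def classify_anomalous(factor_scores, percentile=90.0):
    """Return the percentile threshold and a boolean mask of samples above it."""
    scores = np.asarray(factor_scores, dtype=float)
    threshold = np.percentile(scores, percentile)
    return threshold, scores > threshold


# Synthetic factor scores for 502 samples (illustrative only).
rng = np.random.default_rng(1)
scores = rng.normal(size=502)
threshold, is_anomalous = classify_anomalous(scores)
print(f"threshold = {threshold:.2f}, anomalous samples = {int(is_anomalous.sum())}")
```

Applying the same function to the factor scores of two different stages and comparing the two masks shows which samples change class between an ordinary FA and a staged FA.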

Reliable element loadings and associations obtained through a staged FA, as discussed in this paper, are likely to be useful in the weighted sum approach described by Garrett & Grunsky (2001) for understanding the dispersion and association of geochemical elements. This is because, in the weighted sum approach, *a priori* knowledge is needed to assign weights and to evaluate the relative importance of variables. For this purpose, the factors controlling geochemical dispersion of the elements investigated in a study area should be well known. However, in the initial phases of exploration, especially in greenfield areas with no or few known mineral occurrences, it is difficult to obtain adequate information about factors of element dispersion and multi-element associations reflecting the presence of mineralisation. In such a situation, multivariate analysis like FA or PCA, as Garrett & Grunsky (2001) pointed out, can be used prior to the application of the weighted sum method to gain initial information about multi-element associations reflecting various processes.

### Logistic function: fuzzy weighting of geochemical signature

After recognizing a significant multi-element geochemical signature from the geochemical data-set, values of FSs of this indicator factor for stream sediment samples can be mapped to generate a geochemical evidential map. In this situation, according to the purpose of this study, there is one important question: how can we use the values of FSs to assign fuzzy weights to individual stream sediment geochemical samples for generating a geochemical evidential map in fuzzy logic MPM?

One way to understand a pattern (e.g. distribution of geochemical variables in an area) is to classify or categorize it (Micheli-Tzanakou 1999). A classical classification approach is to transform variables to a new data space (Berthold & Hand 2002). That is, instead of defining a non-linear function (e.g. discriminant or regression) for a set of values in their original space, defining a suitable non-linear transformation into a new space could facilitate interpretation of a pattern. This approach can be used to overcome some classification problems (Alpaydin 2004). The primary reason for such transformation is to provide a set of values with more discriminatory information and less redundancy for classification (Micheli-Tzanakou 1999). The transformation increases the chance of finding the globally optimal configuration of variables (Berthold & Hand 2002) and can thus help to solve the classification problem (Fink 2007). Bishop (2006) has illustrated that transforming variables using a logistic sigmoid (or S-shaped) function yields an optimal decision boundary for classification. A sigmoid function maps the whole real axis into a finite interval, e.g. the [0,1] range. Furthermore, a logistic sigmoid transformation plays an important role in many classification and pattern recognition algorithms (Bishop 2006) in fields such as statistics, neural networks, machine learning, and expert systems (e.g. Micheli-Tzanakou 1999; Berthold & Hand 2002; Alpaydin 2004; Fink 2007). Carranza (2010*b*) used a logistic function to convert maps of distances to geological features into predictor maps for MPM. In this paper, the application of a logistic transformation is demonstrated to improve the discrimination between geochemical anomalies and background.

Derived FSs per sample based on a significant indicator multi-element signature (e.g. F1 in the fourth-stage FA) can lie outside the [0,1] range. In fuzzy logic MPM, weights of classes of geochemical anomalies should lie in the range [0,1]. A logistic sigmoid function is an efficient way of transforming unbounded values into the [0,1] range for use in fuzzy logic MPM. There is a family of logistic functions (Theodoridis & Koutroumbas 2006) that can be used to transform a data-set into the [0,1] range based on the minimum and maximum data values and slope variations between them (Carranza & Hale 2002; Porwal *et al*. 2003; Carranza 2008; Yousefi *et al*. 2012). To convert the FSs in this study into values of GMPI in the [0,1] range, we applied the following logistic function (Yousefi *et al*. 2012):
$$\mathrm{GMPI} = \frac{e^{FS}}{1 + e^{FS}} \qquad (1)$$

where *FS* is the sample factor score obtained from the staged FA. Thus, the GMPI of every stream sediment sample based on the F1 (Zn-Pb-Ag-Sb-Ba) factor obtained from the fourth-stage FA is:
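$$\mathrm{GMPI}_{\text{Zn-Pb-Ag-Sb-Ba}} = \frac{e^{FS_{\text{Zn-Pb-Ag-Sb-Ba}}}}{1 + e^{FS_{\text{Zn-Pb-Ag-Sb-Ba}}}} \qquad (2)$$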

where *FS*_{Zn-Pb-Ag-Sb-Ba} is the factor score representing the multi-element indicator of MVT-fluorite deposits.
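A minimal Python sketch of this transformation is given here, assuming the standard logistic form GMPI = e^{FS} / (1 + e^{FS}), which is algebraically the same as 1 / (1 + e^{-FS}). This form reproduces the median and median±2MAD values reported in this paper for the fourth-stage FS_{F1} scores (-0.84, -0.04 and 0.77 map to approximately 0.30, 0.49 and 0.68). The function name `gmpi` is illustrative.

```python
import numpy as np


def gmpi(factor_scores):
    """Logistic transform of factor scores into fuzzy weights in the (0, 1) range."""
    fs = np.asarray(factor_scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-fs))


# Reference factor scores quoted in the text: median-2MAD, median, median+2MAD.
reference_scores = [-0.84, -0.04, 0.77]
reference_gmpi = np.round(gmpi(reference_scores), 2)
print(reference_gmpi)
```

A factor score of zero maps to a GMPI of 0.5, and increasingly positive or negative scores approach 1 or 0 without ever reaching them.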

Non-linear transformation gains an optimal decision boundary between different classes of a variable for classification purposes (Bishop 2006). Hence, non-linear transformation of FSs into the [0,1] range using logistic function (Eq. 2) gains stronger discrimination between anomaly and background values in comparison to linear-transformation of FSs values into the [0,1] range. To illustrate this proposition, we plotted FS_{F1} scores obtained from the fourth-stage FA versus linear and logistic transformations of those scores into the [0,1] range (Fig. 4). As reference values, we used the median, median+2MAD and median–2MAD of the FS_{F1} scores. The MAD represents the median of absolute deviations of all data values from the data median (Tukey 1977). In exploratory data analysis, the median+2MAD can be considered a threshold separating background and anomaly (Reimann *et al*. 2005). The MAD is analogous to the standard deviation (SDEV) in classical data analysis, so the median+2MAD threshold is also analogous to the classical mean+2SDEV threshold (Rose *et al*. 1979). The median–2MAD is a value in the background population of data. For the FS_{F1} scores, the median, median+2MAD and median–2MAD are -0.04, 0.77 and -0.84, respectively (Fig. 4). For the linearly-transformed FS_{F1} scores, the median, median+2MAD and median–2MAD are 0.21, 0.29 and 0.14, respectively. For the GMPIs (i.e. values obtained by application of Eq. 1 to the FS_{F1} scores), the median, median+2MAD and median–2MAD are 0.49, 0.68 and 0.30, respectively. Incidentally, the median GMPI is close to 0.5 and is greater than median of the linearly-transformed FS_{F1} scores. However, the median+2MAD of the GMPIs is greater than 0.5 but that of the linearly-transformed FS_{F1} scores is not, which illustrates that using GMPI enhances discrimination between background and anomaly. 
To show further that using GMPI enhances discrimination between anomaly and background, we take the absolute difference between median–2MAD and median+2MAD. For the linearly-transformed FS_{F1} scores, that is 0.15; for the GMPIs of the FS_{F1} scores, that is 0.38.
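The comparison of reference values under the two transformations can be sketched as follows. The scores below are hypothetical, right-skewed values chosen for illustration and are not the study data. The linear transformation is assumed here to be min-max scaling, and the MAD is computed without any scaling constant, following the definition given above.

```python
import math
from statistics import median

def mad(values):
    """Median of absolute deviations from the median (no scaling constant)."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

def logistic(v):
    """Standard logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical, right-skewed factor scores (NOT the study data).
scores = [-1.2, -0.9, -0.6, -0.4, -0.2, -0.1, 0.0, 0.1, 0.3, 0.6, 1.0, 2.5, 6.0]

centre = median(scores)
lower = centre - 2 * mad(scores)   # median-2MAD
upper = centre + 2 * mad(scores)   # median+2MAD

lo, hi = min(scores), max(scores)

def linear(v):
    """Min-max scaling into [0, 1]; an assumed form of the linear transformation."""
    return (v - lo) / (hi - lo)

linear_refs = [linear(v) for v in (lower, centre, upper)]
logistic_refs = [logistic(v) for v in (lower, centre, upper)]

# Width of the interval between median-2MAD and median+2MAD after each transformation.
linear_spread = linear_refs[2] - linear_refs[0]
logistic_spread = logistic_refs[2] - logistic_refs[0]
print(f"linear spread:   {linear_spread:.2f}")
print(f"logistic spread: {logistic_spread:.2f}")
```

In this toy example a few extreme scores stretch the linear scale, so the median+2MAD threshold falls below 0.5 after min-max scaling but above 0.5 after logistic transformation. This is the same pattern as described for the FS_{F1} scores, although the numbers themselves are not comparable with those of the study.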

The inflection point along the curve of logistically-transformed values (Fig. 4) is similar to an inflection point on a cumulative probability plot (Sinclair 1974, 1991), meaning that, like a cumulative probability plot, a plot of logistically-transformed values can be used to define a cutoff to separate background and anomaly. In a dataset of geochemical variables (uni-element values or derived multivariate values), most high values can generally be classified as anomalous and most low values can generally be classified as background. The main problem is therefore to define the discrimination boundary between geochemical classes. Bishop (2006) has illustrated that variables transformed using a logistic sigmoid function yield a better decision boundary for classification than non-transformed variables. Following Bishop (2006), we provide Figure 5 to illustrate further the better performance of the (non-linear) logistic transformation compared to linear transformation in discriminating between background and anomaly. In this regard, Yousefi *et al*. (2012) used the locations of known mineral occurrences as testing samples to evaluate the results of the GMPI approach, and they demonstrated that logistic transformation of FSs could increase the prediction rate of MPM. Here, the GMPI approach was used in a greenfield area with only one known mineral occurrence and, thus, the evaluation of results was based on indicative geological features (e.g. host lithology) as described below.

To demonstrate further the superiority of the logistic function (for the GMPI approach) over linear transformation in discriminating between geochemical anomalies and background values, we used percentile-based classes of geochemical variables (e.g. Bonham-Carter & Goodfellow 1986; Bonham-Carter *et al*. 1987; Carranza & Hale 1997; Ohta *et al*. 2005; Carranza 2010*a*; Darwish & Poellmann 2010; Yousefi *et al*. 2012). In this regard, values of FS_{F1} from the fourth-stage FA corresponding to the 97.5, 95, 90, 84, 50, 40, 30, 20 and 10 percentiles, and their matching linearly- and logistically-transformed (i.e. GMPI) values, were used as limits of geochemical classes (Table 2). These percentiles have been used in several publications as thresholds to define geochemical classes (e.g. Lepeltier 1969; Levinson 1974; Sinclair 1974; Bonham-Carter & Goodfellow 1986; Bonham-Carter *et al*. 1987; Stanley & Sinclair 1987; Carranza & Hale 1997; Ohta *et al*. 2005; Carranza 2008; Carranza 2010*a*; Darwish & Poellmann 2010; Yousefi *et al*. 2012). Here, they were used as references to compare and evaluate the results of linear and logistic transformation for discrimination between geochemical anomalies and background populations (Table 3). We then performed the following analyses to demonstrate that the logistic transformation discriminates these classes better than the linear transformation. In stream sediment data analyses, the 90th percentile has traditionally been considered the threshold separating anomalous values from background values (e.g. Lepeltier 1969; Bonham-Carter & Goodfellow 1986; Bonham-Carter *et al*. 1987; Wilde *et al*. 2004; Ohta *et al*. 2005; Darwish & Poellmann 2010; Yousefi *et al*. 2012). Table 2 shows that the 97.5, 95 and 90 percentiles of linearly-transformed values are 0.33, 0.3, and 0.28, respectively, and those of logistically-transformed values are 0.78, 0.73, and 0.68, respectively. 
These results show that, assuming that within the [0,1] range a value of 0.5 separates high (or anomalous) and low (or background) values, weighting of anomaly classes is reliably achieved by using logistic transformation. Linear transformation, in contrast, gives unreliable results, because values above the 90th percentile are empirically anomalous and should be given weights of >0.5. In addition, the 50th percentile has also been considered a threshold separating background and anomalous values in geochemical data (e.g. Lepeltier 1969; Sinclair 1974). Table 2 shows that the 50th percentile of logistically-transformed values is 0.49 whereas that of linearly-transformed values is 0.21. This also shows that weighting of anomaly classes is reliably achieved by using logistic transformation but not by using linear transformation.

Aside from using class limits to demonstrate the superiority of the logistic function (for the GMPI approach) over linear transformation for discriminating between geochemical anomalies and background values, we also used the concept of anomaly contrast. Rose *et al*. (1979) and Sinclair (1991) have defined anomaly contrast (C) quantitatively as the ratio of the mean of the anomalous population to the mean of the background population. Sinclair (1991) has also defined anomaly contrast as the mean of anomalous values minus the mean of background values, particularly for log-transformed values. Adapting the approach of Sinclair (1991), we used here the absolute difference between the median values of two geochemical classes as a discrimination index (DI) quantifying the difference between those classes. Hence, the higher the value of DI for two classes, the better the discrimination between them. Based on the information given in Table 3, values of DI for adjacent and non-adjacent geochemical classes in linearly- and logistically-transformed values are plotted in Figure 6. It is clear that logistic transformation provides stronger discrimination among geochemical classes than linear transformation.
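The discrimination index can be sketched as follows, using the definition above (absolute difference between the medians of two classes). The two classes are hypothetical factor scores, not the classes of Table 3, and the linear transformation is again assumed to be min-max scaling over all scores; the names `discrimination_index`, `background` and `anomaly` are illustrative only.

```python
import math
from statistics import median

def logistic(v):
    """Standard logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def discrimination_index(class_a, class_b):
    """Absolute difference between the medians of two geochemical classes."""
    return abs(median(class_a) - median(class_b))

# Hypothetical factor scores for two classes (NOT the study data).
background = [-0.9, -0.4, -0.1, 0.1, 0.3]
anomaly = [0.8, 1.0, 1.3, 2.0, 8.0]

all_scores = background + anomaly
lo, hi = min(all_scores), max(all_scores)

def linear(v):
    """Min-max scaling into [0, 1]; an assumed form of the linear transformation."""
    return (v - lo) / (hi - lo)

di_linear = discrimination_index([linear(v) for v in background],
                                 [linear(v) for v in anomaly])
di_logistic = discrimination_index([logistic(v) for v in background],
                                   [logistic(v) for v in anomaly])
print(f"DI (linear):   {di_linear:.3f}")
print(f"DI (logistic): {di_logistic:.3f}")
```

For these hypothetical scores, the single extreme value compresses the linearly-transformed classes towards zero, so DI is larger after logistic transformation. How large the difference is for real data depends on the distribution of the factor scores.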

To illustrate further the advantage of a map of GMPIs over a map of FSs obtained from ordinary FA and over a map of linearly-transformed FSs, we compare maps of interpolated values of FS_{F2} from the first-stage FA, interpolated values of FS_{F1} from the fourth-stage FA, interpolated values of linearly-transformed FS_{F1} from the fourth-stage FA, and interpolated *GMPI_{MVT–fluorite}* values (transformed by using the logistic function) (Fig. 7). Because this study focuses on a greenfield area where there is just one known mineral occurrence, and because the presence of stream sediment anomalies does not always mean the presence of mineral deposits (Carranza 2010*a*), it is necessary to apply certain criteria for further verification of anomalies. Therefore, to verify and compare anomalies in Figures 7a–d, we used as criterion or spatial reference the potential host lithologies of MVT-fluorite mineralisation (i.e. the Elika and Tiz Kuh Formations) located upstream of anomalous areas. The areas labeled A to N in each of the maps in Figure 7 are references for comparing anomalies in the maps with respect to potential host lithologies. Inspection of the maps in Figure 7 vis-à-vis the geological map (Fig. 1) shows that in areas downstream of the Elika and Tiz Kuh Formations (areas labeled A to K) there are high anomaly values in the GMPI map (Fig. 7d) but generally no high anomaly values in the interpolated map of FS_{F2} values (Fig. 7a). With respect to the geological map (Fig. 1), high anomaly values in the interpolated map of FS_{F1} values (Fig. 7b) and in the interpolated map of linearly-transformed FS_{F1} values (Fig. 7c) have better spatial associations with the Elika and Tiz Kuh Formations than high anomaly values in the interpolated map of FS_{F2} values (Fig. 7a). This is because staged FA was used to obtain the maps shown in Figures 7b and 7c. Furthermore, the Elika and Tiz Kuh Formations have apparently stronger spatial associations with high values in the GMPI map (Fig. 7d) than with high values in the interpolated map of linearly-transformed FS_{F1} values (Fig. 7c). This is because of the logistic transformation. However, the Elika and Tiz Kuh Formations are absent in some parts of the study area upstream of some high values in the GMPI map (areas labeled L to N) (Figs. 1 & 7d). This is either because small outcrops of those and other rock units are not mappable at the 1:100 000 scale of the geological map portrayed in Figure 1 or because the potential host lithologies are overlain by younger sediments. The latter situation is exemplified in the northwestern part of the study area (labels L to N in Fig. 7), where the Elika and Tiz Kuh Formations are overlain by younger sediments (Fig. 1). In this area, the GMPI map (Fig. 7d) shows high values but there are no high values in the FS_{F2} map (Fig. 7a). The high GMPI values suggest the presence of hidden mineralisation hosted by the Elika and Tiz Kuh Formations where they are overlain by younger sediments (areas labeled L to N). Nevertheless, the spatial associations of potential host lithologies of MVT-fluorite mineralisation with high values in the GMPI map (Fig. 7d), but not with high values in the FS_{F2} map (Fig. 7a), illustrate the advantage of GMPI over ordinary FS. These observations imply that discrimination of geochemical populations is favored by using the GMPI map (Fig. 7d), in comparison to the FS_{F2} map from the first-stage FA (Fig. 7a), the FS_{F1} map from the fourth-stage FA (Fig. 7b), and the map of linearly-transformed FS_{F1} values (Fig. 7c). This is due to (a) stronger correlation of FS_{F1} from the fourth-stage FA with the corresponding elements, (b) stronger explanation of variance of data in the fourth-stage FA (Table 1), and (c) application of the non-linear transformation of the FSs into the [0,1] range using the logistic function instead of linear transformation (Figs. 4–6).

### Combination of GMPI map with other fuzzy indicators

The foregoing illustrations demonstrate that the GMPI map is an enhanced weighted fuzzy geochemical evidence layer in comparison to a geochemical evidence layer generated from the results of ordinary FA. Hence, it can be integrated in MPM with other weighted fuzzy evidential maps, generated according to the conceptual model of the mineral deposit-type sought, in order to delineate target areas for further exploration. To demonstrate practical use of the GMPI map, we used fault density (FD) as another layer of evidence in fuzzy logic MPM. Areas with high FD are favorable for MVT-fluorite deposits in the study area (Ghazban & Moritz 2001; Vahabzadeh 2008). To generate the FD map, the total length of faults per pixel of the study area was calculated. The values of FD are non-fuzzy and are not appropriate as fuzzy evidence scores. Thus, we also transformed the calculated values of FD by applying the following logistic function:

$$F_{FD} = \frac{1}{1 + e^{-s(FD - i)}} \tag{3}$$

where *F_{FD}* is a fuzzy score, and *i* and *s* are the inflection point and slope, respectively, of the logistic function. The parameters *i* and *s* determine the shape of the logistic function and, hence, the output values. These parameters are chosen arbitrarily. For the present study, the values 0.0004 and 5000 were used for *s* and *i*, respectively, in Eq. (3).
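The role of the two parameters can be sketched as follows. The sketch assumes the common parameterisation of a logistic function with slope *s* and inflection point *i*, namely 1 / (1 + e^(-s(FD - i))); the values of *s* and *i* are those reported above, whereas the fault density inputs are hypothetical, because the units and range of FD are not restated here.

```python
import math

S = 0.0004   # slope s, value reported in the text
I = 5000.0   # inflection point i, value reported in the text

def fuzzy_fd(fd: float, s: float = S, i: float = I) -> float:
    """Logistic fuzzification of a fault density value.

    Assumed form: 1 / (1 + exp(-s * (fd - i))). The output is 0.5 at the
    inflection point i, and s controls how quickly it approaches 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-s * (fd - i)))

# Hypothetical fault density values (total fault length per pixel).
for fd in (0.0, 2500.0, 5000.0, 7500.0, 10000.0):
    print(f"FD = {fd:7.0f} -> fuzzy score = {fuzzy_fd(fd):.2f}")
```

With these parameters, a pixel whose FD equals the inflection point receives a fuzzy score of exactly 0.5, and the small slope makes the score change gradually over a range of several thousand FD units.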

Because the map of logistically-transformed FD values is a weighted fuzzy evidence layer, it can be integrated with a weighted fuzzy geochemical evidential map, here the GMPI map (Fig. 7d), for fuzzy logic MPM (e.g. Carranza & Hale 2001; Porwal *et al*. 2003, 2004). For this, we used the fuzzy ‘gamma’ operator to integrate the map of fuzzy scores of FD with the GMPI map and thereby generated a fuzzy prospectivity map (Fig. 8). Because the *F_{FD}* map was integrated with an enhanced weighted geochemical evidence map (the GMPI map, Fig. 7d) rather than with any of the maps in Figures 7a–c, the fuzzy prospectivity map (Fig. 8) is more reliable. Hence, for integration with other evidential layers in fuzzy logic MPM, the GMPI map generated from the results of staged FA is preferable to a geochemical evidential map generated from the results of ordinary FA.
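The integration step can be sketched with the usual definition of the fuzzy gamma operator, i.e. the fuzzy algebraic sum raised to the power γ multiplied by the fuzzy algebraic product raised to the power (1 − γ). The value of γ used in this study is not stated in this section, so the value below, like the two input scores, is purely hypothetical.

```python
import math

def fuzzy_gamma(memberships, gamma):
    """Combine fuzzy scores for one pixel with the fuzzy gamma operator.

    gamma = 0 gives the fuzzy algebraic product and gamma = 1 gives the
    fuzzy algebraic sum; intermediate values compromise between the two.
    """
    product = math.prod(memberships)
    algebraic_sum = 1.0 - math.prod(1.0 - m for m in memberships)
    return (algebraic_sum ** gamma) * (product ** (1.0 - gamma))

# Hypothetical scores for a single pixel: a GMPI value and a fuzzy FD score.
pixel_scores = [0.68, 0.60]
hypothetical_gamma = 0.9   # illustrative only, not taken from the study

prospectivity = fuzzy_gamma(pixel_scores, hypothetical_gamma)
print(f"fuzzy prospectivity = {prospectivity:.2f}")
```

In a map-based implementation the same function would be applied pixel by pixel to the two evidence layers.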

## Summary and Conclusions

As shown in this study, the results of ordinary factor analysis generally include several factors that may indicate one or more anomalous geochemical associations, whereas the results of staged factor analysis allow for improved identification of a significant anomalous geochemical signature of the deposit-type sought. This is because, in staged factor analysis, non-indicator elements are progressively identified and removed from the analysis until a satisfactory significant multi-element signature is obtained. After identification of a significant multi-element signature, the proposed logistic function can be applied to transform factor scores of the multi-element indicator of the mineral deposit-type sought into efficient fuzzy scores. The resulting fuzzy geochemical evidence map can then be integrated with other fuzzy evidence maps for mineral prospectivity mapping. Thus, the present study highlights the following findings in an attempt to improve existing methods for representation of geochemical evidence in mineral prospectivity mapping.

Staged factor analysis can be used for efficient extraction of a significant multi-element anomalous signature of the mineral deposit-type sought from stream sediment geochemical data. In the present study, a four-stage factor analysis resulted in recognition of a significant multi-element signature of the deposit-type sought. The number of stages of factor analysis can differ depending on the geochemical data and the study area.

A logistic function can be used for proper transformation of unbounded factor scores into values in the [0,1] range, here called geochemical mineralisation prospectivity indices (GMPI), in order to obtain fuzzy weights for application in fuzzy logic mineral prospectivity mapping. The proposed GMPI can be used for effective weighting and fuzzification of stream sediment geochemical data with respect to the mineral deposit-type sought to generate a reliable weighted geochemical evidence map for integration with other evidence maps in order to map mineral prospectivity in greenfield areas.

## Acknowledgments

The authors thank the Industrials and Mines Organization of Mazandaran Province, Iran for providing information on fluorite mineralisation used in this research work. Special thanks are given to the Geological Survey of Iran (GSI) for supplying the necessary data for this research work. We thank Professor Graeme Bonham-Carter for his review and suggestions for improving this paper. We also appreciate the comments of the anonymous reviewers.

- © 2014 AAG/The Geological Society of London