## Abstract

Identifying multivariate anomalies from geochemical exploration data in a complex geological setting is very challenging because the complex geological setting may lead to an unknown high-dimensional distribution of the geochemical exploration data. The one-class support vector machine (OCSVM) can give useful results in high-dimensional outlier detection without any assumptions on the distribution of the data. Thus, we applied the OCSVM model to identify multivariate geochemical anomalies from stream sediment survey data of the Lalingzaohuo district, an area with a complex geological setting, in Qinghai Province, China. The performance of the OCSVM model was compared with that of a continuous restricted Boltzmann machine (CRBM) in terms of receiver operating characteristic (ROC) curve, area under curve (*AUC*) and data-processing efficiency. The results show that the two models perform similarly well in terms of ROC and *AUC*, while their data-modeling processes took 6.06 and 279.36 s, respectively. The anomalies identified by the OCSVM model occupy 19% of the study area and contain 82% of the known mineral deposits, and the anomalies identified by the CRBM model occupy 35% of the study area and contain 88% of the known mineral deposits.

Geochemical anomaly identification is a key procedure in geochemical exploration. In recent decades, various geochemical anomaly identification methods, which range from frequency-based to frequency-spatial-based methods, have been introduced in the literature. Frequency-based methods mainly include the mean ±2σ method (Hawkes & Webb 1962; Galuszka 2007), probability graphs (Sinclair 1974, 1991; Stanley & Sinclair 1989), boxplot (Tukey 1977; McGill *et al.* 1978; Michael *et al.* 1989; Hubert & Vandervieren 2008), and univariate and multivariate analysis (Govett *et al.* 1975; Miesch 1981; Stanley 1988; Garrett 1989; Stanley & Sinclair 1989; El-Makky 2011). Frequency-spatial-based methods mainly include fractal and multifractal methods (Cheng *et al.* 1994, 1996, 2000; Cheng 1995, 2000, 2006, 2007, 2008; Cheng & Agterberg 1995; Li & Cheng 2004; Zuo *et al.* 2009; Deng *et al.* 2010; He *et al.* 2013; Luz *et al.* 2014; Zuo & Wang 2016; Zuo *et al.* 2016), geo-statistics (Meng 1993, 1994; Wackernagel 2003), and spatial factor analysis (Grunsky & Agterberg 1988).

The frequency-based methods usually estimate a threshold for separating geochemical anomalies from the background population. They are only suitable for situations where the geochemical background samples are from a single population. The mean ± 2σ method is arbitrary and inefficient despite its widespread use in geochemical exploration (Sinclair 1991). Probability plots require a data distribution model to be assumed, and the actual data distribution then defines which samples may be anomalous; this method is somewhat arbitrary but provides a fundamental grouping of data values. Boxplots graphically depict groups of data through their quartiles; they display variation in samples of a statistical population without making any assumptions about the data distribution. Univariate and multivariate analyses estimate a threshold to separate geochemical anomalies by modelling geochemical data with the Gaussian distribution.
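As an illustration of how two of these frequency-based thresholds are obtained, the following minimal sketch computes the upper mean + 2σ threshold and the Tukey upper boxplot fence (Q3 + 1.5 × IQR). The concentration values and the function names are hypothetical and are not taken from the survey data; the quartile convention (`method="inclusive"`) is an assumption, since different software uses different quartile definitions.

```python
import statistics


def mean_2sigma_threshold(values):
    """Upper threshold of the mean + 2*sigma rule (sample standard deviation)."""
    return statistics.mean(values) + 2 * statistics.stdev(values)


def boxplot_upper_fence(values):
    """Tukey upper fence: Q3 + 1.5 * IQR (inclusive quartile convention assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


# Hypothetical Cu concentrations (ppm); the last value is deliberately extreme.
cu = [18, 20, 21, 22, 23, 24, 25, 26, 27, 95]

t_sigma = mean_2sigma_threshold(cu)
t_box = boxplot_upper_fence(cu)

# Samples above each threshold are flagged as univariate anomalies.
anomalous_sigma = [c for c in cu if c > t_sigma]
anomalous_box = [c for c in cu if c > t_box]
```

In this toy data set the extreme value inflates both the mean and the standard deviation, so the mean + 2σ threshold sits far above the boxplot fence; this sensitivity is one reason the rule is regarded as arbitrary when the background is not a single population.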

The frequency-spatial-based methods identify geochemical anomalies by modelling element concentrations as well as spatial information. By employing spatial information in geochemical anomaly identification, their modelling process becomes less subjective than that of the frequency-based approaches. However, they are still not suitable for separating geochemical anomalies from a complex geochemical background. Fractal and multifractal methods can minimize type I and type II errors of classification using a moving average technique with variable window radius (Cheng *et al.* 1996), but they can separate only univariate geochemical anomalies from background. Geo-statistics and spatial factor analysis can model only geochemical data with the Gaussian distribution.

Identifying anomalies from geochemical exploration data in a complex geological setting is very challenging because the complex geological setting may lead to an unknown population distribution of geochemical exploration data. In this case, both the frequency-based and frequency-spatial-based methods may perform poorly due to unknown or possible multimode distribution of the geochemical exploration data. In order to detect geochemical anomalies in a complex geological setting, ant colony algorithm (Chen & An 2016), kernel Mahalanobis distance (Chen *et al.* 2014*a*), continuous restricted Boltzmann machine or CRBM (Chen *et al.* 2014*b*; Chen & Wu 2017), and deep autoencoder network (Xiong & Zuo 2016) were applied to geochemical exploration. The ant colony algorithm can separate univariate anomalies from a complex geochemical background without making any assumptions of the background population distribution; and the kernel Mahalanobis distance, CRBM, and deep autoencoder network can separate multivariate geochemical anomalies from the background without making any assumptions of the background population distribution. However, the data-processing efficiency of these methods is low due to their computational complexity.

Support vector machines (SVMs) are efficient supervised pattern recognition methods and have been successfully applied in mineral prospectivity mapping as well as supervised geochemical anomaly detection (Zuo & Carranza 2011; Abedi *et al.* 2012; Gonbadi *et al.* 2015; Rodriguez-Galiano *et al.* 2015; Geranian *et al.* 2016). By modeling labelled data, the SVM algorithm searches for the best hyperplane in a high-dimensional feature space that leaves the maximum margin between the two classes.

The OCSVM is a natural extension of the SVM algorithm to the case of unlabeled data (Schölkopf *et al.* 2001). The algorithm estimates a subset of input space as the support of the high-dimensional probability distribution of input data; and multivariate outliers are those samples which are drawn from the high-dimensional probability distribution but lie outside the supporting subset. The model has been successfully applied to novelty and outlier detection (Hayton *et al.* 2000; Davy & Godsill 2002; Lengelle *et al.* 2011) and obtained useful results in modeling high-dimensional data without any assumptions on the distribution of the input data. Therefore, we applied the OCSVM model to identify multivariate geochemical anomalies from stream sediment survey data of the Lalingzaohuo district, an area with complex geological setting, in Qinghai Province, China; and the performance of the OCSVM model in multivariate geochemical anomaly identification was compared with that of a CRBM model using ROC curve, *AUC* metric, and data-processing efficiency; finally, multivariate geochemical anomalies were optimally delineated by using the Youden index (Chen 2015; Chen & Wu 2016, 2017) to maximize spatial association between the delineated anomalies and the known mineral deposits. Our work aims to show that the OCSVM model is a feasible multivariate anomaly identification method that can efficiently model multivariate geochemical exploration data in a complex geological setting.

## OCSVM-based multivariate geochemical anomaly identifier

The OCSVM can be used to separate multivariate geochemical anomalies from background in geochemical exploration. Suppose that the multivariate geochemical data in a study area satisfy an unknown high-dimensional probability distribution, and suppose further that $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ is a training sample set, where *m* is the number of training samples drawn from the unknown high-dimensional probability distribution. Then the OCSVM algorithm can be used to estimate a subset of input space such that the probability that a sample drawn from the unknown high-dimensional probability distribution lies outside the subset equals a predefined real value (Schölkopf *et al.* 2001). This predefined value expresses the maximum fraction of the training samples that lie outside the subset. It directly affects the performance of the OCSVM algorithm in multivariate outlier detection (Lengelle *et al.* 2011). The subset can be used to represent the background in multivariate geochemical anomaly identification, and test samples which lie outside the subset are recognized as anomaly samples.

A parameter *v* controls the minimum fraction of the training samples that must be located in the subset in the input space; that is, at least *vm* training samples are classified as background samples. The parameter *v* therefore also controls the maximum fraction of the training samples that lie outside the subset in the input space, i.e. at most (1 − *v*)*m* training samples are identified as anomaly samples. The value of parameter *v* may be predefined empirically or determined using a trial-and-error method depending on the mineral exploration level of a study area.

According to Schölkopf *et al.* (2001), the support of an unknown high-dimensional probability distribution can be estimated by using the OCSVM algorithm to define the boundary of the subset, i.e. the minimum volume region enclosing at least *vm* training samples drawn from the high-dimensional probability distribution. In multivariate outlier detection, a decision function is used to judge whether a sample **x** is a normal sample, that is, whether the sample belongs to the subset (Hayton *et al.* 2000; Davy & Godsill 2002; Lengelle *et al.* 2011). In multivariate geochemical anomaly identification, the same form of decision function can be defined for judging whether a sample **x** is a background sample by denoting it as $f(\mathbf{x})$ such that:

$$f(\mathbf{x}) \ge 0 \ \text{if } \mathbf{x} \text{ belongs to the subset}, \qquad f(\mathbf{x}) < 0 \ \text{otherwise}. \tag{1}$$

In SVMs, the space of possible functions $f$ is reduced to a reproducing kernel Hilbert space with kernel $k(\cdot, \cdot)$. This kernel induces the so-called feature space *H* via the feature mapping $\phi$. We can use $k(\mathbf{x}, \mathbf{x}') = \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle$ to express a dot product in the feature space *H*. Many possible kernels have been proposed in the literature, and a good choice is the Gaussian kernel (Hayton *et al.* 2000; Davy & Godsill 2002; Lengelle *et al.* 2011), defined as

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}\right), \tag{2}$$

where $\|\cdot\|$ is the $l_2$ norm defined on the input space. This kernel depends on one parameter σ, related to the kernel spread. According to Equation (2), we can have $k(\mathbf{x}, \mathbf{x}) = 1$. Therefore, all the geochemical samples are mapped onto the unit-radius hypersphere centered at the origin of the space *H*.
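The unit-norm property of the Gaussian kernel can be checked numerically. The sketch below assumes the common form exp(−‖x − y‖² / (2σ²)) for the kernel; the scaling of the exponent, the function name and the two sample vectors are assumptions made for illustration only.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)); the 2*sigma^2 scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two hypothetical three-indicator samples.
x = [1.0, 2.0, 3.0]
y = [1.5, 2.0, 2.0]

k_xx = gaussian_kernel(x, x, sigma=1.0)  # squared norm of the image of x in H
k_xy = gaussian_kernel(x, y, sigma=1.0)  # similarity between two distinct samples
k_yx = gaussian_kernel(y, x, sigma=1.0)  # the kernel is symmetric
```

Because the kernel of a sample with itself is always 1 regardless of σ, every mapped sample has unit norm in *H*, while the kernel between distinct samples falls strictly between 0 and 1.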

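Training an OCSVM model consists of defining the separation hyperplane such that the margin is maximized (Fig. 1). Parameters *w* and *b* are the solutions of the following optimization problem (Schölkopf *et al.* 2001; Tohmé & Lengellé 2011):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{1}{vm}\sum_{i=1}^{m}\xi_i - b \tag{3}$$

subject to

$$\langle w, \phi(\mathbf{x}_i) \rangle \ge b - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \ldots, m,$$

where $\xi_i$ is the so-called slack variable representing the loss associated with $\mathbf{x}_i$ (a nonzero $\xi_i$ allows for some anomaly samples). The Lagrange multipliers $\alpha_i$ associated with this problem fully determine *w* and *b*. Once the dual variables are computed using the Loqo algorithm (Vanderbei 1995), the decision function becomes

$$f(\mathbf{x}) = \sum_{i=1}^{m} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}) - b \tag{4}$$

with $w = \sum_{i=1}^{m} \alpha_i\, \phi(\mathbf{x}_i)$.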
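To make the role of the dual variables concrete, the following sketch evaluates a decision function of the common dual form f(x) = Σ α_i k(x_i, x) − b with a Gaussian kernel. The dual form, the kernel scaling, and all numerical values (support vectors, multipliers, offset) are assumptions chosen for illustration; they are not results of the optimization described above and no solver is run here.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)); the scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, alphas, b, sigma):
    """f(x) = sum_i alpha_i * k(x_i, x) - b; a non-negative value means background."""
    weighted = sum(
        alpha * gaussian_kernel(sv, x, sigma)
        for sv, alpha in zip(support_vectors, alphas)
    )
    return weighted - b


# Hypothetical trained quantities (not produced by a real solver).
support_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alphas = [0.4, 0.3, 0.3]
b = 0.5
sigma = 1.0

f_near = decision_function([0.3, 0.3], support_vectors, alphas, b, sigma)
f_far = decision_function([5.0, 5.0], support_vectors, alphas, b, sigma)
```

A sample close to the support vectors receives a positive value and is treated as background, whereas a sample far from all of them has kernel values near zero, so its decision value approaches −b and it is treated as an anomaly.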

Using Equation (4), we can compute the value of the decision function for each sample **x** in the geochemical sample population. If the decision function value of sample **x** is positive, the sample **x** is a background sample; otherwise it is an anomaly sample. However, we are interested in anomaly samples rather than background samples in multivariate geochemical anomaly identification. So we slightly modify the decision function value by multiplying it by −1. As a result, the larger the modified decision function value of a sample **x** is, the higher the probability that the sample **x** is an anomaly. Thus, geochemical samples can be identified as multivariate geochemical anomalies if their modified decision function values are greater than zero. In a situation where known mineral deposits are in a study area, the optimal threshold value, which falls between the maximum and minimum of the decision function values, can be determined by choosing a threshold value that maximizes the spatial association between the identified geochemical anomalies and the known mineral deposits. The Youden index (Chen 2015; Chen & Wu 2016, 2017) can be used to express the spatial association between geochemical anomalies and the known mineral deposits, and the value corresponding to the maximum Youden index is chosen as the optimal threshold.

## Case study

The Lalingzaohuo district in Qinghai Province in China was chosen as a case study area because the area has a complex geological setting and a regional stream sediment survey had been completed several years ago. Concentration data of 16 geochemical indicators in stream sediment samples had been stored in databases. The concentration data of 10 geochemical indicators were selected for multivariate geochemical anomaly identification because these indicators have significant spatial association with the known mineral deposits in the study area. The OCSVM-based anomaly identifier was used to extract multivariate geochemical anomalies from the stream sediment survey data. Its performance in multivariate geochemical anomaly identification was compared with that of a CRBM model.

### Geological setting and regional mineralization

The study area is located in the Qimantage-Dulan polymetallic mineralization zone in the eastern Kunlun orogenic belt. Six geological formations underlie the study area. They are the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group, the Middle Ordovician to Silurian Tanjianshan Group, the Late Devonian Maoniushan Formation, the Early Carboniferous Dagangou and Shiguaizi Formations, and the Late Triassic Elashan Formation. Among these formations, the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group is widely distributed in the study area and genetically associated with regional mineralization (Kong & Hu 2014). Intermediate-acidic magmatic rocks, formed in the five tectono-magmatic cycles from Mesoproterozoic to Early Cretaceous, are widely exposed in the study area and often coexist spatially with locally exposed basic-ultrabasic magmatic rocks, and both intermediate-acidic and basic-ultrabasic magmatic rocks are genetically associated with regional mineralization (Chen *et al.* 2006; Wang *et al.* 2014). There are 17 mineral deposits and occurrences discovered in the study area (Fig. 2). They include the Xiarihamu Cu-Ni sulfide deposit and other 16 hydrothermal-skarn mineral deposits (occurrences). The district-scale survey of geology and mineral resources completed in recent years revealed that Fe, Cu, Ni, Co, Ag, Pb and Zn are main mineralizing elements in the study area. The spatial distribution of regional geological formations and magmatic rocks as well as the known mineral deposits was controlled by a series of approximately paralleled WNW- and east–west-trending deep faults, which constitute the northern Kunlun fault zone that extends across the central study area (Du *et al.* 2012). The Baishahe Formation of the Palaeoproterozoic Jinshuikou Group, magmatic complex, and WNW- and east–west-trending deep faults are the three principal regional mineralization controlling factors in the study area.

### Geochemical exploration data and geochemical indicator selection

The Qinghai Geological Survey Institute has completed a stream sediment survey at a scale of 1:50 000 in the study area in recent years. A size fraction experiment was conducted in the Kayakedengtage district to determine the optimal sampling granularity for the stream sediment survey. The experiment revealed that geochemical indicator concentration values of stream sediments at sampling granularities from 250 to 1700 μm can portray geological variations in the experiment district. The stream sediment sampling density was designed to collect 4–8 stream sediment samples from each 1 × 1 km square cell within drainage catchments, which cover more than three-fourths of the total study area (Fig. 3). A total of 11 590 stream sediment samples, which contain 229 duplicate samples for evaluating sample analysis errors, were collected along drainage systems in six 1:50 000 scale topographic maps. The 11 159 samples as well as the 229 duplicate samples were analyzed in the Research Institute of Rock and Mineral Analysis of Qinghai Province, China. X-ray fluorescence spectrometry was used to analyze the concentration values of 16 geochemical indicators of each stream sediment sample. The concentration units are parts per billion (ppb) for Au and Ag, parts per million (ppm) for the other 13 elements, and weight percent for TFe. The accuracy and precision of the geochemical data meet the requirements of Geochemical Survey Criteria (No. DZ/T0011-91) and Regional Geochemical Exploration Criteria (No. DZ/T0167-95). Sample tests show that the analytical pass rates of Ag, As, Au, Bi, Co, Cr, Cu, Mo, Ni, Pb, Sb, Sn, TFe, Ti, W and Zn are higher than 99%.

The stream sediment survey data were recorded in an Excel table, in which each row records the sample number, the coordinates of the sample location in the plane of the Gauss-Krüger projection, and the concentration values of the 16 geochemical indicators. In order to match the geographical coordinate system of the map of mineral deposits and occurrences, the coordinates of the geochemical sample locations were first transformed into latitude and longitude coordinates using MAPGIS, a GIS platform developed by Chinese researchers, and then written back to the Excel table.

In order to select metallogenic indicators from the 16 geochemical indicators, the indicator concentration data collected from scattered stream sediment samples were transformed into a regularly spaced grid element map of 200 columns by 187 rows (Chen & An 2016) by interpolation using Golden Software Surfer, which provides a dozen interpolation methods. Inverse distance to a power and kriging are commonly used interpolation methods in geochemical data processing. The kriging method can exploit spatially coherent information to determine local searching radii and weight coefficients in data interpolation, but small local anomalies are easily smoothed out because its interpolation averages too strongly. Thus, the inverse distance to a power method was used in geochemical data interpolation in this case study. The searching radius was empirically chosen as 1.8 km, and a moderate power value of 2 was used in the interpolation method because this value neither suppresses nor intensifies local anomalies in the weighted averaging process.
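The inverse distance to a power scheme can be sketched as follows, using the 1.8 km searching radius and the power of 2 stated above. This is a simplified illustration, not the Surfer implementation: the function name, the planar kilometre coordinates, the handling of coincident points and the sample values are all assumptions.

```python
import math


def idw_interpolate(node, samples, radius_km=1.8, power=2.0):
    """Inverse-distance-weighted value at `node` from samples within `radius_km`.

    `samples` is an iterable of (x_km, y_km, value) tuples.
    Returns None when no sample lies inside the searching radius (a blank node).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(node[0] - sx, node[1] - sy)
        if d > radius_km:
            continue  # sample lies outside the searching radius
        if d == 0.0:
            return value  # node coincides with a sample location
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else None


# Hypothetical samples: two nearby background values and one distant high value.
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (5.0, 5.0, 1000.0)]

v_mid = idw_interpolate((0.5, 0.0), samples)      # equidistant from the two nearby samples
v_blank = idw_interpolate((10.0, 10.0), samples)  # no sample within 1.8 km
v_exact = idw_interpolate((0.0, 0.0), samples)    # coincides with the first sample
```

The distant high value does not influence the node at (0.5, 0.0) because it lies outside the searching radius, and nodes with no sample inside the radius are left blank rather than extrapolated.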

Based on interpolated data of each geochemical indicator, whether the geochemical indicator is significantly spatially associated with the known mineral deposits in the study area can be measured by its *AUC* value and further tested by the *Z*_{AUC} statistic (Chen 2015). The *AUC* values can be estimated using the Wilcoxon Mann–Whitney test of ranks (Bergmann *et al.* 2000); and *S*_{AUC}s (i.e. standard deviations of *AUC*s) and *Z*_{AUC}s can be estimated based on *AUC* values (Chen 2015). The *AUC*s, *S*_{AUC}s, and *Z*_{AUC}s for the 16 geochemical indicators were estimated (Table 1) based on the interpolated data at 31 617 grid points, which are not located in the blank areas where there is no interpolated geochemical data. From Table 1, it can be seen that Ag, Bi, Co, Cr, Cu, Ni, Pb, Sn, TFe and Zn concentration values are significantly spatially associated with the known mineral deposits because their *Z*_{AUC}s exceed the critical value 1.96 at the level *α* = 0.05. These statistical results are strongly consistent with the mineral deposit models and associated geological settings. Cu, Ni, and Co are main metallogenic elements and Cr, Ag, Fe, Pb, Zn, Sn and Bi are mineralization-associated elements for the magmatic Cu-Ni sulfide deposits discovered in China (Liu & Li 2001); and the district-scale survey of geology and mineral resources showed that Fe, Pb, Zn, Cu, Ag are main metallogenic elements and Co, Cr, Ni, Sn and Bi are mineralization-associated elements for the hydrothermal-skarn mineral deposits discovered in the study area. The two different types of mineral deposits are controlled by the same district-scale metallogenic factors in the study area (Chen & Wu 2017). Therefore, Ag, Bi, Co, Cr, Cu, Ni, Pb, Sn, TFe and Zn can be used as metallogenic indicators in geochemical exploration.
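The rank-based *AUC* estimate mentioned above can be illustrated with a short sketch. It uses the standard Mann–Whitney relation between the rank sum of the positive group and the *AUC*, with midranks for ties; the function name and the toy indicator values are hypothetical, and the *S*_{AUC} and *Z*_{AUC} formulas of Chen (2015) are not reproduced here.

```python
def auc_rank_sum(pos, neg):
    """AUC from the Mann-Whitney rank sum of the positive group (midranks for ties).

    pos: indicator values at deposit-bearing grid points
    neg: indicator values at non-deposit-bearing grid points
    """
    labelled = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labelled):
        j = i
        while j < len(labelled) and labelled[j][0] == labelled[i][0]:
            j += 1
        # Entries i..j-1 are tied and share the average of the 1-based ranks i+1..j.
        midrank = (i + 1 + j) / 2.0
        rank_sum_pos += midrank * sum(label for _, label in labelled[i:j])
        i = j
    n_pos, n_neg = len(pos), len(neg)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical indicator values; one tie between the two groups at 0.4.
deposit_values = [0.9, 0.8, 0.4]
background_values = [0.7, 0.4, 0.3, 0.2, 0.1]

auc = auc_rank_sum(deposit_values, background_values)
```

In this toy case 13.5 of the 15 deposit and background pairs are ranked correctly (a tie counts as half), which gives an *AUC* of 0.9; a value of 0.5 would indicate no spatial association.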

### Multivariate geochemical anomaly identification

The OCSVM and CRBM can be used to separate multivariate geochemical anomalies from complex geochemical background because they are able to model multivariate geochemical data with complex or unknown population distribution. A common shortcoming of the OCSVM and CRBM is that they cannot explore the inter-relationship among elements because they are black boxes. How to overcome this shortcoming needs further investigations.

The geochemical exploration area has complex geological setting, so we applied the OCSVM and CRBM (Chen *et al.* 2014*b*; Chen & Wu 2017) to identify multivariate geochemical anomalies. According to the statistical results in Section 3.2, Ag, Bi, Co, Cr, Cu, Ni, Pb, Sn, TFe and Zn are significantly spatially associated with the known mineral deposits. Thus, their concentration values in Excel table were used as the input data of the two models.

Using the OCSVM model to identify multivariate geochemical anomalies requires parameter *v* to be determined experimentally, while the other parameters can be initialized using the default values (‘kernel’ is initialized with ‘rbf’ and ‘*σ*’ with ‘1/*d*’, where ‘*d*’ denotes the number of geochemical indicators). In this study, the value of parameter *v* was set to 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90 and 0.95 in turn; and then *AUC* values were used to measure the overall performance of the OCSVM model initialized with each value of parameter *v* in multivariate geochemical anomaly identification. With respect to each value of parameter *v*, decision function values of all the geochemical samples were computed and then transformed into 200 × 187 grid data by interpolation using Golden Software Surfer. The *AUC* value with respect to each value of parameter *v* was estimated based on the interpolated decision function values and is listed in Table 2. The variation of *AUC* with the value of parameter *v* is shown in Figure 4. From Table 2 and Figure 4, we can find that the *AUC* reaches its maximum value at *v* = 0.75. Thus, the decision function values of the sample population with respect to the *v*-value of 0.75 were finally used in multivariate geochemical anomaly identification in this study.

A CRBM model (Chen *et al.* 2014*b*; Chen & Wu 2017) with 11 visible processing units that correspond to the 10 metallogenic indicators and 31 hidden processing units was defined and initialized using the following parameters: (*a*) *σ* = 0.2; (*b*) *α* = 0.1; (*c*) learning rates *η*_{w} = *η*_{α} = 0.5; (*d*) learning cost = 0.00001; and (*e*) learning moment = 0.9. The CRBM model was trained on the geochemical sample population comprised of 9742 stream sediment samples, each of which was defined as an input vector with 10 entries for representing the 10 indicator concentration values of the stream sediment samples. Average square error (*ASE*) (Chen *et al.* 2014*b*; Chen & Wu 2017) for each sample was obtained at training epoch 300 and used as geochemical anomaly indicator. The *ASE* indices of the 9742 stream sediment samples were transformed into 200 × 187 grid data by interpolation using the Golden Software Surfer.

## Results and discussion

### Deposit-bearing and non-deposit-bearing grid points

In order to evaluate effectiveness of multivariate geochemical anomaly identification results, deposit-bearing grid points (i.e. true positive samples) and non-deposit-bearing grid points (i.e. true negative samples) must be determined for computing costs, benefits, and the Youden indices, and implementing ROC curve analysis (Chen & Wu 2016, 2017).

In this study, a deposit-bearing grid point denotes the grid point nearest to a mineral deposit, provided that the distance between them is shorter than a limit determined by the column spacing *dx* and the row spacing *dy*; no other grid point is closer to the mineral deposit than this grid point. A non-deposit-bearing grid point denotes a grid point which (*a*) is not a deposit-bearing grid point and (*b*) is not located in a blank area that is not covered by the geochemical survey.
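The assignment of a mineral deposit to its nearest grid point can be sketched as below. The grid origin, the spacings, the deposit coordinates and the function name are hypothetical, and the distance limit used in the check (half of the cell diagonal) is an assumption made only for this illustration, not a value taken from the study.

```python
import math


def nearest_grid_node(x, y, x0, y0, dx, dy, ncols, nrows):
    """Return (col, row, distance) of the grid node closest to (x, y).

    (x0, y0) is the location of node (0, 0); dx and dy are the column and row
    spacing. Returns None if the nearest node index falls outside the grid.
    """
    col = round((x - x0) / dx)
    row = round((y - y0) / dy)
    if not (0 <= col < ncols and 0 <= row < nrows):
        return None
    dist = math.hypot(x - (x0 + col * dx), y - (y0 + row * dy))
    return col, row, dist


# Hypothetical 200 x 187 grid with 0.5 km column and 0.4 km row spacing.
dx, dy = 0.5, 0.4
col, row, dist = nearest_grid_node(10.2, 7.9, 0.0, 0.0, dx, dy, 200, 187)

# Assumed distance limit: half of the cell diagonal.
half_diagonal = math.hypot(dx, dy) / 2.0
is_deposit_bearing = dist < half_diagonal

outside = nearest_grid_node(-5.0, 0.0, 0.0, 0.0, dx, dy, 200, 187)
```

With this assumed limit, any deposit located inside the gridded area is matched to exactly one grid point, so the number of deposit-bearing grid points cannot exceed the number of known deposits.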

It should be stated that a few non-deposit-bearing grid points are actually deposit-bearing ones because undiscovered mineral deposits possibly exist in an area where there is not any known mineral deposit. In other words, a few deposit-bearing grid points were incorrectly defined as non-deposit-bearing grid points in this study. However, the deposit-bearing grid points defined improperly account for only a very small proportion of the total non-deposit-bearing grid points. Thus, they cannot significantly affect results of a ROC curve analysis.

### Identified geochemical anomalies

The Youden index (Chen 2015; Chen & Wu 2016, 2017) was applied to quantify the spatial association between the identified multivariate geochemical anomalies and the known mineral deposits. Based on the interpolated data of decision function values or *ASE* indices, a number of threshold values (a value of 1000 was used in this study) that are evenly distributed between the maximum and minimum values of the anomaly index data, were used to classify the grid points into multivariate geochemical anomalies and the background; and then with respect to each threshold, the number of anomaly grid points which bear known mineral deposits, the number of anomaly grid points which do not bear known mineral deposits, the number of background grid points which bear known mineral deposits, and the number of background grid points which do not bear known mineral deposits were counted and used to compute cost and benefit (Chen & Wu 2016) as well as the Youden index. The optimal threshold, with respect to the maximum Youden index, was finally selected to delineate multivariate geochemical anomalies.
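The threshold search described above can be sketched as follows. The Youden index is computed in its standard form, the true positive rate minus the false positive rate; treating these two rates as the benefit and cost of Chen & Wu (2016) is an assumption, as are the function name and the toy anomaly index values.

```python
def youden_optimal_threshold(scores, is_deposit, n_thresholds=1000):
    """Scan evenly spaced thresholds and return (threshold, Youden index) at the maximum.

    scores: anomaly index per grid point (larger means more anomalous)
    is_deposit: True for deposit-bearing grid points, False otherwise
    """
    lo, hi = min(scores), max(scores)
    n_pos = sum(is_deposit)
    n_neg = len(scores) - n_pos
    best_t, best_j = lo, -1.0
    for k in range(n_thresholds):
        t = lo + (hi - lo) * k / (n_thresholds - 1)
        # Grid points with an index at or above the threshold are anomalies.
        tp = sum(1 for s, d in zip(scores, is_deposit) if d and s >= t)
        fp = sum(1 for s, d in zip(scores, is_deposit) if not d and s >= t)
        j = tp / n_pos - fp / n_neg  # true positive rate minus false positive rate
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical anomaly indices at eight grid points, three of them deposit-bearing.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
is_deposit = [False, False, False, False, True, False, True, True]

best_t, best_j = youden_optimal_threshold(scores, is_deposit)
```

In this toy example the best threshold lies just above 0.4: all three deposit-bearing points are captured at the price of one false positive out of five background points, giving a Youden index of 0.8.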

Table 3 lists the optimal thresholds, the area rates of geochemical anomalies, and the percentage of known mineral deposits located in the multivariate geochemical anomalies. From Table 3, we can see that the multivariate geochemical anomalies delineated using the OCSVM model account for 19 percent of the 31 617 grid points that are not located in blank areas and contain 82% of the 17 known mineral deposits; and the multivariate geochemical anomalies delineated using the CRBM model account for 35 percent of the 31 617 grid points and contain 88% of the 17 known mineral deposits.

Figure 5 shows maps of the 17 known mineral deposits and the optimally delineated multivariate geochemical anomalies based on the interpolated data of decision function values and *ASE* indices. The delineated multivariate geochemical anomalies are mainly spatially distributed along the northwest-western zone of the central study area. This zone spatially coincides with the upper wall of the northern Kunlun fault zone, which controls the spatial distribution of both the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group and regional magmatic complexes (Fig. 2). These two geological units are among the principal factors that control regional mineralization in the study area.

Two large-scale geochemical anomalies deserve particular attention in future mineral exploration in the study area (Fig. 5). One anomaly is located around latitude 36° 39′ and longitude 93° 6′, and the other around latitude 36° 37′ 30″ and longitude 93° 16′ 30″. Although these two anomalies do not contain any known mineral deposits, they have similarly favourable geological conditions for regional mineralization because they spatially coincide with approximately parallel regional northwest-western faults and outcrops of the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group.

### ROC curve analysis

Costs and benefits (Chen & Wu 2016, 2017) at various threshold settings were used to plot the ROC curves based on decision function values and *ASE* indices. *AUC* values were estimated by computing the Wilcoxon test of ranks, and *S*_{AUC}s and *Z*_{AUC}s were computed based on the estimated *AUC* values (Bergmann *et al.* 2000; Chen 2015).

The *AUC*s, *S*_{AUC}s, *Z*_{AUC}s, and time spent in modeling using the two models are listed in Table 3. From Table 3, we can conclude that: (*a*) the decision function value and the *ASE* index are significantly spatially associated with the known mineral deposits in the study area because the *Z*_{AUC} values of these two multivariate geochemical anomaly indices are higher than the critical value 1.96 at the level *α* = 0.05; (*b*) the OCSVM and CRBM models perform similarly well in multivariate geochemical anomaly identification in terms of the *AUC* metric because the *AUC* value (0.8316) of the former is nearly equal to the *AUC* value (0.8396) of the latter; and (*c*) the data-processing efficiency of the OCSVM model is much higher than that of the CRBM model because modeling the geochemical data using the former needs only 6.06 s while modeling the geochemical data using the latter needs 279.36 s.
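The size of these differences can be made explicit with a little arithmetic on the reported figures. The sketch below uses only numbers quoted in the text; the variable names and the "deposits captured per percent of area" ratio are illustrative quantities introduced here, not metrics used in the study.

```python
# Reported modeling times (s) and AUC values for the two models.
time_ocsvm, time_crbm = 6.06, 279.36
auc_ocsvm, auc_crbm = 0.8316, 0.8396

speedup = time_crbm / time_ocsvm  # how many times faster the OCSVM modeling is
auc_gap = auc_crbm - auc_ocsvm    # absolute difference in AUC

# Reported anomaly area (% of grid points) and captured deposits (% of 17).
area_ocsvm, captured_ocsvm = 19.0, 82.0
area_crbm, captured_crbm = 35.0, 88.0

# Illustrative ratio: percent of deposits captured per percent of area flagged.
ratio_ocsvm = captured_ocsvm / area_ocsvm
ratio_crbm = captured_crbm / area_crbm
```

On these figures the OCSVM modeling is about 46 times faster, the *AUC* values differ by 0.008, and the OCSVM anomalies capture roughly 4.3% of the known deposits per 1% of area compared with about 2.5% for the CRBM anomalies.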

Figure 6 shows the ROC curves of the OCSVM and CRBM models. It can be seen that the ROC curve of the OCSVM model intersects with that of the CRBM model and the latter slightly dominates the former in the right while the former dominates the latter in the left in the ROC performance space. According to these two ROC curves, we can conclude that the OCSVM and CRBM models are comparable in multivariate geochemical anomaly identification.

## Conclusion

The one-class support vector machine model is a novel outlier detection method in data mining. It can be used to detect abnormal points from a data set. In this study, we demonstrate the one-class support vector machine model as a multivariate geochemical anomaly identification method for extracting multivariate geochemical anomalies from stream sediment survey data of the Lalingzaohuo district in Qinghai Province in China. Receiver operating characteristic curve and area under curve were applied to evaluate the performance of the one-class support vector machine and continuous restricted Boltzmann machine models in multivariate geochemical anomaly identification. In this study, the extracted multivariate geochemical anomalies by the two models are significantly spatially associated with the known mineral deposits, and are strongly consistent with the regional geological features of the study area. It has been found that the one-class support vector machine and continuous restricted Boltzmann machine models perform similarly well in multivariate geochemical anomaly identification in terms of receiver operating characteristic curve and area under curve, while data-processing efficiency of the one-class support vector machine model is much higher than that of the continuous restricted Boltzmann machine model. Therefore, the one-class support vector machine model is a potentially useful multivariate geochemical anomaly identification method that can quickly extract multivariate geochemical anomalies from geochemical exploration data. Testing of the one-class support vector machine model in multivariate geochemical anomaly identification in other geochemical exploration areas with complex geological settings should be implemented to further evaluate its usefulness.

Parameter *v* must be properly defined when applying the one-class support vector machine model to identify multivariate geochemical anomalies. The parameter *v* is used to control the minimum fraction of the training samples that constitute the geochemical background, i.e. the subset in input space estimated by the one-class support vector machine model. It will directly affect the performance of the one-class support vector machine model in multivariate geochemical anomaly identification. In this study, the optimal parameter value of 0.75 was chosen from the 10 values that were evenly distributed in interval [0.5, 0.95] by using area under curve to measure the performance of the one-class support vector machine model initialized by each parameter value in multivariate geochemical anomaly identification. However, this parameter selection method is not suitable for an area where there are no known mineral deposits. Thus, how to determine the optimal value of parameter *v* for initializing a one-class support vector machine model still needs to be further investigated.

In the case study, we find that: (*a*) multivariate geochemical anomalies are mainly spatially distributed in the upper wall of the northern Kunlun fault zone that controls the spatial distribution of two of the principal regional mineralization controlling factors, i.e. the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group and regional magmatic complexes, thus further geochemical exploration work should be deployed in the upper wall of the northern Kunlun fault zone; and (*b*) there are two regional multivariate geochemical anomalies which do not contain any known mineral deposits but have favourable geological conditions for regional mineralization, because they spatially coincide with WNW- and east–west-trending deep faults and the Baishahe Formation of the Palaeoproterozoic Jinshuikou Group, thus they need to be paid much attention to in future mineral exploration in the study area.

## Acknowledgements

We are grateful to Dr Nan Lin for his help in collecting geological and geochemical data of the study area. We are also grateful to the two anonymous reviewers for their constructive comments.

## Funding

This research was supported by the National Natural Science Foundation of China (Grant numbers 41272360, 41472299, and 41672322).

- © 2017 The Author(s)