Last modified on 18 January 2008, at 21:34

Why, and How, Should Geologists Use Compositional Data Analysis/Factor Analysis

Factor analysis is a statistical data reduction technique used to explain variability among observed random variables in terms of fewer unobserved random variables called factors. It is useful for reducing the number of variables by combining two or more variables into a single factor, thus “simplifying” the original dataset.

Factor analysis (FA) is especially useful in geochemistry when one has a known target or some other way to understand the meaning of the obtained associations. Failing this, the geologist is usually forced to “plot and see”, and then to select the FA that appears most useful for the studied area.

I processed both the initial dataset and the three transformed versions using SYSTAT SPSS 10.0 for Windows, but any other statistical program capable of factor analysis will do.
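For readers without access to a commercial package, the component extraction step can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are randomly generated stand-ins (not the dataset used in this chapter), the function name `principal_components` is hypothetical, and the "eigenvalue greater than 1" retention rule is a common convention rather than the setting used in SYSTAT here. The sorted eigenvalues are the values one would draw in a scree plot.

```python
import numpy as np

def principal_components(data):
    """Return eigenvalues (descending) and loadings of the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)        # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings

# Hypothetical data: 200 samples, 4 variables, the first two strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(200, 2)),
                  rng.normal(size=(200, 2))])

eigvals, loadings = principal_components(data)
n_keep = int(np.sum(eigvals > 1.0))   # assumed retention rule (eigenvalue > 1)
print(eigvals, n_keep)
```

The eigenvalues of a correlation matrix sum to the number of variables, which gives a quick sanity check on the output.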

Factor Analysis for the Initial Dataset

Figure 36 shows the scree plot for the initial dataset, while table 24 shows the principal components defined by the software.

Figure 36. Scree plot for the initial dataset.


Table 24. Principal component analysis (PCA) for the initial dataset.


Equations 23 – 25 show the three FA components for the initial dataset.

Equation 23. FA 1 for the initial dataset.

Equation 24. FA 2 for the initial dataset.

Equation 25. FA 3 for the initial dataset.


Figures 37 – 39 show the effectiveness of these FA as a targeting tool for our ore body.


Figures 37 to 39. Effectiveness of these FA as a targeting tool for our ore body.


Conclusions and Recommendations on the Use of FA for the Initial Dataset

As long as we have a known target against which to test the obtained FA, this method offers better results than the RCC. It also allows for the combined study of all the elements together.

FA 1 and FA 2 do contain the embedded correlations I introduced in the initial dataset, hence their effectiveness, especially that of FA 1, in mapping the location of the ore body.

The next question is: will the transformed data be any more effective in helping us locate our target?

Factor Analysis for the CLR Transformed Dataset

Figure 40 shows the scree plot for the CLR transformed dataset, while table 25 shows the principal components defined by SYSTAT.
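As a reminder of what the CLR (centered log-ratio) transformation does before the factor analysis is run, here is a minimal Python sketch. The three-part compositions are invented for illustration and the function name `clr` is my own; each part is divided by the geometric mean of its sample and the logarithm is taken.

```python
import numpy as np

def clr(parts):
    """Centered log-ratio: log of each part over the row's geometric mean."""
    parts = np.asarray(parts, dtype=float)
    log_parts = np.log(parts)                      # requires strictly positive parts
    # Subtracting the mean log equals dividing by the geometric mean.
    return log_parts - log_parts.mean(axis=1, keepdims=True)

# Hypothetical compositions (each row sums to 1).
sample = np.array([[0.7, 0.2, 0.1],
                   [0.5, 0.3, 0.2]])
clr_values = clr(sample)
print(clr_values)
```

Each row of CLR values sums to zero, so the transformed variables are linearly dependent; this is worth keeping in mind when reading the components extracted from them.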

Figure 40. Scree plot for the CLR transformed dataset.


Table 25. Principal component analysis for the CLR transformed dataset.


Equations 26 – 28 show the three FA components for the CLR transformed dataset.

Equation 26. FA4 for the CLR transformed dataset.

Equation 27. FA5 for the CLR transformed dataset.

Equation 28. FA6 for the CLR transformed dataset.

Figures 41 – 43 show the effectiveness of these FA as a targeting tool for our ore body.


Figures 41 to 43. Effectiveness of these FA as a targeting tool.

Factor Analysis for the ALR Transformed Dataset

Figure 44 shows the scree plot for the ALR transformed dataset, while table 26 shows the principal components defined by SYSTAT.
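The ALR (additive log-ratio) transformation that precedes this analysis can be sketched as follows. The compositions are invented, the function name `alr` is hypothetical, and the choice of the last part as the common denominator is an assumption made only for this example; the text does not state which element served as the denominator.

```python
import numpy as np

def alr(parts, denominator=-1):
    """Additive log-ratio: log of every other part over one chosen part."""
    parts = np.asarray(parts, dtype=float)
    denom = parts[:, denominator]                  # assumed reference part
    numer = np.delete(parts, denominator, axis=1)  # all remaining parts
    return np.log(numer / denom[:, None])

# Hypothetical compositions (each row sums to 1).
sample = np.array([[0.7, 0.2, 0.1],
                   [0.5, 0.3, 0.2]])
alr_values = alr(sample)
print(alr_values)
```

A composition with D parts yields D - 1 ALR variables, so the factor analysis works with one variable fewer than the raw data.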

Figure 44. Scree plot for the ALR transformed dataset.


Table 26. Principal component analysis for the ALR transformed dataset.


Although table 26 shows two components, I will analyze only the second, which is a coefficient as shown in equation 29.

Equation 29. FA7 for the ALR transformed dataset.

This factor contains the embedded relationship from the initial dataset, but because of the presence of other elements, its usefulness as a targeting tool is more limited, as shown in Figure 45.

Figure 45. FA7 covers mostly the southeastern part of the ore body.

Factor Analysis for the ILR Transformed Dataset

Figure 46 shows the scree plot for the ILR transformed dataset, while table 27 shows the principal components defined by SYSTAT.
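For orientation, one way to compute ILR (isometric log-ratio) coordinates is sketched below. The ILR transformation depends on the choice of an orthonormal basis; this example assumes the sequential "pivot" basis, in which each part is compared with the geometric mean of the parts that follow it. That basis, the function name `ilr`, and the invented compositions are assumptions for illustration and may differ from the basis used to produce the dataset analyzed here.

```python
import numpy as np

def ilr(parts):
    """Isometric log-ratio coordinates using a sequential (pivot) basis."""
    parts = np.asarray(parts, dtype=float)
    log_parts = np.log(parts)                      # requires strictly positive parts
    n_parts = parts.shape[1]
    coords = np.empty((parts.shape[0], n_parts - 1))
    for i in range(n_parts - 1):
        remaining = n_parts - i - 1                # parts that come after part i
        scale = np.sqrt(remaining / (remaining + 1))
        # Log of part i over the geometric mean of the remaining parts.
        coords[:, i] = scale * (log_parts[:, i]
                                - log_parts[:, i + 1:].mean(axis=1))
    return coords

# Hypothetical compositions (each row sums to 1).
sample = np.array([[0.7, 0.2, 0.1],
                   [0.5, 0.3, 0.2]])
ilr_values = ilr(sample)
print(ilr_values)
```

Like the ALR, the ILR produces D - 1 coordinates from D parts, but the coordinates are referred to an orthonormal basis, so distances between samples are preserved.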

Figure 46. Scree plot for the ILR transformed dataset.


Table 27. Principal component analysis for the ILR transformed dataset.

The large number of components produced by the PCA suggests that we will not get good results this time. Equations 30 through 33 show the obtained factors.

Equation 30. FA8 for the ILR transformed dataset.

Equation 31. FA9 for the ILR transformed dataset.

Equation 32. FA10 for the ILR transformed dataset.

Equation 33. FA11 for the ILR transformed dataset.

Figures 47 through 50 show the spatial distribution of these factors with respect to the location of our ore body.


Figures 47 to 50. Spatial distribution of these factors.


Conclusions and Recommendations on the Use of FA for the Transformed Datasets

As I mentioned earlier, for FA to be most useful, one needs to have a known target to calibrate it. The factor analysis applied to the CLR transformed data gave us three factors, but only one (FA5) was useful for targeting the ore body.

The factor analysis of the ALR transformed data (FA7) was good in general, but the best factors were obtained from the ILR transformed data, especially FA9, which gave not only the exact location of the ore body but also its internal structure. Another efficient factor was FA11, but it definitely required calibration based on a known target.

So, to answer the question posed after the analysis of the initial dataset: yes, the factor analysis of the ILR transformed data is more effective than the factor analysis of the raw data as a tool for locating the ore deposit.