Why, and How, Should Geologists Use Compositional Data Analysis/Conclusions and Recommendations

Conclusions and Recommendations

Treating “closed” datasets with standard statistical methods creates spurious correlations that reduce the reliability of the results. There are ways to minimize this problem, such as processing major oxides independently from trace elements or including only the strongest correlations in the composition of the RCCs, but I believe that transforming the initial dataset is a better solution for the processing and interpretation of geological data.
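The closure effect can be illustrated with a small numerical sketch. The values below are invented for illustration and are not taken from the datasets discussed in this book, and the helper `pearson` is a hypothetical name. The sketch uses the extreme case of a two-part composition: two raw variables that are positively correlated become perfectly negatively correlated once each sample is rescaled to sum to 100%.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented raw abundances of two components in five samples.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0]

# Closure: rescale every sample so that its parts sum to 100 %.
closed_a = [100 * x / (x + y) for x, y in zip(a, b)]
closed_b = [100 * y / (x + y) for x, y in zip(a, b)]

raw_r = pearson(a, b)                  # positive, about 0.82
closed_r = pearson(closed_a, closed_b) # -1 up to rounding error

print(round(raw_r, 2), round(closed_r, 2))
```

With more than two parts the induced correlations are less extreme, but they are still partly an artefact of the constant sum rather than of any geological process.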

To estimate the most efficient RCC, I propose the additive log-ratio (ALR) transformation, although the centred log-ratio (CLR) transformation is also effective.
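For readers who want to see what these two transformations do to a single sample, here is a minimal sketch based on their standard definitions. It is not the CoDaPack implementation. The function names `clr` and `alr`, the choice of the last part as the ALR reference, and the three-part sample are assumptions made for illustration.

```python
import math

def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def alr(parts, ref=-1):
    """Additive log-ratio: log of each part over a reference part.

    The reference part (the last one by default) is dropped from the output.
    """
    ref_index = ref % len(parts)
    log_ref = math.log(parts[ref_index])
    return [math.log(p) - log_ref
            for i, p in enumerate(parts) if i != ref_index]

sample = [60.0, 25.0, 15.0]  # hypothetical three-part composition, in wt%

clr_values = clr(sample)  # three values that sum to zero
alr_values = alr(sample)  # two values: ln(60/15) and ln(25/15)

print(clr_values)
print(alr_values)
```

Both functions require strictly positive parts, which is why zeros and values below the detection limit have to be dealt with before the transformation.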

When a target is available for testing the effectiveness of our coefficient, factor analysis should be preferred. I recommend the isometric log-ratio (ILR) transformation to define the most effective combination of factors.

Finally, I introduced here a method for dealing with zero values. Its main advantage is that we do not obtain a single fixed value for the “Rounded Zeros”, but one that depends on the real value of a correlated variable in the same sample. The proposed method depends on the geological characteristics of the data, and is therefore less biased or random than other methods. It also presents a viable alternative to amalgamation and an effective way to deal with “Essential Zeros” in a population.

The sequence of the method is as follows:

  1. We transform the data using CoDaPack or other similar software.
  2. We select the lower quartile of the real data for the element that has values below the detection limit (b.d.l.).
  3. Within this subset, we test the relationship between the element with b.d.l. values and one (or more) elements without b.d.l. values. In most cases, these elements will correspond to a well-established geological relationship, such as that between Pb and Zn in polymetallic deposits, between Au and Pb in hydrothermal deposits, or between Cu and Mo in porphyry deposits, as in the case I presented here.
  4. We establish the regression equation.
  5. We then replace the b.d.l. values with those estimated from the regression equation.
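
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as run in CoDaPack: the values are assumed to be already transformed (step 1), `None` marks a b.d.l. value, a single predictor element is used, the regression is ordinary least squares, and the first quartile is computed with the default method of Python's `statistics.quantiles`, which may differ slightly from other software. The function names `fit_line` and `replace_bdl` and all the numbers are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def replace_bdl(x_values, y_values):
    """Replace None entries of y_values with regression estimates from x_values."""
    detected = [(x, y) for x, y in zip(x_values, y_values) if y is not None]
    # Step 2: keep the detected samples at or below the first quartile of y.
    q1 = statistics.quantiles([y for _, y in detected], n=4)[0]
    low = [(x, y) for x, y in detected if y <= q1]
    # Steps 3 and 4: regression of y on x within that subset.
    slope, intercept = fit_line([x for x, _ in low], [y for _, y in low])
    # Step 5: substitute the estimates for the b.d.l. values only.
    return [intercept + slope * x if y is None else y
            for x, y in zip(x_values, y_values)]

# Synthetic, already transformed values; the first two samples are b.d.l. for y.
x_values = [0.0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_values = [None, None, 3, 5, 7, 16, 25, 36, 49, 64, 81, 100, 121, 144]

filled = replace_bdl(x_values, y_values)
print(filled[:2])  # estimates for the two b.d.l. samples
```

In this synthetic example only the three lowest detected values enter the regression, so each replaced value follows the trend of the low-concentration samples and varies with the predictor element instead of being one constant for every b.d.l. sample.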

We can apply this method to any type of data, provided we first establish the correlation between the variables involved. I would also like to see this method included as an option for dealing with zeros in the next version of CoDaPack.