Analysis of "Big" Spectral Data

Chemometrics Without Preconceived Notions

Spectral data, in particular data generated by Laser-Induced Breakdown Spectroscopy (LIBS), is useful to researchers in fields as varied as cultural heritage, pharmacology, materials science, and national defense. Data can be collected quickly, and samples require little or no preparation. However, LIBS data sets can quickly grow to terabytes (10^12 bytes). Storing, processing, searching, and interpreting data on this scale is challenging. Nor does such data fit easily into traditional models. Those models rely on looking for what is thought to be important: they search for the emission lines of certain elements (usually fewer than ten) and measure their ratios to determine the nature of a sample. As a result, traditional LIBS analysis disregards more than 99% of the collected data. The drawback is that it is not easy to know in advance what is “important,” so valuable information may be ignored.
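For a sense of scale, the arithmetic below sizes a LIBS spectral library and estimates the share of each spectrum that a handful of element lines actually uses. It is a sketch only. The channel count, 32-bit storage, ten lines, and five channels per line are hypothetical assumptions, not properties of any particular instrument or of the authors' data.

```python
# Back-of-envelope sizing for a LIBS spectral library.
# Every constant below is a HYPOTHETICAL assumption chosen for illustration,
# not a property of any particular spectrometer or of the authors' data.

BYTES_PER_VALUE = 4             # assumed: one 32-bit float per channel
CHANNELS_PER_SPECTRUM = 24_000  # assumed: broadband spectrometer channel count
LINES_USED = 10                 # assumed: about ten emission lines analysed in total
CHANNELS_PER_LINE = 5           # assumed: channels integrated across each line
TERABYTE = 10**12               # decimal terabyte, in bytes


def bytes_per_spectrum(channels=CHANNELS_PER_SPECTRUM, width=BYTES_PER_VALUE):
    """Raw storage for one spectrum (no metadata, no compression)."""
    return channels * width


def spectra_per_terabyte(channels=CHANNELS_PER_SPECTRUM, width=BYTES_PER_VALUE):
    """Whole spectra that fit in one decimal terabyte."""
    return TERABYTE // bytes_per_spectrum(channels, width)


def fraction_discarded(lines=LINES_USED, per_line=CHANNELS_PER_LINE,
                       channels=CHANNELS_PER_SPECTRUM):
    """Share of channels ignored when only the selected lines are analysed."""
    used = lines * per_line
    return 1 - used / channels


if __name__ == "__main__":
    print(f"{bytes_per_spectrum():,} bytes per spectrum")
    print(f"{spectra_per_terabyte():,} spectra per terabyte")
    print(f"{fraction_discarded():.2%} of each spectrum discarded")
```

Under these assumptions each spectrum occupies 96,000 bytes, a terabyte holds about 10.4 million spectra, and a ten-line analysis uses roughly 0.2% of each spectrum, discarding about 99.8%. An instrument with far fewer channels would give a smaller discarded fraction, so the exact figure is sensitive to these assumptions.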

The challenge of “Big Data” is not new; terabyte databases are common in biology, climatology, and particle physics. Creating and maintaining a LIBS spectral database measured in terabytes is resource-intensive, but there are multiple approaches. These range from programming new algorithms for visualizing and interpreting the vast amounts of LIBS spectral data to using commercial off-the-shelf chemometric solutions such as Unscrambler® (CAMO Software AS) and SpectraLearn (CoVar Applied Technologies, Inc.). Regardless of technique, the effort is worthwhile, because the results answer previously unanswerable questions. Mining vast amounts of spectral data without preconceived notions of what data is important has made it possible to classify a wide variety of materials with greater than 98% average accuracy, for example determining which mine a gemstone came from or what heat treatment an alloy has undergone. (New research resulting from the use of Big Data will be shared during the presentation.)
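To make the contrast between preselected lines and whole spectra concrete, here is a minimal, self-contained sketch on invented data. It is not the authors' method, and it is not how Unscrambler® or SpectraLearn work. Instead, it swaps in the simplest full-spectrum classifier available, a nearest-centroid rule on normalized spectra, and compares it with the same rule restricted to one preselected line ratio. The channel positions, peak heights, and noise levels are hypothetical. The two synthetic classes differ only at a weak trace line that the ratio approach never looks at.

```python
import math
import random

# All spectral parameters below are HYPOTHETICAL and chosen for illustration.
N_CHANNELS = 200        # far shorter than a real LIBS spectrum
MAJOR_LINES = (20, 50)  # channels of the two "expected" element lines
TRACE_LINE = 150        # channel of an unanticipated trace-element line


def synthetic_spectrum(rng, trace_height):
    """Flat noisy baseline plus emission peaks; all values are invented."""
    s = [max(0.01, 1.0 + rng.gauss(0.0, 0.05)) for _ in range(N_CHANNELS)]
    s[MAJOR_LINES[0]] += 50.0 * rng.uniform(0.9, 1.1)
    s[MAJOR_LINES[1]] += 25.0 * rng.uniform(0.9, 1.1)
    s[TRACE_LINE] += trace_height * rng.uniform(0.9, 1.1)
    scale = rng.uniform(0.5, 1.5)  # shot-to-shot intensity fluctuation
    return [x * scale for x in s]


def normalize(spectrum):
    """Scale to unit total intensity so shot-to-shot fluctuations cancel."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


def full_spectrum_features(spectrum):
    return normalize(spectrum)  # every channel is kept


def line_ratio_features(spectrum):
    # Traditional style: one preselected ratio; every other channel is ignored.
    return [spectrum[MAJOR_LINES[0]] / spectrum[MAJOR_LINES[1]]]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit(train_set, featurize):
    """train_set maps label -> list of spectra; returns label -> centroid."""
    return {label: centroid([featurize(s) for s in spectra])
            for label, spectra in train_set.items()}


def predict(model, featurize, spectrum):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    x = featurize(spectrum)
    return min(model, key=lambda label: math.dist(model[label], x))


def accuracy(model, featurize, test_set):
    pairs = [(label, s) for label, spectra in test_set.items() for s in spectra]
    hits = sum(predict(model, featurize, s) == label for label, s in pairs)
    return hits / len(pairs)


def make_set(rng, n_per_class):
    # Class "A" lacks the trace line; class "B" carries a weak one.
    return {"A": [synthetic_spectrum(rng, 0.0) for _ in range(n_per_class)],
            "B": [synthetic_spectrum(rng, 10.0) for _ in range(n_per_class)]}


if __name__ == "__main__":
    rng = random.Random(0)
    train_set, test_set = make_set(rng, 50), make_set(rng, 50)
    for name, featurize in [("line ratio", line_ratio_features),
                            ("full spectrum", full_spectrum_features)]:
        model = fit(train_set, featurize)
        print(f"{name}: {accuracy(model, featurize, test_set):.0%}")
```

On data like this, the ratio-based classifier can do no better than chance, because both classes share the same major-line ratio, while the full-spectrum classifier separates them. Practical chemometric workflows commonly use multivariate methods such as PCA or PLS-DA rather than a plain centroid rule; the point of the sketch is only which data the model is allowed to see.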

When applied to LIBS spectral data, the concepts behind Big Data processing are straightforward and effective, yielding rapid answers to real-world problems.

Date Presented: September 2014

Conference Presented: SCIX 2014

Authored By:

1. McManus, Catherine 

2. McMillan, Nancy

3. Dowe, James

Author Affiliations:

1. Materialytics, LLC, P.O. Box 10988, Killeen, TX 76547

2. Geological Sciences, New Mexico State University, Las Cruces, NM 88003

3. Analytical Data Services