Analysis of Raman Spectra With a Machine Learning Approach for Improved Quantification of Microcystin-LR

Date

2022-05

Publisher

The Ohio State University

Abstract

Cyanobacterial harmful algal blooms (cyanoHABs) have increased in prevalence in recent years, threatening 0.5% of potable water on Earth. Microcystins, a class of toxins that can be produced in a cyanoHAB, are particularly harmful to humans and ecosystems alike. Microcystin-LR (MC-LR) is among the most toxic and most common microcystins. Although biosensors have recently shown impressive capabilities in detecting microcystins, exhibiting high sensitivity, selectivity, and portability, traditional Raman spectroscopy combined with modern machine learning may still offer a path to sensitive, portable detection of MC-LR. This project evaluates the efficacy of three machine learning algorithms for detecting MC-LR in water at concentrations near the EPA's benchmark limit of 1 μg/L. Raman spectra were collected from MC-LR dilutions in water at concentrations ranging from 0.001 to 6.0 μg/L. A total of n=1000 Raman spectra were collected, and spectral preprocessing methods including background subtraction of water, z-score feature normalization, and baseline removal were applied. Regression models to predict MC-LR concentration in water from Raman spectral data were built using three machine learning algorithms: kernel support vector machine for regression (SVR), regression deep neural network (DNN), and partial least squares regression (PLSR). The three models are compared using mean-square-error (MSE) and mean-absolute-error (MAE) to evaluate their efficacy for predicting MC-LR concentrations in the range of 0.001 to 6.0 μg/L. After validation of the models on test data (n=200), MSE values increased in the order PLSR without feature normalization (0.191) < PLSR with feature normalization (0.199) < SVR (0.432) < DNN (3.155).
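
Two of the preprocessing steps named in the abstract (background subtraction of water and z-score feature normalization) and the two comparison metrics (MSE and MAE) can be illustrated with a minimal sketch. This is not the author's implementation: the function names are hypothetical, spectra are assumed to be plain lists of intensities sharing the same wavenumber channels, and baseline removal and model fitting are left out.

```python
from statistics import mean, pstdev


def subtract_background(spectrum, water_spectrum):
    """Subtract a reference water spectrum channel by channel (assumed aligned)."""
    return [s - w for s, w in zip(spectrum, water_spectrum)]


def zscore_features(spectra):
    """Z-score each feature (wavenumber channel) across all samples."""
    columns = list(zip(*spectra))
    means = [mean(col) for col in columns]
    # Guard against constant channels, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in spectra]


def mse(y_true, y_pred):
    """Mean squared error between true and predicted concentrations."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted concentrations."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))
```

Because MSE squares each residual, a few large misses inflate it far more than they inflate MAE, which is one reason to report the two metrics side by side when comparing regression models.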

Keywords

Machine Learning, Spectroscopy, Microcystin, Raman Spectroscopy, Deep Learning

Citation