REVIEW ARTICLE
Year : 2021  Volume : 12  Issue : 1  Page : 17

Infrared spectroscopy technique for quantification of compounds in plant-based medicine and supplement

Effan Cahyati Junaedi^{1}, Keri Lestari^{2}, Muchtaridi Muchtaridi^{3}
^{1} Department of Pharmaceutical Analysis and Medicinal Chemistry, Universitas Padjadjaran; Department of Pharmaceutical Chemical Analysis, Faculty of Mathematics and Natural Science, Universitas Garut, Langensari, Indonesia
^{2} Department of Pharmacology and Clinical Pharmacy, Faculty of Pharmacy, Universitas Padjadjaran, Langensari, Indonesia
^{3} Department of Pharmaceutical Analysis and Medicinal Chemistry, Universitas Padjadjaran, Langensari, Indonesia

Quality control of plant-based medicine and supplements must be carried out to ensure uniform quality and safe use, which creates a need for effective and accurate analytical methods. Infrared spectroscopy is a method of qualitative and quantitative analysis that is fast, time-saving, cost-effective, accurate, and nondestructive. Supported by chemometric techniques, it has been applied to the quantitative analysis of compounds in complex matrices such as plant-based medicine and supplements. The success of infrared spectroscopy in quantifying phytochemicals and adulterants in plant-based medicine and supplements depends on several factors. This article highlights the effect of spectral preprocessing and variable selection on the quantitative analysis of phytochemicals and adulterants in plant-based medicine and supplements using infrared spectroscopy. A literature search was conducted in PubMed, Google Scholar, and Science Direct, selecting quantitative analysis studies on plant-based medicines and supplements that used spectral preprocessing techniques and variable selection in their data analysis.
Spectral preprocessing and variable selection can affect the accuracy and precision of infrared spectroscopy methods. Variable selection can be done by wavenumber point, by wavenumber interval, or by a combination of the two. Variable selection is more commonly used for near-infrared data than for IR data. Optimizing the spectral preprocessing and variable selection techniques is useful for increasing the ability of infrared spectroscopy to predict compound levels.

Introduction

The use of plant-based medicine and supplements has increased in the past few decades. This is in line with the growth of self-medication and a tendency to return to traditional and natural products. Consumers choose herbal and natural products because they regard natural herbs as safe and expect them to minimize the side effects of chemical drugs, improve health, and reduce treatment costs.[1] However, herbal products are not yet accepted in some countries because of counterfeit products, uneven quality, and concerns about the safety of their use, all of which may cause negative effects for the consumer. Product quality can be assured through both qualitative and quantitative assessment. Quantitative assessment of herbal product quality focuses on the phytochemical components that are naturally contained in the sample and on adulterants that should not be present in it.[2],[3]

Various analytical methods can be used to examine phytochemical content and adulterants in herbal medicines, such as high-performance liquid chromatography (HPLC), ultrahigh-HPLC, liquid chromatography (LC)-mass spectrometry (MS), gas chromatography (GC)-MS, nuclear magnetic resonance (NMR), and thin-layer chromatography. However, these methods have several disadvantages. HPLC is still costly, impractical in sample handling, slow, and consumes large volumes of solvent. NMR is insensitive and requires a relatively large amount of sample for a measurement.[4] Furthermore, efficiency is an obstacle in the analysis of herbal medicines, because their complex composition can complicate the process. In addition, the analysis can damage the material.[5]

Infrared spectroscopy can be used for both qualitative and quantitative analysis. The method provides information about the compound content of complex samples, even at low levels.
The complexity of the information provided can be resolved by chemometric techniques, which help to extract information from spectra through multivariate analysis. Infrared spectroscopy is considered a fast, time-saving, cost-effective, accurate, and nondestructive analytical tool.[6] It has been used successfully for quantitative analysis in various fields, including pharmaceuticals, the food industry, agriculture, and biological evaluation. The success of the method is assessed from the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and the determination coefficient (R^{2}).[7]

The linearity and standard errors of a prediction model are influenced by several factors. Spectral preprocessing and variable selection can affect the accuracy and precision of infrared spectroscopy methods. We searched the PubMed, Google Scholar, and Science Direct databases for articles providing information on the effect of spectral preprocessing and variable selection. Some of the research on quantitative analysis using infrared spectroscopy is presented in [Table 1]. This article discusses several preprocessing and variable selection techniques that contribute to improving the quantitative analysis of chemical components (in the form of adulterants) and phytochemicals (secondary metabolites) in plant-based medicine and supplements, in order to standardize products with green analytical chemistry methods.

{Table 1}

Quantitative Analysis Using Infrared Spectroscopy

Quantitative analysis with infrared spectroscopy is an indirect method because it still requires reference methods, either chromatographic (HPLC, GC, and capillary electrophoresis) or spectroscopic (NMR, ultraviolet).
To quantify a compound in the sample, chemometric techniques model the relationship between the Fourier transform infrared (FTIR) spectrum, as the predictor variable, and the level measured with the reference method, as the response variable.

In general, content analysis with infrared spectroscopy combined with chemometrics is carried out in several stages. First, spectra of the samples are recorded with near-infrared (NIR) or mid-infrared (MIR) spectrometers. The samples are also analyzed by the reference method to obtain the actual values. The reference method is chosen for its ability to quantify the components of a sample accurately. The infrared spectra can then be treated further with preprocessing and variable selection, which aim to reduce the dimensions of the data and simplify calibration modeling without losing important information.[8] The calibration is built by relating the predictor variables (the infrared spectral variables) as x to the actual values (compound content obtained from the reference method) as y. Calibration models for quantitative analysis fall into two categories: linear and nonlinear. The linear calibration methods generally used are partial least squares (PLS), principal component regression, and stepwise multiple linear regression, while artificial neural networks provide nonlinear calibration.[8] Finally, the calibration model is evaluated to test its ability to determine the levels of compounds in the sample.
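
The calibration and evaluation stages described above can be illustrated with a small sketch. It is not taken from any of the reviewed studies: the spectra are synthetic, the band position, noise level, and number of components are arbitrary assumptions, and principal component regression is used only because it fits in a few lines of NumPy (a PLS model would follow the same pattern). The helper names `fit_pcr`, `predict`, and `figures_of_merit` are hypothetical, and the RPD is computed as the standard deviation of the reference values divided by the prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured data (assumption, not real spectra):
# y = reference-method concentrations, X = spectra with one analyte band.
n_samples, n_vars = 60, 200
y = rng.uniform(1.0, 10.0, n_samples)
band = np.exp(-0.5 * ((np.arange(n_vars) - 100) / 8.0) ** 2)
X = np.outer(y, band) + 0.01 * rng.standard_normal((n_samples, n_vars))

# Split into calibration and validation (prediction) sets.
X_cal, y_cal = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]


def fit_pcr(X, y, n_components):
    """Principal component regression: regress y on the leading PCA scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Rows of vt are the principal directions of the mean-centered spectra.
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    loadings = vt[:n_components].T
    scores = (X - x_mean) @ loadings
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Express the model as one regression vector over the original variables.
    return loadings @ coef, x_mean, y_mean


def predict(model, X):
    """Apply a fitted model (beta, x_mean, y_mean) to new spectra."""
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean


def figures_of_merit(y_true, y_pred):
    """Return (RMSE, R2, RPD) for one data set."""
    residual_ss = np.sum((y_true - y_pred) ** 2)
    rmse = np.sqrt(residual_ss / len(y_true))
    r2 = 1.0 - residual_ss / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmse  # standard deviation / standard error
    return rmse, r2, rpd


model = fit_pcr(X_cal, y_cal, n_components=2)
rmsec, r2_cal, _ = figures_of_merit(y_cal, predict(model, X_cal))
rmsep, r2_val, rpd = figures_of_merit(y_val, predict(model, X_val))
print(f"RMSEC={rmsec:.4f}  RMSEP={rmsep:.4f}  R2(val)={r2_val:.4f}  RPD={rpd:.1f}")
```

Comparing RMSEC with RMSEP on a held-out set, as in the last lines, is what exposes a model that fits the calibration samples well but predicts new samples poorly.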

Quantification of Phytochemical Compound and Adulterant

An analysis with infrared spectroscopy is ideal if the evaluation shows R^{2} values approaching 1, low standard errors (RMSEC, RMSECV, RMSEP, or predicted residual error sum of squares [PRESS]), and a ratio of performance to deviation (RPD) of at least 2.4.[28],[29] The calibration R^{2} describes the linearity of the calibration curve formed from the infrared spectrum data (x value) and the concentration data measured with the reference method (y value). R^{2} values close to 1 indicate that the infrared spectrum data are able to explain the concentration of compounds as the dependent variable. The validation R^{2} illustrates the accuracy of the concentration measured by the infrared method. A validation R^{2} close to 1 indicates that the compound concentrations measured with the FTIR method are proportional to the results of the reference method. The RPD, calculated as the standard deviation divided by the standard error of prediction, is used to evaluate the fitting and prediction capacities of the models. The RMSEC, RMSECV, RMSEP, or PRESS values are used to assess precision. The analysis process using infrared spectroscopy is shown in [Figure 1].

{Figure 1}

Spectra preprocessing

Spectra preprocessing in quantitative analysis aims to minimize noise and physical phenomena, so that the resulting signal correlates with concentration and the predictive ability increases.[30] There are several ways of preprocessing spectra. In the studies reviewed here, Savitzky-Golay-based derivatization, standard normal variate (SNV), multiplicative scatter correction (MSC), and scaling techniques are the most commonly used.[9],[21],[30] Most infrared analyses of plant materials are done on solid samples, which makes them susceptible to scattering.[30] Scattering causes irrelevant variation in the spectral data.
Without spectral preprocessing, information and noise are mixed, which can decrease the predictive ability of a model. Scattering is usually less pronounced in MIR than in NIR measurements, where it is caused by inhomogeneity of particle size.[30] Research conducted by Otsuka showed that particle size is inversely proportional to the scattering coefficient.[31]

MSC and SNV are examples of scatter correction techniques that reduce the physical variability caused by scattering. MSC and SNV often produce similar evaluation values.[13],[17],[20] MSC corrects each raw spectrum by estimating its correction coefficients against a reference spectrum, namely the average spectrum used in the calibration model. Because it uses the average spectrum, correction errors may occur if one spectrum dominates the reference. Repeated application of MSC therefore generally reduces correction errors. Nevertheless, more repetitions are not always better, because repeating MSC can also reduce the differences between the spectra in the data set.[32] In SNV, the correction is done by calculating the mean and standard deviation of each sample spectrum. Unlike MSC, SNV does not require a reference spectrum, so the process is simpler.[33] SNV normalizes spectral variation that arises from physical differences, not chemical ones.[34] SNV works well when the scatter is uniform across the full spectrum. If this condition is not met, the spectral correction will not be optimal.[35]

Savitzky-Golay is a derivative technique that also provides spectral smoothing.[36] Derivatization can give a more detailed picture of the structure in the spectra and so increase sensitivity.
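
The two scatter corrections described above, SNV and MSC, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the function names `snv` and `msc` are hypothetical, the data are synthetic, and the distortion is assumed to be a pure additive offset plus a multiplicative factor per sample.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is modeled as x ~ a + b * reference and corrected to
    (x - a) / b. The mean spectrum is used when no reference is given.
    """
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        # np.polyfit returns the highest-degree coefficient first.
        slope, intercept = np.polyfit(reference, x, deg=1)
        corrected[i] = (x - intercept) / slope
    return corrected


# Synthetic demonstration (assumed data): one chemical profile distorted by a
# sample-specific additive offset and multiplicative scatter factor.
wavenumber_index = np.arange(300)
profile = np.exp(-0.5 * ((wavenumber_index - 120) / 15.0) ** 2)
offsets = np.array([0.0, 0.2, -0.1, 0.5])[:, None]
gains = np.array([1.0, 1.3, 0.8, 1.6])[:, None]
raw = offsets + gains * profile

snv_corrected = snv(raw)  # no reference spectrum needed
msc_corrected = msc(raw)  # uses the mean spectrum as reference
```

In this idealized case both corrections collapse the four distorted spectra onto a single curve, which is consistent with the observation that MSC and SNV often give similar results. Real spectra also differ chemically, so the corrected spectra would not coincide exactly.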
The derivatives most widely used in the quantitative analysis of compounds in plants are the 1st and 2nd derivatives. The 1st derivative removes baseline variation between the data, and the 2nd derivative improves the resolution of the spectra and removes any slope effect on the data.[18],[21] Although derivatization generally provides a more detailed description, not all analyses with IR spectroscopy require derivatization as spectral preprocessing. This was the case in the research by Kokalj Ladan et al., who derivatized spectra recorded at different resolutions. Their study showed that derivatization improved prediction models for spectra with a resolution of 16 cm^{-1} but not for spectra with a resolution of 4 cm^{-1}.[10] Lower resolutions can cause a loss of information useful for quantitative analysis, so derivatization helps to describe the spectra in more detail.[37] Spectra recorded at higher resolution, however, contain more noise, so derivatization introduces too much noise into the data and decreases the signal-to-noise ratio. The standard error of the model then does not improve, or even increases.[10]

Variable selection

The choice of spectral preprocessing technique is very important for building a successful calibration model, because it can either reveal or eliminate important information connected with the content measurement. However, the effect of spectral preprocessing on the calibration model can only be known after the model has been validated.[38] It is therefore important to compare several spectral preprocessing techniques and their combinations to determine which is most suitable for the analyzed data. A preprocessing technique is appropriate if it decreases the standard errors while reducing, or at least maintaining, the complexity of the model.
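
As an illustration of the derivative preprocessing discussed above, the following sketch computes Savitzky-Golay derivatives from a local least-squares polynomial fit. It is a simplified NumPy version written for this example: the function names, window length, and polynomial order are assumptions, the spectrum is synthetic, and edge points are simply dropped. Libraries such as SciPy provide a fuller implementation (`scipy.signal.savgol_filter`).

```python
import numpy as np
from math import factorial


def savgol_weights(window_length, polyorder, deriv=0, delta=1.0):
    """Savitzky-Golay weights for the window center, from a least-squares fit."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd")
    half = window_length // 2
    z = np.arange(-half, half + 1, dtype=float)
    # Columns are z**0, z**1, ..., z**polyorder.
    design = np.vander(z, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudoinverse yields the fitted coefficient of z**deriv;
    # the deriv-th derivative at z = 0 is deriv! times that coefficient.
    return factorial(deriv) * np.linalg.pinv(design)[deriv] / delta ** deriv


def savgol_derivative(spectrum, window_length=11, polyorder=2, deriv=1, delta=1.0):
    """Smoothed derivative of one spectrum; window_length // 2 points are lost per edge."""
    weights = savgol_weights(window_length, polyorder, deriv, delta)
    # Sliding dot product of the weights with each window of the spectrum.
    return np.correlate(spectrum, weights, mode="valid")


# Synthetic spectrum (assumed): one band, with and without baseline effects.
x = np.arange(400, dtype=float)
peak = np.exp(-0.5 * ((x - 200) / 12.0) ** 2)
with_offset = peak + 3.0            # constant baseline shift
with_slope = peak + 3.0 + 0.01 * x  # sloping baseline

d1_peak = savgol_derivative(peak, deriv=1)
d1_offset = savgol_derivative(with_offset, deriv=1)  # offset is removed
d2_peak = savgol_derivative(peak, deriv=2)
d2_slope = savgol_derivative(with_slope, deriv=2)    # offset and slope are removed
```

The first derivative of the shifted spectrum matches that of the clean band, and the second derivative of the sloped spectrum matches as well, which mirrors the baseline and slope removal described in the text. A wider window smooths more strongly, which matters for noisy, high-resolution spectra.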
Besides spectral preprocessing, the complexity of the model can also be reduced by selecting variables. The summarized research shows that variable selection using principal component analysis (PCA),[9] interval PLS (iPLS),[13] competitive adaptive reweighted sampling (CARS),[18] or a genetic algorithm (GA)[39] reduces the number of variables by eliminating irrelevant ones. PCA transforms the correlated original variables into new variables that are combinations of the original ones. These combinations reduce the number of variables but can still explain most of the information in the original variables.[40],[41] CARS variable selection has three stages. First, the data set is randomly sampled. Next, the number of variables is reduced with an exponentially decreasing function, which removes variables rapidly. Finally, the remaining variables that produce a low RMSECV are selected as the informative variables included in the calibration model.[42] This method can make the calibration model simpler and more effective.[43] GA, in contrast, randomly selects a subset of variables and then calibrates a PLS model to find the variables with the greatest influence. Crossover and mutation are then carried out to form new variable subsets, and the model is recalibrated to decide which variables are the most suitable.[44],[45]

In addition to selection based on wavenumber points, variable selection can also be done by dividing the wavenumbers into intervals. In the studies reviewed, iPLS and synergy interval PLS (siPLS) have been used. iPLS works by dividing the spectrum into several parts of equal width. A submodel is built for each subinterval and its standard error is calculated. The submodel with the lowest standard error contains important information correlated with the objectives of the analysis.
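
The interval search described above can be sketched as follows. This is a simplified illustration rather than the iPLS implementation of the cited work: the single-response PLS routine, the five-fold cross-validation, the number of components, and the synthetic spectra with a single band in the middle interval are all assumptions made for the example, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS by successive deflation; returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)    # weight vector
        t = Xr @ w                # scores
        tt = t @ t
        p = Xr.T @ t / tt         # X loadings
        c = yr @ t / tt           # y loading
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - c * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of k-fold cross-validation for a PLS model."""
    residuals = np.empty(len(y))
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta, x_mean, y_mean = pls1_fit(X[train], y[train], n_components)
        residuals[test_idx] = y[test_idx] - ((X[test_idx] - x_mean) @ beta + y_mean)
    return np.sqrt(np.mean(residuals ** 2))


def ipls(X, y, n_intervals, n_components=2):
    """Score each equal-width interval by RMSECV; return (best index, all scores)."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scores = np.array([rmsecv(X[:, cols], y, n_components) for cols in intervals])
    return int(np.argmin(scores)), scores


# Synthetic spectra (assumed): the analyte band lies in the third of five intervals.
rng = np.random.default_rng(1)
n_samples, n_vars = 40, 100
y = rng.uniform(0.5, 5.0, n_samples)
band = np.exp(-0.5 * ((np.arange(n_vars) - 50) / 3.0) ** 2)
X = np.outer(y, band) + 0.02 * rng.standard_normal((n_samples, n_vars))

best, scores = ipls(X, y, n_intervals=5)
print("RMSECV per interval:", np.round(scores, 3), "-> selected interval", best)
```

A synergy-interval variant would evaluate combinations of intervals instead of single ones, at the cost of many more submodels.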
The wavenumber subinterval of that submodel is then used to form the calibration model.[46] siPLS is a modification of iPLS that optimizes variable selection by combining intervals that contain informative variables.[47] siPLS can capture more informative variables because it does not select only one spectral region but combines several.[48]

Variable selection can also combine the selection of wavenumber points and wavenumber intervals, as in siPLS-GA. Combined techniques can produce a simpler model with fewer variables than a single technique. However, a smaller number of variables does not guarantee the best predictive ability, because informative data may be lost.[49],[50] Variable selection is useful for reducing noise and interfering variables in the data, so that the prediction model is built more efficiently, with fewer variables but reliable predictive ability.[51]

The success of a multivariate analysis method is assessed by calibration and validation parameters. R^{2} values close to 1 indicate that the predictor variables used in the model correlate linearly with the levels of compounds. In addition, a calibration model is good if there is no large difference between the RMSEC and RMSEP values.[17] A large difference indicates overfitting or underfitting. Overfitting occurs when the calibration model produces a low standard error in calibration but is unable to predict new samples, whereas an underfitted model already shows a high standard error in calibration.

Conclusion

Infrared spectroscopy is useful for the quantification of phytochemical components and adulterants in plant-based medicines and supplements. This analysis can support further study to determine the quality of nutritious plants and so ensure the safety and effectiveness of plant-based products. Chemometrics can overcome the complexity of the chemical content of herbs.
Optimizing preprocessing techniques, variable selection, or a combination of both is generally useful for improving the predictive ability of models. This is indicated by an increase in linearity (in calibration and validation) and a decrease in the standard error of the model for spectra subjected to preprocessing, variable selection, or both, compared with raw spectra. A high validation R^{2} (close to 1) indicates that the compound concentrations measured with the FTIR method are proportional to the results of the reference method.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References


