Soil health testing has transformed the way growers and consultants evaluate their fields. Indicators like permanganate oxidizable carbon (POXC), water extractable organic carbon (WEOC), autoclaved citrate extractable (ACE) protein, soil respiration, aggregate stability, and enzyme activities now sit alongside traditional measurements of organic matter and pH. The problem is not that these tests lack scientific rigor; each one is backed by a growing body of validation research. The problem is cost. A comprehensive soil health panel that includes physical, chemical, and biological indicators can easily exceed $150 per sample and demand several days of laboratory turnaround. Multiply that cost by the number of management zones a producer wants to characterize, and routine monitoring becomes economically impractical for all but the most well funded operations.
Mid infrared (MIR) spectroscopy offers a fundamentally different approach. Rather than running separate wet chemistry, incubation, and physical fractionation procedures for each indicator, MIR collects a single diffuse reflectance spectrum in under a minute and uses chemometric models to predict dozens of properties simultaneously. The technique has been used in soil survey work for more than a decade, but recent expansions of national spectral libraries and advances in machine learning have pushed its predictive power into the domain of soil health assessment. This blog examines the current evidence for MIR prediction of key soil health indicators, distinguishes what the technology does well from where it still needs improvement, and considers what a spectroscopy driven workflow could mean for the future of soil health monitoring.
What MIR Actually Measures
MIR spectroscopy operates in the 4000 to 400 cm⁻¹ wavenumber range, corresponding to wavelengths between 2.5 and 25 micrometers. When infrared light strikes a soil sample, molecules absorb energy at frequencies that match their fundamental vibrational modes. These are direct, first order molecular vibrations, not the overtones and combination bands that characterize the near infrared (NIR) region. As a result, MIR absorption features are stronger, better resolved, and more chemically specific than their NIR counterparts (Soriano-Disla et al., 2014).
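The wavenumber and wavelength figures above are two views of the same quantity, related by a reciprocal. A minimal Python sketch of the conversion follows; the function name is illustrative and not taken from any spectroscopy package.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometers."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    # 1 cm = 10,000 micrometers, so wavelength (um) = 10,000 / wavenumber (cm^-1).
    return 10_000.0 / wavenumber_cm1


# Limits of the MIR range quoted in the text.
mir_range_um = (
    wavenumber_to_wavelength_um(4000),
    wavenumber_to_wavelength_um(400),
)
print(mir_range_um)  # (2.5, 25.0)
```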
The practical consequence is that an MIR spectrum of a soil sample encodes information about organic functional groups (aliphatic C-H stretching near 2930 and 2850 cm⁻¹; carboxylic C=O near 1720 cm⁻¹; aromatic C=C near 1640 cm⁻¹; amide bands near 1660 and 1565 cm⁻¹), clay mineralogy (Si-O stretching between 1100 and 1000 cm⁻¹; O-H stretching between 3700 and 3200 cm⁻¹), and carbonates (diagnostic absorptions near 2515 and 1400 cm⁻¹). Because soil health properties like organic carbon content, labile carbon pools, aggregate stability, microbial biomass, and nitrogen availability are all connected to these underlying molecular and mineralogical characteristics, MIR spectra carry indirect but statistically recoverable information about a wide range of indicators.
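As a reading aid, the band positions listed above can be held in a small lookup table. The sketch below is an illustration only: the names `BANDS` and `assign_bands` and the 15 cm⁻¹ matching tolerance are assumptions made for this example, and real band assignment depends on mineralogy, sample preparation, and spectral preprocessing.

```python
# Approximate band positions (cm^-1) taken from the paragraph above.
# Each entry is (label, low, high); single peaks use low == high.
BANDS = [
    ("O-H stretch (clay minerals)", 3200, 3700),
    ("aliphatic C-H stretch", 2930, 2930),
    ("aliphatic C-H stretch", 2850, 2850),
    ("carbonate", 2515, 2515),
    ("carboxylic C=O", 1720, 1720),
    ("amide I", 1660, 1660),
    ("aromatic C=C", 1640, 1640),
    ("amide II", 1565, 1565),
    ("carbonate", 1400, 1400),
    ("Si-O stretch (clay minerals)", 1000, 1100),
]


def assign_bands(wavenumber_cm1, tolerance=15):
    """Return labels of all listed bands within `tolerance` cm^-1 of a position."""
    return [
        label
        for label, low, high in BANDS
        if low - tolerance <= wavenumber_cm1 <= high + tolerance
    ]


print(assign_bands(1650))  # ['amide I', 'aromatic C=C']
print(assign_bands(1050))  # ['Si-O stretch (clay minerals)']
print(assign_bands(2000))  # []
```

The first lookup returns two labels because the amide I and aromatic C=C positions sit close together. Overlap of this kind is one reason soil MIR spectra are usually interpreted with multivariate models rather than by reading single peaks.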
Carbon Indicators: SOM, TOC, TIC, and Labile Pools
The strongest MIR predictions consistently fall on carbon related properties, which makes sense given that organic and inorganic carbon functional groups dominate the mid infrared spectrum. Total organic carbon (TOC) and soil organic matter (SOM, commonly estimated via loss on ignition or as a multiple of TOC) are predicted with validation R² values routinely above 0.90 across large, geographically diverse calibration sets. Seybold et al. (2019) reported Lin's concordance correlation coefficients of 0.967 to 0.996 for total carbon, organic carbon, and calcium carbonate equivalent in central U.S. Mollisols using the KSSL spectral library. Sanderman, Savage, and Dangal (2020) achieved test set R² values above 0.80 for SOC across 54,211 spectra spanning all 50 states. Ng et al. (2022) evaluated 133 soil properties using roughly 45,000 KSSL spectra and placed TOC and total carbon among their highest accuracy tier.
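The two accuracy statistics quoted above answer slightly different questions, and a small sketch may help. The Python below computes R² as one minus the ratio of residual to total sum of squares (some studies instead report the squared Pearson correlation, so this definition is an assumption) and Lin's concordance correlation coefficient from its standard formula. The numbers are made up to show how a constant bias affects each statistic; they are not data from any cited study.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # The squared mean difference penalizes systematic bias.
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Illustrative values: predictions are perfectly correlated but biased by +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(round(r_squared(observed, predicted), 3))  # 0.8
print(round(lins_ccc(observed, predicted), 3))   # 0.909
```

Both statistics fall below 1 here even though the predictions track the reference values perfectly in rank and spacing, because each penalizes departure from the 1:1 line rather than from a best-fit line.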
Total inorganic carbon (TIC), present primarily as calcium carbonate in calcareous soils, is likewise well predicted. Carbonate absorbs strongly near 2515 and 1400 cm⁻¹, giving MIR models a direct spectral target. The distinction between organic and inorganic carbon is critical for soil health interpretation; a high total carbon value in a calcareous Great Plains soil does not carry the same biological significance as the same value in an acidic Ultisol. MIR can resolve both fractions from a single scan.
Labile carbon pools present more nuanced results. POXC, the permanganate oxidizable carbon fraction widely used as an indicator of biologically active carbon, has been predicted with MIR calibration R² values of 0.77 to 0.81 and validation R² of 0.75 to 0.77 (Calderón et al., 2017). In Wisconsin soils, Brice (2023) achieved an all soils model R² of 0.76 for POXC using a handheld MIR instrument with partial least squares regression, identifying positive correlations with amide II (1565 cm⁻¹), amide I (1660 cm⁻¹), carboxylic acid (1722 cm⁻¹), and aliphatic C-H peaks (2851 and 2924 cm⁻¹). WEOC, the water extractable organic carbon pool that serves as a proxy for the readily available microbial food source, has received less direct MIR prediction work at scale. However, WEOC is often closely related to both TOC and POXC (r² values between WEOC and SOC typically exceed 0.60), which suggests that MIR models calibrated to large libraries could achieve moderate to good prediction accuracy once sufficient reference data become available.
Indicator | Validation R² Range | Typical Model Type | Notes
TOC / SOC | 0.90 to 0.99 | PLSR, MBL | Consistently strong across large libraries
SOM (LOI) | 0.85 to 0.95 | PLSR, MBL | High performance in national-scale datasets
TIC / CaCO₃ | 0.93 to 0.99 | PLSR | Direct carbonate spectral features
POXC | 0.75 to 0.81 | PLSR | Moderate-good, calibration dependent
WEOC | Emerging | Mixed | Limited large-scale direct MIR studies
Organic Matter Fractions: POM and MAOM
The conceptual framework separating soil organic matter into particulate organic matter (POM) and mineral associated organic matter (MAOM) has gained substantial traction for understanding carbon persistence and cycling (Lavallee, Soong, and Cotrufo, 2020). POM represents partially decomposed plant residues with faster turnover (years to decades), while MAOM comprises organic compounds sorbed to mineral surfaces with longer residence times (decades to centuries). From a soil health perspective, both fractions matter: POM reflects recent organic matter inputs and is responsive to management changes, while MAOM represents the more stable carbon reservoir linked to long term soil structure and nutrient retention.
MIR spectroscopy has shown divergent prediction performance for these two pools. Ramirez et al. (2021) used MIR with partial least squares regression on 349 European topsoil samples and achieved good predictions for MAOM carbon (R² = 0.79, RPD = 2.17 using the 2000 to 400 cm⁻¹ region) and total SOM carbon (R² = 0.80, RPD = 2.23). POM carbon, however, was predicted with notably lower accuracy (R² = 0.61, RPD = 1.60), with models tending to underestimate samples above 20 g C per kg soil. POM nitrogen showed still weaker performance (R² = 0.47). The explanation lies in spectral chemistry: MAOM is intimately associated with the clay mineral matrix that dominates MIR absorptions, while POM exists as discrete particles whose spectral signatures are diluted by the bulk mineral background. Land use specific calibrations appear to improve POM prediction, suggesting that the relationship between POM and the broader MIR spectral features varies with vegetation inputs and decomposition pathways.
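For readers less familiar with RPD (ratio of performance to deviation), it is the standard deviation of the reference values divided by the prediction error. The sketch below uses the sample standard deviation and RMSE; some papers use the standard error of prediction instead, so treat that choice as an assumption. The helper `approx_r2_from_rpd` is a hypothetical name for a common rule of thumb that holds when bias is negligible, and the sample values are illustrative rather than drawn from any study.

```python
from math import sqrt
from statistics import stdev


def rmse(observed, predicted):
    """Root mean squared error of prediction."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values over RMSE."""
    return stdev(observed) / rmse(observed, predicted)


def approx_r2_from_rpd(rpd_value):
    """Rough R^2 implied by an RPD, assuming negligible bias."""
    return 1.0 - 1.0 / rpd_value ** 2


# Illustrative values only (g C per kg soil).
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 50.0]
print(round(rpd(observed, predicted), 2))   # 6.93

# The R^2 and RPD pairs quoted in the text are consistent with the rule of thumb.
print(round(approx_r2_from_rpd(2.17), 2))   # 0.79 (MAOM carbon)
print(round(approx_r2_from_rpd(1.60), 2))   # 0.61 (POM carbon)
```

Because RPD scales the error by the spread of the reference data, a model can post a respectable RPD on a highly variable dataset and a poor one on a narrow dataset with the same absolute error.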
Biological Indicators: Respiration, ACE Protein, and Enzymes
The biological indicators on a standard soil health panel are the most expensive and time consuming to measure by conventional methods, and they are also where MIR faces its greatest predictive challenges. Soil respiration (typically a 24 hour or 4 day CO₂ burst incubation) is an integrative measure of microbial metabolic activity that reflects labile carbon availability and microbial community vitality. ACE protein quantifies the organically bound nitrogen pool accessible to microorganisms and serves as a proxy for nitrogen mineralization potential. Enzyme activities, including beta glucosidase (cellulose degradation), N acetyl beta glucosaminidase (chitin degradation), and phosphomonoesterase (phosphorus cycling), provide functional snapshots of microbial nutrient processing capacity.
MIR does not detect these biological properties directly. There is no spectral absorption band for CO₂ burst or beta glucosidase activity per se. Instead, MIR models exploit correlations between biological activity and the organic and mineral soil matrix that does absorb in the mid infrared. Sanderman, Savage, and Dangal (2020) found that aggregate stability, a property closely linked to both biological activity and organic matter content, could be predicted with R² above 0.70 when using memory based learning (MBL) algorithms on the KSSL library. Plant available phosphorus and potassium were also in this moderate accuracy range.
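The idea behind memory based learning can be sketched in a few lines: rather than fitting one global model, each new spectrum is predicted from the library samples that most resemble it. The Python below is a deliberately simplified stand-in, assuming Euclidean distance on raw absorbance values and an inverse-distance weighted mean of the neighbors. Published MBL workflows typically fit a local regression (for example PLSR) on the selected neighbors and use more careful dissimilarity measures, so the function name, distance metric, and toy data here are all illustrative assumptions.

```python
from math import dist


def mbl_predict(library_spectra, library_values, target_spectrum, k=3):
    """Predict a property from the k library spectra closest to the target."""
    # Pair each library sample's distance to the target with its reference value.
    neighbors = sorted(
        (dist(spectrum, target_spectrum), value)
        for spectrum, value in zip(library_spectra, library_values)
    )[:k]
    # Inverse-distance weights; the small constant avoids division by zero
    # when the target matches a library spectrum exactly.
    weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)


# Toy "spectra" with three absorbance values each, and a made-up soil property.
library_spectra = [
    [0.10, 0.20, 0.30],
    [0.20, 0.30, 0.40],
    [0.80, 0.90, 1.00],
    [0.90, 1.00, 1.10],
]
library_values = [1.0, 1.2, 3.0, 3.2]

target = [0.15, 0.25, 0.35]
print(round(mbl_predict(library_spectra, library_values, target, k=2), 2))  # 1.1
```

The target sits midway between the first two library spectra, so only those two contribute and the distant, dissimilar samples are ignored. That locality is what lets neighbor-based methods adapt to regional soil conditions within a large national library.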
A recent regional study by Shu, Price, Lynch, Burton, and Heung (2025) using 829 samples from Nova Scotia agricultural soils demonstrated strong MIR predictive performance (RPIQ 2.2 to 3.7) for water stable aggregates alongside TOC, total nitrogen, and pH. Beckstrom, Crow, and Deenik (2025) working with Hawaiian soils showed that MIR spectroscopy coupled with machine learning could predict multiple soil health indicators including mineral class and land use history, noting that biological indicators dependent on correlations with MIR responsive organic matter components showed the highest accuracy when calibration sets were regionally tuned.
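RPIQ (ratio of performance to interquartile range) is computed like RPD but replaces the standard deviation with the interquartile range, which makes it less sensitive to skewed reference data. The sketch below is a minimal version using Python's standard library. Quartile conventions differ between software packages, so the default "exclusive" method of `statistics.quantiles` used here is an assumption, and the sample values are illustrative only.

```python
from math import sqrt
from statistics import quantiles


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: IQR of reference values / RMSE."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    # quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = quantiles(observed, n=4)
    return (q3 - q1) / rmse


# Illustrative values only, not data from any cited study.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 50.0]
print(round(rpiq(observed, predicted), 2))
```

As with RPD, larger values indicate that the prediction error is small relative to the spread of the property in the dataset.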
Enzyme activities represent perhaps the most indirect relationship. Ludwig et al. (2017) reported moderate vis-NIRS prediction of beta glucosidase and other enzyme activities in forest soils, with R² values varying across sites and soil types. Rasche et al. (2013) achieved MIR based R² values of 0.78 for microbial biomass carbon across temperate grassland soils using midDRIFTS-PLSR. A 2025 study using MIR and PLSR across diverse continental U.S. ecosystems reported the strongest estimations for microbial respiration, followed by microbial biomass nitrogen, beta glucosidase activity, and microbial biomass carbon, with microbial properties positively correlated to spectral regions associated with aliphatic C-H groups and C=O stretches of polysaccharides (Ghimire et al., 2025). For ACE protein specifically, no large scale MIR calibration study has yet been published, but the moderate correlations between ACE protein and TOC (adjusted R² = 0.48), total nitrogen (0.47), and POXC (0.45) reported by Amsili, van Es, and Schindelbeck (2025) suggest that MIR predictions of ACE protein through these covarying properties may be feasible if sufficient reference data are collected.
Indicator | Validation R² Range | Interpretation | Notes
Aggregate stability | 0.70 to 0.85 | Moderate to strong | Often improves with MBL and regional libraries
Soil respiration (CO₂ burst) | 0.55 to 0.75 | Moderate | Indirect and context dependent
ACE protein | Emerging | Developing | Likely via covariance with C and N pools
Beta glucosidase | 0.65 to 0.82 | Moderate | Best results in regional calibrations
Microbial biomass C | 0.55 to 0.78 | Moderate | Higher in homogeneous calibration sets
WEON | Emerging | Developing | Limited direct calibration data
Why MIR Works Better for Some Indicators Than Others
The pattern of prediction accuracy across these indicators follows a logical hierarchy rooted in spectral chemistry. Properties that are directly encoded in the MIR spectrum through fundamental molecular absorptions, such as TOC, SOM, TIC, clay mineralogy, and CEC (which is controlled by clay type and organic matter content), are predicted with excellent accuracy. Properties that are strongly correlated with these direct spectral targets but are not themselves absorbing in the MIR, such as POXC, aggregate stability, and total nitrogen, achieve good to moderate accuracy through statistical association. Properties that reflect transient biological processes, such as respiration rate, enzyme activities, and short term nitrogen mineralization, occupy the lowest tier because their relationship to the stable mineral and organic matrix is looser and more context dependent.
This hierarchy has important implications. MIR spectroscopy is not replacing every wet chemistry and incubation procedure in a soil health lab. What it can do is deliver the core chemical and physical indicators at a fraction of the cost and time, while providing increasingly useful estimates of biological properties that improve with regional calibration and library expansion. The USDA NRCS has recognized this potential. The KSSL spectral library now contains more than 85,000 spectra, and as of 2024, 24 NRCS offices (21 MLRA soil survey offices and 3 state or area offices) have deployed MIR instruments with plans for further expansion (NRCS, 2024). The University of Wisconsin-Madison has built a USDA NRCS funded web portal (soilmir.wisc.edu) for automated MIR predictions using national spectral libraries (Zhang, Hartemink, and Huang, 2021). The Open Soil Spectral Library, a USDA NIFA funded initiative, is working to harmonize spectral data across instruments and institutions for global calibration transfer.
What This Means for Soil Health Monitoring
The vision emerging from this body of work is not a binary replacement of conventional testing but a tiered analytical strategy. MIR serves as a rapid, inexpensive first pass that delivers strong predictions for TOC, SOM, TIC, CEC, pH, clay content, and texture from a single scan. It provides useful estimates of POXC, aggregate stability, and bulk density. And it increasingly offers moderate estimates of biologically derived indicators like respiration and enzyme activities, with accuracy improving as regional calibration libraries grow and machine learning algorithms mature. Where MIR predictions flag unusual patterns or where biological indicators require higher precision, conventional methods can be targeted to a subset of samples rather than run across an entire sampling campaign.
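One way to picture the tiered strategy is as a simple triage rule applied to each scanned sample. The Python sketch below is hypothetical throughout: the `Scan` record, the `library_distance` score, the 1.0 cutoff, and the set of indicators treated as strong by MIR are assumptions chosen to mirror the description above, not an established protocol or any laboratory's actual workflow.

```python
from dataclasses import dataclass, field

# Hypothetical grouping, loosely following the accuracy tiers described above.
STRONG_BY_MIR = {"TOC", "SOM", "TIC", "CEC", "pH", "clay"}


@dataclass
class Scan:
    sample_id: str
    library_distance: float  # assumed dissimilarity to the calibration spectra
    precise_needed: set = field(default_factory=set)  # indicators needing lab-grade values


def lab_followup(scan, max_distance=1.0):
    """List the reasons a sample should also receive conventional analysis."""
    reasons = []
    if scan.library_distance > max_distance:
        reasons.append("spectrum poorly represented in calibration library")
    # Indicators outside the strong tier are confirmed by conventional methods.
    for indicator in sorted(scan.precise_needed - STRONG_BY_MIR):
        reasons.append(f"{indicator}: confirm by conventional method")
    return reasons


scans = [
    Scan("A1", 0.3, {"TOC"}),
    Scan("A2", 1.7, {"TOC"}),
    Scan("A3", 0.4, {"respiration"}),
]
to_lab = {s.sample_id: lab_followup(s) for s in scans if lab_followup(s)}
print(to_lab)
```

In this toy run only A2 (an unusual spectrum) and A3 (a biological indicator needing higher precision) are routed to conventional analysis, while A1 relies on the MIR prediction alone.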
For the practical soil health community, this means more data points per dollar, faster turnaround, and the possibility of routine temporal monitoring that has been economically out of reach. Imagine a carbon market verification protocol where MIR scans at multiple depths across hundreds of sampling points produce SOC estimates in days rather than weeks. Imagine a soil health benchmark database where every submitted sample automatically generates spectral predictions for a dozen indicators, with conventional confirmation on a subset. These are not speculative scenarios; the infrastructure is being built now.
The soil health movement has spent the past decade defining what to measure. The next decade may be defined by how we measure it, and a beam of mid infrared light is positioned to be a central part of that answer.