What it actually measures
When a soil dries out and is then rewetted, its microbes wake up fast and consume a flush of readily available carbon. The CO2 burst test captures that pulse in a jar. You take an air-dried, sieved soil, rewet it, seal it, and measure the carbon dioxide that evolves over a short incubation. Higher CO2 is read as a larger, more active microbial community feeding on a bigger pool of easily decomposable carbon [3]. Drying and rewetting is itself a disturbance, however: it lyses some cells and mobilizes labile substrates, so the flush reflects both standing microbial biomass and that rewetting artifact rather than a clean biomass measurement [1].
The key insight from the foundational work is *which pool* that CO2 comes from. Across the soils tested under standardized laboratory conditions, the flush of CO2 was tightly tied to the active organic pools — microbial biomass and mineralizable C and N — more tightly than to the bulk, more stable organic carbon that makes up most of soil organic matter [1]. In other words, you are not measuring total organic matter — you are measuring the small, living, active fraction that drives much of nutrient cycling, especially nitrogen [3].
That is why the test is such a useful early-warning gauge. The active pool turns over fast, so it shifts in response to management years before total soil organic carbon moves measurably. In Franzluebbers et al. 2000, under controlled laboratory incubation, the flush of CO2 explained roughly 97% of the variability in cumulative carbon mineralization, about 86% of soil microbial biomass, and about 67% of net nitrogen mineralization across the soils tested [1]. These are correlations with laboratory-incubated mineralization, not with field plant-available nitrogen — a distinction that matters, as the limitations section below makes clear (the grass-sod case). Within that lab scope, a single quick assay stands in for a much more laborious set of biological measurements.
How the lab does it (and why three labs give three answers)
The recipe is conceptually identical everywhere — dry, sieve, rewet, incubate, capture CO2 — but the details differ enough that the absolute numbers are not interchangeable between methods [3]. There are three dominant protocols in North American agriculture:
- Haney / Solvita 24-hour: In the foundational Haney et al. 2008 comparison, soil was dried at 40°C, ground to pass a 2 mm sieve, and a 40 g sample wetted to 50% water-filled pore space in a beaker; the Solvita arm read a gel paddle after 1 day at 25°C, and a parallel arm captured CO2 in a 1 M KOH alkali trap read by titration [2]. The operational Haney Soil Health Tool that producers order today wets a weighed sample and reads a Solvita gel paddle in a digital reader, reporting mg CO2-C per kg (ppm) [9][10].
- Franzluebbers 3-day flush: dried soil rewetted to roughly 50% water-filled pore space, incubated at 25°C, with the CO2 captured in an alkali (KOH) trap [1].
- Cornell CASH 4-day: 20.00 g of air-dried soil sieved to 8 mm, rewetted by capillary draw from below, sealed with a 0.5 M KOH trap, and incubated 4 days at room temperature; CO2 is quantified from the drop in trap electrical conductivity and blank-corrected [3]. Bottom-up capillary rewetting was shown to compare satisfactorily with wetting to a predetermined moisture content [3].
CO2 capture itself comes in three flavors: Solvita gel colorimetry, an alkali trap read by titration or conductivity, and infrared gas analysis (IRGA). On the same soil these agree closely. In Haney et al. 2008, the day-1 alkali-titration trap (1 M KOH) was highly related to the day-1 Solvita gel reading of 24-hour CO2 respiration on the same soil wetted to 50% water-filled pore space (r² = 0.84, Fig. 7) [2]. Solvita's own validation reports a virtually 1:1 relationship to IRGA over roughly 0–125 mg/kg CO2-C, with the gel chemistry itself linear to standardized CO2 over a separate, wider window of about 0–140 ppm (translated from a 0–3% gas-phase range) [7]. Capture chemistry is therefore not the main obstacle; two other differences make cross-method comparison invalid. First, the clocks differ: a 4-day cumulative value is structurally larger than a 24-hour value for the same soil, so they are not the same measurement read on a different timer. Second, the units differ in both species and scale: Cornell reports CO2 mass (mg CO2 per g soil) while Solvita and Haney report carbon mass (mg CO2-C per kg soil). To convert species, multiply CO2 mass by 12/44 (≈ 0.273) to get CO2-C, or multiply CO2-C by 44/12 (≈ 3.67) to get CO2 mass; to move from a per-gram to a per-kilogram basis, also multiply by 1,000. Even after converting units, never compare a number from one method against a threshold built for another.
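The unit arithmetic above can be sketched in a few lines of Python. The function names are illustrative, not part of any lab's software, and the conversion only aligns units: a converted 4-day value is still a 4-day cumulative value and should not be read against a 24-hour threshold.

```python
# Mass ratio of carbon (12) to carbon dioxide (44), as given in the text.
C_PER_CO2 = 12.0 / 44.0
G_PER_KG = 1000.0


def co2_per_g_to_co2c_per_kg(co2_mg_per_g: float) -> float:
    """Convert mg CO2 per g soil to mg CO2-C per kg soil."""
    return co2_mg_per_g * G_PER_KG * C_PER_CO2


def co2c_per_kg_to_co2_per_g(co2c_mg_per_kg: float) -> float:
    """Convert mg CO2-C per kg soil back to mg CO2 per g soil."""
    return co2c_mg_per_kg / C_PER_CO2 / G_PER_KG


# Example: 0.60 mg CO2 per g soil expressed on a CO2-C per kg basis.
print(round(co2_per_g_to_co2c_per_kg(0.60), 1))  # 163.6
```

The per-gram to per-kilogram factor of 1,000 is easy to miss because the species conversion (12/44) is usually the only one quoted.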
How to read the number
Because incubation length, soil mass, and reporting units differ by method, interpretation has to be method-specific — and, increasingly, texture-specific. The first table gives published research reference points from the Cornell New York dataset and the Solvita calibration; the second translates the producer-facing Haney/Solvita CO2-C numbers most farmers actually receive into qualitative bands. Methods are not directly comparable across rows or tables; use the one that matches your lab report.
Context | Mean (SD) / range | n | Unit | Source
--- | --- | --- | --- | ---
Cornell CASH 4-day, statewide (all textures) | 0.60 (0.29) | 1,750 | mg CO2 g⁻¹ (cumulative 4-day) | [6]
Cornell CASH 4-day, coarse-textured group | 0.48 | — | mg CO2 g⁻¹ (cumulative 4-day) | [6]
Cornell CASH 4-day, loam group | 0.59 | — | mg CO2 g⁻¹ (cumulative 4-day) | [6]
Cornell CASH 4-day, silt loam group | 0.69 | — | mg CO2 g⁻¹ (cumulative 4-day) | [6]
Cornell CASH 4-day, fine-textured group | 0.67 | 46 | mg CO2 g⁻¹ (cumulative 4-day) | [6]
Solvita vs IRGA near-1:1 calibrated range | 0–125 (range) | — | mg/kg CO2-C | [7]
Solvita gel linearity to standardized CO2 | 0–140 (range) | — | ppm CO2-C | [7]
Anchor your reading on texture. In the Cornell dataset, fine and silt-loam soils sat meaningfully above the coarse-textured group mean, which is why a single universal threshold misleads and texture-specific scoring is recommended [6]. A 0.48 reading would be about average on a coarse soil but below par on a silt loam. Weigh the fine-textured benchmark cautiously — it rests on only 46 samples versus 1,750 statewide [6].
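As a minimal sketch of texture-matched reading, the snippet below divides a Cornell CASH 4-day value by the mean of its own texture group. The group means are the ones quoted from [6]; the dictionary and function names are hypothetical, and a simple ratio to the mean is an illustrative choice here, not Cornell's scoring function.

```python
# Texture-group means for the Cornell CASH 4-day test, mg CO2 per g soil [6].
CASH_4DAY_GROUP_MEANS = {
    "coarse": 0.48,
    "loam": 0.59,
    "silt loam": 0.69,
    "fine": 0.67,  # rests on only 46 samples, so weigh it cautiously
}


def ratio_to_texture_mean(reading_mg_co2_per_g: float, texture: str) -> float:
    """Return the reading divided by the mean of its texture group."""
    if texture not in CASH_4DAY_GROUP_MEANS:
        raise ValueError(f"unknown texture group: {texture!r}")
    return reading_mg_co2_per_g / CASH_4DAY_GROUP_MEANS[texture]


# The same 0.48 reading judged against two different texture groups.
for group in ("coarse", "silt loam"):
    print(group, round(ratio_to_texture_mean(0.48, group), 2))
```

A ratio near 1 means the reading sits at its group mean; the same 0.48 comes out at about 0.70 of the silt loam mean.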
For the Haney/Solvita 24-hour test specifically, most producers receive a single CO2-C number in ppm. Ward Laboratories' Haney interpretation guide places that number on the qualitative scale below. Treat the bands as orientation, not gospel: the guide itself stresses that the rankings are on a sliding scale and are dependent on soil type and climate region (a value of 50 means something different in arid New Mexico than in central Iowa), that readings can fall anywhere from near zero to 1,000 ppm, and that most agricultural soils — described as currently degraded — do not read above 200 ppm [12].
CO2-C (ppm) | Qualitative band
--- | ---
0-10 | Very Low
11-20 | Low
21-30 | Below Average
31-50 | Slightly Below Average
51-70 | Slightly Above Average
71-100 | Above Average
101-200 | High
>201 | Very High
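The band lookup can be sketched as below. Two things are assumptions rather than part of the guide: fractional readings are assigned by each band's upper bound (so 10.5 falls in Low), and anything above 200 is treated as Very High. The function name is hypothetical, and the caveat that the bands slide with soil type and climate region still applies.

```python
from bisect import bisect_left

# Upper bounds (ppm CO2-C) of every band except the open-ended top one.
_UPPER_BOUNDS = [10, 20, 30, 50, 70, 100, 200]
_BANDS = [
    "Very Low",
    "Low",
    "Below Average",
    "Slightly Below Average",
    "Slightly Above Average",
    "Above Average",
    "High",
    "Very High",
]


def haney_band(co2c_ppm: float) -> str:
    """Map a 24-hour Haney/Solvita CO2-C reading (ppm) to its qualitative band."""
    if co2c_ppm < 0:
        raise ValueError("CO2-C reading cannot be negative")
    # bisect_left returns the index of the first upper bound >= the reading.
    return _BANDS[bisect_left(_UPPER_BOUNDS, co2c_ppm)]


print(haney_band(50))  # Slightly Below Average
print(haney_band(51))  # Slightly Above Average
```

Note that the scale has no plain "Average" band: a reading of 50 and a reading of 51 land on opposite sides of the midpoint.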
What management actually moves it
The reason agronomists care about this test is that it responds to the practices producers control, and it does so fairly quickly. The big levers are reducing disturbance, returning carbon, and diversifying the rotation. No-tillage consistently elevated the flush of CO2 in surface soil (0-10 cm), and animal manure application consistently raised it as well [11]. Conversely, intensive tillage and simple annual-grain rotations depress it: annual-grain row-crop systems showed significantly lower respiration than diversified or perennial systems [6].
Franzluebbers' multi-state survey work points to the same two practices as the most reliable elevators of the burst: it identifies no tillage and routine application of animal manures as the two consistent management approaches that lifted soil-test biological activity in the surface 10 cm, and the highest flush values came from fields that combined no-till with diverse rotations, cover crops, and routine manure inputs [11]. The picture is additive: tillage reduction and carbon return reinforce each other rather than competing. Cornell's on-farm CASH work adds a useful nuance about *which* lever matters most for this particular indicator — for the respiration measurement specifically, the primary gains came from moving off the plow to strip-till, with comparatively smaller benefit from cover cropping (whereas cover cropping and reduced tillage contribute roughly equally to indicators like aggregate stability and active carbon); this relative-magnitude framing is the source's narrative description rather than transcribed tabular values [17].
Cover crops are nonetheless a real driver, and the cleanest effect size comes from how they feed the labile pool the burst reads. In a 122-study meta-analysis, adding crops to a monoculture rotation raised total soil carbon by 3.6% and total nitrogen by 5.3%, but when the rotation *included a cover crop* total carbon rose 8.5% and total nitrogen 12.8% — the cover-crop subset roughly doubled the carbon gain [15]. Separately, rotational diversity substantially increased the fast-cycling pools that drive the rewetting flush: microbial biomass carbon rose 20.7% and microbial biomass nitrogen 26.1%, an effect the authors found was not moderated by crop type or management practice [15]. There is an apparent tension with the Cornell finding above: cover crops build the active carbon *pool* (what the burst should read over the long run), but in Cornell's on-farm data tillage reduction moved the respiration *number* faster — so for short-term burst gains, prioritize getting off the plow, and treat cover cropping as the slower-acting pool-builder [15][17]. Manure and compost amendments tend to increase the burst with loading; the assay is sensitive to such inputs, consistent with the Solvita gel system's calibration history on compost and manure-amended materials [7].
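The "roughly doubled" comparison is plain arithmetic on the percentage gains quoted from [15]. A short sketch, with a hypothetical helper name:

```python
# Percentage gains from the 122-study meta-analysis [15]:
# (rotation without a cover crop, rotation including a cover crop)
GAINS_PCT = {
    "total C": (3.6, 8.5),
    "total N": (5.3, 12.8),
}


def fold_increase(baseline_pct: float, with_cover_pct: float) -> float:
    """Ratio of the gain with a cover crop to the gain without one."""
    return with_cover_pct / baseline_pct


for pool, (baseline, with_cover) in GAINS_PCT.items():
    print(pool, round(fold_increase(baseline, with_cover), 2))
```

Both ratios come out near 2.4, so the gain for carbon and for nitrogen is a little more than double when the rotation includes a cover crop.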
The chart below shows mean respiration by cropping system on coarse-textured soils from the Cornell New York dataset (Amsili et al. 2021, Table 7), where cropping system explained only about 11.7% of respiration variance — meaning the great majority is driven by texture, site, and sampling [6]. The differences between systems are real and directional but small, and they are not a clean monotonic gradient: pasture, a low-disturbance perennial system, sits between the vegetable and dairy-crop systems on these coarse soils. Note that this ordering is specific to coarse-textured soils; in the same dataset, pasture tends to have the highest respiration of any system on finer-textured (loam, silt-loam, and fine) soils, so the coarse-panel mid-pack position of pasture is not the general pattern [6].
Because direct burst-value comparisons for cover-crop versus no-cover or no-till versus conventional plots are scarce in the published literature, the cleanest available magnitudes come from the carbon pools the burst tracks rather than from head-to-head burst tables. The chart below makes that link concrete: across the 122-study meta-analysis, the gains in total carbon (a cover-crop contrast) and microbial-biomass carbon (a rotational-diversity contrast) index the active fraction the rewetting flush quantifies [15].
Read directionally, the pattern holds: less disturbance plus more, and more diverse, carbon inputs tend to feed a bigger active microbial pool [6][11][15]. But the spread within each system is wide and cropping system explains only a small share of the variation, so treat these as soft tendencies, not rankings. For tracking your own ground over time, though, this remains the indicator most likely to respond to a new cover-crop, reduced-till, or manure program before total organic carbon budges.
What it cannot tell you
This is a genuinely useful test, but it has hard limits — and most of them come from the same procedural looseness that makes it cheap and fast.
- No standard protocol, large between-lab variability. Mineralizable C / CO2 burst showed 2-fold to 20-fold greater inter-laboratory variability than other common soil tests [5]. The same sample sent to two labs can return very different numbers — and the authors of that study argue this variability compromises the indicator's usefulness, not merely its precision [5].
- Procedure changes the answer. Sieve size and the rewetting method (capillary vs top-watering vs wetting to a target moisture) significantly change measured mineralizable C [5], and the soil mass and volume used in the assay also shift the estimate [11].
- Methods are not comparable. With incubations of 24 h, 3 d, and 4 d in common use — and CO2 versus CO2-C reporting — absolute values cannot be compared across methods [3].
- Sampling timing and field conditions matter. Because the test rewets a dried soil, the result is sensitive to antecedent field moisture, sampling season, and rewetting procedure [5]. A field sampled after a dry spell, or shortly after residue incorporation, can give a very different burst independent of its underlying soil health.
- Lab burst is not field CO2 flux. The assay isolates microbial mineralization of a dried-rewetted sample; it does not measure what your field is venting (see the cover-crop warning above) [16].
- It can mis-predict available nitrogen. A soil can show high microbial respiration but low plant-available N — grass sod is the classic example — so respiration alone is a contested predictor of available N and should be interpreted alongside a nitrate test [8]. Even within the Solvita/Haney framework, CO2-C below 20 mg/kg is treated as outside the useful range for N prediction [13].
- Haney-specific caution. The Haney Soil Health Tool uses non-standard H3A extracts, so its outputs require separate calibration and caution before driving nutrient recommendations [9][10].
- Interpretive bands are region-dependent. Even published producer-facing CO2-C ranges warn that the same number means different things across climates and soil types [12], reinforcing that texture-specific, locally-calibrated scoring beats one universal threshold [6].
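Several of the limits above amount to a pre-comparison checklist: before setting two burst values side by side, confirm they share incubation length, units, rewetting method, sieve size, and lab. A minimal sketch follows. The `BurstReading` record and its field names are hypothetical, and the 85.0 value and lab names in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BurstReading:
    value: float
    unit: str              # e.g. "mg CO2-C/kg" or "mg CO2/g"
    incubation_hours: int  # 24, 72, or 96 for the protocols described earlier
    rewetting: str         # e.g. "capillary" or "target moisture"
    sieve_mm: float
    lab: str


# (field name, message reported when the two readings disagree on it)
_CHECKS = [
    ("incubation_hours", "incubation length differs"),
    ("unit", "reporting unit differs"),
    ("rewetting", "rewetting method differs"),
    ("sieve_mm", "sieve size differs"),
    ("lab", "different lab"),
]


def comparability_problems(a: BurstReading, b: BurstReading) -> List[str]:
    """List the reasons two readings should not be compared directly."""
    return [msg for field, msg in _CHECKS if getattr(a, field) != getattr(b, field)]


haney = BurstReading(85.0, "mg CO2-C/kg", 24, "target moisture", 2.0, "Lab A")
cornell = BurstReading(0.60, "mg CO2/g", 96, "capillary", 8.0, "Lab B")
print(comparability_problems(haney, cornell))
```

An empty list does not make two numbers interchangeable, since sampling season and antecedent moisture still matter, but a non-empty list is a clear signal to stop.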
The bottom line
The CO2 burst is one of the most studied biological soil health indicators available to producers. Bagnall et al. 2023 selected 24-hour carbon mineralization potential — alongside soil organic carbon concentration and aggregate stability — as one of just three measurements in a recommended minimum suite of indicators for North American agriculture, on the grounds that it is responsive to management and well correlated with soil organic carbon (the paper reports the suite's indicators correlated to soil organic C at r = 0.56 to 0.91) [4]. That this minimum-suite framing coexists with the validity concerns raised by Wade et al. 2018 [5] reflects a genuine, unsettled debate in the literature rather than a closed question. Note too that strong SOC correlation cuts both ways: part of the respiration signal is redundant with SOC, and the test's distinct value lies less in revealing something SOC cannot and more in responding to management faster than total carbon does [1][2].
Use it the way it actually works: read it against a texture-matched, method-matched range [6][12], track it over time at the same lab and under comparable sampling conditions to watch your management pay off [11], and never ask it to do a job — predicting available N on its own — that it was never built to do [8][13].