Author: Richard Johnston

  • A modest proposal for a clinical spirometry grading system

    A while back I reviewed the spirometry grading system that was included in the 2017 ATS reporting standards. My feeling was, and continues to be, that its usefulness is very limited because it’s mostly a reproducibility grading system that relies on a few easy-to-measure parameters. This doesn’t mean that a grading system can’t be helpful, just that it needs to be focused differently.

    In a clinical PFT lab many patients have difficulty performing adequate and reproducible spirometry, but that doesn’t mean the results aren’t clinically useful. Moreover, suboptimal quality results may be the very best the patient is ever able to produce. So what’s more important in a grading system than reproducibility is the ability to assess the clinical utility of a reported spirometry effort.

    The two most important results that come from spirometry are the FEV1 and the FVC, and I strongly believe that they need to be assessed separately. For each of these values there are two aspects that need to be determined. First, is there a reliable probability that the reported value is correct? Second, are any errors causing the reported value to be underestimated or overestimated? The two are inter-related since a value with excellent reliability is not going to have any significant errors, but if there are errors then a reviewer needs to know which direction the result is being biased.

    The current ATS/ERS standards contain specific thresholds for certain spirometry values such as expiratory time and back-extrapolation. Although these are certainly indications of test quality they are almost always used in a binary [pass | fail] manner. In order to assess clinical usefulness however, you instead need to grade these on a scale. For example an expiratory time of 5.9 seconds for spirometry from a 60 year-old individual would mean that there is a small probability that the FVC is underestimated, but with an expiratory time of 1.9 seconds the FVC would have a very high probability of being underestimated and this needs to be recognized in order to assess clinical utility.

    Note: Although the A-B-C-D-F grading system is rather prosaic it is still universally understandable, so I will use it for grading reliability. An A grade or an F grade is probably easy to assign, but differentiating between B, C, and D may be more subjective, particularly since reliability depends on multiple parameters and judging their relative contribution is always going to be subjective at some point. For bias, I will be using directional characters (↑↓) to show the direction of the bias (i.e. positive or negative), so ↑ will indicate probable overestimation, ↓ will indicate probable underestimation, and ~ indicates a neutral bias.

    FEV1 / Back extrapolation:

    Back-extrapolation is a way to assess the quality of the start of a spirometry effort and the accuracy of the timing of the FEV1. The ATS/ERS statement says that the back-extrapolated volume must be less than 5% of the FVC or less than 0.150 L, whichever is greater.

    My experience is that an elevated back-extrapolation tends to cause FEV1 to be overestimated far more often than underestimated. So a suggested grading system for back-extrapolation would be (and I’ll be the first to admit these are off the top of my head and open for discussion):

    FEV1:
    Back-Extrapolation:                 Reliability:   Bias:
    Within standards                    A              ~
    > 1 × standard, ≤ 1.5 × standard    B              ↑
    > 1.5 × standard, ≤ 2 × standard    C              ↑↑
    > 2 × standard, ≤ 2.5 × standard    D              ↑↑↑
    > 2.5 × standard                    F              ↑↑↑↑
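    Since this kind of threshold table lends itself to software, here is a minimal Python sketch of how it might be applied. The function name is hypothetical, and the single ↑ for a B grade is my reading of the table's pattern rather than anything stated in a standard:

    ```python
    # Hypothetical sketch: grade a back-extrapolated volume (BEV) against the
    # ATS/ERS limit (the greater of 5% of FVC or 0.150 L), using the multiples
    # suggested above. Names and thresholds are illustrative, not validated.

    def grade_back_extrapolation(bev_l, fvc_l):
        """Return (reliability, bias) for a given BEV and FVC, both in liters."""
        standard = max(0.05 * fvc_l, 0.150)   # ATS/ERS back-extrapolation limit
        ratio = bev_l / standard
        if ratio <= 1.0:
            return "A", "~"
        if ratio <= 1.5:
            return "B", "↑"
        if ratio <= 2.0:
            return "C", "↑↑"
        if ratio <= 2.5:
            return "D", "↑↑↑"
        return "F", "↑↑↑↑"
    ```

    For example, a BEV of 0.35 L with an FVC of 4.0 L is 1.75 times the 0.2 L limit and would grade as C with a ↑↑ bias.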

    FEV1 / Pause:

    Any pauses that occur due to cough or glottal closure during the first second of exhalation will cause the FEV1 to be underestimated. The time at which these occur and their duration will determine how much the FEV1 will be affected.

    What matters is the duration of the pause within the first second. Any part of a pause that occurs after the first second may affect the FVC, but not the FEV1. A possible grading system would be:

    FEV1:
    Pause Duration:            Reliability:   Bias:
    No pause                   A              ~
    > 0, ≤ 0.1 second          B              ↓
    > 0.1, ≤ 0.15 second       C              ↓↓
    > 0.15, ≤ 0.2 second       D              ↓↓↓
    > 0.2 second               F              ↓↓↓↓

    FEV1 / Peak flow contour

    This part of the grading gets into subjective territory. Although the ATS/ERS spirometry standard does not consider the Peak Expiratory Flow (PEF) to be a criterion when selecting spirometry efforts, it does say that a good spirometry effort should show maximal patient effort. I think that PEF should be a selection criterion for FEV1 because submaximal spirometry efforts, as shown by a lower PEF, often have an elevated FEV1.

    So the FEV1 from the effort with the highest PEF should be reported (even when it isn’t the highest reported FEV1), but that only considers the reported PEF value, not the actual quality of the PEF effort. For that, we generally have to look at how “pointy” (there’s got to be a better way to describe this) the PEF is on the flow-volume loop.

    In general, the “sharper” the PEF contour, the more likely the PEF effort was good. The more “blunted” the PEF contour (which should not be mistaken for the expiratory plateaus of intrathoracic large airway obstructions or the typical flattened contour of tracheomalacia) the more likely the PEF effort was submaximal and the more likely that the FEV1 is overestimated.

    And the suggested grading would be:

    FEV1:
    PEF Contour:           Reliability:   Bias:
    Sharp                  A              ~
    Mildly blunted         B              ↑
    Moderately blunted     C              ↑↑
    Severely blunted       F              ↑↑↑

    FVC / Expiratory time:

    The ATS/ERS spirometry standard recommends a minimum expiratory time of 6 seconds for adults, but this fails to acknowledge that the expiratory time necessary to obtain a reliable FVC is often lower in young adults and higher in the elderly. Nor does it take into consideration the fact that expiratory time increases as airway obstruction increases.

    For these reasons, expiratory time needs to be assessed by two different criteria, age and degree of airway obstruction.

    FVC / Expiratory time / Age:

    I’d like to suggest that an adequate expiratory time should be 4 seconds for a 20 year old and 8 seconds for an 80 year old (totally arbitrary of course but hopefully reasonably correct). Because exhaled volume closely follows an exponential curve, an expiratory time that’s low by 2 seconds has a proportionally greater effect on FVC than does an expiratory time that’s low by 1 second. For this reason, grading the reliability of expiratory time should look something like this:

    There is a pretty direct relationship between the reliability of the expiratory time and the bias:

    FVC / Expiratory Time – Age:
    Reliability:   Bias:
    A              ~
    B              ↓
    C              ↓↓
    D              ↓↓↓
    F              ↓↓↓↓
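    The age-adjusted expected expiratory time suggested above (4 seconds at age 20 rising to 8 seconds at age 80) can be sketched with a simple interpolation. The text does not specify the shape of the age curve, so the linear form here is an assumption:

    ```python
    # Sketch of the age-adjusted expected expiratory time: 4 s at age 20,
    # 8 s at age 80, interpolated linearly in between and clamped outside
    # that range. The linear interpolation is an assumption, not a standard.

    def expected_expiratory_time(age_years):
        """Expected forced expiratory time in seconds for a given age."""
        t = 4.0 + (age_years - 20) * (8.0 - 4.0) / (80 - 20)
        return min(8.0, max(4.0, t))   # clamp to the 4-8 s range
    ```

    So a 50 year old would have an expected expiratory time of 6 seconds under this sketch.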

    Note: Expiratory time is usually determined by the point at which the patient starts to inhale after their maximal exhalation or when the technician manually terminates the test. The reported expiratory time will therefore be overestimated when there are expiratory pauses or when the patient stops exhaling but the test system does not immediately register that this has occurred. Whenever possible the expiratory time used in grading reliability and bias should be adjusted for pauses and early termination of exhalation.

    FVC / Expiratory time / Airway obstruction

    The presence of airway obstruction is assessed using the LLN of the FEV1/VC ratio and its severity is assessed using the percent predicted FEV1. There is likely a curvilinear relationship between the severity of airway obstruction and the amount of extra expiratory time that’s required for a reliable FVC.

    In one sense this curvilinearity quickly produces excessive FVC expiratory times that aren’t clinically or physiologically realistic (i.e. more than 12-15 seconds) and under no circumstances should we expect our patients to exhale that long. At the same time however, does anybody expect that an FVC that’s 50% of predicted in a patient with an FEV1 of 25% of predicted and a 12 second expiratory time is the patient’s “real” FVC?

    The degree of this curvilinearity is speculative, but expiratory time should still be adjusted for obstruction in some way. Off the top of my head, I’d suggest that anybody with mild airway obstruction needs an additional 25% of expiratory time for a reliable FVC, that anybody with very severe airway obstruction needs 3 times their expected expiratory time, and that intermediate severities fall in between:

    FVC: Expiratory time factor:
    Mild OVD 1.25
    Moderate OVD 1.50
    Severe OVD 2.00
    Very Severe OVD 3.00

    I further suggest that the age-adjusted expected expiratory time should be multiplied by the appropriate factor, and the actual expiratory time then scored as a percent of this expected time:

    FVC:
    Expiratory Time – OVD:   Reliability:
    > 90%                    A
    > 75%, ≤ 90%             B
    > 60%, ≤ 75%             C
    > 50%, ≤ 60%             D
    ≤ 50%                    F
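    Putting the two adjustments together, a hypothetical sketch: the age-expected time is multiplied by the obstruction factor, and the actual expiratory time is scored as a percent of that target. The function and category names are mine, and the linear age interpolation is an assumption:

    ```python
    # Sketch: score an actual expiratory time against an expected time that
    # is adjusted for both age (4 s at 20 y -> 8 s at 80 y, assumed linear)
    # and degree of airway obstruction, using the factors suggested above.

    OVD_FACTOR = {"none": 1.0, "mild": 1.25, "moderate": 1.50,
                  "severe": 2.00, "very severe": 3.00}

    def grade_expiratory_time(actual_fet_s, age_years, ovd="none"):
        """Return the FVC reliability grade for an expiratory time in seconds."""
        age = min(max(age_years, 20), 80)
        expected = 4.0 + (age - 20) / 15.0      # 4 s at 20 y, 8 s at 80 y
        target = expected * OVD_FACTOR[ovd]     # obstruction-adjusted target
        pct = 100.0 * actual_fet_s / target
        if pct > 90: return "A"
        if pct > 75: return "B"
        if pct > 60: return "C"
        if pct > 50: return "D"
        return "F"
    ```

    Under this sketch, an 80 year old with mild obstruction has a target of 10 seconds, so a 5 second exhalation (50%) would grade as F.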

    FVC / Terminal expiratory flow rate:

    The current ATS/ERS standard for an adequate terminal expiratory flow rate is 0.025 L/sec (although it’s actually expressed as a volume change of less than 0.025 L over 1 second rather than as a flow rate). The problem is that an FVC with a terminal expiratory flow only slightly over this value still has a reasonable probability of being correct; it’s when the terminal flow rate is well above it that there is a high probability the FVC is underestimated.

    However, there are no test systems that I know of that report the terminal expiratory flowrate (why not?), so until they do this has to be judged by eye.

    And terminal expiratory flow rate should be graded as:

    FVC:
    Terminal Flow Rate:    Reliability:   Bias:
    Within standard        A              ~
    Mild                   B              ↓
    Moderate               C              ↓↓
    Severe                 F              ↓↓↓

    FVC / Gas Trapping:

    A spirometry effort may meet all the ATS/ERS criteria but inspection of the flow-volume loop sometimes shows that the exhaled volume is lower than the inhaled volume. This is a sign of gas trapping and can happen in individuals with severe airway obstruction. Unfortunately, there are no test systems that measure the volume of the initial inhalation (and again, why not?) so this must be detected by eye.

    If the difference between inspiratory and expiratory volumes can be measured, the expiratory volume should be expressed as a percent of the inspiratory volume and graded accordingly:

    FVC:
    Gas Trapping:                                     Reliability:   Bias:
    Exhaled volume ≥ inhaled volume                   A              ~
    Exhaled volume > 95% of inhaled volume            B              ↓
    Exhaled volume > 85% & ≤ 95% of inhaled volume    C              ↓↓
    Exhaled volume > 75% & ≤ 85% of inhaled volume    D              ↓↓↓
    Exhaled volume ≤ 75% of inhaled volume            F              ↓↓↓↓

    FVC / Inadequate Inhalation:

    All of the ATS/ERS criteria that apply to the FVC are concerned with an inadequate exhalation and there are no criteria that address an inadequate inhalation. This is mostly because detecting an inadequate inhalation is quite difficult. Although there are several signs that are suspicious for this problem the only circumstance in which this clearly shows is when a maximal inhalation is performed after the maximal expiratory maneuver and the final inhalation has a larger volume than the initial one. Many test systems will measure this final maximal inhalation as the FIVC, although this value is not often reported.

    When the initial inspiratory volume (as reflected by the FVC) can be compared to the final inspiratory volume (FIVC) as a percent, it could be graded accordingly:

    FVC:
    Inadequate Inhalation:     Reliability:   Bias:
    FVC ≥ FIVC                 A              ~
    FVC > 95% FIVC             B              ↓
    FVC > 85% & ≤ 95% FIVC     C              ↓↓
    FVC > 75% & ≤ 85% FIVC     D              ↓↓↓
    FVC ≤ 75% FIVC             F              ↓↓↓↓
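    The gas-trapping and inadequate-inhalation tables share the same percent thresholds, so both can be sketched with one hypothetical helper that grades an exhaled volume as a percent of a reference inspiratory volume (the inhaled volume for gas trapping, the FIVC for inadequate inhalation):

    ```python
    # Sketch: grade an exhaled volume against a reference inspiratory volume
    # using the 95% / 85% / 75% thresholds suggested above. The function name
    # and the single ↓ for a B grade are my own assumptions.

    def grade_volume_ratio(exhaled_l, reference_l):
        """Return (reliability, bias) for exhaled vs. reference volume in liters."""
        pct = 100.0 * exhaled_l / reference_l
        if pct >= 100: return "A", "~"
        if pct > 95:   return "B", "↓"
        if pct > 85:   return "C", "↓↓"
        if pct > 75:   return "D", "↓↓↓"
        return "F", "↓↓↓↓"
    ```

    For example, an FVC of 3.6 L against a FIVC of 4.0 L (90%) would grade as C with a ↓↓ bias.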

    FVC / Zero offset error:

    This is primarily an equipment error that is uncommon but still occasionally happens (twice last week in my lab on two different test systems). It can also be caused by transtracheal O2.

    It can be difficult to detect, particularly when it’s a negative offset, but when it is detected there is no way to be sure by how much the FVC has been over- or underestimated. This error gets an automatic F score, with ↑↑↑ for bias when it’s a positive zero offset and ↓↓↓ when it’s a negative zero offset.

    FEV1 and FVC scoring:

    FEV1 and FVC will each be affected most by the lowest reliability score. So when the individual scores are combined:

    A+A = A

    A+B = B

    A+C = C

    A+D = D

    A+F = F

    There should also be an additive effect, so:

    B+B = C

    C+C = D

    D+D = F
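    The combining rule above can be sketched in a few lines: the overall grade starts at the worst individual grade, and each repeat of that worst grade pushes the result one step lower (B+B = C, C+C = D, D+D = F). This reading of "additive" is my own interpretation of the examples, not something the text spells out for three or more scores:

    ```python
    # Sketch: combine individual reliability grades. The worst grade dominates,
    # and duplicates of the worst grade each drop the result one further step,
    # capped at F. Extending this to 3+ repeats is my assumption.

    GRADES = "ABCDF"

    def combine_reliability(grades):
        """Combine a list of letter grades into one overall grade."""
        worst = max(grades, key=GRADES.index)      # lowest individual grade
        extra = grades.count(worst) - 1            # repeats of the worst grade
        idx = min(GRADES.index(worst) + extra, len(GRADES) - 1)
        return GRADES[idx]
    ```

    So A+C gives C, B+B gives C, and D+D gives F, matching the examples above.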

    Bias is likely additive. Opposite biases will cancel each other out to some extent, but probably never exactly. For this reason, when opposing biases are added they should be replaced with ↕ to indicate that the resultant bias is uncertain but may be neutral. For example:

    Overall FEV1 bias: ↓↓↑

    Would be reported as:

    Overall FEV1 bias: ↕↓
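    This bias arithmetic can also be sketched in code: the arrows are tallied, and each opposing pair is replaced by a ↕. The string representation and function name are my own assumptions:

    ```python
    # Sketch: combine bias strings such as "↓↓" and "↑". Each opposing pair
    # of arrows becomes ↕ (uncertain), the remainder carries the residual
    # direction, and a fully empty result is neutral (~).

    def combine_bias(bias_strings):
        """Combine a list of arrow strings into one overall bias string."""
        up = sum(s.count("↑") for s in bias_strings)
        down = sum(s.count("↓") for s in bias_strings)
        cancelled = min(up, down)                  # opposing pairs -> ↕
        residual = "↑" * (up - cancelled) + "↓" * (down - cancelled)
        result = "↕" * cancelled + residual
        return result if result else "~"
    ```

    So combining ↓↓ with ↑ yields ↕↓, matching the example above.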

    FEV1/FVC:

    When the FVC or FEV1 are under- or over-estimated this will affect the reliability and the bias of the reported FEV1/FVC ratio. The reliability of the FEV1/FVC ratio should equal the lowest overall reliability score for the FVC and the FEV1. For example:

    Overall FEV1 reliability: C

    Overall FVC reliability: B

    FEV1/FVC reliability: C

    Biases in the FEV1 and FVC have opposite effects on the FEV1/FVC ratio. A negative (↓) bias in FEV1 produces a negative (↓) bias in the FEV1/FVC ratio, while a negative (↓) bias in FVC produces a positive (↑) bias in the ratio. For this reason, when estimating the total bias acting on the FEV1/FVC ratio it is probably easiest to flip the direction of the FVC bias and add it to the FEV1 bias.

    FVC and FEV1 biases can oppose or reinforce each other. Opposite-acting biases will probably never cancel each other out exactly but will leave an uncertainty regarding the actual bias of the FEV1/FVC ratio. For this reason I’d again suggest that when two biases oppose each other they are replaced with an indication of uncertainty: ↕. So, for example, after the FVC biases have been flipped:

    FEV1 bias: ↓↓

    FVC bias: ↑↑

    and would be reported as:

    FEV1/FVC ratio bias: ↕↕

    or:

    FEV1 bias: ↓↓↓

    FVC bias: ↑

    Would be reported as:

    FEV1/FVC ratio bias: ↕↓↓
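    The flip-then-combine rule for the ratio can be sketched as follows, taking the raw (unflipped) FVC bias as input. Again, names and string handling are illustrative assumptions:

    ```python
    # Sketch: estimate the FEV1/FVC ratio bias by flipping the FVC arrows
    # (a low FVC raises the ratio, and vice versa), then combining with the
    # FEV1 bias, replacing each opposing pair of arrows with ↕.

    def ratio_bias(fev1_bias, fvc_bias):
        """Return the FEV1/FVC bias string from raw FEV1 and FVC bias strings."""
        flipped = fvc_bias.translate(str.maketrans("↑↓", "↓↑"))
        combined = fev1_bias + flipped
        up, down = combined.count("↑"), combined.count("↓")
        pairs = min(up, down)                      # opposing pairs -> ↕
        out = "↕" * pairs + "↑" * (up - pairs) + "↓" * (down - pairs)
        return out or "~"
    ```

    With an FEV1 bias of ↓↓↓ and a raw FVC bias of ↓ (flipped to ↑), this yields ↕↓↓, matching the second example above.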

    And the overall reporting of reliability and bias for all these parameters could look something like this:

    FEV1:                        Reliability:   Bias:
      Overall:                   C              ↓↓
      Back Extrapolation:        A              ~
      Pause:                     C              ↓↓
      PEF Contour:               A              ~

    FVC:                         Reliability:   Bias:
      Overall:                   C              ↓↓
      Expiratory Time – age:     B              ↓
      Expiratory Time – OVD:     A              ~
      Terminal Flow:             B              ↓
      Gas Trapping:              A              ~
      Inadequate Inhalation:     A              ~
      Zero Offset:               A              ~

    FEV1/FVC:                    Reliability:   Bias:
      Overall:                   C              ↕↕

    My point in suggesting this grading system is that spirometry results are often less than perfect. Some patients (10%? 15%?, 20%?) are completely unable to give any kind of a reproducible effort but that doesn’t mean that the reported effort isn’t clinically relevant. The clinical utility of FVC and FEV1 are difficult, if not impossible, to judge using the current [pass | fail] approach to grading results. Even more importantly, the reliability and bias of the reported FEV1 and FVC need to be addressed separately rather than combined in a single score.

    Reliability and bias scores would help reviewers assess the clinical utility of reported results, and this system attempts to provide them. Most of the values I’ve suggested for assessing test quality are fairly arbitrary but I wouldn’t have suggested them if I didn’t think they were reasonably accurate.

    There’s no particular reason that most, if not all, of this suggested grading system couldn’t be implemented in software, and so there’s some potential for producing reliability and bias scores automatically. Most manufacturers are reluctant to add features like this however, unless they are recommended or mandated by the ATS and ERS. As much as I may think this is the direction that a clinically-oriented grading system should go, I’m well aware that until it gains approval by the ATS or ERS this type of system would have to be implemented manually, and that means it’s unlikely to be adopted. Nevertheless I still hope to at least generate some ideas and conversation on this subject.

    Finally though, I’ve begun to wonder if the basic premise of getting both the FEV1 and the FVC from the same test maneuver is really correct. The standard spirometry maneuver is good for getting the best FEV1 but often so-so in getting the best VC. An SVC maneuver on the other hand, is good for getting the best VC, but very poor in getting the best FEV1. Is it time that we re-thought routine spirometry and obtained the FEV1 and VC from different maneuvers rather than just the one?  But I’ll save discussion of this topic for another time.

    References:

    Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26: 319-338.

    Culver BH, Graham BL, Coates AL et al. Recommendations for a standardized pulmonary function report. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.

    Creative Commons License
    PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

  • Infection Control

    The issue of infection control has been a topic of a couple of discussions I’ve had lately. In particular, it was reported to me that a PFT lab had come under fire from a Joint Commission inspector who did not believe that filter mouthpieces were adequate and that “patient valves and circuits need to be sterilized between each patient”.

    Unfortunately with all the other things we have to worry about it’s all too easy to become blasé about infection control. This despite the fact that every hospital I’ve visited in the last dozen or so years has posted numerous signs about hand washing and the safe disposal of contaminated supplies. But maybe it’s because we’re inundated with reminders that we’ve developed a blind spot about it.

    The 2005 ATS/ERS statement on general considerations has two pages devoted to infection control (pages 155-157). The ATS procedure manual also has four pages devoted to infection control (pages 34-38), although much of this is devoted to a discussion of tuberculosis, cystic fibrosis and sterilization procedures. Of necessity, the ATS/ERS statement and ATS procedure manual discuss infection control in generalities and any given lab will need to have a policy tailored for their specific circumstances. Even so, either or both of these (as well as Kendrick et al’s 2003 review) should be the basis for your lab’s policy on infection control (and you do have one, don’t you?).

    So what are the issues?

    Diseases can be transmitted by direct contact (saliva) or indirect contact (airborne particles). PFT Labs need to prevent cross-transmission of diseases by the use of barrier devices (gloves, filter mouthpieces) and proper cleaning procedures.

    So yeah, it’s as simple as that, but as usual the devil is in the details and in particular there are trade-offs between expense, time and efficacy.

    Starting at the beginning, so to speak, of any pulmonary function test is the mouthpiece. Given that it is in direct contact with the patient’s saliva this item has the highest likelihood of transmitting diseases. A number of articles have suggested that when spirometry is performed as an expiratory-only maneuver, a cardboard mouthpiece is adequate.

    Note: I am not a fan of expiratory-only maneuvers partly because coordinating both the maximal inspiration and inserting the mouthpiece at the beginning of the maneuver can be difficult for many patients (although there are some mouthpieces containing one-way valves that get around this); partly because despite instructions to do otherwise patients will still inhale from the spirometer; but most importantly I think that the pre-maneuver tidal breathing and a maximal inspiration after the maximal exhalation provide clinically important information.

    A more effective approach would seem to be to use disposable filter mouthpieces, but interestingly neither the ATS/ERS statement nor the ATS procedure manual mandates their use. There appear to be at least two reasons for this. First, at least a couple of studies have shown that filter mouthpieces can cause reductions in FEV1 and PEF. These reductions however, tend to be small and the general consensus has been that when low resistance mouthpiece filters are used these reductions are not clinically significant.

    Second, the actual efficacy of mouthpiece filters remains unclear. Filter efficiency is usually measured at relatively low flow rates (usually 0.5 to 1.0 L/sec) and it’s unknown how well this translates to the flow rates routinely encountered during spirometry. On the plus side, at least one study using a mechanical model showed that the possibility of bacterial cross-infection was zero when a filter mouthpiece was used. Another study that cultured swabs from actual mouthpieces, however, showed that 2 out of 155 mouthpieces had bacterial contamination on the distal (machine) side. The same study showed that 33 out of 155 mouthpieces had bacterial contamination on the proximal (patient) side, which at least indicates that filter mouthpieces certainly reduce the potential contamination of the equipment.

    Another study however, showed that a filter mouthpiece reduced downstream contamination but did not eliminate it. The authors did note however, that the mouthpiece filter chosen for the study had a lower efficiency rating than others in the marketplace. Interestingly though, the same study showed that a pneumotachograph also provided an additional filtering effect and when the two were in-line with each other that the downstream equipment was contaminated with measurable bacteria in only 1 out of 40 trials.

    So a mouthpiece filter probably reduces the amount of equipment contamination but as both the ATS/ERS statement and ATS procedure manual state: “use of in-line filters does not eliminate the need for regular cleaning and decontamination of lung function equipment”. Unfortunately though, they do not indicate when and how often this should be done. Presumably this should be according to the manufacturer’s specifications but as an example, the manual for my lab’s equipment states:

    “Under normal use, it is acceptable to clean the pneumotach at periodic intervals according to guidelines set by the hospital or laboratory’s infection control committee.”

    Which pretty much leaves it up to us. I will note that more than one author has stated that when mouthpiece filters are used, tubing should be cleaned whenever there is a noticeable amount of moisture inside (and this presumably includes the manifold and valves), and this seems to be at least minimally adequate. A more stringent approach would be to do this daily, but regardless, some kind of regular cleaning schedule should be determined and adhered to.

    But is it possible to sterilize mouthpieces and tubing, and to clean the test equipment between each patient visit and thereby bypass the need for disposable mouthpieces? Theoretically yes, and this has been a routine recommendation in the past. As an example, from Clausen’s 1982 textbook:

    “All mouthpieces, tubing, valves and connectors from the patient to the measuring device should be disassembled, cleaned with detergent and water, thoroughly rinsed, and dried following each use.”

    But this recommendation comes from a time when mechanical volume-displacement spirometers (both water seal and bellows) were the most commonly used type of equipment and it’s unclear how well this approach would work with the current crop of (more delicate?) flow-based test systems.

    A cost analysis comparing the use of disposable filter mouthpieces versus cleaning after every patient showed that the cost for the time of the staff member cleaning the equipment and routine testing supplies was 5 times higher than the cost of disposable supplies (mouthpiece and nose-clip). This analysis did not take into consideration the cost of the extra inventory of non-disposable supplies that would need to be kept on hand in order to keep up with patient testing, nor did it explore the costs of automated cleaning equipment.

    Note: There are a variety of cleaning and sterilizing systems intended for departmental-level use, and these range from systems using water and detergent, gas sterilization (ethylene oxide), steam sterilization and cold liquid sterilization (glutaraldehyde). All of these require a certain level of knowledge and experience to use and there are safety issues with some of them. In addition, there are significant up-front costs in acquiring them and for the supplies needed to operate them. Any lab that routinely requires sterilization of parts and testing supplies should have this done in their hospital’s central supply department if for no other reason than because of their economies of scale.

    In addition, whether or not this approach actually protects the patient any better than the use of a disposable mouthpiece filter has never been demonstrated. On the one hand, a study showed that any aerosols from exhaled air were deposited on internal tubing within 5 minutes and that once deposited they did not tend to become re-suspended. In addition, it was shown that five full flushes of a volume spirometer were sufficient to clear test systems of any suspended aerosols. On the other hand, studies from several decades ago showed that parts of the testing systems that were not amenable to routine cleaning (i.e. water seal spirometers) became contaminated relatively rapidly. It was never shown that this internal contamination had the potential to actually reach the patient, but in one reported case an individual cleaning a test system appeared to have acquired tuberculosis as a result of contact with a contaminated spirometer.

    One completely novel and stringent approach to preventing cross-infection was proposed a while ago. Specifically, a bag-in-a-box system was developed using a disposable plastic bag and the authors were able to show that there was no significant effect on FVC and FEV1. Although an interesting idea, it never took hold.

    (Figure from Merchant J, Bush A. Prevention of cross-infection during outpatient spirometry. Arch Dis Child 1995; 72: 156.)

    My personal recommendation is that disposable filter mouthpieces should be used with all patients and disposed of afterwards. In addition, when a flanged mouthpiece is needed, it too should be disposed of after use. This is within the ATS/ERS statement and ATS procedure manual guidelines and should significantly decrease (but not eliminate) the frequency with which tubing, valves and manifolds will need to be cleaned. In addition, a filter mouthpiece provides the patient with some protection from any test system contamination.

    So far, this has all been pretty much about mouthpieces. There are numerous other topics that apply to infection control in the PFT lab but many of these must be determined by individual labs based on their equipment, staffing and budget. The most important of these include:

    Handwashing: Staff must always wash their hands both before and after each patient session. The use of gloves during testing should also be considered.

    Testing environment: Anything a patient is liable to touch should be cleaned with a germicidal wipe before the beginning of a testing session. This includes the patient chair and any part of the test system they will hold on to.

    Test systems: Tubing, valves and manifolds should be cleaned regularly. Cleaning frequency will be determined by manufacturer instructions and visible indications, but should always be done on a regular basis.

    Test supplies:  All supplies needed for testing should be disposable whenever possible.

    Patients: Each lab needs a policy addressing the testing of patients with known or suspected tuberculosis (and other communicable diseases), and for immunocompromised patients.

    The actual level of risk of cross-infection from pulmonary function testing remains unclear and consists primarily of circumstantial, indirect and anecdotal evidence. The ATS/ERS statement indicates that the risk of any cross-infection is low for any individual with a “competent immune system”. The ATS/ERS statement also states that there is “no direct evidence that routine pulmonary function testing poses an increased risk to immunocompromised patients.”

    The problem is that it isn’t really possible to knowledgeably assign any level of risk to pulmonary function testing and these statements are guesses. Hospital-acquired infections are relatively common however. The CDC estimates that in 2011 (a year for which some of the best statistics are available) 721,800 individuals acquired an infection during a hospital stay (157,500 with pneumonia), and that 75,000 individuals with hospital-acquired infections died. Cross-infection from pulmonary function testing has to be responsible for at least some fraction of these.

    Whether or not it was a single Joint Commission inspector being overbearing on a subject they weren’t knowledgeable about, PFT lab infection control is an issue that could easily be adopted by Joint Commission inspectors and become part of routine inspections. Although this is one reason to develop an infection control policy, more pertinently it’s our responsibility to keep pulmonary function testing safe for all of our patients. For this reason alone we all need an infection control policy, and just as importantly, we need to follow it.

    References:

    Bracci M, Strafella E, Croce N, Staffolani S, Carducci A, Verani M, Valentino M, Santarelli L. Risk of bacterial cross infection with inspiration through flow-based spirometers. Am J Infect Control 2011; 39: 50-55.

    Brusasco V, Crapo R, Viegi G. ATS/ERS Task force: standardisation of lung function testing. General consideration for lung function testing. Eur Respir J 2005; 26: 153-161.

    Burgos F, Torres A, Gonzalez J, Puig de la Bellacasa J, Rodriguez-Roisin R, Roca J. Bacterial colonization as a potential source of nosocomial infections in two types of spirometers. Eur Respir J 1996; 9: 2612-2617.

    Clausen JL. Pulmonary Function Testing Guidelines and controversies. Equipment, methods and normal values. Chapter 2, Gold PM, Schwesinger DW. Pulmonary laboratory infection control and safety. Published by Grune & Stratton, 1982.

    Hancock KL, Schermer TR, Holton C, Crockett AJ. Microbiological contamination of spirometers. Australian Family Physician 2001; 41: 63-66.

    Hiebert, T, Miles J, Okeson GC. Contaminated aerosol recovery from pulmonary function testing equipment. Am J Resp Crit Care Med 1999; 159(3): 610-612.

    Johns DP, Ingram C, Booth H, Williams TJ, Walters EH. Effect of a microaerosol barrier filter on the measurement of lung function. Chest 1995; 107: 1045-1048.

    Jones AM, Govan JRW, Doherty CJ, Dodd ME, Isalska BJ, Stanbridge TN, Webb AK. Identification of airborne dissemination of epidemic multiresistant strains of Pseudomonas aeruginosa at a CF centre during a cross infection outbreak. Thorax 2003; 58: 525-527.

    Kamps AWA, Vermeer K, Roorda RJ, Brand PLP. Effect of bacterial filters on spirometry measurements. Arch Dis Child 2001; 85: 346-347.

    Kendrick AH, Johns DP, Leeming JP. Infection control of lung function equipment: a practical approach. Resp Med 2003; 97: 1163-1179.

    Merchant J, Bush A. Prevention of cross-infection during outpatient spirometry. Arch Dis Child 1995; 72: 156-158.

    Normand H, Normand F, Le Coutour X, Metges M-A, Mouadil A. Clinical evaluation of a screen pneumotachograph as an in-line filter. Eur Respir J 2007; 30: 358-363.

    Rasam SA, Apte KK, Salvi SS. Infection control in the pulmonary function test laboratory. Lung India 2015; 32: 359-366.

    Side EA, Harrington G, Walters EH, Johns DP. A cost-analysis of two approaches to infection control in a lung function laboratory. Aust NZ J Med 1999; 29: 9-14.

    Unstead M, Stearn MD, Cramer D, Chadwick MV, Wilson R. An audit of the efficacy of single use bacterial/viral filters for the prevention of equipment contamination during lung function assessment. Resp Med 2006; 100: 946-950.

    Wanger, J. ATS Pulmonary Function Laboratory management and procedure manual, Third edition. Published 2016 by the American Thoracic Society.

    Creative Commons License
    PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

  • Is gas trapping more common than we think it is?

    Over the last couple of years I’ve run across a number of test systems that do not include tidal loops along with the maximal flow-volume loop. I’ve wondered why this was done and because of this I’ve thought a lot about tidal flow-volume loops and what additional information, if any, they add to spirometry interpretation.

    One of my thoughts has been about the relationship between obesity and the IC and ERV. FVC and TLC are often reasonably preserved even with relatively severe obesity. FRC, on the other hand, is often noticeably affected by even minor changes in BMI (and interestingly this applies to reduced as well as elevated BMIs). When FRC decreases because of obesity the IC usually increases and the ERV decreases, and for this reason the IC/ERV ratio has been suggested as a way to monitor changes in FRC without having to actually measure lung volumes.
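    The volume arithmetic behind this suggestion is simple. As a sketch (the relationships IC = TLC − FRC and ERV = FRC − RV are standard physiology; the volumes below are hypothetical, not from any patient):

```python
def ic_erv_ratio(tlc, frc, rv):
    """Derive IC, ERV and the IC/ERV ratio from lung volumes (litres).

    Uses the standard relationships IC = TLC - FRC and ERV = FRC - RV.
    """
    ic = tlc - frc
    erv = frc - rv
    return ic, erv, ic / erv

# Hypothetical volumes: the same TLC and RV, first with a normal FRC
# and then with an obesity-reduced FRC. Lowering FRC raises the IC,
# shrinks the ERV and sharply increases the IC/ERV ratio.
normal = ic_erv_ratio(6.0, 3.0, 1.5)
reduced = ic_erv_ratio(6.0, 2.2, 1.5)
```

    Because TLC and RV change little while FRC moves, the ratio amplifies the FRC shift, which is why it has been proposed as a surrogate for tracking FRC.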

    IC and ERV are not measured as part of spirometry, but the position of the tidal loop gives at least a general indication of their magnitude, and I’ve noticed that there’s a moderately good correlation between BMI and the position of the tidal loop.

    With this in mind, I see up to a dozen reports a week with restrictive-looking spirometry (i.e. symmetrically reduced FVC and FEV1 with a normal FEV1/FVC ratio) on patients with a diagnosis of asthma. This is nothing new and there have probably been at least 10 articles in the last decade about the Restrictive Spirometry Pattern (RSP). Interpreting these kinds of spirometry results is always problematic, particularly when there are no prior lung volume measurements to rule in or rule out restriction. I’ve noticed, however, that patients with a restrictive spirometry pattern almost always have the tidal loop on the far right-hand side of the flow-volume loop (zero or near-zero ERV). For example:

              Observed:  %Predicted:
    FVC:        1.65        74
    FEV1:       1.21        73
    FEV1/FVC:   73         100

    But there doesn’t seem to be any relationship between this observation and the patient’s BMI; in fact, it is seen even when BMI is normal or somewhat reduced.

    The patient in the above example, however, also had a DLCO test and the Inspired Volume (IVC) was 1.95 L (87% of predicted FVC). Using the IVC to re-calculate the FEV1/VC ratio showed it to be 62.1, or 85% of predicted (LLN = 89). So, instead of possible restriction, this shows that the patient had mild airway obstruction and that the reduced FVC was due to gas trapping.
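    The recalculation is simple arithmetic; a sketch using the numbers quoted above:

```python
# Values from the example above (litres)
fev1 = 1.21   # reported FEV1
fvc = 1.65    # reported FVC from spirometry
ivc = 1.95    # inspired volume from the DLCO maneuver

ratio_fvc = 100 * fev1 / fvc   # ratio from spirometry alone (~73)
ratio_vc = 100 * fev1 / ivc    # ratio using the larger VC (~62.1)
```

    Substituting the larger VC drops the ratio from a normal-looking 73 to 62.1, unmasking the obstruction.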

    In addition, I’ve reviewed the flow-volume loops from a number of patients with known lung restriction and found that the tidal loop was almost always more centrally placed. But when I look at restrictive spirometry pattern flow-volume loops a certain fraction of them have a tidal loop that is to the right of and outside the maximal flow-volume loop.

    I’d be the first to admit that this flow-volume loop could be the result of the patient leaking around the mouthpiece. I’d even argue that expiratory leaks are more likely to occur than inspiratory leaks and that this could explain why the end of expiration is to the left of the beginning of the inspiration (i.e. FIVC > FVC). However, after reviewing a number of RSP loops I saw that this was occurring in somewhere around a quarter of them and that seems to be a bit too many to attribute to leaking, particularly since the same thing doesn’t seem to be happening in patients with restriction.

    The alternative explanation therefore is that this is actually a more extreme form of gas trapping. Confirmation of gas trapping usually requires lung volume measurements (i.e. elevated FRC and RV) but many patients with a restrictive spirometry pattern do not get lung volume measurements, or if they did it was well in the past and there is no overwhelming need to repeat them.

    But you don’t necessarily need lung volume measurements to determine the presence of gas trapping. Back in the 1970s and 1980s when we had water-seal volume displacement spirometers with a kymograph there was a relatively simple test to detect gas trapping. Specifically, we’d have a patient start with tidal breathing and then have them take a deep breath in and then ask them to return to normal breathing. Patients without gas trapping would have the end-exhalation level of their tidal breathing return to FRC almost immediately, while those with gas trapping would take a number of breaths before this happened. This test wasn’t particularly quantitative but it clearly differentiated between patients with gas trapping and those without.

    This kind of testing however, usually can’t be performed on current test systems, primarily due to software limitations, and the more common approach is to compare the FVC and SVC.

    So I have strong suspicions that patients with a restrictive spirometry pattern whose tidal flow-volume loops show an essentially zero ERV are likely gas trapping. These (and probably other) patients should perform an SVC maneuver in order to make this more evident. Although I’d like to get our techs to perform SVC maneuvers as part of routine spirometry more frequently, there are at least a couple of problems that make this more difficult than it ought to be.

    First, almost all of our routine spirometry is performed as part of the pulmonary physician clinics, which are located several buildings away from the main PFT Lab. We have a couple of screening spirometry systems there and, while I would like to say we can give every patient the time that they need, realistically, given the number of patients seen in clinics and our resources, the amount of time we can give to each patient is limited. Adding an SVC maneuver to routine spirometry could as much as double the amount of time spent with a patient and that’s time we often don’t have.

    Next, even when we have the time to perform both FVC and SVC maneuvers our lab software will not calculate the FEV1/SVC ratio and will not include it in a report. The best we can do is to report the SVC and then manually calculate the FEV1/SVC ratio and put it in the notes.

    Note: In addition, I know that there are at least some spirometry systems that do not permit an SVC to be performed. I don’t know how widespread this problem is, but I have an older (~15 years) office spirometer that I use in a weekly free spirometry session in my community and it does not have an SVC test module.

    And although I feel crass for having to mention this, you can perform an FVC and be reimbursed, or you can perform an SVC and be reimbursed, but if you perform both an FVC and an SVC you will only be reimbursed for the FVC. Since an FVC and SVC probably only need to be performed on a relatively small number of patients I don’t have a problem with this (we already perform upright and supine spirometry regularly and aren’t reimbursed for the supine portion), but there are many labs where this might be an issue.

    Finally, there remain a number of open questions about interpreting the FEV1/SVC ratio. In one sense it’s a no-brainer when a subject with a reduced FVC and a normal FEV1/FVC ratio can be shown to have a normal SVC with a reduced FEV1/SVC ratio. At the same time, however, the FEV1/SVC ratio has not been studied in a normal population so its mean and LLN values are at best speculative.

    Gas trapping has significant clinical implications and I strongly suspect that it is far more common than is generally realized. A major reason for this is our approach to routine spirometry (i.e. FVC only), which in turn has led to corresponding limitations in many of our test systems. We need to be more open towards performing additional testing, despite problems with time management, software limitations and reimbursement, in order to detect gas trapping in our patients.

    It’s been over a decade since the ATS/ERS interpretation guidelines advocated the use of the FEV1/VC ratio (using the largest VC, whatever the source). Although the recent ATS reporting guidelines mention:

    “Measurement of slow VC and calculation of FEV1/VC are a useful adjunct in patients with suspected airflow obstruction.”

    there are still no guidelines about when an SVC should be performed as part of routine spirometry. It could be argued that an SVC should be performed in any patient with a reduced FVC, but many if not most of those who meet this criterion clearly have airway obstruction and the value of an SVC is limited. More realistically it would probably be best to perform an SVC only in those patients with a reduced FVC and a normal FEV1/FVC ratio.

    References:

    Brusasco V, Crapo R, Viegi G et al. ATS/ERS task force: Standardisation of lung function testing. Interpretive strategies for lung function tests. Eur Respir J 2005; 26: 948-968.

    Graham BL et al. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.


  • Telling the right story

    The 2005 ATS/ERS spirometry standards make it permissible, and even recommend, that the FVC and FEV1 be selected from different efforts. I disagree somewhat with their criteria for selecting the FEV1, but overall reporting composite results makes a lot of sense. In an ideal world we’d always get the best FVC and FEV1 in a single effort, but what we more often get is a good FEV1 with a poor FVC or a poor FEV1 with a good FVC. So, it best serves the clinical needs of the patient to report the best elements from multiple spirometry efforts.

    However, I was disappointed that the 2017 ATS reporting standards did not in any way address how to indicate that composite results are being reported, nor did they resolve the selection of the flow-volume loops and volume-time curves that accompany the numerical results. That leaves it to us to decide how to do this, but our choices in turn are often limited by the capabilities of our equipment’s software.

    One test system that I routinely take to a free spirometry screening clinic will only report the three “best” efforts based solely on the largest combined FVC + FEV1. Admittedly, to some extent this follows the 2005 ATS/ERS spirometry standards selection criteria but other than deleting a specific test effort I cannot override these selections nor can I mix and match the FVC and FEV1 values. This means that what it reports as the “best” effort doesn’t always agree with what in reality are the best results.

    My lab’s software, however, allows us to select which test efforts the FVC and FEV1 come from. In addition we can select which test effort the ancillary measurements (Peak Flow, Expiratory Time, FIVC, FEF50, etc.) come from and which effort the flow-volume loop and volume-time graphs come from.

    It is therefore possible to select the FVC, FEV1, ancillary measurements and the graphs from entirely different test efforts. Thankfully, this is almost never done, but when I review reports what I see most frequently is that the FVC is selected from one test effort while the FEV1, ancillary measurements and graphs are selected from another. To some extent this makes sense because I’d usually agree that the Peak Flow should always be associated with the FEV1, and if that’s the case, then so should the flow-volume loop. The problem with this is that the FVC often comes from a test effort with a substantially longer expiratory time, and when results are selected this way the volume-time curve and expiratory time are instead reported from the effort the FEV1 came from.

    This leads to a report that looks like this:

               Observed:  Predicted:  %Predicted:
    FVC:         2.62       3.65        72%
    FEV1:        2.01       2.58        78%
    FEV1/FVC:    77         72         107%
    Peak Flow:   8.83       6.73       131%
    Exp. Time:   1.20

    with graphs like:

    At first glance these results are typical of restriction, but anybody who reads this report and who knows anything about PFTs would see the short expiratory time and immediately suspect that the FVC is underestimated and the FEV1/FVC ratio overestimated, probably by a fair amount. For this reason, the test effort would probably be considered suboptimal and more likely to be masking underlying airway obstruction. However, because the expiratory time and the volume-time graph come from the effort with the highest FEV1, they are misleadingly low. If the proper expiratory time and volume-time curve were reported it would look like this, which goes a long way toward clearing up the confusion:

               Observed:  Predicted:  %Predicted:
    FVC:         2.62       3.65        72%
    FEV1:        2.01       2.58        78%
    FEV1/FVC:    77         72         107%
    Peak Flow:   8.83       6.73       131%
    Exp. Time:   8.76

    However, as I mentioned Peak Flow and Expiratory Time are part of the ancillary measurements and can’t be selected separately. The same applies to the flow-volume loop and the volume-time curve.

    So when this happens I take a look at all of the patient’s test efforts. In this specific case, it just happened that the test effort with the largest FVC had a peak flow comparable to the test effort the largest FEV1 came from. That allowed me to select the ancillary measurements (Peak Flow and Expiratory Time) and a combined flow-volume loop and volume time curve that matched the reported results.

    I’m not always so lucky. For example, this report has similar problems:

               Observed:  Predicted:  %Predicted:
    FVC:         3.30       4.24        78%
    FEV1:        2.77       3.12        89%
    FEV1/FVC:    84         74         114%
    Peak Flow:   8.31       8.11       102%
    Exp. Time:   1.35

    But all of the patient’s other test efforts had substantially lower Peak Flows, and even though the effort the FVC came from had the next best Peak Flow, the flow-volume loop looked substantially different:

    In this case I pretty much had to leave the original report alone. If I could have selected the flow-volume loop and volume-time graphs separately and the Peak Flow and Expiratory Time separately, the report would have made a lot more sense.

    When I review reports with problems I find that I can “fix” them about 1/3 of the time. So why bother even looking to fix reports in the first place when most of the time I can’t?

    Our PFT reports tell a story. There’s no character development, no scenes and no particular plot, but it’s still a story. It’s important that the elements of the story support each other and have a coherent narrative. When the elements are at odds with each other then the story doesn’t make a lot of sense, even if the “real events” the story is taken from do.

    I’d be able to tell the patients’ stories better if I could be more specific about which elements I select from each test effort, but there I’m limited by what our lab software allows me to do. It would also have been helpful if the 2017 ATS reporting standards addressed the issue of reporting composite results since this is something we have to deal with every day.

    I’d like the stories my lab tells to be coherent, believable and clinically relevant rather than something that appears to be a flight of fancy. That’s why it’s worth my time to try to re-work a report even when most of the time our lab’s software won’t let me.

    So, what kind of stories are your lab’s reports telling?

    References:

    Brusasco V, Crapo R, Viegi G. ATS/ERS Task Force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26: 319-338.

    Culver BH et al. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.


  • I’ve got the old back-extrapolation blues

    A couple days ago I pulled my copy of the Intermountain Thoracic Society manual on pulmonary function testing off the bookshelf and thumbed through it a bit. It was first published in 1975 and was the first major attempt towards standardizing the performance and interpretation of PFTs.

    My first thought was that we’ve come a long way since then. Most importantly our understanding of what spirometry can (and cannot) tell us has improved dramatically.

    Equipment, too, has advanced since 1975, particularly due to the first equipment standards, which were published in that decade. As a reminder, spirometer accuracy was not a given and there are a number of studies dating from that time period that detailed just how woefully inaccurate many of them were.

    In 1975 computerized spirometers were exceptionally rare and I was reminded of this because 141 pages (two-thirds!) of the ITS manual are filled with look-up tables for predicted values and ATPS – BTPS – STPD conversion factors.

    Most spirometry systems were entirely manual and the majority of us measured FVC and FEV1 manually from pen tracings on kymograph paper. The results were then hand-calculated and then hand-written onto report forms. Since our equipment is so much more accurate and our computers acquire and calculate test results automatically, everything is so much better now, isn’t it?

    Overall, I’d have to say yes. Testing is much quicker and more accurate than it used to be in 1975, and no, I’m not particularly nostalgic about those days.

    {Arrrhh, gather round lads and lasses and let me tell you of the days when coal-fired steam-powered spirometers rumbled and hissed in basement labs everywhere; when you had to solve regression equations with your slide rule on the fly or risk the horror of ripped kymograph paper, exploding alveolar sample bags and spirometer bells gone ballistic without warning. The toll this daily physical and mental trauma took amongst the lowly pulmonary techs was terrifying and only the bravest continued the daily battle against gnarly patients, sneering doctors, black-hearted administrators and monopolistic manufacturers…

    …Oops! Wrong time-line; those are memories from the universe one north and two left of ours. Too much steampunk sci-fi late at night and too little sleep left me momentarily confused}

    I ran across an error today that reminded me that although computerized test systems are essential to our ability to run efficient and accurate labs, the limitations of the software that comes with them hinder our ability to detect and correct errors.

    A patient’s spirometry results came across my desk and every effort had excessive back-extrapolation (ranging from 0.33 L to 0.56 L). The tech performing the tests had indicated this in the notes but when I took a close look at the volume-time curves I was somewhat baffled since the beginning of each spirometry effort was sharp and clearly evident.

    It was only when I looked at the flow-volume loop that I realized what was happening.

    The back-extrapolation algorithm has its basis in the volume-time curve. Back in the day, you used a ruler to find the highest slope of the volume-time curve and used it to extrapolate backwards to the “real” beginning of the expiratory effort.

    From ATS/ERS Standardisation of spirometry, Figure 2, Page 324

    The highest slope, however, is by definition the peak flow. Computer software uses this fact: it looks for the peak flow and uses it to perform back-extrapolation. In this case, the patient was coughing during each expiratory effort, and in each effort this cough caused expiratory flow to spike above the “real” peak flow for a very short interval.

    This “spike” was used by the computer algorithm to determine back extrapolation and because it was using the wrong “peak” flow, it was mis-calculating both the FEV1 and the amount of back-extrapolation.
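    To make the failure mode concrete, here is a sketch of the textbook algorithm using synthetic data (my own illustration, not any vendor’s code): the steepest slope of the volume-time curve is taken as peak flow, a line with that slope is projected back to zero volume to define a new time zero, and the volume already recorded at that instant is the extrapolated volume. A brief cough spike that becomes the maximum of the flow signal would shift every one of these values.

```python
import numpy as np

def back_extrapolate(t, v):
    """Textbook back-extrapolation from a volume-time curve (a sketch).

    t: time (s), v: exhaled volume (L). The steepest slope of v(t) is,
    by definition, peak flow; a line with that slope through the point
    of peak flow is projected back to zero volume to find the new time
    zero, and the volume already recorded there is the extrapolated
    volume (which should be <5% of FVC or 0.150 L).
    """
    flow = np.gradient(v, t)
    i = int(np.argmax(flow))         # a brief cough spike here would mislead this
    t0 = t[i] - v[i] / flow[i]       # extrapolated line crosses zero volume at t0
    ev = float(np.interp(t0, t, v))  # recorded volume at the new time zero
    return t0, ev

# Synthetic effort with a hesitant start: a slow dribble of volume
# before the true blast of exhalation at 0.5 s.
t = np.linspace(0.0, 6.0, 1201)
v = np.where(t < 0.5, 0.1 * t, 0.05 + 4.0 * (1.0 - np.exp(-3.0 * (t - 0.5))))
t0, ev = back_extrapolate(t, v)
```

    With a clean signal, the extrapolated volume here is well under the 0.150 L limit; swap the argmax onto a momentary cough spike and both the new time zero and the extrapolated volume (and therefore the FEV1) come out wrong.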

    If the “real” start of exhalation was used instead, the FEV1 changes from 0.96 L (37% of predicted) to 0.72 L (28% of predicted), and although that’s a relatively small difference it changes the interpreted severity of the patient’s airway obstruction from “severe” to “very severe”.

    There are, of course, at least a couple things wrong with this. First, you’d think after all these years that software could account for coughs and ignore them. The cough in this instance lasted only 35-40 milliseconds.

    The “real” peak flow, on the other hand, lasts hundreds of milliseconds.

    Second, I have no way to correct this error, other than mentioning it in the notes. Our lab software lets me make at least some corrections to our DLCO tests and lung volumes, but for spirometry the best I can do is to select the computer-measured FVC and FEV1 from different efforts.

    Third, this error is relatively obscure and hard to detect. It would have been much more evident if the back-extrapolation process had been graphed, but this type of graph is not available from my lab’s software, which reports back-extrapolation only numerically.

    So, yes, computerized test systems have made spirometry faster and more accurate than it was back in 1975 but only when the software algorithms are correct, and this is not always the case. My concern is that errors like this (and there are probably more that occur with some kind of regularity) are difficult to detect and in many cases, impossible to fix.

    I don’t see any easy solution to this problem. Equipment manufacturers have no particular incentive to find and fix these problems (that’s not to say that there aren’t some that are at least trying but in the current PFT and spirometry marketplace there’s no real payback for these kinds of efforts). I can say with some assurance that the spirometry algorithms in my lab’s software have remained unchanged for at least the last 12 years, and that they are probably directly based on software algorithms from the 1980’s and 1990’s.

    Manufacturers do respond to regulation and standards. The ATS/ERS spirometry standards have done much to improve spirometry quality, but there are notable gaps in their recommendations that are left to individual manufacturers to interpret and solve. In addition, the spirometry guidelines are now 13 years old and there is no particular sign that new ones are being developed (and from discussions I’ve had with other PFT veterans there’s no indication that, even if they were, input would be solicited from any of us in the field).

    My real point, however, is that by using computerized testing systems (not that there’s been any alternative for at least a couple of decades) we’ve traded some of our autonomy for convenience and productivity. Most of the time this is probably an acceptable trade, but every so often we’re going to pay for it by (knowingly or not) reporting inaccurate results.

    Despite all this it’s your responsibility to move brightly into the future and interact in a positively rewarding manner with your grateful patients, knowledgeable physicians, generous administrators and enlightened manufacturers…

    […Oops! Wrong universe again! That’s the one that’s three south and one rightwards of ours…]

    References:

    Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26: 319-338.


  • A spirometry quality grading system. Or is it?

    A set of guidelines for grading spirometry quality was included with the recently published ATS recommendations for a standardized pulmonary function report. These guidelines are similar to others published previously so they weren’t a great surprise, but as much as I may respect the authors of the standard my first thought was “when was the last time any of these people performed routine spirometry?” The authors acknowledge that the source for these guidelines is epidemiological, and if I were conducting a research study that required spirometry they would be useful for knowing which results to keep and which to toss, but for routine clinical spirometry they’re pretty useless.

    I put these thoughts aside because I had other projects I was working on, but I was reminded of them when I recently performed spirometry on an individual who wasn’t able to perform a single effort without major errors. The person in question was an otherwise intelligent and mature individual but found themselves getting more frustrated and angry with each effort because they couldn’t manage to perform the test right. I did my best to explain and demonstrate what they were supposed to do each time but after the third try they refused to do any more. About the only thing that was reportable was the FEV1 from a single effort.

    This may be a somewhat extreme case but it’s something that those of us who perform PFTs are faced with every day. Many individuals have no problem performing spirometry, but for others we’re fortunate to get even a single test effort that meets all of the ATS/ERS criteria. The presence or absence of test quality usually isn’t apparent in the final report, however, and for this reason I do understand the value of some kind of quality grading system. But that also implies that the grading system serves the purpose for which it is intended.

    To quantify this I reviewed the spirometry performed by 200 patients in my lab in order to determine how many acceptable and reproducible results there were. To be honest, as bad as I thought the quality problem was, when I looked at the numbers it was worse than I imagined.

    The spirometry quality grading system is:

    Grade: Criteria:
    A ≥3 acceptable tests with repeatability within 0.150 L (for ages 2–6, 0.100 L), or 10% of highest value, whichever is greater
    B ≥2 acceptable tests with repeatability within 0.150 L (for ages 2–6, 0.100 L), or 10% of highest value, whichever is greater
    C ≥2 acceptable tests with repeatability within 0.200 L (for ages 2–6, 0.150 L), or 10% of highest value, whichever is greater
    D ≥2 acceptable tests with repeatability within 0.250 L (for ages 2–6, 0.200 L), or 10% of highest value, whichever is greater
    E 1 acceptable test
    F No acceptable tests
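    As a sketch in code, using the adult thresholds from the table above (the pediatric variant is omitted, and since the table doesn’t say what grade ≥2 acceptable efforts with repeatability worse than 0.250 L should receive, this assumes they fall back to ‘E’ — my assumption, not part of the standard):

```python
def spirometry_grade(n_acceptable, diff_l, best_l):
    """Assign a grade per the table above (adult thresholds only; a sketch).

    n_acceptable: number of acceptable efforts
    diff_l: difference (L) between the two largest values (FVC or FEV1)
    best_l: the highest value (L), for the 10%-of-highest alternative
    """
    if n_acceptable < 1:
        return "F"
    if n_acceptable < 2:
        return "E"

    def within(limit):
        # Repeatable if within the absolute limit or 10% of the highest
        # value, whichever is greater.
        return diff_l <= max(limit, 0.10 * best_l)

    if n_acceptable >= 3 and within(0.150):
        return "A"
    if within(0.150):
        return "B"
    if within(0.200):
        return "C"
    if within(0.250):
        return "D"
    return "E"  # acceptable efforts, but repeatability worse than grade D
```

    Note that for large lungs the 10%-of-highest alternative dominates: with a best FVC of 3 L, any difference under 0.30 L still repeats, which is part of why the absolute thresholds mostly bite in smaller patients.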

    It’s important to note that this grading system is based primarily on the reproducibility of acceptable tests. Acceptable tests are:

    1. A good start of exhalation, with extrapolated volume <5% of FVC or 0.150 L, whichever is greater
    2. Free from artifacts
    3. No cough during first second of exhalation (for FEV1)
    4. No glottis closure or abrupt termination (for FVC)
    5. No early termination or cutoff (for FVC)
    6. Maximal effort provided throughout the maneuver
    7. No obstructed mouthpiece

    There were 703 spirometry tests from the 200 patients for an average of 3.5 tests per patient. The lowest number of tests performed was 3, the maximum was 7. Out of 200 patients, 50 patients (25%) were unable to perform a single acceptable test and would have received an ‘F’ quality grade. Another 51 patients (26%) were able to perform one acceptable test and would have received an ‘E’ quality grade. Only 38 patients (19%) were able to perform three (or more) acceptable tests and receive an ‘A’ quality grade. The remaining 61 patients would have gotten a ‘B’, ‘C’ or ‘D’ quality grade.

    The distribution of errors was (some efforts had more than one error):

    Expiratory time < 6 seconds: 314
    End-of-test: 268
    FVC > 0.15 L or 10%: 201
    FEV1 > 0.15 L or 10%: 126
    PEF < 20% max: 117
    Back-extrapolation: 45
    Pauses that affected FVC or FEV1: 43
    FIVC > FVC: 6

    It’s apparent from this that the biggest problem most patients have is with the length of their exhalation (EOT criteria, expiratory time and FIVC > FVC) and that this primarily impacts the FVC and not the FEV1. The number of errors that affect the FEV1 (back-extrapolation, peak flow, pauses) is a lot smaller. To some extent this doesn’t surprise me since I’ve always felt that in spirometry testing the FEV1 was more reliable than the FVC.

    There is an additional point the quality grading system does not address, and that is composite results: specifically, reporting the highest FVC (regardless of which effort it came from) along with the highest FEV1, which is allowed and even encouraged by the ATS/ERS spirometry standards. Composite results were reported for 69 out of the 200 patients (35%). I did not try to analyze these closely but I can say that 22 out of these 69 (32%) had no acceptable test efforts. Some fraction of these, however, combined an effort with an acceptable FEV1 and an effort with an acceptable FVC, but the grading system would still have given them an ‘F’.

    Note: I didn’t try to correlate the number or type of spirometry errors with the technicians who performed the tests, partly because I wasn’t interested, partly because which patient you get is usually the luck of the draw, and partly because in the past when I was the lab manager I always took the toughest patients and probably would have had one of the highest error rates, so there isn’t necessarily any correlation here.

    I can’t prove it but I think that these statistics are reasonably representative of the experience in most PFT labs. Some labs are going to be better, some are going to be worse. I like to think that my lab is better than most but that’s purely subjective, and regardless of how good (or bad) a lab’s staff are, in the final analysis it comes down to the patient’s ability to perform spirometry and that really isn’t as good as you might think it ought to be. To (badly) paraphrase Clausewitz, “even though spirometry is simple, when testing humans even the simple is very difficult.”

    In the ICU there’s something called alarm fatigue where alarms are going off more or less continuously because a patient moved or because of bad connections or because the alarm limits are set too stringently (or whatever). Medical staff often become deaf to these alarms and stop paying attention to them, sometimes with adverse consequences for their patients.

    So, the problem is that over 50% of my lab’s patients would have gotten an ‘E’ or an ‘F’ grade. If you were interpreting reports, how quickly would you get ‘alarm fatigue’ if those were the most common quality grades you saw? For that matter, how long would it take you to get the idea that your PFT lab was mostly staffed with incompetents?

    I’m sure the authors of the quality grading system would argue that the results should be used as part of a quality improvement plan, and although I would agree with the sentiment, the reasons for suboptimal test quality (probably partly psychological, partly physiological and partly medical) are not easily quantifiable. In addition, what’s labeled a spirometry quality grading system is really a reproducibility grading system for ‘acceptable’ quality tests. I’m not going to say that this doesn’t serve a useful purpose but it should be labeled for what it is.

    A problem that everyone who interprets pulmonary function results faces (with varying degrees of success, since the skill is usually only acquired from experience) is assessing suboptimal quality tests in order to determine what parts are meaningful and informative, and what parts aren’t. Given that over half our patients would only have gotten an ‘E’ or an ‘F’ grade, what would have been far more useful than a grading system would be official guidelines for determining the information content of suboptimal quality tests. A spirometry effort that doesn’t meet acceptability criteria may still have something useful to say about expiratory volume or flow rates. This in turn could be used to say something useful about the probable presence or absence of airway obstruction and restriction, and allow us to at least salvage something out of suboptimal quality spirometry.

    References:

    Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26(2): 318-339.

    Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Interpretive strategies for lung function tests. Eur Respir J 2005; 26(6): 948-968.

    Graham BL, Coates AL, Wanger J et al. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.

    Creative Commons License
    PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

  • 3-Equation DLCO

    One of the limitations of the single-breath DLCO is that the equation used to calculate results implicitly assumes that the entire breath-holding period occurs at TLC. Mathematically, what happens to the diffusion of carbon monoxide (CO) during inspiration and expiration is not a consideration.

    The different approaches towards measuring breath-holding time (BHT) make allowances for inspiration and expiration to one extent or another, but realistically they should be considered fudge factors.

    The 3-equation DLCO was first proposed by Graham et al in 1980, and it received its name because there is a separate equation for each phase of the single-breath DLCO maneuver. The individual equations are based on the mass-balance equation and attempt to account for the mass of CO inhaled, absorbed and exhaled during the single-breath maneuver. One of the most significant differences is that an iterative approach is used to determine DLCO. Specifically, an initial estimate of DLCO is made and then compared against the values measured during the three phases. Any differences in observed versus expected values are used to re-estimate the DLCO, which is then re-compared. The authors indicated that 10 iterations are usually sufficient to converge on a DLCO value that meets all measured conditions with a high degree of accuracy.
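    Just to make the iterative idea concrete, here is a sketch of the estimate/compare/re-estimate loop. To be clear, this is not Graham’s actual three-equation model; it is a deliberately simplified single-phase stand-in (breath-hold only, using the standard Krogh relationship) with bisection doing the iteration, and all names and numbers are illustrative.

```python
import math

PB_MINUS_H2O = 713.0  # barometric minus water vapor pressure, mmHg (sea-level assumption)

def predicted_fa_co(dlco, va_ml, bht_s, fa_co_initial):
    """Forward model: end-of-breath-hold CO fraction for a trial DLCO,
    using the standard Krogh single-compartment relationship."""
    return fa_co_initial * math.exp(-dlco * PB_MINUS_H2O * bht_s / (va_ml * 60.0))

def iterate_dlco(va_ml, bht_s, fa_co_initial, fa_co_measured, lo=0.0, hi=100.0, n_iter=30):
    """Estimate DLCO iteratively: guess, compare predicted vs. measured CO, re-guess.
    Bisection stands in for the authors' scheme (they report ~10 iterations suffice)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if predicted_fa_co(mid, va_ml, bht_s, fa_co_initial) > fa_co_measured:
            lo = mid  # predicted CO too high -> uptake (DLCO) must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round-trip check: generate a "measured" CO fraction from a known DLCO, then recover it.
true_dlco = 25.0
fa_meas = predicted_fa_co(true_dlco, va_ml=5000.0, bht_s=10.0, fa_co_initial=0.003)
est = iterate_dlco(5000.0, 10.0, 0.003, fa_meas)
```

    A real implementation would replace the forward model with the three mass-balance equations for inspiration, breath-hold and expiration; the convergence loop itself is the part this sketch is meant to show.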

    Because the amount of exhaled CO is important, the 3-equation method requires that the patient exhale to RV (or at least until the expiratory flow rate equals the flow rate of the gas analyzer’s sampling line). In addition, the expiratory washout volume is estimated using the Fowler approach towards measuring dead space (an approach that was included for the first time as a recommendation in the 2017 ERS/ATS DLCO standards).

    Using the 3-equation method, the authors found that DLCO remained relatively constant over a broad range of inspiratory, breath-holding and expiratory times, as well as a variety of inspiratory volumes, in normal subjects. Later studies, however, indicated that in subjects with ventilation inhomogeneity DLCO decreased when breath-holding time was decreased. It was noted that there was a strong correlation between the phase III slope of the N2 washout curve and the amount of decrease in DLCO that occurred at shorter breath-holding times. For this reason the effect was attributed to the fact that the amount of time needed for diffusive gas transport to the alveoli increases as ventilation inhomogeneity increases. Despite this, the authors have continued to advocate that the 3-equation DLCO be performed without a breath-hold period. In addition, they also advocate that the inspiratory phase of the 3-equation maneuver be performed from FRC, not RV, and that the inspired volume be ½ the IC. They state:

    “While a VC inhalation minimizes ventilation nonuniformities in the lung and would be easier to standardize in the clinical setting, a submaximal breath from FRC is more sensitive in the detection of peripheral inhomogeneities in the lung…”

    Interestingly, one study showed that the 3-equation DLCO was sensitive to volume history. Specifically, the DLCO measured more or less immediately after a deep breath was significantly higher than one performed after a prolonged period of tidal breathing. This was noted in normal subjects when the 3-equation DLCO was performed without a breath-holding period. Possible reasons for this include an increase in surface area secondary to re-establishment of the alveolar surfactant, redistribution of capillary blood volume and a transitory change in hematocrit.

    Note: No further research appears to have been done on this subject, however, and little or no research has been done on the effects of volume history on the traditional single-breath DLCO method. Notably, the 2017 ERS/ATS DLCO standard says nothing about volume history, so it’s unclear how important this actually is to routine DLCO testing.

    Regardless of the cause, the authors have suggested that the 3-equation DLCO always be performed immediately after a deep breath in order to standardize volume history. For this reason a 3-equation maneuver is expected to be preceded by a deep breath.

    Since the 3-equation DLCO seems to be relatively insensitive to inspiratory and expiratory time, and would be easier to perform than the standard DLCO, why hasn’t its use become more widespread?

    One major reason for this is that a primary assumption of the 3-equation method is that DLCO remains the same regardless of lung volume. There is, however, a large amount of research that indicates that this is not true. Admittedly this is based primarily on the standard single-breath DLCO methodology, but even so, the fact that other techniques like rebreathing DLCO and steady-state DLCO, which are measured at lung volumes below TLC, always produce results that are lower than single-breath DLCO would seem to support this.

    Another reason is that there are no reference equations for the 3-equation DLCO. One population study (283 subjects) did compare the 3-equation DLCO (performed with an RV to TLC inspiration) to the standard single-breath DLCO using the three BHT measurement approaches (Ogilvie, ESP, Jones-Meade). It found that the 3-equation and Jones-Meade results were in close agreement in normal patients as well as in those with restriction or obstruction. However, the 3-equation DLCO was not performed as advocated by the original authors, so it is unclear how well these results actually compare. In addition, the 3-equation method has not been formally studied in any lung disorders, so interpreting results remains somewhat problematic.

    In addition, almost all of the research on the 3-equation DLCO has been performed by the same small group of researchers. Rightly or wrongly, this may have implied that the technique was not easily transferable. There was some truth to this in that at the time the majority of this research was performed (the 1980s through the 1990s) testing required a mass spectrometer, a fast-responding CO analyzer, and careful attention to gas sample transit time and analyzer response time. This has since changed and the type of equipment needed to perform the 3-equation method has become fairly standard. For this reason some manufacturers have at one time or another offered the ability to perform 3-equation DLCO testing, but this is likely too little, too late.

    Finally, although I have a great deal of respect for the originators of the 3-equation method, I also have to say that they’ve done a remarkably poor job of making it understandable. Their explanation, such as it is, has always begun (and pretty much ended) with a series of mathematical formulas that require an understanding of calculus. A simpler and more detailed explanation might have gone a long way towards improving acceptance and more widespread use of the 3-equation DLCO.

    The 3-equation DLCO attempts to include the inspiratory and expiratory phases of the single-breath maneuver and thereby overcome one of the conceptual faults of the standard DLCO test. To some extent it succeeds at this but it does so at the cost of assuming that DLCO remains constant regardless of lung volume despite evidence to the contrary. Although it is an interesting technique that could have simplified DLCO testing it never achieved “critical mass” and instead remains a historical footnote.

    References:

    Beck KC, Offord KP, Scanlon PD. Comparison of four methods for calculating diffusing capacity by the single breath method. Chest 1994; 105(2): 594-600.

    Cotton DJ, Taher F, Mink JT, Graham BL. Effect of volume history on changes in DLCO SB-3EQ with lung volume in normal subjects. J Appl Physiol 1992; 73(2): 434-439.

    Cotton DJ, Prabhu MB, Mink JT, Graham BL. Effect of ventilation inhomogeneity on DLCO SB 3EW in normal subjects. J Appl Physiol 1992; 73(6): 2623-2630.

    Cotton DJ, Mink JT, Graham BL. Nonuniformity of diffusing capacity from small alveolar gas samples is increased in smokers. Can Respir J 1998; 5(2): 101-108.

    Graham BL, Dosman JA, Cotton DJ. A theoretical analysis of the single breath diffusing capacity for carbon monoxide. IEEE Trans Biomed Eng 1980;BME-27:221-227.

    Graham BL, Mink JT, Cotton DJ. Improving the accuracy and precision of single-breath diffusing capacity measurements. J Appl Physiol 1981; 51(5): 1306-1313.

    Graham BL, Mink JT, Cotton DJ. Effect of breath-hold time on DLCO(SB) in patients with airway obstruction. J Appl Physiol 1985; 58(4): 1319-1325.

    Graham BL, Mink JT, Cotton DJ. Implementing the three-equation method of measuring single breath carbon monoxide diffusing capacity. Can Respir J 1996; 3(4): 247-257.

    Graham BL, Brusasco V, Burgos F, et al. 2017 ERS/ATS standards for single-breath carbon monoxide uptake in the lung. Eur Respir J 2017; 49: 1600016.

    Wang JS, Abboud RT, Wang LM. Effect of lung resection on exercise capacity and carbon monoxide diffusing capacity during exercise. Chest 2006; 129: 863-872.


  • VA, two ways

    One of the recommendations in the 2017 ERS/ATS DLCO standards was that VA should be calculated using a mass balance equation. I’ve discussed this approach previously, but basically the volume of the exhaled tracer gas is accumulated over the entire exhalation and the amount of tracer gas presumed to remain in the lung is used to calculate VA. The conceptual problem with this for DLCO measurements is that VA is calculated using the entire exhalation but CO uptake is based solely on the CO concentration in the alveolar sample. Since VA calculated using mass balance tends to be larger than VA calculated traditionally in subjects with ventilation inhomogeneities, this means that DLCO calculated with a mass balance VA is also going to be proportionally larger as well.

    This problem has concerned me for a while but what wasn’t clear was what difference should be expected in the VA (and DLCO) when it is calculated both ways. In order to figure this out I’ve taken a real-world example of a subject with severe COPD and calculated the difference in VA and DLCO.

    Fortunately, my lab software lets me download the raw data for DLCO tests (volume, CH4, CO at 10 msec intervals) into a spreadsheet. The PFT results for the subject looked like this:

                   Observed:   %Predicted:
    FVC (L):       2.39        97%
    FEV1 (L):      0.66        36%
    FEV1/FVC:      27          38%

    TLC (L):       6.11        126%
    FRC (L):       4.84        174%
    RV (L):        4.04        171%

    DLCO:          9.21        57%
    VA (L):        3.19        68%
    Vinsp (L):     2.32

    In order to use the mass balance approach with the spreadsheet, I found that I could determine the start of exhalation after the breath-holding period, but determining where the alveolar plateau started was much more difficult. For this reason I had to include the dead space, but made adjustments for this when calculating VA.

    To start off with, using the inspired volume and concentration of CH4 in the DLCO test gas mixture, the volume of inhaled CH4 was:

    2.32 L x 0.003 = 6.96 ml.

    Integrating the exhaled CH4 throughout the exhalation showed a total volume of 2.567 ml, which leaves a volume of 4.393 ml remaining in the lung at the end of exhalation. The average CH4 concentration at the end of exhalation (averaged over the remaining 250 msec, per the 2017 DLCO standard) was 0.1601 percent. This means the volume of the lung at end-exhalation was:

    0.004393 L / 0.001601 = 2.74 L.

    The total volume exhaled was 1.32 L so total lung volume was:

    2.74 L + 1.32 L = 4.06 L.

    The dead space in our test systems defaults to 0.32 L (0.07 mouthpiece + 0.25 anatomical). VA is therefore:

    4.06 L – 0.32 L = 3.74 L.

    Since the VA calculated using the traditional approach was 3.23 L, the difference in VA was a factor of

    3.74 L / 3.23 L = 1.16

    and since DLCO scales directly with VA the DLCO calculated with the mass balance VA would have been:

    9.21 ml/min/mmHg x 1.16 = 10.68 ml/min/mmHg.

    This is an increase in percent predicted from 57% to 66%.
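    For anyone who wants to follow the arithmetic, the worked example above can be replayed as a short script. The numbers are taken directly from the text; the variable names are mine.

```python
# Numbers from the worked example in the text.
inspired_volume_l = 2.32      # Vinsp
ch4_fraction = 0.003          # 0.3% CH4 in the test gas

inhaled_ch4_ml = inspired_volume_l * ch4_fraction * 1000.0    # 6.96 ml

exhaled_ch4_ml = 2.567        # integrated over the entire exhalation
remaining_ch4_ml = inhaled_ch4_ml - exhaled_ch4_ml            # 4.393 ml

end_ch4_fraction = 0.001601   # 0.1601%, averaged over the final 250 msec
end_lung_volume_l = (remaining_ch4_ml / 1000.0) / end_ch4_fraction  # ~2.74 L

exhaled_volume_l = 1.32
total_lung_volume_l = end_lung_volume_l + exhaled_volume_l    # ~4.06 L

dead_space_l = 0.32           # 0.07 mouthpiece + 0.25 anatomical
va_mass_balance_l = total_lung_volume_l - dead_space_l        # ~3.74 L

va_traditional_l = 3.23
factor = va_mass_balance_l / va_traditional_l                 # ~1.16

dlco_traditional = 9.21
dlco_mass_balance = dlco_traditional * factor                 # ~10.68 ml/min/mmHg
```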

    I repeated this for a couple more subjects with severe COPD and was mildly surprised to get very similar results (the VA factor ranged from 1.12 to 1.16). Since these patients have fairly severe ventilation inhomogeneities, these results would seem to set an upper limit on the difference between VA calculation methods. To some extent this comparison of VA methods is limited by the fact that we don’t have our patients exhale all the way to RV after the breath-holding period (part of the 2017 DLCO recommendations), but for patients with COPD exhaling completely both at the beginning and the end of the single-breath maneuver is problematic anyway.

    The point of using the mass balance method is to get a more accurate VA, but in patients with ventilation inhomogeneities the difference was not as great as I expected it might be. This is not the first time VA has been corrected to reflect “true” lung volume, since up until the mid-1960s a number of researchers advocated adding the RV (measured either by helium dilution or plethysmography) to the inspired volume to calculate VA. This practice fell by the wayside for a number of reasons, one of which was that DLCO measured this way assumed that the test gas mixture (and CO uptake) was homogeneously distributed throughout the lung, and this is often not the case.

    The problem as I see it is that the DLCO measured using the mass balance method extrapolates the rate of CO uptake to the parts of the lung that either do not contribute, or contribute poorly, to the alveolar sample, and this may or may not be reasonable. The difference in VA and DLCO using the two methods has not been studied to any degree in either normal subjects or those with various lung disorders, and this makes it difficult to say whether the ATS/ERS recommendation actually makes sense.

    The biggest difference in VA between the two methods would be expected in patients with ventilation inhomogeneities. In my lab, the DLCO re-calculated using the mass balance method in a patient with a severe ventilation inhomogeneity would have changed the interpretation from a “moderate” to a “mild” gas exchange defect. In one sense that’s a significant change, but I suspect that for most patients, even those with some degree of ventilation inhomogeneity, the difference will likely be small.

    References:

    Graham BL, et al. 2017 ERS/ATS standards for the single-breath carbon monoxide uptake in the lung. Eur Respir J 2017; 49: 1600016.


  • What’s normal about airway resistance?

    The question that was actually posed to me a month or so ago was “when is RAW abnormal?” I didn’t have a good answer at the time since airway resistance (RAW) tests are not performed by my lab. The pulmonary physicians I work with don’t think that RAW is a clinically useful measurement and for a variety of reasons I don’t disagree with this. Nevertheless, RAW testing is routinely performed in many labs around the world so I thought it would be interesting to spend some time researching this.

    When asking what’s normal, the first issue is which RAW value you are talking about. The measurement of airway resistance using a body plethysmograph was first described by DuBois et al in 1956. Airway resistance (RAW) is the amount of pressure required to generate a given flow rate and is reported in cm H2O/L/sec. A number of physiologists quickly found that the reciprocal of RAW, conductance (GAW), which is expressed as the flow rate for a given driving pressure (L/sec/cm H2O), was also a useful way to describe the pressure-flow relationship of the airways.

    For technical reasons TGV (Thoracic Gas Volume) must be measured at the same time as RAW. It was soon noted that there was a relationship between RAW and TGV and that airway resistance decreased as lung volume increased.

    Because the relationship between RAW, GAW and TGV was considered to be linear (and little affected by age), specific GAW (i.e. GAW/TGV, or SGAW, expressed as L/sec/cm H2O/L) is considered to be independent of TGV. This is also more or less the case with specific RAW (i.e. RAW/TGV, or SRAW, expressed as cm H2O/L/sec/L), but this measurement is rarely made or reported. RAW and SGAW have since become the most common ways to report airway resistance, but there are still occasional research papers where SRAW or GAW is reported instead.
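    Since RAW, GAW, SGAW and SRAW are just algebraic rearrangements of one another, the relationships are easy to set down in code. The sketch below uses the definitions as given in this post; note that SRAW is written here as RAW/TGV to match the units quoted above, although some sources define SRAW as RAW × TGV instead.

```python
def airway_resistance_family(raw_cmh2o_l_s, tgv_l):
    """Derive GAW, SGAW and SRAW from a measured RAW and TGV,
    using the definitions given in the text."""
    gaw = 1.0 / raw_cmh2o_l_s     # conductance, L/sec/cm H2O
    sgaw = gaw / tgv_l            # specific conductance, L/sec/cm H2O/L
    sraw = raw_cmh2o_l_s / tgv_l  # specific resistance as defined here, cm H2O/L/sec/L
    return gaw, sgaw, sraw

# Example: RAW of 1.5 cm H2O/L/sec at a TGV of 3.0 L.
gaw, sgaw, sraw = airway_resistance_family(1.5, 3.0)
```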

    Although RAW has been studied many times there are actually very few articles that are concerned with normal values and the majority of them have reported just the mean and range for their study populations. The values for RAW and SGAW that Briscoe and Dubois published in 1958 were considered the gold standards for decades despite having a study population of only 26 individuals (10 F, 16 M, 8 children, 18 adults). In fact, there are almost no reference equations for RAW and SGAW and almost all of the pulmonary function textbooks I have on hand have published the normal range for RAW as 0.5 – 2.0 cm H2O/L/sec, and the normal range SGAW as 0.13 – 0.35 L/sec/cm H2O/L.

    There is however, one recent population study (Marsh et al, 2006) with 212 individuals (Caucasian, 25-75 years, 51.9% male) that described the mean, ULN and LLN for SGAW and SRAW, and another (Gutierrez et al, 2004) with 627 individuals (Caucasian, 20-80 years, 47.8% male) that provided reference equations with a calculable ULN and LLN for RAW and SGAW.

    Not surprisingly given the relationship between TGV and RAW, the reference equations from Gutierrez et al show that RAW is primarily influenced by height.

    They also showed that age has a slight effect for males, but was not considered significant for females.

    Given that SGAW is considered to be independent of TGV, it is somewhat surprising to find that the Gutierrez reference equations show a distinct relationship with height. Marsh et al, however, determined that height was not a factor. Neither study found age to be a factor for SGAW.

    The Gutierrez reference equations have problems with both the male and the female LLN, since at lower heights these values are negative. It should be noted that the SEE for SGAW was large relative to the mean value, and the range of heights in the study population was not included.

    There are fairly significant differences in values from the Gutierrez and Marsh reference equations, particularly for taller individuals. This may be due to the smaller study population for the Marsh equations, but it also points out that the normal range for SGAW (and all of the other RAW measurements) still tends to be inconsistent from one study to another. As an example, a recent study of SGAW, RAW and airway obstruction indicated that, within a clinical study population consisting largely of patients with asthma and COPD, the LLN for SGAW was 0.98. Another recent study, however, showed that the normal range of SGAW for healthy subjects was 0.78 – 1.47.

    So why is there so much difference in RAW and SGAW between otherwise comparable studies?

    One factor that may affect the accuracy of measured RAW values is the panting frequency. Although there are no ATS/ERS standards for panting frequency, the original DuBois 1956 study used a panting frequency of 2 Hz. Other studies have variously reported panting frequencies anywhere from 1 – 3 Hz, but far more frequently the panting frequency is unreported. The reason this may be important is that Peslin et al showed an increase in RAW from 0.62 (±0.55) to 1.71 (±0.76) over a range of 0.25 to 3.0 Hz, which the authors primarily attributed to inadequate BTPS correction in the software. An earlier study by Krell et al, performed at two frequencies (approximately 1.5 and 3.0 Hz), did not show any difference in SGAW, but the plethysmograph used in that study was constructed for research and all loops were analyzed manually.

    In addition, the flow interval over which RAW is usually measured (i.e. ±0.5 L/sec) is relatively arbitrary. It was ±1.0 L/sec in the original 1956 DuBois article, but by 1958 the same investigators had changed the measurement range to ±0.5 L/sec because “The oscilloscope tracing was sometimes alinear, showing a tendency to curve at the extremes, presumably due to turbulent flow at higher flow rates.”

    From Clausen J. editor. Pulmonary function testing. Guidelines and controversies. Equipment, methods and normal values. Published by Grune & Stratton, 1982, page 148, figure b.

    The middle section of the RAW loop may be more linear, but a study by Lord and Edwards showed that RAW measured over ±0.5 L/sec was significantly more variable than when measured over ±1.0 L/sec or ±2.0 L/sec. In addition, the RAW measured over ±0.5 L/sec is significantly greater than the RAW measured over ±2.0 L/sec.

    It is also usually assumed that inspiratory and expiratory resistance in the range it is measured (i.e. at FRC and ±0.5 L/sec) are the same. This is reasonably true for individuals without airways disease but is not the case when significant airway obstruction is present.

    From Topalovic M, Exadaktylos V, Troosters T, Celis G, Aerts J-M, Janssens W. Non-linear parameters of specific resistance loops to characterise obstructive airways disease. Respir Res 2017; 18: 9, figure 1, page 2.

    When the RAW loop opens up as shown above in figure C it becomes significantly less clear how RAW should be calculated. This probably does not affect the normal ranges for RAW but does make it more difficult to determine RAW in individuals with significant airway obstruction.

    Another factor that I’ve not seen discussed is that there is a mild discrepancy between TGV and the lung volumes at which RAW is actually measured. Specifically, if measured correctly TGV should be essentially the same as FRC, but the tidal volumes that occur during the panting maneuver, although likely small, occur above FRC. In addition, TGV is not always measured correctly and may be either above or below true FRC by some unknown amount. These errors are likely small, but the effect they have on RAW and SGAW measurements has not been studied.

    Finally, there is really no way to verify the accuracy of a RAW measurement (admittedly this is one of my chronic complaints, but it is more true of RAW than of most other pulmonary function tests). The days of manually taking the RAW angle from a storage oscilloscope or Polaroid film and then hand-calculating airway resistance are long gone. We are highly dependent on our lab software to not only record the raw test data accurately but to analyze it correctly. Spirometry, lung volume and DLCO measurements can be verified using simulators of one kind or another, but there is no simulator for airway resistance. Also, unlike spirometry, lung volumes and DLCO, no inter-laboratory comparisons have ever been made of RAW measurements.

    Given that the normal range for RAW is somewhat vague, Ries and Clausen suggested that elevated RAW values be classified as:

    RAW:        Severity:
    <2.8        Normal
    2.8 – 4.5   Mild obstruction
    4.5 – 8.0   Moderate obstruction
    >8.0        Severe obstruction
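    Expressed as a small (hypothetical) helper function, the Ries and Clausen cut-offs look like this; how ties at exactly 2.8, 4.5 and 8.0 are handled is my own assumption, since the table doesn’t specify it.

```python
def grade_raw_ries_clausen(raw_cmh2o_l_s):
    """Classify a RAW value (cm H2O/L/sec) using the Ries and Clausen cut-offs."""
    if raw_cmh2o_l_s < 2.8:
        return "Normal"
    elif raw_cmh2o_l_s <= 4.5:
        return "Mild obstruction"
    elif raw_cmh2o_l_s <= 8.0:
        return "Moderate obstruction"
    else:
        return "Severe obstruction"
```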

    Miller et al. made similar suggestions:

    RAW:         Severity:
    <3.0         Normal
    3.0 – 4.5    Mild obstruction
    4.5 – 8.0    Moderate obstruction
    8.0 – 15.0   Severe obstruction
    >15.0        Extreme obstruction

    Miller also made suggestions for assessing the severity of SGAW measurements:

    SGAW:           Severity:
    >0.114          Normal
    0.114 – 0.070   Mild obstruction
    0.070 – 0.040   Moderate obstruction
    0.040 – 0.021   Severe obstruction
    <0.020          Extreme obstruction

    Note: Miller’s values actually point out an interesting problem I’ve run across. Specifically, SGAW measurements reported from more than a couple of decades ago are almost exactly 1/10th the magnitude of SGAW measurements reported currently (compare these to the predicted SGAW measurements from Gutierrez and Marsh). In many instances this is because the units used to report SGAW have changed to L/sec/kPa/L (1 kPa = 10.19 cm H2O), but in other instances the SGAW measurements still use the same units, i.e. L/sec/cm H2O/L. I’m sure there is an explanation somewhere, but in the meantime it does make it difficult to compare results over time. FYI, RAW results reported in kPa/L/sec are ~1/10th those reported in cm H2O/L/sec, so it’s important to keep track of which units are being reported.
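    Because the kPa versus cm H2O confusion comes up so often, a pair of small conversion helpers (mine, not from any standard) makes the factor-of-ten relationship explicit:

```python
CMH2O_PER_KPA = 10.19  # 1 kPa = 10.19 cm H2O

def raw_cmh2o_to_kpa(raw_cmh2o_l_s):
    """Convert RAW from cm H2O/L/sec to kPa/L/sec; the result is ~1/10th the input."""
    return raw_cmh2o_l_s / CMH2O_PER_KPA

def sgaw_cmh2o_to_kpa(sgaw_l_s_cmh2o_l):
    """Convert SGAW from L/sec/cm H2O/L to L/sec/kPa/L; the result is ~10x the input."""
    return sgaw_l_s_cmh2o_l * CMH2O_PER_KPA
```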

    Regardless of whether airway resistance is reported as RAW or SGAW, the normal range is comparatively much larger than it is for FEV1. As importantly, the reported mean values for RAW and SGAW are often inconsistent from one study to another. RAW and SGAW can show significant changes post-bronchodilator or during a methacholine challenge, but again these changes are comparatively much larger than they are for FEV1, and there are no ATS/ERS guidelines for the level of change that should be considered significant.

    RAW and SGAW measurements require a body plethysmograph which severely limits which facilities are able to perform these tests, particularly when compared to spirometry. RAW and SGAW measurements are easy to add to the routine measurement of plethysmographic lung volumes but still require careful calibration and attention to detail. Whether or not RAW testing should be performed at all comes down to its perceived clinical utility.

    RAW and SGAW may be more sensitive to changes in airway status than FEV1 and at least one study has suggested that airway resistance was abnormal in many patients with asthma who otherwise had a normal FEV1/FVC ratio. But the problem with RAW and SGAW in routine clinical testing is that what’s considered normal, what’s considered abnormal and what’s considered a significant change is still much “fuzzier” than it is for spirometry and this places serious constraints on its clinical relevance.

    References:

    [A] Briscoe WA, Dubois AB. The relationship between airway resistance, airway conductance and lung volume in subjects of different age and body size. J Clin Invest 1958; 37: 1279-1285

    Clausen J. editor. Pulmonary function testing. Guidelines and controversies. Equipment, methods and normal values. Published by Grune & Stratton, 1982.

    DuBois AB, Botelho SY, Comroe JH. A new method for measuring airway resistance in man using a body plethysmograph: values with normal subjects and in patients with respiratory disease. J Clin Invest 1956; 35: 327-335.

    Greenway SD, Blackwell S, Jhali N, Stenning J, Wilbraham D, Clarke GS. Variability of airway conductance (SGAW) in healthy volunteers. Amer J Respir Crit Care Med 2010; 181: A5003.

    [B] Gutierrez C, Ghezzo RH, Abboud RT et al. Reference values of pulmonary function tests for Canadian Caucasians. Can Respir J 2004; 11(6): 414-424.

    Krell WS, Agrawal KP, Hyatt RE. Quiet-breathing vs. panting methods for determination of specific airway conductance. J Appl Physiol 1984; 57(6): 1917-1922.

    Lord PW, Edwards JM. Variations in airways resistance when defined over different ranges of airflows. Thorax 1978; 33: 401-405.

    [C] Marsh S, Aldington S, Williams M et al. Complete reference ranges for pulmonary function tests from a single New Zealand population. New Zealand Med J 2006; 119(1244).

    Miller WF, Scacci R, Gast LR. Laboratory evaluation of pulmonary function. Published by J.B. Lippincott Co., 1987.

    Peslin R, Duvivier C, Malvestio P, Benis AR, Polu JM. Frequency dependence of specific airway resistance in a commercialized plethysmograph. Eur Respir J 1996; 9: 1747-1750.

    Ries AL, Clausen J. Chapter: Airway Resistance. Pulmonary function testing indications and interpretation. Wilson AF, editor. Published by Grune & Stratton. 1985.

    Topalovic M, Derom E, Osadnik CR, Troosters T, Decramer M, Janssens W. Airways resistance and specific conductance for the diagnosis of obstructive airways diseases. Respir Res 2015; 16: 88.

    Topalovic M, Exadaktylos V, Troosters T, Celis G, Aerts J-M, Janssens W. Non-linear parameters of specific resistance loops to characterise obstructive airways disease. Respir Res 2017; 18: 9.


  • Thinking about the past

    This is the time of the year when it’s traditional to review the past. That’s what “Auld lang syne”, the song most associated with New Year’s celebrations, is all about. I too have been thinking about the past but it’s not been about absent friends, it’s been about trend reports and assessing trends.

    In the May 2017 issue of Chest, Quanjer et al reported their study on the post-bronchodilator response in FEV1. I’ve discussed this previously; they noted that the current ATS/ERS standard for a significant post-bronchodilator change (≥12% and ≥200 ml) penalizes the short and the elderly. Their finding was that a significant change was better assessed by an absolute change in percent predicted (i.e. 8%) rather than by a relative change.

    Ever since then I’ve thought about how this could apply to assessing changes in trends. The current standard for a significant change in FEV1 over time (also discussed previously) is anything greater than:

    which is good in that it is a way to reference changes over any arbitrary time period, but it also treats the change as a relative one (i.e. ±15%). A 15% change, however, comes from occupational spirometry, not clinical spirometry, and the presumption, to me at least, is that it’s geared towards individuals who have more-or-less normal spirometry to begin with.

    A ±15% change may make sense if your FEV1 is already near 100% of predicted, but there are some problems with this for individuals whose FEV1 isn’t. For example, a 75 year-old, 175 cm Caucasian male would have a predicted FEV1 of 2.93 L from the NHANESIII reference equations. If this individual had severe COPD and an FEV1 of 0.50 L (17% of predicted), then a ±15% relative change in FEV1 would be ±0.075 L (75 ml). That amount of change is half the acceptable amount of intrasession repeatability (150 ml) in spirometry testing, and it’s hard to consider a change this small as anything but chance or noise. It’s also hard to consider it a clinically significant change.

    If an 8% change in percent predicted is needed for significance, then the FEV1 would have to change by 0.23 L, either decreasing to 9% of predicted (0.27 L) or increasing to 25% of predicted (0.73 L), which is a relative change of ±46%. A change of this size is certainly larger than the intrasession repeatability limit, and is most likely clinically significant for the individual.
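    The arithmetic in this example can be checked in a few lines; the numbers come straight from the text, and only the variable names are mine.

    ```python
    predicted = 2.93   # NHANESIII predicted FEV1 (L) for the example patient
    baseline = 0.50    # measured FEV1 (L), i.e. 17% of predicted

    # +/-15% relative-change rule: the threshold scales with the measured value
    relative_threshold = 0.15 * baseline               # 0.075 L (75 ml)

    # 8% of predicted rule: the threshold scales with the predicted value
    pct_pred_threshold = round(0.08 * predicted, 2)    # 0.23 L

    # the same threshold expressed as a relative change from this baseline
    relative_equivalent = round(100 * pct_pred_threshold / baseline)   # 46%

    print(relative_threshold, pct_pred_threshold, relative_equivalent)
    ```

    The two rules diverge precisely because one threshold scales with the (possibly tiny) measured value and the other with the predicted value.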

    The question that needs to be asked is whether a ±15% relative change sets the bar too high for a clinically significant change in individuals with near-normal results while at the same time setting it too low for individuals with severely abnormal results. I think it’s clearly evident that it does for those with severely abnormal results, and it may also be true for those with near-normal results.

    Does an 8% change in percent predicted make more sense? I think it’s reasonably clear that it does for individuals with severely abnormal results, and it may also make sense for those with near-normal results.

    Clinical significance is hard to study, however, and usually the best you can do is look at outcomes. But even this is complicated since outcomes can’t be based solely on one variable (i.e. pulmonary function results). The best we can do is draw some kind of line, and in this instance using a relative change to assess changes over time seems to have serious flaws. It’s possible that changes in Z-scores may be a more effective way to look at changes over time, but that will have to wait on future study. At the moment, I’d have to say that looking at changes in percent predicted has fewer flaws than relative change and looks to be a better approach when assessing changes over time.

    There are numerous studies on longitudinal changes in pulmonary function, but these are mostly concerned with what constitutes a normal change over time or the changes that are typically seen with various lung disorders. The question of what constitutes a clinically significant change from one visit to another for values other than FEV1 hasn’t been studied all that much. The 2005 ATS/ERS statement on interpretation included guidelines for assessing change over different intervals of time for FVC, FEV1 and DLCO (and FEF25-75), but these are based on limited data and are all expressed as relative changes. I’m inclined to think that assessing changes in FVC, TLC and DLCO in terms of changes in percent predicted makes as much sense as it does for FEV1.

    That, however, brings me to trend reports. I was very disappointed that the subject of trend reports wasn’t mentioned in even the slightest way in the recent 2017 recommendations for standardized reports. In my lab (and I suspect in most clinical PFT labs as well), the majority of spirometry is performed not so much for diagnostic purposes as for looking at changes over time. It’s for this reason that trend reports are actually quite important. But every trend report I’ve ever seen does a remarkably poor job of helping to assess changes over time. Trend reports usually look something like this:

    Date FVC FVC %Pred FEV1 FEV1 %Pred
    12/29/2017 1.02 52 0.61 41
    12/01/2017 1.54 79 0.73 48
    10/13/2017 1.32 67 0.67 44
    06/15/2017 1.23 62 0.79 51
    03/16/2017 1.34 67 0.75 49
    11/10/2016 1.16 57 0.77 49
    08/04/2016 1.31 64 0.88 56
    05/24/2016 1.71 85 1.01 64
    03/01/2016 1.59 79 0.99 63
    12/29/2015 1.17 58 0.77 49
    06/29/2015 1.53 74 0.84 53
    09/15/2014 1.58 75 0.92 56
    05/22/2014 1.46 69 0.81 50
    11/21/2013 1.89 90 1.08 67
    06/06/2013 1.58 74 1.00 60
    03/25/2013 1.38 65 0.84 51

    And when shown graphically, usually look something like this:

    Neither of these formats is terribly useful. When looking at the tabular version you need to have a calculator on hand in order to determine how big a change has actually occurred from visit to visit. The graphical version isn’t much of an improvement, partly because the lines are jumbled together and hard to read, but also because even though you can see changes when they’re large enough, assessing their actual magnitude is difficult.

    Conceptually, at least, there’s an easy fix for the tabular report format (although it isn’t possible to do this with my lab’s software, so it’s a moot point for me). Just adding a %change (relative change) or a percent predicted change column makes it much easier to assess changes overall.

    For relative change this would look like:

    Date FVC FVC %Pred FVC %Change FEV1 FEV1 %Pred FEV1 %Change
    12/29/2017 1.02 52 -34 0.61 41 -16
    12/01/2017 1.54 79 +17 0.73 48 +9
    10/13/2017 1.32 67 +7 0.67 44 -15
    06/15/2017 1.23 62 -8 0.79 51 +5
    03/16/2017 1.34 67 +16 0.75 49 -3
    11/10/2016 1.16 57 -11 0.77 49 -13
    08/04/2016 1.31 64 -23 0.88 56 -13
    05/24/2016 1.71 85 +8 1.01 64 +2
    03/01/2016 1.59 79 +36 0.99 63 +29
    12/29/2015 1.17 58 -24 0.77 49 -8
    06/29/2015 1.53 74 -3 0.84 53 -9
    09/15/2014 1.58 75 +8 0.92 56 +14
    05/22/2014 1.46 69 -23 0.81 50 -25
    11/21/2013 1.89 90 +20 1.08 67 +8
    06/06/2013 1.58 74 +14 1.00 60 +19
    03/25/2013 1.38 65 0 0.84 51 0

    and for percent predicted change this would look like:

    Date FVC FVC %Pred FVC %Pred Chg FEV1 FEV1 %Pred FEV1 %Pred Chg
    12/29/2017 1.02 52 -27 0.61 41 -7
    12/01/2017 1.54 79 +12 0.73 48 +4
    10/13/2017 1.32 67 +5 0.67 44 -7
    06/15/2017 1.23 62 -5 0.79 51 +2
    03/16/2017 1.34 67 +10 0.75 49 0
    11/10/2016 1.16 57 -7 0.77 49 -7
    08/04/2016 1.31 64 -21 0.88 56 -8
    05/24/2016 1.71 85 +6 1.01 64 +1
    03/01/2016 1.59 79 +21 0.99 63 +14
    12/29/2015 1.17 58 -16 0.77 49 -4
    06/29/2015 1.53 74 -1 0.84 53 -3
    09/15/2014 1.58 75 +6 0.92 56 +6
    05/22/2014 1.46 69 -21 0.81 50 -17
    11/21/2013 1.89 90 +16 1.08 67 +7
    06/06/2013 1.58 74 +9 1.00 60 +9
    03/25/2013 1.38 65 0 0.84 51 0
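    Computing these extra columns is trivial for reporting software. A sketch in plain Python, using the first few rows of the trend data with the most recent visit first; the percent-predicted differences are taken between the rounded values as printed, and the variable names are mine:

    ```python
    # (date, FEV1 in L, FEV1 %pred), most recent visit first,
    # taken from the first rows of the trend table above
    visits = [
        ("12/29/2017", 0.61, 41),
        ("12/01/2017", 0.73, 48),
        ("10/13/2017", 0.67, 44),
    ]

    rows = []
    for (date, fev1, pp), (_, prev_fev1, prev_pp) in zip(visits, visits[1:]):
        rel = round(100 * (fev1 - prev_fev1) / prev_fev1)  # relative change, %
        dpp = pp - prev_pp                                 # change in percent predicted
        rows.append((date, rel, dpp))
        print(f"{date}  FEV1 {rel:+d}% relative, {dpp:+d} %pred")
    ```

    Either column is a one-line calculation per visit, which is why it’s so frustrating that trend reports don’t include them.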

    Since it is the amount of change that’s important, that’s all that really needs to be shown on a graph. Depending on whether you’re more interested in relative change or percent predicted change, it could look like this:

    or this:

    In either case, showing the level of change that’s considered significant makes it easy to see when it is and when it isn’t.
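    A trivial helper shows the idea. The 8-point percent-predicted threshold follows the earlier discussion, and the function name is mine:

    ```python
    def significant_pp(change_pp, threshold=8):
        """Flag a visit-to-visit change in percent predicted whose
        magnitude reaches the (assumed) 8-point threshold."""
        return abs(change_pp) >= threshold

    # FVC percent-predicted changes from the most recent visits above,
    # marked with '*' when they cross the significance line
    for change in (-27, +12, +5, -5):
        print(f"{change:+4d} {'*' if significant_pp(change) else ''}")
    ```

    Drawn as horizontal lines at ±8 on a change-over-time graph, the same threshold makes significant visits stand out at a glance.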

    It’s always seemed to me that my lab’s reporting software was an afterthought. The tools I have for formatting and managing reports continue to be crude, limited, time-consuming and rarely updated. This applies just as much (if not more so) to trend reports, and is just one of the reasons I was disappointed by the limited scope of the 2017 ATS reporting standards.

    One of the primary purposes of routine pulmonary function testing is to monitor changes due to the progression of a lung disease or its treatment. Despite this, far more attention has been paid to the initial diagnosis than to assessing changes over time. Assessing trends is important, and there are easy ways to fix trend reports and make them a lot more useful, but here we are, well into the 21st century, and I still have to keep a pocket calculator on hand to calculate percent change when I review reports, something that hasn’t changed for decades. This may be the time of year to think about the past, but it’s also clear that in many ways we’re still stuck there.

    References:

    Brusasco V, Crapo R, Viegi G. ATS/ERS Task Force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26: 319-338.

    Brusasco V, Crapo R, Viegi G. ATS/ERS Task Force: Standardisation of lung function testing. Interpretive strategies for lung function tests. Eur Respir J 2005; 26: 948-968.

    Culver BH, Graham BL, Coates AL, et al. Recommendations for a standardized pulmonary function report. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.

    Quanjer P, Ruppel GL, Langhammer A, et al. Bronchodilator response in FVC is larger and more relevant than in FEV1 in severe airflow obstruction. Chest 2017; 151(5): 1088-1098.

    Townsend MC. ACOEM Guidance Statement. Spirometry in the occupational health setting – 2011 update. J Occup Environ Med 2011; 53(5): 569-584.

    PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License