Validation of Wagner's Classification: A Literature Review
Practitioners attempt to provide care diligently, honestly, and sincerely. Clinical decision-making is an important tool used during treatment. Keenan and Redmond1 refer to the evaluation of evidence as one key factor that governs clinical decision-making. To ensure high-quality care, Baxter and Baxter2 instruct practitioners to base their practice on sound clinical evidence. Redmond et al3 extend the concept: because podiatrists rely on quantitative measurements when evaluating clinical research, these measurements must be accurate, reliable, and valid. Clinimetrics, the development and evaluation of measurements,4 may be applied to wound assessment tools such as diabetic foot ulcer classification systems.
Mulder5 acknowledges the existence of national guidelines for the treatment of chronic wounds. The Agency for Health Care Policy and Research6 and the National Pressure Ulcer Advisory Panel provide these practice guidelines.7 Zulkowski et al8 report that these guidelines are not designed to direct assessment of foot ulcers. The International Working Group on the Diabetic Foot agreed on 43 standard definitions for treating diabetic feet.9 However, a classification system for diabetic foot ulcers was not included because no one has documented having enough clinical experience with one system to validate or endorse it.10 Although many diabetic foot ulcer classifications are available, few have been clinically tested.
The first classification systems developed and accepted by clinicians were Meggitt's11 and Wagner's.12,13 The Wagner system is taught in podiatry colleges in the United States, is a source of questions on the American College of Foot and Ankle Orthopedics and Medicine Board examination, and has been introduced as evidence in United States court proceedings.14
The purpose of this review is to present the Wagner dysvascular foot classification system, discuss its clinimetric properties, and examine existing validation literature of the Wagner and other diabetic classification systems.
Of the many diabetic wound classification systems available today, the Meggitt-Wagner Classification is the one most often cited. This system was first described by Meggitt11 and subsequently universalized by Wagner.13 The natural history of dysvascular foot breakdown is divided into six grades ranging from Grade Zero to Grade Five. For comparison purposes, both classification systems are presented in Figure 1. The Wagner system is similar to an ordinal scale denoting ranked order, allowing for nonparametric data analysis.15,16 Grade is determined based on depth of the skin lesion and the presence or absence of infection and gangrene.13,17
Both Meggitt's11 and Wagner's13 systems allow for bidirectional movement: progression from Grade Zero to Grade Four and regression from Grade Four to Grade Zero. The property of bidirectionality is not generally accepted as a positive attribute of a classification system. One reason is that many third-party reimbursement plans are tied to a particular wound description or class. However, the original intent of Meggitt11 and Wagner13 was to allow for description of the dysvascular foot over a period of time pre- and postsurgery and for nonsurgical interventions.11,13 Wagner's classification system is a visual one, implemented without the aid of an objective precision device like a ruler, grid, or measuring tape. Subjective in nature, it may be considered a noncontact measurement system. Jeffcoate et al18 consider this subjectivity a major disadvantage of the system. Initially, this subjectivity may present reliability concerns.
Yarkony et al19 demonstrated excellent interrater reliability when comparing Shea's pressure ulcer classification to their classification system with confidence intervals of 0.86 and 0.9, respectively. Like Wagner's system, these two systems are noncontact assessment tools using the deepest anatomical landmark as a limit to wound grading.
Science of the Meggitt-Wagner Classification
Meggitt defends his system as a rational approach to guide treatment of the many varied diabetic foot lesions.11 Data to substantiate its use are the result of a 14-month prospective study involving 151 consecutive foot breakdowns in 145 patients with diabetes. Overall, 78% of the feet healed locally and 22% required an amputation procedure. Sensitivity, specificity, and predictive values were not presented. Sensitivity is defined as the proportion of true positives that are correctly identified by the test. Specificity is the proportion of true negatives that are correctly identified by the test.20 Positive predictive value is the proportion of patients with positive test results who are correctly diagnosed; negative predictive value is the proportion of patients with negative test results who are correctly diagnosed.21 The inclusion of these values may have added more credibility to Meggitt's findings. Meggitt's system was the first attempt to classify and treat diabetic foot ulcers in a logical manner. As such, its contribution to the science of managing diabetic foot lesions should not be underestimated.
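To make the four measures defined above concrete, the following minimal sketch computes them from a two-by-two table of test results against true status. The counts are hypothetical and chosen only for illustration; they are not Meggitt's data, which the source notes were not reported in this form. The names `ConfusionCounts` and `diagnostic_metrics` are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two table of test result versus true condition."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and both predictive values."""
    return {
        # Proportion of true positives correctly identified by the test
        "sensitivity": c.tp / (c.tp + c.fn),
        # Proportion of true negatives correctly identified by the test
        "specificity": c.tn / (c.tn + c.fp),
        # Proportion of positive results that are correct
        "ppv": c.tp / (c.tp + c.fp),
        # Proportion of negative results that are correct
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, for illustration only
example = ConfusionCounts(tp=30, fp=10, fn=5, tn=55)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on how the test performs within each true-status group, whereas the predictive values also shift with how common the condition is in the sample studied.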
Wagner intended to provide greater specificity and precision during clinical descriptions of diabetic foot ulcers by matching the patients with treatment programs.13 He acknowledges that his system evolved from the accumulation of 30 years of experience, with statistical data gathered by many individuals who trained with him since 1969. Although he credits Meggitt for assisting in the development of this classification system, Wagner does not share any statistical data in this review.13 Instead, detailed descriptive cases are presented along with procedures and results. Wagner states that some ideas were original and others were borrowed, and the line separating them may have become blurred.13
Clinimetrics of the Wagner Classification
According to Jeffcoate et al,18 a classification system has multiple purposes and its design depends on its application. This belief may be applied to the Wagner classification system by examining the clinimetric properties of readability, accuracy, reliability, and validity.
Readability. Readability is the property of the test that surrounds the language used in the test. The Wagner test features no unnecessary technical jargon or ambiguity. van Rijswijk22 observes that the major advantage of assessment scales, such as the Wagner scale, lies in their use of standardized terminology, thus facilitating communication. In Wagner's language, all grades are described as a function of wound anatomy, presentation of infection, and the clinical signs of ischemia. Evaluators must be consistent with their observations and adopt standard definitions and classifications similar to those offered by Lazarus et al.23 A review of five reference sources written by different healthcare professionals (ie, nurses, physical therapists, podiatrists, surgeons, and pedorthotists) reveals differences in how the Wagner classification system is presented. Two references, Hess24 and Sussman and Bates-Jensen,25 follow Wagner's descriptions exactly. Sussman includes a visual reference similar to Meggitt's and Wagner's original work.11,13,25 Hetherington26 paraphrases the system by adding and eliminating words. Browne and Sibbald27 subdivided Grades Two and Three using a category B to denote cellulitis, thus increasing the number of grades to eight and adding to the system's complexity in an attempt to increase its sensitivity. An explanation for this subdivision may be found in their reference to a critical review by Armstrong et al.28 McDermont and McDermont29 use the most abbreviations; all grades are reduced to two- to three-word descriptions. An extensive literature review reveals no attempts by these authors to validate their modified Wagner systems through clinical trials.
Accuracy. Accurate is defined as careful, exact, and free from errors; therefore, accuracy is the state of being accurate or having precision.3 All measurements include some element of error.30 Obtaining accurate descriptions of an ulcer is important to planning, monitoring, and predicting the patient's clinical outcome.27,28 The intention of Wagner's system was to assess the natural history of the dysvascular foot and apply treatment protocols to each clinical stage to maximize outcomes. The addition of a visual reference guide increases this system's accuracy and reduces tester error potential by removing bias when grading ulcers. According to Riegelman and Hirsch,30 instrument error includes a bias in assessment that results when the testing instrument is not appropriate to the conditions of the study or is not sufficiently accurate. When Wagner's system is abbreviated, paraphrased, or subdivided, accuracy of the original system is lost due to instrument error. Accurate inferences cannot be drawn between the data of the original and an adulterated tool. If a comparison is made between these groups, an ecological fallacy error can occur because a relationship is implied between a group and an individual level, when in actuality, no association exists.
Reliability. Reliability is the degree to which a measure can be reproduced.31,32 Although reliability is a necessary condition for validity, it is not a sufficient condition for validity (ie, other factors must be considered).31,33 Interrater reliability is an evaluation of the reproducibility of a measure between observers. The definitions of Wagner's grades represent clinical evaluations as a function of wound depth. The end points of Grade Zero and Grade Five leave little room for interpretation; however, Grades One, Two, and Three all require clinical experience to demonstrate consistent reproducible reliability.
Validity. Validity is the extent to which an instrument measures what it is intended to measure.3 Lacity and Jansen33 define validity as making common sense and being persuasive to the reader. Validity is not a property of the test or assessment but rather of the meaning of the test scores.3 Messick34 implies a measurement must have reliability in order to be considered valid. The following types of validity are noted: face validity, content validity, criterion validity, criterion-related validity, concurrent validity, predictive validity, and construct validity.3
Face validity is validity taken at face value.31 Face validity is the weakest form of measurement.3 Wagner's work in developing and testing the classification system can be interpreted as face validity. Even though the Meggitt-Wagner classification system is widely used, few attempts have been made to validate its predictive outcome value.17 Validation attempts are limited in the current literature.35-37
Content validity often applies to questionnaires and inventories that make up an instrument and how well, when considered together, they address the issues.3 Content validity draws an inference from test scores to a large domain and is concerned with sample-population representativeness.31 Through content validity, evidence is obtained by looking for agreement in judgments by content experts. One observation made during this review of the literature concerning the validation of a diabetic foot ulcer classification system is that no agreement exists among the experts concerning content validity. The unfortunate result is that one expert finds fault with another author's system while presenting the benefits of his own classification system.
Calhoun et al35 retrospectively assigned Wagner grades to 850 infected diabetic foot wounds to determine whether intensive and aggressive medical and surgical treatments improved patient outcome. They concluded Wagner's system allows for the development of rational therapy algorithms and provides a convenient method of comparison for scientific communication.
Pittet et al36 conducted a 5-year retrospective cohort with prospective follow-up on 105 patients with diabetic foot lesions. Because of its ease of use and reproducibility for future research and comparison, the Wagner system of clinical ranking was selected. Most patients with Wagner Stage 1 and Stage 2 diabetic foot ulcers were cured with conservative measures, while patients diagnosed with Stage 5 "gangrene" failed conservative treatment.
Criterion-related validity is determined by benchmarking the new measure against a gold standard test already in existence.3 Predictive validity is used to predict a future criterion score by establishing the outcome of the target test.3 Criterion validity is about prediction rather than explanation.31 A gold standard test is often preferred as an end point or outcome measure in clinical investigations.38 Riegelman and Hirsch30 define a gold standard as the criterion used to unequivocally define the presence of a condition or disease under study. When Wagner established his classification system, he used Meggitt's system as a model; thus, the gold standard established criterion-related validity. Over time, the Wagner system, through a grandfather phenomenon, has been accepted as the gold standard for diabetic ulcer classification.
Concurrent validity is established when two measurements are taken at relatively the same time. This type of validity is used to determine whether a new target test is more efficient than the existing "gold standard." Only the Oyibo et al37 study satisfies both conditions of criterion and concurrent validity. They compared the University of Texas San Antonio Diabetic Wound Classification system with the Wagner system to predict the outcome of 194 patients with diabetic foot ulcers. The Texas classification system uses a grid with the ulcer grade as the x axis and the stage of the ulcer as the y axis. Grades of wounds are defined using the following terms: Grade 0 represents a pre- or postulcerative site. Grade 1 ulcers are superficial wounds through the epidermis or epidermis and dermis but do not penetrate to tendon, capsule, or bone. Grade 2 wounds penetrate the tendon or the capsule. Grade 3 wounds penetrate the bone or joint. The four stages used to describe each wound grade are 1) clean wounds, 2) nonischemic infected wounds, 3) ischemic wounds, and 4) infected ischemic wounds. The criteria for each stage are based on clinical and laboratory results. The University of Texas San Antonio Diabetic Wound Classification system is represented in Table 1. Using Cox regression statistical analysis, Oyibo et al37 assessed the ability to grade and stage foot ulcers; the Texas system was determined to be a better predictor of group outcome than of individual patient outcome.
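The grade-by-stage grid described above can be sketched as a small lookup. This is an illustrative encoding only, following the numbering used in this review (grades 0 to 3, stages 1 to 4). The names `Grade`, `texas_stage`, and `texas_cell` are hypothetical, and the sketch assumes that the "ischemic wounds" stage means ischemic without infection. The clinical and laboratory criteria that decide whether a wound counts as infected or ischemic are not modeled here.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Wound depth, as described for the Texas system in this review."""
    PRE_OR_POSTULCERATIVE = 0
    SUPERFICIAL = 1          # epidermis/dermis, not to tendon, capsule, or bone
    TENDON_OR_CAPSULE = 2
    BONE_OR_JOINT = 3


def texas_stage(infected: bool, ischemic: bool) -> int:
    """Map infection and ischemia flags to the stage number (1 to 4)."""
    if infected and ischemic:
        return 4  # infected ischemic wound
    if ischemic:
        return 3  # ischemic wound (assumed here to be noninfected)
    if infected:
        return 2  # nonischemic infected wound
    return 1      # clean wound


def texas_cell(grade: int, infected: bool, ischemic: bool) -> tuple:
    """Return the (grade, stage) grid cell for one wound."""
    return (Grade(grade), texas_stage(infected, ischemic))


# Hypothetical wound: penetrates to tendon, infected, not ischemic
cell = texas_cell(2, infected=True, ischemic=False)
print(cell[0].name, "stage", cell[1])
```

Keeping depth and wound status on separate axes is what lets the grid record infection and ischemia at every depth, rather than tying them to particular grades.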
Construct validity examines the agreement between the measure and the theory of what the tool should be measuring in the instructions or items included.3 Construct validity is also known as theoretical construct because it draws an inference from test scores.31 Test bias is a major threat to construct validity.31 The Wagner system does not consistently consider (or ignores) biomechanics, foot deformity, ulcer size, infection, and peripheral vascular disease. All of these factors have been shown to affect ulcer formation and impact the risk of lower extremity amputation.39-47 The lack of consistent representation of these variables at each stage jeopardizes construct validity. Without consistent representation of these variables, the system will fall short of the demands and expectations of the tester. Foster and Edmonds48 emphasize that neuropathy, ischemia, and infection are involved in almost every ulcerative lesion of the diabetic foot and should be included in any grading system. Their omission in Wagner's system is a major shortcoming. Oyibo et al37 showed that the Wagner system, as constructed, can predict individual patient outcome but lacks sufficient predictive abilities for diabetic populations.
Newer Classification Systems
Armstrong and Peters17 cite limitations to Wagner's system, noting that infection is included in only one stage (Grade 3) and vascular status is included only in the last two stages (Grade 4 and Grade 5). These limitations detract from Wagner's intention when the data are applied to diabetic populations. Recognizing these limitations may have been the motivation for developing the Texas Classification (UT) system. Lavery et al46 defend the UT system's objectivity by noting it is based on clinical and laboratory data. Jeffcoate et al18 commend the Texas creators for this addition but counter that it is complex, imprecise, and retains some of the ambiguity of Meggitt's and Wagner's classifications. Armstrong et al28,49 validate the Texas system using the contributions of depth, infection, and ischemia to the risk of amputations for groups of diabetic patients as opposed to individual patients. They discovered that as the grade and stage of the wound increased, the patient's risk for amputation increased. It may be inferred that Jeffcoate's criticism prompted this research by Armstrong and colleagues in order to establish predictive validity.
Jeffcoate et al18 concluded that the lack of a working diabetic foot ulcer classification hinders management practices. One attempt to develop a working system is the Size (Area and Depth), Sepsis, Arteriopathy, and Denervation system, known as the S(AD) SAD system (see Table 2).50 Measurements of foot ulcers predict the outcome; including foot ulcer measurements in a diabetic wound classification system will make the system more sensitive.42
The S(AD) SAD system differs from the Wagner and Texas systems in that it addresses degrees of ischemia, categorizes area and depth, includes reference to neuropathy, and, most of all, is not intended as a guide to management.50 During a review of the S(AD) SAD system and Wagner's system, Young51 reported that the categories of infection have clinical "face validity," but both authors failed to define cellulitis and osteomyelitis.
Although evidence validating Wagner's dysvascular foot classification system and its clinimetric properties is scant, the Wagner system is widely used. Diabetic foot ulcer classification systems should delineate unique ulcer types with definable characteristics and offer prognostic information. The Wagner system may be appropriate to guide treatment and prognosis for individual patients. Plassmann and Peters52 discovered that even simple methods and instruments for measuring wound parameters require well-defined protocols. If protocols are not in place, wound measurements are useless and may lead to misinterpretation and resultant dangerous consequences. Hopefully, the University of Texas or the S(AD) SAD system, once further validated, may fill the current void. - OWM
This manuscript was completed as part of a written assignment for the MSc/PostGraduate Training in Wound Care and Tissue Repair sponsored by the College of Medicine at the University of Wales, Cardiff, UK, under the direction of Mrs. Vanessa Jones, Dr. Patricia Price, and Professor Keith Harding.