AMMUNITION SELECTION: RESEARCH AND MEASUREMENT ISSUES

By
N.J. SCHEERS, Ph.D.
Operations Research Analyst
and
STEPHEN R. BAND, Ph.D.
Special Agent
Institutional Research and Development Unit
FBI Academy
Quantico, VA

When law enforcement officers talk about the "most effective" caliber bullet or the "best" combat handgun on the street, emotions run high and opinions vary. This can be expected, since these topics have caused considerable debate for years. But what of the firearms expert who is tasked with the responsibility of selecting ammunition and firearms for a department? What are the crucial issues that should be considered? Where should testing begin? What needs to be addressed in order to conduct a fair and impartial ammunition and firearms selection program?

The FBI Academy's Institutional Research and Development Unit (IRDU) provides consultation primarily to the FBI's Training Division personnel regarding research methodology, evaluation and statistical analysis. This article provides an introduction to research design and statistical analysis with regard to ammunition selection. It is intended to assist firearms personnel in designing an ammunition research project and analyzing the results. The topics addressed include (1) research design, (2) criteria for selecting ammunition, (3) rater bias, and (4) statistical analyses. Throughout the article emphasis is placed on understanding the logic of the various elements of a research project.

DESIGN OF THE RESEARCH

Kerlinger, a research methodologist, indicates that research design is the structure, plan or strategy developed to obtain results from a research project. "Research designs are invented to enable the researcher to answer research questions as validly, objectively, accurately, and economically as possible."(1) In designing any ammunition selection study, the first step is to determine the comparisons to be made.
For example, is the purpose of the study to compare the same caliber bullet performance for ammunition made by different companies or to compare the performance of the same caliber bullet in handguns produced by different manufacturers? The following research design is used throughout this article as a convenient example; three different calibers are compared on performance measures of penetration, expansion and weight in a variety of target simulants (targets). Examples of targets are gelatin blocks to simulate human tissue, sheets of metal to resemble the properties of an automobile door, automobile windshield glass held at a given angle, and so on.

"Internal validity" and "external validity" are two major criteria by which any research design is judged. Internal validity, for the example shown above, is the extent to which differences in penetration, expansion and weight can be attributed to differences in the physical characteristics of the calibers rather than to other influences or conditions. External validity is the extent to which similar differences in performance would generalize to other ammunition, conditions or settings. The ideal would be to maximize both internal and external validity. However, the importance of maximizing internal validity, that is, controlling unwanted influences, by necessity, often limits external validity.

Internal Validity

Internal validity is extremely important in any ammunition selection study; if the research is internally valid, then there is a high probability that the differences in caliber performance are caused by the different sizes of the calibers. Internal validity is synonymous with control over unwanted influences. For ammunition selection studies, the unwanted influences that must be controlled or held constant would include environmental conditions, physical/human conditions, and target simulants.

Environmental conditions-In an indoor range, environmental conditions for firing ammunition can be easily controlled. Shooting should take place where temperature, weather, light and noise are kept fairly constant. Without an indoor range, keeping these conditions constant is extremely difficult.

Physical/human conditions-Many other physical and human influences can affect a study. Some of these influences can be determined; others cannot. The best way to control unwanted influences is to simultaneously set up test barrels, one for each caliber to be tested, and randomly determine the order in which the test barrels are fired. (A table of random numbers can be used to determine the order.) For example, a researcher who fires one caliber all morning and then fires a different caliber throughout the afternoon might have measurements influenced by the fatigue of late afternoon shooting and therefore unintentionally record measurement results favoring the caliber shot in the morning.

Other variables are not controlled by random ordering for firing the different calibers. For example, if test barrels are not of equal length, firing them in random order would not compensate for these differences. Using test barrels of unequal length will affect not only the velocity but also the extent of penetration. Therefore, if unequal length test barrels are used, additional research is necessary to determine the extent of the differences among the calibers tested, which adds greatly to the complexity of the research.

Targets-Whether one type of target or a variety of targets are used in the study, controlling the variations in the construction of these targets is critical and can be done by randomly distributing targets (again using the random numbers table) of a given type across calibers.
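
The two randomization steps described above (random firing order and random distribution of targets across calibers) might be sketched as follows. This is only an illustration: a seeded pseudorandom generator stands in for the table of random numbers, and the caliber labels "A", "B", "C", the five rounds per caliber, the block numbers, and both function names are hypothetical.

```python
import random

def random_firing_order(calibers, rounds_per_caliber, seed=None):
    """Return one entry per shot, with the calibers in random order."""
    rng = random.Random(seed)  # seeded so the schedule can be reproduced
    order = [c for c in calibers for _ in range(rounds_per_caliber)]
    rng.shuffle(order)
    return order

def assign_blocks(block_ids, calibers, seed=None):
    """Shuffle the targets, then deal them out evenly across calibers."""
    rng = random.Random(seed)
    shuffled = list(block_ids)
    rng.shuffle(shuffled)
    assignment = {c: [] for c in calibers}
    for i, block in enumerate(shuffled):
        assignment[calibers[i % len(calibers)]].append(block)
    return assignment

# Hypothetical study: three calibers, five rounds each, fifteen gelatin blocks.
calibers = ["A", "B", "C"]
order = random_firing_order(calibers, rounds_per_caliber=5, seed=1)
blocks = assign_blocks(range(1, 16), calibers, seed=2)
print(order)
print(blocks)
```

Recording the seed with the study notes lets the schedule be regenerated later. Random assignment of this kind matters because block-to-block variation would otherwise be confounded with caliber.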
For example, if a batch of gelatin blocks is not mixed thoroughly and blocks with greater density are used with only one caliber, then any differences in penetration, expansion or weight for the different calibers could be partially or fully caused by the consistency of the gelatin blocks.

Since gelatin blocks are used both as stand-alones and behind other targets, two other controls are suggested. First, because gelatin blocks can deteriorate easily, care must be taken to preserve their integrity. Gelatin blocks should be stored in insulated coolers prior to use and should be checked by measuring their temperature before being used for targets. Second, an already-penetrated gelatin block should not be used again as a target. The trauma from the first round's impact may disturb the consistency of the gelatin and affect the measurement of penetration from later rounds fired into it.

External Validity

After maximizing internal validity, the researcher must also plan for external validity so that the results can be generalized beyond the bullets used in the study. There are many conditions under which results may be generalized; no study can accomplish all of them. However, it is important to know what these conditions are, since the generalizations that cannot be made set the limitations of the study.

External validity is the extent to which any difference in performance among the calibers can be generalized to (1) a larger population, such as other lots of ammunition of the same caliber made by the same manufacturer; (2) different populations, such as other ammunition of the same caliber made by different manufacturers; (3) "real-life" targets that the study targets purport to "simulate"; and (4) other conditions and settings.

How can a researcher determine if the results of a study can be generalized to a larger population of other same caliber bullets from the same manufacturer?
If the bullets in a study are a random sample from this larger population of bullets, the bullets are representative of that population. This means that any sample of the same caliber bullets from this population can be expected to produce similar results.

How can the results be generalized to other conditions or settings? One way is to build important conditions into the research design. When the study at the beginning of this article was designed to compare the performance of different calibers in a variety of targets, we decided to see if performance results would generalize over the different target types. If a particular caliber shows superior performance, will this occur in all targets in the study? Some of the targets?

No one study can provide answers to all the questions that can be generated around a particular research question. Often, logic and expert judgment must be used to provide some tentative answers as to whether the results will generalize to the same calibers made by other manufacturers and to other conditions and settings. Will the same results be obtained in actual automobile doors as in simulated targets? Will the same results hold in extreme temperature as in an indoor range? If it is important to answer these questions with confidence, the best procedure is to carry out a series of studies that vary the important conditions and settings to determine the extent of the generalization over conditions.

CRITERIA FOR AMMUNITION SELECTION

The criteria we are using to determine the most effective bullet are performance measures linked to adversary incapacitation. These performance measures are penetration, expansion and weight.

Reliable and Valid Measurements-Whenever any measurement is taken, whether it is a blood pressure test, an achievement test or measurement of bullet performance, it is important to know how reliable and valid these measurements are.
Reliability refers to consistency of measurement; for example, it is the extent to which two raters measuring penetration for a given round obtain similar results. Validity refers to the accuracy of measurement; biased measurements can occur if the measurement of penetration for one of the calibers is consistently too high or too low.

Reliability and validity can affect the results of a study. If measurement is unreliable, for example, if it is taken with a ruler made of very flexible rubber, it will be more difficult to find true differences among the calibers. If a measurement is biased for one caliber but not another, the results may show differences that are not true differences.

A New Measurement Procedure-Of the three criteria for ammunition selection, the measurement of a round's penetration into a gelatin block seems to have the most potential for reliability and validity problems. The traditional method of measuring wound tracks in ballistic gelatin is to view the track through the surface of the gelatin block and measure the channel from bullet entry to the end of the "bounce back" with a tape measure or ruler. We call this method of measuring penetration "topical measurement."

There are two potential problems with the traditional measurement of penetration. The first problem centers on reliability of the measurement. Would optical/light refraction through the gelatin block result in inconsistent (less reliable) results when penetration was measured topically? The second problem centers on the accuracy of the measurement. Is there sufficient curvature in some of the wound tracks that differential results would occur if a more accurate (valid) measure of the wound track were applied?

In our work in ammunition selection, these problems have been addressed by having two different raters measure each "wound track" using two different methods. First, measurements were taken topically using a locking metal tape measure.
Then, a medical urethral catheter was used to measure the wound track internally up to the back of the resting bullet. The total catheter measurement was the internal measurement added to a topical measurement from the back of the bullet up to and including "bounce-back." For each round fired, two raters measured penetration both topically and with the catheter.

Both topical and catheter procedures were highly reliable when the measurements of the two raters were compared. In examining the validity of the two procedures, we found that the heaviest caliber studied showed more curvature than the lightest caliber. The average curvature for the heaviest caliber was almost one-third of an inch, with the largest recorded curvature of over one-half inch. Therefore, if curvature is expected, it is probably best to use the catheter method of measuring penetration.

RATER BIAS

Rater bias can occur in ammunition selection research when the researchers themselves (raters) are measuring penetration, expansion and weight. Under these conditions it is necessary to guard against conscious or unconscious biases of the researchers who may favor a specific caliber. However, favoring a specific caliber should not prevent individuals from being active in a research project. Rather, controls must be built into the research that prevent conscious or unconscious biases from affecting the results.

The usual procedure for eliminating rater bias is to keep the raters "blind," that is, prevent those who take the penetration, expansion and weight measurements from knowing which caliber is being fired. In ammunition selection studies, firearms experts are often employed as researchers to select the most effective bullet. These experts can, for the most part, immediately determine bullet caliber from bullet performance; it is impossible to keep them "blind." To get around this problem, staff members not familiar with firearms can be taught to take penetration, expansion and weight measurements.
Using blind raters will add much credibility to a research project.

STATISTICAL ANALYSES

When statistical inference tests are used in making decisions about results, the question being asked is, "Did the differences among the calibers happen by chance or are they true differences?" A statistically significant result is interpreted to mean that the probability of the differences among the calibers being due to chance is very small.

Ammunition and firearms experts may find it useful to call upon experts in research methodology and statistics to make recommendations concerning the design of the study, sample size, procedures and statistical analyses. Oftentimes, it is possible to use a graduate student in research methods and/or applied statistics at a local university to assist in research projects.

Conditions That Influence Statistical Tests

Several conditions influence whether results of performance tests are statistically significant. Two of the most important influences are the size of the sample and the variability of the data. In general, the larger the sample size (the number of test bullets fired) and the smaller the variability (the amount of variation in penetration of several rounds of a specific caliber), the more likely it is that the results will be statistically significant if true differences exist among the calibers tested.

While a researcher usually does not have control over the variability of the data, it is possible to have some control over sample size. In ammunition selection studies, because of the labor involved in making gelatin blocks, a sample of five rounds per caliber for several targets is considered quite large. Statistically, however, this is a small sample size and depending on the variability of the data, differences as large as one inch may not be statistically significant.
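
The point about small samples and variability can be illustrated with a short sketch. The penetration figures below are invented for illustration, not study data, and the exact two-sample permutation test is a stand-in chosen because it needs only the Python standard library; it is not the analysis of variance the article recommends. Both comparisons use five rounds per caliber and a mean difference of exactly one inch, and differ only in how much the rounds vary.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling the pooled rounds into two groups.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical penetration (inches), five rounds per caliber.
# Low variability: means 12.0 and 13.0.
low_a = [12.0, 12.1, 11.9, 12.2, 11.8]
low_b = [13.0, 13.1, 12.9, 13.2, 12.8]
# High variability: the same means, 12.0 and 13.0.
high_a = [10.0, 14.0, 11.0, 13.5, 11.5]
high_b = [11.0, 15.0, 12.0, 14.5, 12.5]

p_low = permutation_p_value(low_a, low_b)
p_high = permutation_p_value(high_a, high_b)
print(f"low variability:  p = {p_low:.4f}")
print(f"high variability: p = {p_high:.4f}")
```

Under these assumed figures the one-inch difference is significant at the conventional 0.05 level only in the low-variability case, even though the sample size and the mean difference are identical in both.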

Statistical Procedures for Ammunition Selection Testing

Because various types of designs can be applied to ammunition selection studies, numerous types of statistical tests can be applied to the resultant data. The following analyses can be considered and discussed with a consulting statistician for additional advice with a specific project:

1. Descriptive statistics summarizing the number of rounds fired, the means, standard deviations, standard errors, 95% confidence intervals, and minimum and maximum measures can be recorded and displayed in tables;

2. Homogeneity of variance tests can be conducted to identify significant differences in the variability of the different calibers tested;

3. Analysis of variance (ANOVA) tests can be conducted to identify significant mean differences among two or more calibers for the various targets. If an equal number of rounds is fired for each caliber, ANOVA is the appropriate statistical test since it is robust to violations of the homogeneity of variance assumption; and

4. For those ANOVA analyses where significant differences are found, post hoc comparisons can be calculated to determine significant differences between all possible pairs of means for the different calibers tested in a project.

CONCLUSION

Ammunition selection research projects must be considered in the context of the overall difficulty in obtaining bullet performance data. Despite the best intentions of researchers to control potential bias and extraneous variables, "real world" variables associated with law enforcement combat situations can never be perfectly simulated. The research and measurement techniques suggested for ammunition selection projects are not unique to ammunition selection; indeed, they are widely used in the physical and behavioral sciences. However, techniques of this type infrequently appear in law enforcement-related research literature for ammunition testing.
When more rigorous approaches to research are used, there is much more confidence in the results and the interpretation of the results. The importance of valid results cannot be overstated; the lives of law enforcement officers depend on the results.

Footnote

(1) F.N. Kerlinger, Foundations of Behavioral Research (New York: Holt, Rinehart and Winston, 1984).