The Kansas City Gun Experiment

INTRODUCTION

This paper provides a critical assessment of a level 3 impact evaluation that was assigned in 2012. The study chosen was the “Kansas City Gun Experiment”, which was undertaken by Sherman and Rogan (1995). This paper analyses how well the selected study addressed the reliability of measurement, the internal validity of causal inferences, the external validity of conclusions to the full population the study sampled, and the clarity of the policy implications of applying the results in policing.

This essay is divided into six areas. Firstly, a summary of the Kansas City Gun Experiment is presented. This summary gives a brief account of the history of the experiment and describes the criminological theories on which the experiment was based, its methodological processes and its findings.

Following the summary, the essay moves on to the main assessment of the study. Firstly, the reliability of measurement of the study is critiqued by examining its test-retest reliability and its internal consistency. Secondly, the internal validity of causal inferences is assessed to determine whether the causal relationship between the two variables was properly demonstrated. The external validity of conclusions to the full population the study sampled is then assessed, followed by the clarity of the policy implications of applying the results in policing.

SUMMARY

The Kansas City Gun Experiment, carried out for 29 weeks from July 7, 1992 to January 27, 1993, was a police patrol project aimed at reducing gun violence, drive-by shootings and homicides in the U.S.A. It was based on the premise that gun seizures and gun crime are inversely related. This hypothesis was based on the theories of deterrence and incapacitation. The Kansas City Police Department (KCPD) implemented greater proactive police patrols in hot spots where gun crimes were prevalent. These patrols were studied by Sherman and Rogan (1995) using a quasi-experimental design.

Two areas were chosen for the experiment. Beat 144, the target area, was chosen due to its elevated incidence of violent crimes, including homicides and drive-by shootings. Beat 242 was chosen as the comparison area, or control group, due to its similar number of drive-by shootings. The control group, which was used to increase the reliability of results, was left untreated, meaning that no special efforts or extra patrols were carried out. In contrast, beat 144 was treated with several different strategies for increasing gun seizures. Some of the techniques used included stop and search and safety frisks.

Officers working overtime, from 7 pm to 1 am, 7 days a week, were rotated in pairs to provide patrols focused solely on the detection and seizure of guns. These officers did not respond to any calls that were not gun related. The data collected for analysis included the number of guns seized, the number of crimes committed, the number of gun-related calls and arrest records, before, during and after the experiment, for both the experimental and control groups. The differences between the experimental and control groups were then compared using a difference of means test (t-test). Gun crimes in the 52 weeks before and after the patrols in both the experimental and control groups were compared using autoregressive integrated moving average (ARIMA) models.

There was indeed a 65% increase in gun seizures and a 49% decrease in gun crime in the target area. In the control group, gun seizures and gun crimes remained relatively unchanged. There was also no significant displacement of gun crimes to areas surrounding the target area. The results were similar for homicides and drive-by shootings. Citizen surveys also revealed that most residents of the target area were less fearful of crime than those in the control area.

RELIABILITY OF MEASUREMENT

The results of this study suggest that there may be clear implications for other cities wishing to reduce their gun crime. But how valid are these conclusions? How reliable are they? All measurements may contain some element of error. In order for the measurements recorded during the Kansas City Gun Experiment to be sound, they must be free of bias and distortion. Reliability and validity are therefore important in this regard.

Reliability can be seen as the extent to which a measurement method is consistent. A measure is reliable when it yields consistent scores or observations of a given phenomenon on different occasions (Bachman and Schutt 2007, p. 87). It refers firstly to the extent to which a method of measurement produces the same results for the same case under the same conditions, referred to as test-retest reliability, and secondly to the extent to which responses to the individual items in a multiple-item measure are consistent with each other, known as internal consistency.

A measure that is not reliable cannot be valid. Can it be said that the measurements used in the Kansas City Gun Experiment were reliable and valid? This can be assessed by looking firstly at its test-retest reliability and secondly at its internal consistency.

Test-retest reliability

As funding ran out, the study was never repeated under the same conditions in beat 144. Strictly speaking, therefore, there was never an opportunity to test whether the same or similar results would have been obtained over an equivalent period some time later.

Internal consistency

The measures used in this study included separate bookkeeping and an onsite University of Maryland evaluator who accompanied the officers on 300 hours of hot spots patrol and coded every shift activity narrative for patrol time and enforcement in and out of the area. Property room data on guns seized, computerized crime reports, calls for service data, and arrest records were analyzed for both areas under the study.

Sherman and Rogan (1995) then analyzed the data using four different models. The primary analyses assumed that the gun crime counts were independently sampled from the beats examined before and after the intervention. This model treated the before-during difference in the mean weekly rates of gun crime as an estimate of the magnitude of the effect of the hot spots patrols, and assessed the statistical significance of the differences with standard two-tailed t-tests (Sherman and Rogan 1995).
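To make the first model concrete, the following is a minimal sketch of a two-sample difference-of-means test on weekly counts. It assumes the pooled-variance (equal variance) form of the t-test; the function name and the weekly figures are invented for illustration and are not data from the study.

```python
import math
import statistics


def pooled_t_statistic(before, during):
    """Return (t, degrees_of_freedom) for a two-sample equal-variance t-test."""
    n1, n2 = len(before), len(during)
    mean1, mean2 = statistics.mean(before), statistics.mean(during)
    # statistics.variance() is the sample variance (n - 1 in the denominator).
    var1, var2 = statistics.variance(before), statistics.variance(during)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical weekly gun crime counts, invented for illustration only.
weekly_before = [7, 5, 8, 6, 9, 7, 6, 8]
weekly_during = [4, 3, 5, 2, 4, 3, 5, 4]

t_stat, df = pooled_t_statistic(weekly_before, weekly_during)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A two-tailed test would then compare the absolute value of t against the critical value for the stated degrees of freedom. The test is only appropriate if the weekly counts are independent, which is the assumption this model makes.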

A second model assumed that the weekly gun crime data points were not independent but were serially correlated, and thus required a Box-Jenkins ARIMA (autoregressive integrated moving average) test of the effect of an abrupt intervention in a time series.
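The concern behind the second model can be illustrated by measuring how strongly each week's count is related to the count of the previous week. The sketch below computes a lag-1 autocorrelation coefficient; it is not an ARIMA fit, only a simple check of the independence assumption, and the series is invented for illustration.

```python
def lag1_autocorrelation(series):
    """Return the lag-1 autocorrelation coefficient of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Sum of products of each deviation with the deviation one step later.
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Hypothetical weekly gun crime counts, invented for illustration only.
weekly_counts = [8, 7, 7, 6, 6, 5, 5, 4, 4, 3]

r1 = lag1_autocorrelation(weekly_counts)
print(f"lag-1 autocorrelation = {r1:.3f}")
```

A coefficient close to zero is consistent with independent observations, whereas a value well above zero suggests serial correlation, in which case a t-test that assumes independence may overstate the significance of a difference.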

A third model examined rare events (homicide and drive-by shootings) aggregated in 6-month totals on the assumption that those counts were independent, using one-way analysis of variance (ANOVA) tests. A fourth model also assumed independence of observations, and compared the target with the control beat in a before-during chi-square test.

The t-tests compared weekly gun crimes for all 29 weeks of the phase 1 patrol program (July 7, 1992, through Jan. 25, 1993) with the 29 weeks preceding phase 1, using difference-of-means tests. The ARIMA models extended the weekly counts to a full 52 weeks before and after the beginning of phase 1. The ANOVA model added another year before phase 1 (all of 1991) as well as 1993, the year after phase 1 (Sherman and Rogan 1995).
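The before-during chi-square comparison of the two beats can be sketched as a test on a 2x2 table of counts. The example below is an illustration only: it assumes Pearson's chi-square statistic without a continuity correction, and the counts are invented rather than taken from the study.

```python
import math


def chi_square_2x2(table):
    """Return (chi_square, p_value) for a 2x2 table, with 1 degree of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical gun crime counts, invented for illustration only.
# Rows: target beat, control beat. Columns: before, during.
counts = [[100, 60],
          [100, 100]]

chi_square, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```

A small p-value would indicate that the before-during change differed between the two beats by more than chance alone would suggest, again on the assumption that the observations are independent.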

It is submitted that Sherman and Rogan's (1995) use of the four different models described above was an attempt to achieve an acceptable level of triangulation and, as such, internal consistency. This was necessary because the program design did not provide the researchers with data or an opportunity to check responses to the individual items in a multiple-item measure for consistency.

Reliability may be seen as a prerequisite for validity. Because there was never any opportunity to repeat the study, it was never possible to examine whether the same or similar results would have been obtained in beat 144 over an equivalent period some time later using the same policing tactics.

In other words, can it be safely said that the use of the same measures mentioned above, i.e., the onsite University of Maryland evaluator who accompanied the officers on 300 hours of hot spots patrol, together with property room data on guns seized, computerized crime reports, calls for service data, and arrest records, would have yielded similar results? The simple answer is no, as this was never done.

It is to be noted that the evaluator accompanied the officers on 300 hours of hot spots patrol out of 2,256 (assuming that the 300 referred to patrol car-hours). Is this number statistically sufficient to reduce the occurrence of random errors which occur as a result of over-estimation and under-estimation of recordings? It is accordingly submitted that the level of reliability of measurement is limited to the instance of this study as there is no way of testing its stability short of repeating it.
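The share of patrol time that was directly observed follows from the two figures quoted above. The short calculation below assumes, as the text does, that both figures are expressed in patrol car-hours.

```python
# Figures quoted in the text; both are assumed to be patrol car-hours.
observed_hours = 300
total_patrol_car_hours = 2256

observed_share = observed_hours / total_patrol_car_hours
print(f"Share of patrol car-hours observed: {observed_share:.1%}")
```

On that assumption, roughly one patrol car-hour in every seven to eight was observed by the evaluator.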

THE INTERNAL VALIDITY OF CAUSAL INFERENCES

Validity is often defined as the extent to which an instrument measures what it purports to measure. Validity requires that an instrument is reliable, but an instrument can be reliable without being valid (Kimberlin and Winterstein 2008).

Validity refers to the accuracy of a measurement or to the conclusions that can be drawn from the results of such measurement. Therefore, apart from the issue of reliability discussed above, it must also be determined whether the measures used in the Kansas City Gun Experiment measured what they were supposed to measure and whether the causal inferences drawn possess internal validity.

Internal validity means that the study measured what it set out to measure, whilst external validity is the ability to make generalizations from the study (Grimes and Schulz 2002). With respect to internal validity, selection bias, information bias, and confounding are present to some degree in all observational research.

According to Grimes and Schulz (2002), selection bias stems from an absence of comparability between the groups being studied. Information bias results from incorrect determination of exposure, outcome, or both. The effect of information bias depends on its type. If information is gathered differently for one group than for another, bias results. By contrast, non-differential misclassification tends to obscure real differences.

They view confounding as a mixing or blurring of effects: a researcher attempts to relate an exposure to an outcome but actually measures the effect of a third factor (the confounding variable). Confounding can be controlled in several ways: restriction, matching, stratification, and more sophisticated multivariate techniques. If a reader cannot explain away study results on the basis of selection, information, or confounding bias, then chance might be another explanation. Chance should be examined last, however, since these biases can account for highly significant, though bogus, results. Differentiation between spurious, indirect, and causal associations can be difficult. Criteria such as temporal sequence, strength and consistency of an association, and evidence of a dose-response effect lend support to a causal link.

It is submitted that the onsite University of Maryland evaluator, who accompanied the officers on 300 hours of hot spots patrol and coded every shift activity narrative for patrol time and enforcement in and out of the area, would have been able to give a rough measure of the number of guns seized, whilst the property room data on guns seized, computerized crime reports, calls for service data, and arrest records would, after analysis, have indicated whether gun crimes increased or decreased.

It could be inferred therefore that as the number of guns seized increased, the level of gun related crimes decreased and that this inference possessed internal validity.

THE EXTERNAL VALIDITY OF CONCLUSIONS TO THE FULL POPULATION THE STUDY SAMPLED

According to Grimes and Schulz (2002), external validity is the ability to make generalizations from the study. With regard to the Kansas City Gun Experiment, the question which must now be asked is whether the program is likely to be effective in other settings and in other areas, cities or populations.

Steckler and McLeroy (2007), quoting Campbell and Stanley (1966), argue that internal validity is as important as external validity. This goes a step further: it is important to know not only whether the program is effective, but also whether it is likely to be effective in other settings and in other areas, cities or populations. This would accordingly lead to the translation of research into practice.

It must be submitted that, as with internal validity, because there was never any opportunity to repeat the study, it was never possible to examine whether the same or similar results would have been obtained in beat 144 over an equivalent period some time later using the same policing tactics, or in any other beat for that matter. It cannot therefore be validly concluded that the Kansas City Gun Experiment would be as effective in any other beat area.

THE CLARITY OF POLICY IMPLICATIONS OF APPLYING THE RESULTS IN POLICING

The policy implications of applying the results of the Kansas City Gun Experiment are arguably fairly clear. The most important conclusion is that police can increase the number of guns seized in high gun crime areas at relatively modest cost. Directed patrol around gun crime hot spots is about three times more cost-effective than normal uniformed police activity citywide, on average, in getting guns off the street[1].

Policing bodies around the United States can conclude that although the raw numbers of guns seized in a particular beat may not be impressively large, the impact of even small increases in guns seized in decreasing the percentage of gun crimes can be substantial. If a city wants to adopt this policy in a high gun crime area, this experiment proves that it can be successfully implemented[2].

It is also clear from the Kansas City Gun Experiment that a focus on gun detection, with freedom from answering calls for service, can make regular beat officers working on overtime very productive.

REFERENCES

Bachman, R. and Schutt, R. K. (2007). The Practice of Research in Criminology and Criminal Justice. 3rd Edition, Sage Publications Inc.

Sherman and Rogan (1995), “The Kansas City Gun Experiment”, National Institute of Justice, Office of Justice Programs, U.S. Department of Justice.

Kimberlin, Carole L., and Winterstein, Almut G. (2008), “Validity and Reliability of Measurement Instruments Used in Research”, Research Fundamentals, Am J Health-Syst Pharm, Vol 65, Dec 1, 2008.

Grimes, David, A. and Schulz, Kenneth, F. (2002), “Bias and causal associations in observational research”

Campbell D.T. & Stanley J.C. (1966), Experimental and Quasi Experimental Designs, Chicago, Ill: Rand McNally; 1966.

Steckler, Allan & McLeroy, Kenneth R. (2007), The Importance of External Validity, Am J Public Health. 2008 January; 98(1): 9–10. doi: 10.2105/AJPH.2007.126847

Sherman, Lawrence W., and R.A. Berk, (1984), “The Specific Deterrent Effects of Arrest for Domestic Assault,” American Sociological Review, (49)(1984):261–272.

[1] Sherman, Lawrence W., and R.A. Berk, (1984), “The Specific Deterrent Effects of Arrest for Domestic Assault,” American Sociological Review, (49)(1984):261–272.

[2]
