Standards of Evidence and Scientific Integrity
Treves, A., Krofel, M., Ohrens, O., and van Eeden, L. 2019. Predator Control Needs a Standard of Unbiased Randomized Experiments With Cross-Over Design. Frontiers in Ecology and Evolution 7:462. doi: 10.3389/fevo.2019.00462
Standards of Evidence in Wild Animal Research
Read the full report prepared for the Brooks Institute for Animal Rights Policy & Law, published online 30 June 2019 and continually evolving as we experiment and learn more, or read the abstract below.
This document is designed as a list of principles and expectations for gold standard research on wild animals. It is intended for those funders, scientists, peer reviewers, editors, publishers, or reporters who are supporting, conducting, reviewing, or communicating research to any audience. Stated simply, gold standard research aims for the strongest inference conducted with the highest standards of evidence and scientific integrity.
Transparency is the clear and thorough explanation of all assumptions, methods, and steps in science. As a precondition and consistent pattern, every step in the research process should be clear and understandable to an educated lay audience. The principles of objectivity, reproducibility, and independent review depend on thoroughgoing clarity. Therefore, each of the following steps should pass its own test of transparency.
Objectivity is “the ability to consider or represent facts, information, etc., without being influenced by personal feelings or opinions; impartiality; detachment”. Starting assumptions, worldviews, and presuppositions should be made explicit at the outset, beginning with anthropocentric or non-anthropocentric value judgments (does the researcher grant humans and nonhumans equitable consideration, or place priority on humans or on nonhumans?). In addition, the researcher should make clear if legal structures or institutional permits relating to property rights or responsibilities toward animals have shaped their research design, and be explicit about which legal requirements apply under which jurisdiction. Objective research almost always includes a statement of opposed alternatives, as in “We tested x against its alternative(s) y and z” rather than “We tested the effectiveness of x”. Also, y and z should be genuine, plausible alternative explanations. Even for research that is not experimental, it is wise for scientists to keep in mind alternative explanations for cause-and-effect or for the origins of natural phenomena throughout the research process. Note that some research on animals involves animals as interventions in addition to animals as subjects (e.g., predator-prey experiments). In such experiments, all of the recommendations in this document should be considered for both the treatment animals and the subject animals. For simplicity, we refer to subjects below for all animals involved in research.
Reproducibility is the quality of a scientific finding that can be replicated by other scientists under the same conditions as the original: independent scientists should be able to follow written descriptions of research methods to replicate every step and the findings, given sufficient resources and equipment. Facilitating reproducibility is a responsibility of the original researcher, who should welcome oversight of that facilitation and welcome efforts at replication, including sharing materials, techniques, and raw data no matter how intellectual property is conceived and despite rivalry or interpersonal animosities. Failure to reproduce is a bad sign for the evidentiary strength of the original research if the effort at replication is done in good faith with care. There are three categories of reproducibility: exact, technical, and conceptual. Exact reproducibility requires that every step in the original process be replicated identically, which is rare because the location, timing, materials, individuals, etc. might influence the findings and might differ in subsequent replication efforts. Failure to replicate under exact reproducibility suggests the original findings were misleading. Technical reproducibility is more common and consists of replicating with very close approximations of all methods. Failure to replicate under such circumstances might require review of the described methods and repetition by one or both parties. Finally, conceptual reproducibility aims to replicate the findings by a different cause-and-effect pathway or using different methods. Such efforts can be powerfully confirmatory of underlying biological mechanisms exposed by the original research. Failure to replicate might indicate the causal mechanism was misidentified by the original researchers or the subsequent researchers erred.
Independent review: Before data collection, scientists should subject the proposed methods to scrutiny, and after data collection they should subject their own interpretation of the data to scrutiny. Publicizing scientific communications prior to independent review is a questionable practice, although this is an evolving debate in the literature on pre-publication review. The scrutiny of methods and scrutiny of interpretations should be undertaken by qualified parties with an arm’s length relationship to the researcher and without conflicts of interest about the scientists conducting the research or their findings. Conflicts of interest relate mainly to financial or career advancement issues, not to differences of opinion. Researchers should welcome review by experts in their field, not side-step such review by omitting citation to such experts or by explicitly discouraging those experts as reviewers. Peer reviewers should also follow the steps in this document, particularly for transparency and objectivity. Reviewers, editors, and publishers also have specific responsibilities for the quality of scientific communications. These responsibilities relate to maintaining the quality of the scientific record long after a particular scientific communication has been made public and passed review (see the 2019 guidelines of the Committee on Publication Ethics (COPE) here, accessed 30 June 2019). Researchers have primary responsibility for correcting, retracting, or publicly expressing diminished confidence in their own scientific communications no matter how old they might be. The broader scientific community has secondary responsibility for cleaning the scientific record if credible evidence surfaces of omissions, errors, or misconduct (fabrication, falsification, or plagiarism). Efforts by any party to silence critics or ignore qualified criticisms are unacceptable.
Efforts to retaliate against critics should lead oversight organizations to investigate possible misconduct by the retaliators (see the National Academies’ 2017 recommendations on fostering scientific integrity here, accessed 30 June 2019).
First, research should adhere to the four essential principles of scientific integrity described above: transparency, objectivity, reproducibility, and independent review.
Second, consider the gold standard for strength of inference.
Randomized, controlled experiment: Researchers should randomly select the subjects who receive treatment and those who receive no treatment (control conditions). Any departures from fully random selection should be documented and justified. If the treatment involves interventions that are presumed to have no effect in addition to the effective treatment, the control conditions should also include those interventions, e.g., placebo controls. Only the presumed effective component of the treatment should differ between treatment and control conditions, lest uncertainty about the effectiveness of treatment reduce the strength of inference.
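The random assignment described above can be sketched in code. This is an illustrative sketch, not part of the report; the function name and the even split between arms are assumptions for the example.

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign each subject to 'treatment' or 'placebo' arms.

    Shuffling the full list and then splitting it in half yields balanced,
    randomly composed arms. Any departure from fully random selection
    should be documented and justified, as the report recommends.
    """
    rng = random.Random(seed)  # seed only to make an audit trail reproducible
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Example: six subjects split into two arms of three.
arms = randomize(["s1", "s2", "s3", "s4", "s5", "s6"], seed=42)
```

Recording the seed lets an independent reviewer reproduce the assignment exactly, which supports both transparency and reproducibility.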
Lesser standards: We accept the rare need to study wild animals using the lower silver or bronze standards because some sociopolitical or biophysical settings preclude gold standard experiments. Such situational constraints should be rare. These lower standards reduce confidence in the results by 50% or more. Silver standard or lower research is affected by uncontrolled factors that weaken inference. Many such factors can intrude. For example, the silver standard of before-and-after comparisons introduces the variable of time passing, because all subjects receive the treatment and its effects on subjects are followed over time. The bronze standard of correlational or observational study introduces many such potentially misleading factors because the researcher did not exert control over the intervention timing, magnitude, design, or the subjects receiving it.
Higher standards: We defined the higher platinum standard that strengthens inference beyond the gold standard of randomized, (placebo-)controlled experiments without bias (see below for more on bias). The platinum standard includes both cross-over design and some level of blinding. Cross-over design requires the reversal of treatment and control within subjects. Because of randomization, some subjects will begin as placebo controls and others in treatment conditions, but all subjects will reverse to the other condition at approximately the same time midway through the experiment. A third reversal further strengthens inference about the effect of treatment. Blinding refers to concealing aspects of the experiment from the members of the research team responsible for different portions of the research, or from reviewers, as we explain next.
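A cross-over schedule of the kind described above can be sketched as follows. This is an illustrative sketch under assumed names; the report does not specify an implementation. Each subject alternates between conditions each period, so a three-period run includes the third reversal the report mentions.

```python
def crossover_schedule(arms, periods=2):
    """Build a per-period condition schedule for a cross-over design.

    'arms' maps 'treatment' and 'placebo' to lists of subjects, where the
    subjects starting in each condition were chosen at random. Every
    subject reverses to the other condition each period, so all reversals
    happen at approximately the same time.
    """
    schedule = {}
    for start, subjects in (("treatment", arms["treatment"]),
                            ("placebo", arms["placebo"])):
        other = "placebo" if start == "treatment" else "treatment"
        for s in subjects:
            schedule[s] = [start if p % 2 == 0 else other
                           for p in range(periods)]
    return schedule

# Three periods = two reversals within each subject.
plan = crossover_schedule({"treatment": ["a"], "placebo": ["b"]}, periods=3)
```

Because every subject serves as its own control across periods, between-subject differences are less able to masquerade as treatment effects.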
Single-, double-, triple-, or quadruple-blinding: Blinding is a design element intended to further reduce possible intentional or unintentional bias by researchers. The amount of blinding (single-, double-, triple-, or quadruple-) refers to how many steps in the experiment are concealed from researchers or reviewers. The steps that might be blinded include: (i) those intervening randomly should be unaware of subject histories and attributes and should not communicate which subjects received the control or treatment intervention to others in the research team (this depends on having used an undetectable intervention); (ii) those measuring the effects are unaware of which intervention the subject received (this too depends on having used an undetectable intervention); (iii) those interpreting results are unaware of which subjects received treatment or control; and (iv) those independently reviewing results are unaware of which subjects received treatment or control and unaware of the identity of the scientists who will or have conducted the research. Because science knows no authority, only evidence, blinding independent reviewers to conceal all unnecessary information might avoid several forms of bias (below). Note that blinding steps (ii) and (iii) might be feasibly done by the same set of people but the role in step (i) should be separate from all other roles to assure the success of blinding, and the role in step (iv) should be separate from all other roles to meet the criterion of independence.
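One common mechanical device for the blinding of steps (ii) and (iii) is to replace condition labels with opaque codes before anyone measures or analyzes. The sketch below is an assumed illustration, not a procedure from the report: the person performing randomization keeps the key sealed, and everyone downstream sees only the codes.

```python
import random

def blind_assignments(assignments, seed=None):
    """Replace condition labels with opaque codes to support blinding.

    'assignments' maps subject -> condition ('treatment' or 'placebo').
    Returns (coded, key): measurers and analysts see only 'coded'
    (subject -> code), while the randomizer keeps 'key' (code -> condition)
    sealed until analysis and interpretation are complete.
    """
    rng = random.Random(seed)
    codes = [f"C{i:03d}" for i in range(len(assignments))]
    rng.shuffle(codes)  # code order carries no information about condition
    coded, key = {}, {}
    for code, (subject, condition) in zip(codes, assignments.items()):
        coded[subject] = code
        key[code] = condition
    return coded, key

assignments = {"s1": "treatment", "s2": "placebo", "s3": "treatment"}
coded, key = blind_assignments(assignments, seed=7)
```

As the report notes, this only works if the intervention itself is undetectable to those measuring and interpreting; otherwise the code is broken in the field.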
Third, consider potential biases (intentionally or unintentionally slanting evidence to favor or disfavor one hypothesis or treatment), especially when a bias favors the scientist’s preferred result.
Selection or sampling bias and selecting a suitable sample of subjects: Any research on animals should consider the minimum number of subjects needed to detect an effect of intervention (treatment), while at the same time minimizing the infringement on the lives of those animals. Hence sample size is both a scientific and an ethical decision that should be made transparently and subject to external review (see above). Once the appropriate number of subjects has been identified, selection of which subjects to investigate demands the utmost care to prevent self-selection bias and researcher bias, both of which might lead to treating subjects likely to show an effect of treatment. Self-selection and researcher bias are forms of selection bias that are very common and pernicious sources of unreliable findings. Random assignment is recommended to avoid the worst form of bias, which is systematic error in favor of a preferred result. When randomization is impossible, the next best procedure is blinding the selection and choosing subjects haphazardly without regard to their attributes or history and without regard to the potential effects of treatment or control.
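The "minimum number of subjects needed to detect an effect" is conventionally found through a power analysis. The simulation sketch below is an assumed illustration (the report prescribes no method): it estimates, for a given sample size per arm, how often a simple two-proportion z-test would detect a true difference in outcome rates, so the smallest adequate n can be chosen on both scientific and ethical grounds.

```python
import math
import random

def simulated_power(n, p_control, p_treatment,
                    trials=2000, seed=1):
    """Estimate statistical power by simulation.

    Simulates 'trials' experiments with n subjects per arm, where outcomes
    occur with probability p_control in the control arm and p_treatment in
    the treatment arm, and returns the fraction of experiments in which a
    two-sided two-proportion z-test (alpha = 0.05) detects the difference.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    hits = 0
    for _ in range(trials):
        x_c = sum(rng.random() < p_control for _ in range(n))
        x_t = sum(rng.random() < p_treatment for _ in range(n))
        p_pool = (x_c + x_t) / (2 * n)
        se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
        if se > 0 and abs(x_t / n - x_c / n) / se > z_crit:
            hits += 1
    return hits / trials

# A large effect with a reasonable n is detected almost every time;
# a small effect with few subjects rarely is.
power_big = simulated_power(200, 0.2, 0.5)
power_small = simulated_power(10, 0.2, 0.25)
```

Choosing the smallest n that reaches a target power (often 80%) operationalizes the report's point that sample size must balance detectability against infringement on animals' lives.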
Treatment bias: This bias arises when placebo control or treatments are applied without regular, consistent intervention methods (e.g., haphazard doses of a medicine). The worst form of treatment bias is systematic for a favored result, when the timing, magnitude, or quality of the intervention is tailored to the history, attributes, or susceptibility of the subjects. Blinding (see above), standardized intervention protocols, and registered reports (see below) are reliable defenses against treatment bias.
Measurement bias: This bias arises when measurement methods are inconsistent, imprecise, or inaccurate. The worst form of systematic bias arises when measurements are tailored to the history, attributes, or susceptibility of the subjects. Blinding (see above), standardized measurement protocols, and registered reports (see below) are reliable defenses against measurement bias.
Reporting bias: This bias arises when analysis of data, interpretation of results, or scientific communications misrepresent research methods or findings. The worst form arises when the reporting favors the scientists’ preferred outcomes, and naturally this is the most common form. Blinding (see above), standardized analysis protocols, and registered reports (see below) are reliable defenses against reporting bias.
Independent review and publication bias: This bias arises when independent reviewers are favorably or unfavorably disposed toward the scientists, their results, or the nature of the scientific communications arising from the research. The worst (and most common) form arises when reviewers, editors, or publishers have an interest in the findings or in the power structures that might be affected by the findings. A related form of independent review bias arises after scientific communications are made public, when critics try to silence or retaliate against the scientists who made those communications. Criticism should be welcomed, but silencing or retaliating against scientists is unacceptable. The best defense against bias in independent review is the registered report and concealing the identity of authors from their peer reviewers. The registered report is a newer tool spreading among peer-reviewed scientific journals. It adds an initial round of peer review of methods prior to data collection. If the first round of peer review accepts the methods, the journal commits to publish the findings regardless of the outcome, as long as no substantive changes in methods occurred after the first round of peer review. Registered reports guard against a publication bias that favors novel, striking results and disfavors confirmatory, replication efforts, while simultaneously guarding against reviewer bias that can favor or disfavor findings based on non-objective preferences of the reviewers.
Recommended citation: “Treves, A. (2019) Standards of evidence in wild animal research. Report for the Brooks Institute for Animal Rights Policy & Law. 30 June 2019, available at http://faculty.nelson.wisc.edu/treves/CCC.php/standards”
Scientific articles addressing standards of evidence and scientific integrity
Treves, A., Santiago-Ávila, F., Lynn, W.S. (equal co-authors) 2018. Just Preservation. Biological Conservation 229: 134-141. Soon to be reprinted with a foreword and reader commentary in Animal Sentience in 2019. Our most detailed examination of anthropocentrism and non-anthropocentrism in conservation science and practice, with a recommendation on legal mechanisms for equilibrating human and nonhuman interests in courts.
Treves, A., 2019. Peer review of the proposed rule and draft biological report for nationwide wolf delisting, ed. U.S.F.W.S. Department of Interior. Department of Interior, U.S. Fish & Wildlife Service, Washington, D.C. full peer reviews here. This lengthy critique of the US federal government's 2019 proposed rule, to remove Endangered Species Act protections from gray wolves nationwide, includes several passages that address standards of evidence in animal research, recommending that the government should sift and winnow evidence based on accepted scientific standards including strength of inference, independent review, and debate among scientists, before treating all published studies as equivalent during the policy process.
van Eeden, L., Eklund, A., Miller, J.R.B.,...17 co-authors... Treves, A. (equal first authors) 2018. Carnivore conservation needs evidence-based livestock protection. PLOS Biology https://doi.org/10.1371/journal.pbio.2005577
Treves, A., Artelle, K.A., Paquet, P.C. 2018. Differentiating between regulations and hunting as conservation interventions. Conservation Biology (accepted article, online ahead of print).
Treves, A., K.A. Artelle, C.T. Darimont, W.S. Lynn, P.C. Paquet, F.J. Santiago-Avila, R. Shaw and M.C. Wood 2018. Intergenerational equity can help to prevent climate change and extinction. Nature Ecology & Evolution DOI: 10.1038/s41559-018-0465-y. Supporting Data.
Artelle, K.A., Reynolds, J.D., Treves, A., Walsh, J.C., Paquet, P.C., Darimont, C.T. 2018. Hallmarks of science missing from North American wildlife management. Science Advances.
Lopez-Bao, J.V., Chapron, G., Treves, A. 2017. The Achilles heel of participatory conservation. Biological Conservation 212: 139-143.
Treves, A., Artelle, K.A., Darimont, C.T., Parsons, D.R. 2017. Mismeasured mortality: correcting estimates of wolf poaching in the United States. Journal of Mammalogy 98(3), open access at DOI: https://doi.org/10.1093/jmammal/gyx052
Darimont, C.T., Paquet, P., Treves, A., Artelle, K.A., Chapron, G. 2018. Political populations of large carnivores. Conservation Biology 32(3): 747-749.
Carroll, C., B. Hartl, G.T. Goldman, D.J. Rohlf, A. Treves, J.T. Kerr, E.G. Ritchie, R.T. Kingsford, K.E. Gibbs, M. Maron, J.E.M Watson. 2017. Defending scientific integrity in conservation policy processes: lessons from Canada, Australia, and the United States. Conservation Biology DOI: 10.1111/cobi.12958
Treves, A., Krofel, M., McManus, J. (equal co-authors) 2016. Predator control should not be a shot in the dark. Frontiers in Ecology and the Environment 14: 380-388.
Treves, A., Chapron, G., Lopez-Bao, J.V., Shoemaker, S., Goeckner, A., Bruskotter, J.T. 2015. Predators and the Public Trust. Biological Reviews doi: 10.1111/brv.12227