# ResearchBox #656: 'DataColada 101'

Bingo Table
Fig 1 - prereg vs paper

  Fig 1 - prereg vs paper.csv

  Fig 1 - prereg vs paper - 1. Generate .csv file.r

  Fig 1 - prereg vs paper - 2. Make Figure 1.r


  Fig 1 - prereg vs paper - Figure 1.png

Fig 3 - CDF Heatmap

  Fig 3 - CDF Heatmap - 1. Simulate heat map 2022 04 23.r

  Fig 3 - CDF Heatmap - 2. Make Figure 3.r


  Fig 3 - CDF Heatmap - heatmap CDF .05.png

  Fig 3 - CDF Heatmap - heatmap CDF .10.png

  Fig 3 - CDF Heatmap - observed heatmap.rds

  Fig 3 - CDF Heatmap - simulation results 1k.rds

  Fig 3 - CDF Heatmap - simulations results 50k.rds (29 Mb)





Joe Simmons; Uri Simonsohn, '[101] Transparency Makes Research Evaluable: Evaluating a Field Experiment on Crime Published in Nature', Data Colada

April 27, 2022   (files may not be changed, deleted, or added)

Uri Simonsohn (urisohn@gmail.com)
Joseph Simmons (jsimmo@upenn.edu)

This blog post evaluates the evidence presented by Shah & LaForest (2022), focusing on deviations from the pre-registration and on a confound in the original materials.


The authors have written the following message for visitors to this box.

Dear Reader,

The data were originally collected by Shah & LaForest (2022). All of their data and code are available from https://osf.io/mkgwr/
While they posted all the data needed to reproduce their results, the Stata .do files with the key regressions do not load the posted data files; instead, they read files that must first be generated by their other scripts. We reproduced those intermediate data files and posted them here, so readers do not need to conduct the data cleaning and processing on their own and can go directly to the data analysis.

Figure 1.
To reproduce our figure taking the data as given, you only need the .csv file we posted and the second of our R scripts, "Fig 1 - prereg vs paper - 2. Make Figure 1.r".
If you also want to reproduce the data cleaning done by Shah & LaForest, the task is more involved. The steps below will guide you through it:

Steps to reproduce the .csv file
You will need to download the Stata scripts and files posted by Shah and LaForest: https://osf.io/mkgwr/
After unzipping the file "Field Intervention Analysis Data & Code.zip" navigate to the folder: Field Intervention Analysis Data & Code\main analysis code
Then execute the Stata scripts numbered 1-6, stopping at line #185 of the 6th script; at that point it has generated the .dta file needed for the analysis:

The first R script for Figure 1 in our Bingo table, "Fig 1 - prereg vs paper - 1. Generate .csv file.r", takes that .dta file, keeps only the variables needed to reproduce the figure (dropping more than 1,000 variables), and saves the .csv file.
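The variable-subsetting step is simple enough to illustrate. The authors' script is written in R and reads the .dta file directly; the Python below is only a sketch of the same idea applied to rows that have already been loaded as dictionaries. The column names in `NEEDED` are hypothetical placeholders, not the variables the real script keeps.

```python
import csv
import io

# Hypothetical column names, for illustration only; the real list of kept
# variables lives in the authors' R script.
NEEDED = ["id", "condition", "outcome"]


def keep_columns(rows, needed):
    """Reduce each row (a dict) to the `needed` columns, in the given order.

    A missing column raises KeyError, so a renamed variable fails loudly
    instead of silently producing an empty column.
    """
    return [{name: row[name] for name in needed} for row in rows]


def write_csv(rows, needed, handle):
    """Write the reduced rows as CSV, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=needed)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # A tiny "wide" table: the extra_* columns stand in for the 1,000+ dropped variables.
    wide = [
        {"id": "1", "condition": "a", "outcome": "0", "extra_1": "x", "extra_2": "y"},
        {"id": "2", "condition": "b", "outcome": "1", "extra_1": "z", "extra_2": "w"},
    ]
    buffer = io.StringIO()
    write_csv(keep_columns(wide, NEEDED), NEEDED, buffer)
    print(buffer.getvalue())
```

Failing on a missing column, rather than filling it with blanks, is a deliberate choice: when an upstream script changes a variable name, a loud error is easier to diagnose than a figure built on empty data.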

Figure 3.
For this figure, you can reproduce the simulations with the first script and the figure with the second. Because the simulations are slow, we ran them on an Amazon AWS EC2 server with 36 cores, where the 50k simulations took about 1 hour. On a regular laptop they should take 8+ hours.
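The laptop estimate is consistent with simple arithmetic, assuming the simulations split evenly across cores: about 1 hour on 36 cores is roughly 36 core-hours of work, which a machine with 4 effective cores would need about 9 hours to finish. The helper below makes that linear-scaling assumption explicit; the 4-core laptop and the `efficiency` parameter are hypothetical, not measurements of the authors' setup.

```python
def estimate_hours(reference_hours, reference_cores, target_cores, efficiency=1.0):
    """Estimate wall-clock hours on `target_cores`.

    Assumes the simulations are embarrassingly parallel, so total work
    (in core-hours) is fixed and divides evenly across cores. `efficiency`
    below 1.0 models slower or throttled cores; it is an assumption, not a
    measured value.
    """
    core_hours = reference_hours * reference_cores  # total work in core-hours
    return core_hours / (target_cores * efficiency)


# 50k simulations: about 1 hour on 36 cores, i.e. roughly 36 core-hours.
print(estimate_hours(1, 36, 4))  # hypothetical 4-core laptop -> 9.0
```

Real speedups are usually somewhat below linear (memory bandwidth, hyperthreaded rather than physical cores), which is one reason to read "8+ hours" as a lower bound rather than a precise figure.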

Our R scripts start by loading a dataset, a .csv file that we reproduced using the Stata .do files posted by Shah & LaForest. They did not post the .csv file itself, so we ran their Stata code to produce it and then ran our code on that regenerated dataset.
If you want to regenerate that .csv file, you will need to follow the additional steps listed below.

Steps to reproduce the .csv file
You will need to download the Stata scripts and files posted by Shah and LaForest: https://osf.io/mkgwr/
After unzipping the file "Field Intervention Analysis Data & Code.zip" navigate to the folder: 
Field Intervention Analysis Data & Code\exploratory analysis code\rolling month and radius (Figures ED3, ED4, S1, S2, S3)

The folder contains .do files numbered 1, 2, 3, 4, and 6 (there is no file #5).
Execute files 1-4, which generate multiple .dta files that are then merged in the first part of file #6 and saved as 

That file is read by our first R script, which runs the simulations.
The simulation results are saved, and the second script loads them to make the figure.
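The two-script structure (simulate and save, then load and plot) is a common pattern for slow simulations, because the figure can be redrawn without rerunning the expensive step. The Python below is a generic sketch of that pattern, not a translation of the authors' R code: `simulate_once` is a hypothetical placeholder that draws a standard normal, pickle stands in for the .rds files, and `empirical_cdf` (the share of simulated values at or below an observed one) is offered only as an example of a summary the second stage might compute.

```python
import pickle
import random
import tempfile
from pathlib import Path


def simulate_once(seed):
    """One hypothetical replicate returning a single simulated statistic.

    Placeholder only: the real replicate logic is in the authors' R script.
    Each replicate gets its own seeded generator, so results do not depend
    on the order in which replicates run.
    """
    rng = random.Random(seed)
    return rng.gauss(0.0, 1.0)


def run_simulations(n_sims, base_seed=2022):
    """Stage 1: run `n_sims` replicates, each with its own seed.

    Because every replicate owns its seed, this list comprehension could be
    swapped for a parallel map across many cores without changing the results.
    """
    return [simulate_once(base_seed + i) for i in range(n_sims)]


def empirical_cdf(sims, observed):
    """Share of simulated values at or below `observed`."""
    return sum(s <= observed for s in sims) / len(sims)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "simulation_results.pkl"  # stands in for the .rds file

        # Stage 1: simulate and save.
        with path.open("wb") as fh:
            pickle.dump(run_simulations(1000), fh)

        # Stage 2 (a separate script in the real workflow): load and summarize.
        with path.open("rb") as fh:
            sims = pickle.load(fh)
        print(empirical_cdf(sims, observed=1.645))
```

Seeding each replicate separately, rather than drawing every replicate from one shared stream, is what makes a 36-core run and a single-core run produce identical saved results.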

This version: April 22, 2022