About Data
Data Sources
The original data comes from Data.gov.
Data.gov is managed and hosted by the U.S. General Services Administration, Technology Transformation Service.
The website makes government-provided data available in open, machine-readable formats while continuing to protect the privacy and security of individuals.
Our Journey: Cleaning the Data
The L.A. crime dataset, covering 2020 to the present, comes from the United States government’s open data website and comprises roughly 800,000 data points across 28 variables. The data is transcribed from original crime reports that are typed on paper, so it may contain some inaccuracies. For example, some location fields with missing data are recorded as (0°, 0°), and address fields are given only to the nearest hundred block to maintain privacy. Because LAPD OpenData updates the dataset weekly and the full dataset is large, I focused specifically on the 2022 data. Using the Ohio Supercomputer Center to speed up processing, I filtered the 2022 records, replaced missing values with ‘NA’, and reorganized and renamed variables as necessary.
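The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the code that was run on the Ohio Supercomputer Center: the column names (DATE OCC, Vict Sex), the renamed variables, and the timestamp layout are assumptions and should be adjusted to match the actual export.

```python
from datetime import datetime

# Assumed timestamp layout and column names; adjust to the actual export.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Hypothetical renames, shown only to illustrate the idea.
RENAMES = {"DATE OCC": "date_occurred", "Vict Sex": "victim_sex"}


def clean_rows(rows, year=2022):
    """Keep rows that occurred in `year`, fill blanks with 'NA', rename columns."""
    cleaned = []
    for row in rows:
        occurred = datetime.strptime(row["DATE OCC"], DATE_FORMAT)
        if occurred.year != year:
            continue  # drop records from other years
        new_row = {}
        for name, value in row.items():
            value = value.strip() if value is not None else ""
            new_row[RENAMES.get(name, name)] = value if value else "NA"
        cleaned.append(new_row)
    return cleaned


# Small made-up sample standing in for the real file.
sample = [
    {"DATE OCC": "03/14/2022 12:00:00 AM", "Vict Sex": "F"},
    {"DATE OCC": "07/01/2021 12:00:00 AM", "Vict Sex": "M"},
    {"DATE OCC": "11/30/2022 12:00:00 AM", "Vict Sex": ""},
]
result = clean_rows(sample)
```

With a real file, the rows could come from csv.DictReader instead of the hand-written sample list.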
Further Explanation
Time Occurrence: Change the format to four digits on a 24-hour clock (XX:XX).
MO Codes: Count the number of MO codes assigned to each case.
Victim Sex: Re-group so that X and NA are shown as unknown.
Victim Descent: Re-group into ethnicity categories (Asian, Black, Hispanic/Latin/Mexican, Native, White, Others).
Crime Code Description: Convert to lowercase, then re-group using keywords and phrases (Inchoate Crimes, Crimes Involving Weapons, Sex Crimes, Miscellaneous Crimes, Violent Crime, Property Crime).
Weapon Description: Convert to lowercase, then re-group using keywords and phrases (other weapons or weapons not stated, blunt instruments, knives and other cutting instruments, personal weapon, handguns, and other firearm, strangulation, poison, narcotics).
Date Reported: Remove the time and format the date (mm/dd/yyyy).
Date Occurrence: Remove the time and format the date (mm/dd/yyyy).
Premis Code Description: Convert to lowercase, then re-group using keywords and phrases (Commercial, Vehicle, Public, Religious, Educational, Residential, Recreation, Others).
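Several of the steps listed above can be sketched as small Python helpers. This is a rough illustration under stated assumptions, not the original cleaning script: it assumes the raw time is a military-time value such as 845 or 2130, that MO codes are separated by spaces, and that raw dates look like 03/14/2022 12:00:00 AM. The keyword lists are hypothetical and much shorter than the ones a real re-grouping would need.

```python
from datetime import datetime


def format_time(raw):
    """Turn a military-time value such as 845 or '2130' into 'HH:MM'."""
    digits = str(raw).strip().zfill(4)  # pad to four digits: 845 -> '0845'
    return f"{digits[:2]}:{digits[2:]}"


def count_mo_codes(raw):
    """Count space-separated MO codes; missing values count as 0."""
    if raw is None or raw.strip() in ("", "NA"):
        return 0
    return len(raw.split())


def regroup_sex(raw):
    """Map X and missing values to 'Unknown'; leave other codes unchanged."""
    return "Unknown" if raw in (None, "", "NA", "X") else raw


def strip_time(raw, source_format="%m/%d/%Y %I:%M:%S %p"):
    """Drop the time component and return the date as mm/dd/yyyy."""
    return datetime.strptime(raw, source_format).strftime("%m/%d/%Y")


# Hypothetical keyword lists for illustration only; the first match wins.
PREMISE_KEYWORDS = [
    ("Educational", ("school", "college", "university")),
    ("Religious", ("church", "temple", "mosque", "synagogue")),
    ("Vehicle", ("vehicle", "taxi")),
    ("Residential", ("dwelling", "apartment", "residence")),
    ("Public", ("street", "sidewalk")),
]


def regroup_by_keyword(description, keyword_groups, default="Others"):
    """Lowercase the description and return the first group whose keyword appears."""
    text = description.lower()
    for group, keywords in keyword_groups:
        if any(keyword in text for keyword in keywords):
            return group
    return default
```

The same regroup_by_keyword helper could be reused for the crime and weapon descriptions by passing a different keyword list. Because matching is by substring and the first match wins, the order of the groups and the choice of keywords both affect the result.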