Cleaning the combined dataset

1) Read all CSV files and merge them into one data frame.
2) Standardize the date columns StartDate and CompletionDate.
3) Create a new column, DurationDays, holding the number of days from StartDate to CompletionDate.
4) Deduplicate InterventionType and create a new column, ArmNumber.
5) Fill NaN values.
6) Standardize the MaximumAge and MinimumAge columns into a new AgeGroups column.
7) Drop the MaximumAge and MinimumAge columns.
8) Deduplicate the LocationCountry column into a set.
9) Process the EventGroupDeathsNumAffected and EventGroupSeriousNumAffected columns.
10) Create a new column, AdverseEffectsorDeath.
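Steps 1 to 5 above can be sketched with pandas as follows. This is a minimal sketch, not the exact code behind this page. It assumes that multi-valued cells, such as several intervention types for one trial, are stored as `|`-separated strings. It also assumes that ArmNumber is the count of listed intervention entries (one per arm), and that missing text is filled with the hypothetical placeholder `"Unknown"` while missing dates and durations stay NaT/NaN. The helper names (`load_and_merge`, `clean_part_one`, and so on) are illustrative.

```python
import glob

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

SEP = "|"  # assumption: multi-valued cells are "|"-separated strings


def load_and_merge(pattern):
    """Step 1: read every CSV file matching `pattern` into one data frame."""
    frames = [pd.read_csv(path) for path in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)


def standardize_dates(df):
    """Step 2: parse each date cell separately so that mixed formats can
    coexist; cells that cannot be parsed become NaT."""
    for col in ("StartDate", "CompletionDate"):
        parsed = df[col].map(lambda v: pd.to_datetime(v, errors="coerce"))
        df[col] = pd.to_datetime(parsed)
    return df


def add_duration(df):
    """Step 3: whole days between start and completion (NaN if a date is missing)."""
    df["DurationDays"] = (df["CompletionDate"] - df["StartDate"]).dt.days
    return df


def dedupe_interventions(df):
    """Step 4: keep each intervention type once and count the raw entries.
    Assumption: the raw field lists one entry per arm."""
    raw = df["InterventionType"].map(
        lambda v: [x.strip() for x in str(v).split(SEP) if x.strip()] if pd.notna(v) else []
    )
    df["ArmNumber"] = raw.map(len)
    # An empty list becomes None so that step 5 fills it like any other gap.
    df["InterventionType"] = raw.map(lambda xs: SEP.join(sorted(set(xs))) or None)
    return df


def fill_missing(df):
    """Step 5: fill gaps in text columns with a placeholder (hypothetical
    policy); date and numeric columns keep NaT/NaN."""
    text_cols = [
        c for c in df.columns
        if not is_numeric_dtype(df[c]) and not is_datetime64_any_dtype(df[c])
    ]
    df[text_cols] = df[text_cols].fillna("Unknown")
    return df


def clean_part_one(pattern):
    """Run steps 1 to 5 in order."""
    df = load_and_merge(pattern)
    for step in (standardize_dates, add_duration, dedupe_interventions, fill_missing):
        df = step(df)
    return df
```

Parsing each date cell on its own is slower than one vectorized `pd.to_datetime` call. It avoids having a single inferred format reject every cell written in a different style.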
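Steps 6 to 10 can be sketched the same way. This sketch also rests on assumptions:

- Age strings look like `"18 Years"` or `"6 Months"`. Any other value, such as `"N/A"`, is treated as no limit.
- AgeGroups borrows the three age groups used by ClinicalTrials.gov: Child (under 18), Adult (18 to 64) and Older Adult (65 and over).
- The two event columns hold `|`-separated per-group counts, which are summed.
- AdverseEffectsorDeath is 1 when either total is above zero.

All helper names are hypothetical.

```python
import re

import pandas as pd

SEP = "|"  # assumption: multi-valued cells are "|"-separated strings

# Half-open age ranges in years, borrowed from ClinicalTrials.gov's groups:
# Child [0, 18), Adult [18, 65), Older Adult [65, inf).
AGE_GROUPS = [("Child", 0.0, 18.0), ("Adult", 18.0, 65.0), ("Older Adult", 65.0, float("inf"))]
UNIT_TO_YEARS = {"year": 1.0, "month": 1 / 12, "week": 7 / 365.25, "day": 1 / 365.25,
                 "hour": 1 / 8766, "minute": 1 / 525960}
AGE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+?)s?\s*$")


def age_in_years(value, default):
    """Parse strings such as "18 Years" or "6 Months"; anything else -> default."""
    match = AGE_RE.match(value) if isinstance(value, str) else None
    if not match or match.group(2).lower() not in UNIT_TO_YEARS:
        return default
    return float(match.group(1)) * UNIT_TO_YEARS[match.group(2).lower()]


def to_age_groups(min_age, max_age):
    """Step 6: every group whose range overlaps the trial's [min, max] ages."""
    lo = age_in_years(min_age, 0.0)            # no minimum -> from birth
    hi = age_in_years(max_age, float("inf"))   # no maximum -> no upper limit
    return SEP.join(name for name, g_lo, g_hi in AGE_GROUPS if lo < g_hi and hi >= g_lo)


def to_set(value):
    """Step 8: split a multi-valued cell into a set of unique entries."""
    if not isinstance(value, str):
        return set()
    return {x.strip() for x in value.split(SEP) if x.strip()}


def total_count(value):
    """Step 9: sum per-group counts such as "0|2|1"; missing -> 0."""
    if pd.isna(value):
        return 0
    total = 0
    for part in str(value).split(SEP):
        try:
            total += int(float(part))
        except ValueError:
            pass  # skip empty or non-numeric entries
    return total


def clean_part_two(df):
    """Run steps 6 to 10 on a copy of `df`."""
    df = df.copy()
    df["AgeGroups"] = [to_age_groups(lo, hi) for lo, hi in zip(df["MinimumAge"], df["MaximumAge"])]
    df = df.drop(columns=["MaximumAge", "MinimumAge"])  # step 7
    df["LocationCountry"] = df["LocationCountry"].map(to_set)
    for col in ("EventGroupDeathsNumAffected", "EventGroupSeriousNumAffected"):
        df[col] = df[col].map(total_count)
    # Step 10 (assumed rule): 1 if any death or serious adverse event was reported.
    affected = df["EventGroupDeathsNumAffected"] + df["EventGroupSeriousNumAffected"]
    df["AdverseEffectsorDeath"] = (affected > 0).astype(int)
    return df
```

Using half-open ranges means a trial capped at "65 Years" is tagged as both Adult and Older Adult, and no fractional age falls between two groups.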



Dataset before cleaning





Dataset after cleaning





Data Cleaning for ARM (Association Rule Mining)

Start from the dataset cleaned earlier on this page.



ARM data preparation
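For association rule mining, each trial can be turned into a transaction, which is a set of `Column=value` items. The transactions can then become a one-hot table with one boolean column per item. That is the input layout expected by apriori implementations such as `apriori` in `mlxtend.frequent_patterns`. The sketch below makes two assumptions: the cleaned columns InterventionType, AgeGroups, LocationCountry and AdverseEffectsorDeath serve as the items (a hypothetical choice), and multi-valued text cells are `|`-separated. The function names are illustrative.

```python
import pandas as pd

SEP = "|"  # assumption: multi-valued text cells are "|"-separated strings

# Hypothetical choice of cleaned columns to use as items.
ITEM_COLUMNS = ["InterventionType", "AgeGroups", "LocationCountry", "AdverseEffectsorDeath"]


def row_to_transaction(row):
    """Turn one trial (a DataFrame row) into a set of "Column=value" items."""
    items = set()
    for col in ITEM_COLUMNS:
        value = row[col]
        if isinstance(value, (set, list, tuple)):
            values = value                      # already multi-valued (e.g. a set)
        elif pd.isna(value):
            continue                            # missing values produce no item
        else:
            values = str(value).split(SEP)
        items.update(f"{col}={v}" for v in (str(x).strip() for x in values) if v)
    return items


def to_one_hot(df):
    """One row per trial and one boolean column per item, sorted by name."""
    transactions = [row_to_transaction(row) for _, row in df.iterrows()]
    items = sorted(set().union(*transactions))
    return pd.DataFrame(
        [[item in t for item in items] for t in transactions], columns=items, dtype=bool
    )
```

With mlxtend installed, `apriori(to_one_hot(df), min_support=0.05, use_colnames=True)` would then list frequent itemsets. The support threshold here is only a placeholder.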





Data Cleaning for Decision Tree

1) Decision Tree text data cleaning


Start from the dataset cleaned earlier on this page.
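The text cleaning step can be sketched as below, under stated assumptions. The text is lowercased, everything except the letters a to z is replaced by spaces, and stop words and single-letter tokens are dropped. The stop-word list is a small hypothetical one, `clean_text` is an illustrative name, and this page does not say which column holds the text.

```python
import re

# Small illustrative stop-word list (hypothetical; a real run would use a fuller one).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def clean_text(text):
    """Lowercase, keep letters only, and drop stop words and 1-letter tokens."""
    if not isinstance(text, str):
        return ""  # missing text becomes an empty document
    words = re.sub(r"[^a-z]+", " ", text.lower()).split()
    return " ".join(w for w in words if len(w) > 1 and w not in STOP_WORDS)
```

On a data frame this would be applied as `df[text_col].map(clean_text)`, where `text_col` stands in for whichever text column is being cleaned. Keeping letters only also drops digits, so a term like "COVID-19" becomes "covid".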



Original text data




Matrix of token counts
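A matrix of token counts has one row per document and one column per vocabulary term. In scikit-learn, `CountVectorizer().fit_transform(docs)` builds this matrix as a sparse matrix, and `get_feature_names_out()` returns the matching vocabulary. The dependency-free sketch below reproduces CountVectorizer's default behavior (lowercasing, tokens of two or more word characters, an alphabetically sorted vocabulary) to show what the matrix contains. It is an illustration, not the code used on this page, and `count_matrix` is a hypothetical name.

```python
import re
from collections import Counter

# Same default token rule as scikit-learn's CountVectorizer: 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_matrix(docs):
    """Return (vocabulary, rows): the vocabulary is sorted alphabetically and
    rows[i][j] is how often vocabulary[j] occurs in docs[i]."""
    counts = [Counter(TOKEN_RE.findall(doc.lower())) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows
```

For example, `count_matrix(["drug trial drug", "device trial"])` gives the vocabulary `["device", "drug", "trial"]` and the rows `[[0, 2, 1], [1, 0, 1]]`. Each row can then serve as the feature vector for one trial in a decision tree.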