Sprayed: 0 means the trap was not sprayed, 1 means the trap was sprayed
We generated three feature categories for our modeling process.

We removed duplicates, split the dataset, dropped columns, and applied SMOTENN to deal with the imbalanced data.

We visualised the geolocation of the data on maps for better understanding.

In addition, these 3 categories made up more than 96% of the species sampled.

Sampling was very inconsistent, with heavy oversampling in August 2007, leading to a high mosquito population in 2007.

Although all species sampled are carriers of WNV, the presence of WNV was tested positive for Culex restuans, C.

Most sampling was performed in 2007 and decreased afterward. T900 (at O'Hare airport) is sampled the most.

Processing Dataframe for Kaggle Submission

Since this is a classification model, we will evaluate how well our model classifies the presence and the absence of WNV. Depending on the preprocessing method, different evaluation metrics will be used accordingly.
For Balanced Data: F1 Score, Accuracy.
For Imbalanced Data: F1 Score, Precision, AUC.

The time of mosquito spray (H:mm:ss AM/PM)
Presence of West Nile Virus (0=No, 1=Yes)
Letter behind Trap ID indicates it is near a main trap

The following datasets were provided for this project:

Elang Setiawan - Feature Engineering, Documentation.
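As a rough illustration of the evaluation metrics named above (accuracy, precision, F1 score, AUC), the sketch below computes them from scratch in plain Python. The function names and the toy labels are hypothetical and invented for illustration; they are not taken from the project's data or code, and the project may well have used a library such as scikit-learn instead.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = WNV present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Convention assumed here: precision is 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    # F1 is the harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up imbalanced sample: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.4]
all_negative = [0] * len(y_true)  # baseline that never predicts WNV

print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auc      :", roc_auc(y_true, scores))
print("baseline accuracy / f1:", accuracy(y_true, all_negative), f1_score(y_true, all_negative))
```

On this toy sample, the baseline that never predicts WNV reaches the same accuracy as the model while its F1 score is zero, which is one reason accuracy is usually reserved for balanced data.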