Abstract:
There is already an ocean of data, and it is set to expand even further in the coming years. This growth is made more concerning by the prevalence of bias in data, which threatens our dignity, our rights, and our safety. In crash testing, for example, women were found to be 17% more likely than men to die in car accidents, because female bodies were either excluded from testing or simply not modeled as drivers. The scope of this paper is to show that bias is present at every stage of the data analysis process and to examine how it can be addressed effectively. To support these ideas, data was gathered from journals, websites, books, and organizations. The analysis reveals that human nature is prone to heuristics, mental shortcuts that can induce cognitive bias when collecting or interpreting data. Data bias therefore stems from cognitive bias, and the result is unrepresentative data. Algorithmic bias, in turn, often arises from data bias, whether during model training or evaluation. By engaging a diverse workforce and providing ethics training, companies can minimize the risk of unwanted biases. Furthermore, implementing pre-processing, in-processing, and post-processing techniques can substantially increase the odds of fairer outcomes.