Machine learning and artificial intelligence are the new hot topics in data analytics.
These topics define subsets of data science that are primarily characterized by mathematical and statistical processes applied to data.
In machine learning, algorithms replace humans in interpreting data.
The expectation is that the machine will make purely data-driven (better) decisions than its human counterparts.
Machine learning depends on quality data
We assume that machines are not subject to emotion and that machine learning, therefore, is unbiased.
Yet machine learning is inherently biased, because it inherits whatever bias is present in the data.
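A toy sketch can make this concrete. The "model" below is deliberately simplistic: it just learns the historical approval rate per group and predicts accordingly. The group names, the records, and the function names are all hypothetical and invented for illustration; real models are far more complex, but the inheritance of bias works the same way in principle.

```python
from collections import defaultdict

def fit_group_rates(records):
    """Learn the historical approval rate for each group from (group, label) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        approved[group] += label
    return {group: approved[group] / totals[group] for group in totals}

def predict(rates, group, threshold=0.5):
    """Approve (1) when the learned rate for the group meets the threshold, else reject (0)."""
    return 1 if rates[group] >= threshold else 0

# Hypothetical, made-up history: group A was approved 8 times out of 10,
# group B only 2 times out of 10.
history = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 2 + [("B", 0)] * 8

rates = fit_group_rates(history)
decision_a = predict(rates, "A")
decision_b = predict(rates, "B")
print(rates, decision_a, decision_b)
```

No emotion is involved anywhere in this code, yet every new applicant from group B is rejected simply because the historical data leaned that way.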
In an analysis of the upcoming Texas senate race on fivethirtyeight.com, Nate Silver writes:
“When building a statistical model, you ideally want to find yourself surprised by the data some of the time — just not too often. If you never come up with a result that surprises you, it generally means that you didn’t spend a lot of time actually looking at the data; instead, you just imparted your assumptions onto your analysis and engaged in a fancy form of confirmation bias. If you’re constantly surprised, on the other hand, more often than not that means your model is buggy or you don’t know the field well enough; a lot of the “surprises” are really just mistakes.”
This means that both the model and the data must be tested and validated in order to ensure good decision making.
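As a rough sketch of what testing both sides might look like, the snippet below checks the data for missing values and then scores a model against rows it was not built on. It uses only the Python standard library; the field names (`age`, `label`), the sample rows, and the helper names are assumptions made for illustration, not a prescribed method.

```python
import random

def validate_rows(rows, required_fields):
    """Return (row_index, problem) pairs for rows with missing or empty required fields."""
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
    return problems

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(model, rows, label_field):
    """Fraction of rows where the model's prediction matches the recorded label."""
    correct = sum(1 for row in rows if model(row) == row[label_field])
    return correct / len(rows)

# Hypothetical sample data; one row is missing its age.
rows = [
    {"age": 30, "label": 1},
    {"age": None, "label": 0},
    {"age": 45, "label": 1},
    {"age": 52, "label": 1},
]

problems = validate_rows(rows, ["age", "label"])

# Validate the data first, then test the model on rows held back from training.
clean = [row for i, row in enumerate(rows) if i not in {idx for idx, _ in problems}]
train, test = holdout_split(clean + clean, test_fraction=0.5)

def always_approve(row):
    """Stand-in model that predicts 1 for every row."""
    return 1

baseline = accuracy(always_approve, rows, "label")
print(problems, len(train), len(test), baseline)
```

The baseline matters here: if a model that always answers "yes" already scores well, a high accuracy figure says more about the skew of the data than about the quality of the model.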
The biggest risk for machine learning is that the mathematics boffins responsible for building the models may not understand this if they do not have basic data management skills.