Machine learning depends on quality data

Machine learning and artificial intelligence are the new hot topics in data analytics.

Both are subsets of data science, characterized primarily by mathematical and statistical processes applied to data.

In machine learning, algorithms replace humans in interpreting data.

The expectation is that the machine will make purely data-driven (better) decisions than its human counterparts.

We assume that machines are not subject to emotion and that machine learning, therefore, is unbiased.

Yet machine learning is inherently biased, because it inherits whatever bias is present in the data it learns from.

In an analysis of the upcoming Texas Senate race, Nate Silver writes:

“When building a statistical model, you ideally want to find yourself surprised by the data some of the time — just not too often. If you never come up with a result that surprises you, it generally means that you didn’t spend a lot of time actually looking at the data; instead, you just imparted your assumptions onto your analysis and engaged in a fancy form of confirmation bias. If you’re constantly surprised, on the other hand, more often than not that means your model is buggy or you don’t know the field well enough; a lot of the “surprises” are really just mistakes.”

This means that both the model and the data must be tested and validated in order to ensure good decision making.
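As a rough illustration of what validating the data can look like before any model is trained, here is a minimal Python sketch. The function name `data_quality_report`, the field names and the sample records are all hypothetical, made up for this example. It only counts missing values, exact duplicate records and the share of each outcome label, which is a starting point rather than a complete validation process.

```python
from collections import Counter


def data_quality_report(rows, required_fields, label_field):
    """Summarise basic quality problems in a list of dict records.

    Hypothetical helper for illustration only; assumes all values are hashable.
    """
    missing = Counter()   # required field -> number of empty values
    seen = set()          # records already encountered
    duplicates = 0        # exact repeats of an earlier record
    labels = Counter()    # outcome label -> number of records

    for row in rows:
        # A required field counts as missing if it is absent, None or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1

        # Detect exact duplicates by comparing full record contents.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

        label = row.get(label_field)
        if label not in (None, ""):
            labels[label] += 1

    total = sum(labels.values())
    label_share = {k: v / total for k, v in labels.items()} if total else {}
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicates": duplicates,
        "label_share": label_share,
    }


# Made-up sample records, purely for demonstration.
sample = [
    {"age": 34, "region": "north", "outcome": "yes"},
    {"age": None, "region": "south", "outcome": "yes"},
    {"age": 34, "region": "north", "outcome": "yes"},
    {"age": 51, "region": "", "outcome": "no"},
]
report = data_quality_report(sample, ["age", "region"], "outcome")
print(report)
```

A heavily lopsided label share, or one field that is missing far more often than the others, is the kind of surprise worth investigating before trusting anything a model learns from the data.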

The biggest risk for machine learning is that the mathematics boffins responsible for building the models may not understand this if they do not have basic data management skills.



