The 4 crazy reasons why you will never use machine learning

Introduction

Most business leaders are intrigued by the opportunity of machine learning.

 

They understand its power and potential and have seen the stories of businesses - maybe their competitors - making gains from it.

 

But crazily, they haven't deployed machine learning yet. Or even tried.

 

Why?

- They haven't worked out where it could be applied
- Or they fear the costs will be too high
- Or they know that their data is in too poor a condition
- Or they have concerns over the security, compliance or ethics of using it.

 

And so the machine learning journey ends before it has even begun.

 

Because of these four reasons, in every industry - without exception - the machine learning commentary and hype far exceeds its actual deployment.

 

And it's mad! Because every reason can be solved!

 

What stops businesses using machine learning

None of these four categories of obstacle - nor their sub-categories - is unreasonable or exaggerated. All are typical of many businesses, and very real.

 

But they are solvable. And they must be solved.

Machine learning offers too much potential for businesses to admit defeat.

 

To find out more about each obstacle, and how to solve it, read on.

Where do I deploy machine learning?


 

I don't need machine learning

I can’t see where I would use machine learning

 

I don’t understand what machine learning could do for me

 

Machine learning is probably the most transformative introduction to the modern business world. Granted, that description has been applied to many innovations over the years, but no other new trend or technology can equal machine learning for its simultaneous promise and the change it requires to businesses’ status quo.

 

Which means that before deploying machine learning can even be considered, there is a massive knowledge gap that needs crossing.

 

And if you do not understand something – even at the basic level – how can you envisage how you can use it?

 

Sure, there are various well-publicised business use cases for machine learning, ranging from sales forecasting and churn predictions to manufacturing efficiency and fraud detection.

 

But what if these do not apply to your business? What if these are not problems that you believe – rightly or wrongly – require improvement?

 

Machine learning’s power is its incredible flexibility, but that means businesses need to identify their own ways of using it.

 

Which returns us to the opening point: how can you use a tool that does not have a specific pre-packaged purpose, or that you don’t understand well enough to imagine a profitable use for?

See the solution

Data analytics is enough

The other obstacle to machine learning is a belief that data analytics is good enough.

 

But data analytics can only answer the specific questions you ask of it – which presupposes you know what you are looking for.

 

Data analytics questions and answers:

- “Do my sales increase or decrease when I use this consultant?” - Increase
- “Do my sales increase or decrease when I use this offer?” - Increase
- “Do my sales increase or decrease when I use this marketing tactic?” - Decrease
- “Do my sales increase or decrease when I use this sales method?” - Stay the same

Machine learning question and answer:

- “What’s my winning formula for sales?” - Use salesperson A and offer B when using marketing tactic C in geography X, for company type Y and when the decision-maker is Z.

 

Machine learning not only cross-references and balances many questions and factors against each other, but also offers additional insights that were never asked for, or originally considered pertinent.

 

The example above is a simple one, but imagine what insights your data holds that you have never previously considered. And until you step beyond data analytics, you probably never will.
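The difference can be sketched in a few lines of code. The records and factor names below are entirely hypothetical; the point is that a learned model scores combinations of factors together, where analytics tests one factor at a time:

```python
from collections import defaultdict

# Toy sales records: (salesperson, offer, tactic, revenue).
# Entirely hypothetical data, for illustration only.
records = [
    ("A", "B", "C", 120), ("A", "B", "C", 130),
    ("A", "D", "C", 80),  ("E", "B", "C", 90),
    ("E", "D", "F", 60),  ("A", "B", "F", 70),
]

# Data analytics answers one question at a time,
# e.g. "do my sales increase when I use offer B?"
with_b = [rev for _, offer, _, rev in records if offer == "B"]
avg_with_b = sum(with_b) / len(with_b)  # 102.5 - one isolated answer

# Machine learning instead weighs the factors together. Here we
# mimic that by scoring every full combination of factors at once.
by_combo = defaultdict(list)
for sp, offer, tactic, rev in records:
    by_combo[(sp, offer, tactic)].append(rev)

best = max(by_combo, key=lambda c: sum(by_combo[c]) / len(by_combo[c]))
print(best)  # the "winning formula": salesperson, offer and tactic combined
```

A real project would use a trained model (a decision tree, for instance) rather than this brute-force average, but the principle of scoring factors in combination rather than in isolation is the same.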

See the solution

My data is poor quality

It is as if the business mantra “garbage in, garbage out” was originally solely intended for machine learning. Businesses will not and should not trust any insights based on incomplete, inaccurate, duplicate, outdated or erroneous data.

 

The quality of the data used has such impact on the success of machine learning that George Krasadakis, Senior Program Manager at Microsoft, has stated publicly that Microsoft begins every project with a data quality assessment.

 

“Data-intensive projects have a single point of failure: data quality.”

George Krasadakis, Senior Program Manager at Microsoft.

Data quality in the era of Artificial Intelligence

 

The logic is sound, but why is the problem so common?

 

Because for most businesses, the rate of data creation has far outrun the speed of data governance.

 

Businesses are creating and gathering more data than they can handle. And with every day of data creation, and the introduction of every new data source (whether as seemingly benign as a new supplier or marketing tactic, or as transformative as new technology initiatives like the Internet of Things), the problem grows.

 

Gaps, inconsistencies, duplications and incomplete datasets are rife in every business, and can be machine learning's undoing.

 

“Increasingly-complex problems demand not just more data, but more diverse, comprehensive data. And with this comes more quality problems.”

 

Thomas C. Redman

Harvard Business Review: “If Your Data Is Bad, Your Machine Learning Tools Are Useless”

 

 

This, combined with every business’s (entirely correct) determination to put data into the hands of frontline employees, means that without data governance controls in place, data quality is further compromised.

 

And for most businesses, the problem appears too great to solve. The time and investment required to address the issue – while hugely beneficial for purposes far beyond machine learning – repeatedly drives machine learning projects into the “too difficult” camp.
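The scale of the problem need not stop a first assessment, though. A minimal sketch of the kind of profiling a data quality review starts with, using hypothetical customer records:

```python
# Hypothetical customer records; a real assessment would profile
# whole databases, but the checks are the same in miniature.
rows = [
    {"id": 1, "email": "a@example.com", "country": "UK"},
    {"id": 2, "email": "",              "country": "UK"},   # missing email
    {"id": 1, "email": "a@example.com", "country": "UK"},   # duplicate id
    {"id": 3, "email": "c@example.com", "country": None},   # missing country
]

# Duplicates: more rows than distinct ids means repeated records.
duplicate_ids = len(rows) - len({r["id"] for r in rows})

# Completeness: any empty or missing field marks the row incomplete.
incomplete = sum(1 for r in rows if not all(r.values()))

print(f"{duplicate_ids} duplicate id(s), {incomplete} incomplete row(s)")
```

Checks this simple will not fix a data estate, but they quantify the problem - which is exactly where a data quality assessment of the kind Krasadakis describes begins.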

See the solution

 

I don’t have enough data

Data Availability Problem #1: Not enough data

As above, machine learning requires data. And the great irony is that despite every business now creating more data than ever before, many machine learning projects fall by the wayside simply because there is not enough data to feed them.

 

This is not because businesses do not have enough data overall. It is instead because the data they have is insufficient for the project’s purpose, or is of too poor quality, or does not exist, or would be too difficult or time-consuming to gather.

 

Models need to be “trained”, i.e. given enough data to build sufficient context around the question being asked of them to return results with any degree of confidence or accuracy. The amount of data required depends on the purpose and complexity of the project, and of course a proof of concept requires less than a live or “production” project.

 

It is impossible to define even loose rules for how much data is required, or how far back in time your dataset must run. Every project is simply too different.
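While no universal rule exists, the underlying statistics can at least be illustrated. This sketch uses synthetic data and an assumed 30% churn rate to show how an estimate of even a simple rate becomes more reliable as the sample grows:

```python
import random

random.seed(0)   # reproducible synthetic data
TRUE_RATE = 0.30  # assume 30% of customers actually churn

def estimate(n_samples):
    """Estimate the churn rate from n randomly drawn customers."""
    churned = sum(random.random() < TRUE_RATE for _ in range(n_samples))
    return churned / n_samples

for n in (10, 100, 10_000):
    # larger samples cluster nearer the true rate
    print(n, round(estimate(n), 3))
```

How much is "enough" still depends entirely on the complexity of the pattern being learned - a model weighing dozens of interacting factors needs vastly more examples than this single rate - which is exactly why no loose rule survives contact with a real project.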

 

See the solution


Data Availability Problem #2: Structured and unstructured data

Another contributing factor to how much data is available is whether your data is structured or unstructured.

 

Structured Data

Data that is organized according to a pre-defined data model or structure, such as SQL databases or even tables and Excel files. It is usually text-based or numerical, but can in rare scenarios be images or audio.

Structured data is easier for machine learning to work with, as gaps and anomalies are easier to identify and fix, and patterns are more explicit.

Unstructured Data

Data that is not organized in such a format, and where data points do not have the same defined relationships between them that structured data does. It is usually stored in a NoSQL database or even a data lake.

This data is harder for machine learning to identify patterns in. Instead, data is often analysed for distinguishing features, which are then searched for in other data points, though this can be imprecise. Data mining and autoencoders can be used to structure unstructured data.

Some typical examples of unstructured data include email, because while it may be structured in inboxes and sub-folders, its actual content is entirely unstructured. Similarly, text-heavy data such as sales proposals, audit documentation and reports is unstructured, as individual items cannot be naturally assessed by their relationship with other similar items.

 

Given most businesses’ data falls into these categories – in fact, some estimate as much as 80% of business data is unstructured – many machine learning projects are delayed or even abandoned as the process of structuring enough data is considered too labour-intensive.
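The structuring step itself is often simpler than feared, at least in principle. A minimal bag-of-words sketch (two made-up documents, standard library only) that turns free text into the fixed-length, structured rows a model can consume:

```python
from collections import Counter
import re

# Two made-up unstructured documents.
docs = [
    "Please find our sales proposal attached",
    "The audit report is attached for your review",
]

# Build one fixed vocabulary across all documents...
vocab = sorted({w for d in docs for w in re.findall(r"[a-z]+", d.lower())})

def to_row(text):
    """...then represent each document as word counts over that vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in vocab]  # one structured, fixed-length row

rows = [to_row(d) for d in docs]
print(len(vocab), rows[0])
```

Real projects use far richer representations (TF-IDF, embeddings), and the true labour lies in choosing and validating the representation at scale - which is why the effort is so often underestimated in the abstract and overestimated in the concrete.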

See the solution

 

Data skills are too expensive

The barriers to entry for machine learning mentioned above – Data Quality and Data Availability – are solved only with specific data architecture skills. Your datasets will need to be analysed and rectified using technologies and capabilities that do not exist in most organisations, and that are expensive to recruit or outsource.

 

But those skills are commonplace compared to data science skills. The appetite to resolve the issues described in this article, and to then implement and capitalise on machine learning, is so great that data science job openings far outweigh the skills available in the market, driving up salaries and delaying – even prohibiting – businesses’ ambitions.

 

73% of businesses were dissatisfied with their AI and data science skills (SnapLogic)

47% of businesses could not fill their publicised data science vacancies (O’Reilly Media)

3x as many data science job postings as there were job searches (Quanthub)

 

And the absence of data skills in an organisation is critical not only for the deployment of machine learning projects. These individuals also provide the vital context and experience to help a business envisage what could be possible with its data in the first place.

Besides the data itself, data skills are the most vital piece of the puzzle, and the hardest to obtain.

See the solution

 

The software is too expensive

MarketsandMarkets predicts that the AI industry will reach $190 billion by 2025.

 

That enormous sum is driven by two things: demand, and the high cost (and value) of the technology.

 

Custom AI solutions will regularly cost businesses high 6-figure sums, while third-party software will often run into high 5-figures.

 

And with many businesses only just beginning their machine learning journey, these are large investments into unproven territory.

 

Combined with the cost of skills and infrastructure, suddenly an exploratory project into potential opportunities is carrying a hefty price tag.

See the solution

 

I can't afford the infrastructure

Very few businesses have designed their IT environment to cater for the additional processing required for machine learning.

 

Companies with on-premise infrastructure will certainly struggle to justify investing in the greater resources required, and even those on inherently scalable cloud platforms are unlikely to have expected to scale up to the degree machine learning demands.

 

Also, while the vast majority of businesses’ day-to-day activities will operate comfortably on CPU-based infrastructure, most machine learning projects require more powerful and more expensive GPU resources, which handle massively parallel computations far faster than CPUs can.

 

And compute resource and speed are not factors to compromise on. Machine learning is only valuable if it is accurate, and speed of processing is one of the key contributors to an algorithm’s accuracy.

See the solution

 

Will my data be safe?

The question above is worded deliberately. Before deploying a machine learning project - or indeed any new data-driven initiative - businesses need to think about the overall safety of the data it depends on.

 

Data Safety comprises three pillars:

 

- Data Security
- Data Privacy
- Data Governance

 

All three must be addressed equally for any data project to be a success, and machine learning is no different - in fact, given the sensitivity of the data used by most machine learning projects, there are few projects where data safety is more important. 

 

Data Security

The same risks of misconfiguration, user access and platform vulnerabilities apply to machine learning as much as to any other data project.

 

But there are also risks that are particular to machine learning projects:

 


Machine learning is notoriously data-hungry, and so requires connections to numerous datasets, whether your own databases or third-party sources. Every one of these data sources, and the connections to them, needs to be secure.

 



 

One of the most popular use cases for machine learning projects is to deliver insights on customer behaviour. To do so accurately, the models require as much contextual information as possible in order to investigate relevant patterns. This means they must be fed large, rich datasets of information from multiple systems across the business, most of which is sensitive and could include location, gender, transactional history and more. This will make the application a 'one stop shop' for complete customer records, and a far more valuable target for hackers.

 



Machine learning algorithms are commonly created at least in part from open source code and by data scientists, not security engineers. There is therefore a high possibility for security risks to be accidentally built into an algorithm or inherited from open source code - or even maliciously added. The risk is then doubled by the fact that these vulnerabilities are hard to detect without extensive "ground up" analysis, as there are not yet any industry standard best practices for the secure production of algorithms.

 



 

Most enterprises will build their own machine learning applications. Aside from the threat outlined above of them being built by data scientists not security experts, they are also often built iteratively and to tight timescales. This is a lethal combination for 'security by design', as each iteration risks sidestepping well-defined and otherwise well-observed internal core data security principles and ISO compliance.  

Data Privacy

One of the greatest threats to a 'data responsible' organization's continuous data privacy adherence is the introduction of a machine learning project. 

 

GDPR, and the many privacy regulations that followed in its wake and emulated its protections, specifically safeguards against personal data being used to drive automated decision-making without consent. 

 

"The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her... [unless] the decision is based on the data subject's explicit consent."

 

Article 22 of the GDPR

 

 

This protection is echoed in the California Privacy Rights Act of 2020 (CPRA), also known as “CCPA 2.0”, plus Canada’s Consumer Privacy Protection Act (CPPA), currently going through the Canadian legislative process, and many more.

 

The crux of the problem is therefore this:

While GDPR and its many cousins do not prevent an organization from using any personal data to train the model - this can be done quite lawfully under legitimate interest - individuals can only be subjected to an automated decision if they have given consent.

 

As an example...

   

An insurer is permitted to use all of its gathered customer data to create a model that determines the likelihood of certain types of individual to make particular types of claims.

   
However, that insurer then cannot use that model "in production" to assess whether or not a particular individual should be granted a policy - without that individual's consent.

 

 

Gaining this consent in a totally legitimate way - i.e. ensuring that consent is clear, informed and freely given - at the correct point in the process, and in every instance, is crucial to ensuring that your data privacy and broader data safety are preserved. This will often require an arduous overhaul of processes and policies before the model can be deployed and start to deliver value.
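In practice this often reduces to a hard gate in the decision pipeline. A sketch of such an Article 22-style safeguard - the customer IDs, consent register and scoring function below are all hypothetical stand-ins:

```python
# Hypothetical consent register; in reality this would be the consent
# management platform of record, checked at decision time.
consent_register = {"cust-001": True, "cust-002": False}

def risk_score(customer_id: str) -> float:
    return 0.42  # stand-in for a real trained model's prediction

def automated_decision(customer_id: str) -> str:
    """Only apply the model's decision where explicit consent exists."""
    if not consent_register.get(customer_id, False):
        # No consent: fall back to human review, never the model alone.
        raise PermissionError("no explicit consent - route to manual review")
    return "approve" if risk_score(customer_id) < 0.5 else "refer"

print(automated_decision("cust-001"))  # consent given, decision allowed
```

Note that training the model on lawfully held data and gating its use per individual are two separate controls; both are needed, and only the second is visible in this sketch.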

 

Data Governance

Most industries are overseen by regulations governing how data can be used. Even those that are not regulated are likely to have their own compliance requirements to maintain, such as ISO accreditations or SOC frameworks - many of which will be prerequisites to serve their customers.

 

But when machine learning is brought into the corporate world, against the backdrop of such scrutiny, there is often a clash.

 

Data science is exactly that - it inherently involves experimentation. However experimentation can at times lead to non-compliant behaviour. Data may be moved to unauthorised resources outside the protection of the network perimeter, simply so that a new model can be applied to it, or faster resources deployed. 

 

Looking at it from the other direction, if an organisation creates too strict a governance framework around its data, it can constrain data science's ability to innovate. 

 

Audit trails and restrictions of data's movement, security risk controls, data accountability and regulatory pressures - the notorious irony of data science is its common clash with organizational data governance, and even data safety as a whole.

See the solution

 

Is machine learning ethical?

There are two main areas of debate as to the ethics of the use of machine learning, and neither are easily solved.

 

1. Explainability & Transparency

This is commonly referred to as the 'black box' problem. How are the decisions and insights that an algorithm outputs actually reached?

 

The problem persists for two reasons:

 

- The calculations are too mathematically complex for a human to understand, usually based on deep neural networks of layers of millions of interconnected variables.
- The algorithm is proprietary, and the creator insists on protecting its IP, or is protecting its methodology to prevent anyone "gaming the system."

 

In either case, any organisation relying on the algorithm will not be able to explain why or how any decisions are being made, which is strategically less than ideal, while also opening up the issue of uncontrolled bias (see below).

 

It is also, however, a regulatory concern. Articles 13-15 of GDPR state repeatedly that data subjects have the right to request “meaningful information about the logic involved” in automated decision-making. Meanwhile, Recital 71 gives subjects the additional right to an explanation of how automated decisions about them are made, and a right to challenge them.

 

Regardless of whether they had given consent to be subjected to automated decision-making, if a data subject were to ask how a decision about them was made, few business users of machine learning would be able to provide anything "meaningful."

 

2. Bias

There are two principal ways in which machine learning can be biased:

- Pre-existing bias in the datasets chosen for training the model, often caused by a dataset only showing limited scenarios, or by the data reflecting human or social bias
- Technical bias in how the model is built and where in the dataset it places value

 

For non-emotive use cases such as sales forecasting, these biases will result in little more than inaccuracy and misdirection of varying degrees.

 

But for sensitive use cases, or where the use case uses sensitive data, the repercussions of machine learning projects that become known to be biased can be highly embarrassing.

 

"Amazon scraps secret AI recruiting tool that showed bias against women"

Reuters

 

One of the reasons this issue occurs and reoccurs is linked to the transparency issue above: if the algorithm's methodology cannot be understood or revealed, how can we know that it is not biased?

 

Another reason why bias problems are a continuous concern, and why they catch out even some of the most sophisticated and prevalent users of machine learning, is how remarkably difficult bias is to guard against. Not only is bias in pre-existing data difficult to distinguish from a legitimate pattern, but we are also asking ourselves - unavoidably tainted by conscious or unconscious bias - to evaluate it.

 

Some may seek to solve this problem by applying anonymization or tokenization to personal data, for example by removing gender from a dataset. But bias can appear in more forms than simply explicit data.

 

To continue the gender example, and in the context of the Amazon recruiting tool scenario, social gender biases can appear throughout a dataset even when the explicit gender values and individuals' names are removed. Social gender biases may include typical salaries (the gender pay gap is a well-recognised social issue), roles (the "glass ceiling" effect is equally recognised) or career gaps (maternity leave is more common and typically longer than paternity leave).
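The proxy effect is easy to demonstrate. In this sketch - entirely synthetic, deliberately exaggerated data - dropping the gender column leaves a career-gap feature from which the removed attribute can be trivially reconstructed:

```python
# Entirely synthetic, deliberately exaggerated candidate records.
candidates = [
    {"gender": "F", "career_gap_months": 14},
    {"gender": "F", "career_gap_months": 12},
    {"gender": "M", "career_gap_months": 1},
    {"gender": "M", "career_gap_months": 0},
]

# "Anonymise" by dropping the explicit sensitive column...
anonymised = [{k: v for k, v in c.items() if k != "gender"} for c in candidates]

# ...yet a one-line rule on the proxy feature recovers it perfectly here,
# so any model trained on the "anonymised" data can still encode gender.
guessed = ["F" if a["career_gap_months"] > 6 else "M" for a in anonymised]
recovered = sum(g == c["gender"] for g, c in zip(guessed, candidates)) / len(candidates)
print(recovered)  # fraction of the removed attribute reconstructed
```

Real datasets are never this clean-cut, but the mechanism is identical: any feature correlated with the removed attribute lets the model learn the bias anyway.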

 

Without recognising and addressing the likelihood of more subtle issues in a dataset, bias can be a silent killer to the efficacy - and acceptability - of any machine learning project. 

 

All of this combined places a huge burden - even an impossible burden - on any business using machine learning, especially those that use personal data or even demographic data, to be able to explain and defend every automated decision or insight. 

See the solution

 

Conquering these obstacles


 

The obstacles that prevent machine learning being deployed into organizations seem considerable. 

 

But they are not insurmountable. 

 

Calligo's Machine Learning as a Service is the first managed service to simultaneously and cost-effectively address all the obstacles described above, delivering more accurate insights faster, while continuously protecting the safety of your data.

 

See the solution

 

Want to find out right now how we could help your business with Machine Learning?

Request a call back from our team