Machine Learning Analytics: Understanding Data with Stories

At the Heart of Every Good Business is Good Decision Making

The great pursuit of every business is optimized decision-making - finding the right path to maximize growth in an incredibly complex and competitive marketplace. Any good decision is built on a foundation of business intelligence: understanding the customer, the opportunity, the competition, and the strengths and weaknesses of your team and its ability to execute.

A big part of the understanding needed for good decision-making is derived from experience - years on the job absorbing information and learning what works and what does not. But markets evolve at the speed of technology, and experience is subject to individual interpretation, can drift over time, and can often be wrong. Fortunately, there is another source of understanding that helps drive better decisions - your data.

Entire industries are built on helping businesses make sense of their data - from business consultancies (market size of $241 Billion in the US alone) to data analytics software solutions (market size of $26 Billion) - no stone is left unturned as businesses work to turn their data into a competitive advantage. Unfortunately, extracting the critical information from the ever-growing mass of big data (qualitative and quantitative) that each company generates is an incredibly complicated undertaking.

Machine Learning is Changing Analytics

Data is exploding at such a furious pace that artificial intelligence is fast becoming the only way to make sense of it all. Machine learning at its core is the use of software and computing power to recognize and learn patterns. Machine learning algorithms can sift through mountains of data to find insights and separate the signal from the noise. Machine learning is the future of analytics.

But there’s a problem. Machine learning usually works like a black box - data in, predictions out - and users have no understanding of what’s driving a model. And if a business user doesn’t understand a process, they generally don’t trust the result. Data scientists have developed highly technical metrics to evaluate machine learning models, but most people don’t know how to think about F-scores and ROC curves.
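For readers curious about what sits behind those metrics, here is a minimal Python sketch. It assumes a toy set of eight made-up labels and model scores (not taken from any dataset in this article), and the helper names `confusion_counts`, `precision_recall_f1`, and `roc_points` are illustrative only. The F-score balances precision (how many flagged cases were right) against recall (how many real cases were found), while an ROC curve traces the true positive rate against the false positive rate as the decision threshold moves.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_points(y_true, scores, thresholds):
    """Return (threshold, false positive rate, true positive rate) per threshold."""
    points = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((threshold, fpr, tpr))
    return points


# Toy data, invented for illustration: 1 = subscribed, 0 = did not subscribe.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]

# Turn scores into yes/no predictions at a 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
curve = roc_points(y_true, scores, [0.25, 0.5, 0.75])

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
for threshold, fpr, tpr in curve:
    print(f"threshold={threshold:.2f} fpr={fpr:.2f} tpr={tpr:.2f}")
```

On this toy data the 0.5 threshold flags four customers, three of them correctly, so precision, recall, and F1 all come out at 0.75. Lowering the threshold finds more real subscribers at the cost of more false alarms, and that trade-off is what an ROC curve draws.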

It is becoming increasingly easy to take advantage of the power of machine learning to both understand your data and even automate certain types of high-value business decisions. Here at Akkio, we are working to democratize access to machine learning - and our new no-code data stories feature is a big step down that path. Now anyone can train a machine learning model in minutes to surface the drivers of key business outcomes. Let’s take a look at how that works with some example datasets.

Direct Mail Bank Campaign

The direct mail bank campaign dataset is the classic lead scoring challenge. It contains over forty-one thousand customer records, each the target of a marketing campaign. It also includes 20 different demographic and financial data points on each customer. To examine some of the patterns in the demographic data, we first trained a model with “age” as the prediction target. Once we have a machine learning model that predicts “age” based on all the other factors, we can explore the data story for age.

Age Patterns:

The data stories for age make a lot of sense - retired, married people who completed four years of post-high school education tend to be older than single high school students. Next, let’s look at the patterns related to subscriptions.

Here we see that older people with technical jobs and cell phones are much more likely to subscribe to the service than younger people working blue-collar jobs. It’s also interesting to see the duration field, which records the length of time spent on the phone with the bank discussing the offer (in seconds). If you spend a long time on the phone, you are more likely to subscribe. The bank can use this new knowledge to target their campaign to an older audience.

More broadly speaking, direct mail bank campaigns are an excellent way to identify potential new customers. In today's digital world, it's challenging for brands to stand out and get noticed. By using a data-driven approach, banks can target their marketing efforts more efficiently and effectively to individuals most likely to respond positively to their offer.

In fact, one survey shows that 57% of millennial respondents have acted on direct mail offers in the past year, and 87% of them love receiving mail.

There are a few things banks should keep in mind when running direct mail campaigns:

1. The offer: make sure the offer is appealing and relevant to the target audience.

2. The copy: the copy should be clear, concise, and persuasive.

3. The design: the design should be visually appealing and easy to understand.

4. The call to action: the call to action should be clear and concise.

5. The list: make sure the list is accurate and up-to-date.

HR Employee Attrition and Review

Here is a second example - an HR dataset of 1,470 employees from IBM. Each record contains data on the employee’s department, salary, employment history, demographics, and performance review scores. Each record is tagged for “attrition.” This dataset can be used to build machine learning models that explore the relationships between HR data, reviews, and employee turnover (a rare-case classification task, with about 17% of records tagged as attrition). Here is the Attrition data story.
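To see why a rare-case task needs care, consider a minimal Python sketch. It assumes the 1,470 records and the roughly 17% attrition rate quoted above; the labels are synthetic stand-ins built from those two numbers, not the real IBM records. A naive baseline that always predicts "stays" scores high accuracy while catching no leavers at all.

```python
# Figures quoted in the text; everything else here is a synthetic stand-in.
n_employees = 1470
attrition_rate = 0.17

# Build synthetic labels: 1 = left the company, 0 = stayed.
n_leavers = round(n_employees * attrition_rate)
y_true = [1] * n_leavers + [0] * (n_employees - n_leavers)

# Naive baseline: predict that every employee stays.
y_pred = [0] * n_employees

# Accuracy: share of all predictions that match the label.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / n_employees

# Recall: share of actual leavers the baseline managed to flag.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_leavers

print(f"leavers={n_leavers} accuracy={accuracy:.1%} recall={recall:.1%}")
```

The baseline is right about 83% of the time and still useless for the HR team, which is why accuracy alone is a poor yardstick when the outcome of interest is rare.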

From this data story, we learn that employees who face long working hours or punishing travel schedules, and those who are either fresh into the workforce or have changed jobs frequently in the past, are far more likely to leave the company. Now the HR team can better screen resumes to avoid investing in new employees who are unlikely to stick around.

The theme of the "Great Resignation" hasn't subsided in the post-pandemic job market. Candidates are choosier now than ever before and companies must put their best foot forward to attract and retain top talent. But hiring that talent isn't the end of the story - keeping that talent engaged and productive is crucial to the success of any business. The IBM dataset provides some valuable insights into what may drive employee attrition and how best to avoid it.

Telco Customer Churn

Another example - two months of historical data on customer churn for a Telco. Seven thousand records contain demographic information and details on the services the customer subscribes to, contract terms, and payment details (methods and amounts). Each record is tagged with whether the customer churned in the last two months. Here is the data story for churn.

Much of what shows up here is intuitive - if you are on a contract, you are much less likely to churn. This is a European telco that exists in a competitive environment with many providers offering internet and data services. They would do well to check their pricing model and work to move their new users onto contracts and automated payment methods as quickly as possible (perhaps with some incentives).
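A pattern like the contract effect can be sanity-checked by hand with a simple grouped rate. The Python sketch below uses a handful of made-up records, and the field names `contract` and `churned` are hypothetical rather than the dataset's actual column names. It is not how Akkio builds a data story; it only shows the kind of comparison a data story summarizes.

```python
from collections import defaultdict


def churn_rate_by(records, field):
    """Return the share of churned customers for each value of `field`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        key = record[field]
        totals[key] += 1
        if record["churned"]:
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}


# Made-up records for illustration only.
records = [
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": False},
    {"contract": "two-year", "churned": False},
    {"contract": "two-year", "churned": False},
    {"contract": "two-year", "churned": True},
    {"contract": "two-year", "churned": False},
]

rates = churn_rate_by(records, "contract")
for contract, rate in sorted(rates.items()):
    print(f"{contract}: {rate:.0%} churned")
```

Running the same function over other fields, such as a payment method column, is one quick way to see which segments deserve a closer look before trusting any single pattern.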

Churn is the silent killer of businesses. It's the toll exacted on a business every time a customer leaves. The lost revenue, the wasted marketing expenses, the negative impact on morale: all of these factors add up to make churn a very real and very costly problem.

And yet, despite its importance, few businesses have a solid plan for dealing with it. In part, this is because the reasons for churn are often hard to identify. Is it a failure to provide adequate customer service? Is it because the product doesn't meet customer needs? Is it because of pricing pressure from competitors?

The truth is, it could be any number of things. And that's why businesses need to take a data-driven approach to combating churn.

Insurance Charges

Finally, let’s check an insurance company dataset of just over 1,300 records that track the cost charged to a health insurance plan given demographics like age, gender, BMI, etc. Training a model shows the stories we would expect - older fathers who are overweight and smoke are more likely to have high costs associated with healthcare, while young, healthy, non-smoking women have the lowest costs. Let’s look at the data story.

Charge Patterns:

Now It’s Your Turn

See how easy it is to identify and understand the drivers of your critical business outcomes by signing up for a free account at Akkio and training your first model. It only takes a few minutes to unlock new insights and understanding from your data. And from there, you can deploy your machine learning model with just a few clicks - as new data flows in, you’ll understand the predictions that flow out. Now you are on the path towards predictive analytics and automated data-driven decision-making.
