
Augmenting Data With AutoML: Fuzzy Merge

by Craig Wisneski, December 18, 2020

Getting business value from AI depends on building and deploying accurate machine learning (ML) models: the more accurate the model, the greater the business value. Bringing more relevant data to the table when training a model often improves its prediction performance.

A Basic Example of Data Augmentation

Consider ShovelCo, a hypothetical manufacturer of high-end, direct-to-consumer snow shovels. ShovelCo captures information about current and past customers, including name, address, and phone number. As with any business, some of the prospects in its database will buy over time. Still, ShovelCo could drive much more revenue if it could predict which people are actively in the market for a new snow shovel and focus more of its marketing on them.

[Figure: using weather data to augment predictive modeling]

Training a machine learning model on historical purchase data will capture general trends and make the revenue engine more efficient: people who live in snowy regions are more likely to buy, and they tend to buy during the winter months. That model yields a meaningful improvement in business velocity, but what if ShovelCo could also combine each prospect’s zip code with historical weather data?

By taking the date and zip code of each past purchase and merging in the weather reports for the week around that date, ShovelCo can train a model that recognizes when customers are most likely to need a shovel. Once the model is trained, ShovelCo can run it in real time: take the latest weather forecast, merge it with the prospect database, and predict which customers are most likely to buy a shovel ahead of an upcoming snow day. Now ShovelCo has a major improvement in revenue-generation efficiency and handily wins its market.
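To make the weather merge concrete, here is a minimal Python sketch of a windowed join, assuming each purchase carries a zip code and a date and that daily snowfall is available per zip code. The field names, sample values, and the add_weather_window helper are hypothetical, and a ±3-day window stands in for "a week around the purchase date". It illustrates the idea rather than how ShovelCo or Akkio actually builds features.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical purchase records (field names and values are illustrative only).
purchases = [
    {"customer": "A", "zip": "55401", "date": date(2020, 1, 10)},
    {"customer": "B", "zip": "85001", "date": date(2020, 1, 12)},
]

# Hypothetical daily weather: (zip code, day) -> snowfall in inches.
weather = {
    ("55401", date(2020, 1, 8)): 4.0,
    ("55401", date(2020, 1, 9)): 6.0,
    ("55401", date(2020, 1, 10)): 2.0,
    ("85001", date(2020, 1, 12)): 0.0,
}

def add_weather_window(purchase, weather, days_each_side=3):
    """Return a copy of the purchase with total and average snowfall
    over the days from date - days_each_side to date + days_each_side."""
    start = purchase["date"] - timedelta(days=days_each_side)
    readings = []
    for offset in range(2 * days_each_side + 1):
        day = start + timedelta(days=offset)
        value = weather.get((purchase["zip"], day))
        if value is not None:  # skip days with no weather report
            readings.append(value)
    return {
        **purchase,
        "snow_total": sum(readings),
        "snow_avg": mean(readings) if readings else None,
    }

training_rows = [add_weather_window(p, weather) for p in purchases]
for row in training_rows:
    print(row["customer"], row["snow_total"], row["snow_avg"])
```

Each training row now carries snowfall features alongside the purchase, so a model can learn how recent snow relates to buying. Days with no weather report are skipped rather than counted as zero, which is one reasonable choice among several.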

Fuzzy Matching Without Unique IDs

Like our hypothetical ShovelCo, today's businesses are built on the backbone of big data, and they often already have in-house the information that would unlock significant business value. Unfortunately, that data lives in multiple systems. Pulling it together into a single dataset for a machine learning platform can be surprisingly difficult and time-consuming, particularly when the records you need to match don’t share unique identifiers.

Matching gets harder still when you augment your data with external information, such as supplementing a business email address with third-party demographics and firmographics. Historically, the task of pulling disparate data sources together has fallen to the data scientist, if your organization is lucky enough to have one.

A Fuzzy Match, or Fuzzy Merge, pairs records that are similar but not identical. Two quick examples: matching the email craig.wisneski@akk.io with the name Craig Wisneski, or recognizing that facility ID "615" in one data system is the same facility as "00615" in another.
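Easy cases like these can often be handled with simple normalization rules. This Python sketch is illustrative only: normalize_id and email_matches_name are hypothetical helpers, not part of Akkio, and they assume that IDs differ only by leading zeros and that emails follow a first-name-plus-last-name pattern.

```python
import re

def normalize_id(raw_id: str) -> str:
    """Treat IDs that differ only in leading zeros or surrounding spaces as equal."""
    stripped = raw_id.strip().lstrip("0")
    return stripped or "0"  # an all-zero ID like "000" becomes "0", not ""

def email_matches_name(email: str, first: str, last: str) -> bool:
    """Return True if the email's local part spells first + last name,
    ignoring case and the separators '.', '_' and '-'."""
    local = email.split("@", 1)[0].lower()
    joined = re.sub(r"[._\-]", "", local)  # "craig.wisneski" -> "craigwisneski"
    return joined == (first + last).lower()

print(normalize_id("00615") == normalize_id("615"))                        # True
print(email_matches_name("craig.wisneski@akk.io", "Craig", "Wisneski"))  # True
print(email_matches_name("craig.wis@akk.io", "Craig", "Wisneski"))       # False
```

The last check fails because craig.wis is an abbreviation, so hand-written rules like these break down quickly once records are merely similar.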

Those examples are relatively easy; it gets more complicated from there. Imagine two records with several fields that are similar but not identical. For instance, you’d likely want to match craig.wis@akk.io with a record that has FirstName=Craig, LastName=Wiz, Company=Akkio.
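One common way to handle multi-field cases like this is to score each field pair for similarity and average the scores. The sketch below uses the ratio from Python's standard difflib.SequenceMatcher as the similarity measure and assumes a first.last@company.tld email pattern; the measure, the field mapping, and the sample records are illustrative choices, not a description of how Akkio's matcher works.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def email_record_score(email: str, record: dict) -> float:
    """Average similarity between parts of an email address and a record's
    FirstName, LastName, and Company fields (assumes first.last@company.tld)."""
    local, _, domain = email.partition("@")
    parts = local.split(".")
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""
    company = domain.split(".")[0]  # "akk.io" -> "akk"
    scores = [
        similarity(first, record["FirstName"]),
        similarity(last, record["LastName"]),
        similarity(company, record["Company"]),
    ]
    return sum(scores) / len(scores)

records = [
    {"FirstName": "Craig", "LastName": "Wiz", "Company": "Akkio"},
    {"FirstName": "Jane", "LastName": "Doe", "Company": "Initech"},
]
for r in records:
    print(r["FirstName"], round(email_record_score("craig.wis@akk.io", r), 2))
# Craig 0.81
# Jane 0.07
```

The Craig/Wiz/Akkio record scores about 0.81 while the unrelated record scores about 0.07, so a threshold between the two would separate the match from the non-match.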

Akkio has built a state-of-the-art fuzzy matching tool that leverages our in-house AI technology to make it simple to join datasets. Yes, we’re using ML to get your data ready for ML. Here is how it works.

Using Fuzzy Match

When you create a new flow, start with the two datasets you want to combine. The first dataset you select, your “step 1,” is your primary dataset: the reference or trigger data you’ll feed into your model to make a prediction. In step 2, add the dataset with backing information, the one that augments the first. In step 3, select “Merge Data.” Slide on down to “Advanced,” and you’ll see the option to combine on “Exact Match Only” or “Fuzzy Match.”

[Figure: easily merge datasets]

With Fuzzy Match selected, you choose a column from each dataset to match on. Add as many more column pairs as you like to dial in the match, then click “Merge.” Once you’ve combined your data, you can move on to the Predict step, where you can quickly build an AI model and check its performance.
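Conceptually, the merge behaves like a left join in which each row of the primary dataset is paired with its most similar row in the augmentation dataset. Below is a minimal Python sketch of that idea, assuming rows are plain dictionaries, similarity is the difflib.SequenceMatcher ratio averaged across the chosen column pairs, and 0.8 is an arbitrary match threshold. The fuzzy_merge function, the aug_ prefix, and the sample data are hypothetical; Akkio's own AI-based matcher is not shown here.

```python
from difflib import SequenceMatcher

def similarity(a, b) -> float:
    """Case-insensitive similarity in [0, 1]; str() lets numeric IDs compare too."""
    return SequenceMatcher(None, str(a).lower().strip(), str(b).lower().strip()).ratio()

def fuzzy_merge(primary, secondary, column_pairs, threshold=0.8):
    """Keep every primary row; attach the best-scoring secondary row if its
    average similarity across column_pairs is at least threshold."""
    merged = []
    for p_row in primary:
        best_row, best_score = None, 0.0
        for s_row in secondary:
            score = sum(similarity(p_row[pc], s_row[sc]) for pc, sc in column_pairs)
            score /= len(column_pairs)
            if score > best_score:
                best_row, best_score = s_row, score
        out = dict(p_row)
        if best_row is not None and best_score >= threshold:
            # Prefix augmentation columns to avoid clobbering primary columns.
            out.update({f"aug_{k}": v for k, v in best_row.items()})
        out["match_score"] = round(best_score, 3)
        merged.append(out)
    return merged

prospects = [  # step 1: primary dataset
    {"name": "Craig Wisneski", "city": "Minneapolis"},
    {"name": "Dana Smith", "city": "Phoenix"},
    {"name": "Pat Lee", "city": "Boston"},
]
backing = [  # step 2: augmentation dataset
    {"full_name": "Craig Wisneski", "town": "Minneapolis", "segment": "SMB"},
    {"full_name": "Dana Smyth", "town": "Phoenix", "segment": "Enterprise"},
]
for row in fuzzy_merge(prospects, backing, [("name", "full_name"), ("city", "town")]):
    print(row)
```

Every primary row is kept, mirroring the role of the step 1 dataset, and rows with no sufficiently similar partner pass through unaugmented. The nested loop compares every pair of rows, which is fine for a sketch but scales poorly; production matchers typically narrow the candidates first, for example by blocking on a shared field.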

Check out this video to see it in action.

Deploying Your Model

You can deploy your machine learning model in a few clicks as a web app, in a few lines of code with our API, or as a no-code automation with our Zapier integration (with other integrations on the way).

Once a model with a merged dataset is deployed, the data flows as follows. You pass in data that matches the format of the primary dataset. Akkio tries to look up matching data in your second data source (your augmentation set) and then runs the merged record through your model. The model returns (1) the merged record, (2) the model's prediction, and (3) the model's confidence in that prediction, expressed as a percentage.
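The request-time flow (look up augmentation data, run the model, return the merged record, prediction, and confidence) can be sketched in Python as follows. Everything here is hypothetical and is not Akkio's API: the in-memory AUGMENTATION table stands in for your second data source, an exact lookup on a zero-padded zip code stands in for whatever match rules the merge used, and toy_model is a made-up formula in place of a trained model.

```python
# Hypothetical augmentation set: forecast snowfall (inches) keyed by 5-digit zip code.
AUGMENTATION = {
    "55401": {"snow_forecast_in": 8.0},
    "02134": {"snow_forecast_in": 3.0},
    "85001": {"snow_forecast_in": 0.0},
}

def normalize_zip(raw) -> str:
    """Restore leading zeros lost when zip codes were stored as numbers (2134 -> "02134")."""
    return str(raw).strip().zfill(5)

def toy_model(record: dict) -> float:
    """Stand-in for a trained model: a made-up probability that the prospect buys."""
    snow = record.get("snow_forecast_in", 0.0)
    return min(0.95, 0.05 + 0.1 * snow)

def predict(record: dict) -> dict:
    # 1. Try to look up augmentation data; keep the record as-is if nothing matches.
    match = AUGMENTATION.get(normalize_zip(record["zip"]), {})
    merged = {**record, **match}
    # 2. Run the merged record through the model.
    p_buy = toy_model(merged)
    label = "buy" if p_buy >= 0.5 else "no_buy"
    # 3. Report confidence in the returned label as a percentage.
    confidence = p_buy if label == "buy" else 1.0 - p_buy
    return {"merged": merged, "prediction": label, "confidence_pct": round(100 * confidence, 1)}

print(predict({"name": "Craig", "zip": "55401"}))
print(predict({"name": "Pat", "zip": "99999"}))
```

The second call finds no forecast for its zip code, so the record is scored without augmentation, the kind of graceful fallback you would want when a lookup fails in production.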

Akkio's mission is to make it incredibly easy for any user to build with AI. Getting the right data together is a core part of any AI strategy, so we created an easy-to-use tool that lets you quickly merge and augment records -- no data science or software skills required.
