Making AI Models at Warp Speed
Science fiction enthusiasts know that the fastest way to travel between two distant galaxies is at warp speed. While warp-speed space travel remains in the realm of fiction, data scientists are now using artificial intelligence (AI) to build and analyze complex data models in real time.
A data modeling workflow begins with collecting and wrangling the right data. This has traditionally been a manual and laborious process, but it is now being completed at warp speed thanks to AI tools that automate both data aggregation and the workflow itself.
But before you implement automated feature engineering (AFE) solutions, it's critical to understand the entire workflow and the choke points that have traditionally slowed down model creation. You'll then be in a better position to understand how automation can help data scientists create models in days instead of weeks or months.
How the Data Workflow Works
Businesses across industry sectors rely on data modeling for a multitude of use cases. Whether it's a bank trying to predict which customers will pay their loans on time or a mobile phone company seeking to reduce customer churn, the process for creating a data model usually follows a predictable workflow:
1. Project Creation
2. Data Exploration
3. Data Wrangling
4. Feature Creation
5. Model Creation
6. Poor Model? Return to Steps 3 & 4
7. New Project
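The loop in this workflow, where a poor model sends you back to wrangling and feature creation, can be sketched in a few lines of Python. This is a minimal illustration, not any product's implementation: `run_workflow`, its callables, and the toy stand-ins below are all hypothetical placeholders for whatever exploration, wrangling, feature, and training code a team actually uses.

```python
from typing import Any, Callable, Dict, Tuple

def run_workflow(
    explore: Callable[[], Dict[str, Any]],
    wrangle: Callable[[Dict[str, Any]], Dict[str, Any]],
    create_features: Callable[[Dict[str, Any], int], Dict[str, Any]],
    train_and_score: Callable[[Dict[str, Any]], Tuple[Any, float]],
    target_score: float,
    max_iterations: int = 5,
) -> Tuple[Any, float, int]:
    """Run steps 2-6 of the workflow, looping until the model is good enough."""
    raw = explore()                                  # step 2: data exploration
    best_model, best_score = None, float("-inf")
    iterations = 0
    for attempt in range(max_iterations):
        iterations += 1
        clean = wrangle(raw)                         # step 3: data wrangling
        features = create_features(clean, attempt)   # step 4: feature creation
        model, score = train_and_score(features)     # step 5: model creation
        if score > best_score:
            best_model, best_score = model, score
        if score >= target_score:                    # good enough: move on
            break
        # step 6: poor model, so loop back to steps 3 and 4
    return best_model, best_score, iterations


# Hypothetical toy stand-ins: each attempt adds one more feature and scores a bit higher.
model, score, iterations = run_workflow(
    explore=lambda: {"rows": 1000},
    wrangle=lambda raw: dict(raw, cleaned=True),
    create_features=lambda clean, attempt: dict(clean, n_features=attempt + 1),
    train_and_score=lambda feats: (f"model-v{feats['n_features']}", 0.6 + 0.1 * feats["n_features"]),
    target_score=0.85,
)
print(model, score, iterations)
```

Every pass through steps three and four costs time when done by hand, which is exactly where the iterations pile up.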
This might seem like a fairly simple and streamlined process, but the truth is that most data scientists spend an inordinate amount of time on steps three and four: data wrangling and feature creation.
Let's say a bank or lender wants to build a model that predicts which customers are most and least likely to repay their loans on time. It will want to use as much data as possible to build the most accurate model it can. That means aggregating data from multiple sources, such as internal customer systems, credit reporting agencies, government agencies, and even social media, and then organizing these variables into a spreadsheet or table to prepare them for modeling.
The data processing and feature engineering steps have typically been manual and time-consuming. Automating them can shorten a modeling project by weeks, freeing up data scientists to focus on other value-added tasks.
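To make the aggregation step concrete, here is a minimal Python sketch. It assumes hypothetical extracts from three sources, all keyed by a `customer_id` field, and left-joins each source onto the customer list so that a missing value shows up as `None` instead of silently dropping the customer. Real pipelines would pull from databases or APIs; the table and column names here are illustrative only.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical extracts from three sources, all keyed by customer_id.
internal_customers: List[Record] = [
    {"customer_id": 1, "age": 34, "region": "North"},
    {"customer_id": 2, "age": 51, "region": "South"},
    {"customer_id": 3, "age": 27, "region": "North"},
]
credit_bureau: Dict[int, Record] = {1: {"credit_score": 720}, 3: {"credit_score": 655}}
loan_history: Dict[int, Record] = {2: {"late_payments": 1}, 3: {"late_payments": 0}}

def build_modeling_table(base_rows: List[Record], *sources: Dict[int, Record]) -> List[Record]:
    """Left-join each source onto the base rows, one row per customer."""
    # Every column supplied by any source, so every row ends up with the same columns.
    extra_columns = sorted({col for src in sources for rec in src.values() for col in rec})
    table = []
    for row in base_rows:
        merged = dict(row)
        for col in extra_columns:
            merged[col] = None                      # default when a source has no record
        for src in sources:
            merged.update(src.get(row["customer_id"], {}))
        table.append(merged)
    return table

table = build_modeling_table(internal_customers, credit_bureau, loan_history)
for row in table:
    print(row)
```

Keeping customers with gaps (rather than dropping them) is a deliberate choice here: deciding how to fill or flag those gaps is part of the wrangling work that follows.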
Taking Flight with AutoML
Building machine learning (ML) models has traditionally been a manual process that requires specialized skills and coding expertise. More recently, a technique has emerged that builds these ML models automatically, without the need for hand coding. This technique is called AutoML, or automated machine learning.
AutoML lets practitioners provide labeled training data as input and receive an optimized model as output. Once the model has been trained and validated on that data, it can be applied to real-world projects with a high degree of accuracy.
Implementing AutoML involves two steps. First, the software trains a range of candidate model types and selects the one it judges best on criteria such as predictive accuracy, deployability, and speed. Second, it optimizes that model's hyperparameters to improve accuracy and reduce bias or overfitting.
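The two steps can be illustrated with a deliberately tiny Python sketch. It is not how any particular AutoML product works: the three "model families" (a majority-class baseline, a credit-score cut-off, and a 1-D nearest-neighbour vote), their hyperparameter grids, and the toy loan data are all hypothetical. Step 1 scores each family with default settings and keeps the best; step 2 grid-searches that family's hyperparameters. For brevity it reuses one validation set for both steps, whereas real tools typically use cross-validation.

```python
from typing import Callable, Dict, List

Model = Callable[[float], int]   # maps one feature value (a credit score) to 0 or 1

def fit_majority(xs: List[float], ys: List[int], **_) -> Model:
    label = 1 if 2 * sum(ys) > len(ys) else 0       # most common training label
    return lambda x: label

def fit_threshold(xs: List[float], ys: List[int], threshold: float = 600.0) -> Model:
    # No fitting needed: the cut-off itself is the hyperparameter to tune.
    return lambda x: 1 if x >= threshold else 0

def fit_knn(xs: List[float], ys: List[int], k: int = 1) -> Model:
    def predict(x: float) -> int:
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return 1 if 2 * sum(ys[i] for i in nearest) > k else 0   # majority vote
    return predict

FAMILIES = {"majority": fit_majority, "threshold": fit_threshold, "knn": fit_knn}
SEARCH_SPACE = {
    "majority": [{}],
    "threshold": [{"threshold": t} for t in (600.0, 620.0, 640.0, 660.0)],
    "knn": [{"k": k} for k in (1, 3, 5)],
}

def accuracy(model: Model, xs: List[float], ys: List[int]) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

def auto_ml(train_x, train_y, val_x, val_y):
    # Step 1: train every candidate family with default settings and keep the best one.
    family_scores = {
        name: accuracy(fit(train_x, train_y), val_x, val_y) for name, fit in FAMILIES.items()
    }
    best_family = max(family_scores, key=family_scores.get)
    # Step 2: grid-search that family's hyperparameters.
    best_params: Dict[str, float] = {}
    best_score = -1.0
    for params in SEARCH_SPACE[best_family]:
        score = accuracy(FAMILIES[best_family](train_x, train_y, **params), val_x, val_y)
        if score > best_score:
            best_params, best_score = params, score
    return best_family, best_params, best_score

# Hypothetical toy loan data: credit score -> repaid on time (1) or not (0).
train_x = [580, 600, 620, 640, 660, 680, 700, 720, 740, 760]
train_y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
val_x = [585, 633, 652, 688, 712, 748]
val_y = [0, 0, 1, 1, 1, 1]

print(auto_ml(train_x, train_y, val_x, val_y))
```

On this toy data the cut-off family wins step 1, and step 2 then moves its threshold from the default 600 to 640.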
A bank, for instance, may collect variables from many different data sources to build a model that predicts which customers will repay their loans on time. Age, geography, credit score, lending history, and criminal background may all affect a customer's likelihood of repayment.
AutoML can then analyze these variables and build a scoring model in a fraction of the time it would take data scientists to do so manually. Not only does AutoML save time and money compared with previous methods, but its automated search across models can also produce more accurate models than manual approaches.
Modeling at Warp Speed with AFE
AFE is a significant time saver. “Automated feature engineering identifies the most important signals, achieving the primary goal of data science: reveal insights hidden in mountains of data,” explains Will Koehrsen, a data scientist at Cortex Intel.
The first step in putting AFE to use in your workflow is deciding whether you'll use Single-Table or Multi-Table AFE. Single-Table AFE creates new features from the columns of one table, for example by splitting a “dates” variable into sub-features such as “week,” “hour,” or “weekday vs. weekend.”
Multi-Table AFE automatically combines multiple tables into a single table that is then prepared for modeling. For example, you might combine a transactional sales table with other tables, such as a demographic table containing “geography” and “marital status.” This is done automatically by an algorithm called Deep Feature Synthesis (DFS), which generates features across related tables by following the relationships between them and stacking operations such as aggregations and transformations.
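As a small illustration of Single-Table AFE, the Python sketch below splits one timestamp into the kinds of sub-features mentioned above, using only the standard `datetime` module. The function name `expand_timestamp` and its output keys are hypothetical; an AFE tool would apply transformations like these to every date column automatically rather than one at a time.

```python
from datetime import datetime
from typing import Dict, Union

def expand_timestamp(ts: datetime) -> Dict[str, Union[int, bool]]:
    """Split one timestamp into simple sub-features a model can use directly."""
    return {
        "week": ts.isocalendar()[1],      # ISO week number, 1-53
        "hour": ts.hour,                  # 0-23
        "day_of_week": ts.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
    }

print(expand_timestamp(datetime(2024, 3, 16, 14, 30)))
```

Returning numeric and boolean values (rather than day names) keeps the features independent of locale settings and ready to feed straight into a model.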
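The core move in Multi-Table AFE can be sketched in Python: aggregate a child table (many transactions per customer) along its relationship to a parent table (one row per customer) and attach the results to the parent's own attributes. This is a simplified, one-level illustration with hypothetical tables and feature names, not the DFS algorithm in full; real DFS implementations, such as the open-source Featuretools library, stack many such primitives to greater depth across many relationships.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

# Hypothetical parent table (one row per customer) and child table (many rows per customer).
customers: Dict[int, Dict[str, Any]] = {
    101: {"geography": "Urban", "marital_status": "Married"},
    102: {"geography": "Rural", "marital_status": "Single"},
    103: {"geography": "Urban", "marital_status": "Single"},
}
transactions: List[Dict[str, Any]] = [
    {"customer_id": 101, "amount": 120.0},
    {"customer_id": 101, "amount": 80.0},
    {"customer_id": 102, "amount": 45.5},
]

# Aggregation operations applied along the customer -> transactions relationship.
AGGREGATIONS = {"count": len, "sum": sum, "mean": mean, "max": max}

def synthesize_features(parent, child, key, column):
    """Return one flat row per parent record: its own attributes plus child aggregates."""
    grouped = defaultdict(list)
    for row in child:
        grouped[row[key]].append(row[column])
    table = []
    for parent_id, attributes in parent.items():
        values = grouped.get(parent_id, [])
        row = {key: parent_id, **attributes}
        for name, fn in AGGREGATIONS.items():
            feature = f"{name}_{column}"
            if values:
                row[feature] = fn(values)
            else:  # no child rows: a count of 0, other aggregates undefined
                row[feature] = 0 if name == "count" else None
        table.append(row)
    return table

for row in synthesize_features(customers, transactions, "customer_id", "amount"):
    print(row)
```

The result is the single, flat table the paragraph above describes: demographics and transaction summaries side by side, one row per customer.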
The initial data modeling workflow can take time to fine-tune, largely because of the work involved in creating the right data sets and choosing the best features to build into the model. When AFE is combined with AutoML, however, this process is greatly condensed: models that typically take weeks or months can be created in days or even hours. That's because steps three to six (data wrangling, feature creation, and model creation) are fully automated by AFE and AutoML.
Once modeling is complete, automation can also dramatically boost efficiency when it comes to processing and analyzing big data. Ultimately, AutoML enables data scientists to develop highly accurate and useful data models more quickly and efficiently. That's good news for companies that need real-time business insights from their data.
To learn more about how AFE can help you make models at warp speed, request a demo and see Ki in action!