The Simplicity (and Complexity) of Auto-Feature Engineering
Data science is no longer a foreign concept to companies. Businesses understand, to a certain extent, that integrating data analytics can deliver measurable value. Terms like “Big Data” and “Artificial Intelligence” are widely known and discussed every day.
The question is: how many companies are working towards being “data-driven”, or are already there? NewVantage Partners’ 2019 Big Data and AI Executive Survey reveals a chilling trend: the proportion of companies that consider themselves data-driven has actually declined over the past three years, from 37.1% in 2017 to 31.0% in 2019. This stands in contrast to the rapid growth in AI ventures and investments, where 55% of the surveyed companies now invest more than $50M in big data and AI.
Why is This Happening?
There are a few explanations for the decreasing percentage of (self-reported) data-driven companies. One such explanation, raised by Vision Critical, points to the difficulty of obtaining the right data. At its core, the process of data analysis is to collect data, feed it into a model, and observe the outputs. However, a machine learning model is only as accurate as the data it’s given, meaning that the input data, or “features”, need to be properly prepared beforehand. These feature preparation steps are often the most difficult and time-consuming parts of data analysis.
Let’s take an example. Suppose a retail store wants to increase its overall website sales. It currently collects online transactional data, online advertising data, and membership data. As an analyst, there are several standard variables to look at: customer demographics, advertising methods, or number of visits to the website. Experienced analysts might go even further, working Excel™ magic to visualize time-series trends through exponential smoothing. There may be thirty to fifty feature combinations created, all painstakingly generated over several hours. Beyond that point, generating features by hand becomes far more time-consuming and computationally intensive, because the number of possible combinations grows rapidly with each added variable.
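To make the manual step concrete, here is a minimal sketch of one such hand-built feature: simple exponential smoothing of a weekly visit count. The function name, the `alpha` value, and the visit numbers are all illustrative assumptions, not figures from the retail example.

```python
def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing.

    The first smoothed value equals the first observation; each later value is
    alpha * current observation + (1 - alpha) * previous smoothed value.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not values:
        return []
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly website visits (made-up numbers for illustration).
weekly_visits = [120, 132, 101, 134, 190, 170, 160]
smoothed_visits = exponential_smoothing(weekly_visits, alpha=0.5)
```

Each feature of this kind has to be chosen, coded, and checked by the analyst, which is why the count of hand-made features tends to stay in the dozens.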
AFE to the Rescue
When Automatic Feature Engineering (AFE) was introduced, many of these complexities faded away. Analysts could now aggregate, transform and extract features more easily than ever before. Datasets that once included only preliminary data could now be expanded to contain many combinations of features. As a result, the models became more accurate and provided clearer direction for business decisions.
In our aforementioned example, the retail store may have had no clear course to take -- was it supposed to focus on male customers, or maybe older shoppers? After using Auto-Feature Engineering, the store knew to focus advertising content on middle-class men aged over 45, specifically those with more than two dependents.
So what exactly is Auto-Feature Engineering? It’s a data science technique developed to expedite the preparation of data. Simply put, AFE iterates through every possible feature combination, then uses machine learning to determine which ones are most impactful to the business question. In doing so, it can create hundreds of thousands of new data points, with significant feature depth, in an incredibly short amount of time (depending on your data). This process seems excruciatingly complex to do by hand, and it is. But AFE performs the tasks automatically and quickly, saving the user a significant amount of effort and resources.
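The generate-then-rank idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular AFE product works: it only builds pairwise sums and products, and it ranks candidates by absolute Pearson correlation with the target as a simple stand-in for the machine learning step. The column names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_features(columns):
    """Return the original columns plus the product and sum of every column pair."""
    features = dict(columns)
    for a, b in combinations(sorted(columns), 2):
        features[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
        features[f"{a}+{b}"] = [x + y for x, y in zip(columns[a], columns[b])]
    return features


def rank_features(features, target, top_n=3):
    """Score each feature by |correlation with target| and keep the top_n."""
    scored = [(name, abs(pearson(vals, target))) for name, vals in features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical customer data (illustrative values only).
columns = {
    "age": [25, 47, 52, 33, 61, 45],
    "dependents": [0, 3, 2, 1, 4, 0],
    "visits": [5, 2, 7, 3, 4, 6],
}
# Toy target constructed so that one interaction feature explains it fully.
spend = [a * d for a, d in zip(columns["age"], columns["dependents"])]

candidates = generate_features(columns)
top = rank_features(candidates, spend)
```

With three input columns this yields nine candidate features; the candidate count grows quickly as columns and transformations are added, which is the part that is impractical to do by hand.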
Looking to the Future
Ultimately, the introduction of AFE removed a significant roadblock in data analytics. Analytics projects that previously took weeks to perform have now been reduced to a few days. As AFE becomes more widely available, companies that want to take the first step towards a data-driven culture can do so without the complications that previously impeded them. We may see an increase in the number of data-driven companies, and that is a step in the right direction.
If you'd like to learn more about how AFE and Ki can take your analysis to the next level, download the 4 Step Automated Process Brochure!