Last week my friend and colleague, Cory Hefner, wrote a noteworthy article on a type of predictive modeling method called Ensemble Models. He concluded that combining the results of more than one predictive modeling method works better than any single method alone, just as two heads are better than one. As these methods gain popularity within the analytics community, I thought I would touch on a key point that should be considered before a single model is even built. More accurate methods may well produce better results, but the age-old process of variable creation should not be ignored.
Building models is as much an art as it is a science. I believe one of the keys to building good models is how creative you can be in developing variables that truly capture your customers' behaviors. Companies can sit on a wealth of data that is collected digitally or organically, but the variables do not have to stop at what is collected. Often, many more variables can be derived from this set, and they will tell you more about a customer than the raw data alone.
This process is not always easy, however. An Analyst, or to use the ever-popular term, Data Scientist, must truly understand both the business and the business objective in order to create a well-posed modeling problem. Only then can we start creating variables that capture the varying behaviors behind why a customer may or may not make a purchase or sign up for an offer. The more creative you can get in developing variables, the better your models can potentially be.
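As a rough illustration of what deriving variables can look like, here is a minimal Python sketch. It is not the routine used at Quaero; the transaction records, the field names, and the derive_variables helper are all hypothetical, and the four derived variables are just common examples. The idea is that a raw purchase log holds only an ID, a date, and an amount, yet it can be rolled up into recency, frequency, and spend variables that say more about how each customer behaves.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2023, 1, 5), 40.0),
    ("A", date(2023, 3, 1), 60.0),
    ("A", date(2023, 6, 10), 20.0),
    ("B", date(2023, 2, 14), 150.0),
]

def derive_variables(rows, as_of):
    """Roll raw transactions up to one set of derived variables per customer."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in rows:
        by_customer[customer_id].append((purchase_date, amount))

    derived = {}
    for customer_id, purchases in by_customer.items():
        purchases.sort()  # oldest purchase first
        dates = [d for d, _ in purchases]
        amounts = [a for _, a in purchases]
        # Days elapsed between each pair of consecutive purchases
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        derived[customer_id] = {
            "recency_days": (as_of - dates[-1]).days,
            "purchase_count": len(purchases),
            "avg_order_value": sum(amounts) / len(amounts),
            # Undefined for one-time buyers, so leave it missing
            "avg_days_between_purchases": sum(gaps) / len(gaps) if gaps else None,
        }
    return derived

features = derive_variables(transactions, as_of=date(2023, 7, 1))
```

Here customer A looks like a frequent, low-spend buyer and customer B like a one-time, high-spend buyer, a distinction the raw rows do not state directly. Which derived variables actually matter depends on the business objective, so treat this list as a starting point and not a recipe.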
As fellow predictive model builders have experienced, about 80% of the time is spent cleaning and aggregating the data to form the final modeling data set. The fun part of actually building a model is only 20% of the project, sometimes less if you have a slick automated routine such as Quaero. An analyst should seriously consider spending more of that 80% looking at the data they have and trying to develop new variables. I personally spend a lot of time studying the data we collect and thinking about new variables that can be formed before I build models, especially if I am building models for the same initiatives year after year. This has resulted in better model performance, helping our clients meet and often exceed their goals as marketers.
Using this philosophy, our models have gotten better over time. It shows that being good at the simple things goes a long way, even without knowing all the newer, fancier modeling algorithms such as Gradient Boosting Machines and Support Vector Machines. I always like to say, your results are only as good as your data. Hopefully with this blueprint you will see added lift and better results with your models, no matter what algorithm or method you choose.
Happy Modeling!