From Rules to Data
Author: Ben Daniel. Sr. Mgr. Revenue Data Science, CHEP
Rule sets are often a convenient way to deal with analytic problems. They are quick to form, and when they come from people with years of industry experience, they often make a lot of business sense. For example, a marketing team for a building supply company might want to create a campaign to target electricians. The team could draw on their previous business experience and (probably) correctly guess what electricians are likely to buy. Wire, conduit, and switches come to mind.
The team could query the sales data for customer accounts that over-penetrate in electrical product categories, that is, accounts whose purchases are unusually concentrated in those categories. Out would come a list of accounts to enter into a digital marketing campaign aimed at growing the business among likely customers.
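As a rough illustration of the query described above, here is a minimal sketch in Python. The category list, the 50% share cutoff, the function names, and the sample records are all hypothetical choices made for this example; a real team would pick its own definitions and would most likely run this against a sales database.

```python
from collections import defaultdict

# Hypothetical category list chosen by the marketing team (an assumption).
ELECTRICAL_CATEGORIES = {"wire", "conduit", "switches"}


def electrical_share(sales):
    """Return each account's share of spend in the electrical categories.

    `sales` is an iterable of (account_id, category, amount) tuples.
    """
    total = defaultdict(float)
    electrical = defaultdict(float)
    for account, category, amount in sales:
        total[account] += amount
        if category in ELECTRICAL_CATEGORIES:
            electrical[account] += amount
    return {a: electrical[a] / total[a] for a in total if total[a] > 0}


def likely_electricians(sales, min_share=0.5):
    """Apply the hand-written rule: flag accounts at or above `min_share`."""
    shares = electrical_share(sales)
    return sorted(a for a, s in shares.items() if s >= min_share)


# Made-up sample records, for illustration only.
sales = [
    ("A100", "wire", 900.0),
    ("A100", "lumber", 100.0),
    ("B200", "lumber", 800.0),
    ("B200", "switches", 200.0),
    ("C300", "conduit", 500.0),
    ("C300", "wire", 500.0),
]

print(likely_electricians(sales))
```

Note that both the category list and the cutoff are decided by the team before looking at any outcome data, which is exactly where assumptions enter.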
There are problems with this approach, however.
Problems with the Rules-Based Approach:
1. Assumptions and Biases: The rule for identifying an electrician described above is straightforward. Whether the rule is correct is beside the point. The team that created it baked in their own assumptions and biases about electricians. Certain customers may be missed because they do not fit what the team believes to be true about the target customer segment.
2. Exception Management: Every rule has its exception. As more business rules are created, exceptions appear. The job of managing all these exceptions can be daunting even with robust information systems.
3. Exceptions Create More Rules: Like the air bubbles in a block of Swiss cheese, exceptions create gaps in rule sets. So, to manage the exceptions, analyst teams create more rules. This becomes a vicious cycle, because the rules created to handle exceptions have exceptions of their own!
Data-Driven Approach
How do we break out of this cycle? We need a data-driven approach: start with the data, run it through the right machine learning algorithms, and allow the model to derive the rules. These rules are not like the rules formed in the first case, where the team's assumptions and biases are inherently included. They are similar to the splits in a decision tree. For example, the new rule might state, “if a customer's annual wire coil purchases are more than 1.5 standard deviations above the mean, they have a 90% chance of being an electrician.” A process framework such as CRISP-DM includes explicit modeling and evaluation phases, in which the accuracy of the decision rule can be measured. Hence, analyst teams can validate their approach before they deploy their campaign.
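To make the idea of a data-derived rule concrete, here is a minimal sketch that learns a single cutoff from labeled accounts. It is a one-split "decision stump" written in plain Python, not a full decision tree library, and the purchase counts, labels, and function names are invented for illustration. In practice the rule should be evaluated on accounts that were held out from fitting.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def fit_stump(z, labels):
    """Find the z-score cutoff that best separates the labels.

    The learned rule is "predict electrician (1) when z > cutoff".
    Returns (cutoff, training accuracy).
    """
    ordered = sorted(set(z))
    # Candidate cutoffs are midpoints between adjacent distinct z-scores.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best_cutoff, best_acc = None, -1.0
    for c in candidates:
        hits = sum(1 for zi, y in zip(z, labels) if (zi > c) == bool(y))
        acc = hits / len(z)
        if acc > best_acc:
            best_cutoff, best_acc = c, acc
    return best_cutoff, best_acc


def share_of_electricians_above(z, labels, cutoff):
    """Among accounts above the cutoff, the fraction labeled electrician."""
    flagged = [y for zi, y in zip(z, labels) if zi > cutoff]
    return sum(flagged) / len(flagged)


# Made-up data: annual wire coil purchases and known labels (1 = electrician).
purchases = [2, 3, 1, 4, 2, 3, 40, 55, 38, 60, 45]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

z = z_scores(purchases)
cutoff, accuracy = fit_stump(z, labels)
hit_rate = share_of_electricians_above(z, labels, cutoff)
print(f"rule: z > {cutoff:.2f}, accuracy {accuracy:.2f}, hit rate {hit_rate:.2f}")
```

The cutoff and the stated chance of being an electrician both come out of the data here, and both can be checked, which is the contrast with a hand-written rule.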
In pricing, we can break out of the rules-based approach by asking some big questions:
1. What am I trying to predict (or classify)?
2. What am I trying to estimate?
3. What might I learn if I were to group the subjects of my analysis together?
4. Do I know the important features of my data?
Asking these types of questions first and choosing the right analytic technique(s) second can lead to robust model development that not only solves business problems but also yields useful business insight. For instance, pricing managers might ask themselves, ‘Which of my customers are price sensitive?’
The next step might be gathering the data about those customers from sales and marketing systems, tagging the customers who reduced their business after a price change, and then creating a predictive model (e.g., logistic regression, random forest, etc.) to predict which customers are likely to reduce their business after a price increase. Not only can the model make predictions about individual customers, but the model itself (the coefficients of a logistic regression, or the feature importances of a random forest) can indicate what is likely driving the sensitivity and to what degree.
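The workflow above can be sketched with a small logistic regression. The example below fits the model by plain batch gradient descent so that it stays self-contained; in practice an established statistics or machine learning library would normally be used instead. The two features (size of the price increase in percent, and whether the customer has a contract), the data, and all names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(X, y, lr=0.05, steps=20000):
    """Fit logistic regression by batch gradient descent.

    Returns (bias, weights), with one weight per feature column.
    """
    n, k = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def predict_proba(bias, weights, row):
    """Predicted probability that the customer reduces their business."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


FEATURES = ["price_increase_pct", "has_contract"]

# Made-up history: (price increase in %, has contract) and the tagged
# outcome (1 = reduced business after the price change).
X = [(2, 1), (3, 1), (8, 1), (10, 1), (7, 1),
     (2, 0), (4, 0), (6, 0), (9, 0), (1, 0), (5, 0), (3, 0)]
y = [0, 0, 0, 1, 0,
     0, 1, 1, 1, 0, 0, 1]

bias, weights = fit_logistic(X, y)
coefficients = dict(zip(FEATURES, weights))
for name, w in coefficients.items():
    # exp(coefficient) is the multiplicative change in the odds per unit.
    print(f"{name}: coefficient {w:+.2f}, odds ratio {math.exp(w):.2f}")

p_no_contract = predict_proba(bias, weights, (8, 0))
p_contract = predict_proba(bias, weights, (8, 1))
print(f"8% increase: no contract {p_no_contract:.2f}, contract {p_contract:.2f}")
```

The sign of each coefficient gives the direction of the effect, and exponentiating it gives an odds ratio, which is one way to read "what is driving the sensitivity and to what degree" from the model.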
In my personal experience, I have seen the data-driven approach work time and again, provided it is applied correctly. It can liberate pricing managers from being stuck in rule sets that are not only full of exceptions, but also impossible to validate. With the modern data technology stack and a growing supply of trained data scientists coming out of academia, there is little holding companies back from using this approach to optimize their pricing analytics.