Over the past half year, we have worked closely with The Kroger Co. and its asset protection team, using analytics to draw insights from data that lead to better understanding and decisions regarding a problem common to the entire retail industry: retail shrinkage, or shrink.
The produce department, in particular, is susceptible to loss and represents a disproportionate amount of shrink relative to the entire Kroger enterprise. Addressing shrinkage within the produce department will help Kroger reduce costs and improve profitability. The goal of this capstone project is to utilize data science and analytics to better understand the relationship between inventory, sales, produce freshness, customer satisfaction, and shrink.
Exploratory Data Analysis
Our first step was to better understand the business and determine the need. Due to the lack of bar codes, the variety of products and vendors, and the perishable nature of the products, produce department data can be extremely problematic. We needed to understand the departmental structure at Kroger, financial data, how Kroger measures and records produce inventory, and how to calculate shrinkage at a granular level. This information played a crucial role in understanding which questions to ask next, which direction to take the analysis, and ultimately which recommendations to provide.
Good practice dictates exploratory data analysis (EDA) when starting any analytics project to better understand your data. As a first step in our data analysis we decided to analyze data at the overall store level, focusing on the big picture and looking at trends that affected each location as a whole. Our questions included:
- Which stores were performing the best and worst in terms of shrink results?
- Do these stores have any clear geographic relationship?
- How do average shrink results vary depending on store type, produce square footage, number of deliveries per week, and seasonal considerations?
- Are there correlations between produce freshness and wastage?
This analysis resulted in some interesting takeaways. First, stores that carry more value-based items had shrink results higher than more upscale stores. On average, stores that received six to seven deliveries per week had better shrink results than those with fewer deliveries. Perhaps these stores order less per delivery and carry less on the floor anticipating that another delivery will be made soon.
Further, on a seasonal basis, shrink as a percentage of sales tends to be lowest in February through May. This may be due to a difference in product mix or other seasonal effects.
However, while the results of the preliminary analysis may display general patterns of shrink performance by varying characteristics, the purpose of the exploratory data analysis was simply to uncover major trends and answer relevant questions rather than determine causal relationships.
Digging Deeper
Our next step was to analyze more granular data based upon sales, inventory, and cost factors for item-level data. We uploaded over 300 item-level data files, calculated shrink at the item-level based on the given information, and calculated aggregate statistics at the subcommodity level.
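The roll-up from items to subcommodities can be sketched as follows. This is a minimal illustration rather than the team's actual pipeline: the record layout, names, and dollar figures are hypothetical, and it assumes shrink and sales are both available in dollars for each item so that shrink can be expressed as a percentage of sales.

```python
from collections import defaultdict

# Hypothetical item-level records: (subcommodity, item, shrink $, sales $).
# Names and dollar amounts are made up for illustration only.
ITEM_RECORDS = [
    ("berries", "strawberries 1 lb", 120.0, 1000.0),
    ("berries", "blueberries 6 oz", 30.0, 500.0),
    ("apples", "gala apples", 40.0, 2000.0),
]


def shrink_by_subcommodity(records):
    """Sum shrink and sales per subcommodity, then return shrink as a % of sales."""
    totals = defaultdict(lambda: [0.0, 0.0])  # subcommodity -> [shrink $, sales $]
    for subcommodity, _item, shrink, sales in records:
        totals[subcommodity][0] += shrink
        totals[subcommodity][1] += sales
    # Dividing summed dollars (rather than averaging item percentages)
    # weights each item by its sales volume.
    return {
        name: 100.0 * shrink / sales
        for name, (shrink, sales) in totals.items()
        if sales > 0
    }


subcommodity_shrink = shrink_by_subcommodity(ITEM_RECORDS)
```

Summing dollars before dividing is one reasonable design choice here; averaging item-level percentages instead would let low-volume items dominate the subcommodity figure.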
As an enterprise, Kroger anticipates which commodities and subcommodities result in the most waste based on the experience and expertise of the employees. Some store managers claim there are items that are restocked solely to throw away again. Through data analysis we can provide greater accuracy regarding the wastefulness of each product and use quantitative analysis to support employee experience and expertise.
Where is Kroger’s shrink in the produce department coming from? Our first objective was to determine which commodities contributed the most to Kroger’s total shrink.
The resulting dashboard then allows Kroger to drill into each commodity and identify the subcommodity that is the true source of the problem. By reviewing the results for each commodity and subcommodity, Kroger can better determine what is happening in stores, investigate results that appear problematic, and make informed decisions based on the available data.
We then addressed whether certain products were being restocked solely to be thrown away, as well as the shrink cost per commodity. As depicted in the Tableau visual above, we were able to identify certain products that experience a high shrink percentage in the produce department. While clearly problematic, this insight creates a valuable business opportunity based on a data-driven decision. Based on this analysis, Kroger has the opportunity to reconsider best practices and improve results at every location where the product is available and all new locations where the products will be stocked moving forward.
Similarly, we created a second dashboard representing the percent of waste within each commodity and subcommodity allowing Kroger to review and compare the performance of each group of products.
For example, upon determining that a large portion of a certain product goes to waste, data mining was able to reveal the primary problem by narrowing it down to a few select items with extraordinarily high shrink-to-sales ratios. This may lead to critical business decisions, including the possibility of adjusting delivery sizes or frequency for these products to better match inventory with demand and to reduce shrink. The Tableau dashboard can be a crucial tool allowing store managers to visually observe trends in their produce departments.
The final dashboard depicts a map of stores in a Kroger division, color coded by shrink results with red representing the worst-performing stores and grey representing the best. Using the dashboard, regional managers will be able to visualize how the division is performing and which areas need attention. Store managers can compare performance to that of neighboring locations and immediately identify problem commodities and subcommodities. Employees can click on individual commodities or subcommodities and visually identify strong performance and problem areas.
This dashboard allows for easy comparison between stores and much faster identification of problems in the field, providing awareness and visibility into performance at a granular level and offering high-value information to Kroger.
Predictive Modeling
In addition to exploratory data analysis, predictive models were developed for the Kroger data to help answer key questions and provide additional visibility into Kroger operations. While this analysis can show what’s happened in the past, predictive models will determine the average expected result (shrink percentage) for a given set of conditions. Additionally, a well-fitted predictive model will quantify the impact of a change in a factor (such as moving a store from a high-risk neighborhood to a low-risk neighborhood) when all other conditions remain equal.
The first predictive model was a multiple linear regression, which we used to model the relationship between several explanatory variables (including store type, store area, delivery schedule, customer satisfaction scores, and more) and the response variable, produce shrink.
The resulting regression describes how mean responses vary in response to changes in the explanatory variables. The model predicts whether shrink will increase or decrease when a given variable changes and quantifies the expected magnitude of change. Further, it can determine which variables impact shrink, allowing Kroger to better understand factors affecting shrink rates. We modeled shrink rate, expressed as basis points rather than the monetary value of shrink, so conclusions would be easier to translate from one store to the next.
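For readers less familiar with the unit, one basis point is one hundredth of a percentage point. Assuming the rate is taken relative to sales, as in the seasonal analysis earlier, a shrink rate in basis points is the ratio of shrink to sales multiplied by 10,000. A minimal sketch, with a hypothetical function name and made-up dollar figures:

```python
def shrink_basis_points(shrink_dollars, sales_dollars):
    """Express shrink as a rate in basis points (1 bp = 0.01% of sales)."""
    if sales_dollars <= 0:
        raise ValueError("sales must be positive to compute a shrink rate")
    return 10_000 * shrink_dollars / sales_dollars


# Two hypothetical stores of very different size share the same rate,
# which is why a rate compares more easily across stores than dollars do.
small_store = shrink_basis_points(6_000, 100_000)
large_store = shrink_basis_points(60_000, 1_000_000)
```

Both hypothetical stores come out at 600 basis points even though their dollar losses differ tenfold.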
Explanatory variables included produce department square footage, store employee turnover rate, percentage of produce department area relative to overall store area, number of produce deliveries per week, the risk tier for the store determined by the asset protection team, store type, inventory, charges, ratio of sales to inventory, net sales, and customer satisfaction score.
The first four explanatory variables were determined to have no significant effect on the shrink percentage.
The following explanatory variables were all found to be significant and are discussed in more detail below. The impact of each variable is expressed "ceteris paribus," or when all other things are equal. Due to the confidential nature of company records, actual figures are replaced with X, Y, or Z. The actual value of X, Y, and Z varies for each factor.
- Risk Tier. Based on various metrics at the discretion of the asset protection team, there are four risk tier categories: low risk, medium risk, high risk, and max risk. Our model found no significant differences between shrink rates at low- and medium-risk stores. High-risk stores are expected to have X basis points more shrink, and max-risk stores are expected to have Y basis points more shrink, compared to low- and medium-risk stores.
- Store Type. Our analysis included five store types, categorized 1-5, where store type 1 corresponded to the most upscale stores, and store type 5 corresponded to the least upscale stores. Transitioning from store type 1 to 5, each change in store type corresponds to a reduction in shrink by X basis points. For example, consider a store that is type 1 and has 600 basis points of shrink. If all other explanatory variables (risk tier, inventory, and so forth) are held constant, but the store type changed to type 2, the expected average shrink would be (600-X) basis points. If the store type changed to type 3, it would be (600-2*X) basis points, and so on.
- Inventory, Net Sales, and Charges. These explanatory variables are expressed in thousands of dollars per period. The shrink increases by X basis points for each additional $1 million in inventory, decreases by Y basis points for each additional $1 million in net sales, and increases by Z basis points for each additional $1 million in charges.
- Sales Per Inventory. Sales per inventory is expressed as a ratio of sales during a given period ($) to the value of the inventory on display at the end of the period ($). For example, if there were $800 in sales during the period, and $1,000 of inventory was on the shelves, the sales per inventory figure would be 0.8. For each unit increase in sales per inventory, the average shrink is expected to decline by X basis points.
- Customer Satisfaction. The produce freshness score from customer feedback was used as a proxy for customer satisfaction. For each 1 percent increase in the average customer satisfaction score, the average shrink decreases by X basis points.
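The arithmetic in the store type and sales per inventory examples above can be sketched in a few lines. Because the real coefficients are confidential, the value standing in for X below is invented purely for illustration, and the function names are hypothetical.

```python
# Placeholder coefficient in basis points. The real value is confidential,
# so this number is invented purely to show how the effect is applied.
STORE_TYPE_STEP_BP = -10.0  # stands in for "-X" in the store type example


def sales_per_inventory(period_sales, ending_inventory):
    """Ratio of sales during a period to inventory on display at period end."""
    return period_sales / ending_inventory


def expected_shrink_after_type_change(baseline_bp, old_type, new_type):
    """Shift a baseline shrink rate by one step per store type, all else equal."""
    return baseline_bp + STORE_TYPE_STEP_BP * (new_type - old_type)


ratio = sales_per_inventory(800.0, 1_000.0)                      # the $800 / $1,000 example
type_2 = expected_shrink_after_type_change(600.0, 1, 2)          # 600 - X
type_3 = expected_shrink_after_type_change(600.0, 1, 3)          # 600 - 2*X
```

The same pattern would apply to any of the other significant variables: hold everything else fixed and shift the expected shrink by the coefficient times the change in that one variable.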
Decision Tree Model
The second predictive model was a decision tree. We used decision trees to understand the relationship between shrink and factors such as sales, inventory, produce freshness, customer ratings, and produce per square foot. Store-level shrink (%) was divided into three categories: low, medium, and high.
A decision tree with shrink category as the target variable was plotted to understand how the variables interact to classify a store into each shrink category. The decision rules leading to each shrink category were then identified to help Kroger make data-driven decisions.
Sell-through was revealed as the most important factor when categorizing shrink followed by employee turnover rate and customer ratings. For example, if a store’s sell-through is below average compared to other stores, employee turnover is much higher than average, and the store is a type 1 or type 2 location (upscale stores), the store will likely experience high shrink.
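The example path above can be written as a simple rule check. This is only a sketch of that one path, not the fitted tree: the field names, the sample stores, and the multiple used to represent "much higher than average" turnover are all assumptions made for illustration.

```python
def matches_high_shrink_rule(store, avg_sell_through, avg_turnover,
                             turnover_multiple=1.5):
    """Return True when a store follows the example path to the high-shrink leaf.

    turnover_multiple is a hypothetical stand-in for "much higher than
    average"; the fitted tree's actual split points are not published.
    """
    return (
        store["sell_through"] < avg_sell_through
        and store["turnover"] > turnover_multiple * avg_turnover
        and store["store_type"] in (1, 2)
    )


# Made-up stores for illustration.
store_a = {"sell_through": 0.70, "turnover": 0.90, "store_type": 1}
store_b = {"sell_through": 0.95, "turnover": 0.90, "store_type": 1}

flag_a = matches_high_shrink_rule(store_a, avg_sell_through=0.85, avg_turnover=0.50)
flag_b = matches_high_shrink_rule(store_b, avg_sell_through=0.85, avg_turnover=0.50)
```

Here the first hypothetical store satisfies all three conditions, while the second does not because its sell-through is above average.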
Recommendations
A dashboard was then created to identify problem areas at the individual store level, sorted in high-to-low shrink order, with red indicating poor performance and yellow signifying good performance for factors like sales, inventory, customer ratings, and so forth.
The third decision tree shows low-shrink stores that have above-average performance across the columns. This dashboard is interactive, and one can click on each store to find the problem products and how they vary across peer stores and peer commodities.
Limitations
Due to the nature of the retail industry, accurate data collection can be difficult and impractical when considering the bottom line. For example, determining the precise quantity of gala apples present in each store on each date may not be cost effective considering the benefit of having such detailed records.
Produce in particular can present a daunting task considering the multitude of vendors delivering the product. There may be five different vendors providing strawberries, tracked by different SKUs. For other items such as potatoes, it's nearly impossible to determine the original vendor once the product is on display, leading to overlap or confusion between different items and different vendors. Due to these and related issues, the physical inventory conducted every cycle faces similar limitations.
Another common issue with produce is the exchange between different categories. For example, a store may order organic grapes but receive regular grapes. This can lead to inconsistencies in billing for inventory and sales. Further, an item can be modified in some manner leading to a different label when the item is finally sold. For example, whole pineapple is cut and sold as precut fruit to customers. Similar issues happen with juice bars, prepared foods, and the deli.
Kroger is aware of these issues and makes corrections to their financial records at the commodity level. However, these corrections cannot fully capture all the unique scenarios that arise in produce retail and cause noise in the data. Additionally, it’s not uncommon to see negative inventory figures as a result. To compensate for these issues, we have rolled the data up to higher levels, joining items or subcommodities together, or averaging over longer periods of time. This reduces the noise and extracts a stronger signal from the data but also conservatively limits our analysis and recommendations.
We elected to provide higher-level analysis for the data at a level we are comfortable with, rather than risk overfitting our models to extremely noisy item- or cycle-level data. These decisions are supported by cross-validation and out-of-sample testing during our analysis.
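As a reminder of what cross-validation involves, the sketch below splits row indices into k folds so that each fold is held out once while a model is fit on the rest. The fold scheme the team actually used is not stated, so the function name, the contiguous folds, and the choice of k here are assumptions for illustration only.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(n_samples=10, k=3))
```

In practice the rows would typically be shuffled first, or grouped so that all records from one store land in the same fold, before comparing out-of-sample error across folds.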
Experience at the RILA Conference
Our presentation was scheduled for the afternoon of day one of the Retail Industry Leaders Association (RILA) conference in Orlando. As part of the conference, we had the opportunity to attend a plethora of events, ranging from talks given by distinguished speakers from the retail industry, to networking events where we could meet people and exchange information regarding the innovations in the retail industry.
We presented our findings to a crowd filled with experts from the retail industry. It was well received and a proud moment for the team. The conference also had an exhibition hall with solution providers showcasing their products. Donning different caps for each event, we loved learning something new from each and every person we met. Overall, it was a great learning experience and we enjoyed the conference thoroughly.
Final Thoughts
Our project benefitted immensely from a close partnership with the Kroger team and their willingness to provide support and in-depth information. Many of the most rewarding aspects of our work were only possible because Kroger was willing to provide us with weekly sales data for each item at every store. The success of every analytics project depends on the quality of the data, and we were fortunate to have access to such detailed records.
“We were delighted to support this initiative, both for the benefit of the students and the value we derived from their efforts and expertise,” said Mike Lamb, LPC, vice president of asset protection with Kroger. “We know that in order to stay ahead of the ever-changing environment that affects shrinkage and waste, the benefit of analyzing our data in a meaningful and thoughtful way allows us to take a proactive versus reactive approach in mitigating our shrink. This not only serves to improve shareholder value from a profitability point of view but also enhances our in-stock position and product freshness focus.”
Analytical, data-driven investigations are important to the entire retail industry. Shrink exists in all sectors of retail, and this type of project benefits the entire industry. Through accurate root-cause analysis, we can help individual stakeholders identify different pain points and achieve a level of detail that was not previously possible. In addition, these methods provide tools that allow greater visibility into the business, making retail processes more efficient.
With the diversity of products and vendors, seasonal variations, and highly perishable goods, the grocery industry can be extremely complex. This project demonstrated that signals can be found even in this noisy produce data and reflects how similar methods can be applied to other retail sectors. Investing in analytics can benefit retailers in many different ways, whether by reducing losses, being better prepared for future issues, understanding the retail customer, or running more efficiently as a whole.
Finally, we would like to thank everyone who helped with this project. To our sponsors at Kroger, Aaron Medley, Jason McClure, and Mike Lamb: thank you for supporting our team and answering our endless questions regarding the data and work at Kroger. To Ed Tonkon, our sponsor from Zebra Technologies: we appreciate your time and thank you for your ideas, feedback, and comments. To our sponsor at RILA, Ellen Jackson: thank you for listening in on every call and coordinating our trip to the conference in Orlando. And to Dr. Mike Hasler and Dr. Ramesh Rajagopalan at The University of Texas: thank you for guiding us through this project, providing us with critical feedback, and helping us provide the most value through our work.
SIDEBAR: Collaborating with Tomorrow’s Industry Leaders
By Ed Tonkon, President, Zebra Retail Solutions
The annual Retail Industry Leaders Association (RILA) Asset Protection Conference provides attendees with exceptional opportunities to connect with peers on the most pressing issues facing the industry while exploring the innovative technologies currently transforming asset protection. However, the event also provides a unique opportunity to collaborate with tomorrow’s leaders through an innovative program.
Founded under the guidance and direction of Lisa LaBruno, senior vice president of retail operations, the RILA Student Mentor Program was established to integrate the skills and insights of a prominent retail chain, a retail solution provider, and academia as part of a semester-long project. Each year the program focuses on a major area of interest for loss prevention professionals, with the students presenting their findings at RILA’s annual Asset Protection Conference.
Participating students are selected from The University of Texas at Austin, McCombs School of Business, where they are pursuing their master’s degrees in business analytics. The master’s program produces data scientists and was recently ranked second of its kind globally. Michael Hasler, PhD, director of the program at McCombs, has supported the RILA Student Mentor Program for the entirety of the collaboration and has developed it into a capstone project for his students.
Over the past seven years, retail participants have included JCPenney, 7-Eleven, The Home Depot, and most recently The Kroger Co. under the leadership of Mike Lamb, LPC, vice president of asset protection. Aaron Medley, the senior manager of asset protection analytics, was also very involved with the students throughout the project as they applied their advanced analytical skills to over 25 million rows of data.
Along with the respective retailers, I’ve had the privilege of serving as co-mentor for the program each year and have seen it become a valuable opportunity for all involved. In addition to presenting at the conference, students have the opportunity to become fully immersed in this top retail loss prevention event, engaging with both retailers and solution providers at the show. The retailer also receives key insights from these data scientists that it can leverage to improve business operations. I’m personally inspired by the results of RILA’s college student program as the students work with industry mentors to complete these research projects.