Previous posts in this series have covered theory, retail fraud, and methods for tackling specific loss prevention problems.
In this post, we move one step further and describe the tools, team members, data infrastructure, and planning needed to develop an analytics environment within a retailer’s loss prevention team.
Most retailers today do not have a traditional analytics group within their loss prevention department. The goal of this post is to create a roadmap for loss prevention teams to plan for and adopt a culture of analytics.
The use of analytics and predictive modeling is a critical component in the future of loss prevention. The ability to assess patterns in data, measure loss prevention programs, and make decisions in real time is fundamental in solving complex issues related to customers, sales, and loss. As criminals become more sophisticated, they will gravitate toward areas of weakness; retailers that have not adopted an aggressive strategy for detection and prevention will be easy targets.
An analytics infrastructure can bring significant value to an organization through the use of data and analytical methods. With the proper analytics infrastructure in place, analytics projects can be performed with a higher probability of success and usability. The analytics infrastructure refers to the staff, services, applications, utilities, platforms, software, and systems used for housing data, accessing data, preparing data, estimating models, validating models, and taking action.
The first step in integrating analytics is to create a data infrastructure. Without the data needed to develop models, analyze performance, and deploy solutions, there will be no foundation to build on. Computing and data storage infrastructures have changed radically in the past 15 years, which is good news for those wanting to do more with their data. Companies like Google and Facebook broke through the old limits of single-machine storage and processing by pioneering parallel computing frameworks. Tools that grew out of that movement, such as Hadoop, HBase, and Cassandra, along with commercial platforms like Greenplum and Netezza (now IBM's PureData for Analytics), deliver supercomputer-type power for a fraction of the cost. Some of these are fairly low-cost solutions and should be considered.
Big data and its associated tools are allowing practitioners to solve more complex problems and tackle traditional problems more rapidly. Loss prevention teams should be using these tools to stop criminals, prevent fraud, and reduce shrink. They can begin today by putting the building blocks in place for a world-class analytics infrastructure. See Figure 1 for a good barometer of when you are nearing the transition point to a big data infrastructure.
You will also need the appropriate computing infrastructure. This consists of (1) analytics software, (2) data management software, and (3) appropriate hardware. Analytics teams in their infancy may cope with general analytics using Excel, but as the group begins to tackle more complex problems, it will invariably need a more robust statistical programming package. There are a few options to choose from, the most common being R, SPSS, and SAS. R and SAS require learning a specific coding language, but the benefits are greater flexibility and power. Another important note is that SAS and SPSS carry license fees, while R is an open-source technology; however, R may be harder for a beginner to use.
As with the software, there are choices regarding data management software and hardware (see Figure 2). Oracle, SQL Server, and MySQL are commonly used data management software tools. Netezza, Greenplum, and Hadoop's Hive are all data management infrastructures that offer more scalability for larger data projects. The retailer's IT organization typically tries to maintain only one type of data management software, so the analytics function will likely use the same data management software as the general organization. The more advanced the problems your analytics function is tasked with solving, or the more data you have available to tackle, the more powerful the hardware you will need to invest in. In our most recent infrastructure, we found it useful to have many high-powered workstations running Microsoft Windows (the Linux operating system works well too), along with the data on Hadoop, Netezza, and Greenplum.
It is best to start out with some basic equipment that matches the immediate goals of the team and then grow from there. This does not have to be a costly undertaking. As the team proves their incremental value to the organization and makes a business case for themselves, additional members, tougher problems, and more advanced hardware are likely to follow.
Next, you should consider the team members and skills needed to advance your analytics functions. Here are a few needed team members:
- 1) Team leader – this individual needs a business understanding of how analytics can impact the overall organization, should be able to speak to executives in business terms, and should be able to translate those ideas to the team in mathematical terms.
- 2) Data experts – these individuals will know how to work with data from many different sources.
- 3) Modeling experts – these individuals can work with data (provided by the data experts) to produce and implement working models.
- 4) Communications leader – this individual can work with key stakeholders and decision makers to ensure that their needs (and those of the end user) are met in the process; they should have an understanding of mathematical concepts and be good at developing presentations and documentation to communicate these concepts to the rest of the organization.
- 5) Project manager – this individual can remain focused on completing modeling tasks, manage meetings and communications, and be a liaison to other teams in the organization.
Measuring the Impact of a Loss Prevention Initiative
Depending on the status of your team's data infrastructure, the organization may start with basic analyses or tackle more complicated problems. Following are a few examples of analytical methods, growing more complicated as the data infrastructure matures.
A/B Testing & Simple Experiments
The starting point of most loss prevention tools is the result of trial and error. Someone in the organization has a good loss prevention idea but is unsure whether it will work. As a result, a test is run, and outcomes are measured. If the results show that the test was successful (compared with control stores), the program may move ahead (see Figure 3). Many experiments are conducted at the store level, and the best strategy for selecting stores is complete randomization. This can be done as easily as putting the store list into Excel, assigning a random number to each store, and then picking the top stores for the study. Following traditional statistical approaches for designed experiments and A/B testing will go a long way in setting up the experiments so that the results are defensible. Basic comparison tests, like t tests and chi-square tests for independence, can help to prove the efficacy of the initiative.
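The randomization-and-compare workflow described above can be sketched in a few lines of Python. The store names and shrink figures here are simulated placeholders, not real data; in practice, the outcome measure would be pulled from your data warehouse after the program runs.

```python
import random
from statistics import mean
from scipy import stats

random.seed(42)

# Hypothetical store list -- in practice, exported from your store master file.
stores = [f"store_{i:03d}" for i in range(1, 101)]

# Complete randomization: shuffle the list, then split it into test and control.
random.shuffle(stores)
test_stores, control_stores = stores[:50], stores[50:]

# Simulated post-program weekly shrink (dollars) per store, for illustration only.
test_shrink = [random.gauss(900, 150) for _ in test_stores]       # program stores
control_shrink = [random.gauss(1000, 150) for _ in control_stores]  # control stores

# Two-sample t-test: is mean shrink in program stores different from control stores?
t_stat, p_value = stats.ttest_ind(test_shrink, control_shrink)
print(f"test mean: {mean(test_shrink):.0f}, control mean: {mean(control_shrink):.0f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest the difference between program and control stores is unlikely to be due to chance, making the result defensible when presenting the initiative to leadership.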
Exception-Based Reporting to Identify Fraud
Most exception reporting systems combine data from the point of sale, employee records, item files, and store files into a single system to help users identify fraud cases for further investigation. The system is typically based on a query-based set of rules, which identify a reasonable concentration of fraudulent and abusive employees and consumers. The more accurate the queries, the more cases the investigator can identify and resolve in a brief period.
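A minimal sketch of the query-based rule idea follows, using pandas. The cashier IDs, thresholds, and column names are all hypothetical; a production exception reporting system would run similar rules against the full point-of-sale feed.

```python
import pandas as pd

# Hypothetical transaction summary by cashier -- illustrative numbers only.
pos = pd.DataFrame({
    "cashier_id":   ["E01", "E02", "E03", "E04"],
    "transactions": [500,   480,   510,   495],
    "refunds":      [12,    11,    55,    9],
    "voids":        [8,     40,    10,    7],
})

# Derive rates so cashiers with different volumes are comparable.
pos["refund_rate"] = pos["refunds"] / pos["transactions"]
pos["void_rate"] = pos["voids"] / pos["transactions"]

# Query-based rule: flag anyone whose refund or void rate exceeds a threshold.
# The 5% cutoff is an illustrative assumption, not an industry standard.
exceptions = pos[(pos["refund_rate"] > 0.05) | (pos["void_rate"] > 0.05)]
print(exceptions[["cashier_id", "refund_rate", "void_rate"]])
```

The sharper the rule, the smaller and more fraud-rich the flagged list, which is exactly what lets an investigator resolve more cases in less time.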
Advanced Data-Based Investigations
Data handling problems are those where practitioners quickly hit boundaries due to the size of the data or the complexity of the problem. During data handling, many types of data are typically brought together, and investigators will begin to ask questions that a single computer or database cannot answer. Many of these questions are often referred to as "N² problems" (N squared problems) because they usually involve comparing many records against many records, resulting in a squaring of the data (see Figure 4). Many of the problems outlined in Figure 4 will overwhelm traditional hardware architectures. On a positive note, the problems listed would likely produce many intermediate variables that can be used in further analytics or predictive modeling.
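To see why these problems explode, consider a toy record-matching task. The records below are fabricated for illustration; the point is that comparing every record against every other scales quadratically, and a standard mitigation, often called blocking, only compares records that share a key.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer records -- in practice, millions of point-of-sale rows.
records = [
    {"id": 1, "zip": "30301", "card": "4111000000000001"},
    {"id": 2, "zip": "30301", "card": "4111000000000002"},
    {"id": 3, "zip": "60601", "card": "4111000000000003"},
    {"id": 4, "zip": "30301", "card": "4111000000000001"},
]

# Naive all-pairs comparison: N records produce N*(N-1)/2 pairs.
# 4 records -> 6 pairs, but 1 million records -> roughly 500 billion pairs.
naive_pairs = list(combinations(records, 2))
print(f"naive pairs: {len(naive_pairs)}")

# Blocking: group records by a shared key (here, ZIP code) and only
# compare within each group, shrinking the comparison space dramatically.
blocks = defaultdict(list)
for r in records:
    blocks[r["zip"]].append(r)

blocked_pairs = [pair for group in blocks.values()
                 for pair in combinations(group, 2)]
print(f"blocked pairs: {len(blocked_pairs)}")
```

Even with tricks like blocking, real N² investigations routinely outgrow a single machine, which is where the parallel platforms discussed earlier earn their keep.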
Predictive Modeling and Scoring
Everything we do generates data, and, thanks to the advent of big data, hundreds of scores from models are calculated for every individual based on his or her past behavior in a wide array of industries. These scores, in turn, provide a likelihood of some future behavior, which can be used to drive anything from marketing decisions to banking decisions to crime prevention decisions. While predictive modeling has been used in conjunction with big data to analyze trends for other markets, its application in loss prevention is in its infancy.
From a loss prevention viewpoint, predictive modeling involves performing statistical analyses that may uncover trends in the underlying risks, trends that may indicate the likelihood of future loss. Predictive modeling anticipates future behavior and improves strategic planning. For example, it can identify how an institution can effectively and deliberately target certain fraudulent employees, high-risk locations, and high-risk products, leading to increased action via a more efficient loss prevention process.
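A minimal scoring sketch, assuming scikit-learn is available: the features (for instance, refund rate, void rate, after-hours activity) and labels below are simulated stand-ins, but the pattern of fitting on historical cases and then scoring new ones to rank investigations is the core of the approach.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated training data: one row per employee, three illustrative risk
# features; label 1 marks a historically confirmed fraud case.
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 200) > 1.0).astype(int)

# Fit a simple logistic regression on past outcomes.
model = LogisticRegression()
model.fit(X, y)

# Score new employees: predict_proba returns a probability of the positive
# class, which investigators can use to rank cases for review.
new_cases = rng.random((5, 3))
scores = model.predict_proba(new_cases)[:, 1]
for i, s in enumerate(scores):
    print(f"employee {i}: fraud risk score = {s:.2f}")
```

In practice the model family, features, and score cutoffs would be chosen and validated by the modeling experts described earlier; the value lies in turning raw history into a ranked queue of cases.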
Video Analytics
Video analytics is evolving rapidly, and there is still a lot of room to grow. There are a few barriers in the way of large-scale video deployments. First, video data is very large, and moving it across the network can be costly. Second, video data can be difficult to analyze; it can take millions of calculations to identify a face in a video or a product on the counter. Third, while video analytics offers a wide range of new cutting-edge technology, the business case may not be a strong one. Fourth, there may be privacy concerns when consumers or employees are being tracked by too much technology.
The world of loss prevention analytics is evolving and with it the infrastructure needed to successfully use analytics to provide value to the business. The analytics infrastructure is composed of the staff, services, applications, utilities, platforms, software, and systems. Analytics infrastructure is needed for housing the data, accessing the data, preparing data, estimating models, validating models, and taking actions. As the analytics tasks become more complex, the analytics infrastructure needs to grow as well. Figure 5 shows a sample plan that we believe could move a loss prevention team from the basics to applying advanced analytics.
EDITOR’S NOTE: This post has been excerpted/adapted from the authors’ text, Essentials of Modeling and Analytics: Retail Risk Management and Asset Protection. Learn more.