The video AI space is fascinating and its potential is massive. Vendors and customers alike get excited about the possibilities, envisioning a compelling world where intelligent machines bestow unparalleled insights or save humans from the drudgery of repetitive, tedious tasks. As you evaluate vendor claims on their offerings in AI and machine learning (ML), it’s important to have both feet on the ground and know what questions to ask to determine what’s possible and what’s not with their solutions.
Here are six critical questions to ask your video AI vendors to assess their true capabilities:
- Do you use image processing or ML?
- Does your ML training data include environments like ours? (initial training)
- Does your ML learn from our environment over time? (ongoing training)
- Does your ML run on-edge (cameras), on-premise (local server), hybrid (split AI) or on-cloud?
- How do you protect us from excessive false alarms?
- What are the hardware costs required to deploy your solution?
Now let’s delve into each question more deeply:
1. Do you use image processing or machine learning?
Image processing technologies, around for decades, involve taking an image as input, analyzing the image pixel-by-pixel, and getting a result as output. This approach is limited by the scope of the computer algorithm itself. For example, to detect a cat you’d have to know all possible combinations of pixels that could represent a ‘cat’. If the algorithm didn’t account for a half-obscured bobcat, you couldn’t detect it.
Machine learning relies on pattern recognition. Give the machine a set of examples (the training set), and it ‘learns’ how to detect that pattern. In our cat example, instead of coding the system pixel-by-pixel to detect a cat, we’d feed the ML system thousands of pictures of cats, so it learns what a cat looks like. Research has proven this to be a more robust and accurate strategy for cat detection—or any other type of visual recognition.
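The contrast can be sketched in a few lines of toy code. Everything below is illustrative: the "images" are reduced to two made-up features (fur texture and ear visibility), and the "learning" is a deliberately simple nearest-centroid classifier rather than a real vision model.

```python
# Toy sketch: hand-coded pixel rule vs. a classifier learned from examples.
# Feature values and names are hypothetical stand-ins for real image features.

cats     = [(0.90, 0.90), (0.85, 0.20), (0.95, 0.70)]  # includes obscured ears
non_cats = [(0.10, 0.10), (0.20, 0.30), (0.05, 0.50)]

# --- Image-processing style: the rule only covers cases its author foresaw ---
def rule_based_is_cat(fur, ears):
    return fur > 0.8 and ears > 0.8   # fails on any half-obscured cat

# --- ML style: "train" by learning each class's center from labeled examples ---
def centroid(points):
    return tuple(sum(v) / len(points) for v in zip(*points))

cat_center, other_center = centroid(cats), centroid(non_cats)

def learned_is_cat(fur, ears):
    def dist2(c):
        return (fur - c[0]) ** 2 + (ears - c[1]) ** 2
    return dist2(cat_center) < dist2(other_center)   # closer to the cat pattern?

half_obscured = (0.90, 0.30)              # furry, but ears mostly hidden
print(rule_based_is_cat(*half_obscured))  # False — the hand-coded rule misses it
print(learned_is_cat(*half_obscured))     # True — the learned pattern still fits
```

The point isn't the algorithm (real systems use deep networks); it's that the ML version generalizes from its examples instead of only matching cases someone thought to code for.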
Our recommendation: Given the astounding advances in machine learning lately, you should only work with vendors using ML, avoiding vendors using traditional image processing.
2. Does your ML training data include environments like ours? (initial training)
The results delivered from an ML system are only as good as the data used to train it. The GIGO principle—garbage in, garbage out—truly applies here! Vendors thus need to have robust training data that includes examples from pertinent environments to create their ML models.
For example, if you’re trying to identify fire and smoke in an open-air environment, it’s of no use to train your model on fires and smoke detected indoors. Similarly, even training on fire and smoke in forested areas will not help your model accurately detect those conditions in an open field. Many an ML prediction has failed not because of a deficit in the core model but because it was not trained adequately and correctly for its intended use case.
Our recommendation: Make sure your vendor can confirm environments like yours were included in their training data, and they’re not using a generic off-the-shelf ML model.
3. Does your ML learn from our environment over time? (ongoing training)
Life would be easy if you could just focus all your energy on correctly training your ML model and then move on. Unfortunately, as is so often the case, ‘set and forget’ doesn’t work with ML.
‘ML model drift’ is something like digital entropy. Recall ‘entropy’ from high school physics (sorry for the flashback!)—things left to themselves get more disordered over time. Similarly, ML models not periodically re-trained tend to ‘drift’—their predictive power degrades as the real-world environment slowly changes around them. This could be a new type of camera that was introduced, and now there’s a whole new perspective to contend with. Or the seasons changed, and ambient lighting conditions added a color hue that caused an object to no longer be identified properly.
No matter the reason, the only way to ensure continued accuracy of your ML models, and hence the quality of the detections, is to have a periodic, systematic process to keep the ML models up-to-date—akin to the industrial approach of ‘kaizen’ or continuous improvement. This involves continually monitoring detections, tracking false positives and false negatives, and re-training models daily. If your vendor doesn’t do this, or does it once a quarter or year, you will get a lot of false alerts, or worse, silently miss critical problems.
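That monitoring step can be made concrete with a small sketch. The metrics (precision and recall) are standard, but the daily counts and the 90 percent thresholds below are hypothetical placeholders, not figures from any real deployment.

```python
# Illustrative drift check: flag a model for re-training when observed false
# positives/negatives pull its daily accuracy below an agreed floor.

def daily_model_check(tp, fp, fn, min_precision=0.90, min_recall=0.90):
    """Return (precision, recall, needs_retraining) for one day's detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of alerts that were real
    recall    = tp / (tp + fn) if tp + fn else 0.0   # share of real events caught
    return precision, recall, precision < min_precision or recall < min_recall

# Early deployment: the model is sharp.
print(daily_model_check(tp=95, fp=5, fn=4))    # (0.95, ~0.96, False)

# Months later, cameras and seasons changed — the model has drifted.
print(daily_model_check(tp=80, fp=35, fn=20))  # (~0.70, 0.80, True)
```

Run daily, a check like this turns ‘drift’ from a vague worry into a measurable trigger for re-training.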
Our recommendation: Ensure your vendors improve their models daily, based on observed false positives and false negatives in your environment.
4. Does your ML run on-edge (cameras), on-premise (local server), hybrid (split AI), or on-cloud?
There’s no ‘right answer’ here. The ML ‘smarts’ can reside on the camera itself, or be offloaded to a server sitting on-premise. Or the ML processing could be done on the cloud. Finally, the ML workload could be ‘split’ between on-premise and cloud.
If you need video analytics applied to exactly one camera, choose an ‘AI camera.’ One capital expense, and you’re all set! Edge/camera-based ML is becoming more popular as camera chipsets become more powerful. Unfortunately, ML models get stale quickly, and firmware updates are a big hassle. As research advances and new models are published, keeping these cameras up-to-date becomes near impossible.
With on-camera analytics, if you have more than one camera, something as basic as ‘people counting’ does not work properly. Imagine an overlap between the viewable area of two cameras, and you can see how that leads to double counting of people in that overlapping area. Server-based or on-premise analytics have more flexibility and can support more complicated ML models and reconcile analytics from multiple cameras. However, on-premise GPU and servers cost money (and space!) and can blow through budgets quickly. Add to that all the ongoing maintenance required, and it’s no wonder on-premise analytics have been deployed in limited fashion.
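The double-counting problem, and the server-side fix, can be shown with toy numbers. The coordinates below are invented, and the merge-by-distance step is a simplified stand-in for the cross-camera reconciliation a real system would do.

```python
# Toy sketch: per-camera counting double-counts people in the overlap zone;
# a server-side step can merge detections by estimated floor position.
import math

# Each camera reports detections as (x, y) floor-plane positions (hypothetical).
camera_a = [(1.0, 2.0), (5.0, 5.0)]          # two people seen by camera A
camera_b = [(5.1, 4.9), (9.0, 1.0)]          # camera B sees one of them again

naive_count = len(camera_a) + len(camera_b)  # 4 — counts the overlap twice

def deduplicated_count(*views, radius=0.5):
    """Merge detections closer than `radius` metres into a single person."""
    merged = []
    for view in views:
        for p in view:
            if not any(math.dist(p, q) < radius for q in merged):
                merged.append(p)
    return len(merged)

print(naive_count)                             # 4
print(deduplicated_count(camera_a, camera_b))  # 3 — overlap reconciled
```

This reconciliation is exactly the kind of cross-camera logic an isolated AI camera cannot do on its own.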
Cloud ML implementations offer more flexibility as they can easily incorporate the latest advances while avoiding on-premise complications entirely. The downside is that video must be uploaded to the cloud for analysis, which strains low-bandwidth environments.
The optimal solution is a hybrid model such as Split AI—splitting the ML analysis across the server and cloud. This reduces the upload bandwidth needed to analyze video in the cloud while also removing the need for costly GPU installation on-premise. This helps offset the disadvantages of on-prem vs. cloud, while maximizing the benefits of both.
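A back-of-the-envelope calculation shows why the split matters. Every figure below is an illustrative assumption (camera count, bitrate, and the fraction of footage a local first-pass filter flags for cloud analysis), not a benchmark.

```python
# Bandwidth sketch: run a cheap first pass on-premise and upload only the
# flagged candidate clips to the cloud model, instead of every stream.

CAMERAS            = 20
STREAM_MBPS        = 4.0    # assumed per-camera bitrate
CANDIDATE_FRACTION = 0.05   # assumed share of footage the local pass flags

cloud_only_mbps = CAMERAS * STREAM_MBPS                       # upload everything
split_ai_mbps   = CAMERAS * STREAM_MBPS * CANDIDATE_FRACTION  # flagged clips only

print(f"cloud-only upload: {cloud_only_mbps:.0f} Mbps")   # 80 Mbps
print(f"split-AI upload:   {split_ai_mbps:.0f} Mbps")     # 4 Mbps
```

Under these assumptions the hybrid design needs a twentieth of the uplink, while the heavy model (and its GPUs) still lives in the cloud rather than on-premise.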
Our recommendation: Buy a single AI camera if you need a one-off solution. Get an on-prem solution if you have multiple cameras and a solid hardware budget. However, get a cloud solution instead if you have sufficient bandwidth. To optimize on both on-premise cost/space and bandwidth, select a Split AI hybrid solution.
5. How do you protect us from excessive false alarms?
All alerting systems can lead to false alarms. If your vendor does not improve their ML models daily (see #3), they will slowly ‘drift’ towards more and more false alarms. How do you tackle this problem?
Good vendors, such as some weapons detection companies, deploy their systems in tandem with a control center that reviews alarms before they’re dispatched. Better yet is a generalized service, such as Screener+, which can work with any AI model to ensure that 99 percent of false alarms do not get dispatched. Combined with a continuous improvement process for the ML models, the system will get smarter and false alarms will decrease. Meanwhile, if false alarm dispatches are cost-effectively curtailed, the ML deployment will have achieved its aim.
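The review-before-dispatch idea reduces to a simple gate. This is a minimal sketch only: the alarm records, review scores, and threshold are all hypothetical, and a real screening service would involve human reviewers and richer context, not a single number.

```python
# Illustrative screening layer: dispatch only alarms that pass a second check.

def screen_alarms(alarms, threshold=0.8):
    """Return (dispatched alarms, number suppressed) for one batch."""
    dispatched = [a for a in alarms if a["review_score"] >= threshold]
    return dispatched, len(alarms) - len(dispatched)

raw = [
    {"id": 1, "review_score": 0.95},  # genuine detection
    {"id": 2, "review_score": 0.10},  # umbrella mistaken for a rifle
    {"id": 3, "review_score": 0.05},  # lens-glare artifact
]
sent, dropped = screen_alarms(raw)
print([a["id"] for a in sent], dropped)  # [1] 2
```

The suppressed false alarms are also exactly the feedback a daily re-training process needs (see #3), so the gate and the model improve together.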
Our recommendation: Ensure your vendors can answer how they’ll handle any spikes in false alerts since the cost of dealing with them falls on you, not on them.
6. What are the hardware costs required to deploy your solution?
An AI/ML configuration with unlimited access to CPU/GPU would be ideal. But resources cost money, and most companies are negotiating the tradeoff between technology that works for them and the cost to deploy it. Therefore, it’s important to know whether the solution you’re evaluating has hardware already included in the price, and ensure the full cost is listed explicitly. Otherwise, you’ll be responsible for acquiring, configuring, maintaining, and upgrading a (potentially expensive) server or other device to ensure continuity of your ML services.
Our recommendation: Look at a ‘fully-loaded’ cost that includes not just the vendor licensing fees but also the purchase, installation, configuration, and ongoing maintenance of such hardware.
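The gap between the license quote and the fully-loaded figure is easy to illustrate. Every number below is an invented placeholder; plug in your own quotes.

```python
# Illustrative 3-year cost comparison: license fee alone vs. fully loaded.

YEARS = 3
license_per_year     = 6_000
server_purchase      = 12_000   # assumed on-prem GPU server (one-time)
install_and_config   = 3_000    # assumed one-time setup
maintenance_per_year = 2_500    # assumed power, patching, support

license_only = YEARS * license_per_year
fully_loaded = (license_only + server_purchase + install_and_config
                + YEARS * maintenance_per_year)

print(f"license-only view: ${license_only:,}")   # $18,000
print(f"fully-loaded cost: ${fully_loaded:,}")   # $40,500
```

Under these placeholder figures, hardware and its upkeep more than double the headline price — which is why the question belongs on your vendor checklist.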
These six checklist items are by no means exhaustive. But they should arm you with enough background to help make educated and informed decisions on which video AI solution will work best for your needs and in your environment.
Padma Duvvuri is co-founder and VP of business at Dragonfruit AI, an award-winning enterprise video AI company. She has extensive experience crafting GTM strategies and jump-starting digital transformations, particularly in the IoT and AI arenas. Previously, she served as the VP of business development at Electric Imp and the AVP and head of global IoT practice at GlobalLogic (now a subsidiary of Hitachi Ltd). She holds a BS with honors in EECS from UC Berkeley, an MS in Computer Science from Johns Hopkins University, and an MBA from London Business School. Padma is also a passionate public speaker who cares deeply about fairness in AI systems and humanitarian tech.