CEO, Vionvision PTE Ltd.
Highlights:
* The argument that binocular sensors are more privacy-compliant is specious, as both use exactly the same imaging sensors, and binocular sensors even use one more.
* Rapid advances in artificial intelligence SoCs have made the new generation of monocular sensors superior in measurement accuracy and robustness.
* Monocular footfall sensors are better suited for ultra-high and ultra-low ceilings and concealed installations, while binocular sensors are more advantageous in side-view mounting situations.
* Monocular sensor solutions can save end customers more than 20% in overall expenses.
Having covered the technical principles behind binocular and monocular footfall sensors [1], we can now compare the advantages and disadvantages of the two sensor types in detail.
We also analyze and evaluate the future development of both sensors. These comparisons and evaluations should help users when choosing footfall sensors and, more generally, when choosing video analytics products and solutions [2].
Both binocular and monocular sensors are based on imaging sensors that acquire optical information for further processing. Both sensors require some common practices for privacy protection:
No images should be stored in the sensor: Footfall sensors should only process images in real time and then discard them immediately. Sensors should only store metadata temporarily and transmit it to an external destination.
Image information should not be transmitted outside the local LAN: Image information can be shared in a local network environment, but such information should never be transmitted over the Internet to an external destination, even in a VPN environment. Transmission of image information not only causes privacy breaches, but often causes bandwidth problems when considering a large number of deployments.
No facial recognition: Facial features are considered to be a form of biometric information. Due to varying regulations in different parts of the world, it is safest to avoid using any facial image processing algorithm components, including facial detection, feature extraction, demographic analysis, and even shoplifter identification.
Binocular Sensors: Binocular sensors use two imaging sensors rather than one, and therefore also require careful privacy measures to avoid disclosing the sensor's image information. In addition, binocular sensors may use data directly from the imaging sensors along with the depth maps to enhance functionality, so the above considerations apply to these sensors as well.
Monocular Sensors: Some monocular footfall sensors with limited computing power or algorithms that have not been carefully optimized are likely to leverage external AI computing power, particularly GPU resources in the cloud, to expand the feature set. This will result in image information being transmitted outside the store and end users should be very careful about this.
Some vendors argue that binocular sensors are more privacy compliant because they do not use the image information directly. Instead, they use the depth map as the only source of information for further processing, which eliminates the image texture information and thus avoids the risk of leaking biometric-type information about the customer. Vendors supporting monocular cameras counter that this argument is specious: monocular sensors also discard all image information after processing and output only statistical data, exactly as binocular sensors do, except that the step of calculating a depth map is omitted. Both sensors use the same type of imaging sensor and acquire a stream of video images, and the binocular sensor even uses an additional imaging sensor.
Binocular Sensors:
Typical binocular sensors provide statistics such as footfall (including staff and repeat visitors), occupancy (entries minus exits), and average length of stay (the time integral of occupancy divided by the footfall, which mixes staff and visitors).
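The average length of stay described above can be computed from anonymous entry and exit events alone. The following is a minimal sketch, assuming a hypothetical event format of `(timestamp_seconds, delta)` pairs where `delta` is +1 for an entry and -1 for an exit; the function name and parameters are illustrative and not taken from any vendor's API.

```python
def average_dwell_time(events, end_time):
    """Estimate average length of stay in seconds.

    events: iterable of (timestamp_seconds, delta), delta = +1 entry, -1 exit.
    end_time: end of the observation window in seconds.
    Returns the time integral of occupancy divided by the number of entries.
    """
    occupancy = 0
    person_seconds = 0.0
    entries = 0
    last_t = None
    for t, delta in sorted(events):
        if last_t is not None:
            # Accumulate occupancy over the interval since the previous event.
            person_seconds += occupancy * (t - last_t)
        occupancy += delta
        if delta > 0:
            entries += 1
        last_t = t
    if last_t is not None:
        # Count people still inside until the end of the window.
        person_seconds += occupancy * (end_time - last_t)
    return person_seconds / entries if entries else 0.0


# Two visitors: one stays from 0 s to 300 s, the other from 60 s to 360 s.
events = [(0, +1), (60, +1), (300, -1), (360, -1)]
print(average_dwell_time(events, end_time=360))  # 300.0
```

Because no individual is matched between entry and exit, this yields only an average over everyone counted, staff included, and not a per-visitor dwell time distribution.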
Tracking and Matching: The binocular footfall sensor detects and tracks people based on a depth map sequence, which lacks image texture information. As a result, it cannot distinguish between people of similar size and height unless the entire store is completely covered and tracked full-time, which is very expensive. It also fails to match people when they re-enter the scene or appear in another non-overlapping sensor.
Demographics, staff exclusion and dwell time distribution: It is relatively easy to distinguish between adults and children because body size and height information can be obtained from the depth map. However, visitor attributes such as gender and age cannot be estimated from the depth map alone. Staff exclusion is not possible unless the entire store is completely covered and everyone in it is tracked without interruption. In practice, staff exclusion is usually achieved with external information (e.g., RFID name tags) or a special movement performed by employees at the entrance. As for the dwell time of individual visitors, there is no image texture information with which to match visitors as they enter and leave the store, so no such statistics can be obtained.
Monocular Sensors:
Statistics obtained from a typical monocular sensor include footfall traffic with/without staff exclusion, and with/without visitor repeats removal, dwell time distribution, demographics including age and gender estimation, and visitor group statistics.
Tracking and matching: Monocular sensors directly process the sensor image to detect, track and match pedestrians. Texture features can be extracted, and non-biometric, privacy-preserving person re-identification (ReID) techniques can be used for reliable tracking and matching, even with non-overlapping sensor coverage. This dramatically reduces the number of sensors needed to obtain statistics such as dwell time distribution, staff exclusion, and grouping. With the help of holistic body features, it is also possible to analyse the demographics of the visitors.
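To illustrate how ReID matching across non-overlapping sensors might work, here is a minimal sketch. It assumes each tracked person is summarized by a holistic appearance feature vector and that matches are found by cosine similarity against a gallery of earlier tracks. The function names, the threshold of 0.8, and the three-dimensional vectors are hypothetical; the article does not specify the matching method actually used in any product.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_track(query, gallery, threshold=0.8):
    """Return the gallery track ID most similar to `query`, or None.

    gallery: dict mapping anonymous track IDs to feature vectors.
    threshold: minimum similarity to accept a match (illustrative value).
    """
    best_id, best_score = None, threshold
    for track_id, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = track_id, score
    return best_id


# Hypothetical features recorded at the entrance sensor.
entrance_gallery = {"t1": [1.0, 0.0, 0.0], "t2": [0.0, 1.0, 0.0]}
# A person seen later by a different, non-overlapping sensor.
print(match_track([0.9, 0.1, 0.0], entrance_gallery))  # t1
```

Only the feature vectors and anonymous track IDs are retained in this sketch; the images themselves would be discarded after feature extraction, in line with the privacy practices listed earlier.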
Complete conversion funnels with only two sensors: Measuring the complete conversion funnel is the ultimate goal of a footfall sensor. Simply install one sensor at the entrance to measure passersby, entries and exits with staff exclusion and removal of repeat visitors, along with dwell time distribution, demographics and groupings. Then install another sensor at the till to serve as a ReID checkpoint for the final stage of the conversion funnel: "purchase".
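The two-sensor funnel can be sketched as simple set arithmetic over anonymous ReID track IDs. This is an illustrative example only: the function, its parameters, and the sample IDs are hypothetical, and it assumes the entrance sensor supplies a passerby count plus the IDs of people who entered, while the till sensor supplies the IDs it re-identified.

```python
def conversion_funnel(passersby, entrant_ids, till_ids, staff_ids=frozenset()):
    """Summarize a passersby -> visitors -> buyers funnel.

    passersby: number of people counted passing the entrance (assumed to
        include those who entered).
    entrant_ids: anonymous track IDs of people who entered the store.
    till_ids: anonymous track IDs re-identified at the till.
    staff_ids: track IDs flagged as staff, excluded from the funnel.
    """
    visitors = set(entrant_ids) - set(staff_ids)   # set removes repeat visits
    buyers = visitors & set(till_ids)              # visitors seen at the till
    return {
        "passersby": passersby,
        "visitors": len(visitors),
        "buyers": len(buyers),
        "capture_rate": len(visitors) / passersby if passersby else 0.0,
        "conversion_rate": len(buyers) / len(visitors) if visitors else 0.0,
    }


funnel = conversion_funnel(
    passersby=20,
    entrant_ids=["a", "b", "c", "d", "s1", "a"],  # "a" entered twice
    till_ids=["b", "d", "s1"],
    staff_ids={"s1"},
)
print(funnel)
```

In this made-up example, 4 of 20 passersby become visitors and 2 of those 4 are seen at the till, after the staff member and the repeat entry are removed.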
The accuracy of the statistics obtained by footfall sensors is critical to providing quantitative insights into customer experience and operational effectiveness. Inaccurate data not only fails to provide such insights, it can also mislead and result in incorrect conclusions or adjustments. However, vendors often exaggerate the accuracy of their products, making it difficult for buyers to choose the right product. It is therefore worth examining more closely the factors that affect the accuracy of footfall sensors, which include:
Dense crowds caused by high traffic volumes
Umbrellas, door frames, curtains and other occlusions
Reflections, shadows
Mannequins, portrait floor display boards
Very high or low light conditions
Plants, balloons (specifically for binocular sensors)
An important aspect of accuracy is its robustness. In practice, robustness becomes more important than the headline figure once accuracy is above a certain level (e.g., 98%). That is, when a sensor is installed in 100 locations, it is better to achieve 98% accuracy in all locations than to achieve 99% accuracy in 50 locations but only 90% accuracy in the rest. A more meaningful way to define a sensor's accuracy is the accuracy it achieves in at least 90 of the 100 locations where it is installed, though I doubt vendors will adopt this definition.
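The "accuracy achieved in at least 90 of 100 locations" definition can be written down directly. The sketch below is one possible reading of it, with a hypothetical function name and made-up per-site accuracies taken from the example in the text; it returns the highest accuracy level that the required share of sites meets or exceeds.

```python
def robust_accuracy(site_accuracies, coverage_percent=90):
    """Highest accuracy met or exceeded by at least coverage_percent of sites."""
    if not site_accuracies:
        raise ValueError("at least one site accuracy is required")
    ranked = sorted(site_accuracies, reverse=True)
    # Integer ceiling of coverage_percent% of the number of sites.
    needed = -(-coverage_percent * len(ranked) // 100)
    return ranked[max(needed, 1) - 1]


uniform = [0.98] * 100               # 98% at every location
mixed = [0.99] * 50 + [0.90] * 50    # 99% at half, 90% at the rest

print(robust_accuracy(uniform))  # 0.98
print(robust_accuracy(mixed))    # 0.9
```

Under this definition the uniformly good deployment scores 98% while the mixed one scores only 90%, even though half of its sites reach 99%.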
Binocular Footfall Sensors:
Prior to 2010, binocular footfall sensors had a significant advantage due to their resistance to the aforementioned disturbances such as shadows and reflections. At that time, most monocular sensors still relied on motion detection and image change tracking to calculate the number of people, which was susceptible to disturbances such as shadows or reflections.
From 2010 to 2018, once the computational power of embedded computer vision systems reached more than 50 GOPS (e.g., TI OMAP 3630), it became feasible for monocular sensors to distinguish real people from shadows and reflections using more sophisticated pedestrian detection algorithms such as AdaBoost and SVM. During this period, the two sensors achieved a similar accuracy of about 95% in people counting.
Starting in 2018, with the rapid development of AI SoCs and the adoption of sub-16nm processes, AI chips can now deliver 1 TOPS or more (e.g., the Movidius Myriad series) with power consumption as low as around 2 watts. This has dramatically changed the competitive landscape between binocular and monocular sensors, making the latter superior. Very powerful and efficient deep learning models can be integrated into the sensors, making them resistant to almost all of the disturbances mentioned earlier in this article and essentially eliminating the previous advantage of binocular sensors in accuracy robustness. In addition, more powerful pedestrian matching algorithms such as ReID can be integrated into the sensors, which will greatly enrich the capabilities of footfall sensors and revolutionize the smart retail industry.
Both monocular and binocular sensors are suitable for overhead or slightly tilted viewing angles, and for mounting heights ranging from 2.8 to 4 meters. This is the most common situation in retail scenarios, where the sensors can be embedded in the ceiling, surface-mounted or boom-mounted. However, there are some scenarios where the adaptability of these two types of sensors is quite different, and we will elaborate on them here.
Low Ceilings:
The advantages of monocular sensors become apparent when installed in stores with low ceilings (e.g. 2.3 meters high), as the use of ultra-wide FOV lenses can cover entrances more than 3 meters wide, whereas traditional stereo sensors can only cover a much smaller width. Modern monocular sensors can support FOVs of 140 degrees and beyond, even approaching 180 degrees with fisheye lenses, significantly reducing the number of sensors in a project.
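The relationship between ceiling height, field of view and covered width can be checked with basic trigonometry. The following sketch assumes an ideal rectilinear lens pointing straight down and measures coverage at an assumed head height of 1.7 m; both are simplifying assumptions, and the formula does not apply to fisheye lenses approaching 180 degrees, which follow a different projection.

```python
import math


def coverage_width(mount_height_m, fov_deg, target_height_m=1.7):
    """Width covered at target height by a downward-facing rectilinear lens."""
    if not 0 < fov_deg < 180:
        raise ValueError("fov_deg must be between 0 and 180 (exclusive)")
    distance = mount_height_m - target_height_m
    if distance <= 0:
        raise ValueError("sensor must be mounted above the target height")
    # Half the width is distance * tan(half the field of view).
    return 2 * distance * math.tan(math.radians(fov_deg / 2))


# A 2.3 m ceiling with a 140-degree lens versus a hypothetical 90-degree lens.
print(round(coverage_width(2.3, 140), 2))
print(round(coverage_width(2.3, 90), 2))
```

Under these assumptions the 140-degree lens covers roughly 3.3 m at head height, consistent with the "more than 3 meters" figure above, while the narrower lens covers about 1.2 m.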
High Ceilings:
Binocular sensors require a wider baseline when mounted on high ceilings to maintain the same depth resolution. Binocular sensors must therefore come in a range of sizes, with the horizontal dimension growing as the mounting height increases. Monocular sensors mounted under higher ceilings only need lenses with longer focal lengths to maintain roughly the same spatial coverage.
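The different scaling behaviour follows from standard pinhole-camera geometry. For a stereo pair, depth is Z = f·B/d, so the depth change caused by one disparity step is approximately Z²·Δd/(f·B). For a single camera, the covered width is Z·(sensor width)/f. The sketch below applies these two relations; the function names and all numeric values (focal length in pixels, disparity step, sensor width) are illustrative assumptions, not specifications of any real product.

```python
def required_baseline(distance_m, focal_px, depth_res_m, disparity_step_px=1.0):
    """Stereo baseline (m) needed for a given depth resolution at a distance.

    Derived from depth_res ~= distance^2 * disparity_step / (focal * baseline).
    """
    return distance_m ** 2 * disparity_step_px / (focal_px * depth_res_m)


def required_focal_length(distance_m, sensor_width_mm, coverage_width_m):
    """Monocular focal length (mm) needed to cover a given width at a distance."""
    return sensor_width_mm * distance_m / coverage_width_m


# Doubling the working distance from 3 m to 6 m (illustrative values).
b_low = required_baseline(3.0, focal_px=800, depth_res_m=0.05, disparity_step_px=0.25)
b_high = required_baseline(6.0, focal_px=800, depth_res_m=0.05, disparity_step_px=0.25)
f_low = required_focal_length(3.0, sensor_width_mm=5.0, coverage_width_m=4.0)
f_high = required_focal_length(6.0, sensor_width_mm=5.0, coverage_width_m=4.0)

print(b_high / b_low)  # baseline grows with the square of the distance
print(f_high / f_low)  # focal length grows linearly with the distance
```

With the focal length held fixed, doubling the distance quadruples the required baseline, whereas the monocular sensor only needs a lens with twice the focal length.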
Severe Tilt and Side View:
The advantage of binocular sensors is that, when counting from the side, they can measure the distance between the pedestrian and the sensor and thus determine the pedestrian's position in the scene. However, more sophisticated segmentation algorithms are required to separate individual pedestrians from clusters. Monocular sensors, on the other hand, can still use pedestrian detection, tracking, and ReID matching algorithms to monitor activity in the scene, but without depth information, pedestrian localization may require complex algorithms such as depth estimation from a single image.
Concealed Mounting:
Monocular sensors are designed to be unobtrusive and easy to integrate into a variety of environments. Their compact size makes them ideal for locations where space is limited or for applications that require a covert surveillance solution. Binocular sensors have a baseline of at least 7-9 centimeters, so they generally cannot be made very small.
Hardware Costs: Binocular and monocular sensors have similar unit costs. Binocular sensors cost more because of the additional imaging sensor and optics, while monocular sensors generally require more computing power, so the two roughly balance out. However, because monocular sensors have greater coverage, the total number of sensors used in a project can be approximately 20% lower than with a binocular system, resulting in a lower overall hardware cost for a monocular system.
R&D Costs: The development of an industry-leading monocular sensor is generally much more complex than that of a binocular sensor due to the need to embed more sophisticated computer vision algorithms and implement more analytics to output higher dimensional traffic analysis data.
Subscription Fees: Access to data analytics platforms, reporting tools and technical support requires an annual subscription fee. This software-as-a-service (SaaS) model ensures that organizations have continuous access to the latest software updates and customer support. Monocular sensors generally have higher subscription fees than traditional binocular sensors due to the amount of data they require and collect.
Installation and Maintenance Costs: Installation and maintenance costs are similar for both sensors, with the data accuracy verification process and data integrity maintenance costs generally being slightly higher as functionality is added.
Understanding these cost components is critical for organizations to budget effectively and to ensure that a people counting system delivers a return on investment through optimized space usage and long-term savings. As a rough conclusion, monocular sensor solutions can save end customers more than 20% of their overall long-term expenditure compared with binocular ones, which further strengthens their competitiveness across a wide range of application scenarios.
References:
[1] An In-Depth Comparison of Binocular and Monocular Footfall Sensors - Part 1
[2] An In-Depth Comparison of Binocular and Monocular Footfall Sensors - Part 3