
Raise Your Hand: Sensors & Interactivity

The primary goal of the Sensor Processing & Networking (SPN) Team is to create an immersive, interactive art exhibit in which visitors’ motions directly shape the installation, producing auditory, visual, and physical changes in response to their actions.


Networking

Ethernet was selected by the SPN team as the communication medium because of its reliability, low latency, light weight, and long-distance coverage. Wi-Fi was ruled out due to concerns about its reliability and latency in a public space with varying traffic loads; USB was likewise ruled out because of its limited range.

The Ethernet network is implemented as a virtual local area network (VLAN), allowing the exhibit to be moved to different locations on Georgia Tech’s campus without changing the IP addresses used by computer programs. DHCP within the VLAN ensures that devices receive the same IP addresses each time they boot up.

Originally, the plan was to use the Robot Operating System 2 (ROS 2) for data interpretation and communication with the various receiving platforms, such as Unity, Arduino, and Max for Live. However, issues with the ROS 2 installation, particularly related to subscriber and listener nodes, caused the team to abandon this approach. Instead, the team decided to send UDP packets directly from the Mediapipe programs.

The Python program handles the transmission of UDP packets over Ethernet to different receivers, depending on the type of effect to be created. Each Arduino is equipped with an Ethernet Shield and is assigned a unique IP address and port number.
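
A minimal sketch of this sender side is shown below; the IP address, port number, and payload format are illustrative placeholders rather than the exhibit’s actual values.

    import socket
    import struct

    # Hypothetical address of one Arduino + Ethernet Shield receiver.
    ARDUINO_ADDR = ("192.168.10.21", 5005)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_reading(value: float) -> None:
        """Pack one sensor reading as a 4-byte little-endian float and send it."""
        payload = struct.pack("<f", value)
        sock.sendto(payload, ARDUINO_ADDR)

    # Example: forward a normalized hand height of 0.42 to the Arduino.
    send_reading(0.42)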

To keep networking manageable, every device in the exhibit resides on this single VLAN. Port numbers are chosen to be unique across the entire exhibit, and a document mapping computer labels to IP addresses is maintained.
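
As an illustration only, such a mapping can also be mirrored in code as a small registry; the labels, addresses, and ports below are invented placeholders, not the exhibit’s real assignments.

    # Hypothetical exhibit-wide registry: every receiver gets its own port,
    # so a packet's destination is unambiguous even if machines are swapped.
    DEVICE_MAP = {
        "section1-arduino": ("192.168.10.21", 5005),
        "section2-arduino": ("192.168.10.22", 5006),
        "unity-pc":         ("192.168.10.30", 5007),
        "max-for-live-pc":  ("192.168.10.31", 5008),
    }

    # Sanity check that no port is reused anywhere in the exhibit.
    ports = [port for _, port in DEVICE_MAP.values()]
    assert len(ports) == len(set(ports)), "duplicate port assignment"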


Sensors

Two types of sensors were used: USB cameras for capturing keypoints and LiDAR for intrusion and occupancy detection.

The cameras were positioned above the rear-projection screens at a height of 7 feet 2.5 inches, angled slightly downward. The Spedal Wide Angle Webcam, with a 120-degree field of view (FOV), was the final choice of camera. However, its wide FOV led to issues where a participant in one section could affect the neighboring section. This was resolved by applying gaff tape over the camera’s lens to black out the adjacent section from its field of view.

For facial expression recognition, a 1080P USB camera with a 5-50mm manual zoom lens was used. To illuminate the participant’s face, a gooseneck desk lamp was strategically placed in a cluster of leaves on the floor of Section 2.
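
The face-tracking pipeline is not detailed here; purely as an illustration, and assuming the same Mediapipe stack used for the body keypoints, reading this camera and extracting face landmarks could look roughly like the following sketch (the camera index is a placeholder).

    import cv2
    import mediapipe as mp

    # Placeholder camera index for the zoom-lens face camera.
    cap = cv2.VideoCapture(1)

    face_mesh = mp.solutions.face_mesh.FaceMesh(
        max_num_faces=1,            # only the participant in Section 2
        refine_landmarks=True,      # finer detail around eyes and lips
        min_detection_confidence=0.5,
    )

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Mediapipe expects RGB; OpenCV captures BGR.
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            landmarks = results.multi_face_landmarks[0].landmark
            # landmarks[i].x / .y / .z are normalized coordinates that the
            # expression-recognition logic can consume.

    cap.release()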


Sensor Processing

This section describes the calculations that were performed on the Mediapipe and LiDAR outputs.

The following equation computed the Normalized Hand Height (NHH). Mediapipe provides 3D coordinates for a total of 33 keypoints; however, only a few of them are needed to calculate the NHH:

Y_LW = Vertical coordinate of the left wrist

Y_RW = Vertical coordinate of the right wrist

Y_S = Averaged vertical coordinate of the shoulders

Y_H = Averaged vertical coordinate of the hips

Since most people’s torso length is roughly equal to their arm length, the NHH ranged from -1 (both hands by the sides) to +1 (one hand raised to its highest point).
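
A plausible form of the equation, consistent with the variable definitions above and the stated -1 to +1 range (assuming image-normalized coordinates in which y increases downward), is:

    \mathrm{NHH} = \frac{Y_S - \min(Y_{LW},\, Y_{RW})}{Y_H - Y_S}

With both wrists hanging near hip level, the numerator is roughly the negative of the torso length Y_H - Y_S, giving -1; with a wrist extended a full arm’s length above the shoulders, it is roughly equal to the torso length, giving +1. The exact form used by the team may differ.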

Torso tilt (TT) was computed through the following two equations:

y_REL = Intermediate variable

Y_M = Averaged vertical coordinate of the left and right mouth corners

Y_E = Averaged vertical coordinate of the left and right elbows

The image below shows the LiDAR system. Both LiDAR modules communicate with the host PC via UART-to-USB boards, which also power them. Two versions of the software, one for intrusion detection and one for occupancy detection, were run on the host PC. This isolation minimized overall downtime, as one program could be shut down for edits or testing without affecting the other sensor’s operation.
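
To illustrate that isolation, each program can own its serial connection exclusively; the sketch below shows the general pattern, with the device path, baud rate, and frame size as placeholder assumptions rather than values taken from the actual LiDAR modules.

    import serial  # pyserial

    # Placeholder settings; the real port name, baud rate, and frame length
    # depend on the specific LiDAR module and UART-to-USB board.
    PORT = "/dev/ttyUSB0"
    BAUD = 115200
    FRAME_SIZE = 9

    def read_frames():
        """Continuously read fixed-size frames from one LiDAR module."""
        with serial.Serial(PORT, BAUD, timeout=1) as link:
            while True:
                frame = link.read(FRAME_SIZE)
                if len(frame) < FRAME_SIZE:
                    continue  # timeout or partial frame; try again
                yield frame

    # The intrusion and occupancy programs are separate copies of this
    # pattern, each bound to its own serial device, so either can be stopped
    # for edits or testing without touching the other.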