A camera system developed by Carnegie Mellon University researchers can see sound vibrations with such precision and detail that it can reconstruct the music of a single instrument in a band or orchestra.
Even the most high-powered and directed microphones can’t eliminate nearby sounds, ambient noise and the effect of acoustics when they capture audio. The novel system developed in the School of Computer Science’s Robotics Institute (RI) uses two cameras and a laser to sense high-speed, low-amplitude surface vibrations. These vibrations can be used to reconstruct sound, capturing isolated audio without interference or a microphone.
“We’ve invented a new way to see sound,” said Mark Sheinin, a post-doctoral research associate at the Illumination and Imaging Laboratory (ILIM) in the RI. “It’s a new type of camera system, a new imaging device, that is able to see something invisible to the naked eye.”
The team completed several successful demos of their system’s effectiveness in sensing vibrations and the quality of the sound reconstruction. They captured isolated audio of separate guitars playing at the same time and individual speakers playing different music simultaneously. They analyzed the vibrations of a tuning fork, and used the vibrations of a bag of Doritos near a speaker to capture the sound coming from that speaker. The Doritos demo pays tribute to prior work done by MIT researchers who developed one of the first visual microphones in 2014.
The CMU system dramatically improves upon past attempts to capture sound using computer vision. The team’s work uses ordinary cameras that cost a fraction of the high-speed versions employed in past research while producing a higher-quality recording. The dual-camera system can capture vibrations from objects in motion, such as the movements of a guitar while a musician plays it, and simultaneously sense individual sounds from multiple points.
“We’ve made the optical microphone much more practical and usable,” said Srinivasa Narasimhan, a professor in the RI and head of the ILIM. “We’ve made the quality better while bringing the cost down.”
The system works by analyzing the differences in speckle patterns from images captured with a rolling shutter and a global shutter. An algorithm computes the difference in the speckle patterns from the two video streams and converts those differences into vibrations to reconstruct the sound.
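The comparison step described above can be illustrated with a toy example. What follows is a minimal sketch under loud assumptions, not the researchers' algorithm: it treats the global-shutter frame as a fixed reference, assumes the speckle pattern only slides horizontally by whole pixels, and uses synthetic random data in place of real camera images. The function name `estimate_row_shifts` and all array sizes are hypothetical.

```python
import numpy as np

def estimate_row_shifts(reference, rolling):
    """Estimate the horizontal shift (in pixels) of each row of `rolling`
    relative to the same row of `reference`, using circular cross-correlation."""
    cols = reference.shape[1]
    # Cross-correlate each row pair via the FFT (correlation theorem).
    spectrum = np.fft.fft(rolling, axis=1) * np.conj(np.fft.fft(reference, axis=1))
    corr = np.fft.ifft(spectrum, axis=1).real
    # The lag with the strongest correlation is the estimated shift.
    shifts = np.argmax(corr, axis=1)
    # Lags past the midpoint correspond to negative (leftward) shifts.
    return np.where(shifts > cols // 2, shifts - cols, shifts)

# Synthetic stand-in for a speckle image (assumption: random intensities).
rng = np.random.default_rng(0)
rows, cols = 64, 256
reference = rng.random((rows, cols))

# Assumed vibration: a sinusoidal shift that changes from row to row,
# mimicking how each rolling-shutter row sees the surface at a later time.
true_shifts = np.round(4 * np.sin(2 * np.pi * 3 * np.arange(rows) / rows)).astype(int)
rolling = np.stack([np.roll(reference[r], true_shifts[r]) for r in range(rows)])

estimated = estimate_row_shifts(reference, rolling)
print(np.array_equal(estimated, true_shifts))
```

In this toy setup the recovered sequence of per-row shifts plays the role of the vibration signal. A real system would also have to handle sub-pixel and two-dimensional speckle motion, noise, and alignment between the two cameras, none of which is modeled here.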
A speckle pattern refers to the way coherent light behaves in space after it is reflected off a rough surface. The team creates the speckle pattern by aiming a laser at the surface of the object producing the vibrations, like the body of a guitar. That speckle pattern changes as the surface vibrates. A rolling shutter captures an image by rapidly scanning it, usually from top to bottom, producing the image by stacking one row of pixels on top of another. A global shutter captures an image in a single instant, all at once.
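The timing difference between the two shutter types can be made concrete with a little arithmetic. The sketch below is illustrative only: the frame rate and row count are hypothetical values, not figures from the research, and the model assumes the rolling shutter spreads its rows evenly across the full frame period with no idle time between frames. The helper names `row_timestamps` and `effective_sample_rate` are made up for this example.

```python
def row_timestamps(frame_index, fps, rows, rolling=True):
    """Return the capture time in seconds of every pixel row in one frame."""
    frame_start = frame_index / fps
    if not rolling:
        # Global shutter: every row is captured at the same instant.
        return [frame_start] * rows
    # Rolling shutter: rows are captured one after another.
    # Assumption: readout spans the whole frame period with no blanking.
    row_period = 1.0 / (fps * rows)
    return [frame_start + r * row_period for r in range(rows)]

def effective_sample_rate(fps, rows, rolling=True):
    """Distinct capture instants per second under the idealized model above."""
    return fps * rows if rolling else fps

# Hypothetical camera: 60 frames per second, 1080 rows per frame.
print(effective_sample_rate(60, 1080, rolling=True))   # 64800
print(effective_sample_rate(60, 1080, rolling=False))  # 60
```

Under these assumed numbers, an ordinary rolling-shutter camera observes the scene at tens of thousands of distinct instants per second, while a global-shutter camera at the same frame rate observes it only 60 times. Real sensors typically pause between frames, so the achievable rate would be somewhat lower than this idealized figure.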
The research, “Dual-Shutter Optical Vibration Sensing,” received a Best Paper award at the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in New Orleans. Joining Sheinin and Narasimhan on the research were Dorian Chan, a Ph.D. student in computer science, and Matthew O’Toole, an assistant professor in the RI and Computer Science Department.
CVPR is the premier conference on computer vision. The conference had a record 8,161 papers submitted and accepted about a quarter of them. Of those, only 34 were short-listed for best paper awards.
“This system pushes the boundary of what can be done with computer vision,” O’Toole said. “This is a new mechanism to capture high speed and tiny vibrations, and presents a new area of research.”
Most work in computer vision focuses on training systems to recognize objects or track them through space — research important to advancing technologies like autonomous vehicles. That this work enables systems to better see imperceptible, high-frequency vibrations opens new applications for computer vision.
The team’s dual-shutter, optical vibration-sensing system could allow sound engineers to monitor the music of individual instruments free from the interference of the rest of the ensemble to fine-tune the overall mix. Manufacturers could use the system to monitor the vibrations of individual machines on a factory floor to spot early signs of needed maintenance.
“If your car starts to make a weird sound, you know it is time to have it looked at,” Sheinin said. “Now imagine a factory floor full of machines. Our system allows you to monitor the health of each one by sensing their vibrations with a single stationary camera.”