I’ve been working on a project recently called a computer vision piano. I’m still working on it, so I should have more to show soon, but I’ve got at least enough to start showing it off and explaining how it works. Here is a quick YouTube demo to show what I’m talking about.
The algorithm that we’re using here basically has three parts: circle detection, template detection, and note detection. First we detect circles, then we make sure that there’s a template in the center to ensure that it’s a “valid” detection, then we can look at the note to determine what we should play.
Circle detection is handled using HoughCircles, which works by first finding the edges in a picture and then identifying likely circles from that edge information. I won’t be explaining exactly how this works; it’s enough to know that it finds circles from edges. The problem here is that we’re going to end up with a decent number of false positives, even if there’s nothing else round on screen. Here’s an example where there’s only one valid circle, but a few others are detected just because there are dark and light edges at just the right places.
Now our problem is that given a list of circles, we have to determine which ones are actual notes being displayed and which ones are just false positives. That’s what this image right in the center of a note is for:
Once a circle is detected, I’ll have an x,y coordinate of the center and then the radius. If this is a valid circle, then I should see that exact template image (or at least very close to it) right at the center. Imagine taking a small piece of the image from the center of the circle, and then comparing it pixel for pixel with the image above (after scaling it down so that they’re the same size). We then have a score for how well the center of the circle from the image matches that template above. From there, we just have to set a threshold for how close the two images have to be, and we can throw out all of our false positives.
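Here is a rough sketch of that validation step using only NumPy. I'm assuming a normalized cross-correlation as the "how close are these two images" score, and that the cropped patch is resized to the template's size; the real project may score and scale differently. The names `center_patch`, `match_score`, and `is_valid_circle`, along with the `fraction` and `threshold` values, are made up for this example.

```python
import numpy as np


def resize_nearest(img, shape):
    """Nearest-neighbour resize of a 2-D array to shape = (height, width)."""
    h, w = shape
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows[:, None], cols]


def match_score(patch, template):
    """Normalized cross-correlation in [-1, 1]; 1.0 is a perfect match."""
    a = resize_nearest(patch, template.shape).astype(np.float64)
    b = template.astype(np.float64)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A completely flat patch has no structure to compare, so score it 0.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def center_patch(gray, x, y, r, fraction=0.3):
    """Crop a square around the circle center; fraction is an assumed size."""
    half = max(1, int(r * fraction))
    y0, y1 = max(0, y - half), min(gray.shape[0], y + half)
    x0, x1 = max(0, x - half), min(gray.shape[1], x + half)
    return gray[y0:y1, x0:x1]


def is_valid_circle(gray, circle, template, threshold=0.7):
    """Keep a circle only if its center looks enough like the template."""
    x, y, r = circle
    patch = center_patch(gray, x, y, r)
    return patch.size > 0 and match_score(patch, template) >= threshold
```

Filtering the detections is then just `[c for c in circles if is_valid_circle(gray, c, template)]`. Subtracting the mean before comparing makes the score less sensitive to overall lighting changes, which is why I reached for it here instead of a raw pixel difference.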
We’ve now got our set of circles, and we’ve gotten rid of (hopefully) all of the false positives. Now we just need to identify the note, and we’ll be done. So we want to take a section of the image and determine which letter symbol it is closest to. If that sounds familiar, it’s because that’s basically what we just did! Instead of taking that template image and comparing it to the center of the circle, we need to take an image of the letter and compare it to the upper half of the circle. We just try to match every letter to the top half of the circle image, and then we assume that whichever letter had the highest score must be the letter being displayed.
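The "try every letter and keep the best score" idea can be sketched like this. As before, this is an illustration rather than the repo's code: `ncc`, `upper_half`, and `identify_note` are hypothetical names, the score is an assumed normalized cross-correlation, and the letter templates are assumed to be a dict mapping each note name to a small grayscale array.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two same-shape grayscale arrays."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def upper_half(gray, x, y, r):
    """Crop the bounding box of the top half of the circle at (x, y)."""
    return gray[max(0, y - r):y, max(0, x - r):x + r]


def identify_note(region, letter_templates):
    """Return (letter, score) for the best-matching template."""
    if region.size == 0 or not letter_templates:
        return None, 0.0
    scores = {}
    for letter, tmpl in letter_templates.items():
        # Nearest-neighbour resize of the region to this template's size.
        rows = np.arange(tmpl.shape[0]) * region.shape[0] // tmpl.shape[0]
        cols = np.arange(tmpl.shape[1]) * region.shape[1] // tmpl.shape[1]
        scores[letter] = ncc(region[rows[:, None], cols], tmpl)
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Given a validated circle `(x, y, r)`, the call would be `identify_note(upper_half(gray, x, y, r), letter_templates)`. Returning the score along with the letter leaves room to ignore a detection when even the best letter matches poorly.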
And that’s basically it! There are a couple more things going on than I mentioned, but that was pretty much everything that you need to know to understand the code.
Code is available here if anyone wants to play with it: https://github.com/ArtificiallyInteresting/CVPiano