While sitting in a restaurant or shopping somewhere, we may suddenly hear a song that catches our attention and never leaves our head. We want to know the name of that song and who sings it so we can find it and download it to our smartphone. In this article, you will learn how to build a mobile application that can identify any known melody.
You may have heard about various mobile apps that can perform the magic of recognizing soundtracks through the microphone on a mobile device. The best-known solution is Shazam, which first launched in 2002 in the U.K. Its music catalog contains billions of songs, and that number grows every day. In September 2016, Shazam surpassed one billion downloads and doubled its annual revenue growth rate.
More than 120 million users rely on the app to identify their favorite songs, and its database covers more than 1,800 verified artists. So what ensured such a giant leap in the startup's popularity? The answer lies in the benefits the app provides and the technology behind the project.
How Shazam Works
From the user’s perspective, the process of music recognition is clear and simple. While music is playing, the user just taps the main button, and the app captures an audio fingerprint and looks for an exact match in its database. Once a song is recognized, the user gets detailed information such as the song name, artist, lyrics, etc. But how does Shazam recognize songs? This is where the magic happens.
The author of the algorithm is Avery Wang, chief scientist at Shazam. In 2003, he told Scientific American how the idea had occurred to him. At the time, the company's approach was quite impractical: compiling a song signature required processing far too much data from each song. So he decided to focus on the indicators that are unique to any soundtrack: frequency, amplitude, and time. The application measures these three values for every second of the song, then compares the resulting data with what is stored in its database. Shazam's main features include:
- Soundtrack recognition;
- User accounts;
- Music sharing;
- Music purchase;
- Visual material recognition;
- Offline mode.
Another remarkable fact about this application is that it has versions for all popular mobile platforms, the web, and even smartwatches (both Apple Watch and Android Wear). But however useful and well designed Shazam is, it still has significant competitors.
Music Identifying Apps Like Shazam
Applications like Shazam have completely changed the way we listen to music. When a song we had never heard before hooked us, we used to have only one option for discovering it: trying to remember a couple of lines of the lyrics to type into a search engine later. Fortunately, we now have various alternatives for discovering new artists and music styles.
SoundHound is one of the best song-identification apps and Shazam alternatives on the mobile market. Its main advantage is that it offers many different ways to discover a song: users can sing, play, or even hum a soundtrack to get it recognized. Moreover, the application lets users type or speak the lyrics or the artist's name to find a match. SoundHound is available for both iOS and Android.
Track ID is a service created in 2006 for Sony Ericsson mobile phones. Since then, Track ID has gained wide popularity among music fans who use Android-based devices. It identifies soundtracks through the smartphone's microphone, contains a huge database of artist biographies, and works in both online and offline modes. Unfortunately, iPhone users cannot use the Track ID app.
Tunatic is another free Shazam alternative with an expanding database of songs. Users can discover a song by playing a melody through the device's sound card or recording it with a microphone. Tunatic was developed as desktop software for Windows and Mac OS.
How to Create a Music Identification App for Android and iOS
Before you start writing code for a new app like Shazam, it is crucial to understand the algorithm that recognizes a soundtrack by comparing it with audio fingerprints stored in a database. So what is an audio fingerprint?
An audio fingerprint is a set of data that describes a piece of a soundtrack as a time-frequency graph. This graph, called a spectrogram, has three dimensions: time, frequency, and intensity. In other words, it shows how intense each frequency is at a specific moment.
The Shazam algorithm generates this 3D graph when a user turns the mic on and remembers the frequency points of peak intensity in each period. From these, it forms a hash table with two columns of data for each song: time and frequency. It then compares this table with the ones stored in the database. If a match is found, the user receives information about the soundtrack; otherwise, an error notification appears.
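The hash-table idea can be sketched in a few lines of Python. The song data below is fabricated purely for illustration; a real fingerprint contains thousands of time-frequency pairs per song.

```python
from collections import defaultdict

def build_index(songs):
    """songs maps a song name to its fingerprint: (time_slot, peak_freq) pairs."""
    index = defaultdict(set)
    for name, fingerprint in songs.items():
        for pair in fingerprint:
            index[pair].add(name)
    return index

def lookup(index, sample_fingerprint):
    """Count how many (time, frequency) pairs of the sample occur in each song."""
    scores = defaultdict(int)
    for pair in sample_fingerprint:
        for name in index.get(pair, ()):
            scores[name] += 1
    return max(scores, key=scores.get) if scores else None

# Toy database: made-up peak frequencies (Hz) for two imaginary songs.
songs = {
    "song_a": [(0, 120), (1, 300), (2, 180), (3, 120)],
    "song_b": [(0, 80), (1, 120), (2, 300), (3, 40)],
}
index = build_index(songs)
print(lookup(index, [(1, 300), (2, 180)]))  # prints: song_a
```

The lookup simply counts matching pairs here; the sections below refine this with band-limited peaks, noise tolerance, and time alignment.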
How a music identifying algorithm works:
- Analog sound waves come to the sound recording element (a microphone).
- The system records and samples the sound.
- The system performs the sound conversion to the frequency domain.
- The algorithm creates an audio fingerprint.
- The algorithm finds a fingerprint match within the storage.
Capturing the Sound
All modern sound cards have analog-to-digital converters, which makes writing the code that initiates sound recording much easier. Still, it is important to understand at least the basic digital characteristics of sound in order to write proper code.
The order of actions for capturing sound, in any language:
- Choose an appropriate library;
- Set the sample frequency (sample rate);
- Set a sample size (a number of bits);
- Set a number of channels (stereo or mono);
- Indicate the data format (signed or unsigned);
- Set the byte order of the audio data (little-endian or big-endian);
- Construct an audio format object describing these settings (e.g., `new AudioFormat(...)` in Java's javax.sound.sampled API).
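Actual microphone capture is platform-specific (e.g., AudioRecord on Android or AVAudioRecorder on iOS), but the settings from the checklist above can be illustrated with a Python sketch that synthesizes a test tone and stores it with those parameters using the standard-library wave module. The CD-quality values below are illustrative, not mandatory:

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100      # samples per second (Hz)
SAMPLE_SIZE_BITS = 16    # bits per sample (signed PCM)
CHANNELS = 1             # mono

# Synthesize one second of a 440 Hz sine as stand-in "captured" audio;
# real input would come from the platform's microphone API instead.
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
           for t in range(SAMPLE_RATE)]

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(SAMPLE_SIZE_BITS // 8)   # sample size in bytes
    w.setframerate(SAMPLE_RATE)
    # WAV stores 16-bit PCM as signed little-endian values ("<h").
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

print(len(buf.getvalue()))  # 44-byte WAV header + 88,200 bytes of PCM
```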
After the sound has been captured and recorded comes the most important part of the whole music recognition algorithm. To develop a music identification app, you have to implement audio fingerprinting, which is how the system distinguishes one soundtrack from another.
Before creating an audio fingerprint, the algorithm divides the converted sound into data chunks for Fast Fourier Transform (FFT) analysis. This is where some knowledge of higher mathematics helps you write proper code. To avoid analyzing the entire frequency range at once, it is better to pick smaller intervals and analyze them separately.
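As a rough illustration of the chunking step, here is a Python sketch that splits a synthesized tone into fixed-size chunks and computes a naive discrete Fourier transform for the first one. A real implementation would use an optimized FFT library; the sample rate, chunk size, and tone here are arbitrary example values:

```python
import cmath
import math

def chunks(samples, size):
    """Split the sampled signal into fixed-size chunks for analysis."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def dft_magnitudes(chunk):
    """Naive O(n^2) discrete Fourier transform; real apps use an FFT library."""
    n = len(chunk)
    return [abs(sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]  # bins up to the Nyquist frequency

# Example signal: a 1000 Hz tone sampled at 8000 Hz, split into 64-sample chunks.
RATE, CHUNK = 8000, 64
samples = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(RATE // 10)]

first = chunks(samples, CHUNK)[0]
mags = dft_magnitudes(first)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin * RATE / CHUNK)  # strongest frequency: 1000.0 Hz
```

Each DFT bin covers RATE / CHUNK = 125 Hz here, so the 1000 Hz tone lands exactly in bin 8.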
You may take the following frequency ranges for further analysis:
- 20-40 Hz;
- 40-80 Hz;
- 80-120 Hz;
- 120-180 Hz;
- 180-300 Hz.
The 20-120 Hz range covers the low tones of a sound, and the 120-300 Hz range covers the higher ones. Why start at 20 Hz? Because it is the minimum frequency the human ear can perceive. The music recognition algorithm then has to identify the frequencies with the highest magnitudes in each range. The data containing those frequency peaks forms the digital fingerprint of the captured soundtrack.
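Picking the strongest frequency in each of the ranges above could look like this in Python (the spectrum values are made up for the example):

```python
# One frequency band per interval listed above (all values in Hz).
BANDS = [(20, 40), (40, 80), (80, 120), (120, 180), (180, 300)]

def band_peaks(spectrum):
    """spectrum maps frequency (Hz) to magnitude; returns the strongest
    frequency in each band, or None when a band is empty."""
    peaks = []
    for lo, hi in BANDS:
        in_band = {f: m for f, m in spectrum.items() if lo <= f < hi}
        peaks.append(max(in_band, key=in_band.get) if in_band else None)
    return tuple(peaks)  # one row of the audio fingerprint

# Fabricated single-chunk spectrum for illustration.
spectrum = {30: 0.1, 55: 0.9, 60: 0.4, 100: 0.7, 150: 0.2, 160: 0.8, 250: 0.5}
print(band_peaks(spectrum))  # prints: (30, 55, 100, 160, 250)
```

One such tuple is produced per chunk, so a whole song becomes a sequence of these rows.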
Another important part of creating the algorithm is taking a fuzz factor into account. In real life, the algorithm is never used to record sound in perfect conditions such as a fully isolated, "dead" room. One way or another, there will always be other noises interfering with proper sound recording. That is why it is crucial to implement noise reduction in the algorithm to remove external noise from the audio signal. Furthermore, your application should let users adjust the noise reduction to the surrounding conditions.
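There are many ways to implement such tolerance; one minimal sketch is to keep a band's peak only when it stands out clearly against the chunk's average magnitude, with an adjustable threshold standing in for the fuzz factor (all values below are illustrative):

```python
# The fuzz factor here is an adjustable threshold: a band's peak is kept
# only if its magnitude clearly exceeds the chunk's average magnitude.
FUZZ_FACTOR = 2.0  # illustrative value; tune it for the recording environment

def denoised_peaks(peaks, all_magnitudes):
    """peaks: (frequency, magnitude) per band; returns frequencies that survive."""
    avg = sum(all_magnitudes) / len(all_magnitudes)
    return [freq if mag > avg * FUZZ_FACTOR else None for freq, mag in peaks]

chunk_mags = [0.1, 0.2, 0.9, 0.15, 0.8, 0.1]   # fabricated chunk magnitudes
peaks = [(55, 0.9), (160, 0.8), (250, 0.15)]   # fabricated per-band peaks
print(denoised_peaks(peaks, chunk_mags))  # prints: [55, 160, None]
```

Raising FUZZ_FACTOR makes the fingerprint stricter for noisy environments; lowering it keeps more peaks in quiet ones.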
It often happens that hashes of different songs coincide, because one song can sound very similar to another. Therefore, timing is another parameter your music recognition algorithm has to cover. The frequency peaks we covered above provide a unique audio fingerprint only when they are tied to specific points in time.
Moreover, the more peaks you compare, the more certain the result. That is why applications like Shazam need up to 20 seconds of recorded sound to recognize a song. So the last step of creating a proper music identification algorithm is to write a simple script that compares a generated audio fingerprint with the ones stored in your database.
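One common way to make that comparison robust, described in published accounts of the Shazam algorithm, is to vote on the time offset between the sample and each candidate song: a genuine match produces many votes for one single offset. A minimal Python sketch with fabricated fingerprints:

```python
from collections import Counter

def best_match(database, sample):
    """database: {song: {hash_value: [times it occurs]}}.
    sample: list of (sample_time, hash_value) pairs.
    For every song, vote on the offset (song_time - sample_time);
    a genuine match yields many votes for a single offset."""
    best_song, best_votes = None, 0
    for song, hashes in database.items():
        offsets = Counter(song_time - sample_time
                          for sample_time, h in sample
                          for song_time in hashes.get(h, ()))
        votes = max(offsets.values(), default=0)
        if votes > best_votes:
            best_song, best_votes = song, votes
    return best_song

# Toy fingerprints; hash values are arbitrary integers for illustration.
database = {
    "song_a": {101: [5], 202: [6], 303: [7]},
    "song_b": {101: [2], 303: [9]},
}
sample = [(0, 101), (1, 202), (2, 303)]
print(best_match(database, sample))  # prints: song_a (every offset is 5)
```

Offset voting is what lets a 20-second clip recorded from the middle of a song still line up with the full track in the database.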
A Simple Way of Music Identification App Development
In 2011, the U.S. company The Echo Nest, which specializes in technological projects for the music industry, launched the Echoprint platform, which allows developers to create their own music recognition apps. The Echo Nest's technology can easily be embedded in other apps, reducing the expense of long-term in-house development of such solutions. And its functionality is not limited to recognizing music from a short sample.
Echoprint also provides such features as:
- Recognizing and filling in tags in music collections;
- Finding music duplicates;
- Checking whether a particular audio or video file contains specific material;
- Music collection synchronization.
It works in a comprehensive way: the platform runs a client-side code generator that forms a unique fingerprint of the recorded sound, then sends this data to the server to find a match and recognize the soundtrack. Echoprint's main advantage is that it can be embedded in any other software with a short snippet (the signature below is quoted from the Echoprint documentation):

```cpp
Codegen * pCodegen = new Codegen(const float* pcm, uint numSamples, int start_offset);
string code = pCodegen->getCodeString();
```
How Much Does It Cost to Build a Music Identification App
The first stage of writing code for any application is backend development, and Node.js suits this task well. Complete backend development takes approximately 500 hours. To develop a Shazam clone, you will also need to implement components such as user accounts and the display of information about artists and soundtracks, which will take up to 800 hours more.
A communication module, including location detection, push notifications, and so on, will require approximately 400 hours of programming work. Another 300 hours will be needed for a smartwatch module, animations, and ads. The total time estimate for building the platform is therefore about 2,000 hours.
A pipeline of music recognition app development:
- Backend development;
- Creating a design;
- Native platforms development;
- Web application development;
- App testing.
Given that the average rate for programming work is about $50 per hour in the U.S. and Australia, the price of developing the project is about $100,000, excluding the expense of design, web app development, and quality assurance. You may therefore add $15,000-$20,000 to the previously calculated cost for custom development.
However, if you use programming services provided by companies from other countries, you can cut the total cost almost in half. The average programming rate in Eastern Europe is about $25-$30 per hour, so you can spend only $50,000-$60,000 to develop a Shazam clone. Lunapps is one such company, based in Eastern Europe. Your ideas will be fully realized in a complete product with a polished design and advanced functionality, and we will be more than happy to assist you with native app development for both iOS and Android. Contact us at firstname.lastname@example.org for more information.