A Stems Player powered by serverless micro-services

This was a great project: building a stems player from scratch for a start-up. A stems player plays multiple tracks at once, where each track (stem) represents a different musical instrument. Once the user was happy with her mix, she could download a custom high-quality mix.

Sounds simple? It really isn't!

One can't, for example, just stick a few <audio> elements together and use JavaScript to start playing them all at once. This would be the equivalent of stacking a few CD players and trying to hit play on all of them at the same time: things would be (and sound) out of sync. The reason is that individual media elements do not share a single clock, so a busy renderer process introduces small timing delays when executing the JavaScript commands that control the players.

The solution is, of course, the Web Audio API, which allows one to schedule audio with very high precision against a single shared clock (the currentTime of an AudioContext).
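As a minimal sketch of what scheduling against a shared clock can look like (not the player's actual code), the function below starts every stem at the same timestamp. The names AudioClock, StartableSource, startInSync and the leadTime default are hypothetical; in a browser, an AudioContext and its AudioBufferSourceNode objects have these shapes.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
// In a browser, AudioContext and AudioBufferSourceNode satisfy these shapes.
interface AudioClock {
  readonly currentTime: number; // seconds on the audio clock
}

interface StartableSource {
  start(when?: number, offset?: number, duration?: number): void;
}

/**
 * Start every source at the same instant on the shared audio clock.
 * leadTime (hypothetical default) gives the main thread time to queue
 * all start() calls before the chosen instant arrives.
 */
function startInSync(
  clock: AudioClock,
  sources: StartableSource[],
  leadTime = 0.1,
): number {
  const when = clock.currentTime + leadTime;
  for (const source of sources) {
    source.start(when); // identical timestamp for every stem
  }
  return when;
}
```

Because every start() call carries the same timestamp, it no longer matters how long the loop itself takes to run, as long as it finishes before that instant.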

However, this too introduced challenges. To name but a few:

A compressed audio file such as an MP3 decodes into memory as raw PCM (32-bit), which means that it takes up far more memory than the few MB of the audio file itself. For example, a 5-minute stereo stem at 44.1 kHz decodes to roughly 106 MB (44,100 samples × 2 channels × 4 bytes × 300 seconds). Considering that one could be playing ten or more 5-minute stems simultaneously, this could require a gigabyte or more of memory (not to mention the amount of data that would have to be downloaded before playback could start).
Due to the way most audio formats are structured, one cannot (natively) partially decode audio.
To solve this, one can use the HLS (HTTP Live Streaming) protocol, which effectively means cutting an audio file into chunks and scheduling them sequentially. The aim is to decode into memory only the data which is immediately required.
This required building a custom HLS server, which transcoded, segmented and then served the source audio files. This we did using AWS Lambda and AWS SAM.
It also required building a custom HLS "driver" to handle the scheduling of the segments in the browser.
Finally, this results in small "glitches" at the points where segments are "stitched" together. This is due to the fact that most audio encoders introduce small bits of data (padding) at the beginning and end of a track. The next challenge was then to achieve gapless playback, which we did.
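The write-up does not say how gapless playback was achieved. One common approach, sketched here under the assumption that the amount of padding per segment is known, is to skip the padding when scheduling each segment so that the audible parts abut exactly. The Segment fields and the planGaplessPlayback function are hypothetical; the resulting values map onto the when, offset and duration arguments of AudioBufferSourceNode.start().

```typescript
// Hypothetical description of one decoded segment.
interface Segment {
  decodedDuration: number; // seconds of PCM after decoding, padding included
  leadPadding: number; // seconds of encoder padding at the start (assumed known)
  trailPadding: number; // seconds of encoder padding at the end (assumed known)
}

// Arguments for AudioBufferSourceNode.start(when, offset, duration).
interface StartArgs {
  when: number; // time on the audio clock at which to start
  offset: number; // where in the decoded buffer to begin playing
  duration: number; // how much of the buffer to play
}

function planGaplessPlayback(segments: Segment[], startAt: number): StartArgs[] {
  const plan: StartArgs[] = [];
  let when = startAt;
  for (const s of segments) {
    // Only the audio between the two padded regions should be heard.
    const duration = s.decodedDuration - s.leadPadding - s.trailPadding;
    plan.push({ when, offset: s.leadPadding, duration });
    // The next segment begins exactly where this one's audible part ends.
    // A real driver might count in samples rather than seconds to avoid drift.
    when += duration;
  }
  return plan;
}
```

With two segments of 10.5 seconds each, carrying 0.25 seconds of padding at both ends, and playback starting at time 1, the plan plays 10 seconds from each buffer, starting at times 1 and 11.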

One other option would be to mix server-side and send the raw PCM data over a WebSocket, but we rejected this option. Firstly, this would require more complex and costly infrastructure; we wanted this to be serverless and to have the bulk of the data transfer done via normal HTTP, which could then benefit from CDNs. Secondly, mixing server-side would cause an audible delay when changing the volumes, due to buffering. Thirdly, WebSocket traffic is sometimes blocked by corporate firewalls, making the player seem less reliable to end-users.

Development Workflow

Since this project required the creation and publishing of several distinct services and packages, we decided to organise it as a Lerna monorepo combined with Yarn Workspaces, and to publish the packages to GitHub Packages.

By relying on conventional commits, and enforcing these via a Git hook, we ended up in the happy situation where Lerna could automatically determine the appropriate version bump for any feature being implemented. It would then push the tagged commit to the CI server, which would build, test and publish the affected packages to GitHub Packages using the correct version information.
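To illustrate the idea, here is a simplified sketch of how a version bump can be derived from conventional commit messages, following the Conventional Commits rules (fix means patch, feat means minor, a breaking change means major). This is not Lerna's implementation; bumpForCommit and bumpForRelease are hypothetical names.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Derive the bump implied by a single commit message.
function bumpForCommit(message: string): Bump {
  const [header = ""] = message.split("\n");
  // type, optional "(scope)", optional "!" marking a breaking change, then ": "
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!match) return "none"; // not a conventional commit
  const [, type, , bang] = match;
  // A breaking change is flagged by "!" or by a BREAKING CHANGE footer.
  if (bang || /^BREAKING[ -]CHANGE: /m.test(message)) return "major";
  if (type === "feat") return "minor";
  if (type === "fix") return "patch";
  return "none"; // e.g. docs, chore, test
}

// The release bump is the largest bump implied by any commit since the last tag.
function bumpForRelease(messages: string[]): Bump {
  const order: Bump[] = ["none", "patch", "minor", "major"];
  return messages
    .map((m) => bumpForCommit(m))
    .reduce<Bump>((a, b) => (order.indexOf(b) > order.indexOf(a) ? b : a), "none");
}
```

For example, a release containing "fix: a", "feat: b" and "docs: c" would be a minor bump, while any commit such as "feat!: drop old API" would make it a major one.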

Additionally, we wanted the serverless micro-services to be published to the AWS Serverless Application Repository (AWS SAR), which is a great way to share serverless applications within an organisation or with third parties. This required us to use the AWS Serverless Application Model (AWS SAM).

By using Lerna's postpublish lifecycle script, the micro-services were automatically published to AWS SAR as part of the lerna publish lifecycle, from where they could be shared with other AWS accounts or organisations.

Although we normally use the Serverless Framework, there are benefits to AWS SAM. For example, since AWS SAM is a superset of AWS CloudFormation, it can use CloudFormation parameters, as opposed to the Serverless Framework, which generates CloudFormation with any environment variables "baked" in. This allows for cleaner deployment artifacts.