
Sound Without Autoplay
Interaction Design
Sound on the web has a trust problem, and autoplay created it. Years of auto-playing video ads, background music on Flash sites, and surprise audio on landing pages taught users to associate unexpected sound with hostility. Browsers responded by blocking autoplay audio by default. The policy is correct, and any sound design for interactive web experiences should start from the same position: sound is opt-in, always.
This note covers how to think about sound in browser-native storytelling without falling into the autoplay trap. The principles apply to any project where audio could enhance the experience, from atmospheric scenes in The Journey to interactive demonstrations in the Notes collection.
Why autoplay breaks trust
The problem with autoplay is not technical. It is social. Sound that starts without consent violates the user’s expectation of control. They opened a web page, not a media player. They may be in a quiet office, on a train, in a meeting, or near a sleeping child. Unwanted audio is not just annoying - it creates a moment of panic followed by a scramble to find the mute button or close the tab.
That moment of panic destroys every other quality the page might have. The design, the content, the performance, the atmosphere - none of it matters once the visitor has been startled by uninvited sound. The emotional response is immediate, negative, and lasting.
Browsers enforce autoplay restrictions precisely because the web’s track record with unsolicited audio is so poor. By default, Chrome, Firefox, Safari, and Edge all block audible playback until the user has interacted with the page. Attempting to circumvent these restrictions - by piggybacking on an unrelated click or tap, or by delaying play calls until some gesture happens to arrive - is adversarial, unreliable, and ethically indefensible.
Sound as opt-in enhancement
The correct model is to treat sound as an enhancement layer that the user explicitly activates. That means a visible, clearly labelled audio control - not a tiny icon buried in a corner, but a real interface element - that lets the user choose to hear the audio component of the experience.
The control should be persistent. If the user turns sound on, it should stay on across sections until they turn it off. If they turn it off, it should stay off. The state should be remembered through the session, ideally through a simple flag in session storage.
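The session flag can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the key name `soundEnabled`, the `KeyValueStore` interface, and both function names are hypothetical. The interface is declared locally so the sketch does not depend on DOM typings; the browser's `sessionStorage` has the same two methods.

```typescript
// Minimal shape shared by window.sessionStorage (illustrative, declared locally).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key name; any stable string would do.
const SOUND_KEY = "soundEnabled";

// Sound counts as off unless the user has explicitly turned it on this session.
function isSoundEnabled(store: KeyValueStore): boolean {
  return store.getItem(SOUND_KEY) === "true";
}

// Record the user's latest choice so later sections can honour it.
function setSoundEnabled(store: KeyValueStore, enabled: boolean): void {
  store.setItem(SOUND_KEY, enabled ? "true" : "false");
}
```

In a browser, `window.sessionStorage` would be passed as the store. Because a missing key reads as off, the default stays silent without any extra setup.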
The control should be honest about what it does. “Enable ambient sound” is good. A speaker icon with no label is ambiguous. A play button that might trigger video or might trigger audio or might do something else entirely is confusing.
When sound is off - which is the default - the experience must be complete. No information should be conveyed solely through audio. No interaction should depend on hearing a sound cue. No atmosphere should require sound to make sense. Sound enhances. It does not carry.
What sound can add
When used with consent, sound adds a dimension that no visual element can replicate. Ambient audio - a low atmospheric tone, environmental textures, subtle spatial effects - creates a sense of presence that imagery alone cannot achieve. The viewer is not just looking at a scene. They are in it.
Sound also provides interaction feedback. A soft click when a section transitions. A tonal shift when the ambient colour changes. A subtle audio cue when the reader reaches a chapter boundary. These sounds reinforce the visual signals and make the experience feel more responsive.
The critical constraint is subtlety. Atmospheric audio should be quiet enough that the user can hear their own environment over it. Interaction sounds should be brief and sit in the lower part of the frequency range. Nothing should loop in a way that becomes annoying within thirty seconds.
Technical considerations
The Web Audio API provides the tools for interactive sound on the web. It offers low-latency playback, real-time audio processing, spatial positioning, and dynamic mixing. For atmospheric scenes, it is substantially more capable than simple HTML audio elements.
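As a rough sketch of what an atmospheric layer looks like in Web Audio terms: a looping buffer source routed through a gain node to the output. The `*Like` interfaces are local, cut-down stand-ins for `AudioContext`, `GainNode`, and `AudioBufferSourceNode`, so the example does not need DOM typings; a real `AudioContext` should satisfy them structurally. The function name and the default volume of 0.2 are assumptions made for illustration.

```typescript
// Cut-down shapes of the Web Audio objects used below (illustrative).
interface GainLike {
  gain: { value: number };
  connect(destination: unknown): unknown;
}
interface LoopSourceLike {
  buffer: unknown;
  loop: boolean;
  connect(destination: unknown): unknown;
  start(): void;
}
interface AudioContextLike {
  destination: unknown;
  createGain(): GainLike;
  createBufferSource(): LoopSourceLike;
}

// Start an already-decoded buffer looping at a quiet level.
// Returns the gain node so the caller can change the volume later.
function startAmbientLoop(
  ctx: AudioContextLike,
  buffer: unknown,
  volume: number = 0.2 // assumed quiet default, not a recommendation from the text
): GainLike {
  const gain = ctx.createGain();
  gain.gain.value = Math.min(Math.max(volume, 0), 1); // clamp to the range 0..1
  gain.connect(ctx.destination);

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.loop = true; // repeat until stopped
  source.connect(gain);
  source.start();
  return gain;
}
```

Routing every layer through its own gain node is what makes later volume changes and per-layer muting cheap.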
Audio files should be small. Ambient loops can be short - 10 to 30 seconds - provided they are edited to loop without audible gaps. Interaction sounds should be under one second. The total audio budget for a page should be considered alongside the image and script budgets as part of the overall weight constraint.
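To put rough numbers on the audio budget, the size of a constant-bitrate file is approximately duration times bitrate divided by eight. The 96 kbps bitrate and the two durations below are illustrative assumptions, not figures from any particular project, and the estimate ignores container overhead.

```typescript
// Approximate encoded size of a constant-bitrate audio file, in bytes.
// bitrateKbps uses 1 kbps = 1000 bits per second.
function estimateAudioBytes(durationSeconds: number, bitrateKbps: number): number {
  return (durationSeconds * bitrateKbps * 1000) / 8;
}

// Illustrative figures only: one 30 s ambient loop and one 0.5 s click at 96 kbps.
const loopBytes = estimateAudioBytes(30, 96);   // 360000 bytes, about 360 kB
const clickBytes = estimateAudioBytes(0.5, 96); // 6000 bytes, about 6 kB
const totalBytes = loopBytes + clickBytes;      // 366000 bytes
```

Under these assumptions the ambient loop dominates the budget, so loop length and bitrate are the two numbers worth tuning first.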
Loading should be deferred until the user activates sound. Do not preload audio files on page load. When the user clicks the sound control, begin loading the audio assets. On a fast connection, the delay is imperceptible. On a slow connection, a brief loading indicator is better than downloading audio files that may never be played.
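One way to sketch deferred loading is a loader that does nothing until it is first called and then reuses the same in-flight request. The names `lazyOnce`, `createAudioLoader`, `FetchLike`, and `DecoderLike` are hypothetical. The two interfaces are trimmed-down stand-ins for the browser's `fetch` and for `AudioContext.decodeAudioData`, declared locally so the example is self-contained.

```typescript
// Wrap an async load so it runs at most once, and only when first requested.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = load(); // the first call starts the download
    }
    return pending; // later calls share the same promise
  };
}

// Trimmed-down shapes of fetch and AudioContext (illustrative).
interface FetchLike {
  (url: string): Promise<{ arrayBuffer(): Promise<ArrayBuffer> }>;
}
interface DecoderLike {
  decodeAudioData(data: ArrayBuffer): Promise<unknown>;
}

// Build a loader for one audio file. No request is made until the returned
// function is called, for example from the sound control's click handler.
function createAudioLoader(
  fetchFn: FetchLike,
  ctx: DecoderLike,
  url: string
): () => Promise<unknown> {
  return lazyOnce(async () => {
    const response = await fetchFn(url);
    const bytes = await response.arrayBuffer();
    return ctx.decodeAudioData(bytes);
  });
}
```

In a browser the arguments would be the global `fetch` and an `AudioContext` instance. Note that this simple version also caches a failed load; a fuller implementation would probably clear the cached promise on rejection so that the user can retry.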
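Format compatibility is straightforward. MP3 is supported by all current major browsers. AAC is also widely supported, although in some browsers it depends on codecs supplied by the operating system. Ogg Vorbis works in Firefox and Chrome, but Safari support has historically been missing or limited. For maximum compatibility with minimum complexity, MP3 is the safe default.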
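If more than one format is shipped, the choice can be made at runtime instead of being hard-coded. The sketch below takes a probe function with the same contract as `HTMLMediaElement.canPlayType`, which returns `"probably"`, `"maybe"`, or an empty string for unsupported types. The `AudioCandidate` type, the function name, and the file paths in the usage note are hypothetical.

```typescript
// One candidate file per format, listed in order of preference.
interface AudioCandidate {
  url: string;
  mimeType: string;
}

// Return the first candidate the probe does not reject, or null if none qualify.
// canPlay follows the canPlayType contract: "" means the type is unsupported.
function pickAudioSource(
  candidates: AudioCandidate[],
  canPlay: (mimeType: string) => string
): AudioCandidate | null {
  for (const candidate of candidates) {
    if (canPlay(candidate.mimeType) !== "") {
      return candidate;
    }
  }
  return null;
}
```

In a browser the probe could be `(type) => new Audio().canPlayType(type)`, with candidates such as `{ url: "/audio/ambient.ogg", mimeType: "audio/ogg" }` followed by `{ url: "/audio/ambient.mp3", mimeType: "audio/mpeg" }`. Listing MP3 last keeps it as the fallback.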
Reduced motion and reduced sound
There is no browser-level preference for reduced audio equivalent to prefers-reduced-motion. But the principle is the same: users who are sensitive to sensory stimuli should be able to control their experience.
Since sound is opt-in by default, this concern is largely addressed by the consent model. The user never hears sound unless they choose to. But even after opting in, they should be able to adjust the volume or disable specific audio layers independently.
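Independent layer control can be modelled as a small piece of state from which each layer's gain is derived. This is a sketch under assumed names: `AudioLayer`, `MixerState`, and `effectiveGains` are illustrative, and the resulting numbers would typically be written to one gain node per layer.

```typescript
// One user-controllable audio layer, e.g. "ambient" or "cues".
interface AudioLayer {
  name: string;
  enabled: boolean;
  volume: number; // 0..1, relative to the master volume
}

interface MixerState {
  masterVolume: number; // 0..1
  layers: AudioLayer[];
}

// Keep a value inside the range 0..1.
function clamp01(value: number): number {
  return Math.min(Math.max(value, 0), 1);
}

// Effective gain per layer: master times layer volume, or 0 when the layer is off.
function effectiveGains(state: MixerState): Record<string, number> {
  const gains: Record<string, number> = {};
  for (const layer of state.layers) {
    gains[layer.name] = layer.enabled
      ? clamp01(state.masterVolume) * clamp01(layer.volume)
      : 0;
  }
  return gains;
}
```

Keeping the master volume separate from the per-layer settings means that turning everything down does not erase the balance the user chose between layers.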
For projects that combine motion and sound, the reduced motion preference should influence the audio design as well. If motion is simplified, the sound layer should simplify correspondingly - fewer interaction cues, simpler ambient textures, and no audio events tied to animations that have been removed.
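The coupling between the motion preference and the sound layer might be expressed as a simple mapping. The `AudioProfile` fields and the specific values (three ambient layers versus one) are hypothetical, chosen only to mirror the three adjustments described above.

```typescript
// Illustrative description of how rich the sound layer should be.
interface AudioProfile {
  ambientLayers: number;                 // how many ambient textures to mix
  interactionCues: "all" | "essential";  // which interaction sounds to play
  animationCues: boolean;                // sounds synchronised to animations
}

// Simplify the sound layer when the user has asked for reduced motion.
function audioProfileFor(prefersReducedMotion: boolean): AudioProfile {
  return prefersReducedMotion
    ? { ambientLayers: 1, interactionCues: "essential", animationCues: false }
    : { ambientLayers: 3, interactionCues: "all", animationCues: true };
}
```

In a browser the input would come from `window.matchMedia("(prefers-reduced-motion: reduce)").matches`. The profile only shapes what the user hears after opting in; it never turns sound on by itself.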
A baseline principle
Sound on the web works when it respects a simple contract: the user decides. They decide whether to hear anything. They decide how loud it is. They decide when to stop. Every design decision about audio in browser-native experiences should start from this contract and stay within it.
The Making Of essay discusses how this principle shaped the sound approach for the site’s scene work. The Performance section covers the weight implications of audio assets.