There is an inherent problem with stereo imaging: either loudspeaker accuracy or stereo imaging is sacrificed. Loudspeakers radiate in three dimensions. When two or more loudspeakers are radiating at the same frequency, interference patterns are created. Whenever sound is reflected from a wall, floor, or ceiling, the interference patterns are exaggerated. One way around this problem is not to use loudspeakers in the first place but to use earphones instead.
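To make the interference between two speakers concrete, here is a rough sketch in Python. It assumes an idealized case that is not spelled out above: two point sources playing the same in-phase tone at equal level, no room reflections, and a speed of sound of 343 m/s. The speaker and listener positions and the function names are hypothetical, chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)

def path_difference(left, right, listener):
    """Difference in distance (m) from the listener to each speaker."""
    return abs(math.dist(left, listener) - math.dist(right, listener))

def null_frequencies(delta, count=3):
    """First few frequencies (Hz) at which two equal in-phase sources cancel.

    Cancellation occurs when the path difference is an odd number of
    half wavelengths: delta = (2k + 1) * wavelength / 2.
    """
    if delta == 0:
        return []  # equidistant listener: no cancellation from path length
    return [(2 * k + 1) * SPEED_OF_SOUND / (2 * delta) for k in range(count)]

# Hypothetical layout in meters: speakers 2 m apart,
# listener 0.5 m to the right of the center line.
left, right = (-1.0, 0.0), (1.0, 0.0)
listener = (0.5, 2.5)

delta = path_difference(left, right, listener)
print(f"path difference: {delta:.3f} m")
print("cancellation near (Hz):", [round(f) for f in null_frequencies(delta)])
```

Moving the hypothetical listener changes the path difference, so the cancelled frequencies shift from seat to seat. On the center line the path difference is zero and this simple model predicts no cancellation at all.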
I have always been more interested in the accurate reproduction of sound and less interested in stereo imaging. So much so that I resisted adopting stereo when it was first introduced in the 1950s.
To understand stereo imaging, it is necessary to realize that all human perception is synthetically created in and by the mind. Both sound and vision are digitized in the ear and eye, and the resulting stream of bits is merely used as a guideline for the interpretive powers of the mind. The human mind is always creating a synthetic image of what is going on in its environment. It uses information from the senses to help create that image. The image is usually much more detailed than is justified by the sensory information. This is why eyewitnesses to the same event can give contradictory testimony as to what happened. Their internal images were created differently from exactly the same sensory information.
There are many cues that the human hearing system uses to deduce from what direction a sound is coming. Most cues depend on a difference in sound between the two ears. But the outer ear (the pinna) changes the characteristics of sound arriving from one direction differently from sound arriving from another, so even with only one ear there is some directional information.
In real life, stereo imaging or directional perception is almost entirely due to transients. If a person is listening to a steady tone, it is almost impossible to determine which direction it is coming from. What is easy to locate is clicks and bangs, which are essentially transients. There are two basic characteristics that determine where a sound seems to be arising. One is the timing of the arrival of the transient. Unless the source is directly behind or directly in front of the listener, the sound will arrive first at one ear and then at the other. The difference in time indicates how far to the side of directly front or directly rear the sound is arising. The second cue is the relative loudness of the sound. A sound which is louder in one ear than the other will be interpreted as coming from that side. This is a much less positive cue for direction finding. Unfortunately, this is the only information available from many popular recordings. In the studio every instrument is recorded separately and then mixed into the final recording. If the recording engineer chooses to have one instrument seem to be on one side or the other, he makes that instrument louder on that side.
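The two cues described above can be put into rough numbers. The Python sketch below is only an illustration under stated assumptions: the ears are treated as two points 0.18 m apart with no head between them, the source is distant, and the speed of sound is 343 m/s. The pan function uses the common constant-power pan law as a stand-in for what a mixing engineer might do; it is not taken from any particular console or recording, and all names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_SPACING = 0.18      # m, assumed distance between the two ears

def arrival_time_difference(angle_deg):
    """Approximate time difference (s) between the ears for a distant source.

    angle_deg is measured from straight ahead: 0 is directly in front,
    90 is fully to one side, 180 is directly behind. Uses the simple
    path-length model d * sin(angle) / c and ignores the head itself.
    """
    return EAR_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

def pan_gains(position):
    """Constant-power (left, right) gains for a pan position in [-1, 1].

    -1 is hard left, 0 is center, +1 is hard right.
    """
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

# Timing cue: note that 30 and 150 degrees give the same difference.
for angle in (0, 30, 90, 150, 180):
    microseconds = arrival_time_difference(angle) * 1e6
    print(f"{angle:3d} degrees: {microseconds:4.0f} microseconds")

# Loudness cue: an instrument placed halfway toward the right.
left_gain, right_gain = pan_gains(0.5)
print(f"pan 0.5: left gain {left_gain:.3f}, right gain {right_gain:.3f}")
```

In this simplified model the timing cue is zero both directly in front and directly behind, and a source 30 degrees from the front produces the same difference as one 30 degrees from the rear. The pan function changes only the level in each channel and leaves the timing untouched, which is the limitation of loudness-only mixing noted above.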
When the human hearing system is trying to create a stereo image from loudspeakers, it uses the cues of transient timing and steady-tone loudness. A stereo image can be created even though the sound reaching the ears from the speakers is substantially different from the way the sound would have reached the ears in real life, or even different from how the sound reached the recording microphones. For clarity of imaging, it is important that the transients arrive at the ear in pristine condition. This means that the transient pulse arrives without echoes from other parts of the speaker cabinet or from surfaces in the room. It also means that the speaker itself has to be rock solid, not vibrating or wobbly.
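As a rough illustration of how a room surface adds an echo to a transient, the sketch below estimates how much later a single side-wall reflection arrives than the direct sound. It assumes a flat, perfectly reflecting wall, a point-source speaker, a single ear position, and a speed of sound of 343 m/s; the positions and the function name are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def reflection_delay(speaker, listener, wall_x):
    """Extra arrival time (s) of one side-wall reflection vs. the direct sound.

    The wall is the vertical plane x = wall_x. The reflected path length
    equals the straight-line distance from the speaker's mirror image
    (the speaker reflected across the wall) to the listener.
    """
    image = (2 * wall_x - speaker[0], speaker[1])
    direct = math.dist(speaker, listener)
    reflected = math.dist(image, listener)
    return (reflected - direct) / SPEED_OF_SOUND

# Hypothetical positions in meters, with the listener at the origin.
speaker = (-1.0, 2.5)
listener = (0.0, 0.0)
for wall_x in (-1.3, -2.0, -3.0):
    ms = reflection_delay(speaker, listener, wall_x) * 1e3
    print(f"wall at x = {wall_x:+.1f} m: echo arrives {ms:.2f} ms after the direct sound")
```

In this model, a wall close to the speaker returns its echo almost on top of the original transient, while a more distant wall returns it later, and after a longer path.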
Speakers which produce good stereo imaging tend to be curved, if the sound is radiated in a sphere or hemisphere, or flat and directional, like an electrostatic. They tend to be placed symmetrically in a bare room far from the walls. Usually there is one preferred listening position, a so-called “sweet spot,” usually along a center line between the two speakers. The two speakers and their enclosures are well built and identical. There can be no extraneous vibrations or ringing. They tend to be speakers that cost thousands or tens of thousands of dollars, for starters. Then there is the cost of the room and the ancillary equipment.