Thing is, if we were actually the character, we wouldn't be confused about where sounds are coming from; as it is, we as players are. There is a sensory mismatch between what we see and what we hear, and I personally find that far more unrealistic than the alternative.
I see where you're coming from re: hearing sounds you shouldn't, but to strike a balance, loudness could be calculated from a sound's distance to the character, while the direction a sound comes from could be based on the camera. Notice that the game already kind of does this with the zoom function: as you zoom out, sounds seem farther away.
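A rough sketch of that split, assuming a 2D top-down world and a stereo mix. The function names (`character_gain`, `camera_pan`, `stereo_gains`), the clamped inverse-distance rolloff and the constant-power pan law are my own illustrative choices, not how the game actually does it:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def character_gain(source: Point, character: Point,
                   ref_dist: float = 1.0, rolloff: float = 1.0) -> float:
    """Loudness from the distance to the CHARACTER (inverse distance, clamped at ref_dist)."""
    d = max(math.dist(source, character), ref_dist)
    return ref_dist / (ref_dist + rolloff * (d - ref_dist))


def camera_pan(source: Point, camera: Point, camera_yaw: float) -> float:
    """Left/right position from the CAMERA: -1 = hard left, +1 = hard right.

    camera_yaw is the camera's facing angle in radians, counterclockwise from +x
    (an assumed convention for this sketch).
    """
    dx, dy = source[0] - camera[0], source[1] - camera[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # sound sits exactly at the camera: keep it centred
    # The camera's right-hand axis is its forward vector rotated by -90 degrees.
    right_x, right_y = math.sin(camera_yaw), -math.cos(camera_yaw)
    return (dx * right_x + dy * right_y) / length


def stereo_gains(source: Point, character: Point,
                 camera: Point, camera_yaw: float) -> Tuple[float, float]:
    """Character-based loudness, camera-based direction, via a constant-power pan law."""
    g = character_gain(source, character)
    p = camera_pan(source, camera, camera_yaw)
    angle = (p + 1.0) * math.pi / 4.0  # maps pan [-1, 1] onto [0, pi/2]
    return g * math.cos(angle), g * math.sin(angle)


# Character at the origin, camera pulled back 10 units and facing +y,
# a sound 5 units to the character's right.
left, right = stereo_gains((5.0, 0.0), (0.0, 0.0), (0.0, -10.0), math.pi / 2)
print(f"left={left:.3f} right={right:.3f}")
```

The point is that moving the camera only changes *where* the sound appears to be, while *how loud* it is stays tied to the character, so panning the camera across the map can't make distant sounds louder.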
An alternative would be a cutoff distance from the character: sounds within that distance stay audible as you pan the camera around; sounds outside it simply aren't heard.
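The cutoff version could look roughly like this, again assuming a 2D world. `audible_sources`, `camera_gain` and the example positions are hypothetical; the idea is just that the character's position decides *which* sounds play, and the camera handles everything else as it does now:

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def audible_sources(sources: Dict[str, Point], character: Point,
                    cutoff: float) -> List[str]:
    """Names of sources within `cutoff` units of the character.

    Anything farther away is culled, no matter where the camera is panned.
    """
    return [name for name, pos in sources.items()
            if math.dist(pos, character) <= cutoff]


def camera_gain(source: Point, camera: Point, ref_dist: float = 1.0) -> float:
    """Placeholder camera-relative inverse-distance gain for sources that pass the cutoff."""
    d = max(math.dist(source, camera), ref_dist)
    return ref_dist / d


sources = {"campfire": (3.0, 4.0), "river": (30.0, 0.0)}
character = (0.0, 0.0)
camera = (25.0, 0.0)  # camera panned far from the character, right next to the river

audible = audible_sources(sources, character, cutoff=10.0)
mix = {name: camera_gain(sources[name], camera) for name in audible}
print(audible, mix)
```

Here the river stays silent even though the camera is sitting next to it, because it is 30 units from the character, while the campfire (5 units away) is still heard, placed relative to the camera.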