Actually, after reading through the thread, it seems both positions are correct. I think we are basically saying the same thing in different ways.
For example, Arabic looks like a bunch of squiggles to me, and that's pretty much all it is until someone who can interpret it comes along. The stimulus exists at all times, but it only becomes writing in the mind of someone who knows Arabic.
Same with sound. The stimulus is there, but someone/something needs to interpret it to call it sound.
I think it's just a matter of what you are calling sound: the data or the interpretation.