July 2016
This is a follow-up to the Project Reason Forum Podcast

Better Know Time


In the first section about very short delays, I excluded the very shortest delay times because of their potential to send an irritating or even destructive (fuse-blowing) peak through your audio system or, worse, through headphones. This is due to the unpredictable quirkiness of individual sound systems and tonal enhancements, not anything inherent in using these delay times. That's a drag, because the demonstration could have used them to show how tiny delays are used for localizing sounds.

If I had been able to use delays of 0.1 ms to 1.0 ms, you would have heard more examples of my voice seeming to jump from one phantom locality to another, just like the 1 ms to 25 ms delay times, only more dramatically. Phantom, because in the podcast the two sounds are exactly equal in volume and tonality, which rarely happens in nature.

This is the familiar and easy-to-misunderstand principle of echo-location. It is not like Sonar, where a specific signal is emitted and then heard again with a delay that reveals distance. That's the mono version. Our brains use the stereo version, sans any specific signal emission. We all know that a sound that is louder in our left ear is coming from our left. Less known is that we listen to the timing of a sound's arrival at each ear, and that timing is just as important to hearing as loudness.

This indicates that we are by nature armed with certain 'unknown' facts. If I asked whether you knew exactly how far apart your ears are, or exactly what the speed of sound is right now, you might say no. But you'd be wrong. We would be unable to localize sounds unless these two values existed like Planckian constants in our brains. Or, we might be updating them afresh every time we sniff or clap our hands. Sort of like Sonar.
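As a rough illustration of what those two 'constants' buy us, here is a back-of-the-envelope sketch in Python of the largest timing difference the two ears can ever report. The ear spacing and speed of sound used here are illustrative assumptions, not figures from the podcast.

```python
# Rough sketch: the maximum interaural time difference (ITD) that
# sound localization implicitly relies on. Both values below are
# illustrative assumptions.
EAR_SPACING_M = 0.18      # assumed distance between the ears, meters
SPEED_OF_SOUND = 343.0    # meters per second in air at about 20 C

# A sound arriving from directly beside the head reaches the far
# ear later by roughly spacing / speed.
max_itd_ms = EAR_SPACING_M / SPEED_OF_SOUND * 1000.0
print(f"maximum interaural delay: {max_itd_ms:.2f} ms")
```

That delay of roughly half a millisecond is exactly the 0.1 ms to 1.0 ms range the podcast had to skip.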

The weather affects our hearing with humidity and changing pressure and temperature. That includes water vapor in our ear canals and how much heat we are dissipating with our fleshy sonic scoops (ears). We make adjustments and compensations for these effects in order to make 'locality' out of the timing of arriving sounds. We know how to use the space between the sounds. That may seem like no big deal.

It's a big deal. It implies certain possible facts about our perception that are at considerable variance with how we like to think of 'consciousness'.

First of all, what is zero milliseconds? 0.0 ms never happens. So, why do we stick to a model of mind that presumes that 0.0 ms is happening all the time? This is not a question of Time's Arrow or Quantum Weirdness but rather how we perceive our perception… or what we tend to think our perception is doing. If our minds worked the way most neuro-enthusiasts describe, we would be unable to perceive much of anything. The digital delay demonstrates why.

The objective here is to evoke the idea of little stretches or 'windows' of time like the little gaps between me and the delayed me in the podcast starting with the shortest gaps of 20 ms or less. For example, with a 4 ms delay, we hear a single sound that is 'me plus a localizing echo'. In order for that sound to be perceived that way, the 'window of perception' must be at least 4 ms wide.
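To make the 'me plus a localizing echo' concrete, here is a minimal sketch in Python of what a digital delay unit does under the hood. The sample rate and the equal-volume mix are illustrative assumptions, not specifics from the podcast's hardware.

```python
# Minimal sketch of a digital delay: a signal is mixed with a copy
# of itself delayed by a fixed number of samples.
SAMPLE_RATE = 44_100                                  # samples/second (assumed)
DELAY_MS = 4.0                                        # the 4 ms example above
delay_samples = round(SAMPLE_RATE * DELAY_MS / 1000)  # 4 ms = 176 samples

def mix_with_delay(signal, delay, wet=1.0):
    """Return signal plus a delayed copy; wet=1.0 means equal volume."""
    out = []
    for i, sample in enumerate(signal):
        echo = signal[i - delay] if i >= delay else 0.0
        out.append(sample + wet * echo)
    return out

# A single click followed by silence yields the click plus its echo.
print(mix_with_delay([1.0, 0.0, 0.0], delay=1))  # [1.0, 1.0, 0.0]
```

The equal-volume, equal-tonality echo is what makes the resulting locality 'phantom': the ear is handed a localizing cue with no natural counterpart.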

Wide might seem like the wrong word because the sound is 4 ms long. The duration of the stimulus was 4 ms, but our perception of the sound was 4 ms wide. Techs and geeks will know the terms buffer and sample-and-hold. The stimulus is buffered in order to be analyzed as a single sound for clues about location. Each sound and each syllable is buffered and analyzed as waveforms at a pace only slightly behind their arrival. We hear the result as an ongoing audible sensation of lots of little individually buffered sounds. I say lots because we also analyze how all those little buffered and nearly simultaneous sounds relate to each other… but only once they are sounds. Waveform analysis built the little buffered sounds, which are then analyzed for pre-recognized patterns, like recognizing a voice or a familiar car door slam in the garage.

As a process, this mental machinery can capture about 10 to 25 ms of buffered sounds and pattern-analyze them… but only in a further, encompassing buffer or audible window that is about 10 to 25 ms wide. As a perception, we hear a symphony of stutters. The window width varies depending on what our hearing needs to do. In this window, syllables become words and repeating echoes become musical sounds. But these are evolutionary bonuses. The original point was localizing. That is why hearing was a successful adaptation.

Human hearing is known to be receptive to a range of frequencies between 20 and 20,000 air-wiggles per second (Hz) at the max. Most people can hear only a narrower range and some much narrower. It is also known that higher pitched sounds are easier to localize than low pitched sounds. Deep bass tones are impossible to localize. That's why it matters where your tweeters are but not where the woof box is. These are not only the technical limitations of our hearing, but also the metaphorical frame that surrounds the perceptual window.

A sustained tone of 20 kHz (the highest screech you can imagine) is composed of air wiggles of compressing-expanding air that arrive at YOUR EARDRUM every 0.05 ms. A 40 Hz air-wiggle (a very low bass tone) repeats every 25 ms. Our lowest octave, 20 to 40 Hz, stretches to 50 ms and is difficult to hear without deliberately listening. Our sensitivity, or perceived loudness, tapers off. A 20 Hz tone takes 50 ms just to be a 20 Hz tone. Those tones are easier to feel than hear, as they involve serious pockets of air banging against you.

Likewise, our ability to localize sounds tapers off with echoes beyond 25 ms. Instead of one buffered sound of correlated components, we get two de-correlated sounds heard close together. Within this established range of 0.05 to 25 ms, localizing and hearing a musical pitch are the same perception. Any sound with a reflection that we hear as an ambiance will become an audible pitch if the reflection time is repeated continuously. A reflective echo of 2.2727 ms will become a 440 Hz tone.
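The delay-time-to-pitch identity is just the reciprocal relationship between period and frequency, which a few lines of Python can check against the numbers above:

```python
# A continuously repeating reflection with period T milliseconds is
# heard as a tone of frequency 1000/T Hz, and vice versa.
def delay_ms_to_hz(delay_ms):
    """Repeat time in milliseconds -> perceived pitch in Hz."""
    return 1000.0 / delay_ms

def hz_to_delay_ms(freq_hz):
    """Pitch in Hz -> repeat time in milliseconds."""
    return 1000.0 / freq_hz

print(hz_to_delay_ms(440.0))     # ~2.27 ms of repeating echo gives concert A
print(delay_ms_to_hz(25.0))      # a 25 ms repeat sits at 40 Hz
print(hz_to_delay_ms(20_000.0))  # 0.05 ms per cycle at 20 kHz
```

Note how the 25 ms localizing limit and the 40 Hz bottom of easy hearing are the same number seen from two directions.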

That puts hearing music in the same 25 ms window full of lots of little buffered sounds. This symphony of stutters can be a symphony. A collection of musical instruments will be perceived as producing a collective sound (a chord) if they all begin well within 25 ms of each other. That gives them a combined and common attack. Instruments that initiate their sound outside of this range are heard as separate, sloppy and not with it.

The window can only stretch so far and only then with some effort. There is a limit to how many little windows the big window can hold. Beyond some number of them, sounds become too complex and we hear chaos. But there is more to hearing than filling a brief window with simultaneous sounds. Where the perceptual window ends at 20 or 25 ms, a whole other perception begins.

There are good reasons to call what happens next a separate perception. It is composed of the end-products of the first perception and not built from the original waveforms or any pattern-recognition. Those processes have already occurred. The second perception builds upon the first. It provides an opportunity to perceive relationships between sounds that are not simultaneous. And that gives us our perception of rhythm.

Rhythms do not occur in nature. They are illusions or mirages created by our second perception. That needs some clarification. Evenly spaced pulses of sound really exist. Multiple pulses really can align in ways that are mathematically simple or very complex. However, musical structures known as waltz time (3/4) or common time (4/4) only exist in our perception and only because of how our second perception operates.

The talents exemplified by Sherlock Holmes come from our second perception and can also be observed by using the electric delay device. We can employ deduction, induction, comparing and isolating to quickly examine a small collection of perceptions that are not simultaneous and make a summary perception of them. There is a lot of flexibility in how we can capture multiple perceptions that are not simultaneous (ish… within the 25 ms window at least) and that includes hearing rhythms.

Back to the podcast… In the range of 25 to 55 ms, we can hear two separate voices that we can immediately relate to each other as we hear them. Speech at a normal pace is still intelligible. This capacity is entirely eroded at 80 ms and things become a muddle. A relationship can still be perceived, but not as you hear it. That observation must now come slightly after the fact, and that taps into our perception of rhythm. If the right channel voice with a delay of 100 ms suddenly became a recording of a slightly differently worded sentence, that would be detected after the fact, if at all. At a 30 ms delay, the difference would be plain as you heard it.

This second perception operates comfortably at a rate between 20 and 30 captures per second. That rate would seem to impose a frequency ceiling, ruling out any tone higher than 30 Hz. But this perception is not hearing frequencies. That has already been done by the first perception. All the second perception is capturing is some of the little windows of the first perception. It wants to build composites of them. Your sudden knowledge that the two voices are tracking each other (or not) is such a composite perception.

This facility will also try to make a composite of serial perceptions. Instead of parallel perceptions becoming conclusions, a sequence of perceptions will try to lead to an anticipation of the next perception. In other words, a rhythmic structure will be composited and lead to an anticipation of a Beat One, where the rhythm resets and starts again.

In an environment full of evenly spaced pulsing sounds with simple mathematical relationships in their timing, any perception of rhythm will require one more Planckian-constant-like value that must belong to the perceiver. What is the maximum number of perceptions that can be used to create a summary or an anticipation? Our musical preferences reveal that for us, and likely most primates, that value is four. That means rhythms based on two, three or four can be perceived, or created and shared with any creature having the same perceptual constant, or chunk-limit.

This trick can be observed in the podcast or in a rhythmic rain-drip outside the window. Let's say there is a waterdrop sound every 280 ms. Imagine clearing your mind and focusing on the sound (translation: switch your second perception to serial reception). Follow the beat. It will eventually appear to have a two, three or four beat cadence with a repeating anticipation of drip-beat one. You can make yourself count out any number, but two, three and four will come naturally, even irresistibly. Without some observer's perceptual chunk-limit to divide time into finite measures, any rhythm the universe could have would be the full fifteen billion years in length.
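The arithmetic of the drip example can be sketched in a few lines of Python. The 280 ms period comes from the example above; the chunk-limit of four is this essay's claim, treated here simply as a parameter.

```python
# Toy sketch of the chunk-limit: group a steady 280 ms drip into
# imagined measures of 2, 3 or 4 beats and see how long each
# 'measure' lasts before the anticipated beat one returns.
PULSE_MS = 280     # drip period from the example above
CHUNK_LIMIT = 4    # the essay's claimed perceptual maximum

measure_lengths = {
    beats: beats * PULSE_MS for beats in range(2, CHUNK_LIMIT + 1)
}
for beats, length_ms in measure_lengths.items():
    print(f"a {beats}-beat measure repeats every {length_ms} ms")
```

Every one of those imagined measures fits comfortably inside the roughly one-second reach described below, which is why all three cadences feel available.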

If those raindrops become too far apart, our perceptual reach will be exceeded and no phantom rhythm will emerge. That threshold comes around a second or so. That would be the end of it, except that there is one more perception to find, and we can get a peek at it with the digital delay or an imaginary roof gutter dripping onto a window air conditioner.

The rain has stopped and the drips are a steady second and a half apart. Relax and listen. Try to anticipate the next one. Phantom beats will appear between the drips. The space will be divided into two, three or four, and the rhythm will land on the drip as if it were beat one. You might get lucky and pick a time-gap that successfully creates a rhythm in one try or, more likely, it will take a few tries before the correct gap is found. Once found, the rhythm sustains itself as long as we listen to it. Or until the drip slows.

That little trick in the middle, where you try out beat spacings until the timing works, is noteworthy. You have to be a human to do it. There is now a third facility shepherding our second perception into postulating beat-spacings that aren't there. We are freely creating re-sequences of just our second perception, not generating frequencies or anything our first perception can hear. When we find the correct spacing, the shepherding stops. That is a glimpse of our ability to self-narrate, read, and speak in sentences and paragraphs.

An 'oon' is a gleaned perception. Simple animals glean their perception once and are monoon and uncontested. Animals with steerable EYEBALLS glean their perceptions twice and are bioon with a chunk-limit. We humans glean our perceptions thrice and are trioon with narrative re-sequencing. That's all there is to trioonity. Choice and volition play out where we are plainly aware of them as a contest of perceptions that are not bound by the same rules. We perceive the contest. We perceive owning the contest. What more would we need to think we just had volition?

That covers the serious stuff. Now the electric delay toy can resume turning TV evangelists into alien clown-speech. Or just exaggerate the alien clown-speech-ness.