Improving the Signal to Noise Ratio – Revisited

Additional thoughts about signals and noise that have been rattling around in my brain since first posting on this topic.

At the risk of becoming too ethereal about all this, before there is signal and before there is noise, there is data. Cold, harsh, cruelly indifferent data. It is after raw data encounters some sort of filter or boundary, something that triggers a calculation to evaluate what that data means or whether it is relevant to whoever is on the other side of the filter, that it begins to be characterized as “signal” or “noise.”

Since we’re talking about humans in this series of posts, that filter is an amazingly complex system built from both physiological and psychological elements. The small amount of physical data that hits our senses and actually makes it to our brains is then filtered by beliefs, values, biases, attitudes, emotions, and those pesky unicorns that can’t seem to stop talking while I’m trying to think! Only after all this processing has the data been sorted into “signal” (what’s relevant) and “noise” (what’s irrelevant) for any particular individual. Our individual systems of filters impart value judgments on the data such that each of us, essentially, creates “signal” and “noise” from the raw data.

That’s a long-winded way to say:

data -> [filter] -> signal, noise

Now apply this to everyone on the planet.

data -> [filter 1] -> signal 1, noise 1

data -> [filter 2] -> signal 2, noise 2

data -> [filter n] -> signal n, noise n
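
For the programmers in the audience, the diagrams above can be sketched in a few lines of Python. This is only a toy illustration, nothing like how a human filter actually works: the names `apply_filter`, `filter_1`, and `filter_2` are hypothetical, and each "filter" is assumed to be a simple yes/no relevance test.

```python
def apply_filter(data, is_relevant):
    """Split raw data into (signal, noise) using one person's relevance test."""
    signal, noise = [], []
    for item in data:
        # The data itself carries no label; the filter assigns it.
        (signal if is_relevant(item) else noise).append(item)
    return signal, noise


# The same raw data for everyone (made-up examples).
data = [
    "healthy keto diet recipes",
    "quieting noisy unicorns",
    "lemon dill sauce with capers",
]


def filter_1(item):
    # Hypothetical filter for someone interested in food.
    return "keto" in item or "sauce" in item


def filter_2(item):
    # Hypothetical filter for someone with a unicorn problem.
    return "unicorns" in item


signal_1, noise_1 = apply_filter(data, filter_1)
signal_2, noise_2 = apply_filter(data, filter_2)
```

Same data, two filters, and one person's signal lands squarely in the other person's noise.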

As an example, Google, itself a filter, is a useful one. Let’s assume for a moment that Google is some naturally occurring phenomenon and not a filter created by humans with their own set of filters driving what it means to create a “let’s be evil”, er, good search engine. To retrieve 1,000,000 pieces of information, my friend, Bob, entered search criteria of interest to him, i.e. “filter 1.” Maybe he searched for “healthy keto diet recipes”. Scanning those search results, I determine (using my “filter 2”) that 100% of them are useless because my filter is “how do i force the noisy unicorns in my head to shut the hell up”. The Venn diagram of our two sets of search results is likely to show a vanishingly small overlap. (Disclaimer: I have no knowledge of the carbohydrate content of unicorns nor how tasty they may be when served with capers and a lemon dill sauce.)

Google may return 1,000,000 search results. But only a small subset is viewable at a time. What of the rest of the result set that I know nothing about? Is it signal? Is it noise? Is it just data that has yet to be subjected to anyone’s system of filters? Because Google found stuff, does that make it signal? Accepting all 1,000,000 search results as signal seems to require a willingness to believe that Google knows best when it comes to determining what’s important to me. This would apply to any filter not our own.

All systems for distinguishing signal from noise are imperfect and some of us on the Intertubes are seeking ways to better tune our particular systems. The system I use lets irrelevant data fall through the sieve so that the gold nuggets are easier to find. Perhaps at some future date I’ll unwittingly re-pan the same chunk of data through an experience-refined sieve and a newly relevant gem will emerge from the dirt. But until that time, I’ll trust my filters, let the dirt go as noise, and lurch forward.