Data can be, and has been, used to solve many problems in our world today. In both our personal and professional lives, it is worth exploring data-based solutions to our problems. Statistical solutions are usually the best approach to problem-solving because, to a large extent, they are based on facts or verifiable evidence.
One of the main uses of data is to tell a story or narrative. Descriptive analysis is important for our understanding of the world in which we live. But do you know who has the most resources to tell these narratives? You guessed it: the media and politicians. With enough resources and expertise, false narratives about a subject can spread quickly, even when the raw data is presented. We will show a real-life example below.
At one point or another, you’ve probably done it too. 👀
So whether in business, academics, or other areas of life, it is very important to take a critical look at the data presented to you.
Data inconsistency can show up in several ways; today we’ll look at a few examples.
Multiplication Factors ❌
This is a consequence of incomplete information. When someone doesn’t share all the numbers, it may be a sign that they’re hiding something.
Consider the following statement:
“Our company doubled sales in the last quarter”.
This sounds exciting, right? As an investor, you see this as a marker of growth and want to invest your money. But be careful: if the entrepreneur doesn’t share the exact sales numbers for the previous quarter, something might be fishy. It’s easy to double sales from two to four. It’s much harder to double sales from a thousand to two thousand.
Other Examples
- “I 40x my salary last year.” You should ask: what was the salary before?
- “We can double our profits this year.” Really? From what to what? 👀
If you don't know the base number, the given information may be misleading, even if it is correct.
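To see why the base matters, here is a minimal Python sketch (the function name `growth_summary` is illustrative, not from any library) that reports both the multiplication factor and the absolute change, using the two-to-four and thousand-to-two-thousand figures from the example above.

```python
def growth_summary(before, after):
    """Return the multiplication factor and the absolute change between two values."""
    if before == 0:
        raise ValueError("A growth factor is undefined when the base is zero.")
    factor = after / before    # what "doubled" or "40x" describes
    absolute = after - before  # what the headline usually leaves out
    return factor, absolute


# Both companies "doubled sales", but the absolute changes are very different.
print(growth_summary(2, 4))        # (2.0, 2)
print(growth_summary(1000, 2000))  # (2.0, 1000)
```

The factor is identical in both cases, while the absolute growth differs by a factor of 500. That is exactly the information the headline hides.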
Percentage Confusion 💯
Here’s something to think about:
“If the unemployment rate in your country was 1.5% in 2020 and is calculated to be 3% in 2021, is that a 1.5% increase in unemployment or a 100% increase? Now, let’s further say that it is election time and unemployment is a major concern.”
Here are two different possible answers by two different people with two different interests:
- The incumbent president: There was a 1.5% increment in unemployment, but that doesn’t mean our economy is not flourishing.
- The opposition: During the president's tenure, unemployment increased by 100%. They must be voted out.
In a way, both stories are true. Unemployment has definitely increased by 1.5 percentage points (often loosely reported as “1.5%”). However, it is now twice what it used to be (from 1.5% to 3%), which makes it a 100% increase.
But because of their differing interests, each person can tell the story in a way that benefits them or harms the other. The media then carries whichever version suits the side it supports, further dividing the population of your country.
So you see, depending on who’s talking, you get different narratives.
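Both numbers come from the same two rates; they simply measure different things. A minimal Python sketch (function names are illustrative) makes the two calculations explicit:

```python
def percentage_point_change(old_rate, new_rate):
    """Difference between two rates, in percentage points."""
    return new_rate - old_rate


def relative_change_percent(old_rate, new_rate):
    """Change relative to the starting rate, as a percentage."""
    return (new_rate - old_rate) / old_rate * 100


# Unemployment rates from the example: 1.5% in 2020, 3% in 2021.
print(percentage_point_change(1.5, 3.0))  # 1.5
print(relative_change_percent(1.5, 3.0))  # 100.0
```

Reporting both figures together (“up 1.5 percentage points, which doubles the rate”) removes the ambiguity that each side exploits.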
Using Minimal Parameters
“Globally, one in a hundred people get sick every day. Given a random person from Nigeria, what is the probability that they are sick?”
It is very tempting to say 0.01, and this may be judged correct, depending on context. However, we know that several factors contribute to a person’s well-being and can determine whether they get sick. Some of these factors include:
- Lifestyle and health choices
- Location
Let’s take location as an example. The truth is that people in poorer countries tend to get sick more often. While 1 in 100 people get sick globally, someone who lives a healthy lifestyle in a peaceful, prosperous location can lower their chances of getting sick. So the right answer depends on the sickness rate where the person actually lives: if they were from a wealthy nation, their chance of being sick would likely be lower than 0.01, and our initial answer would be off.
The more parameters we collect, the more accurate we can get. Of course, it’s impossible to gather all parameters, but the more the better, in many cases. 👀
Given more parameters, we can make more accurate decisions.
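In probability terms, the global figure is an average over everyone, while a question about a specific person calls for a conditional probability. The Python sketch below assumes a world split into two hypothetical regions with made-up population shares and sickness rates (not real data), chosen so that the overall rate works out to the article’s 1 in 100. The law of total probability ties the overall figure to the per-region ones.

```python
# Hypothetical figures for illustration only (not real data): each region's
# share of the world population and its daily probability of someone being sick.
REGIONS = {
    "region_a": {"population_share": 0.4, "p_sick": 0.016},
    "region_b": {"population_share": 0.6, "p_sick": 0.006},
}


def overall_probability(regions):
    """Law of total probability: P(sick) = sum over regions of P(region) * P(sick | region)."""
    return sum(r["population_share"] * r["p_sick"] for r in regions.values())


def probability_given_region(regions, region):
    """P(sick | region): the better answer once we know where the person lives."""
    return regions[region]["p_sick"]


print(round(overall_probability(REGIONS), 4))         # 0.01
print(probability_given_region(REGIONS, "region_a"))  # 0.016
print(probability_given_region(REGIONS, "region_b"))  # 0.006
```

Knowing one extra parameter (the region) moves the answer away from 0.01 in both directions, even though the global average is unchanged.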
Exaggeration Through Arbitrariness 📏
There’s a word people love to use: “almost”. Its variations (near, close to, approximately, etc.) are just as popular. The problem with these words is that they are arbitrary. My idea of “close” or “near” may be different from yours, so if the margin of error is not given, everyone ends up with a different picture of the subject matter. This leaves room for exaggeration.
Take the following example:
- “We have close to a hundred thousand customers.” The question here is: how close? If “close” means within a 2% margin of error to me and within 3.5% to you, we may have a problem.
In certain circumstances, this may be okay. But where lives or large amounts of money are at stake, exact numbers should be preferred to approximations.
It is easy to exaggerate numbers when approximating.
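A 2% margin on 100,000 means at least 98,000 customers, while a 3.5% margin allows as few as 96,500. The small Python sketch below (the helper name `is_close_to` and the count of 96,800 are hypothetical) shows how the same claim can be “true” or “false” depending on whose definition of “close” you use.

```python
def is_close_to(actual, claimed, tolerance):
    """True if `actual` is within `tolerance` (a fraction, e.g. 0.02 for 2%) of `claimed`."""
    return abs(actual - claimed) <= claimed * tolerance


claimed = 100_000
actual = 96_800  # hypothetical true customer count

print(is_close_to(actual, claimed, 0.02))   # False: outside a 2% margin
print(is_close_to(actual, claimed, 0.035))  # True: inside a 3.5% margin
```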
Simpson’s Paradox
This one is very popular, and you’ve probably heard of it already, but it is worth talking about. As an example, look at the table below:
So we took a sample of 200 people, asked 100 of them whether they liked books, and asked the other 100 whether they liked movies. We found that 80% liked books and 75% liked movies. So, overall, books come out ahead of movies; all well and good. Then we decided to break our sample down by sex.
This shows that 84.4% of men and 40% of women like books, while 85.7% of men and 50% of women like movies.
So overall, more people like books than movies, but when we split by sex, a higher share of both men and women like movies? 🤔 That seems weird.
This is a classic case of Simpson’s paradox, which is usually caused by a hidden variable. Here, the group asked about books and the group asked about movies must contain different proportions of men and women; with identical proportions, movies would win overall as well.
For more details on Simpson’s Paradox, check out this great article. I used a very similar example to the one presented in it.
Depending on my goals, I could frame this analysis however I wanted to favor my opinion.
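To reproduce the paradox, here is a Python sketch. The counts are an assumption: the original table is not reproduced here, so these are counts derived to match the percentages quoted above (80% and 75% overall; 84.4% and 40% for books; 85.7% and 50% for movies).

```python
# Counts derived to match the percentages quoted above (assumed, since the
# original table is not reproduced here): (people who said yes, people asked).
books = {"men": (76, 90), "women": (4, 10)}
movies = {"men": (60, 70), "women": (15, 30)}


def rate(yes, asked):
    return yes / asked


def overall_rate(groups):
    """Pool every subgroup before dividing, as the headline numbers do."""
    yes = sum(y for y, _ in groups.values())
    asked = sum(n for _, n in groups.values())
    return rate(yes, asked)


print(f"Overall: books {overall_rate(books):.1%}, movies {overall_rate(movies):.1%}")
for sex in ("men", "women"):
    print(f"{sex}: books {rate(*books[sex]):.1%}, movies {rate(*movies[sex]):.1%}")
```

Under these counts, the hidden variable is the mix of each group: 90 of the 100 people asked about books were men, but only 70 of the 100 asked about movies were, and men said “yes” far more often to both questions.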
Sampling Errors ✏️
“According to our survey, 80% of Nigerians have health insurance.”
We all know this isn’t true. Questions should arise about how that survey was done. If you survey only the working population (those who have jobs), there’s a high chance that they will have insurance, because many organizations provide it as a benefit. As you can see, such a sample does not represent the population as a whole.
This is a contrived example, but it shows how an error in sampling can be harmful.
If someone tells you they did a survey, ask for more details on that survey.
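To see how much the sampling frame matters, here is a small Python simulation. Every number in it is hypothetical: a made-up population of 1,000 people in which most employed people are insured and most others are not. The point is the gap between surveying only employed people and surveying a random sample of everyone.

```python
import random


def build_population():
    # Hypothetical population of 1,000 people (illustrative numbers only):
    # 400 employed, of whom 360 have insurance through work;
    # 600 not employed, of whom 60 have insurance.
    people = [{"employed": True, "insured": i < 360} for i in range(400)]
    people += [{"employed": False, "insured": i < 60} for i in range(600)]
    return people


def insured_rate(sample):
    """Share of people in the sample who have insurance."""
    return sum(p["insured"] for p in sample) / len(sample)


def survey(population, frame_filter, sample_size, seed=0):
    """Draw a simple random sample from the people who pass `frame_filter`."""
    frame = [p for p in population if frame_filter(p)]
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


population = build_population()
true_rate = insured_rate(population)
biased = insured_rate(survey(population, lambda p: p["employed"], 200))
fair = insured_rate(survey(population, lambda p: True, 500))
print(f"true={true_rate:.2f} employed-only={biased:.2f} everyone={fair:.2f}")
```

With these made-up figures, the true insured rate is 42%. The employed-only survey lands near 90%, while the random sample of everyone lands close to 42%, even though both surveys used proper random sampling within their frames.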
Misleading Charts 💹
Look at the chart below from Fox News a while ago.
At first glance, the graph seems to show a huge difference. However, look closely and you’ll notice that the y-axis (the vertical axis) doesn’t start at 0. The actual values are 35% and 39.6%, a gap of 4.6 percentage points, which isn’t that much of a difference. Truncating the axis makes it look much bigger.
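One way to quantify this distortion is to compare how much taller one bar looks than the other with how much larger the value actually is. In this Python sketch, the axis baseline of 34% is an assumption for illustration (read the real value off the chart’s axis). With an axis starting at 0, the taller bar is only about 1.13 times the height of the shorter one.

```python
def apparent_ratio(low, high, baseline):
    """How many times taller the `high` bar looks than the `low` bar when the axis starts at `baseline`."""
    if not baseline < low <= high:
        raise ValueError("Expected baseline < low <= high.")
    return (high - baseline) / (low - baseline)


low, high = 35.0, 39.6  # the two values shown in the chart

print(round(apparent_ratio(low, high, 0), 2))   # 1.13: honest axis starting at 0
print(round(apparent_ratio(low, high, 34), 2))  # 5.6: axis starting at 34 (assumed)
```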
Here are other ways in which graphs and charts can be manipulated:
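- Using a pie chart for percentages that don’t add up to a single whole (for example, poll results where respondents could pick more than one option); a bar chart is the safer choice.
- Cherry-picking a short time window to show a false trend (e.g. a few days of stock prices)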
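The cherry-picking trick is easy to demonstrate. In the Python sketch below, the price series is entirely hypothetical; the same data shows a 16% fall over the full period but a 5% rise if you only show the last three days.

```python
def percent_change(series):
    """Percentage change from the first to the last value in a series."""
    return (series[-1] - series[0]) / series[0] * 100


# Hypothetical daily closing prices, for illustration only.
prices = [100, 96, 92, 88, 84, 80, 82, 84]

print(round(percent_change(prices), 1))       # -16.0: the full picture
print(round(percent_change(prices[-3:]), 1))  # 5.0: a cherry-picked window
```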
How To Catch These Lies? 🤥
- Ask more questions. The more questions you ask, the more clarity you get.
- Consider another source of information. Sticking to a single source may not be beneficial. Check other sources to verify assumptions you may have.
- Finally, there’s a limit to the discrepancies you can detect in data. As long as the outcome is positive in general and the negatives are manageable, you may be content with the level of accuracy the data provides.
Thanks for reading up to this point. If you like what you just read, kindly give a few claps. 👏🏽
You can also follow me on Twitter: @iamtemibabs