The Imperative to Increase Data Literacy

Cleveland Data Days

Thank you for joining me at Cleveland Data Days to learn about the importance of data literacy. Here are a few resources to help you grow your own data literacy and your team’s. You can download the full slides from the presentation here:

Interested in carrying out a data audit? A data audit will help you document the data you own or have access to. At a minimum, you should be able to describe each dataset, where it comes from or who in the organization manages it, and the actual fields it contains. Here is a simple Excel template you can download and fill out for each dataset in your organization:
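If you’d rather build your own starting point, here is a minimal sketch that writes a starter audit sheet as a CSV you can open in Excel. The column names are illustrative assumptions on my part, not the fields from the downloadable template:

```python
import csv

# Illustrative audit columns -- adapt these to your organization.
AUDIT_COLUMNS = [
    "Dataset name",
    "Description",
    "Source / owner in the organization",
    "Update frequency",
    "Fields contained (name, type, meaning)",
]

with open("data_audit_template.csv", "w", newline="") as f:
    csv.writer(f).writerow(AUDIT_COLUMNS)

print("Wrote data_audit_template.csv -- fill out one row per dataset.")
```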

How much do you know already about key data literacy topics? Challenge yourself with the following questions, or pull your team together for a great data socialization activity as you review the questions and discuss the answers!

What’s the difference between a mean and a median?

Both are types of averages, or measures of the center of a set of values. But they are calculated differently, and so the distribution, or spread, of the values affects them differently. A mean adds up all the values and divides by the number of values, so one extra-big or extra-small value will have a big impact on the final mean. A median, on the other hand, lines up all the values in order of size and then finds the middle value. It doesn’t matter at all how far apart the values are, or how big or small the values toward either end get.

Medians are good for things with a few extreme values, like household income. Means are good when every value should matter, like the average weight of goods in a container.
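Here’s a quick sketch of that difference, using made-up household incomes with one extreme value:

```python
from statistics import mean, median

# Hypothetical household incomes -- note the single extreme value.
incomes = [42_000, 48_000, 55_000, 61_000, 67_000, 2_500_000]

print(f"Mean:   ${mean(incomes):,.0f}")    # pulled far upward by the outlier
print(f"Median: ${median(incomes):,.0f}")  # just the middle of the ordered values
```

The mean comes out around $462,000, which describes almost nobody on the list; the median of $58,000 is much closer to a typical household.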

If eating bacon gives you a two-fold increase in your risk of colon cancer, should you stop eating it?

This seemingly simple question includes several key data literacy concepts. First is the technical distinction between “relative risk increase” and “absolute risk increase.” A two-fold increase can seem like a lot – but it’s being measured as a comparison to whatever the risk was before. Knowing what the absolute risk is now matters far more in deciding whether it’s truly a danger you care about: if your original risk of colon cancer was 1 in 4,000, doubling it sends it to 1 in 2,000. Is that worth skipping bacon?
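Here is a minimal sketch of that arithmetic, using the illustrative 1-in-4,000 baseline from above (these are not real colon cancer statistics):

```python
baseline_risk = 1 / 4_000    # absolute risk before
relative_increase = 2.0      # "two-fold increase" = doubling

new_risk = baseline_risk * relative_increase

print(f"Old risk:          {baseline_risk:.4%}")             # 0.0250%
print(f"New risk:          {new_risk:.4%}")                  # 0.0500%
print(f"Absolute increase: {new_risk - baseline_risk:.4%}")  # 0.0250%
```

The same “two-fold increase” would feel very different if the baseline risk were 1 in 10 instead.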

The second major data literacy topic is present in that word, “worth”. There is a value judgment here. A big part of being data literate is understanding how to use data in assessing things that don’t have a single ‘right’ answer. To know if avoiding bacon is worth eliminating a two-fold increase in colon cancer risk, you don’t just need to understand the technical parts of that risk assessment. You also need to know how much it matters to you not to get cancer, how much you like bacon, and what other risks you want to take in your life.

How much should you trust the results of a survey of your donors?

Ah, surveys! Any time your information is based on a SAMPLE of the full population you care about (as in, you haven’t talked to every single one of your donors, but rather a sampling of them), you need to consider a number of factors that influence how much to trust the result.

First and foremost, you want to know how the sample was selected. After all, one goal of a survey is to use it to understand what EVERYONE would say without having to talk to everyone (because no one has the time or money for that). But in order to extrapolate from a survey – to apply the findings to the whole group – you need to know that everyone in the group is fairly represented. You can do this by deliberately picking people so that the important characteristics of the whole group are reflected in the sample (a stratified sample). Or you can do it by making sure that everyone has the same chance of being included (a random sample). If neither of these things is true, then your sample is biased, and you can’t really use it to know what the whole group thinks.
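Here is a small sketch of the random option, assuming a hypothetical donor list (in practice this would come from your donor database):

```python
import random

# Hypothetical donor list -- in practice, pulled from your CRM.
donors = [f"Donor {i}" for i in range(1, 1_001)]

random.seed(42)  # fixed seed so the example is reproducible

# A random sample: every donor has the same chance of being chosen.
sample = random.sample(donors, k=50)

print(sample[:5])  # the first few donors selected for the survey
```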

Next, you want to know how MANY people were in the sample. Were only 10 donors contacted? Or 100? All else being equal, more people means better-quality data – but with diminishing returns, because the margin of error shrinks roughly with the square root of the sample size.
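You can see those diminishing returns with the standard margin-of-error formula for a proportion at 95% confidence, sketched below (this assumes a simple random sample and the worst-case 50/50 split):

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion, simple random sample."""
    return z * sqrt(p * (1 - p) / n)

for n in [10, 100, 400, 1_000, 10_000]:
    print(f"n = {n:>6,}: ±{margin_of_error(n):.1%}")
```

Going from 10 to 100 respondents shrinks the margin of error from about ±31% to about ±10%, but going from 1,000 to 10,000 only moves it from about ±3% to ±1% – that’s the “up to a point.”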

Lastly, you want to know: are these the people whose opinions matter most? Or are there other opinions that should also be included?

You’ve started a new morning routine of breakfast with coffee, and now you’re getting headaches. How do you know the coffee is to blame?

Most of us understand that we would first need to make sure that we do in fact get headaches when we drink coffee. In this case, that appears true, because the headaches arrive after a breakfast that includes coffee. A persnickety scientist would point out that a breakfast of ONLY coffee would strengthen the evidence against coffee, since it removes the food itself as a suspect.

But there’s a second half to this issue that most people miss, because we usually look only for things that CONFIRM what we suspect or believe. What if your new headaches are actually being caused by your new habit of eating first thing in the morning? In order to pin it on the coffee, you need to look for evidence that could DISPROVE your theory. In other words, you would need to have breakfast without coffee and watch for a headache. If one shows up even though you skipped your morning cup of joe, you have strong evidence that the coffee isn’t the culprit.
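If you wanted to run that little experiment on yourself, a sketch like this (with made-up diary entries) shows the comparison that matters – headache rates on coffee mornings versus no-coffee mornings:

```python
# Hypothetical breakfast diary: (had_coffee, got_headache) for each morning.
mornings = [
    (True, True), (True, True), (True, False), (True, True),
    (False, True), (False, True), (False, False), (False, True),
]

def headache_rate(entries, with_coffee):
    days = [headache for coffee, headache in entries if coffee == with_coffee]
    return sum(days) / len(days)

print(f"Headache rate WITH coffee:    {headache_rate(mornings, True):.0%}")
print(f"Headache rate WITHOUT coffee: {headache_rate(mornings, False):.0%}")
```

In this made-up diary the rates come out identical (75% either way) – exactly the kind of disconfirming evidence that points away from coffee, and exactly the kind we tend not to collect.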

Our avoidance of data that might disprove our theories is hugely problematic. This tendency is called confirmation bias, and it’s why stereotypes persist and why newscasters get away with cherry-picking stats.