Artificial Intelligence Data Analysis Data Science

Is AI Changing the 80/20 Rule of Data Science?

Cleaning and optimizing data is one of the biggest challenges that data scientists encounter. The ongoing concern about the amount of time that goes into such work is embodied by the 80/20 Rule of Data Science. In this case, the 80 represents the 80% of the time that data scientists expend getting data ready for use and the 20 refers to the mere 20% of their time that goes into actual analysis and reporting.

Much like many other 80/20 rules inspired by the Pareto principle, it’s far from an ironclad law. This leaves room for data scientists to overcome the rule, and one of the tools they’re using to do it is AI. Let’s take a look at why this is an important opportunity and how it might change your process when you’re working with data.

The Scale of the Problem

At its core, the problem is that no one wants to be paying data scientists to prep data any more than is necessary. Likewise, most folks who went into data science did so because deriving insights from data can be an exciting process. As important as diligence is to mathematical and scientific processes, anything that lets you maintain that diligence while getting the job done faster is always a win.

IBM published a report in 2017 that outlined the job market challenges that companies are facing when hiring data scientists. Growth in a whole host of data science, machine learning, testing, and visualization fields was in the double digits year-over-year. Further, it cited a McKinsey report that shows that, if current trends continue, the demand for data scientists will outstrip the job market’s supply sometime in the coming years.

In other words, the world is close to arriving at the point where simply hiring more data scientists isn’t going to get the job done. Fortunately, data science provides us with a very useful tool to address the problem without depleting our supply of human capital.

Is AI the Solution?

It’s reasonable to say that AI represents a solution, not The Solution. With that in mind, though, chipping away at the alleged 80% of the time that goes into prepping data for use is always going to be a win so long as standards for diligence are maintained.

Data waiting to be prepped often follow patterns that can be detected. The logic is fairly straightforward, and it goes as follows:

  1. Have individuals prepare a representative portion of a data set using programming tools and direct inspections.
  2. Build a training model from the prepared data.
  3. Execute and refine the training model until it reaches an acceptable performance threshold.
  4. Apply the training model and continue working on refinements and defect detection.
  5. Profit! (Profit here meaning to take back the time you were spending on preparing data.)
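The loop above can be sketched in a few lines of Python. This is a deliberately minimal illustration: the hand-labeled sample, the dataset, and the simple learned range check standing in for a real trained model are all hypothetical.

```python
import statistics

# Hypothetical hand-labeled sample: (value, is_dirty) pairs prepared by a person.
labeled_sample = [(10.2, False), (9.8, False), (10.5, False), (-999.0, True), (10.1, False)]

# "Train": learn the range of clean values from the prepared portion.
clean_values = [v for v, dirty in labeled_sample if not dirty]
mean = statistics.mean(clean_values)
stdev = statistics.stdev(clean_values)

def looks_dirty(value, k=3.0):
    """Flag values far outside the range seen in the clean sample."""
    return abs(value - mean) > k * stdev

# "Apply": run the model over the full dataset, routing flagged rows for review.
full_dataset = [10.3, 9.9, -999.0, 10.0]
flagged = [v for v in full_dataset if looks_dirty(v)]
print(flagged)  # the -999.0 sentinel value is caught
```

In practice the "model" would be something far richer than a range check, but the shape of the process (label, train, refine, apply) stays the same.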

There are a few factors worth considering. First, the dataset has to be large enough that a representative sample can be extracted from it. Ideally, that sample shouldn’t approach 50% of the overall dataset; otherwise, you might be better off just powering through with a human/programmatic solution.

Second, some evidence needs to exist that shows the issues with each dataset lend themselves to AI training. While the power of AI can certainly surprise data scientists when it comes to improving processes such as cleaning data and finding patterns, you don’t want to bet on it without knowing that upfront. Otherwise, you may spend more time working with the AI than you gain back for doing analysis.


The human and programming elements of cleaning and optimizing data will never go away completely. Both are essential to maintaining appropriate levels of diligence. Moving the needle away from 80% and toward or below 50%, however, is critical to fostering continued growth in the industry. 

Without a massive influx of data scientists into the field in the coming decade, something that does not appear to be on the horizon, AI is one of the best hopes for turning back the time spent on preparing datasets for analysis. That makes it an option that all projects that rely on data scientists should be looking at closely.

Data Analysis Data Preparation Data Quality

Top 3 Risks of Working with Data in Spreadsheets

Microsoft Excel and Google Sheets are the first choices of many users when it comes to working with data. They’re readily available, easy to learn, and support universal file formats. They’re also on nearly everyone’s desktop and were probably the first data-centric software tool any of us learned. The point of a spreadsheet application is to present data in a neat, organized manner that is easy to comprehend.

While spreadsheets are popular, they’re far from the perfect tool for working with data. Let’s explore the top three risks you need to be aware of when working with data in spreadsheets.

Risk #1: Beware of performance and data size limits in spreadsheet tools 

Most people don’t check the performance limits in spreadsheet tools before they start working with them. That’s because the majority won’t run up against them. However, if you start to experience slow performance, it might be a good idea to refer to the limits below to measure where you are and make sure you don’t start stepping beyond them. 

Like I said above, spreadsheet tools are fine for most small data, which will suit the majority of users. But at some point, if you keep working with larger and larger data, you’re going to run into some ugly performance limits. When it happens, it happens without warning and you hit the wall hard.

Excel Limits

Excel is limited to 1,048,576 rows by 16,384 columns in a single worksheet.

  • A 32-bit Excel environment is subject to 2 gigabytes (GB) of virtual address space, shared by Excel, the workbook, and add-ins that run in the same process.
  • 64-bit Excel is not subject to these limits and can consume as much memory as you can give it. A data model’s share of the address space might run up to 500 – 700 megabytes (MB) but could be less if other data models and add-ins are loaded.

Google Sheets Limits

  • Google Spreadsheets are limited to 5,000,000 cells, with a maximum of 256 columns per sheet. (Which means the row limit can be as low as 19,531 if your file uses every column!)
  • Uploaded files that are converted to the Google spreadsheets format can’t be larger than 20 MB and need to be under 400,000 cells and 256 columns per sheet.
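Before loading a large file into a spreadsheet, it’s worth checking whether it even fits. A quick sketch in Python (the CSV filename in the commented-out call is hypothetical):

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit in modern Excel

def fits_in_excel(path):
    """Count the rows in a CSV and compare against Excel's worksheet limit."""
    with open(path, newline="") as f:
        row_count = sum(1 for _ in csv.reader(f))
    return row_count <= EXCEL_MAX_ROWS, row_count

# ok, n = fits_in_excel("sales_export.csv")  # hypothetical file
```

A check like this takes seconds and can save you from hitting the wall mid-session.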

In real-world experience, running on midrange hardware, Excel can begin to slow to an unusable state on data files as small as 50MB-100MB. Even if you have the patience to operate in this slow state, remember you are running at redline. Crashes and data loss are much more likely!

(If you’re among the millions of people who have experienced any of these, or believe you will be working with larger data, why not check out a tool like Inzata, designed to handle profiling and cleaning of larger datasets?) 

Risk #2: There’s a real chance you could lose all your work just from one mistake

Spreadsheet tools lack the auditing, change control, and metadata features that would be available in a more sophisticated data cleaning tool. These features are designed to act as backstops against unintended user error. Exercise caution when working in spreadsheets, as hours of work can be erased in a microsecond.

Accidental sorting and paste errors can also ruin your hard work. Sort errors are incredibly difficult to spot. If you forget to include a critical column in the sort, you’ve just corrupted your entire dataset. If you’re lucky enough to catch it, you can undo it; if not, that dataset is now ruined, along with all of the work you just did. If the data saves to disk while in this state, it can be very hard, if not impossible, to undo the damage.

Risk #3: Spreadsheets aren’t really saving you any time

Spreadsheets are fine if you just have to clean or prep data once, but that is rarely the case. Data is always refreshing; new data is continually coming online. Spreadsheets lack any kind of repeatable process or intelligent automation.

If you spend 8 hours cleaning a data file one month, you’ll have to repeat nearly all of those steps the next time a refreshed data file comes along. 

Spreadsheets can be pretty dumb sometimes. They lack the ability to learn. They rely 100% on human intelligence to tell them what to do, making them very labor-intensive.

More purpose-designed tools like Inzata Analytics allow you to record and script your cleaning activities via automation. AI and machine learning let these tools learn about your data over time. Your data is also staged throughout the cleaning process, so rollbacks are instantaneous. You can set up data flows that automatically perform cleaning steps on new, incoming data. Ultimately, this lets you get out of the data cleaning business almost permanently.

To learn more about cleaning data, download our guide: The Ultimate Guide to Cleaning Data in Excel and Google Sheets

BI Best Practices Business Intelligence Data Analysis

Self-Service Analytics: Turning Everyday Insight Into Actionable Intelligence

Business intelligence and analytics have become essential parts of the decision-making process in many organizations. One of the challenges of maximizing these resources, though, comes with making sure everyone has access to the analysis and insights they need right when they need them. The solution you may want to consider is self-service BI.

What is Self-Service BI?

The idea behind self-service BI is simple. Users should be able to access reports and analysis without depending on:

  • An approval process
  • A third party
  • Any specific person in the organization

In other words, anyone should be able to pull up the numbers themselves, or ask the person to their left to do it. If the boss needs the details of a report, their team should be able to access key information without contacting a help desk or a third-party vendor. When someone needs help, anyone from the top down should be able to instantly address the issue by pointing them to the proper dashboards and tools.

Defining Your Requirements

Before getting too deep into the complexities of self-service BI, it’s important to establish what your requirements are. First, you’ll need to have the resources required to provide self-service to your end-users. If you’re going to have 10,000 people simultaneously accessing dashboards from locations across the globe, that’s a huge difference compared to a company that has 5 people in the same office on a single system.

Scalability is an extension of that issue. If your company has long-term growth plans, you don’t want to have to rebuild your entire analytics infrastructure three years from now. It’s important to build your self-service BI system with the necessary resources to match long-term developments.

Secondly, you’ll want to look at costs. Many providers of BI systems employ license structures, and it’s common for these to be sold in bulk. For example, you might be able to get a discount by purchasing a 500-user license. It’s important that the licensing structure and costs match your company’s financial situation.

Finally, you need to have a self-service BI setup that’s compatible with your devices. If your team works heavily in an iOS environment on their phones, for example, you may end up using a different ecosystem than folks who are primarily desktop Windows users.

Developing Skills

A handbook has to be put in place that outlines the basic skills every end-user must have. From a data standpoint, users should understand things like:

  • Data warehousing
  • Data lakes
  • Databases

They also should have an understanding of the BI tools your operation utilizes. If you’re using a specific system in one department, you need to have team members who can get new users up to speed company-wide. You’ll also likely need to have team members who are comfortable with Microsoft Excel or Google Sheets in order to deal with the basics of cleaning and analyzing data.

Your users need to be numerate enough to understand broad analytics concepts, too. They should understand the implications of basic stats, such as why small sample sizes may hobble their ability to apply insights to larger datasets.

Understand How Users Will Behave

Having the best tools and people in the world will mean nothing if your team members are always struggling to work the way they need to. This means understanding how they’ll use the system.

Frequently, user behaviors will break up into distinct clusters, each with its own quirks. Someone putting together ad hoc queries, for example, is going to encounter a different set of problems than another user who has macros set up to generate standard reports every week. Some users will be highly investigative while others are largely pulling predefined information from the system to answer questions as they arise.

Within that context, it’s also important to focus on critical metrics. Team members shouldn’t be wandering through a sea of data without a sense of what the company wants from them.

By developing an enterprise-wide focus on self-service BI, you can help your company streamline its processes. When the inevitable time comes that someone needs a quick answer in a meeting or to make a decision, you can relax knowing that your users will have access to the tools, data, and analysis required to do the job quickly and efficiently.

Big Data Data Analysis

Why Everyone Hates Spreadsheets

It’s Time to Part Ways With Excel Spreadsheets for Data Analysis

Excel is excellent for some things, like performing quick calculations or keeping track of your personal spending. Heck, it’s even great for startup e-commerce shops with minimal inventory or sales. But for other tasks and bigger businesses, Excel spreadsheets can create more problems than solutions. 

So, in an effort to hustle the world toward better IT solutions, we’re breaking down why everyone should be moving away from spreadsheets for data analysis work.

What Are the Pitfalls of Using Spreadsheets for Data?

Why don’t spreadsheets cut it anymore? There are a number of practical reasons for businesses and organizations to shy away from Excel. Some are simple functionality issues, while others only surface in specific work environments.

Overall, there are four main reasons: data inaccuracy, real-time update constraints, capacity breaks, and limited analytical parameters.

Data Inaccuracy

Spreadsheet accuracy is dependent on human accuracy — and that’s a recipe for disaster because it’s dangerously easy to mess up a field. Common mistakes include:

  • Mechanical Errors: Replacing formula fields with static numbers, keying in typos, and transferring mishaps rank among the most common mechanical spreadsheet errors. More than just simple mistakes, a single flub in one field can compromise the integrity of an entire workbook.
  • Logic Errors: Logic errors stem from bad formulas. Due to the relational nature of spreadsheets, a flawed foundational calculation has the power to compromise a whole document.
  • Errors of Omission: Due to workplace pipeline breakdowns, data can simply be left off spreadsheets. Unless there are validation checks built into your system, discovering such errors of omission may be impossible.

Lack of Real-Time Updates

Another problem with spreadsheets is their static nature. While several people can access a single document, things become easily jumbled when two or more people try to change it simultaneously. In many instances, the last person to enter data is not the person with the right figures.

Mistakes like this have a ripple effect, and it can be weeks before the problem is accurately identified — if it’s ever caught at all!

Capacity Breaks

In 2020, over 15,000 COVID-19 cases went unreported in the United Kingdom — all because of an Excel spreadsheet.

What happened?

Well, Public Health England (PHE) used Excel to collate data from hospitals and medical clinics across the country. But what the agency failed to realize is that the Excel version running on its network had a 65,536-row limit. To shorten a long story, the number of cases exceeded the cap, and the oversight triggered an administrative nightmare.

Excel was forged in the crucible of early tech — before the days of big data — and it still hews to the limited footprint of that time.

One-Dimensional Analysis

Spreadsheets were made for arithmetic and elementary calculations. But today’s data analysis procedures use more complex, multi-faceted approaches. Plus, you cannot measure progress or see status updates in spreadsheets, and the view is confined to a row-column layout, which forces constant back-and-forth scrolling.

These one-dimensional limitations are time wasters that ultimately eat into your bottom line.

What Are Some Spreadsheet Alternatives?

These days, there are thousands of superior programs that have muscled in on Excel’s market share. The trick is finding the ones that work best for your business and market niche. Partnering with an AI-powered data analysis platform is usually the way to go, as these platforms can produce real-time insights and develop robust solutions tailored to your needs.

It’s time to move on from inefficient spreadsheets. Using one to coordinate game night is great, but demonstrably better options are available for data analysis and business projects.

Business Intelligence Data Analysis

The Beginner’s Guide to SQL for Data Analysis

What Is SQL?

SQL stands for “Structured Query Language,” and it’s the standard language for working with relational database management systems. Or, in plain English, SQL is the code that accesses and extracts information from data sets.

The Importance of SQL and Data Analysis

In our current economy, data ranks among the most commodifiable assets. It’s the fuel that keeps social media platforms profitable and the digital mana that drives behavioral marketing. As such, crafting the best SQL data queries is a top priority. After all, they directly affect bottom lines.

In our examples below, we use the wildcard * liberally. That’s just for ease and simplicity. In practice, wildcards should be used sparingly: SELECT * pulls more columns than you usually need, and a leading wildcard in a LIKE pattern can prevent the database from using an index.

Display a Table

It’s often necessary to display tables on websites, internal apps, and reports.

In the examples below, we show how to a) pull every column and record from a table and b) pull specific fields from a table.
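Something like this (the `menu` table and its columns are illustrative):

```sql
-- a) Pull every column and record from the table
SELECT * FROM menu;

-- b) Pull only specific fields
SELECT item_name, price FROM menu;
```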


Adding Comments

Adding comments to SQL scripts is important, and if multiple people are working on a project, it’s polite! To add them, simply insert two dashes before the note. Don’t use punctuation in comments, as it could create querying problems.

Below is an example of a comment in a SQL query.
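For instance (the query itself is illustrative):

```sql
-- Pull the full menu for the weekly report
SELECT * FROM menu;
```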

Combine Columns

You’ll want to combine two columns into one for reporting or output tables.

In our example below we’re combining the vegetable and meat columns from the menu table into a new field called food.
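A sketch of that query; note that concatenation syntax varies by dialect (standard SQL uses `||`, while MySQL uses `CONCAT()`):

```sql
-- Standard SQL concatenation; in MySQL: CONCAT(vegetable, ' ', meat)
SELECT vegetable || ' ' || meat AS food
FROM menu;
```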

Display a Finite Amount of Records From a Table

Limiting the number of records a query returns is standard practice.

In the example below, we’re pulling all the fields from a given table and limiting the output to 10 records.
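A sketch (the table name is illustrative; `LIMIT` is MySQL/PostgreSQL/SQLite syntax, while SQL Server uses `SELECT TOP 10 *`):

```sql
SELECT *
FROM menu
LIMIT 10;
```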

Joining Tables Using INNER JOIN

The INNER JOIN command selects records with matching values in both tables.

In the example below, we’re comparing the author and book tables by using author IDs. This SQL query would pull all the records where an author’s ID matches the author_ID fields in the book table.
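That query might look like this (column names are assumed from the description):

```sql
SELECT *
FROM author
INNER JOIN book
  ON author.ID = book.author_ID;
```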

Joining Tables Using LEFT JOIN

The LEFT JOIN command returns all records from the left table — in our example below that’s the authors table — and the matching records from the right table, or the orders table.
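A sketch, with illustrative join columns:

```sql
SELECT *
FROM authors
LEFT JOIN orders
  ON authors.id = orders.author_id;
```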

Joining Tables Using RIGHT JOIN

The RIGHT JOIN command returns all records from the right table — in our example the orders table — and the matching records from the left table, or the authors.
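The mirror image of the previous query (again with illustrative columns):

```sql
SELECT *
FROM authors
RIGHT JOIN orders
  ON authors.id = orders.author_id;
```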

Joining Tables Using FULL OUTER JOIN

The FULL OUTER JOIN command returns records when there’s a match in the left table, which is the authors table in our example, or the right table — the orders table below. You can also add a condition to further refine the query.
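A sketch, with a hypothetical refining condition on an illustrative `total` column:

```sql
SELECT *
FROM authors
FULL OUTER JOIN orders
  ON authors.id = orders.author_id
WHERE orders.total > 100;  -- optional condition to refine the result
```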

Matching Part of a String

Sometimes, when crafting SQL queries, you’ll want to pull all the records where one field partially matches a certain criterion. In our example, we’re looking for all the people in the database with “adison” in their first names. The query would return every Madison, Adison, Zadison, and Adisonal in the data set.
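A sketch of that partial match, using an illustrative `people` table (the `%` wildcard matches any run of characters):

```sql
SELECT *
FROM people
WHERE first_name LIKE '%adison%';
```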

If/Then CASE Logic

Think of CASE as the if/then operator for SQL. Basically, it cycles through the conditions and returns a value when a row matches. If a row doesn’t meet any of the conditions, the ELSE clause is activated.

In our example below, a new column called GeneralCategory is created that indicates if a book falls under the fiction, non-fiction, or open categories.
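Something like this, with an illustrative `category` column on a `books` table:

```sql
SELECT title,
       CASE
         WHEN category = 'fiction'     THEN 'Fiction'
         WHEN category = 'non-fiction' THEN 'Non-Fiction'
         ELSE 'Open'
       END AS GeneralCategory
FROM books;
```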


HAVING vs. WHERE

The HAVING and WHERE keywords accomplish very similar tasks in SQL. However, WHERE is processed before a GROUP BY command. HAVING, conversely, is processed after a GROUP BY command.

In our example below, we’re pulling the number of customers for each store, but only including stores with more than 10 customers.
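A sketch of that query, with illustrative table and column names:

```sql
SELECT store_id, COUNT(customer_id) AS customer_count
FROM customers
GROUP BY store_id
HAVING COUNT(customer_id) > 10;
```

Note that the filter on the aggregate has to live in HAVING; a WHERE clause runs before the grouping happens, so it can’t see the count.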

It’s fair to argue that SQL querying serves as the spine of the digital economy. It’s a valuable professional asset, and taking time to enhance your skills is well worth the effort.

Data Analysis Data Preparation Data Quality

Cleaning Your Dirty Data: Top 6 Strategies

Cleaning data is essential to making sure that data science projects are executed with the highest level of accuracy possible. Manual cleaning calls for extensive work, though, and it can also introduce human errors along the way. For this reason, automated solutions, often based on basic statistical models, are used to eliminate flawed entries. It’s a good idea, though, to develop some understanding of the top strategies for dealing with the job.

Pattern Matching

A lot of undesirable data can be cleaned up using common pattern-matching techniques. The standard tool for the job is usually a programming language that handles regular expressions well. Done right, a single line of code can often do the job.

1) Cleaning out and fixing characters is almost always the first step in data cleaning. This usually entails removing unnecessary spaces, HTML entity characters and other elements that might interfere with machine or human reading. Many languages and spreadsheet applications have TRIM functions that can rapidly eliminate bad spaces, and regular expressions and built-in functions usually will do the rest.
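In Python, for instance, a pass like this handles stray whitespace and HTML entities (the field value is illustrative):

```python
import html
import re

def clean_field(value):
    """Decode HTML entities, collapse runs of whitespace, and trim the ends."""
    value = html.unescape(value)          # "&amp;" -> "&", "&nbsp;" -> non-breaking space
    value = re.sub(r"\s+", " ", value)    # collapse tabs, newlines, doubled spaces
    return value.strip()

print(clean_field("  Smith&nbsp;&amp;   Sons\t"))  # -> "Smith & Sons"
```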

2) Duplicate removal is a little trickier because it’s critical to make sure you’re only removing true duplicates. Other good data management techniques, such as indexing, will make duplicate removal simpler. Near-duplicates, though, can be tricky, especially if the original data entry was performed sloppily.
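For true (exact) duplicates, a first-seen filter is usually enough; near-duplicates need fuzzier matching. A minimal sketch:

```python
def drop_exact_duplicates(rows):
    """Remove true duplicates while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)          # tuples are hashable, so they can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

print(drop_exact_duplicates([["a", 1], ["b", 2], ["a", 1]]))  # -> [['a', 1], ['b', 2]]
```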

Efficiency Improvement

While we tend to think of data cleaning as mostly preparing information for use, it also is helpful in improving efficiency. Storage and processing efficiency are both ripe areas for improvement.

3) Converting fields can make a big difference to storage. If you’ve imported numerical fields, for example, and they all appear in text columns, you’ll likely benefit from turning those columns into integers, decimals, or floats.

4) Reducing processing overhead is also a good choice. A project may only require a certain level of decimal precision, and rounding off numbers and storing them in smaller memory spaces can speed things up significantly. Just make sure you’re not kneecapping required decimal precision when you use this approach.

Statistical Solutions

Folks in the stats world have been trying to find ways to improve data quality for decades. Many of their techniques are ideal for data cleaning, too.

5) Outlier removal and the use of limits are common ways to analyze a dataset and determine what doesn’t belong. By analyzing a dataset for extreme and rare data points, you can quickly pick out what might be questionable data. Be careful, though, to recheck your data afterward to verify that low-quality data was removed rather than data about exceptional outcomes.

Limiting factors also make for excellent filters. If you know it’s impossible for an entry to register a zero, for example, installing a limit above that mark can eliminate times when a data source simply returned a blank.
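Both ideas can be combined in a few lines of Python; the cutoff of three standard deviations and the floor value here are illustrative choices, not universal rules:

```python
import statistics

def filter_outliers(values, floor=None, k=3.0):
    """Drop values beyond k standard deviations, plus anything at or below a known floor."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values
            if abs(v - mean) <= k * stdev
            and (floor is None or v > floor)]

print(filter_outliers([10, 12, 11, 9, 0], floor=0))  # the impossible zero is dropped
```

As the text warns, anything this filter removes should be rechecked by hand before you call it low-quality data.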

6) Validation models are useful for verifying that your data hasn’t been messed up by all the manipulation. If you see validation numbers that scream that something has gone wrong, you can go back through your data cleaning process to identify what might have misfired.

Data Analysis Data Preparation Data Quality

Content Tagging: How to Deal With Video and Unstructured Data

Unstructured video data can be extremely difficult to tame! But don’t worry. With a few handy tips, the process becomes a lot more manageable.

Please note: this is our second article in a series on unstructured data. Click here to read the first installment, which explores indexing and metadata.

What Is the Problem With Unstructured Data?

Unstructured information is an unwieldy hodgepodge of graphic, audio, video, sensory, and text data. To squeeze value from the mess, you must inspect, scrub, and sort the file objects before feeding them to databases and warehouses. After all, raw data is of little use if it cannot be adequately leveraged and analyzed.

What Is Content Tagging?

In the realm of information management, content tagging refers to the taxonomic structure established by an organization or group to label and sort raw data. You can think of it as added metadata.

Content tagging is largely a manual process. In a typical environment, people examine the individual raw files and prep them for data entry. Common tasks include:

  • Naming each item
  • Adding meta descriptions of images and videos
  • Splicing videos into frames
  • Separating and marking different mediums

How to Use Content Tagging to Sort Unstructured Data

You can approach content tagging in several ways. Though much of the work is best done manually, there are also ways to automate some processes. For example, if an incoming file ends with a .mov or .mp4 suffix, you can write a script that automatically tags it as a video. The same can be done for graphics and text documents.
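A minimal sketch of that suffix-based auto-tagging in Python; the extension-to-tag mapping is a hypothetical starting point you would extend for your own media types:

```python
from pathlib import Path

# Hypothetical extension -> tag mapping; extend it for your own media types.
TAGS = {
    ".mov": "video", ".mp4": "video",
    ".png": "image", ".jpg": "image",
    ".txt": "text",  ".pdf": "document",
}

def tag_for(filename):
    """Tag an incoming file by its suffix; unknown types are left for manual review."""
    return TAGS.get(Path(filename).suffix.lower(), "untagged")

print(tag_for("interview_raw.MOV"))  # -> video
```

Anything that lands in the "untagged" bucket then goes to the manual tagging queue.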

Tagging helps organize unstructured data as it provides readable context atop which queries can be crafted. It also allows for pattern establishment and recognition. In fact, photo recognition programs are, in large part, fueled by extensive tagging.

The Pros and Cons of Content Tagging

Tagging has its pros and cons. The downside is the manual labor involved. Depending on the amount of inbound data, it could take considerable resources to get the job done. Many businesses prefer to enlist third-party database management teams to mitigate costs and free up personnel.

As for pros, there are a couple. Firstly, content tagging makes data organization much more manageable. When you label, sorting becomes a snap. Secondly, tagging adds more value to data objects, which allows for better analysis.

Let’s Transform Your Unstructured Data

Leveraging AI-powered tools to perform complex data management tasks can save you money and increase efficiency in the long run. Inzata Analytics maintains a team of experts that focuses on digital data, analytics, and reporting. We help businesses, non-profits, and governments leverage information technology to increase efficiency and profits.

Get in touch. Let’s talk. We can walk you through the advantages of AI-powered data management and how it can boost your bottom line. See a demo of the platform here.


Business Intelligence Data Analysis Data Visualization

What is Data Storytelling?


How do you tell a GREAT story with data? 

Everyone likes hearing a good story. However, being asked to “tell a story” using data and visualization is often a big source of anxiety for analysts of all backgrounds.

An informal Twitter poll returned the following responses to the question “When I’m asked to show the data, I feel….”

Frustrated, because I don’t think I’ll tell the story effectively, and might miss important parts.

I feel pressure, pressured to make it clear for everyone, and what if people don’t like my story?

Inadequate, because I’m sure there are questions people will have that I haven’t anticipated.

Being able to tell stories with data is a skill that’s becoming ever more important in our world of increasing data and the desire for data-driven decision-making. As more and more data visualizations are produced, they start to become a commodity and their quality suffers. This turns off viewers and people begin to rethink their investments. However, great Visual Storytelling can send the effectiveness and reach of your analysis through the roof and produce significant business influence, value, and career rewards.

Have a look at the graphic below. What story do you get from it? 

Global Surface Temperature Chart

Here’s what most people see:

  1. The average surface temperature is trending higher.
  2. Multiple independent data sources all show the same trend, which lends further credibility.
  3. Temperatures trended lower at one point, but that reversed, and they have risen in lockstep with the rise of industrialization.

Even though it’s little more than a few words, numbers, and some colored lines, it tells a very compelling story, with strong supporting evidence, and makes its key points very persuasive.

Storytelling with data is no different than regular storytelling. Storytelling is by far the longest-running and most effective method of human-to-human knowledge transfer. The reason storytelling is so effective is that it engages emotions along with cognition (ability to learn). Emotion keeps you interested while you learn. 

  • Build Characters: First and foremost, stories involve characters. Without humanized characters, there’s nothing for the viewer to relate to; it’s not a story. Think about it: every story you’ve ever read has human-like characters for you to relate to. Even movies and stories about animals and inanimate objects impart human characteristics to those characters: they talk, they react, they have expressions and emotions, they act human. In data, the character(s) can be you, the reader, named people, people in a certain role, customers, or employees. e.g. “Our Sales managers wanted to know how to ….” “I sought to uncover why ….” “Our CEO, Mike, asked me to investigate why ….” or even, “My daughter asked me about our company, and she wondered why X product was so successful….” A great way to introduce characters visually is with earlier survey results. Survey results let you introduce and describe characters (“Male employees, under 40, working in our US Offices had the following to say in a recent survey:”) and also give them a voice.
  • “Use the Force, Luke”. The next thing that is required is some kind of goal, challenge, or objective. This line from Star Wars was Obi-Wan’s way of challenging Luke to learn the Jedi way and set out his hero’s journey. The characters must want or need to do something. That’s the hook. That’s what gets the reader to go along with the character, to put themselves in their shoes. In data, a great one here is answering a difficult question or solving a business problem. Even better would be showing how your insights and answers resulted in a measurable improvement. So give your characters a challenge to overcome. The bigger the challenge, the more interesting the story becomes because….. 
  • …"Never Tell Me the Odds": Give the character some stakes, real consequences if they fail completely. Optionally, let the character fail, but use the failure to illustrate what they learned. Failure makes the stakes seem real and pulls the reader in even further. This is called "rising action." Just don't lay it on too thick.
  • Have a Point!: Every story needs a climax, a point where the action peaks. This should be the main message you're trying to communicate, and it should involve your character(s) achieving or exceeding their goal. For example, if your challenge was to use analysis to create a plan for change, you can include a Gantt chart or change roadmap here as the deliverable of that analytics journey. You could also shift the frame slightly to show a visualization of the outcome of that change.
  • “Falling Action.” Now that you’ve made your main point, you can use this part to tie up loose ends, resolve other challenges or conflicts besides the main one, describe what happened to the characters after the story, or even use it to tease a sequel. You can also use this part as a “Call to Action” for the viewer if you want them to do something, such as give feedback or share your dashboard with others. 

Now that you have the main foundations of Data Storytelling under your belt, give it a try. The best way to master it is to practice it often. Learn what works best for your data and your audience. A/B test different approaches and gather feedback on what worked. Look back over earlier work, your own and others', and list the things you would do differently now that you have this new knowledge. Having a structure like this to start with should give you confidence in choosing and arranging your next visual exercise to maximize its message and persuasiveness. How will you know you've succeeded? People will tell you; people know good storytelling when they experience it. Good luck!

Business Intelligence Data Analysis Data Visualization

3 Powerful Steps to Data Storytelling

In the digital economy, data is mana. It’s the fuel that keeps the tech and marketing sectors churning. But when using data as a sales or education tool, plain old stats and facts just aren’t enough. Massaging data into a compelling story is key to onboarding clients, securing investors, and training employees.

Human brains are wired for stories. As evidence, a Stanford Business School study revealed that 62% of participants remembered stories, while only 5% remembered straight statistics. That's an immense difference, and the results should have every business asking: How can we transform our data into engaging narratives that sell, convince, and teach?

What is Data Storytelling?

Data storytelling is the practice of framing data within a narrative. Without that framing, data can come across as flat, bland, and vulnerable to misinterpretation. Businesses wanting to burnish their brands in the brains of target audiences must carefully craft their messaging and bolster it with supporting data.

Why is Storytelling So Effective?

Storytelling is how cave people evolved into modern individuals. It's a linguistic tradition hard-coded into our DNA; it's how civilizations passed down survival skills and traditions. Storytelling remains an integral part of how we process and retain information.

Read more: Data Storytelling: The Essential Skill for the Future of Analytics

What is the Goal of Storytelling?

The goal of data storytelling is to engage audiences. By packaging information into digestible, engaging narrative bites, you can highlight insights that stick, convince, and stimulate the desired action.

As a presenter, your job is to focus people’s attention on the most salient and engaging points. Think of yourself as the Degas of data — someone who paints beautiful pictures using stats and trends. By framing the mundane in gilded casings, you’re heightening the audience’s emotional response, which leads to better retention of the material.

Three Steps of Effective Data Storytelling

We’ve discussed why data storytelling works. Now let’s dig into the “how” of the matter.

Become Intimate With the Data

Before crafting data stories, familiarize yourself with the information. Don’t manipulate the data to suit your needs. People instinctively pick up on phony or inflated stats — and that diminishes trust. Instead, become intimate with the facts and figures and find the actual statistical trends hidden within. They’re more impactful than jerry-rigged half-truths.

Understand Your Audience

The next step is getting to know your audience. What makes them tick? What do they care about? What's their worldview? How does your data connect to their goals? The answers to these questions will shape a story that connects with your targets emotionally. Once you tap into their perspective, you can more readily sell the vision.

Remember that one size does not fit all when it comes to data narratives. The tale you tell to a room full of mid-level managers will differ from the one you tell executives.

Choosing the Right Data and Presentation Style

Visuals matter — a lot. They help clarify, connect, compare, and provide context. Effective visualizations surface the most compelling data and highlight its strongest parts. While it's almost always better to have professionals design presentations, here are a few DIY pointers:

  • Comparisons: If you want to highlight comparisons, use bar, line, and circular charts.
  • Composition Statistics: Showing data composition statistics is best done with pie charts.
  • Distribution of Data: Line distribution charts work for displaying data distribution points and trends.
  • KISS: When creating charts and graphs, adhere to the adage “keep it simple, silly.” Leave the 3D renderings and drop shadows to game developers. They only mess up business presentation aesthetics and may come across as outdated.
  • Color Consciousness: Data presentations are not the time to express your inner Rainbow Brite! Choose a pleasant color palette and use complementary colors; they’re easier to understand at a glance than a hodge-podge of hues.
  • Language: Use words and phrases that your audience understands. Don't try to "sound smart." It never works, and it can signal a lack of confidence and poor communication skills.
  • Layout: Each slide should have a call to action, a header, and a short narrative summary. People’s attention wanders during presentations. Combat this by keeping things clear and concise!
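The chart-type pointers above can be condensed into a quick lookup. The sketch below is purely illustrative: the dictionary, function name, and default choices are our own invention, not part of any plotting library or the article itself.

```python
# Illustrative only: encode the chart-type pointers as a small lookup.
# Names and mappings here are assumptions, not a standard API.
CHART_FOR_GOAL = {
    "comparison": "bar",       # bar, line, or circular charts for comparisons
    "composition": "pie",      # pie charts for parts of a whole
    "distribution": "line",    # line charts for data points and trends
}

def suggest_chart(goal: str) -> str:
    """Return a sensible default chart type for a presentation goal."""
    # KISS: when the goal is unclear, fall back to a simple bar chart.
    return CHART_FOR_GOAL.get(goal.strip().lower(), "bar")

print(suggest_chart("Comparison"))    # bar
print(suggest_chart(" composition ")) # pie
```

In practice you would pair a rule of thumb like this with your plotting tool of choice; the point is simply that the chart type should follow the goal of the slide, not the other way around.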

Crafting effective data narratives that speak to people's desires and emotions is a skill that takes time to develop. Professionals understand how to mold micro and macro elements into engaging stories. There's no question that data storytelling is an invaluable tool. And when done correctly, profits follow.

Business Intelligence Data Analysis Data Analytics

How to Learn Data Analytics with the Feynman Technique

Using the Feynman Technique to Enhance Your Data Analysis 

The field of data analysis is an ever-growing, ever-changing industry. Most advice on data analysis best practices focuses on the technical needs of the field, such as learning specific coding languages and relevant algorithms. However, to get full value from your data analysis, you must be able to make it easy to comprehend for people outside the field, such as business users or the general public. Thankfully, there are qualitative techniques you can employ in your analytics practice to help with this, particularly the methodology known as the Feynman Technique.

Why is the Process Called the Feynman Technique?

The Feynman Technique is named after the world-renowned theoretical physicist Dr. Richard Feynman.

Who is Richard Feynman?

Dr. Feynman was a Nobel Prize-winning scientist, university professor, and writer. He was best known for his work in the field of quantum electrodynamics and for his involvement in major historical scientific events, specifically his work on the Manhattan Project and his official investigation into the Challenger shuttle explosion. As an educator, he was known for an approach to teaching that emphasized true understanding of the subject matter, as opposed to the conventional learning techniques that were then standard.

How Does the Feynman Technique Work?

The Feynman Technique is a general-purpose means of understanding any new information, regardless of the context. The goal is to deepen your own understanding of the material by explaining it effectively to others. The technique adapts Feynman's personal approach to learning and involves a small number of steps.

1. Study the Data Thoroughly

To fully understand a set of data, Feynman believed, you must first truly study everything about it. In many cases, numerous items in a data set need additional study before the data set as a whole can be thoroughly understood. In those cases, the Feynman Technique dictates that you first narrow your focus to the items you find most difficult.

2. Explain the Data

As an educator, Feynman believed that the next step, once the data is understood, is to teach it to someone else. For this step of the Feynman Technique, once you truly understand a data set, you teach what you have learned to another person or group. At this stage, you should welcome questions and feedback; they allow you to spot weaknesses in your analysis or in your overall understanding of the data.

3. Further Study

If your audience points out gaps or inconsistencies in Step 2, return to the initial data set and dive deeper into those areas. Ideally, the more you analyze these points, the more they will become the strongest parts of your overall knowledge.

4. Create a Simplified Explanation of the Data

Once you have a thorough and reasonably airtight knowledge of the data and its implications, the last step of the Feynman Technique is to break your analysis down into as simple and basic an explanation as possible. This makes communicating with your clients, coworkers, or any other audience fast and efficient. From time to time you will have to go into further detail when asked about specific points of your analysis, but for most audiences, basic information works best because it can be understood quickly.
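For readers who think in procedures, the four steps can be sketched as a feedback loop. This is only an illustrative outline, not anything Feynman wrote: the function names and the toy `study`/`explain` callbacks stand in for the human activities described above.

```python
# Illustrative sketch of the Feynman Technique as an iterative loop.
# The callbacks are placeholders for human activities, not real APIs.
def feynman_loop(study, explain, simplify, topic, max_rounds=3):
    """Step 1 study -> Step 2 explain (surfacing gaps) -> Step 3 restudy
    the gaps -> Step 4 simplify once the explanation holds up."""
    understanding = study(topic, focus=None)        # Step 1: study thoroughly
    for _ in range(max_rounds):
        gaps = explain(understanding)               # Step 2: teach; feedback reveals gaps
        if not gaps:
            break
        understanding = study(topic, focus=gaps)    # Step 3: dive into weak spots
    return simplify(understanding)                  # Step 4: simplest explanation

# Toy demo: the first explanation surfaces one gap; the second holds up.
gaps_queue = [["outliers"], []]
result = feynman_loop(
    study=lambda topic, focus: f"{topic} (focused on {focus})" if focus else topic,
    explain=lambda _: gaps_queue.pop(0),
    simplify=lambda u: f"In short: {u}",
    topic="sales data",
)
print(result)
```

The loop structure is the point: explaining is not the end of the process but the probe that tells you where to study next.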


Secondary and higher education now emphasize project-based learning and a more thorough understanding of subject matter. For up-and-coming analysts, approaching data with the Feynman Technique, or a similar model, enriches the overall quality of your analyses and will most likely benefit you throughout your career.