
Five pitfalls to avoid before you start collecting data

Posted May 4, 2021

In this era of digital transformation, more and more businesses are collecting massive quantities of data at a rapid pace. The Rethink Data report from Seagate and IDC shows enterprise data growing at an annual rate of 42.2%.

There’s good reason for this: When properly collected and used, data can help brands leverage machine learning to better engage their customers, significantly improve the digital customer experience (CX) and refine internal processes.

That’s if organizations know what to do with the data, and how to do it. In reality, collecting data comes with a number of unforeseen challenges that can easily overwhelm you if you don’t know how to navigate them.

Here are five of the most common pitfalls brands encounter with data collection — and ways to successfully overcome them.

1. Your data is low-quality

Although businesses may be tempted to collect as much data as they possibly can, it’s important to remember that not all data is created equal. In fact, poor data quality can greatly compromise the value and effectiveness of AI solutions.

In a recent survey of IT leaders conducted by TELUS Digital (formerly TELUS International), data quality was the most frequently cited answer when respondents were asked to name the biggest challenge in a data annotation project. Data quality problems range from incomplete or inaccurate data to mislabeled or incorrectly formatted data. When these issues crop up, they can lead machine learning algorithms and other AI applications to make wrong assumptions and reach faulty conclusions.

Survey results (bar graph): Which of the following do you see as the biggest challenge when conducting a data annotation project?
data quality (35%)
data security (29%)
cost (18%)
resourcing (8%)
potential bias (6%)
speed of project (4%)

For many businesses, it is a good idea to start with a data quality assessment, which surfaces potential issues with the data. For example, an assessment may reveal that a data set is inaccurate and requires correction. By doing this heavy lifting at the beginning of an AI project, organizations have a far better chance of achieving their goals down the road.
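
As a rough illustration of what such an assessment might check, the sketch below counts records that are incomplete, carry an unexpected label, or use the wrong date format. The field names, the label set and the date format are assumptions made up for this example, not part of any particular tool. Note that a check like this only catches labels outside the agreed set; a label that is valid but wrong for the record still needs human review.

```python
from datetime import datetime

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("text", "label", "collected_on")
# Hypothetical label set agreed for the annotation project.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def assess_quality(records):
    """Count records that are incomplete, mislabeled, or badly formatted."""
    report = {"total": len(records), "incomplete": 0, "mislabeled": 0, "bad_format": 0}
    for record in records:
        # Incomplete: a required field is missing or empty.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            report["incomplete"] += 1
            continue
        # Mislabeled (in the narrow sense): the label is outside the agreed set.
        if record["label"] not in ALLOWED_LABELS:
            report["mislabeled"] += 1
        # Badly formatted: the date does not follow YYYY-MM-DD.
        try:
            datetime.strptime(record["collected_on"], "%Y-%m-%d")
        except (ValueError, TypeError):
            report["bad_format"] += 1
    return report


sample = [
    {"text": "Great service", "label": "positive", "collected_on": "2021-04-30"},
    {"text": "", "label": "negative", "collected_on": "2021-05-01"},
    {"text": "Slow reply", "label": "angry", "collected_on": "2021-05-02"},
    {"text": "It was fine", "label": "neutral", "collected_on": "05/03/2021"},
]
print(assess_quality(sample))
```

Running a report like this before any model training gives a quick sense of how much correction work a data set needs.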

2. You don’t have enough data to make AI work properly

Without a large enough data set, AI and machine learning systems struggle to recognize patterns in the data or act on their analysis.

Small data sets might be sufficient for a proof of concept that gets your project underway. But, once you’ve determined the amount of data you’ll need to successfully launch the full project, how can you acquire it?

One way is by partnering with other organizations that may have the type of data you need. Depending on the application in question, open source data may also be a fit. Because many types of open source data sets are available today, you’ll first need to clarify your requirements and compare the available data sets against them to confirm a match.

3. Bad labeling is stalling AI analysis

While it’s important to work with sufficient data, simply possessing the raw data is not enough. For AI to correctly perceive and interpret what the data means, the data must be accurately labeled. Without proper data annotation in place, an AI solution will not be able to reliably recognize the patterns in the unstructured data sets it is analyzing. Annotation takes a significant amount of human effort and expertise, which is why companies may be tempted to skip it or rush through it, but it’s a crucial step for any machine learning project.

Data annotation requires a human to tag elements of data such as text, images, audio and video clips with metadata that the AI application then reads. It ranges from simple text-based annotation to spatial annotation of 2D and 3D raw images and beyond. The more accurate, precise and abundant the data annotation, the more sophisticated your AI’s analysis and classification capabilities will ultimately be.
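
To make the idea of "tagging data with metadata" concrete, here is a minimal sketch of a simple text annotation, where a human marks phrases in a sentence and assigns each a label. The class names, the label names and the example sentence are hypothetical and chosen only for illustration; real annotation tools use their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class SpanLabel:
    """One human-applied tag: a character range plus a label."""
    start: int
    end: int
    label: str


@dataclass
class AnnotatedText:
    """A piece of raw text together with the metadata added by annotators."""
    text: str
    spans: list = field(default_factory=list)

    def tag(self, phrase, label):
        # Locate the first occurrence of the phrase and record where it sits.
        start = self.text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        self.spans.append(SpanLabel(start, start + len(phrase), label))

    def labeled_phrases(self):
        # Recover each tagged phrase from its stored character range.
        return [(self.text[s.start:s.end], s.label) for s in self.spans]


item = AnnotatedText("The delivery to Vancouver arrived two days late.")
item.tag("Vancouver", "LOCATION")
item.tag("two days late", "COMPLAINT")
print(item.labeled_phrases())
```

Storing the character range rather than just the phrase keeps the label tied to one exact position in the text, which matters when the same word appears more than once.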

Because annotation is such a laborious, time-consuming process, knowing exactly how much data you need and what types of annotation you require helps you stay on budget. Depending on the scope and complexity of a project or application, it may be wise to partner with an expert data annotation provider.

4. You’re not doing enough to protect consumer data

Consumers are growing more and more concerned with how brands protect their privacy. According to the Anaconda State of Data Science 2020 report, 22% of data science professionals believe individual privacy issues are the biggest problem in AI and machine learning today.

In the age of the General Data Protection Regulation (GDPR) and other robust privacy legislation, proper compliance is a must. Moreover, as consumers become savvier about privacy issues, they want to make sure they’re doing business with brands they can trust. For these reasons, strong privacy protections are playing an increasingly large role in the delivery of high-quality CX.

In addition to ensuring they’re fully compliant with regulations like GDPR, brands should also look to adopt robust security frameworks to protect customer data from potential breaches. Designing digital solutions and applications with a privacy-first approach and industry best practices (such as ISO certifications) can also go a long way toward protecting data and the customer experience.
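
One small example of a privacy-first design choice is pseudonymizing direct identifiers before data reaches an analytics or training pipeline. The sketch below replaces identifier fields with a keyed hash using Python's standard `hmac` and `hashlib` modules. The field names and the placeholder key are assumptions for illustration, and a step like this is only one layer of protection; it does not make a system compliant on its own.

```python
import hashlib
import hmac

# Placeholder only: in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical list of fields that directly identify a customer.
DIRECT_IDENTIFIERS = ("email", "phone")


def pseudonymize(record, key=SECRET_KEY):
    """Return a copy of the record with direct identifiers replaced by keyed hashes."""
    safe = dict(record)  # leave the caller's record untouched
    for field in DIRECT_IDENTIFIERS:
        if field in safe:
            digest = hmac.new(key, str(safe[field]).encode("utf-8"), hashlib.sha256)
            safe[field] = digest.hexdigest()
    return safe


customer = {"email": "pat@example.com", "phone": "555-0100", "plan": "premium"}
print(pseudonymize(customer))
```

Because the same input and key always produce the same hash, records for one customer can still be linked together for analysis without exposing the original email address or phone number.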

5. Your data sets and AI algorithms suffer from bias

It’s shockingly easy for human biases to seep into AI processes, produce off-target solutions and even harm a brand. These biases have already had major impacts across industries.

When organizations train AI, they typically feed it vast quantities of data. But what if that information is gathered from only a certain subset of the population, then extrapolated across multiple different groups? Would it really be representative of consumers?

A lack of diversity and inclusion within the data science profession is the number one issue contributing to data bias, according to the Anaconda report.

This problem has a human solution. Diverse teams can help companies identify and address potential issues with biased data before they affect an AI project. For instance, a diverse team may more easily notice when data sets are not representative of all customer demographics.
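
A simple automated check can support that human review. The sketch below compares each group's share of a training set with its share of the customer base and flags groups that differ by more than a chosen tolerance. The group names, the shares and the 10-point tolerance are invented for this example; what counts as "representative" depends on the project.

```python
from collections import Counter


def representation_gaps(samples, group_key, expected_shares, tolerance=0.10):
    """Flag groups whose share of the data differs from the expected share.

    Returns a dict of group -> (observed share - expected share) for every
    group that is off by more than `tolerance`.
    """
    counts = Counter(sample[group_key] for sample in samples)
    total = sum(counts.values())
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


# Hypothetical training data, heavily skewed toward younger customers.
training = (
    [{"age_band": "18-34"}] * 65
    + [{"age_band": "35-54"}] * 30
    + [{"age_band": "55+"}] * 5
)
# Hypothetical shares of each age band in the actual customer base.
customer_base = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}

print(representation_gaps(training, "age_band", customer_base))
```

In this made-up case the youngest group is over-represented and the oldest is under-represented, which is the kind of gap a team would want to correct before training.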

Transform your CX with high-quality data

Incorporating AI into your business comes with a lot of boxes to check: gather data, but not too little or too much. Make sure the data is properly formatted and annotated, but also ensure it’s not biased. By proactively addressing these pitfalls, brands can create a strong foundation for AI success and, ultimately, deliver a customer experience that is as engaging as it is exceptional.

