Your insights, analytics, marketing initiatives, and outreach are only as good as your database. Incorrect or outdated data can cause more harm than good, which is why maintaining a clean database is one of the mainstays of data-driven decision-making.

With that in mind, we prepared this article to introduce the concept of data cleansing and explain why it's fundamental to avoid wasting time and money and damaging your sender reputation. Once you've read it, we encourage you to proceed to its companion guide: How to keep your database clean.

In this article


What is data cleansing?

Why is data cleansing important?

Duplicate profiles

Inactive profiles

Imprecise and misleading campaign stats

Misdirected marketing efforts

Inconsistent formatting

Security issues

What is data cleansing?

Before tackling the question in this article's title, it's important to establish what is meant here by "data cleansing".

Data cleansing, also known as data scrubbing, is the process of identifying and fixing or removing incorrect, repeated, or irrelevant data in a database. The goal is to be left with the highest quality information possible in order to generate reliable insights, make smarter decisions, become more effective operationally, and increase your marketing performance and conversion rates.

Common problems include duplicate profiles, scattered or misplaced information, inconsistent formatting, and more. The flawed data polluting a database is referred to as "dirty data", and its consequences can be severe.

Let's take a look at the main ones.

Why is data cleansing important?

Left to their own devices, organizations tend to accumulate a lot of data over time, chunks of which gradually become obsolete. That's when it turns into dirty data. Once a school develops a dirty data problem, there's no end to the repercussions that might follow. After all, data is the link between your school's goals and the applicants you need to reach in order to fulfill them. So without regular upkeep, sooner or later you'll run into the following complications:

Duplicate profiles

Quoting from another article, duplicate profiles are the "unfortunate result of the same person having two or more profiles because they registered (or were registered) with different email addresses". Duplicate profiles clog your database, scatter information, and make it more difficult to find. Therefore, they must be merged so that you're left with one profile per individual.

For instance, suppose you have a candidate named Susan Pearlman, but she's got two accounts: in the older one, her lifecycle state is applicant::started_application for an intake from several years ago whose application she abandoned at 32%; in the other, her state is applicant::submitted for a class that's currently open for applications. However, the older account happens to have more experience entries, painting a broader picture of Susan's employment and academic background.

With Susan's personal details spread across two profiles (in other words, fragmented), they're harder to analyze as a whole. Merging the relevant parts of both profiles and discarding whatever's superfluous will resolve that.
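As a rough illustration of the merge described above, here is a minimal Python sketch. The `Profile` fields, the example email addresses, dates, and experience entries, and the rule of keeping the most recently active profile's lifecycle state are all assumptions made for this example; they do not represent Full Fabric's data model or its merge feature.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Profile:
    # Hypothetical fields, chosen only for this illustration.
    name: str
    emails: list[str]
    lifecycle_state: str
    last_seen: date
    experience: list[str] = field(default_factory=list)


def merge_profiles(a: Profile, b: Profile) -> Profile:
    """Merge two profiles that belong to the same person."""
    # Assumption: the most recently active profile is the primary one.
    primary, secondary = (a, b) if a.last_seen >= b.last_seen else (b, a)
    # Combine emails and experience entries, keeping order and dropping repeats.
    emails = list(dict.fromkeys(primary.emails + secondary.emails))
    experience = list(dict.fromkeys(primary.experience + secondary.experience))
    return Profile(
        name=primary.name,
        emails=emails,
        lifecycle_state=primary.lifecycle_state,
        last_seen=primary.last_seen,
        experience=experience,
    )


# Made-up data loosely modelled on the Susan Pearlman example.
old = Profile(
    "Susan Pearlman",
    ["susan@old.example"],
    "applicant::started_application",
    date(2018, 5, 1),
    ["BSc Biology", "Lab assistant"],
)
new = Profile(
    "Susan Pearlman",
    ["susan@new.example"],
    "applicant::submitted",
    date(2021, 12, 1),
    ["BSc Biology"],
)
merged = merge_profiles(old, new)
```

In this sketch the merged profile keeps the current applicant::submitted state while gaining the extra experience entry and the second email address from the older account.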

Inactive profiles

There are two types of inactive profiles:

  • Profiles whose users were active for a period but no longer are, for example a user whose profile shows Last seen 9 years ago.

This could happen for a variety of reasons: maybe they decided to study elsewhere, maybe there was a change of plans and they don't wish to advance their academic education, maybe they don't use that email address anymore but it's not expired yet, and so on and so forth.

After a while, inactive profiles can overstay their welcome and get to a point where, similarly to duplicate profiles, they're just taking up space and doing nothing. As a result, instead of having a tight and manageable database, you get a database that's now bulky from being stuffed with stale data.

That's not to say that historical data isn't valuable, though. Ultimately, you'll have to decide for yourself which profiles to purge, if any at all. Some questions to ponder could include:

  • Do you want to delete all users whose last logon timestamp precedes a certain date?

  • Do you only want to delete old profiles without a submitted application?

  • Do you only want to delete extremely bare accounts (for instance, profiles that only have a first name, a last name and a primary email address)?

  • Do you need to delete profile data to ensure compliance with your institution's data processing policy?
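Questions like these translate naturally into a filter rule. The sketch below combines two of them (a last-seen cutoff and "no submitted application") in Python. The dictionary keys `last_seen` and `submitted_application`, the sample records, and the choice to treat never-seen profiles as inactive are assumptions for illustration only, not fields or behavior of Full Fabric.

```python
from datetime import date


def is_purge_candidate(profile: dict, cutoff: date) -> bool:
    """Return True if a profile matches this example purge rule."""
    last_seen = profile.get("last_seen")
    # Assumption: profiles never seen, or last seen before the cutoff, are inactive.
    inactive = last_seen is None or last_seen < cutoff
    # Keep anyone who submitted an application, however old the profile is.
    has_submission = profile.get("submitted_application", False)
    return inactive and not has_submission


# Made-up sample records.
profiles = [
    {"email": "a@example.com", "last_seen": date(2013, 2, 1), "submitted_application": False},
    {"email": "b@example.com", "last_seen": date(2013, 2, 1), "submitted_application": True},
    {"email": "c@example.com", "last_seen": date(2021, 11, 20), "submitted_application": False},
]
cutoff = date(2019, 1, 1)
candidates = [p["email"] for p in profiles if is_purge_candidate(p, cutoff)]
```

Writing the rule down this explicitly, even on paper, makes it easier to agree on exactly which profiles would be affected before anything is deleted.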

Please discuss this with your team to reach a consensus. In any case, we strongly recommend running a re-engagement campaign to win back lost leads and to let them know that their accounts will be removed unless they sign in by a given deadline. Then, if they don't, cut your losses.

As in life, say no to things that don't serve your best interests. 😉

Imprecise and misleading campaign stats

Full Fabric discloses the stats of every mass email sent via our Campaigns tool, including the number of bounces and the number of inboxes where you're classified as a source of spam. These stats don't merely concern the performance of a specific campaign, they also concern the health of your database and of your sender reputation, because bounces and spam complaints suggest a lack of data hygiene.

Furthermore, if you're addressing inactive profiles, odds are that many of the recipients who didn't mark your messages as spam are passively ignoring you. They'd still count towards your delivery rate, though, giving the illusion of a larger audience. These are people who won't open, click, or engage with your emails, sinking most of your metrics.

Ideally, the communications you send should be relevant to whoever's receiving them. Quality leads shouldn't miss out on important information, and bad leads shouldn't be bombarded with emails they neither care about nor have a use for. By only addressing the people who are likely to care and take action, your figures become more reliable.
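To make the effect on your figures concrete, here is a small Python sketch that computes a few campaign rates from raw counts. The formulas are common conventions (bounce and delivery rates relative to emails sent, complaint and open rates relative to emails delivered) and are an assumption here; the Campaigns tool may define its stats differently, and all numbers are made up.

```python
def campaign_rates(sent: int, bounces: int, complaints: int, opens: int) -> dict[str, float]:
    """Compute basic health rates for one email campaign."""
    delivered = sent - bounces

    def ratio(part: int, whole: int) -> float:
        # Avoid dividing by zero for empty campaigns.
        return part / whole if whole else 0.0

    return {
        "bounce_rate": ratio(bounces, sent),
        "delivery_rate": ratio(delivered, sent),
        "complaint_rate": ratio(complaints, delivered),
        "open_rate": ratio(opens, delivered),
    }


# Made-up numbers: the same 190 openers, before and after removing
# 400 recipients who never engaged (including most bouncing addresses).
before = campaign_rates(sent=1000, bounces=50, complaints=5, opens=190)
after = campaign_rates(sent=600, bounces=5, complaints=1, opens=190)
```

With these illustrative figures, the open rate rises from 20% to roughly 32% even though exactly the same people opened the email: the cleaner list simply stops counting recipients who were never going to engage.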

Misdirected marketing efforts

Related to the above, but on a wider scale, if you neglect the cleanliness of your database, you won't be able to swear by the health and quality of your funnel. A well-curated database is a database you can trust, saving your Marketing team from wasting precious time, energy and money pursuing people that it doesn't stand a chance with. 👎 Moreover, accurate data empowers you to make informed choices and decisions. By keeping your database clean, you'll be able to assess things like:

  • Who makes up your user base in Full Fabric?

  • How many active leads remain in your pipeline?

  • Who's going nowhere?

And much more!

Inconsistent formatting

If you use substitution tags to personalize the documents and emails that you send to your candidates, and the data you pull – such as their names and other personal details – has formatting issues such as inconsistent capitalization and punctuation, your communications will look unprofessional. 😬 For example:

Dear «=profile.first_name» «=profile.last_name»,

⬇︎

Dear Morag mac alister,

Dear odelia varvaris,

Dear BARBARA NIECHCIC,

At the very least, first and last names should be properly formatted, beginning with an uppercase letter and continuing with lowercase letters. Unfortunately, you cannot force users to enter their own information neatly when signing up to Full Fabric, but you can certainly do so yourself when creating a new profile, just as you can fix wonky formatting when you come across it. 🚿 This may appear to be an insignificant problem, but no matter how you cut it, badly formatted data (especially names) just looks sloppy on a formal letter, and the way to avoid such a predicament is to have a tidy database.
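If you ever tidy names in bulk, for example in a spreadsheet export before re-importing it, the basic rule above can be sketched in a few lines of Python. The function name `tidy_name` is hypothetical, and the approach is deliberately simplistic.

```python
def tidy_name(raw: str) -> str:
    """Collapse extra whitespace and capitalize each part of a name."""
    tidied = []
    # Splitting on whitespace drops leading, trailing, and repeated spaces.
    for word in raw.split():
        # Capitalize each hyphen-separated piece: "anne-marie" -> "Anne-Marie".
        pieces = [piece.capitalize() for piece in word.split("-")]
        tidied.append("-".join(pieces))
    return " ".join(tidied)


# The last names from the examples above.
fixed = [tidy_name(n) for n in ["mac alister", "varvaris", "NIECHCIC"]]
```

Note that a mechanical rule like this will mangle names with their own capitalization conventions (for instance, "McDonald" becomes "Mcdonald" and "van der Berg" becomes "Van Der Berg"), so the results should be reviewed by a person rather than applied blindly.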

Security issues

Last, but obviously not least, it's of utmost importance to be mindful of the General Data Protection Regulation (GDPR) and other privacy regulations at all times, because dirty data raises the risk of non-compliance. Three situations, in particular, make you vulnerable:

  • Failure to forget or delete a profile when the respective applicant, student, or alumnus requests it, or to fulfill the request in a timely fashion. For as long as a profile stays up, it's exposed to unwanted attention, such as receiving marketing material. Under the GDPR, individuals have a right to erasure (also known as the right to be forgotten), making it your obligation to delete or forget personal data on demand. If someone finds out that their personal data is still in your possession against their express wishes, or it somehow comes up in an audit, a regulatory fine may ensue.

  • Duplicate profiles. If a user executes a certain action, such as unsubscribing from marketing communications, but they have a duplicate profile, this change won't be reflected on the duplicate profile. Consequently, one of their email addresses will still be a target, which is technically an infraction (even if you didn't mean it). Since it's an infraction, you might be fined.

  • This one has more of an indirect relation, but it's worth considering: in general, prudence dictates that data should only be accessible to the people who need it. That's why Full Fabric allows the implementation of access scopes. Limiting access to profile data reduces the danger of security breaches and mishaps.

In short, not only do these make you subject to expensive fines and lawsuits, but they may also damage your reputation and scare away potential new applicants – hence the importance of routine pruning! ✂️ To learn more about the GDPR, please visit the official website.

*

You have reached the end of this article. Thanks for reading! 🤓 If you have any questions or comments on the topic at hand, or if you enjoy reads like this and have article requests, feel free to start a chat or email us at support@fullfabric.com. Also, please leave a rating below. Your feedback is highly appreciated! 💖


PUBLISHED: January 3, 2022
LAST UPDATED: January 3, 2022 at 10:03 a.m.
