The Importance of Data Classification

Mark Thornton

Data classification is the process of categorising data so that organisations can understand what data they own, where it is located, what access controls are in place and whether they are complying with organisational policies and regulatory requirements.

In recent times, data classification has become a key component of many organisations’ data security procedures. Not only does it give companies a better understanding of their data estate, it also lets them put controls in place so that only those with the required permissions can access sensitive company information. With classification in place, companies can apply protection and security measures to data at rest, in use and in transit.

Compliance Mandates

Classifying and labelling data is a requirement of various compliance mandates, including GDPR, ISO27001 and the California Consumer Privacy Act (CCPA), among others. Take ISO27001 as an example: because information security risks keep growing, many companies opt to implement an information security management system (ISMS) and may also decide to attain ISO27001 certification. The cornerstone of ISO27001 is a systematic approach to identifying and managing security risks so that information is kept secure. Information security is commonly defined using the C-I-A triad:

  • Confidentiality: information is available only to authorised users
  • Integrity: information is accurate and complete
  • Availability: authorised users have access to information when they need it

The classification of information (control A.8.2.1 in Annex A of the 2013 edition) is a crucial element of ISO27001. Without a way to categorise data, and thus enforce controls based on the level of classification, companies cannot effectively protect the confidentiality, integrity and availability of information. As a consequence, attaining certification (or simply implementing an effective ISMS) can become challenging, especially for companies with a lot of legacy and unstructured data.

Classifying data is important not only for meeting regulatory obligations or certification requirements; it also enables the integration of Data Loss Prevention (DLP) software, which many companies already have or are looking to integrate into their business processes.

Data Loss Prevention (DLP)

A DLP system is a security measure that companies use to protect their sensitive data from loss, misuse or access by unauthorised personnel, using a variety of controls and business rules. Vendors such as Trend Micro and Microsoft (Azure Information Protection) have developed platforms that use data classification to enforce controls and rules for different classification labels, which are set by the organisation itself. Rules can be used to control and manage any of the following:

  • PII (Personally Identifiable Information)
  • Intellectual Property
  • PCI (Payment Card Information)
  • PHI (Personal Health Information)
  • Keyword terms
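
As a rough illustration of how content rules for categories like these might work, here is a minimal Python sketch. It is not how any particular DLP product implements detection. The patterns, the keyword list and the category names returned are all assumptions made for the example: an email address stands in for PII, and a card-like number that passes the Luhn checksum stands in for payment card data.

```python
from __future__ import annotations

import re

# Hypothetical patterns and keywords, chosen for illustration only.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
KEYWORDS = ("confidential", "patent pending")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    # Double every second digit, counting from the right.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0


def detect(text: str) -> set[str]:
    """Return the names of the rule categories that match the text."""
    hits = set()
    if EMAIL_RE.search(text):
        hits.add("PII")
    if any(luhn_valid(match.group()) for match in CARD_RE.finditer(text)):
        hits.add("PCI")
    lowered = text.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        hits.add("KEYWORD")
    return hits
```

The Luhn check is there to cut false positives: a 16-digit order reference will usually fail the checksum, whereas a real card number will pass it.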

An organisation’s data classification policy will steer its internal confidentiality levels in terms of classification. Many experts recommend the use of a maximum of four confidentiality levels, as anything more granular may cause confusion and reduce the impact that classification has on the overall data security initiative. These four levels may include:

  • Public
  • Internal
  • Confidential
  • Restricted
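
One way to represent these four levels in code is as an ordered enumeration, so that levels can be compared. The sketch below is only an illustration: the numeric values, the `overall_level` helper and the rule that a file takes the most sensitive level found in its contents (falling back to Public when nothing is found) are assumptions, not part of any standard.

```python
from enum import IntEnum


class Confidentiality(IntEnum):
    """Four levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def overall_level(levels):
    """Return the most sensitive level found (assumed rule: highest wins)."""
    # The Public fallback for an empty input is an assumption; a stricter
    # policy might prefer Internal as the default for unlabelled data.
    return max(levels, default=Confidentiality.PUBLIC)
```

For example, a file containing both Internal and Restricted material would be labelled Restricted under this rule.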

Based on the confidentiality level assigned to a file (determined by its contents), the DLP system will alert users and enforce security protocols to safeguard the data and ensure it is accessed only by individuals with the relevant authority. These protocols may include:

  • Data encryption
  • Access restrictions
  • Sharing/editing/transferring/printing/deleting restrictions
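
The link between a classification label and the protocols listed above can be pictured as a simple policy table. The following Python sketch is hypothetical: the role names, action names and the specific controls attached to each label are invented for illustration and would be defined by an organisation's own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    encrypt: bool               # data encryption required
    allowed_roles: frozenset    # access restrictions
    blocked_actions: frozenset  # sharing/editing/printing restrictions


# Hypothetical policy table; every role and action name is illustrative.
POLICY = {
    "Public": Controls(False, frozenset({"anyone"}), frozenset()),
    "Internal": Controls(False, frozenset({"employee", "data_owner"}),
                         frozenset({"share_external"})),
    "Confidential": Controls(True, frozenset({"employee", "data_owner"}),
                             frozenset({"share_external", "print"})),
    "Restricted": Controls(True, frozenset({"data_owner"}),
                           frozenset({"share_external", "print", "transfer", "delete"})),
}


def is_allowed(label: str, role: str, action: str) -> bool:
    """Check a requested action against the controls for the file's label."""
    controls = POLICY[label]
    role_ok = "anyone" in controls.allowed_roles or role in controls.allowed_roles
    return role_ok and action not in controls.blocked_actions
```

Under this table, an employee may edit a Confidential file but not print it, and only the data owner may touch a Restricted file at all.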

The problem with most DLP solutions is that they only work if the data classification policy is enforced effectively. For enterprises, implementing the classification that must accompany DLP is a painstaking and time-consuming process when done correctly, especially if classification of data at rest is required. When it is not done correctly and data is improperly classified, a traditional DLP system has no way of knowing whether sensitive data is being leaked. DLP solutions alone are therefore not an effective approach to preventing accidental or intentional data leakage. On the subject of accidental and intentional leakage, let’s take a look at one of the biggest risks companies face: insider threat.

Insider Threat

An insider threat is a security threat to an organisation that comes from people within it, such as employees (often disgruntled ones), former employees, contractors or business associates, who have inside information about the organisation’s security practices, data and computer systems. According to the 2019 Verizon Data Breach Investigations Report, 34% of data breaches involved internal actors. Insiders do not always act alone and may not be aware that they are aiding a threat actor (the unintentional insider threat).

Applying the correct data classification measures (policy, automated data classification and DLP integration) significantly reduces insider threat risk by helping to both prevent and detect it.

New Approach to Data Classification

Data classification was once a labour-intensive task that required a lot of manual work from staff while still not guaranteeing a high level of accuracy. More recently, machine learning algorithms developed by companies such as Getvisibility have redefined the classification process, so companies can now discover, classify and secure their data more quickly and accurately than before.

With these algorithms, a company’s entire data environment can be scanned, classified and secured in a matter of days rather than weeks, with a speed and consistency that manual classification cannot match. They can be trained to detect and classify data from any industry and require only a small amount of human work and sample data to generate highly accurate classification results.
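
To make the idea of training a classifier from a small set of labelled samples concrete, here is a toy Python sketch. It uses a multinomial naive Bayes model, a simple textbook technique chosen for brevity; it is not the algorithm used by Getvisibility or any other vendor, and the class name, sample texts and labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of samples
        self.vocabulary = set()

    def train(self, samples):
        for text, label in samples:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented sample data: a handful of labelled snippets.
SAMPLES = [
    ("salary review for employee payroll", "Confidential"),
    ("customer passport number and home address", "Confidential"),
    ("press release for the product launch", "Public"),
    ("public blog post about the conference", "Public"),
]

model = TinyNaiveBayes()
model.train(SAMPLES)
print(model.classify("employee payroll export"))
```

A production system would use far richer features and far more data, but the workflow is the same: label a sample, train, then classify unseen files.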

By giving companies a greater understanding of where their sensitive data is stored, these algorithms reduce the risk of sensitive data sitting in locations such as archives and data silos, where it previously would have gone unnoticed or untouched for a long period of time. As a result, companies can administer data retention policies more rigorously, which is essential for regulatory and organisational compliance. Remember the gap in DLP solutions when it comes to data classification? By taking most of the manual effort out of the equation and allowing the accurate (and fast) classification of millions of files, you can use the outputs to build tags and rules in your DLP so that sensitive data does not leave your company.
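
Once files carry classification labels, a retention check becomes a simple lookup. The Python sketch below assumes a hypothetical scan output of (path, label, last-modified date) tuples and made-up retention periods; real periods depend on the organisation's policy and the regulations that apply to it.

```python
from datetime import date, timedelta

# Hypothetical retention periods per label, in days (illustrative values only).
# None means no retention limit is applied.
RETENTION_DAYS = {
    "Public": None,
    "Internal": 5 * 365,
    "Confidential": 3 * 365,
    "Restricted": 365,
}


def overdue(files, today):
    """Return the paths whose age exceeds the retention period for their label."""
    flagged = []
    for path, label, last_modified in files:
        limit = RETENTION_DAYS[label]
        if limit is not None and today - last_modified > timedelta(days=limit):
            flagged.append(path)
    return flagged


# Invented scan results for the example.
scan = [
    ("archive/hr_2015.xlsx", "Restricted", date(2015, 3, 1)),
    ("share/menu.pdf", "Public", date(2010, 1, 1)),
    ("share/plan.docx", "Internal", date(2023, 6, 1)),
]
print(overdue(scan, today=date(2024, 1, 1)))
```

Here only the old Restricted spreadsheet in the archive is flagged, which is exactly the kind of forgotten file the paragraph above describes.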

Data Classification Policy

As already mentioned, an essential part of getting your classification procedure right is developing a data classification policy. This policy should detail your requirements for classification, how the classification process will be carried out and what controls will be in place to safeguard your data. Understanding what data your company has is critical, yet most organisations have trouble gaining a comprehensive view of their data for a variety of reasons, including shadow IT, a lack of data management training for staff and a poor security culture within the organisation. How can you expect to keep your business-critical data safe when you don’t know where it’s stored and who has access to it?

Data classification is a comprehensive security procedure that, when implemented correctly, gives organisations full visibility of their data and thus greater control and security. It allows for integration with DLP systems and is a fundamental requirement for various mandates such as the GDPR and for ISO27001 certification. Protecting your most sensitive information is imperative, and implementing controls and rules for each classification level greatly reduces the chance that IP, PII or other sensitive information falls into the wrong hands. We are all aware of what a data breach can cost an organisation: under the GDPR, fines can reach 4% of total annual turnover or €20 million, whichever is higher. It is therefore clear why classification matters and why every organisation must make a special effort to develop a clear policy and train staff on its importance.
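
The "whichever is higher" rule in the fine ceiling quoted above is easy to misread, so a short worked example may help. The function name below is made up for illustration, and the result is only the upper limit described in the text, not a prediction of any actual fine.

```python
def max_fine_eur(annual_turnover_eur):
    """Upper limit: the higher of 4% of annual turnover or EUR 20 million."""
    return max(annual_turnover_eur * 4 / 100, 20_000_000)


# For a company with EUR 100 million turnover, 4% is EUR 4 million,
# so the EUR 20 million figure is the higher of the two.
print(max_fine_eur(100_000_000))

# For a company with EUR 1 billion turnover, 4% is EUR 40 million,
# which exceeds EUR 20 million.
print(max_fine_eur(1_000_000_000))
```

In other words, the €20 million figure acts as a floor on the ceiling: smaller organisations are not shielded by a low turnover.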


Organisations cannot comprehensively protect their most critical data without a formal classification policy in place. Such a policy removes the guesswork and provides a clear picture of what data an organisation has and where it is located, and thus the ability to implement controls and rules to manage and secure the data appropriately. Failure to classify your data leaves your organisation’s and your customers’ data exposed to unauthorised individuals, whose access may result in huge fines, reputational loss and a loss of trust between your customers and your business.

Discover. Classify. Protect. Control.
