A Data Catalog is the Key to Enterprise Data Masking Success

Mark Smith
Founder - Touisset Services LLC

Executive Summary:
Data catalogs are crucial in modern data-driven organizations, where the volume and variety of data can overwhelm teams without proper organization. Integrating data masking with a data catalog makes masking more effective by using the catalog's metadata to drive smarter, more automated masking processes.

Data Masking Integration with a Data Catalog

Several factors make integrating data masking with a data catalog important:

  • Improved Data Governance and Compliance
    By tracking data lineage, ownership, and usage, catalogs help organizations adhere to regulations like GDPR or CCPA. They help maintain data quality, enforce policies, and audit activities, reducing the risk of noncompliance and associated fines.
  • Consistent Policy Enforcement and Governance
    Integration enables centralized management of masking rules based on the catalog's governance framework. For example, policies can be applied uniformly to comply with security standards, with the catalog tracking lineage to show how masked data relates to originals. This is vital for audits and maintaining data integrity in regulated industries.
  • Efficient Workflows in Non-Production Environments
    When preparing data for dev/test setups, the catalog provides context on data dependencies, allowing the masking tool to create realistic, anonymized copies without disrupting production. This streamlines DevOps pipelines and supports synthetic data generation.
  • Scalability for Complex Data Ecosystems
    In cloud or hybrid setups, integration ensures masking adapts to diverse data sources cataloged centrally, such as Amazon Redshift, Snowflake, or Salesforce.

Integrating data masking with a data catalog turns masking from a reactive process into a proactive, metadata-driven strategy that aligns security with broader data management goals for greater efficiency and compliance.

Obfusware Integration with the AWS Glue Data Catalog

Obfusware integrates with the AWS Glue Data Catalog. The AWS Glue Data Catalog not only uses metadata to document the structure of databases (tables, columns, data types, partitions, and indexes) but also allows generic name/value properties to be defined at the table and column level. Obfusware takes advantage of this capability: Obfusware-specific properties in the catalog define the data masking for columns that contain sensitive data.
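To make the idea of column-level properties concrete, here is a minimal sketch of how masking rules could be read out of catalog metadata. It is not Obfusware's actual implementation: the property key `masking.rule`, the rule names, and the `masking_rules` helper are hypothetical, chosen only for illustration. The input is assumed to be a dictionary shaped like the `Table` element of a Glue `GetTable` response, where each column may carry its own `Parameters` map.

```python
from typing import Any, Dict

# Hypothetical property key. The actual property names Obfusware reads
# are not documented in this article.
MASK_KEY = "masking.rule"


def masking_rules(table: Dict[str, Any], key: str = MASK_KEY) -> Dict[str, str]:
    """Return {column_name: rule} for every column tagged with `key`."""
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    return {
        col["Name"]: col["Parameters"][key]
        for col in columns
        if key in col.get("Parameters", {})
    }


# A dictionary shaped like the "Table" element of a Glue GetTable response.
# The table, columns, and rule values are made up for this example.
example_table = {
    "Name": "customers",
    "Parameters": {"classification": "parquet"},  # table-level properties
    "StorageDescriptor": {
        "Columns": [
            {"Name": "customer_id", "Type": "bigint"},
            {"Name": "email", "Type": "string", "Parameters": {MASK_KEY: "hash"}},
            {"Name": "ssn", "Type": "string", "Parameters": {MASK_KEY: "redact"}},
        ]
    },
}

rules = masking_rules(example_table)
print(rules)  # columns without the property are left out
```

In a real job, a dictionary like `example_table` could be obtained with `boto3.client("glue").get_table(DatabaseName=..., Name=...)["Table"]`; the sketch uses a literal so that it runs without AWS credentials.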

By specifying data masking at the table level within the data catalog, Obfusware centralizes the governance of data masking policies. The key benefits of this approach include:

  1. Data masking is specified once in the data catalog instead of within multiple jobs, avoiding duplication, oversights, and conflicting implementations.
  2. A data compliance expert can determine and define the required masking policy, and multiple data engineers can apply the data masking in the relevant jobs.
  3. When data and compliance requirements evolve, the new requirements can be updated in the data catalog, and jobs will inherit the masking policy updates and execute them without the need to modify each job.
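The third benefit can be illustrated with a small sketch. It assumes a job that receives its masking rules as data (for example, looked up from the catalog at run time) instead of hard-coding them. The rule names `redact` and `hash`, the `MASKERS` table, and the `mask_rows` function are hypothetical and do not represent Obfusware's API.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical rule names mapped to simple masking functions.
MASKERS: Dict[str, Callable[[str], str]] = {
    "redact": lambda value: "*" * len(value),
    "hash": lambda value: hashlib.sha256(value.encode("utf-8")).hexdigest(),
}


def mask_rows(
    rows: Iterable[Dict[str, Any]], rules: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Return masked copies of `rows`; `rules` maps column name to rule name.

    An unknown rule name raises KeyError, so a typo in the policy fails
    loudly instead of silently leaving data unmasked.
    """
    masked = []
    for row in rows:
        out = dict(row)  # copy so the input row is not modified
        for column, rule in rules.items():
            if out.get(column) is not None:
                out[column] = MASKERS[rule](str(out[column]))
        masked.append(out)
    return masked


rows = [{"customer_id": 1, "email": "a@example.com", "ssn": "123-45-6789"}]

# Policy as first defined: only the ssn column is masked.
rules_v1 = {"ssn": "redact"}
# Policy after a compliance update: email is now masked as well.
rules_v2 = {"ssn": "redact", "email": "redact"}

# The job code is identical in both runs; only the policy data differs.
print(mask_rows(rows, rules_v1))
print(mask_rows(rows, rules_v2))
```

Because the job only interprets whatever rules it is handed, a policy change made in one central place takes effect on the next run without editing the job itself.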

Data catalog integration is one example of how closely Obfusware works with AWS Glue. It allows Obfusware data masking to become part of an organization's existing data processing. For an organization that has selected AWS to house and process its data, Obfusware is a strong partner for meeting data compliance requirements.