Metadata risks and how to protect your metadata with imperia CMS

07.12.2023 | Company

Metadata has become indispensable in the digital world. It makes data easier to search and categorize, but it also poses a security risk, particularly with regard to data protection regulations. In this post, you will learn about the types of metadata and the risks they pose. You will also learn how imperia CMS protects your metadata and thus supports secure data management.

Metadata in Communication

Whether you are searching for specific emails, data folders or images taken at a certain point in time, these search and categorization processes depend on metadata. Metadata is therefore an essential part of your own data management, and it simplifies that work by supplying a wide range of information about the data itself. It provides structured information, such as the file format used, the time of creation, or the author or copyright holder, and is usually written into the files in a way that is not immediately visible.

Metadata is also ubiquitous outside of explicit data management. Photos taken with a smartphone or camera automatically contain data such as the time of capture, geodata or the camera settings used. In e-commerce, purchase history and behavior are among the most important parameters for personalized advertisements. Even modern search engines like Google rely on metadata to deliver location-based and relevant search results.

As digitization progresses, the topic of metadata is gaining additional momentum. In the age of big data, data is generated in ever larger quantities and in ever more diverse formats. Without metadata, such a flood of data can no longer be organized. Streaming services and social networks also increasingly rely on metadata to create more personal and engaging user experiences. At the same time, the question increasingly arises as to what extent metadata itself can pose a security risk, especially with regard to compliance with data protection regulations (e.g. GDPR).

Overview: Categories of Metadata

Metadata can be divided into various categories, such as:

Descriptive Metadata
Describes the content of a resource or file and helps users identify and select it. Typical examples are: title, author, summary, topics, language, date, genre, source, format, publisher.

Structural Metadata
Describes the organization of a resource or file, such as chapters, sections, page numbers, diagrams and graphics, as well as references and hyperlinks.

Administrative Metadata
Describes the state of a resource or file. Typical examples are: file format, file size, creation date, property rights, access rights, license and copyright information, backup information, storage location.

Rights and Terms of Use
Indicates how a resource or file may be used and what restrictions apply. Typical examples are: copyright, Creative Commons licenses, copyright restrictions, patents, trademarks, data protection, intellectual property rights, licensing terms, information on free use.

Contextual or Supplementary Metadata
Contains information about the users or customers associated with the resource or file. It is mainly used to generate personalized recommendations or search results based on the user's preferences or behavior. Typical examples are: name, email address, gender, age, location or GPS coordinates, devices and technologies used, transaction data (e.g. order number, payment method).

Storage of Metadata

Metadata can be stored in various formats, depending on the type of application. Typical formats are:

EXIF (Exchangeable Image File Format): For metadata of digital images
IPTC (International Press Telecommunications Council): For metadata of images used in the press
ID3: For metadata of music files like MP3s
Dublin Core: For general metadata like title, author and publication date of documents
XML (Extensible Markup Language): For structured metadata of web content
JSON (JavaScript Object Notation): For structured metadata in web applications and APIs
RDF (Resource Description Framework): For semantic metadata of web content and other documents, that is, information about the meaning of data and the relationships between data
PDF (Portable Document Format): For metadata of PDF documents
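
To make the difference between these formats more tangible, the following minimal sketch writes the same Dublin Core style record once as JSON and once as XML. The record values and the wrapping "metadata" element are invented for illustration; only the Dublin Core namespace URI comes from the published standard.

```python
import json
import xml.etree.ElementTree as ET

# Dublin Core element namespace (a published standard, not specific to any CMS).
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record; the values are invented for illustration.
record = {
    "title": "Harbour at dawn",
    "creator": "Jane Doe",
    "date": "2023-12-07",
    "format": "image/jpeg",
}


def to_json(rec: dict) -> str:
    """Serialize the record as JSON, as a web API might return it."""
    return json.dumps(rec, indent=2, sort_keys=True)


def to_xml(rec: dict) -> str:
    """Serialize the record as XML with one Dublin Core element per field."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # assumed wrapper element, not part of Dublin Core
    for name, value in rec.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


print(to_json(record))
print(to_xml(record))
```

The content is identical in both cases. What changes is only the container, which is why one and the same author name can end up in several places of a file at once.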

Dangers and Risks of Metadata

As the volume of metadata continues to grow and format standards make this data easier to read and use, metadata can become a huge security risk for photographers, editorial offices and anyone responsible for data management within a company, and even for private individuals.

A study published in March 2018, titled "You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information", analyzed the metadata of thousands of Twitter posts, each typically 140 characters long. The result: in addition to the visible text, 144 data fields that the users were unaware of were also filled in. Based on these data fields, the researchers were able to identify individual users within the group of approximately 10,000 users analyzed with an accuracy of 96.7%. The much-vaunted anonymity that politically engaged journalists, for example, rely on in their work therefore offers no reliable protection. The same applies to private individuals who use instant messaging services such as WhatsApp. As early as 2014, a research team from the University of Ulm demonstrated in a widely noted study that the recorded metadata is sufficient to track precisely how long the app is used.
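
The intuition behind such results can be shown with a toy sketch: a single metadata field rarely singles anyone out, but a combination of fields quickly does. The example below is not the method used in the study, and the five account records are invented. It only counts how many records have a combination of values that nobody else shares.

```python
from collections import Counter

# Invented sample records: no message content, only account metadata.
accounts = [
    {"user": "a", "language": "de", "timezone": "Berlin", "client": "Android"},
    {"user": "b", "language": "de", "timezone": "Berlin", "client": "iPhone"},
    {"user": "c", "language": "en", "timezone": "London", "client": "Web"},
    {"user": "d", "language": "de", "timezone": "Berlin", "client": "Android"},
    {"user": "e", "language": "en", "timezone": "Berlin", "client": "Web"},
]


def unique_share(records: list[dict], fields: tuple[str, ...]) -> float:
    """Return the share of records whose combination of `fields` is unique."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


# One field alone identifies nobody in this sample ...
print(unique_share(accounts, ("language",)))
# ... but three fields together already single out most accounts.
print(unique_share(accounts, ("language", "timezone", "client")))
```

With 144 fields instead of three, it becomes plausible that almost every account carries a fingerprint of its own.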

Considering how interconnected the use of sensitive platforms such as online banking has become, a number of conceivable threat scenarios emerge:

1. Disclosure of personal information
If metadata contains GPS information, attackers can combine it with names and timestamps that are also read out to draw further conclusions about a person, and ultimately commit identity theft.

2. Location-based attacks
Security authorities reconstruct physical movement patterns to help solve crimes. However, unauthorized persons can use the same technique to track private individuals, which creates a high personal risk.

3. Vulnerable IT structures
Metadata can also contain information about the digital devices, operating systems and software versions in use. Attackers can use this information to identify vulnerabilities and launch targeted attacks.
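
How precise the location data from the first two scenarios can be is easy to illustrate. EXIF stores a GPS position as degrees, minutes and seconds plus a hemisphere letter. The following sketch converts such values into the decimal coordinates that any map service accepts. The function name and the sample values are assumptions made for this example; reading the tags from an actual image file is left out.

```python
def dms_to_decimal(dms: tuple[float, float, float], ref: str) -> float:
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


# Hypothetical values as they might appear in a photo's GPS tags.
lat = dms_to_decimal((48.0, 8.0, 14.0), "N")
lon = dms_to_decimal((11.0, 34.0, 31.0), "E")
print(f"{lat:.5f}, {lon:.5f}")
```

One arc second of latitude corresponds to roughly 30 meters on the ground, so a single photo can be enough to narrow a location down to a specific building.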

Metadata: Protective Measures

There are various ways to improve the protection of metadata, especially within companies. One possible measure is to establish guidelines for the basic handling of metadata, especially for digital files such as office documents or images that leave the company. When saving documents in Microsoft Office, for example, users can check directly which metadata is present and remove all personal information before the file is shared.
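
What such a check looks at can be sketched in a few lines. Modern Office files are ZIP archives, and author information is kept in a part named docProps/core.xml. The sketch below reads that part with the Python standard library. To stay self-contained, it builds a minimal stand-in archive in memory instead of opening a real document; the names in it are invented, and the helper functions are assumptions of this example, not part of any Office API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PATH = "docProps/core.xml"


def read_core_properties(docx_bytes: bytes) -> dict[str, str]:
    """Return the core properties of an OOXML document as {field name: value}."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(CORE_PATH))
    # Strip the "{namespace}" prefix that ElementTree puts in front of tag names.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


def build_sample_docx() -> bytes:
    """Build a minimal stand-in archive containing only the core properties part."""
    core_xml = (
        "<cp:coreProperties "
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Jane Doe</dc:creator>"
        "<cp:lastModifiedBy>John Roe</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CORE_PATH, core_xml)
    return buffer.getvalue()


print(read_core_properties(build_sample_docx()))
```

A guideline for outgoing files could build on a check like this, for example by flagging every document whose creator field is not empty.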

Another approach, known as metadata shredding, relies on mixing accurate and false information. It is a software solution developed, for example, for sending emails or direct messages. With metadata shredding, the actual metadata is mixed with randomly generated information, so that it can no longer be traced which person sent a particular message from which computer at which time. If end-to-end encryption is used as well, the content of the message is also protected.
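
The basic idea of mixing real and random metadata can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular shredding product: the field names, the value pools and the functions make_decoy and shred are all hypothetical.

```python
import random

# Invented value pools used to generate decoy records.
SENDERS = ["alice", "bob", "carol", "dave"]
HOSTS = ["host-01", "host-02", "host-03", "host-04"]


def make_decoy(rng: random.Random) -> dict:
    """Generate one random metadata record with the same fields as a real one."""
    return {
        "sender": rng.choice(SENDERS),
        "host": rng.choice(HOSTS),
        # Random time of day, formatted as HH:MM.
        "sent_at": f"{rng.randrange(24):02d}:{rng.randrange(60):02d}",
    }


def shred(real: dict, decoys: int, rng: random.Random) -> list[dict]:
    """Hide the real record among `decoys` random ones and shuffle the result."""
    records = [real] + [make_decoy(rng) for _ in range(decoys)]
    rng.shuffle(records)
    return records


rng = random.Random(42)  # fixed seed only to make the demo repeatable
real_record = {"sender": "alice", "host": "host-02", "sent_at": "09:15"}
shredded = shred(real_record, decoys=4, rng=rng)
for entry in shredded:
    print(entry)
```

An observer who sees only the shuffled list cannot tell which entry is genuine, provided the decoys are drawn from the same value ranges as real traffic. For anything beyond a demonstration, a cryptographically secure source such as random.SystemRandom would be the more appropriate choice.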

Metadata Management in imperia CMS

In summary, working with metadata is very efficient, but the security risks must be kept under control. Guidelines and shredding can serve as a first remedy, but handling metadata correctly on the software side remains a long-term challenge. An integrated solution is needed, especially where established workflows with tried and tested processes should change as little as possible.

Metadata management in imperia CMS takes this approach one step further: it automates the reading of metadata, for example during image upload, and lets you customize the process through upload templates. When working with a large number of images, values such as the name of a photographer or author can be recorded once and applied to all corresponding data fields. Regardless of the type of data, the information read out can be transparently traced back to a specific metadata standard, so that no stored metadata is overlooked. The metadata can then be reused for asset management, while it is removed from the individual assets when content is published. This makes use of the advantages of metadata for content management while reliably removing all potentially sensitive information.
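
The principle described here, keeping metadata for internal asset management and stripping it on publication, can be illustrated with a small conceptual sketch. It does not show the imperia CMS API or its internals. The Asset class, the ingest and publish functions, the catalog and the template mapping are all hypothetical; only the field names Artist (EXIF), By-line (IPTC) and creator (Dublin Core) exist in the respective standards.

```python
from dataclasses import dataclass, field

# Hypothetical upload template: one logical field mapped to the corresponding
# field names in several metadata standards.
CREATOR_FIELDS = {"EXIF": "Artist", "IPTC": "By-line", "DublinCore": "creator"}


@dataclass
class Asset:
    name: str
    # Embedded metadata, grouped by standard, e.g. {"EXIF": {"Artist": "..."}}.
    embedded: dict[str, dict[str, str]] = field(default_factory=dict)


def ingest(asset: Asset, catalog: dict[str, dict], creator: str) -> None:
    """Apply the template, then copy all embedded metadata into the catalog."""
    for standard, field_name in CREATOR_FIELDS.items():
        asset.embedded.setdefault(standard, {})[field_name] = creator
    # Keep the standard as part of the key so every value stays traceable.
    catalog[asset.name] = {
        f"{standard}:{key}": value
        for standard, fields in asset.embedded.items()
        for key, value in fields.items()
    }


def publish(asset: Asset) -> Asset:
    """Return a copy for publication that carries no embedded metadata."""
    return Asset(name=asset.name)


catalog: dict[str, dict] = {}
photo = Asset("harbour.jpg", {"EXIF": {"GPSLatitude": "48.137"}})
ingest(photo, catalog, creator="Jane Doe")
public = publish(photo)
print(catalog["harbour.jpg"])  # metadata stays searchable internally
print(public.embedded)         # the published copy carries none of it
```

The separation is the point of the design: editors can still search by photographer or location in the catalog, while the file that leaves the system no longer reveals either.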

Would you like to experience imperia CMS live? Feel free to book your personal and non-binding online demo. Our experts will show you how we have successfully implemented your requirements in similar customer projects.