Common Types of Unstructured Data for Text Analytics


Unstructured data comes in many forms. Some estimates say that over 80% of all existing data is unstructured. Generally, this means that the data consists of free-form text from sources such as emails, scanned documents, reviews, social posts, website content, and other written material. Data science and advanced text analytics tools have evolved to analyze this wealth of information and surface the in-depth knowledge that lies beneath it.

Structured data and some semi-structured data are generally statistical or numerical data that can be easily organized and managed in a database. Unstructured data, by contrast, includes various types of long-form text or other information that has context but no distinct structure. While definitions and examples of unstructured data vary, some forms are more prevalent than others. These are the forms that companies seek to analyze when attempting to get the most from big data.

Some of the examples below may include a structured portion, such as a timestamp, a phone number, or a user name. However, it is the unstructured portion of the data that companies find difficult to analyze in a timely manner, and it is the part that text analytics software is employed to process.

  • Ratings - In many cases, ratings use a simple 1-5 scale that captures how much customers like a product or service. Sometimes these are accompanied by free-form text that elaborates on the rating and describes what customers did and didn’t like. The textual portion is much harder to analyze than the structured rating itself and requires text analytics to find deeper meaning.
  • Reviews - Reviews are often similar to ratings in that they include a structured scale. However, in most reviews customers also leave detailed text that describes their feelings toward a certain product or service. Businesses can analyze this information to gain valuable insight into customer sentiment.
  • Surveys - Again, many surveys have a structured portion. Likert scale feedback can be easily categorized, but many surveys also let respondents write in detail about how they feel about the subject in question. Companies can analyze this unstructured portion with text analytics to build a better understanding of their respondents.
  • Chat Transcripts - Real interactions between customers and businesses can be very valuable for understanding customer feedback and for adjusting operations or policy to create a better customer journey. Analyzing the unstructured text within a chat transcript allows businesses to examine customer interactions directly and inform real change.
  • News Articles - Sometimes the most valuable information is readily available to almost anyone. However, it can be difficult to gather and analyze vast amounts of existing news reports to discover common threads and find the information that’s most relevant. By bringing together news from multiple channels and analyzing the text for key points of interest, businesses can leverage these sources of information to their benefit.
  • Email - By some estimates, around 205 billion emails are sent each day across the world. Many of these are catalogued and saved for a variety of purposes. Those that are available could hold a vast amount of information that is relevant to businesses. By analyzing the text of the emails, companies can determine what content is most relevant to them and use it to their advantage.
  • Social Media - Certainly among the most widely recognized forms of unstructured data, social media channels provide a wealth of information in an arena where individuals feel safe to say exactly what they are thinking. By analyzing the associated text, companies can gain a clear window into the minds of their customers and reveal valuable insight.
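
To make the split between the structured and unstructured portions concrete, here is a minimal Python sketch. It assumes a handful of made-up review records in which "rating" is the structured field and "comment" is the free-form text. The record layout, the tiny stop-word list, and the function names are all hypothetical, and a plain word count is only a stand-in for what real text analytics software does.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical review records: "rating" is structured, "comment" is free-form text.
reviews = [
    {"rating": 5, "comment": "Great battery life and the screen is great too."},
    {"rating": 2, "comment": "Battery died after a week. Support was slow."},
    {"rating": 4, "comment": "Good screen, decent battery."},
]

# A tiny illustrative stop-word list (an assumption); real tools use far larger ones.
STOP_WORDS = {"a", "after", "and", "is", "the", "too", "was"}


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes), minus stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(records):
    """Return the mean rating and a word-frequency Counter built from the comments."""
    average = mean(r["rating"] for r in records)
    counts = Counter()
    for r in records:
        counts.update(tokenize(r["comment"]))
    return average, counts


average_rating, word_counts = summarize(reviews)
print(f"Average rating: {average_rating:.2f}")
print("Most common terms:", word_counts.most_common(3))
```

The structured rating reduces to a single average with one line of arithmetic, while the comments need tokenizing, filtering, and counting before any pattern emerges, and even then a word count says nothing about sentiment or context. That gap is the part text analytics tools are meant to close.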

Analyzing both structured and unstructured data allows businesses to gain valuable insight into their customers, employees, products, and marketing efforts. They can gauge what is working and what needs improving by using near-real-time analysis to inform data-driven decisions. New text analytics tools powered by artificial intelligence and machine learning are changing how large companies gather, analyze, and use unstructured data to their benefit.