Unstructured Data and Why It's So Important
Grant Ingersoll, writing for Gigaom, lamented that the term “unstructured data” was obsolete. Once a way to describe any data that fell outside of a database management system, “unstructured data” now comes up in nearly every discussion about analytics. The term is about as broad a description as possible for data that doesn’t fit neatly into predefined boxes, but is all unstructured data truly unstructured?
According to W.H. Inmon, writing for IBM, not everything that’s considered unstructured data is actually without structure. Inmon distinguishes between repetitive and nonrepetitive data. Repetitive data is similar in size and structure from one record to the next, such as a call log that always holds the same type of information: name, number, call time, etc. Nonrepetitive data, by contrast, varies in size and structure from record to record. So, when we talk about unstructured data, what we really mean is nonrepetitive data that can’t be managed by a database.
This is what people actually mean by the term unstructured data, but why is it so hard for businesses to analyze? Sheer volume is the simple answer. Since the early 1990s, the amount of data has doubled, tripled, and then grown exponentially year after year. Today, we produce 2.5 quintillion bytes of information every day, including:
- 500 million tweets sent every day
- 4 million hours of video uploaded to YouTube
- 3.6 billion Instagram likes every day
- 4.3 billion Facebook messages a day
- 5.75 billion Facebook likes per day
- 6 billion daily Google searches
These are just a few examples of how data is growing at a breakneck pace. Somewhere in all of it there are insights that companies and consumers would find useful, but the important information is buried under a mountain of noise. The Big Data industry has struggled since the 1970s to find ways to accurately analyze a continually growing amount of data. Even in 1992, only 100 GB of data was being created daily, versus the nearly 50,000 GB per second created today.
The vast majority of this data is text: Facebook posts, tweets, consumer reviews, employee feedback, and Google searches, to name a few. Text analytics for this nonrepetitive unstructured data has advanced with the advent of artificial intelligence that measures sentiment and context to understand meaning across this vast ocean of data.
Text analytics tools such as natural language processing and deep learning applied to text are powered by this artificial intelligence. They analyze pairs of adjacent words, known as bigrams, to gain deeper insight into the feeling and intent of the person writing. As data science and data analytics companies continue to increase the capability of their products, businesses are better able to manage their wealth of data, gain actionable intelligence from their data lakes, and spend less time figuring out what challenges they face and more time overcoming them. As the world creates more and more information daily, the data analytics industry is racing to develop technology that can make use of it and stay ahead of the tsunami of data crashing onto the web every second of every day.
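To make the idea of bigrams concrete, here is a minimal Python sketch that splits text into words and counts adjacent word pairs. The sample reviews and the helper names `tokenize` and `bigrams` are made up for illustration, and real text analytics products do far more than this (sentiment models, context handling, and so on), so treat it as a rough illustration rather than how any particular tool works.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigrams(tokens):
    """Return every pair of adjacent tokens, in order."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical sample data, invented for this example.
reviews = [
    "The battery life is not good",
    "Customer support was not good at all",
    "Really good screen",
]

# Count how often each bigram appears across all reviews.
counts = Counter()
for review in reviews:
    counts.update(bigrams(tokenize(review)))

# The single most frequent word pair and its count.
most_common = counts.most_common(1)
print(most_common)
```

In this toy data set the word "good" appears in all three reviews, yet the most frequent pair is ("not", "good"). That is the kind of signal a single-word count would miss, and it hints at why word pairs help when gauging the feeling behind a piece of text.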