Semantic Topics Widget Guide

Getting Started

What is the Semantic Topics widget?

The semantic topics widget is a pre-populated widget produced by Stratifyd. It contains a topic wheel, bigram word cloud, and a temporal trend graph. If your data has geolocation metadata, then the semantic topics widget will also show a geomap visual.


Topics

An unsupervised learning clustering algorithm characterizes each document as a mixture of topics, with each topic consisting of a small set of bigrams that frequently occur together. Documents can belong to multiple topics due to partial inclusion of relevant bigrams across topics generated by Stratifyd.

The topic wheel displays the bigrams grouped by topic number, while the word cloud/list ranks bigrams by either count or PMI score. Each slice of the pie chart represents a topic, and the topics are indexed by the Stratifyd Significance Percentage. The color of a slice represents the sentiment towards that topic: red signifies negative sentiment and blue signifies positive sentiment. Topics are sized and ranked by their statistical relevance, as determined by a significance index percentage, much as bigrams are ranked by their PMI scores. The topic model visual can be viewed as a pie chart, network graph, or tree map.

Hovering over the topics within the wheel will automatically filter just the semantic topics widget to display data relevant to the chosen topic, whereas clicking on a topic filters the entire tab in the dashboard for the selected topic. The topic model header with topic numbers starting at “T0” does not have hover capability, but will filter the entire tab if a topic number is selected from the header. The colors of topics denote the average sentiment of all documents contained in the topic. Default color designations are red for negative, grey for neutral, and blue for positive. Different shades of the colors do not represent varying levels of sentiment within these sentiment categories, but rather different topics.
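
As a rough illustration of how an average sentiment could map to the default colors, here is a minimal sketch. It assumes each document carries a sentiment score between -1 and 1 and that a small band around zero counts as neutral; the score range, the `NEUTRAL_BAND` value, and the `topic_color` function are hypothetical and are not taken from Stratifyd.

```python
from statistics import mean

# Hypothetical width of the neutral zone around zero (not a Stratifyd setting).
NEUTRAL_BAND = 0.1


def topic_color(document_sentiments, neutral_band=NEUTRAL_BAND):
    """Map the average sentiment of a topic's documents to a default color."""
    average = mean(document_sentiments)
    if average < -neutral_band:
        return "red"    # negative
    if average > neutral_band:
        return "blue"   # positive
    return "grey"       # neutral


negative_topic = topic_color([-0.8, -0.4, 0.1])
neutral_topic = topic_color([0.05, -0.05])
positive_topic = topic_color([0.6, 0.9])
```

Note that the color depends only on the average, which matches the point above that shades distinguish topics rather than degrees of sentiment.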

You can change the semantic topics visual to tree map or network graph by clicking on the icon next to “Overview” in the topic modeling header. Tree map view still displays the topics with their percentages of prevalence and sentiment color coding, but shows all of the bigrams that are captured, in whole or in part, in window pane modules. The network graph view clusters bigrams, represented as nodes, into topics. Bigrams that occur more often have bigger nodes. Lines between nodes show how frequently two bigrams co-occur.

Sunburst Mode can be turned on or off in the widget editor and is important when selecting individual topics for further analysis. When this mode is off and a topic is selected, the widget will drill into 100% of the topic by isolating it from all other topics. Turning sunburst mode on preserves topic percentages amongst all topics and enables further selection of unselected topics. Turning sunburst mode off is primarily helpful for analyzing the widget in tree map view, because the unselected topics are removed from the visual, producing a less cluttered widget. For network graph and pie chart views, it is recommended to leave sunburst mode on, which is the default setting.


Buzzwords

Pointwise mutual information (“PMI”) is a computational linguistics measure of association and collocation between words. It counts how frequently two words occur together in a corpus as well as how frequently each word occurs individually, and these counts are used to approximate the probability of co-occurrence and of individual occurrence. A higher PMI score means the two words occur together (as a bigram) more often than their individual (unigram) frequencies alone would predict. As a result, pairs involving common words, such as “the”, “is”, “be”, “to”, and “in”, have very low PMI scores. Bigrams, aka “unigram pairs”, with high PMI scores tend to be more distinctive in comparison.
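
To make the measure concrete, the sketch below computes the textbook form of PMI, log2(p(w1, w2) / (p(w1) * p(w2))), for adjacent word pairs in a tiny corpus. This is only an illustration: Stratifyd's exact scoring, tokenization, and pairing rules are not described in this guide, so the `bigram_pmi` function, the use of adjacent pairs, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def bigram_pmi(documents):
    """Return {(w1, w2): PMI} for adjacent word pairs in tokenized documents."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in documents:
        unigram_counts.update(tokens)
        # Assumption: a bigram is two adjacent tokens within one document.
        bigram_counts.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigram_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        p_pair = count / total_bigrams
        p_w1 = unigram_counts[w1] / total_unigrams
        p_w2 = unigram_counts[w2] / total_unigrams
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical sample corpus, already split into words.
docs = [
    "the battery life is great".split(),
    "the battery life is short".split(),
    "the screen is great".split(),
    "the price is fair".split(),
]
scores = bigram_pmi(docs)
ranked_by_pmi = sorted(scores, key=scores.get, reverse=True)
```

In this sample, “battery life” scores higher than “the battery” even though both pairs occur twice, because “the” appears in every document and so pairs with it are less surprising.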

Every possible combination of two words in a document within a corpus receives a PMI score. The word cloud is sized and ranked by PMI scores; bigger bigrams have higher PMI scores than smaller bigrams. The word cloud can be changed to a list in the widget editor by changing the Key N-Grams parameter from “cloud” to “list”. List-view will display the number of occurrences, sentiment polarity, and sentiment spark lines per bigram.

Bigrams can be ranked by number of occurrences by changing the N-Grams Sized By parameter to “count”. Depending on the goals of your analysis, ranking by count instead of score may produce more insightful results. Right-clicking a bigram enables you to set the sentiment score for that bigram if you are unsatisfied with Stratifyd’s AI-assigned sentiment score. You would do this instead of tuning your analysis if the bigram(s) in question are too important to simply remove from the analysis altogether.

For example, mentions of “fighting cancer” for charity were scored with negative sentiment since Stratifyd interprets cancer as a negative emotional trigger under most circumstances. In this context, the bigram should actually be scored positively, so rather than tuning the bigram out from the analysis entirely, you could simply alter its sentiment score within the widget for faster processing and more accurate results.

Bigrams are filtered by topic automatically when a topic is selected. When an individual bigram is selected from the word cloud or bigram list, the tab is filtered to include only the documents containing the selected bigram. You can drill down several tiers by selecting bigram after bigram after bigram in addition to selecting topics for deep analyses.
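
The drill-down behavior described above amounts to applying one filter after another. The sketch below shows that idea on tokenized documents; the `contains_bigram` and `drill_down` helpers and the sample data are hypothetical and do not reflect how Stratifyd stores or filters documents internally.

```python
def contains_bigram(tokens, bigram):
    """Return True if the two words of `bigram` appear next to each other."""
    return bigram in zip(tokens, tokens[1:])


def drill_down(documents, selected_bigrams):
    """Keep only documents that contain every selected bigram, one tier at a time."""
    remaining = documents
    for bigram in selected_bigrams:
        remaining = [doc for doc in remaining if contains_bigram(doc, bigram)]
    return remaining


# Hypothetical sample corpus, already split into words.
docs = [
    "the battery life is great".split(),
    "the battery life is short".split(),
    "the screen is great".split(),
]

first_tier = drill_down(docs, [("battery", "life")])
second_tier = drill_down(docs, [("battery", "life"), ("is", "short")])
```

Each additional selection can only narrow the result, which is why selecting several bigrams in a row gives progressively smaller sets of documents.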


Temporal Trends

The number of documents is plotted against their dates of origin to produce a temporal trends graph directly beneath the topic model and bigram modules within the widget.

Selecting a topic or bigram also filters the temporal trends graph to show temporal metadata specific to that topic or bigram. This feature is helpful for visualizing the changes that occur over time within topics and across bigrams. The type of graphic display can be changed to line, area, stream, or bar graph in the widget editor, and the grid can be disabled for easier viewing. Stratifyd defaults to stacking the bars, lines, streams, or areas, but this option can be unchecked directly on the widget. When selecting bar as the graph style, it is recommended to also select “Grouped” in order to split out each topic’s bar for easier viewing. You can also choose to display an average line with data callout and/or overlay either a linear or quadratic regression line.
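
For readers who want to see what the average line and a linear regression overlay represent, here is a minimal sketch that counts documents per day and fits an ordinary least-squares line to those counts. The function names, the per-day bucketing, and the sample dates are assumptions for illustration; the guide does not state how Stratifyd buckets dates or fits its regression lines.

```python
from collections import Counter
from datetime import date


def daily_counts(document_dates):
    """Return a date-sorted list of (date, number of documents) pairs."""
    return sorted(Counter(document_dates).items())


def average_and_linear_trend(counts):
    """Return (average count, slope per day, intercept) for daily counts."""
    first_day = counts[0][0]
    xs = [(day - first_day).days for day, _ in counts]
    ys = [count for _, count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a straight line y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return mean_y, slope, intercept


# Hypothetical document dates: 1, 2, and 3 documents on consecutive days.
dates = [
    date(2018, 2, 1),
    date(2018, 2, 2), date(2018, 2, 2),
    date(2018, 2, 3), date(2018, 2, 3), date(2018, 2, 3),
]
counts = daily_counts(dates)
average, slope, intercept = average_and_linear_trend(counts)
```

A positive slope indicates that document volume is rising over the plotted period, while the average gives the flat reference line.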

Each “stack” in the overview mode of the temporal trends graph represents a topic, and topics can be selected for deeper analysis from the temporal trends graph too. Hovering on a topic within the graph prompts a pop-out to display the topic number, the number of days covered in the slot hovered over, the start and end dates of the slot, the number of documents contained, average sentiment score, and the top Key N-Grams present.


Geomap

If there is location metadata present in the dataset, you should have already mapped it during your initial import, designating whether the location data refers to countries, states, cities, addresses, or longitude/latitude coordinates. A geomap will be automatically generated as part of the Semantic Topics widget based on the data mapping. The map will plot the data volume by location and location-level average sentiment. You can resize the map by clicking the + or – icons. Clicking on a data callout ticker within the map will filter the map to the selected location. The Geomap can also be switched to list-view.