How to Use Taxonomies

Getting Started

What is a taxonomy?

Stratifyd uses faceted, hierarchical taxonomies with controlled vocabularies. Labels are the broader or narrower branches of the hierarchical categorization tree, and each label's controlled vocabulary is a restricted list of terms, combined with Boolean logic, that is used to index documents.

Controlled vocabularies in Stratifyd do not generate synonym rings; it is the user's responsibility to add synonymous terms to the vocabularies for each label.


Documents are funneled into labels by a purely binary decision: a document either matches at least one of the terms in the label's controlled vocabulary or it does not. Narrower labels work with their broader parent labels' vocabularies as nested levels of match logic, which enables a more fine-grained classification model.
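To make the nesting concrete, here is a minimal Python sketch of that match logic. It is a hypothetical model, not Stratifyd's implementation: `Label`, `matches`, and `classify` are illustrative names, each vocabulary is simplified to single words, and it assumes a sub-label is only tested once its parent has matched.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical model of a taxonomy label: a controlled vocabulary plus sub-labels."""
    name: str
    terms: list[str]
    children: list["Label"] = field(default_factory=list)


def words_in(text: str) -> set[str]:
    # Lower-case word tokens; punctuation is dropped.
    return set(re.findall(r"\w+", text.lower()))


def matches(label: Label, text: str) -> bool:
    # Binary "did match" / "did not match": any one vocabulary term is enough.
    words = words_in(text)
    return any(term.lower() in words for term in label.terms)


def classify(label: Label, text: str, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every label path the document falls into.

    A sub-label is only tested when its parent matched, so each level
    narrows the set of documents passed down from the level above.
    """
    if not matches(label, text):
        return []
    here = path + (label.name,)
    found = [here]
    for child in label.children:
        found.extend(classify(child, text, here))
    return found


billing = Label("Billing", ["charge", "invoice"], children=[Label("Refunds", ["refund"])])
print(classify(billing, "I want a refund for this charge."))  # [('Billing',), ('Billing', 'Refunds')]
print(classify(billing, "I want a refund."))                  # []
```

The second call shows the nesting at work: "refund" is in the sub-label's vocabulary, but the document never reaches Refunds because it does not match the broader Billing label first.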

Using the taxonomy widget, users can drill down through a dataset that has been classified by a taxonomy.

How to Create a Taxonomy

  1. On the main page, click the Advanced option in the navigation bar next to Dashboards.
  2. Click the Taxonomies module.
  3. Click the + icon in the bottom right of your screen, then select the second + icon that appears to create a new taxonomy.
  4. Name your new taxonomy in the resulting prompt.
  5. Click "add a new label..." to create a new label.
  6. Click the newly named label to begin adding to its controlled vocabulary (see the next section for details).


Taxonomy Syntax

Rule Categories

Select whether to apply rules to sentences, paragraphs, or entire documents within the corpus.

Any: If any rule in this section is met, the document is classified under the label.
All: Documents are classified under the label only if every rule in this section is met.
None: Documents are classified under the label only if none of the rules in this section are met.

A second ANY section can be added to create a logical conjunction (AND logic) between the two ANY sections; in other words, the two sections are "ANDed" together.
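The following Python sketch models how Any, All, and None sections and the rule scope could interact. It is an illustrative model under stated assumptions, not Stratifyd's engine: each rule is reduced to a single term, sentences are split naively on `.`, `!`, and `?`, and every section (not only two ANY sections) is assumed to be ANDed and to hold within the same sentence, paragraph, or document.

```python
import re


def words_in(unit: str) -> set[str]:
    return set(re.findall(r"\w+", unit.lower()))


def section_holds(kind: str, rules: list[str], words: set[str]) -> bool:
    # Each rule is simplified to a single term that either occurs or not.
    hits = [rule.lower() in words for rule in rules]
    if kind == "any":
        return any(hits)
    if kind == "all":
        return all(hits)
    if kind == "none":
        return not any(hits)
    raise ValueError(f"unknown section type: {kind}")


def split_units(text: str, scope: str) -> list[str]:
    # Scope chooses what rules are evaluated against: sentence, paragraph, or document.
    if scope == "sentence":
        return re.split(r"(?<=[.!?])\s+", text.strip())
    if scope == "paragraph":
        return text.strip().split("\n\n")
    if scope == "document":
        return [text]
    raise ValueError(f"unknown scope: {scope}")


def label_matches(sections: list[tuple[str, list[str]]], text: str, scope: str = "sentence") -> bool:
    # Assumption: all sections are ANDed and must hold together inside one unit.
    return any(
        all(section_holds(kind, rules, words_in(unit)) for kind, rules in sections)
        for unit in split_units(text, scope)
    )


crash_on_mobile = [("any", ["crash", "freeze"]), ("any", ["android", "ios"])]
print(label_matches(crash_on_mobile, "The app will crash on iOS."))          # True
print(label_matches(crash_on_mobile, "The app will crash. I use Android."))  # False
print(label_matches(crash_on_mobile, "The app will crash. I use Android.", scope="document"))  # True
```

The last two calls show why the scope setting matters: with sentence scope the two ANY sections are satisfied by different sentences, so the label does not apply, while document scope lets them combine.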


Proximity Rules

term~#~term will label a sentence where both terms are within the specified distance of each other:

uninstalled~2~reinstalled will match the phrase “uninstalled and then reinstalled”, but not “uninstalled the app and then reinstalled”, where more than two words separate the terms.
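A minimal Python sketch of a proximity check follows. It assumes, based on the example above, that the number is the maximum count of words allowed between the two terms; it also assumes the terms may appear in either order, which the syntax description does not specify. `proximity_match` is a hypothetical helper.

```python
import re


def proximity_match(rule: str, text: str) -> bool:
    """Evaluate a rule like "uninstalled~2~reinstalled" against one sentence."""
    left, distance, right = rule.split("~")
    limit = int(distance)
    words = re.findall(r"\w+", text.lower())
    lefts = [i for i, w in enumerate(words) if w == left.lower()]
    rights = [j for j, w in enumerate(words) if w == right.lower()]
    # Words strictly between positions i and j: abs(i - j) - 1. Order is not enforced.
    return any(i != j and abs(i - j) - 1 <= limit for i in lefts for j in rights)


print(proximity_match("uninstalled~2~reinstalled", "I uninstalled and then reinstalled it"))          # True
print(proximity_match("uninstalled~2~reinstalled", "I uninstalled the app and then reinstalled it"))  # False
```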

Boolean Logic

  • for OR, use “|”
  • for AND, use “+”
  • for NOT, use “!”
  • Parentheses can be used to set precedence, for example:
((uninstalled | reinstalled) + app)
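To show how `|`, `+`, `!`, and parentheses combine, here is a small recursive-descent evaluator in Python. It is a sketch, not Stratifyd's parser: the operator precedence (`!` tightest, then `+`, then `|`) is an assumption, since the syntax above only says parentheses set precedence, and terms are treated as plain words matched against a set of lower-cased words.

```python
import re

# Terms are any run of characters other than whitespace, parentheses, or operators.
TOKEN = re.compile(r"\s*(\(|\)|\||\+|!|[^\s()|+!]+)")


def tokenize(expr: str) -> list[str]:
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str, words: set[str]) -> bool:
    """Evaluate a |, +, ! expression; assumed precedence is ! > + > |."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "|":
            advance()
            rhs = parse_and()  # parse first so the whole operand is consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "+":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        tok = advance()
        if tok == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in {")", "|", "+"}:
            raise ValueError(f"unexpected {tok!r}")
        return tok.lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result


sentence_words = set(re.findall(r"\w+", "I reinstalled the app twice".lower()))
print(evaluate("((uninstalled | reinstalled) + app)", sentence_words))  # True
print(evaluate("reinstalled + !app", sentence_words))                   # False
```

Parsing the right-hand operand before combining it avoids Python's short-circuiting skipping tokens, which would otherwise leave part of the expression unread.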

Wildcards

Use * as a multi-character wildcard and ? as a single-character wildcard:

app* will match “application”
*install will match both “uninstall” and “reinstall”
rec??ve will match both “receive” and “recieve”
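One way to reproduce these wildcard examples is to translate each pattern into a regular expression, as in the Python sketch below. It assumes wildcards match within a single word (hence `\w`), and that matching is case-insensitive; both are assumptions rather than documented behavior. Python's `fnmatch` module also handles `*` and `?`, but it additionally treats `[...]` as a character class, which the syntax above does not mention.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (zero or more word characters) and ? (exactly one) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern: str, word: str) -> bool:
    # fullmatch: the pattern must cover the whole word, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for pattern, word in [("app*", "application"), ("*install", "uninstall"),
                      ("*install", "reinstall"), ("rec??ve", "receive"), ("rec??ve", "recieve")]:
    print(pattern, word, wildcard_match(pattern, word))  # all True
```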

Structured Data

Use the syntax {value_to_match:@:column_header_name} to match a document only if the specified value is present in the named structured data field of that same document.
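The Python sketch below parses a rule in that syntax and checks it against one document's structured fields. It is illustrative only: representing a document's structured data as a dict of column header to value is an assumption, as is the case-insensitive exact comparison; Stratifyd may instead test whether the value is contained in the field.

```python
import re

# {value_to_match:@:column_header_name}
FIELD_RULE = re.compile(r"\{(?P<value>[^:{}]+):@:(?P<column>[^{}]+)\}")


def structured_match(rule: str, record: dict[str, str]) -> bool:
    m = FIELD_RULE.fullmatch(rule.strip())
    if m is None:
        raise ValueError(f"not a structured-data rule: {rule!r}")
    value, column = m.group("value").strip(), m.group("column").strip()
    # Assumption: case-insensitive exact comparison; a missing column never matches.
    if column not in record:
        return False
    return str(record[column]).strip().lower() == value.lower()


review = {"Platform": "iOS", "Rating": "1"}
print(structured_match("{iOS:@:Platform}", review))      # True
print(structured_match("{Android:@:Platform}", review))  # False
print(structured_match("{iOS:@:Country}", review))       # False
```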

Tabs

Multiple tabs within a label are combined with OR logic: a document is captured by the label if it matches the vocabulary in any one tab. Use tabs when you have two different vocabularies for capturing documents into the same label.
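As a small illustration, the Python sketch below treats a label's tabs as a list of vocabularies and ORs them together. Simplifying each tab to an "Any" list of single words is an assumption made for brevity, and the function names are hypothetical.

```python
import re


def tab_matches(terms: list[str], text: str) -> bool:
    # Each tab is simplified to an "Any" list of single terms.
    words = set(re.findall(r"\w+", text.lower()))
    return any(term.lower() in words for term in terms)


def label_matches_any_tab(tabs: list[list[str]], text: str) -> bool:
    # Tabs are ORed: one matching tab is enough to capture the document.
    return any(tab_matches(tab, text) for tab in tabs)


login_tabs = [
    ["password", "login", "signin"],  # tab 1: account access wording
    ["otp", "2fa", "verification"],   # tab 2: a second, separate vocabulary
]
print(label_matches_any_tab(login_tabs, "The OTP code never arrived"))  # True (tab 2)
print(label_matches_any_tab(login_tabs, "Great app, no complaints"))    # False
```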

Nested Labels

You can create sub-labels by clicking the dropdown arrow next to any label, then clicking "add new label...".

Each narrower label can have as many sub-labels as desired.


How to Apply a Taxonomy

  1. Open the data manager in your dashboard by clicking on the data stack icon.
  2. Click the dataset options icon for the dataset you want to apply the taxonomy to, then select "Reprocess".
  3. In the advanced options on the data connector prompt, click on Taxonomies.
  4. Select the taxonomy and version number to apply to the dataset.

The Taxonomy Widget

In the widget editor, select "Taxonomy Analysis" to create the taxonomy widget. You can add a size dimension to show how prominent each label is according to that dimension. For example, most taxonomies are sized by Number of Records, so each label's bubble size reflects the number of documents assigned to that label by its controlled vocabularies.


Other structured data elements can also size the taxonomy. For example, sizing by Stratifyd Sentiment shows each label sized by how positive the average sentiment score of its documents is. You can also filter your dataset by drilling down on labels within the taxonomy tree on your dashboard for focused analysis.
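The two sizing options above amount to a per-label aggregation, sketched here in Python. This is a hypothetical model of what the widget computes, not its actual code: each document is assumed to be a dict with a `labels` list and a numeric `sentiment` field, a document may count toward several labels, and documents with no label are grouped under "Not Labeled".

```python
from collections import defaultdict
from statistics import mean


def label_sizes(docs: list[dict]) -> dict[str, dict[str, float]]:
    """Per label: Number of Records and the mean of a numeric field."""
    counts: dict[str, int] = defaultdict(int)
    scores: dict[str, list[float]] = defaultdict(list)
    for doc in docs:
        # A document can sit under several labels; one with none goes to "Not Labeled".
        for label in doc["labels"] or ["Not Labeled"]:
            counts[label] += 1
            scores[label].append(doc["sentiment"])
    return {
        label: {"records": counts[label], "avg_sentiment": mean(scores[label])}
        for label in counts
    }


docs = [
    {"labels": ["Billing"], "sentiment": -0.5},
    {"labels": ["Billing", "Login"], "sentiment": 0.5},
    {"labels": [], "sentiment": 0.0},
]
for label, size in label_sizes(docs).items():
    print(label, size)
```

Sizing by Number of Records uses the `records` value for each bubble; sizing by sentiment uses `avg_sentiment` instead.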


Did you know you can edit your taxonomies without exiting your dashboards?

  1. Start by clicking the dashboard's dropdown menu from the top left.
  2. Select "Tools".
  3. Click on "Taxonomy".

This is helpful when auditing a taxonomy: click the "Not Labeled" bubble in the taxonomy widget to see which bigrams are not being classified. With the taxonomy open next to the dashboard, you can easily add the missing terms to your vocabularies and then reprocess the dataset.
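The audit step can be approximated offline with the Python sketch below, which counts the most frequent word pairs among documents that received no label. It is a stand-in for the widget's "Not Labeled" view, not its implementation; the document shape (`text` and `labels` keys) is assumed, and the tokenization is deliberately simple.

```python
import re
from collections import Counter


def unlabeled_bigrams(docs: list[dict], top: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Most frequent word pairs among documents that received no label."""
    counter: Counter = Counter()
    for doc in docs:
        if doc["labels"]:
            continue  # already classified, not part of the audit
        words = re.findall(r"\w+", doc["text"].lower())
        counter.update(zip(words, words[1:]))
    return counter.most_common(top)


docs = [
    {"text": "Please add dark mode", "labels": []},
    {"text": "Dark mode is missing", "labels": []},
    {"text": "Refund my charge", "labels": ["Billing"]},
]
print(unlabeled_bigrams(docs, top=1))  # [(('dark', 'mode'), 2)]
```

A frequent pair such as "dark mode" is a candidate term to add to an existing label's vocabulary, or a sign that a new label is needed.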

Lastly, you can share taxonomies with your team members the same way you would share a dashboard.

This allows team members to apply taxonomies without having to replicate them, since taxonomies live separately from dashboards.