AI: Artificial intelligence or augmented intelligence.
analysis: Systematic evaluation of data by breaking it down into constituent elements to uncover interrelationships. In Stratifyd, this may refer to analysis performed by the analytics engine using artificial intelligence (AI) and machine learning (ML) to generate new data points based on your unstructured data, or to analysis performed by a person interacting with the results via a dashboard.
artificial intelligence: Intelligence of machines or computers, in contrast to the natural intelligence displayed by humans.
Augmented Intelligence®: The tools provided by Stratifyd to augment human intelligence and analysis with machine learning.
buzzword: Statistically significant n-gram found in textual data. An n-gram is deemed statistically significant when its words occur together in the dataset more often than their individual frequencies would predict.
channel: Customer feedback source.
Chinese dictionary: Customized list of Chinese tokens to use in n-gram generation.
comparative analysis: Comparison of two or more entities, datasets, taxonomies, dashboards, or filters.
convergence: Output from multiple iterations coming closer and closer to a specific value. If an algorithm does not converge, additional iterations never approach a useful result; the output may instead oscillate or drift ever further away.
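A minimal sketch of convergence in Python (illustrative only, not Stratifyd code): Newton's method for square roots, where each iteration's output lands closer to a fixed value.

```python
def newton_sqrt(target, iterations):
    """Approximate sqrt(target) by iterating x -> (x + target/x) / 2."""
    x = float(target)  # initial guess
    for _ in range(iterations):
        # Each pass moves x closer to sqrt(target): the output converges.
        x = 0.5 * (x + target / x)
    return x

print(newton_sqrt(2, 6))  # successive iterates approach 1.41421356...
```

After only a handful of iterations the output is stable to many decimal places, which is what "converging to a specific value" looks like in practice.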
corpus: Collection of textual data used to train a model.
CSAT: Customer satisfaction score. A KPI used to track how satisfied customers are with an organization.
dashboard: Interactive collection of widgets containing visualizations of data from multiple datasets. Used for passive monitoring and for deep-dive or root-cause analysis. Can be split across multiple tabs for different focus areas.
data connectors: Features that allow you to extract data from files, public websites, and third-party sites (those requiring authentication) for use in analysis.
data democratization: The Stratifyd mission to ensure that everyone has access to augmented intelligence and machine learning.
data fusion: Joining data sources on a field.
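A rough sketch of joining two sources on a shared field (the records and field names here are hypothetical, and production pipelines typically use a library such as pandas):

```python
# Two hypothetical feedback sources that share a "case_id" field.
surveys = [{"case_id": 1, "nps": 9}, {"case_id": 2, "nps": 3}]
calls = [{"case_id": 1, "verbatim": "quick fix"},
         {"case_id": 2, "verbatim": "still broken"}]

# Index one source by the join field, then merge matching records.
by_id = {row["case_id"]: row for row in calls}
fused = [{**s, **by_id[s["case_id"]]} for s in surveys if s["case_id"] in by_id]
# Each fused record now carries fields from both sources.
```

The essential idea is the same regardless of tooling: pick a common field, match records on it, and combine the remaining fields into one row.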
data model: Also model. A structure that is applied to raw data to organize data elements and standardize how they relate to one another, preparing the data for analysis and visualization.
data normalization: Removal of non-textual information such as HTML artifacts and whitespace; tokenization; lemmatization; spam detection; and language detection. Occurs after unstructured data ingestion.
data point: Discrete unit of information or set of measurements on a single unit of information.
data stream: Also dataset. A set of data points collected via a data connector from a single source.
dataset: See data stream.
document: Complete text object returned in unstructured data. Also called "verbatim." The main unit of feedback communication from a single user.
DVH: One of the two engines, along with SV, that power visualizations. DVH was the original engine when Stratifyd was founded; SV was added later. The two will be unified, but some legacy visualizations that require only a single dimension, such as topics, n-grams, or sentiment, still use DVH.
dynamic topic modeling: The use of clustering algorithms and statistical algorithms to extract topics from unstructured data. Topic models are dynamic when they analyze the change in topics over time represented within a corpus.
export: Feature that allows you to extract analyzed data from a data model into a CSV file or into a Power BI dataset.
F1 score: The harmonic mean of the precision and recall, where the best value is 1 (perfect precision and recall) and the worst is 0.
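The harmonic-mean formula behind the F1 score can be written in a few lines of Python (a generic illustration, not Stratifyd code):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 1 is perfect, 0 is worst."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)

f1_score(0.8, 0.5)  # the harmonic mean penalizes the lower of the two scores
```

Because the harmonic mean is dominated by the smaller value, a model cannot earn a high F1 score by excelling at precision while neglecting recall, or vice versa.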
filter: Tool for refining your dataset to show only relevant data, revealing insights hidden by irrelevant or repetitive data. Can be applied at the dashboard, tab, or widget level.
ground truth: Objective or provable data such as a star rating that is used to test machine learning for accuracy.
group: Set of users with whom to share a dashboard, data stream, or model. Also, sets of data values that help to clarify analysis, for example, you may group temporal data by week or month, or group locational data by country or state.
ingestion: Process of bringing data into the Stratifyd platform. Once ingested, data normalization occurs.
KPI: Key performance indicator. A measurement of how effectively an organization achieves its key objectives.
lemmatization: Process that groups words with the same root, such as run and ran.
lexicon: List of words and phrases with assigned polarities or sentiment.
ML: Machine learning.
model: See data model.
multi-channel: Analysis of data in separate tabs or separate dashboards for different feedback sources.
n-gram: Contiguous sequence of n words (usually two) from a document used to analyze and discover buzzwords. See Introduction to Language Models: N-Gram on Towards Data Science for more information.
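Extracting contiguous n-word sequences is straightforward to sketch in Python (illustrative only):

```python
def ngrams(tokens, n=2):
    """Return every contiguous n-word sequence (bigrams by default)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams("the battery life is great".split())
# -> [('the', 'battery'), ('battery', 'life'), ('life', 'is'), ('is', 'great')]
```

Bigrams such as ('battery', 'life') are the raw material from which statistically significant pairs, i.e. buzzwords, are identified.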
neural sentiment model: A pre-trained model, trained on 40 million product reviews, that determines sentiment using star ratings as the ground truth.
neural sentiment scores: Labels for data returned by the neural sentiment model.
NLP: Natural language processing. A computer science discipline that analyzes text using linguistic and statistical algorithms to extract meaning. NLP relies on machine learning and artificial intelligence to understand human languages. See MonkeyLearn's Definitive Guide to Natural Language Processing for more information.
NLU: Natural language understanding. Logic that accurately categorizes sentiment scores without human bias.
NPS: Net promoter score. A measure of customer experience and predictor of business growth.
NSM: See neural sentiment model.
omni-channel: Analysis of data from different sources in the same space. May involve data fusion.
overfitting: A modeling error that occurs when a function fits a limited set of data points too closely; that is, the model is made too complex in order to explain random noise in the data. Can introduce substantial errors into the model and reduce its predictive power.
PCI: Payment card information. One of the items redacted from textual data input, including S2T audio file transcriptions.
PII: Personally identifiable information. One of the items redacted from textual data input, including S2T audio file transcriptions.
play: A timer set to automatically page through every tab of a dashboard on a loop. Useful for displaying dashboards in a common area for real-time monitoring.
PMI: Pointwise mutual information. A measure of association between a feature (e.g. a word) and a class (e.g. a category), in contrast to measuring an association between an entire document and a category.
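PMI can be computed directly from co-occurrence counts; the sketch below uses made-up counts for a hypothetical word ("refund") and class ("billing"):

```python
import math

def pmi(joint_count, word_count, class_count, total):
    """Pointwise mutual information: log of how much more often the word
    and class co-occur than independence would predict."""
    p_joint = joint_count / total
    p_word = word_count / total
    p_class = class_count / total
    return math.log(p_joint / (p_word * p_class))

# Suppose "refund" appears in 40 of 1000 documents, 30 of them in a
# 200-document "billing" class: a strong positive association.
pmi(30, 40, 200, 1000)  # positive, since 0.03 > 0.04 * 0.2
```

A PMI of zero means the feature and class co-occur exactly as often as chance would predict; positive values indicate association with the class.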
query: Request for information from a data source such as a database or a website. Stratifyd provides a number of data connectors that collect information from you to build queries to return the data that you want to analyze.
random forest: Algorithm that learns to classify and regress data by creating a multitude of decision trees during training. See Understanding Random Forest in Towards Data Science for more information.
redaction engine: Functionality attached to a data connector that permanently removes sensitive data such as PCI and PII from textual data input, including S2T audio file transcriptions.
S2T: Speech to text. The technology that transcribes audio files into data streams and redacts PCI and PII.
sentiment analysis: Identifies customer opinions, feelings, and intent expressed through text using natural language processing and a sentiment lexicon.
sentiment dictionary: See sentiment lexicon.
sentiment lexicon: A list of words and phrases with assigned sentiment scores or polarities.
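As a toy illustration of how a sentiment lexicon is applied (the words and polarities below are invented for the example, not taken from Stratifyd's lexicon):

```python
# Hypothetical mini-lexicon mapping words to sentiment polarities.
LEXICON = {"great": 1.0, "love": 1.0, "slow": -0.5, "terrible": -1.0}

def lexicon_sentiment(text):
    """Sum the polarity of each known word; the sign gives overall sentiment."""
    return sum(LEXICON.get(word, 0.0) for word in text.lower().split())

lexicon_sentiment("great screen but slow updates")  # mildly positive overall
```

Real sentiment analysis layers NLP on top of this lookup, handling negation, intensifiers, and context, but the lexicon supplies the base polarities.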
stemming: A process that removes word endings such as -ed, -ing, and -s, preserving only the root of the word.
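A deliberately crude suffix-stripping stemmer shows the idea (real engines use established algorithms such as Porter stemming, not this naive rule list):

```python
def stem(word):
    """Strip a common English suffix, keeping a short guard so very
    short words are left alone."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

[stem(w) for w in ["walked", "walking", "walks", "walk"]]  # all reduce to "walk"
```

Collapsing inflected forms onto a shared root lets frequency counts treat "walked", "walking", and "walks" as the same underlying term.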
stopwords: A list of terms for the analytics engine to ignore when looking for patterns and connections in your data.
Stratifyd platform: A collection of modules that interact to facilitate analysis and free data analysts from reliance on data scientists; a means to data democratization.
supervised learning: The machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. Each example is a pair consisting of a ground truth and one or more training features or unstructured texts. The algorithm analyzes the training data and produces an inferred function, which it then applies to new examples.
supervised models: A collection of machine learning models that you can apply to your data in order to extract a neural sentiment score. The AutoLearn model determines which algorithm (or ensemble of algorithms) works best for your data.
SV: One of the two engines, along with DVH, that power visualizations. DVH was the original engine when Stratifyd was founded; SV was added later, and the two will be unified. Legacy visualizations that require only a single dimension, such as topics, n-grams, or sentiment, still use DVH.
tab: A page within a dashboard on which you can group widgets with a common data theme. With multiple tabs, you can set a dashboard to play a slideshow of all of the tabs.
taxonomy: A tree-like hierarchical structure of labels into which data is sifted and classified. Labels within a taxonomy have relations to other labels within the taxonomy: parent label, child label, or both, for labels at a mid-level of the hierarchy.
template: A ready-to-use dashboard with any number of tabs and widgets that can be re-used with different datasets.
token: A set of words that are frequently found in close proximity within unstructured data. Used to create n-grams.
tokenization: The process of delimiting and classifying unstructured data. The resulting tokens are then passed on for further analysis.
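A minimal tokenizer sketch (illustrative only; production tokenization also handles normalization steps described under data normalization):

```python
import re

def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokenize("Great product -- I'd buy it again!")
# -> ['great', 'product', "i'd", 'buy', 'it', 'again']
```

The resulting tokens feed downstream steps such as n-gram generation and topic modeling.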
topic: Generated by performing unsupervised machine learning on top of n-grams to determine hidden themes and group documents accordingly. Because any document can occur in more than one topic, the percentages of documents across all topics can sum to more than 100%. See MonkeyLearn's Topic Analysis for more information.
training feature: A data field that contains structured data such as a numerical value or form text that users select, as opposed to unstructured text.
tuning: A set of advanced options used to refine a data analysis. Includes data models, stopwords lists, and Chinese tokens dictionaries.
unstructured text: A data field that contains free-form text that users enter, as opposed to a training feature, that is, a numerical value or form text that users select.
unsupervised NLU model: A data model that automatically analyzes large textual datasets to discover topics and themes. Built on top of our proprietary Bayesian Neural Network and Generative Model, it dynamically identifies semantic topic groups based on context in the input data.
verbatim: The full text of a user comment found in the original data.
visualization: Interactive chart, gauge, map, word cloud, list, table, tree, or calendar used to display data visually.
WER: Word error rate. A measurement of the accuracy of speech-to-text transcription.
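Word error rate is conventionally computed as the word-level edit distance between the reference transcript and the hypothesis, divided by the reference length; a generic sketch (not Stratifyd code):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref)

wer("please cancel my order", "please cancel the order")  # 1 error / 4 words
```

Lower is better: a WER of 0 means the transcription matches the reference word for word.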
widget: Dashboard item used to display a data visualization.
ZSL: A zero-shot learning supervised model that categorizes input data without having been trained on examples of the target categories.