Coming Soon: May 2019 Release

New Features

Scheduled and Filtered Analyses

How accurately a model makes predictions depends on the distribution of the data on which it trains. Because the distribution of data drifts over time, it is good practice to monitor incoming data and retrain your model on newer data. Our next release lets you schedule intervals at which your models retrain automatically, and lets you filter the data on which they retrain.

Set the schedule when you deploy a new model via the Advanced section of the Deploy a New Model dialog.

schedule-advanced

In the Advanced section, the Schedule Model Retrain option is near the bottom. Enter an interval number and select Days, Weeks, Months, or Years to specify how often you want the model to retrain.

schedule-model-retrain

You can also specify filters for the data on which to retrain your model. For example, you might want to retrain based only on data coming from a specific country or a specific range of dates. The Add Filter button at the bottom of the dialog opens the filter dialog.

schedule-add-filter

If you want to apply the filter to the entire data set instead of just the retraining data, scroll up and clear the checkbox next to Apply training filter to analysis.

To facilitate this feature, we store the following information on the model object.

  • schedule: training schedule interval as int {years, months, weeks}
  • schedule_last: timestamp of last scheduled training run in milliseconds
  • schedule_time: timestamp of next scheduled training run in milliseconds
  • schedule_hash: streamed version of model data from the last scheduled training run

When the scheduled time arrives, Stratifyd retrains the model and updates the schedule_last, schedule_time, and schedule_hash values. If the new schedule_hash matches the previous one, the model data has not changed and the model version stays the same. If the hash value differs, Stratifyd updates the model version.
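The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Stratifyd's implementation: the `unit` and `version` keys, the SHA-256 hash, the approximate month and year lengths, and every function name here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

# Assumed interval lengths in days; months and years are approximated.
UNIT_DAYS = {"days": 1, "weeks": 7, "months": 30, "years": 365}


def to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def data_hash(model_data: dict) -> str:
    """Stable digest of the model data (assumed to be JSON-serializable)."""
    payload = json.dumps(model_data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_scheduled_retrain(model: dict, model_data: dict, now: datetime) -> dict:
    """Refresh the schedule fields; bump the version only if the hash changed."""
    if to_millis(now) < model["schedule_time"]:
        return model  # the scheduled time has not arrived yet

    interval = timedelta(days=model["schedule"] * UNIT_DAYS[model["unit"]])
    new_hash = data_hash(model_data)

    updated = dict(model)
    if new_hash != model["schedule_hash"]:
        updated["version"] = model["version"] + 1  # the model data changed
    updated["schedule_hash"] = new_hash
    updated["schedule_last"] = to_millis(now)
    updated["schedule_time"] = to_millis(now + interval)
    return updated


# Example: a model scheduled to retrain every two weeks.
now = datetime(2019, 5, 1, tzinfo=timezone.utc)
model = {
    "schedule": 2,
    "unit": "weeks",
    "schedule_last": 0,
    "schedule_time": to_millis(now),
    "schedule_hash": data_hash({"rows": 1}),
    "version": 1,
}
unchanged = run_scheduled_retrain(model, {"rows": 1}, now)
changed = run_scheduled_retrain(model, {"rows": 2}, now)
```

In this sketch, `unchanged` keeps version 1 because the data hash is identical, while `changed` moves to version 2. Both get a new schedule_time two weeks after `now`.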

Boolean Filters

With the new Global Filter designer, you can create more powerful filters by combining multiple filters using AND, OR, and NOT connectors. Access the Global Filter designer via the Advanced option on the vertical ellipsis at the top right of the Filters panel.

filters-advanced

You can also open the Global Filter via the button at the bottom left corner of the dashboard, and then select the Advanced radio button at the top of the Global Filter to switch to the designer.

filter-editor
filter-basic

In the advanced Global Filter designer, you can drag and drop filters, select data fields and values, and drag and drop AND, OR, and NOT connectors. Drag connections from the NOT connector to a single child. Drag connections from AND and OR connectors to multiple fields as children.

In this example, data must have both a location in Dublin, OH, and a sentiment value between 0 and 2, and employment status must not be empty.
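For readers who think in code, the same filter can be modeled as a small tree of AND, OR, and NOT nodes. The sketch below is a conceptual illustration only. The class names, the record field names (`location`, `sentiment`, `employment_status`), and the sample records are assumptions, not part of the Stratifyd product or API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Field:
    """Leaf node: tests a single data field of a record."""
    name: str
    test: Callable[[Any], bool]

    def matches(self, record: Record) -> bool:
        return self.test(record.get(self.name))


@dataclass(frozen=True)
class And:
    """Matches when every child matches."""
    children: Tuple[Any, ...]

    def matches(self, record: Record) -> bool:
        return all(child.matches(record) for child in self.children)


@dataclass(frozen=True)
class Or:
    """Matches when at least one child matches."""
    children: Tuple[Any, ...]

    def matches(self, record: Record) -> bool:
        return any(child.matches(record) for child in self.children)


@dataclass(frozen=True)
class Not:
    """Inverts its single child."""
    child: Any

    def matches(self, record: Record) -> bool:
        return not self.child.matches(record)


# Location is Dublin, OH AND sentiment is in [0, 2] AND NOT (employment status is empty).
example_filter = And((
    Field("location", lambda v: v == "Dublin, OH"),
    Field("sentiment", lambda v: v is not None and 0 <= v <= 2),
    Not(Field("employment_status", lambda v: v in (None, ""))),
))

records = [
    {"location": "Dublin, OH", "sentiment": 1.5, "employment_status": "Full-time"},
    {"location": "Dublin, OH", "sentiment": 1.5, "employment_status": ""},
    {"location": "Charlotte, NC", "sentiment": 1.0, "employment_status": "Part-time"},
]
kept = [r for r in records if example_filter.matches(r)]
```

Only the first record passes. The second has an empty employment status and the third has a different location. As in the designer, NOT takes exactly one child, while AND and OR accept several.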

Taxonomy V2

You use version two of the taxonomy model exactly as before, but the underlying engine has been modified to improve speed and accuracy.

  • Taxonomies run four to six times faster. The more complex your taxonomy is, the greater the performance gains you can expect.
  • Linguistic connections are now converted for a closer match rate with legacy results.
  • Mixed syntax is now supported, so you can use wildcards inside both proximity rules and Boolean rules.
  • A more modular approach paves the way for future functionality. Look forward to future releases with support for additional syntax.

AUC ROC Curve

The AUC (area under the curve) ROC (receiver operating characteristic) curve allows you to evaluate the performance of your model.

This information has always been used under the hood to provide the accuracy ratio, but now you can see a visualization of the data.

Supported model types include:

  • AutoLearn Model
  • Random Forest Model
  • Feedforward Neural Network Model
  • Logistic Regression Model
  • ZSL Model
  • Support Vector Machine Model
  • Embedding Attentive Model

From the Home page, on the Models tab, click any supervised learning model card to open the Model Info dialog. Just below the Accuracy box, click the See full metrics link. (If the link is missing, either the model type is unsupported or the model was created before the feature was added.)

auroc-model-info

The Metrics Plots dialog shows the Confusion Matrix and the Receiver Operating Characteristic (ROC) curve.

For more information on confusion matrices, see the Understanding Confusion Matrix article in Towards Data Science.

The Confusion Matrix plots the predicted labels against the ground truth. The horizontal X axis represents the predicted labels, and the vertical Y axis represents the ground truth.

Records that fall along the diagonal are correct predictions, and records that fall off the diagonal are incorrect predictions.

In this example, the model predicts Net Promoter Score (NPS) labels for the data: detractor, passive, or promoter.

auroc-confusion-matrix

To make the data easier to understand, turn on the Show Number of Records option to see how many records from the validation data fall into each cell of the matrix.

aurroc-confusion-show-number
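To make the matrix layout concrete, here is a minimal sketch that counts records per cell for the three NPS labels. The sample labels are invented for illustration, and the function names are hypothetical. It follows the orientation described above: rows are the ground truth (Y axis) and columns are the predictions (X axis).

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(truth: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows follow the ground truth, columns follow the predicted label."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix: List[List[int]]) -> float:
    """Share of records on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


labels = ["detractor", "passive", "promoter"]
# Made-up validation data, for illustration only.
truth = ["detractor", "detractor", "passive", "promoter", "promoter", "promoter"]
predicted = ["detractor", "passive", "passive", "promoter", "promoter", "detractor"]

matrix = confusion_matrix(truth, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>9}: {row}")
print("accuracy:", round(accuracy(matrix), 3))
```

Four of the six made-up records land on the diagonal, so the accuracy in this toy example is about 0.667.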

The Receiver Operating Characteristic curve plots the true positive rate against the false positive rate as the classification threshold varies. In this example, you can select which class to plot: detractor, passive, or promoter. The ideal curve rises immediately and then runs flat across the top, with the True Positive rate as close to 1.0 as possible at every threshold. This example shows a trustworthy model with a true positive rate of 0.968.

auroc-receiver-op-characteristic

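The area under the ROC curve (AUC) summarizes how good your model is: a value of 1.0 means the model separates the classes perfectly, and 0.5 is no better than random guessing. The curve itself lets you read off the True Positive rate you get for the rate of False Positives you are willing to tolerate.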
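As a rough illustration of how such a curve and its area can be computed, the sketch below sweeps the threshold from the highest score to the lowest for one class against the rest, then integrates with the trapezoid rule. The scores and labels are invented, and the function names are hypothetical. This is a generic textbook calculation, not necessarily how Stratifyd computes its metrics.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """ROC points for binary labels (1 = positive class, 0 = everything else)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative record")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record sharing this score,
        # because tied scores fall on the same side of any threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for a "promoter versus rest" view: 1 marks a true promoter.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
```

In this toy data one non-promoter outranks one promoter, so the area comes out slightly below 1.0 (8/9). A model that ranks every promoter above every non-promoter would score exactly 1.0.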