Constraints on Scores Values in ThingWorx Analytics
This blog addresses a few points that are related to scoring with ThingWorx Analytics. It, particularly, brings a clearer understanding of the concepts behind the values of the scores that are generated when performing a scoring job.
It is important to note that when training an analytics model, the method is to create a generalizable model from a relatively small training dataset.
By its nature, we expect the training process to see a limited subset and not an exhaustive list of all possible values for many constraints, especially for time and practicality.
As such, these generalized models will be expected to handle unseen data in the form of new combinations or values outside of previously observed ranges (more on this below).
One common way to see scores that exceed the observed ranges in training, under the assumption that the goals are continuous, is to use prescriptive scoring.
Prescriptive scoring attempts to find optimal values for a lever, meaning tunable, features in order to maximize or minimize score values. See the prescriptive scoring documentation and functionality for more information.
min/max constraints: these are constraints that are placed upon the inputs for training and expected inputs for scoring.
• For training: If theses ranges were provided as part of the upload process, then training will raise exceptions regarding invalid data. However, if the ranges are not provided - they will be inferred from the data and, as such, training will not see values outside of observed ranges.
• For scoring: Validation of the ranges will only be performed on the inputs - not the outputs. It is very important to note that the handling of these "constraints" is dependent upon the data type.
For categorical (e.g. colors) and ordinal data (e.g. shirt sizes), the constraints are strict and data that was not observed in training will raise exceptions during scoring.
However, for continuous values (e.g. temperature ranges) these constraints are more informational in nature. For predictive scoring, our code will accept records with values outside of those ranges.
The rule of thumb is that values slightly outside these ranges are acceptable and that as the values stray farther from the ranges, the accuracy of the model degrades very quickly.
For prescriptive scoring, these constraints are used to determine the acceptable ranges of values to try when determining the optimal values. Values outside of these constraints will NOT be tried.