Scoring is the process of assigning a predicted outcome to an individual record by running that record’s conditions through the trained model. It allows you to request and retrieve record-level prediction scores for a defined data set and a set of prediction topics. The accuracy of a score will likely be a direct reflection of the error rate of the trained model.
Why the score value can exceed the min or max value range of a feature
There are a few concepts to address with regard to this:
Scoring outputs: When training an analytics model, the aim is to create a generalizable model from a relatively small training dataset. Because of constraints such as time and practicality, the training process sees only a limited subset of the possible values, not an exhaustive list. As such, these generalized models are expected to handle unseen data in the form of new combinations or values outside of previously observed ranges (more on this below). One common way to see scores that exceed the ranges observed in training, assuming the goals are continuous, is to use prescriptive scoring. Prescriptive scoring attempts to find optimal values for lever (that is, tunable) features in order to maximize or minimize score values.
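To illustrate how a score can land outside the goal range seen in training, here is a minimal sketch that assumes a simple linear model fitted with ordinary least squares. The training data, the lever range, and the function names (fit_line, score) are all hypothetical and are not the product's actual model or API.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: the feature spans 0..10, the goal spans 1..21.
train_x = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
train_y = [1.0, 5.0, 9.0, 13.0, 17.0, 21.0]
slope, intercept = fit_line(train_x, train_y)


def score(x):
    """Predictive score for a single continuous feature value."""
    return slope * x + intercept


# Predictive scoring: a record slightly outside the observed feature range
# produces a score above the largest goal value seen in training.
observed_goal_max = max(train_y)
new_score = score(12.0)

# Prescriptive scoring (assumed here to be a plain grid search): try lever
# values only inside the configured constraint range and keep the best one.
lever_min, lever_max, steps = 0.0, 15.0, 15
candidates = [lever_min + i * (lever_max - lever_min) / steps
              for i in range(steps + 1)]
best_lever = max(candidates, key=score)
best_score = score(best_lever)
```

In this sketch both new_score and best_score are higher than observed_goal_max, because the model generalizes the trend rather than memorizing the observed goal values.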
Min/Max constraints: These are constraints placed upon the inputs for training and the expected inputs for scoring.
- For training: If these ranges were provided as part of the upload process, training will raise exceptions for invalid data. If the ranges were not provided, they will be inferred from the data and, as such, training will not see values outside of the observed ranges.
- For scoring: validation of the ranges will only be performed on the inputs - not the outputs. It is very important to note that the handling of these "constraints" is dependent upon the data type. For categorical (e.g. colors) and ordinal data (e.g. shirt sizes), the constraints are strict and data that was not observed in training will raise exceptions during scoring. However, for continuous values (e.g. temperature ranges) these constraints are more informational in nature. For predictive scoring, our code will accept records with values outside of those ranges. The rule of thumb is that values slightly outside these ranges are acceptable and that as the values stray farther from the ranges, the accuracy of the model degrades very quickly. For prescriptive scoring, these constraints are used to determine the acceptable ranges of values to try when determining the optimal values. Values outside of these constraints will NOT be tried.
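The input handling described above can be sketched as follows. This is an illustrative assumption about how such validation might look, not the product's code: ScoringInputError, validate_record, and the example feature names are hypothetical.

```python
class ScoringInputError(ValueError):
    """Raised when a categorical or ordinal value was not seen in training."""


def validate_record(record, categorical_levels, continuous_ranges):
    """Validate scoring inputs only (outputs are never checked).

    Categorical and ordinal features are strict: an unseen value raises.
    Continuous features are informational: an out-of-range value is
    accepted, and a warning message is returned instead.
    """
    warnings = []
    for name, allowed in categorical_levels.items():
        if record[name] not in allowed:
            raise ScoringInputError(
                f"{name}={record[name]!r} was not observed in training"
            )
    for name, (low, high) in continuous_ranges.items():
        value = record[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}={value} is outside [{low}, {high}]; "
                "accuracy may degrade"
            )
    return warnings


# Hypothetical constraints captured at training time.
levels = {"color": {"red", "blue"}}
ranges = {"temperature": (0.0, 100.0)}

# Accepted with a warning: the continuous value is slightly out of range.
messages = validate_record({"color": "red", "temperature": 105.0},
                           levels, ranges)
```

A record with a color such as "green" would raise ScoringInputError in this sketch, mirroring the strict handling of categorical and ordinal data.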
How to handle goal values while scoring
What value should the goal (objective TRUE) column have in new data that will be scored using an existing prediction model?
<Dataset for making prediction model>
Columns: independent value, goal field
<New data to be scored>
Columns: independent value, goal field
Scoring, by definition, does not take the goal column into consideration when it runs. Since the goal column above is a Boolean, we can populate the yet-to-be-scored records with either a 0 or a 1, and it won’t affect the resulting scores.
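A small sketch of why the placeholder does not matter, assuming scoring simply drops the goal column before the model sees the record. The column names, toy_model, and score_record are hypothetical stand-ins rather than the product's API.

```python
GOAL_COLUMN = "goal"  # hypothetical name of the goal (objective TRUE) column


def toy_model(features):
    """Stand-in for a trained model: predicts 1 when visits >= 3."""
    return 1 if features["visits"] >= 3 else 0


def score_record(record, model):
    """Score a record using every column except the goal column."""
    features = {k: v for k, v in record.items() if k != GOAL_COLUMN}
    return model(features)


# The same record with the goal placeholder set to 0 and to 1.
score_with_zero = score_record({"visits": 5, GOAL_COLUMN: 0}, toy_model)
score_with_one = score_record({"visits": 5, GOAL_COLUMN: 1}, toy_model)
```

Both calls return the same score, since the placeholder value never reaches the model.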