The following applies to ThingWorx Analytics (TWA) 52.0.2 and earlier.

Overview

  • The main steps are as follows:
    1. Create a dataset
    2. Configure the dataset
    3. Upload data to the dataset
    4. Optimize the dataset
    5. Create filters for training and scoring data
    6. Train the model
    7. Execute scoring on existing data
    8. Upload new data to dataset
    9. Execute scoring on new data
  • TWA models are dataset-centric, which means a model created with one dataset cannot be reused with a different dataset.
  • To be able to score new data later, a dedicated feature (record purpose in the example below) is included in the dataset.
  • This feature must be included from the beginning, when the data is first uploaded to TWA.
  • A filter on that feature can then be created to isolate the desired data.
  • When new data comes in, it is appended to the original dataset with a specific value for the filtered feature (record purpose), which makes it possible to identify and score only those new records. A short setup sketch illustrating this convention follows this list.
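Below is a minimal Python sketch of the conventions assumed in the request examples throughout this article. The server URL, port, credentials, and dataset name are illustrative assumptions only; the important point is the dedicated record purpose feature carried by every row.

    import requests

    # Assumed values for illustration only -- replace with your TWA server details.
    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials
    DATASET = "beanpro_demo"                  # dataset name used in this example

    # Every record carries the dedicated "record purpose" feature:
    # "training" for the original rows, "scoringnew" for rows appended later.
    example_training_row = {"record purpose": "training"}    # other beanpro features omitted
    example_new_row      = {"record purpose": "scoringnew"}  # other beanpro features omitted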

Process

  1. Create a dataset
    • This example uses the beanpro demo dataset.
    • Creating the dataset is done through a POST on the datasets REST API, as shown in the screenshot and the request sketch below.

createDataset.png
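As a rough illustration, the same call sketched with Python requests. The payload field names are assumptions; the screenshot above shows the authoritative request.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # POST on the datasets REST API to create the dataset.
    # The body (a simple dataset name field) is an assumption for illustration.
    resp = requests.post(f"{TWA_BASE}/datasets",
                         json={"datasetName": "beanpro_demo"},
                         auth=AUTH)
    print(resp.status_code, resp.text)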

2. Configure dataset

    • This is done through a POST on the <dataset>/configuration REST API, as shown in the screenshot and the sketch below.

ConfigureDataset.png
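A hedged Python sketch of the same call. The metadata format (field name, data type, op type) and the individual feature entries are assumptions for illustration; the screenshot above shows the authoritative configuration payload.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # POST on the <dataset>/configuration REST API.
    configuration = [
        {"fieldName": "record purpose", "dataType": "STRING",  "opType": "INFORMATIONAL"},
        {"fieldName": "goal_field",     "dataType": "BOOLEAN", "opType": "TARGET"},
        # ... remaining beanpro features would be listed here ...
    ]
    resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/configuration",
                         json=configuration, auth=AUTH)
    print(resp.status_code, resp.text)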

3. Upload data

UploadData.png
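The data upload endpoint is not named in this article, so the path below (<dataset>/data) and the file name are placeholders only; the screenshot above shows the actual request. The sketch sends the CSV as a multipart file upload.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # Upload the initial CSV (rows flagged with record purpose = training).
    with open("beanpro_training.csv", "rb") as f:                        # hypothetical file name
        resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/data",   # placeholder path
                             files={"file": f}, auth=AUTH)
    print(resp.status_code, resp.text)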

4. Optimize the dataset

OptimizeDataset.png
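The optimization endpoint is likewise not named here; <dataset>/optimization below is a placeholder only. Operations of this kind generally run as asynchronous jobs, so the response is expected to be a job reference rather than an immediate result.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # Launch the dataset optimization job (placeholder path).
    resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/optimization", auth=AUTH)
    print(resp.status_code, resp.text)        # inspect the returned job reference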

5. Create filters

    • The dataset includes a feature named record purpose, created specifically to differentiate the rows used for training from the rows used for scoring.
    • New data to be added will have record purpose set to scoringnew, which makes it possible to execute a scoring job limited to those new rows (a request sketch for both filters follows the screenshots below).
    • Filter for training data:

TrainFilter.png

    • Filter for new scoring data:

       scoreFilter.png
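A hedged sketch of creating the two filters with Python requests. The filter endpoint (<dataset>/filters) and the payload fields are placeholders only; the screenshots above show the actual requests.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # One filter per record purpose value (placeholder path and field names).
    filters = [
        {"filterName": "TrainingData",   "feature": "record purpose", "value": "training"},
        {"filterName": "ScoringNewData", "feature": "record purpose", "value": "scoringnew"},
    ]
    for flt in filters:
        resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/filters",
                             json=flt, auth=AUTH)
        print(flt["filterName"], resp.status_code)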

6. Train the model

  • This is done through a POST on the <dataset>/prediction API, as shown in the screenshot and the sketch below.

       trainModel.png
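A hedged Python sketch of launching the training job. The body shown (a goal field) is an assumption for illustration; the screenshot above shows the authoritative payload, and the call is expected to return a job/model reference.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # POST on the <dataset>/prediction API to launch training.
    train_request = {"goalField": "goal_field"}   # hypothetical goal feature name
    resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/prediction",
                         json=train_request, auth=AUTH)
    print(resp.status_code, resp.text)            # typically a job/model reference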

7. Score the training data

    • This is done through a POST on the <dataset>/predictive_scores API, as shown in the screenshot and the sketch below.
    • Note the use of the filter TrainingData created earlier.
    • This allows scoring only the rows whose record purpose value is training.
    • Note: scoring could also be done without a filter at this stage, in which case all the data in the dataset would be scored, not just the rows with training as their record purpose.

scoretraindata.png
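A hedged sketch of submitting the scoring job with the TrainingData filter. The payload field name for the filter is an assumption; the screenshot above shows the authoritative request.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # POST on the <dataset>/predictive_scores API, restricted by the TrainingData filter
    # so only rows with record purpose = training are scored.
    resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/predictive_scores",
                         json={"filterName": "TrainingData"},   # field name is an assumption
                         auth=AUTH)
    print(resp.status_code, resp.text)        # typically a scoring job reference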

 

  • Retrieving the scoring result shows all the records in the dataset:

trainResult.png
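Since scoring runs as a job, the result is fetched afterwards. The result URI below is purely a placeholder: in practice, use the job/result reference returned by the scoring POST (or shown in the screenshot) and GET it once the job has completed.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # Placeholder result URI -- replace with the reference returned by the scoring POST.
    result_uri = f"{TWA_BASE}/datasets/beanpro_demo/predictive_scores/<jobId>/results"
    resp = requests.get(result_uri, auth=AUTH)
    for row in resp.json():                   # assumed to be a JSON list of scored records
        print(row)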

 

8. Upload new data

    • The newly uploaded CSV file should contain only the new records.
    • These will be appended to the existing ones (a request sketch follows the screenshot below).

uploadNewData.png
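A hedged sketch of appending the new records, reusing the same placeholder upload endpoint as in step 3. The file name is hypothetical; each new row carries record purpose = scoringnew.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # Append the new rows (record purpose = scoringnew) to the existing dataset.
    with open("beanpro_new_records.csv", "rb") as f:                     # hypothetical file name
        resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/data",   # placeholder path
                             files={"file": f}, auth=AUTH)
    print(resp.status_code, resp.text)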

 

  • Note that the new record (there could be more than one) has the value scoringnew for the record purpose feature:

newrecord.png

  • This makes it possible to use the previously created filter ScoringNewData so that a new scoring job only takes this new record into account.

 

9. Score new data

    • A POST on the predictive_scores API is executed, this time using the filter ScoringNewData (see the sketch after the screenshot below).
    • This results in only the newly added data being scored, and therefore a much quicker execution time.

scoreNewData.png
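The same predictive_scores call as in step 7, sketched here with the ScoringNewData filter so that only the newly appended rows are scored. As before, the payload field name is an assumption.

    import requests

    TWA_BASE = "http://twa-server:8080/1.0"   # hypothetical base URI
    AUTH = ("twa_user", "twa_password")       # hypothetical credentials

    # Score only the rows with record purpose = scoringnew.
    resp = requests.post(f"{TWA_BASE}/datasets/beanpro_demo/predictive_scores",
                         json={"filterName": "ScoringNewData"},   # field name is an assumption
                         auth=AUTH)
    print(resp.status_code, resp.text)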

    • Retrieving the scoring result shows only the new record:

newResult.png