Behavior of ThingWorx Analytics Anomaly Detection with Data Gaps

 



In ThingWorx Analytics, anomaly detection is performed through the ThingWatcher API framework. ThingWatcher observes the data from an Edge device, learns what the data stream should look like, and then monitors the incoming stream for any unexpected sequences of data. Ideally, for this process to work properly, there should be no data gaps.

However, data gaps do occur. This blog describes how ThingWatcher deals with them in order to maintain high performance in anomaly detection.


 

Data Gaps and phases affected:


In anomaly detection, ThingWatcher goes through three consecutive phases: Initializing, Calibrating, and Monitoring.

The Initializing phase collects streamed data and the Monitoring phase monitors it, so both of these phases are sensitive to gaps in the data stream.

The Calibrating phase uses data that has already been collected to create the anomaly detection model, so it is not directly affected by data gaps.


Dealing with Long and Short Data Gaps:


Initializing Phase:


During this phase, data is collected and, as part of the collection process, the sampling rate is determined. When short data gaps occur, the missing values are interpolated so that the collected data has no missing values.
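As an illustration of what filling a short gap can look like, here is a minimal Python sketch. It assumes linear interpolation, millisecond timestamps, and a known sampling interval; the function name `fill_short_gaps` is hypothetical, and ThingWatcher's actual interpolation method is not described here.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in ms, value)


def fill_short_gaps(points: List[Point], interval_ms: int) -> List[Point]:
    """Insert linearly interpolated points wherever timestamps are missing.

    Assumption: linear interpolation on a regular grid of `interval_ms`.
    """
    if not points:
        return []
    filled = [points[0]]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # Number of sampling intervals spanned by this pair of points.
        steps = round((t1 - t0) / interval_ms)
        for k in range(1, steps):
            frac = k / steps
            filled.append((t0 + k * interval_ms, v0 + frac * (v1 - v0)))
        filled.append((t1, v1))
    return filled


if __name__ == "__main__":
    # One point is missing at t=2000; it is filled with the midpoint value.
    print(fill_short_gaps([(0, 1.0), (1000, 2.0), (3000, 4.0)], 1000))
```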


However, long gaps might also occur. A data gap is considered long if more than three data points are missing, which corresponds to roughly three times the sampling rate (that is, the expected time between consecutive data points). In other words, if the timestamp on a data point is greater than the previous timestamp by more than three times the sampling rate, that is considered a long gap.
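The timestamp rule can be written as a simple comparison. The sketch below is only an illustration: the function name, the millisecond units, and the definition of a "short" gap as anything above one sampling interval are assumptions. Real streams also have timing jitter, which this sketch does not account for.

```python
LONG_GAP_FACTOR = 3  # "more than three times the sampling rate"


def classify_gap(prev_ts_ms: int, curr_ts_ms: int, interval_ms: int) -> str:
    """Classify the gap between two consecutive timestamps.

    Returns "long", "short", or "none". The "short" threshold (anything
    larger than one sampling interval) is an assumption for illustration.
    """
    delta = curr_ts_ms - prev_ts_ms
    if delta > LONG_GAP_FACTOR * interval_ms:
        return "long"
    if delta > interval_ms:
        return "short"
    return "none"


if __name__ == "__main__":
    for curr in (1000, 3000, 3001):
        print(curr, classify_gap(0, curr, 1000))
```

With a sampling interval of 1000 ms, a difference of exactly 3000 ms is still a short gap in this sketch; only a difference strictly greater than 3000 ms counts as long.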


If a long gap occurs, ThingWatcher restarts the data collection process, since long data gaps are not acceptable. Data collection can be restarted up to three times because of long gaps; if the gaps persist, the process fails and the data source is no longer considered reliable.
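The restart behavior can be pictured with the following sketch. It is not ThingWatcher code: the function name, the `needed` parameter, and the exact way restarts are counted (failing on the fourth long gap) are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

MAX_RESTARTS = 3  # assumption: a fourth long gap makes collection fail


def collect_training_data(
    stream: Iterable[Tuple[int, float]], interval_ms: int, needed: int
) -> Optional[List[Tuple[int, float]]]:
    """Collect `needed` points, restarting whenever a long gap is seen."""
    restarts = 0
    collected: List[Tuple[int, float]] = []
    for ts, value in stream:
        # Long gap: timestamp jumps by more than 3x the sampling interval.
        if collected and ts - collected[-1][0] > 3 * interval_ms:
            restarts += 1
            if restarts > MAX_RESTARTS:
                raise RuntimeError("data source is not considered reliable")
            collected = []  # discard what was gathered and start over
        collected.append((ts, value))
        if len(collected) >= needed:
            return collected
    return None  # stream ended before enough data was collected


if __name__ == "__main__":
    steady = [(i * 1000, float(i)) for i in range(5)]
    print(collect_training_data(steady, 1000, 3))
```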


Monitoring Phase:


In this phase, the data stream is monitored to detect any unexpected behavior. If a short time gap occurs between the previous and the current TimedValue data points, the lookback buffer is cleared. ThingWatcher then re-enters the Buffering state and remains in it until the lookback window buffer is completely filled again.
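To make the buffer behavior concrete, here is a small sketch of a lookback window that is cleared when a gap is detected. The class name, the state strings, and the gap test (any timestamp difference larger than one sampling interval) are assumptions for illustration and do not reflect the ThingWatcher API.

```python
from collections import deque
from typing import Optional


class LookbackMonitor:
    """Toy lookback window that is cleared whenever a time gap is seen."""

    def __init__(self, window_size: int, interval_ms: int) -> None:
        self.window_size = window_size
        self.interval_ms = interval_ms
        self.buffer: deque = deque(maxlen=window_size)
        self.last_ts: Optional[int] = None

    @property
    def state(self) -> str:
        # Monitoring is only possible once the window is completely filled.
        full = len(self.buffer) == self.window_size
        return "MONITORING" if full else "BUFFERING"

    def push(self, ts_ms: int, value: float) -> str:
        # Assumed gap test: more than one sampling interval since last point.
        if self.last_ts is not None and ts_ms - self.last_ts > self.interval_ms:
            self.buffer.clear()
        self.buffer.append(value)
        self.last_ts = ts_ms
        return self.state


if __name__ == "__main__":
    monitor = LookbackMonitor(window_size=3, interval_ms=1000)
    for ts in (0, 1000, 2000, 5000):
        print(ts, monitor.push(ts, 1.0))
```

In this sketch the window fills after three regular points, and the jump from 2000 ms to 5000 ms empties it, so the monitor has to buffer a full window again before it can resume.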


For more information on the functionality of ThingWatcher, please refer to the ThingWatcher Deployment Guide:

https://support.ptc.com/WCMS/files/173109/en/ThingWatcher-Deployment-Guide-8.0.pdf


If the gaps are long, exceeding three times the sampling rate, filling in the missing data is no longer a valid solution, and data collection restarts.

It is important to note that imputed (filled-in) values decrease the accuracy of the anomaly detection model. Therefore, the timestamps of data monitored by ThingWatcher should increase at regular intervals.

 

In general, persistent data gaps should be avoided: stream data so that the timestamps increase in regular increments and any gaps that do exist are incidental and small.