Performance Design, Pitfalls and Troubleshooting

Performance is a common issue when deploying, designing or scaling up a ThingWorx application. Slow responses, delayed data and the application stopping altogether have all been seen when a performance problem either grows slowly or pops up suddenly. When these occur there are some common themes, typically around the application model or design. Here are a few of the common problems and some thoughts on what to do about them or how to avoid them.

Service Execution

This covers a wide range of possibilities and is most commonly seen when trying to scale an application. Data access within a loop is one particular thing to avoid. Accessing data from a Thing, another service or a query may be fast when tested with 100 iterations, but when the application grows and there are 1,000 it is suddenly slow. Access all the data in one query and use the result as an in-memory reference. Writing data to a data store (Stream, DataTable or ValueStream) and then querying that same data within one service can cause problems as well. Run the query first, then work with the data you already have in the service variables.
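As a rough illustration of the "one query, then in-memory reference" advice, here is a minimal sketch. It is plain TypeScript rather than a real ThingWorx service: `SensorReading`, `sampleRows`, `queryAllReadings` and the access counter are hypothetical stand-ins for whatever data access the service actually performs.

```typescript
// Hypothetical row shape; in ThingWorx this would be one row of an InfoTable.
interface SensorReading {
  thingName: string;
  value: number;
}

// Stand-in data and a stand-in for any expensive data access (query, property
// read on another Thing, or service call). The counter records each access.
const sampleRows: SensorReading[] = [
  { thingName: "Pump1", value: 10 },
  { thingName: "Pump2", value: 20 },
  { thingName: "Pump3", value: 30 },
];
let accessCount = 0;

function queryAllReadings(): SensorReading[] {
  accessCount++;
  return sampleRows.slice();
}

// Anti-pattern: one data access per loop iteration.
function sumWithAccessInLoop(thingNames: string[]): number {
  let total = 0;
  for (const thingName of thingNames) {
    const match = queryAllReadings().filter((r) => r.thingName === thingName)[0];
    if (match !== undefined) total += match.value;
  }
  return total;
}

// Preferred: query once, index the result in memory, then loop over the index.
function sumWithSingleQuery(thingNames: string[]): number {
  const byName: Record<string, SensorReading | undefined> = Object.create(null);
  for (const r of queryAllReadings()) byName[r.thingName] = r;
  let total = 0;
  for (const thingName of thingNames) {
    const row = byName[thingName];
    if (row !== undefined) total += row.value;
  }
  return total;
}
```

Both functions return the same total, but for N names the first performs N data accesses and the second performs one. That difference is what turns into seconds or minutes once N reaches the thousands.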

To troubleshoot service executions there are a few methods that can be used. Some will not be practical for a production system, since it is not always advisable to change code without testing first.

Use browser development tools to see the execution time of a service. This is especially helpful when a mashup is slow to load or respond. It allows you to quickly identify which of multiple services may be the issue.
Add logging to a service. Once a service is identified, adding simple logging points can narrow down which code in the service causes the slowdown (it may be another service call). These logging statements show up in the script logs with timestamps (you can also log the current time with the logging statements).
Use the test button in Composer. This is a simple one, but if the service does not have many parameters (or has defaults) it is a fast and easy way to see how long a service takes to return.
When all else fails you can get thread dumps from the JVM. ThingWorx Support created an extension that assists with this. You can find it on the Marketplace with instructions on how to use it. You can manually examine the output files or open a ticket with Support and let them assist. Just be careful with memory dumps: they are much larger, hard to analyze and take a lot of memory. https://marketplace.thingworx.com/tools/thingworx-support-tools
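The logging approach above can be sketched roughly as follows. This is plain TypeScript, not platform code: `logLine`, `timedStep` and `exampleService` are hypothetical names, and in a real service the platform's own script logging would take the place of `logLine`.

```typescript
// Collected log lines. This array is only a stand-in so the sketch is
// self-contained; a real service would write to the script log instead.
const logLines: string[] = [];

function logLine(message: string): void {
  logLines.push(new Date().toISOString() + " " + message);
}

// Runs one step of a service, logs how long it took, and returns the step's result.
function timedStep<T>(label: string, step: () => T): T {
  const startedAt = Date.now();
  try {
    return step();
  } finally {
    logLine(label + " took " + (Date.now() - startedAt) + " ms");
  }
}

// Hypothetical service body split into two steps so a slow one stands out in the log.
function exampleService(): number {
  const rows = timedStep("load rows", () => [1, 2, 3, 4]);
  return timedStep("aggregate", () => rows.reduce((sum, v) => sum + v, 0));
}
```

Wrapping each suspicious section (a query, a nested service call, a loop) in its own timed step usually shows within one or two runs which part accounts for most of the execution time.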

Queries

These are of course services too, but of a specific type. Accessing data in ThingWorx storage structures or from external sources seems fairly straightforward but can be tricky when dealing with large data sets. When designing and dealing with internal platform storage, refer to this guide as a baseline for deciding where to store data: Where Should I Store My Thingworx Data?

NEVER store historical data in infotable properties. These are held in memory (even if they are persistent), and as they grow so will the JVM memory use until the application runs out of it. We all know what happens then. Finally, one other note that has caused occasional confusion: the setting on a query service or a standard ThingWorx query service that limits the number of records returned. This is how many records are returned from the service at the end of processing, not how many are processed or loaded into memory. That number may be much higher and could cause the same types of issues.
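To make the point about the record limit concrete, here is a deliberately simplified model in TypeScript. It is not how the platform implements queries; `queryWithLimit`, `StoredEntry` and the simulated data are all hypothetical and only show that a limit on returned rows says nothing about how many rows were loaded to produce them.

```typescript
interface StoredEntry {
  id: number;
  temperature: number;
}

// Hypothetical result shape: the rows handed back plus how many were loaded to produce them.
interface LimitedQueryResult {
  rows: StoredEntry[];
  rowsLoaded: number;
}

// Simplified model of a query with a record limit: every entry is loaded and
// filtered in memory first, and the limit only trims what is returned.
function queryWithLimit(
  entries: StoredEntry[],
  keep: (entry: StoredEntry) => boolean,
  maxRows: number
): LimitedQueryResult {
  const loaded = entries.slice(); // everything is held in memory at this point
  const matching = loaded.filter(keep);
  return { rows: matching.slice(0, maxRows), rowsLoaded: loaded.length };
}

// 100,000 simulated entries with temperatures cycling through 0..99.
const simulatedEntries: StoredEntry[] = [];
for (let i = 0; i < 100000; i++) {
  simulatedEntries.push({ id: i, temperature: i % 100 });
}

// limited.rows.length is 500, while limited.rowsLoaded is 100000.
const limited = queryWithLimit(simulatedEntries, (e) => e.temperature > 90, 500);
```

In this model the caller sees only 500 rows, yet all 100,000 entries passed through memory. Narrowing the query itself (for example by date range or source) is what reduces the load, not the returned-row limit.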

Subscriptions and Events

This is similar to service execution, but there is an added element: frequency. Typical events are data changes and timers/schedulers. Again, this often becomes an issue only when scaling up the number of Things or the amount of data that needs to be referenced. A general reference on timers and schedulers, which also describes some of the event processing that takes place on the platform, can be found here: Timers and Schedulers - Best Practice

For data change events, be very cautious about adding these to very rapidly changing property values. When a property is updating very quickly, for example two times each second, the subscription to that event must be able to complete in under 0.5 seconds to stay ahead of processing. Again, this may work for 5-10 Things with properties but will not work with 500, due to resources, speed and the need to briefly lock the value to get an accurate current read. In these cases any data processing should be done at the edge when possible (or in the originating system) and pushed to the platform in a separate property or service call. This allows for more parallel processing since it is decentralized.
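A back-of-the-envelope calculation shows why this stops working at scale. The sketch below is plain TypeScript with hypothetical function names; the 400 ms execution time in the usage note is an assumed figure for illustration, not a measurement.

```typescript
// Longest time one subscription run may take before events from a single
// property arrive faster than they are handled.
function maxExecutionMs(updatesPerSecond: number): number {
  return 1000 / updatesPerSecond;
}

// Average number of subscription executions in flight at once:
// total event arrival rate multiplied by the time each execution takes.
function averageConcurrentExecutions(
  thingCount: number,
  updatesPerSecond: number,
  executionMs: number
): number {
  return (thingCount * updatesPerSecond * executionMs) / 1000;
}
```

With two updates per second, `maxExecutionMs(2)` gives the 500 ms budget mentioned above. If each run is assumed to take 400 ms, 10 Things keep about 8 executions busy on average, while 500 Things need about 400 running at the same time, all competing for the same threads, locks and data.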

A good practice that makes this type of subscription code easier to test is to move all of the script/logic into a service. Then pass any needed event data to that service as parameters. This makes debugging easier, since the event does not need to fire for the logic to execute. In fact, the logic can essentially be run standalone with the test button in Composer.
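The shape of that refactoring looks roughly like this. The sketch is plain TypeScript; `DataChangeInfo`, `evaluateTemperatureChange`, `onTemperatureChanged` and the limit of 80 are hypothetical and only illustrate the split between a thin subscription and a testable service.

```typescript
// Hypothetical subset of the data a data change event carries.
interface DataChangeInfo {
  thingName: string;
  propertyName: string;
  oldValue: number;
  newValue: number;
}

// All logic lives in a plain function (the "service"). It takes the event data
// as parameters, so it can be called directly with test values.
function evaluateTemperatureChange(
  thingName: string,
  oldValue: number,
  newValue: number,
  limit: number
): string {
  if (newValue > limit && oldValue <= limit) {
    return thingName + " crossed the limit of " + limit;
  }
  return "";
}

// The subscription body only forwards the event data to the service.
function onTemperatureChanged(change: DataChangeInfo): string {
  return evaluateTemperatureChange(change.thingName, change.oldValue, change.newValue, 80);
}
```

Because `evaluateTemperatureChange` never touches the event itself, it can be exercised with any combination of old and new values without waiting for a property to change.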

Mashup Performance

This one can be very tricky since additional browser elements and rendering can come into play. Sometimes service execution is the root of the issue, which is reviewed above; other times UI elements and design cause the slowdown. The Repeater widget is a common culprit. The biggest thing to note here is that each Repeater needs to render every repeated element, along with all of the data and formatting for each widget in the repeated mashup. So any complex mashup that is repeated many times may become slow to load. You can mitigate this to a degree with the Load/Unload setting of the widget, depending on when the slowness is more acceptable (when loading or when scrolling).

When a mashup is launched from Composer it comes with some debugging tools built in to see errors and execution. Using these with browser debug tools can be very helpful.

Scaling an Application

When initially modeling an application, scale must be considered from the start. It is a challenge (but not impossible) to make an application very efficient after it has been designed or deployed. Many times new developers on the ThingWorx platform fall into what I call the .Net trap. Back when .Net was released, one of the quotes I recall hearing about its inefficiencies was "memory is cheap". It was more cost efficient to purchase and install more memory than to take extra development time to optimize memory use. This was absolutely true for installed applications, where all of the code was compiled and stored on every system. Web-based applications are not quite as forgiving, since most processing and execution is done on a single central web server.

Keep this in mind especially when creating Shapes, Templates and Subscriptions. You may be writing one piece of code, but when that code is repeated on 1,000 Things they will all be in memory and all be executing it in parallel. You can quickly see how competition for resources, locks on databases and clean access to in-memory structures can slow everything down (and just think what happens when there are 10,000 copies of that same code!!).

Two specific things around this must be stated again (though they were covered in the sections above). First, data held in properties has fast access since it is in JVM memory. But it is held in memory for each individual Thing, so while holding 5 MB of information in one Thing seems small, loading 10,000 Things means instant use of 50 GB of memory!! Second, execution of a service. When 10 Things are running a service, execution takes 2 seconds. That is slow, but not too bad, and may not be too noticeable in the UI. Now picture 10,000 Things competing for the same data structure and resources. I have seen execution time jump to 2 minutes or more.
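The memory figure is simple multiplication, and it is worth running the numbers for your own model before deployment. A minimal sketch in TypeScript, with hypothetical function names and a hypothetical 16 GB budget in the usage note:

```typescript
// Total memory held in properties when every Thing holds the same amount.
// Uses decimal units (1 GB = 1000 MB) to match the figures in the text.
function totalPropertyMemoryGb(mbPerThing: number, thingCount: number): number {
  return (mbPerThing * thingCount) / 1000;
}

// Largest number of Things whose property data fits in a given memory budget.
function maxThingsForBudget(mbPerThing: number, budgetGb: number): number {
  return Math.floor((budgetGb * 1000) / mbPerThing);
}
```

`totalPropertyMemoryGb(5, 10000)` gives the 50 GB from the example above. Turned around, if only 16 GB could be spared for property data, `maxThingsForBudget(5, 16)` says the model tops out at 3,200 Things, before counting anything else the JVM needs.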

Aside from design, the best thing you can do is TEST on a scaled-up structure. If you will have 1,000 Things next year, test your application at that level of deployment early to help identify any potential bottlenecks. Never assume more memory will alleviate the issue. Also, do NOT test scale on your development system. This introduces edits, changes and other variables which can affect actual real-world results. Have a QA system set up that mirrors the production environment, and simulate data and execution load.

Additional suggestions are welcome in the comments, and I will likely update this as tools and the platform change.

Tudor · Sep 26, 2017

Great article for designing a scalable application.

One thing I would add is that JVM performance (memory allocation, garbage collections and collectors) also plays a significant role in shaping end-user performance, especially when the system scales up. We do have a quick guide for some JVM memory troubleshooting if the JVM ends up being the performance bottleneck:

JVM Performance Monitoring Part 1: Troubleshooting GC logs
