Not as simple a question as it sounds. There more options than some might think and choosing the right one can be the difference between a well performing application and one that struggles as it scales up in size. There are options both internal and external to the Thingworx platform that can be used. Each has their own use cases and cost considerations.
Internal to Thingworx there are currently two options as the storage provider PostGreSQL and DataStax Enterprise Cassandra (Neo4j will only be supported for legacy customers and H2 version is not yet released). PostGreSQL is required for storing the Thingworx model structure and is an open source technology, meaning no additional cost. It performs well up to an estimated 500 Gb of data storage (this is a rough estimate dependant on use case). For very high volume data DSE is the choice, it performs well for large data sets.
External to Thingworx you can use virtually any data storage technology the provides a JDBC connector or even one that has a driver that can be used to create a Thingworx Extension via our SDK or edge SDKs. The platform knows how to use JDBC drivers so this can easily be used to connect to relational data storage like Oracle or SQL Server.
The first real question to ask when making the choice of where to store data is, what does my data look like? Many systems are adapted or migrated from legacy systems which may include relational data, others simply have this structure by necessity. If the data will need to use complex SQL to retrieve (like using joins, like, cursors, temp tables, etc.) then store the data in a true relational database. If it is simple historical data, time series data or data that does not require compounding or recursive calculation to be useful, then keep it in platform data storage.
The second question to ask is, how much data will I be storing. This adds a bit of complexity to where data is best stored. There is no limit to the number of records in any data structure however, the Thingworx Platform storage is optimized to store and retrieve time series data, using the ValueSteam and Stream types built into the Platform. This is the most common IoT data structure and in this case you can refer back to the previous information when choosing PostGreSQL or DSE as the correct backend storage. Data tables can be used when contained in small data sets (around 100,000 records or less) you can use Platform storage for this as these are intended for largely static data structures. Retrieving data when DataTables grow larger than this will begin to slow performance quickly. This is because currently Thingworx will do a full scan of the data, in this specific type of structure, when querying because all of the logic for the query or filter is done on the platform, not on the database (this will likely change in a future version). So small amounts of data can be quickly loaded and parsed in memory.
NOTE: In datatables if you add a index to a column, these indexes are used when calling "FindDataTableEntries" but not when using "QueryDataTableEntries".
Streams and ValueStreams, however, are optimized for time series data. In these structures Thingworx has built in datetime filters that allow for very fast retrieval of data based on a date range. When the number of records returned after the date range is applied is still a very large number (100,00 - 200,000) you may see a drop in performance of a query at that point. Just as before, all records, after the date filter is applied, are returned to the Platform and further query and filtering are done in memory.
The querying/retrieval of data is commonly where the greatest performance issues are seen. Using a JDBC connector to send the query to the database (even if it is DSE or PostGreSQL ) can help, or if the historical data is not queried regularly you can move this data to a separate Thingworx data store (another DataTable or Stream).
That would leave only large data sets of non-time series data as the outlier. This scenario could perform equally well (or poorly) primarily on how the data will be retrieved. If there are loose relationship between the data that need to be used then a relational system that would allow these to be executed on the database server is preferred. Sequential data that does not need this type of processing could be stored in DSE.
This is a base outline of considerations when designing data storage on your application. Most use cases are unique and may have additional considerations around process and cost.