Official name: DataStax Enterprise (DSE), sometimes referred to as Cassandra.
Note: DBA skills are required; free self-paced training is available on the DataStax Training page. The extension package can be obtained through Technical Support.
ThingWorx 6.0 introduces DSE as a backend database that scales to much larger data volumes, as Neo4j hits performance limitations at around 50 GB. Some of the main reasons to consider DSE are:
1. Elastic scalability -- Allows you to easily add capacity online to accommodate more customers and more data when needed.
2. Always-on architecture -- Contains no single point of failure (unlike traditional master/slave RDBMSs and other NoSQL solutions), resulting in continuous availability for business-critical applications that can't afford to go down.
3. Fast linear-scale performance -- Enables sub-second response times with linear scalability (double the throughput with two nodes, quadruple it with four, and so on).
4. Flexible data storage -- Easily accommodates the full range of data formats (structured, semi-structured, and unstructured) that run through today's modern applications.
5. Easy data distribution -- Read and write to any node with all changes being automatically synchronized across a cluster, giving maximum flexibility to distribute data by replicating across multiple datacenters, cloud, and even mixed cloud/on-premise environments.
Note: Windows+DSE is currently not fully supported.
Prerequisite: fully configured DSE database.
1. Obtain the dse_persistancePackage
2. Import as an extension in Composer.
3. In Composer, create a new persistence provider.
4. Select the imported package as Persistence Provider Package.
5. In the Configuration tab:
- For Cassandra Cluster Host, enter the IP address set in cassandra.yaml or localhost if hosted locally
- Enter a new or existing Cassandra Keyspace name
- Enter Solr Cluster URL
- Other fields can be left at default (*)
6. Go to Services and execute the TestConnectivity service; confirm that it returns True.
7. When creating a new Stream, Value Stream, or Data Table, set its Persistence Provider to the one created in the previous steps.
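If TestConnectivity in step 6 does not return True, it can help to first rule out basic network problems from the machine running ThingWorx. The following is a minimal sketch, not part of the extension: the helper name port_is_open is hypothetical, and port 9042 (Cassandra's default native transport port) is an assumption that may not match your cluster.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("localhost", 9042) checks a locally hosted node; use the same address entered for Cassandra Cluster Host. A True result only shows that the port accepts connections, not that the keyspace or credentials are valid.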
Currently all reads and writes are done through ThingWorx, and all ThingWorx data is encoded in DSE. OpsCenter still lets you see the connected streams, data tables, and value streams.
* SimpleStrategy can be used for a single data center, but NetworkTopologyStrategy is recommended for most deployments because it is much easier to expand to multiple data centers later.
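To illustrate the difference between the two strategies, the sketch below builds the CQL statements a DBA might run to create a keyspace with each one. The keyspace name "thingworx", the data center names "DC1" and "DC2", and the replication factors are illustrative assumptions, and the helper functions are hypothetical; whether the persistence provider creates the keyspace itself is not covered here.

```python
def simple_strategy_cql(keyspace: str, replication_factor: int) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (single data center)."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


def network_topology_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps each data center name to its replica count.
    """
    dc_options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_options}}};"
    )


# Hypothetical names and replica counts, for illustration only.
print(simple_strategy_cql("thingworx", 1))
print(network_topology_cql("thingworx", {"DC1": 3, "DC2": 2}))
```

With NetworkTopologyStrategy, adding a data center later only means adding another entry to the replication map, which is why it is the easier starting point for deployments that may grow.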
Is there a limit of data per node?
1 TB is a reasonable limit on how much data a single node can handle, but in reality a node is not limited by the size of the data, only by the rate of operations. A node might have only 80 GB of data on it, but if it's continuously hit with random reads and doesn't have a lot of RAM, it might not even be able to handle that number of requests at a reasonable rate. Similarly, a node might have 10 TB of data, but if it's rarely read from, or only a small portion of the data is hot (so it can be cached effectively), it will do just fine. If the replication factor is above 1 and there are no reads at consistency level ALL, other replicas will be able to respond quickly to read requests, so there won't be a large difference in latency from the client's perspective.
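For rough capacity planning against the 1 TB guideline, the data held per node can be estimated from the total data size, the replication factor, and the node count. This is a back-of-the-envelope sketch that assumes data is spread evenly across nodes and ignores compression and compaction overhead; the function name and the sample figures are hypothetical.

```python
def data_per_node_gb(raw_data_gb: float, replication_factor: int, node_count: int) -> float:
    """Estimate stored data per node, assuming an even distribution.

    Every row is stored replication_factor times, and the copies are
    spread across node_count nodes.
    """
    if node_count < 1 or replication_factor < 1:
        raise ValueError("node_count and replication_factor must be at least 1")
    if replication_factor > node_count:
        raise ValueError("replication_factor cannot exceed node_count")
    return raw_data_gb * replication_factor / node_count


# Hypothetical example: 2 TB of raw data, replication factor 3, 6 nodes.
print(data_per_node_gb(2000, 3, 6))  # 1000.0 GB per node
```

As the answer above notes, the resulting figure is only part of the picture: the read/write rate and the available RAM per node matter at least as much as the stored volume.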