3 replies. Latest reply on Oct 31, 2015, 11:40 AM by carlesc
    apineda Explorer

    Decimating large queries in streams

    I have a very large data stream where a typical query of 1 week of data returns tens of thousands of entries. At first, the chart did not display all the data within the date range, but I fixed this by setting a high value for maxItems in the QueryStreamEntriesWithData service. Unfortunately, this slows down the mashup.

     

    I am wondering whether there is a way to query the entries so that I can set a maxItems value and have some entries skipped, while still keeping a meaningful display of the data across the whole date range.

     

    For example, if I had 50,000 entries over a 10-day range and set maxItems to 500, the query would return only every 100th entry.
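The decimation described here is a fixed stride: with n entries and a limit of m, keep every ceil(n / m)-th entry, which is 100 for 50,000 entries and a limit of 500. A minimal Python sketch, assuming the entries have already been read into a list of (timestamp, value) tuples; the `decimate` function and the sample data are hypothetical and not part of ThingWorx:

```python
import math
from datetime import datetime, timedelta


def decimate(entries, max_items):
    """Keep every k-th entry so that at most max_items remain.

    k = ceil(len(entries) / max_items) is the smallest stride that fits
    the limit, e.g. 50,000 entries with max_items=500 gives k=100.
    """
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    stride = max(1, math.ceil(len(entries) / max_items))
    return entries[::stride]  # slice with a step keeps entries 0, k, 2k, ...


# Hypothetical data: 50,000 readings spread evenly over 10 days.
start = datetime(2015, 10, 1)
step = timedelta(days=10) / 50_000
entries = [(start + i * step, float(i)) for i in range(50_000)]

sampled = decimate(entries, 500)
print(len(sampled))  # 500
```

Because this thins the entries only after they have been read, it reduces what the chart has to draw but not the cost of the query itself, which is why the replies below suggest reducing the data before it is queried.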

     

    Thanks in advance!

      • Re: Decimating large queries in streams
        carlesc Ninja

        I think you will have to aggregate the data into a separate Stream.
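Aggregating into a separate stream means reading the raw entries, reducing them to one summary row per time bucket, and writing those rows to a second, much smaller stream that the mashup queries instead. A minimal sketch of the reduction step in plain Python; the bucket size, the (timestamp, value) shape, and the function names are assumptions, and writing the result to a second ThingWorx stream is left out:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts, bucket):
    """Floor a timestamp to the start of its bucket (aligned to the epoch)."""
    return EPOCH + ((ts - EPOCH) // bucket) * bucket


def aggregate_by_bucket(entries, bucket):
    """Collapse (timestamp, value) entries into one (bucket_start, average) row per bucket."""
    groups = defaultdict(list)
    for ts, value in entries:
        groups[bucket_start(ts, bucket)].append(value)
    # Sort by bucket start so the summary stream stays in time order.
    return [(start, mean(values)) for start, values in sorted(groups.items())]


# Hypothetical raw data: 600 readings, one per second.
start = datetime(2015, 10, 31, 11, 0, 0)
raw = [(start + timedelta(seconds=i), float(i)) for i in range(600)]
summary = aggregate_by_bucket(raw, timedelta(minutes=1))
print(len(summary))     # 10
print(summary[0][1])    # 29.5
```

Here 600 raw rows become 10 summary rows; over a week of data the same reduction turns tens of thousands of entries into a few hundred or a few thousand, depending on the bucket size chosen.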

        • Re: Decimating large queries in streams
          ckulak Apprentice

          Hello Alister,

           

          First, avoid saving redundant data at all. You can control this to some extent by setting "Data change type: Value" on the property and specifying a threshold.
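A value-change threshold behaves like a deadband filter: a new value is recorded only when it differs from the last recorded value by more than the threshold. The sketch below mimics that rule in plain Python to show how it thins a noisy signal; it is an illustration of the idea, not ThingWorx's own implementation, and the exact comparison ThingWorx uses (against the last logged value, strict greater-than) is an assumption here:

```python
class DeadbandLogger:
    """Hypothetical deadband filter: keep a value only if it differs from the
    last kept value by more than `threshold`."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last_stored = None  # nothing stored yet
        self.stored = []

    def update(self, value):
        """Return True if `value` was stored, False if it was dropped."""
        if self.last_stored is None or abs(value - self.last_stored) > self.threshold:
            self.stored.append(value)
            self.last_stored = value
            return True
        return False


log = DeadbandLogger(threshold=0.5)
readings = [20.0, 20.1, 20.3, 20.6, 20.5, 21.2, 21.25]
kept = [log.update(v) for v in readings]
print(log.stored)  # [20.0, 20.6, 21.2]
```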

           

          Then, if you still need to query every n-th entry or aggregate your data somehow, consider these ideas:

           

          1. Aggregate the data:
            1. On demand, when the user first runs a query service or opens a mashup
            2. On schedule (it's easy to do in TWX)
            3. As the data streams in (use a service to set the properties, and inside that service compute, for example, a running average over the last N rows)
          2. Put a flag such as "odd/even" on your data and query only the entries that are "odd". Or, if you store an hh:mm:ss timestamp on your Thing, you can query only the entries with ss = 0 and skip the rest.
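Ideas 1.3 and 2 both do their work at write time, inside the service that receives each new value. A minimal sketch in plain Python, assuming a hypothetical `IngestService` that stands in for such a service; the field names `running_avg` and `flag` are made up for illustration, and the list filters at the end stand in for the stream query:

```python
from collections import deque
from datetime import datetime, timedelta


class IngestService:
    """Hypothetical stand-in for a service that sets a property value.

    Each stored entry carries the raw value plus a running average over the
    last `window` values and an alternating "odd"/"even" flag.
    """

    def __init__(self, window):
        self.recent = deque(maxlen=window)  # oldest values drop off automatically
        self.count = 0
        self.entries = []

    def set_value(self, ts, value):
        self.recent.append(value)
        self.count += 1
        self.entries.append({
            "timestamp": ts,
            "value": value,
            "running_avg": sum(self.recent) / len(self.recent),
            "flag": "odd" if self.count % 2 == 1 else "even",
        })


svc = IngestService(window=10)
start = datetime(2015, 10, 31, 11, 40, 0)
for i in range(120):  # two minutes of one-per-second readings
    svc.set_value(start + timedelta(seconds=i), float(i))

odd_only = [e for e in svc.entries if e["flag"] == "odd"]               # idea 2, flag
on_the_minute = [e for e in svc.entries if e["timestamp"].second == 0]  # idea 2, ss = 0
print(len(odd_only), len(on_the_minute))  # 60 2
print(svc.entries[-1]["running_avg"])     # 114.5
```

The odd/even flag only halves the data; the ss = 0 filter keeps one entry per minute regardless of the sampling rate, so it scales better when readings arrive every second.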

           

          I hope this makes sense. In general, think about how you would do it in SQL; the concept is similar.
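To make the SQL comparison concrete, here are the two patterns as plain SQL, run against an in-memory SQLite table through Python's built-in `sqlite3` module. The `readings` table and its columns are hypothetical, and ThingWorx streams are not queried this way; the point is only the shape of the two queries: keep every n-th row, or collapse rows into per-period summaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts INTEGER, value REAL)")

# Hypothetical data: 50,000 readings, one per second; ts is seconds since start.
conn.executemany(
    "INSERT INTO readings (ts, value) VALUES (?, ?)",
    ((i, float(i)) for i in range(50_000)),
)

# Pattern 1: keep every 100th row (assumes ids run 1..n with no gaps).
every_100th = conn.execute(
    "SELECT id, ts, value FROM readings WHERE id % 100 = 0 ORDER BY id"
).fetchall()

# Pattern 2: one summary row per hour (integer division buckets the timestamps).
hourly = conn.execute(
    "SELECT ts / 3600 AS hour, AVG(value), COUNT(*) "
    "FROM readings GROUP BY hour ORDER BY hour"
).fetchall()

print(len(every_100th), every_100th[0])  # 500 (100, 99, 99.0)
print(len(hourly), hourly[0])            # 14 (0, 1799.5, 3600)
```

The `id % 100 = 0` filter only works if ids are contiguous; with gaps it silently returns fewer rows, which is one reason to prefer a flag written at insert time (idea 2) or a pre-aggregated table (idea 1).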

           

          / Constantine

          • Re: Decimating large queries in streams
            carlesc Ninja

            For option 2 that Constantine proposed, you can also use Tags to filter the data.