2 Replies Latest reply on Jun 2, 2016 11:16 AM by wposner
    mgioia Newbie

    From external dataset to Neuron

    Hi, could you summarize the process to make an external dataset (eg. csv or excel, or sql..) available in Neuron in order to be analyzed?

    Kind regards

      • Re: From external dataset to Neuron
        aminec Apprentice

        Hi Marco,

         

        Before an external dataset is ready to be analyzed by the Neuron machine learning engine, it needs to go through what is called data preparation.

         

        This consists mainly of the following:

        • Formatting: The data you have selected may not be in a format that Neuron can work with. When using a CSV file as the dataset, please use "," as the separator, as in the screenshot of the sample CSV dataset below:

        Dataset-Sample.jpg


         

        • Cleaning: Cleaning is the removal or repair of missing data (records with null values). There may be data instances that are incomplete and do not carry the information you need to address the problem; these instances may need to be removed.

        • Decomposition: There may be features that represent a complex concept and that are more useful to a machine learning method when split into their constituent parts. An example is a date, which has day and time components that could in turn be split out further; perhaps only the hour of day is relevant to the problem being solved. Consider what feature decompositions you can perform.

         

        • Aggregation: There may be features that can be aggregated into a single feature that is more meaningful to the problem you are trying to solve. For example, there may be a data instance for each time a customer logged into a system; these could be aggregated into a count of logins, allowing the individual instances to be discarded. Consider what feature aggregations you could perform.
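
The cleaning, decomposition, and aggregation steps above can be sketched in a few lines of pandas. This is a minimal, generic illustration and not Neuron-specific; the column names (`user_id`, `login_ts`, `temperature`) are made up for the example.

```python
import pandas as pd

# A tiny example dataset with a null user ID and a null reading
df = pd.DataFrame({
    "user_id": [1, 1, 2, None],
    "login_ts": ["2016-06-01 08:15", "2016-06-01 17:40",
                 "2016-06-02 09:05", "2016-06-02 10:00"],
    "temperature": [21.5, None, 19.0, 20.0],
})

# Cleaning: drop instances with null values in the fields we care about
df = df.dropna(subset=["user_id", "temperature"])

# Decomposition: split a timestamp down to the part that matters here
# (the hour of day)
df["login_ts"] = pd.to_datetime(df["login_ts"])
df["login_hour"] = df["login_ts"].dt.hour

# Aggregation: collapse per-login rows into a login count per user
logins = df.groupby("user_id").size().rename("login_count")
print(logins.to_dict())
```

Each of these transformations is routinely done before upload, so that the dataset you hand to the engine already contains the features you actually want analyzed.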

         

         

        I hope this helps,

         

        Best Regards,

        Amine

        • Re: From external dataset to Neuron
          wposner Newbie

          Hi Marco...

           

          Not to knock the previous response to your question, but answers like this really drive me nuts!  You ask a straightforward question and get an answer about theory and concepts rather than something that is immediately useful.

           

          So, to answer your question:

           

          You first need to create your dataset.  This is the equivalent of naming your database.  You then need to configure your dataset.  This is basically the process of defining what your data will look like.  Next you can actually load your data.  While the Neuron plugin has some of these services, it's easier and faster to do these steps using something like Postman.
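
As a rough sketch of that flow outside of Postman, the three steps come down to composing request bodies and POSTing them. Everything below (the host, token, endpoint names, and JSON field names) is a hypothetical placeholder in the same `{{ }}` spirit as Postman variables; take the real values from the ColdLight API documentation or your own Postman collection.

```python
import json

# Placeholders -- substitute your real host and credentials
base_url = "https://{{coldlight-host}}/api"
headers = {
    "Content-Type": "application/json",
    "Authorization": "{{your-auth-token}}",
}

# Step 1 -- create the dataset (the equivalent of naming your database)
create_body = {"name": "turbine_data"}

# Step 2 -- configure the dataset (define what your data will look like);
# one entry per column in your CSV, with hypothetical field names
configure_body = {
    "columns": [
        {"name": "identifier", "type": "string"},
        {"name": "entry_dt", "type": "datetime"},
        {"name": "Temperature", "type": "number"},
    ]
}

# Step 3 -- load the data: the body is the CSV file itself (a multipart
# upload) rather than JSON, so there is no JSON body to compose here.

# Serialize the JSON bodies exactly as they would go over the wire
create_json = json.dumps(create_body)
configure_json = json.dumps(configure_body)
print(create_json)
```

The point is simply that create and configure are small JSON POSTs while the load step uploads the file itself, which is why a REST client like Postman makes the sequence quick to run.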

           

          Here are some screenshots that might help.  Note that any text between {{ }} would be replaced by you with applicable values, unless you've configured a Postman environment.

           

          Create the dataset--Headers (the headers will almost always be the same for all the POST commands):

           

          Screen Shot 2016-06-02 at 8.03.43 AM.png

           

          Create the dataset--body:

          Screen Shot 2016-06-02 at 8.04.34 AM.png

           

          Configure the dataset--body:

          Screen Shot 2016-06-02 at 8.05.56 AM.png

          The body is basically a big JSON object which contains all the metadata about your data.  Amine's answer gives some insight into how to optimize your data, which would drive what your data configuration looks like.

           

          Load Dataset Data--header:

          Screen Shot 2016-06-02 at 8.08.50 AM.png

           

          Load Dataset Data--Body:

          Screen Shot 2016-06-02 at 8.09.32 AM.png

           

          Here you would just choose your CSV file and submit it to ColdLight.  Here is the header row of the CSV file I'm using for my current project:

           

          identifier,entry_dt,Latitude,Longitude,LastMaintained,MaintenanceOrg,MaintenanceTech,WindSpeed,Temperature,Humidity,AirContamination,SeismicReading,FrictionHarmonics,Voltage,Lubrication,GearboxModel,GearboxOEM,GearboxAge,BladeModel,BladeOEM,BladeAge,GeneratorModel,GeneratorOEM,GeneratorAge,GearboxFailure,BladeFailure,GeneratorFailure,AccelerationMagnitude,RPM
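
A quick sanity check before uploading a file like this: parse the header row with Python's standard csv module and confirm the column count matches what you declared in your dataset configuration.

```python
import csv
import io

# The header row from the CSV above, as one string
header_line = ("identifier,entry_dt,Latitude,Longitude,LastMaintained,"
               "MaintenanceOrg,MaintenanceTech,WindSpeed,Temperature,"
               "Humidity,AirContamination,SeismicReading,FrictionHarmonics,"
               "Voltage,Lubrication,GearboxModel,GearboxOEM,GearboxAge,"
               "BladeModel,BladeOEM,BladeAge,GeneratorModel,GeneratorOEM,"
               "GeneratorAge,GearboxFailure,BladeFailure,GeneratorFailure,"
               "AccelerationMagnitude,RPM")

# Parse with the "," separator, as the format requires
columns = next(csv.reader(io.StringIO(header_line)))
print(len(columns))  # 29 columns
```

A mismatch between the header and the configured columns is one of the first things to rule out when a load fails.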

           

          You can see from my dataset config body screenshot how some of these fields are defined.

          Once you've got your data loaded you can now begin the process of submitting jobs for profiles, signals, and scoring.

           

          Hopefully this will get you on the right path to successfully getting data loaded in ColdLight.