The metadata format for the 8.1 release has changed from previous releases as part of the architecture update introduced in 8.1. This blog post describes those changes and points you to more information on both the new metadata format and the other changes in the 8.1 release.
The image below provides an excerpt of what the metadata file itself will look like in practice.
The table below defines each field in the metadata and explains how to populate it in your metadata file.
The exact name of the field as it appears in the data file.
A list of the acceptable values for the field.
For Ordinal opTypes, the values must be presented in the correct order.
Required if the opType is Ordinal.
Optional for the Categorical opType.
Do not use for Boolean and Continuous.
For a Continuous field, defines the minimum and maximum values the field can accept. Provided for informational purposes.
Describes what type of data the field contains. Options include: Long, Integer, Short, Byte, Double, Boolean, String, Other.
Select the most accurate dataType. Selecting the String dataType for numeric data can lead to undesirable results.
Describes how the data in the field can be used. Options include: Categorical, Boolean, Ordinal, Continuous, Informational, Temporal, Entity_ID.
An integer representing the time between observations in a temporal field.
Required if the opType is Temporal.
Do not use for other opTypes.
A flag indicating whether or not the value in a temporal field can change over time. Marking a field as static reduces training time by removing redundant data points for fields that do not change.
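The rules in the table can be checked mechanically before you submit a file. The sketch below is one possible way to do that, not an official validator. The keys `dataType` and `opType` come from the definitions above, but `values` and `interval` are placeholder key names assumed for illustration only. Replace them with the actual field names from the 8.1 reference document.

```python
# Allowed options, copied from the field definitions above.
DATA_TYPES = {"Long", "Integer", "Short", "Byte", "Double",
              "Boolean", "String", "Other"}
OP_TYPES = {"Categorical", "Boolean", "Ordinal", "Continuous",
            "Informational", "Temporal", "Entity_ID"}

# ASSUMPTION: these two key names are hypothetical stand-ins for the
# "list of acceptable values" field and the "time between observations" field.
VALUES_KEY = "values"
INTERVAL_KEY = "interval"


def check_field(entry):
    """Return a list of rule violations for one metadata entry (a dict)."""
    problems = []
    op_type = entry.get("opType")

    if entry.get("dataType") not in DATA_TYPES:
        problems.append("unknown dataType")
    if op_type not in OP_TYPES:
        problems.append("unknown opType")

    # Value list: required for Ordinal, not allowed for Boolean or Continuous.
    has_values = VALUES_KEY in entry
    if op_type == "Ordinal" and not has_values:
        problems.append("Ordinal field needs an ordered value list")
    if op_type in ("Boolean", "Continuous") and has_values:
        problems.append("value list not allowed for " + op_type)

    # Interval: required for Temporal, not allowed for any other opType.
    has_interval = INTERVAL_KEY in entry
    if op_type == "Temporal" and not has_interval:
        problems.append("Temporal field needs an interval")
    if op_type != "Temporal" and has_interval:
        problems.append("interval only applies to Temporal fields")

    return problems
```

An empty list means the entry passes these checks. The function cannot tell whether an Ordinal value list is in the correct order, so that still needs a manual review.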
Things to Remember
The metadata file you create must match your data file: every column in your dataset must be represented in the metadata file.
The metadata file needs to be a JSON file.
Setting the opType parameter incorrectly can have a severe impact on system performance. For example, marking a numerical field that has thousands of distinct values as Categorical instead of Continuous causes the system to handle each value as an independent category rather than as a number, which results in significantly longer processing time.
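The reminders above can also be turned into a quick pre-flight check. The following is a minimal sketch under several assumptions: the metadata file is taken to be a JSON list of objects, the data file is taken to be a CSV with a header row, `name` is a placeholder key for the field name, and the default `limit` of 1000 is an arbitrary threshold rather than a documented one.

```python
import csv
import io
import json


def compare_columns(metadata_text, csv_text, name_key="name"):
    """Return (columns missing from the metadata, metadata entries with no column)."""
    # json.loads raises ValueError if the metadata is not valid JSON.
    entries = json.loads(metadata_text)
    described = {entry[name_key] for entry in entries}
    # The first CSV row is assumed to be the header.
    columns = set(next(csv.reader(io.StringIO(csv_text))))
    return sorted(columns - described), sorted(described - columns)


def high_cardinality_categoricals(metadata_text, csv_text,
                                  name_key="name", limit=1000):
    """List Categorical fields whose column holds more than `limit` distinct values."""
    entries = json.loads(metadata_text)
    categorical = {e[name_key] for e in entries
                   if e.get("opType") == "Categorical"}
    distinct = {name: set() for name in categorical}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in categorical:
            if name in row:
                distinct[name].add(row[name])
    return sorted(n for n, seen in distinct.items() if len(seen) > limit)
```

Both functions take the file contents as strings, so you can read the files however you prefer before calling them. A field flagged by the second function is a candidate for the Continuous opType if its values are numeric.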
For more information on the other changes in the 8.1 release, please follow this link to the complete reference document.
Feel free to use the blank example metadata file attached to this post to help you get started on your own.