Adding a Discovery Source
If you do not know the format of the file you are using as a source, or need help filling in a source's options, you can use the Discovery source to determine the file's format. When you press the Discover format button, the Discovery parser reads a sample of the file, changes the selected format to match what it found, and fills in the rest of the form with what it discovered.
Currently, the Discovery parser can identify CSV, XML, and JSON files.
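To make the idea concrete, here is a minimal sketch of how a tool might tell these three formats apart from a sample of bytes. It is a hypothetical heuristic, not the Discovery parser's actual algorithm: the function name `guess_format`, the `"UNKNOWN"` result, and the candidate separators are all assumptions. It uses only Python's standard `csv.Sniffer` to find a CSV column separator.

```python
from __future__ import annotations

import csv


def guess_format(sample: bytes) -> tuple[str, str | None]:
    """Guess the format of a byte sample (hypothetical heuristic).

    Returns (format, column_separator); the separator is only set for CSV.
    """
    text = sample.decode("utf-8", errors="replace").lstrip()
    if not text:
        return ("UNKNOWN", None)
    # XML documents, including ones with an <?xml ...?> declaration, start with "<".
    if text.startswith("<"):
        return ("XML", None)
    # JSON objects and arrays start with "{" or "[". We do not try to parse the
    # whole sample, because a fixed-size sample usually cuts the last record short.
    if text[0] in "{[":
        return ("JSON", None)
    # Otherwise, ask the standard library's sniffer for a separator that
    # appears consistently on every line (assumed candidate separators).
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    except csv.Error:
        return ("UNKNOWN", None)
    return ("CSV", dialect.delimiter)
```

Checking the first non-blank character is cheap, but it can be fooled by unusual input, which is one reason a real parser may need the user to correct its guess.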
After Discovery runs, it should fill in the Format type along with any other parameters it can determine. In the example below, Discovery has filled in Column Separator because it identified the file as CSV. Discovery also lists every column it finds and attempts to identify each column's type.
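Column type identification can be sketched in a similar spirit. The example below assumes a CSV sample with a header row and picks the narrowest type that fits every non-empty value in a column. The type names (`BIGINT`, `DOUBLE`, `BOOLEAN`, `VARCHAR`) and the helper functions are illustrative assumptions, not necessarily what Discovery reports.

```python
from __future__ import annotations

import csv
import io


def _parses_as(values: list[str], convert) -> bool:
    """True if every value can be converted without a ValueError."""
    for value in values:
        try:
            convert(value)
        except ValueError:
            return False
    return True


def guess_column_type(values: list[str]) -> str:
    """Pick the narrowest (hypothetical) SQL type that fits every value."""
    if not values:
        return "VARCHAR"
    if _parses_as(values, int):
        return "BIGINT"
    if _parses_as(values, float):
        return "DOUBLE"
    if all(v.strip().lower() in ("true", "false") for v in values):
        return "BOOLEAN"
    return "VARCHAR"


def infer_column_types(sample: str, separator: str = ",") -> dict[str, str]:
    """Map each header column of a CSV sample to a guessed type."""
    rows = list(csv.reader(io.StringIO(sample), delimiter=separator))
    if not rows:
        return {}
    # A sample that does not end on a newline probably cut the last record
    # short, so drop it rather than let a partial value skew the guess.
    if not sample.endswith("\n") and len(rows) > 2:
        rows = rows[:-1]
    header, records = rows[0], rows[1:]
    types = {}
    for i, name in enumerate(header):
        # Ignore empty cells and rows too short to contain this column.
        values = [row[i] for row in records if i < len(row) and row[i] != ""]
        types[name] = guess_column_type(values)
    return types
```

For example, `infer_column_types("id,price,active,name\n1,9.5,true,alice\n2,10,false,bob\n")` returns `{"id": "BIGINT", "price": "DOUBLE", "active": "BOOLEAN", "name": "VARCHAR"}`. Because each guess rests on the records in the sample, more complete records generally mean better guesses.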
The Sample Bytes field sets how many bytes Discovery reads before it analyzes the input, and this value can greatly affect Discovery's performance. If you set it too high and data arrives slowly, you will not see any response from Discovery until it has read that many bytes. If you set it too low, smaller than the size of one record in your input, Discovery will have difficulty determining your file's format. A good rule of thumb is to set Sample Bytes to about five times the size of a record in your input, so that Discovery sees several records and can make a better guess about the data types of the columns it finds. For example, if each record is about 800 bytes, setting Sample Bytes to 4096 gives Discovery roughly five records to work with.
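The rule of thumb is simple arithmetic, sketched below. The function name and the optional rounding up to the next power of two are assumptions made for illustration; Discovery does not require a power of two.

```python
def suggest_sample_bytes(record_bytes: int, factor: int = 5,
                         round_to_power_of_two: bool = True) -> int:
    """Suggest a Sample Bytes value of about `factor` records (hypothetical helper)."""
    if record_bytes <= 0:
        raise ValueError("record_bytes must be positive")
    target = record_bytes * factor
    if not round_to_power_of_two:
        return target
    # Smallest power of two that is >= target.
    return 1 << (target - 1).bit_length()
```

With these defaults, `suggest_sample_bytes(800)` returns 4096 (5 × 800 = 4000, rounded up), and `suggest_sample_bytes(80, round_to_power_of_two=False)` returns 400.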
Note: In this release, Discovery is an experimental feature and often does not work on complex files. As a result, you may need to fill in some of the columns and other information by hand.