Documentation: Data Set Import

To create, edit, delete, schedule, or execute a data set import, go to Admin→Data Set Imports. Ada currently provides six data import adapters: three file-based (CSV, JSON, and tranSMART) and three RESTful-based (REDCap, Synapse, and eGaIT).

To create a new data set import, select the desired type from the drop-down menu on the right side.

If there is a data set import similar to the one you want to create, click the corresponding button on the right side, select the source import in the autocomplete text box, and then edit the result accordingly.

To execute a data set import, click the corresponding button in its row of the list table. You do not need to wait for the import to finish: while it is running, you can continue working in other sections of Ada. Depending on the data set size, an import may take anywhere from several seconds to several minutes.

Note

Data sets can be imported only by admins! If you have data that you believe should be imported into (a specific instance of) Ada, contact your admin.

Data Set Info

This panel specifies the data set's identity info:

Data space name usually corresponds to the study or project name and appears as the navigation tree node under which the data set will be imported.
Data set name is the display name of a data set (can be changed later).
Data set id is the unique data set identifier, which is automatically generated as <data_space_name>.<data_set_name> but can be overridden if needed. This identifier cannot be changed once a data set is imported.

Setting

The technical settings of a data set, e.g., key field, default distribution field, and filter show style. Since these can (and most likely should) be adjusted after an import, it is sufficient to specify only Storage Type here (defaults to Elasticsearch).

Schedule

The time of day (hour, minute, and second) at which a data set import should be executed periodically. In the schedule example on the right, an import is set to be executed every day at 1 am. Note that to enable scheduling, Yes must be checked (defaults to No).
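A daily schedule like this boils down to computing the next occurrence of a given time of day. The Python sketch below is purely illustrative and assumes a simple once-a-day semantics; the function name next_run is hypothetical and not part of Ada, which handles scheduling internally.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, hour: int, minute: int = 0, second: int = 0) -> datetime:
    """Return the next moment strictly after `now` that falls on hour:minute:second.

    Hypothetical helper illustrating a once-a-day schedule; not Ada's API.
    """
    # Today's occurrence of the scheduled time of day.
    candidate = datetime.combine(now.date(), time(hour, minute, second))
    # If it has already passed (or is exactly now), move to the same time tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


# The example schedule: every day at 1 am, evaluated at 5 am on January 31.
print(next_run(datetime(2024, 1, 31, 5, 0), hour=1))  # 2024-02-01 01:00:00
```

Using timedelta for the "tomorrow" step lets month and year boundaries roll over correctly without any special casing.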


CSV

One of the most common file-based import types, for a CSV file, specified as follows:

Source is either local (uploaded by an admin through the browser) or server-stored, in which case a path needs to be provided. Note that once a data set import is created, a local file (if specified) is uploaded to the Ada server and the source type automatically switches to server-side.
Delimiter defaults to comma (,). For tab-delimited files, enter \t as shown in the example.
EOL
Charset Name
Match Quotes
Infer Field Types must be checked if field types should be inferred from the column values. Warning: if unchecked, ALL fields/columns are treated as Strings.
Inference Max Enum Values Count defines the maximum number of distinct string values for which a field can still be inferred as an enum. Defaults to 20.
Inference Min Avg Values per Enum defines the minimum average number of occurrences per distinct string value required for a field to be inferred as an enum. Defaults to 1.5.
Array Delimiter
Boolean Include Numbers: if checked, fields/columns containing only the numeric values 0 and 1 are inferred as Booleans.
Save Batch Size
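To make the interplay of the inference options more concrete, here is a simplified Python sketch of per-column type inference. It is an assumption-based illustration, not Ada's actual inference code: the type names, the order of checks, and the helper names are hypothetical, and a real implementation may recognize further types (e.g. dates) omitted here. When Infer Field Types is unchecked, no such step would run and every column stays a String.

```python
from typing import Iterable, Optional


def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_field_type(
    values: Iterable[Optional[str]],
    max_enum_values_count: int = 20,      # Inference Max Enum Values Count
    min_avg_values_per_enum: float = 1.5,  # Inference Min Avg Values per Enum
    boolean_include_numbers: bool = False,  # Boolean Include Numbers (default assumed)
) -> str:
    """Infer a hypothetical field type name from raw string column values."""
    # Ignore missing and empty cells.
    non_null = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_null:
        return "Null"
    distinct = set(non_null)

    # true/false literals are Booleans; solely 0/1 columns only if the option is on.
    if {v.lower() for v in distinct} <= {"true", "false"}:
        return "Boolean"
    if boolean_include_numbers and distinct <= {"0", "1"}:
        return "Boolean"

    # Numeric columns.
    if all(_is_int(v) for v in distinct):
        return "Integer"
    if all(_is_float(v) for v in distinct):
        return "Double"

    # Enum: few distinct values, each occurring often enough on average.
    avg_per_value = len(non_null) / len(distinct)
    if len(distinct) <= max_enum_values_count and avg_per_value >= min_avg_values_per_enum:
        return "Enum"
    return "String"


print(infer_field_type(["a", "b", "a", "b"]))  # Enum (2 distinct, 2.0 per value)
print(infer_field_type(["a", "b", "c"]))       # String (1.0 per value < 1.5)
```

The average-count threshold guards against labeling a short column of unique labels (such as free-text names) as an enum merely because it has few distinct values.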

JSON

File-based import for a JSON file, specified as follows:

Source
Charset Name
Infer Field Types
Inference Max Enum Values Count
Inference Min Avg Values per Enum
Array Delimiter
Boolean Include Numbers
Save Batch Size
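Every file-based adapter exposes Save Batch Size. Presumably (this is an assumption, since the parameter is not described here) it controls how many parsed records are written to the storage in one go. A minimal Python sketch of such chunking follows; the helper batched_records is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_records(records: Iterable[T], save_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `save_batch_size` records (hypothetical helper)."""
    if save_batch_size < 1:
        raise ValueError("save_batch_size must be at least 1")
    it = iter(records)
    while True:
        # Take up to save_batch_size records from the shared iterator.
        batch = list(islice(it, save_batch_size))
        if not batch:
            return
        yield batch  # e.g. one bulk write to the data store per batch


print([len(b) for b in batched_records(range(25), 10)])  # [10, 10, 5]
```

Larger batches mean fewer round trips to the storage but more records held in memory at once.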

tranSMART

File-based import for tranSMART data and mapping files, specified as follows:

Data Source
Mapping Source
Charset Name
Match Quotes
Infer Field Types
Inference Max Enum Values Count
Inference Min Avg Values per Enum
Array Delimiter
Boolean Include Numbers
Save Batch Size

REDCap

RESTful-based import for a REDCap data capture system, specified as follows:

URL
Token
Import Dictionary?
Event Names
Categories To Inherit From First Visit
Save Batch Size
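For orientation, an import of this kind talks to REDCap's HTTP API: a form-encoded POST to the project's API URL carrying the token, with content=record for the data and content=metadata for the data dictionary. The Python sketch below only builds such a request body and is not Ada's implementation; the function names are hypothetical, and mapping Event Names onto REDCap's events parameter is an assumption.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def redcap_payload(token: str, content: str = "record",
                   event_names: Optional[Sequence[str]] = None) -> str:
    """Build a form-encoded REDCap API request body (hypothetical helper).

    content="record" exports data records, content="metadata" the data dictionary.
    """
    fields = [("token", token), ("content", content),
              ("format", "json"), ("returnFormat", "json")]
    if content == "record":
        fields.append(("type", "flat"))  # flat export: one row per record
        # Assumption: restrict the export to the given events as events[0], events[1], ...
        for i, name in enumerate(event_names or []):
            fields.append((f"events[{i}]", name))
    return urlencode(fields)


def fetch(url: str, payload: str) -> bytes:
    """POST the payload to the REDCap API URL (requires network access)."""
    with urlopen(Request(url, data=payload.encode("utf-8"))) as response:
        return response.read()
```

Calling fetch(url, redcap_payload(token)) would return the records as JSON bytes, which an importer would then parse and save in batches.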

Synapse

RESTful-based import for Synapse, Sage Bionetworks's data provenance system, specified as follows:

Table Id
Batch Size
Download Column Files?
Bulk Download Group Number

eGaIT

RESTful-based import for an eGaIT server storing shoe-sensor data, specified as follows:

Import Raw Data

Note

Because the eGaIT company no longer exists, this data set import adapter will soon be removed.
