Data preparation has become far more important with the rise of AI. Data comes from many sources and in many formats, including homegrown applications, SQL databases, files, sensors, video, and physics-driven analog data. Traditionally, data cleansing is defined as detecting and correcting (or removing) corrupt or inaccurate records from a dataset, table, or database. The challenge is to identify the incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replace, modify, or delete the dirty or coarse data.
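To make the definition concrete, here is a minimal Python sketch of the detect-then-correct-or-remove pattern. It is an illustration only, not how any particular product implements cleansing: the `Reading` record, the `cleanse` function, and the plausibility limits in `VALID_RANGE_C` are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Reading:
    sensor_id: str
    temperature_c: Optional[float]  # None marks a missing value


# Hypothetical plausibility limits, assumed for this illustration only.
VALID_RANGE_C = (-40.0, 125.0)


def cleanse(readings):
    """Split readings into (clean, rejected).

    Correctable problems (stray whitespace in the ID) are fixed in place;
    incomplete or out-of-range readings are removed and reported with a reason.
    """
    low, high = VALID_RANGE_C
    clean, rejected = [], []
    for r in readings:
        if r.temperature_c is None:
            rejected.append((r, "incomplete"))
        elif not (low <= r.temperature_c <= high):
            rejected.append((r, "out of range"))
        else:
            # Correct rather than remove: normalize the sensor ID.
            clean.append(replace(r, sensor_id=r.sensor_id.strip()))
    return clean, rejected


readings = [
    Reading(" s1 ", 21.5),   # correctable: padded ID
    Reading("s2", None),     # incomplete: no value recorded
    Reading("s3", 900.0),    # inaccurate: outside the assumed range
]
clean, rejected = cleanse(readings)
```

Keeping the rejected records together with a reason, rather than silently dropping them, makes it possible to audit what the cleansing step removed.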
In recent years, sensor networks have gained wide popularity in application scenarios ranging from monitoring manufacturing production lines to more sophisticated sensor deployments in research and development, such as autonomous driving in the automotive industry. The metadata gathered while these massive sensor data sets are generated plays an increasingly important role, because it provides the key attributes and information needed to manage the data strategically and prepare it for analysis.
Metadata is data about data and can be regarded as the properties of the data. Once the data has been acquired, the associated metadata becomes equally important. In general, it is common to see these types of metadata after the data acquisition process.
Viviota’s Time-to-Insight (TTI) software is based on NI’s DataFinder technology, an indexing service that parses any custom file format for descriptive information (metadata) and creates a database of that information from the target data files. This database is automatically updated whenever a valid data file is created, deleted, or edited. Once the metadata has been indexed with the help of DataPlugins, which map custom file formats onto the TDM model, a DataFinder search examines the metadata at the file, channel group, and channel level, based on user-specified search criteria.
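The three-level search described above can be pictured with a small Python sketch. This is not the DataFinder API. It is a toy in-memory model, assuming that each file, channel group, and channel carries a dictionary of properties. The class names, the `search` function, the file names, and the property keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class ChannelGroup:
    name: str
    properties: dict = field(default_factory=dict)
    channels: list = field(default_factory=list)


@dataclass
class IndexedFile:
    path: str
    properties: dict = field(default_factory=dict)
    groups: list = field(default_factory=list)


def search(index, level, **criteria):
    """Return the elements at `level` whose properties match every criterion.

    Hits are tuples: (path,), (path, group) or (path, group, channel).
    """
    if level not in ("file", "group", "channel"):
        raise ValueError(f"unknown level: {level!r}")

    def matches(props):
        return all(props.get(key) == value for key, value in criteria.items())

    hits = []
    for f in index:
        if level == "file":
            if matches(f.properties):
                hits.append((f.path,))
            continue
        for g in f.groups:
            if level == "group":
                if matches(g.properties):
                    hits.append((f.path, g.name))
                continue
            for c in g.channels:
                if matches(c.properties):
                    hits.append((f.path, g.name, c.name))
    return hits


# Hypothetical index contents, invented for this example.
index = [
    IndexedFile("run_001.tdms", {"operator": "alice"}, [
        ChannelGroup("engine", {"test_type": "endurance"}, [
            Channel("rpm", {"unit": "1/min"}),
            Channel("oil_temp", {"unit": "degC"}),
        ]),
    ]),
    IndexedFile("run_002.tdms", {"operator": "bob"}, [
        ChannelGroup("brakes", {"test_type": "fade"}, [
            Channel("disc_temp", {"unit": "degC"}),
        ]),
    ]),
]

temperature_channels = search(index, "channel", unit="degC")
bob_files = search(index, "file", operator="bob")
```

Because the search runs against the indexed properties rather than the raw measurement data, a query like this only needs to touch the metadata, which is the idea behind indexing the files up front.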
So that the TTI software can rapidly and efficiently find the data sets needed for analysis, the TTI workflow includes a module dedicated to data cleansing tasks. The tasks typically include:
TTI Analytics Studio software cuts time-consuming test data management tasks that once took days down to seconds, which improves efficiency and significantly shortens product time-to-market.