Modules

Find the module that fits your needs!

ETL Modules

About 50 functions are currently registered as ETL processes, all ready for you to use!

Click a module name to go directly to its API reference.

Type | Package/Modules | Description

Extract

Load data from any source into the preferred format.

Transform

bias

(WIP) Reduce skewed or prejudiced data, particularly data that reinforce stereotypes.

Remove irrelevant, redundant, or noisy information, such as stop words or special characters.

decontamination

(WIP) Remove contaminated data, including benchmark data.

Remove duplicated data, targeting not only identical matches but also similar data.

Remove sensitive information from the data. PII stands for Personally Identifiable Information.

Improve data quality in terms of accuracy, consistency, and reliability.

toxicity

(WIP) Remove harmful, offensive, or inappropriate content from the data.

Load

Save the processed data to a preferred destination, such as a data lake or database.

Utils

Essential tools for data processing, such as sampling, logging, and statistics.

Pipeline

Execute the given ETL configuration.

Register

Register custom functions to be used in Dataverse.
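
To make the Register and Pipeline entries more concrete, here is a minimal sketch of the general pattern: functions are registered under a name, and a pipeline looks them up and runs them in the order given by a configuration. This is not Dataverse's actual API. The names `register`, `ETL_REGISTRY`, `run_pipeline`, and the two example transforms are hypothetical and exist only for illustration; see the API reference for the real interfaces.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical registry (not Dataverse's): maps a name to an ETL function.
ETL_REGISTRY: Dict[str, Callable[..., List[Record]]] = {}


def register(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(func):
        if name in ETL_REGISTRY:
            raise ValueError(f"ETL function already registered: {name}")
        ETL_REGISTRY[name] = func
        return func
    return decorator


@register("strip_special_chars")
def strip_special_chars(records: List[Record], subset: str = "text") -> List[Record]:
    # Cleaning-style transform: drop everything except word characters and whitespace.
    return [{**r, subset: re.sub(r"[^\w\s]", "", r[subset])} for r in records]


@register("drop_exact_duplicates")
def drop_exact_duplicates(records: List[Record], subset: str = "text") -> List[Record]:
    # Deduplication-style transform: keep the first occurrence of each exact value.
    seen = set()
    unique = []
    for r in records:
        if r[subset] not in seen:
            seen.add(r[subset])
            unique.append(r)
    return unique


def run_pipeline(config: List[dict], records: List[Record]) -> List[Record]:
    """Run each configured step in order, looking functions up by name."""
    for step in config:
        func = ETL_REGISTRY[step["name"]]
        records = func(records, **step.get("args", {}))
    return records


# Hypothetical configuration: an ordered list of registered function names.
config = [
    {"name": "strip_special_chars"},
    {"name": "drop_exact_duplicates", "args": {"subset": "text"}},
]
data = [{"text": "Hello, world!"}, {"text": "Hello world"}, {"text": "Goodbye."}]
result = run_pipeline(config, data)
print(result)
```

In this sketch the order of steps matters: the first two records only become duplicates after the cleaning step has removed their punctuation, so running deduplication first would keep both.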
