Modules
Find the module that fits your needs!
ETL Modules
Currently, about 50 functions are registered as ETL processes and are ready for you to use!
Click a module name to go directly to the corresponding API reference.
Type | Package/Modules | Description |
---|---|---|
Extract | | Load data from any source into the preferred format. |
Transform | bias | (WIP) Reduce skewed or prejudiced data, particularly data that reinforces stereotypes. |
 | | Remove irrelevant, redundant, or noisy information, such as stop words or special characters. |
 | decontamination | (WIP) Remove contaminated data, including benchmark data. |
 | | Remove duplicated data, targeting not only identical matches but also similar data. |
 | | Remove sensitive information from the data. PII stands for Personally Identifiable Information. |
 | | Improve data quality in terms of accuracy, consistency, and reliability. |
 | toxicity | (WIP) Remove harmful, offensive, or inappropriate content from the data. |
Load | | Save the processed data to a preferred destination such as a data lake or database. |
Utils | | Essential tools for data processing, including sampling, logging, and statistics. |
Pipeline | | Execute the given ETL configuration. |
Register | | Register custom functions to be used in Dataverse. |
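
To make the Register and Pipeline rows concrete, here is a minimal sketch of the general pattern in plain Python. It is not Dataverse's actual API: `ETL_REGISTRY`, `register`, `run_pipeline`, the two example functions, and the config layout are all hypothetical names chosen for illustration. See the API reference for the real interfaces.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry (not Dataverse's): maps a function name to its callable.
ETL_REGISTRY: Dict[str, Callable[..., List[str]]] = {}


def register(func: Callable[..., List[str]]) -> Callable[..., List[str]]:
    """Store func in the registry under its own name and return it unchanged."""
    ETL_REGISTRY[func.__name__] = func
    return func


@register
def remove_special_characters(rows: List[str]) -> List[str]:
    """Cleaning-style step: keep only alphanumeric characters and whitespace."""
    return ["".join(ch for ch in row if ch.isalnum() or ch.isspace()) for row in rows]


@register
def drop_exact_duplicates(rows: List[str]) -> List[str]:
    """Deduplication-style step: drop identical rows, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def run_pipeline(rows: Iterable[str], config: List[dict]) -> List[str]:
    """Run each configured step in order, looking it up by name in the registry."""
    data = list(rows)
    for step in config:
        func = ETL_REGISTRY[step["name"]]
        data = func(data, **step.get("args", {}))
    return data


# Hypothetical ETL configuration: an ordered list of registered step names.
config = [
    {"name": "remove_special_characters"},
    {"name": "drop_exact_duplicates"},
]
result = run_pipeline(["Hello, world!", "Hello world", "Hi!"], config)
print(result)
```

Looking steps up by name is what lets a configuration file drive the pipeline: custom functions become available to a config as soon as they are registered, without changing the pipeline code.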