Tutorials

A guidebook for the tutorials. Check this page when you are not sure which tutorial suits your needs.

🙋 I'm very new to Dataverse

: Introduces the very basic, but core, steps to use Dataverse.
  • ETL_01_how_to_run.ipynb
  • ETL_02_one_cycle.ipynb
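
As a preview of what these two notebooks cover, here is a minimal sketch of a pipeline run. It assumes the ETLPipeline / OmegaConf config pattern from the Quickstart; the ETL process names and config keys below are illustrative and may differ from the ones used in the notebooks.

```python
# Minimal sketch of a Dataverse ETL run, assuming the Quickstart-style
# ETLPipeline + OmegaConf config pattern. The process names and arguments
# below are illustrative; take the exact ones from the notebooks.
from omegaconf import OmegaConf
from dataverse.etl import ETLPipeline

ETL_config = OmegaConf.create({
    "spark": {
        "appname": "dataverse_tutorial",
        "driver": {"memory": "4g"},
    },
    "etl": [
        # Each entry references a registered ETL process by name, plus its args.
        {"name": "data_ingestion___huggingface___hf2raw",
         "args": {"name_or_path": ["ai2_arc", "ARC-Challenge"]}},
        {"name": "utils___sampling___random",
         "args": {"sample_n_or_frac": 0.1}},
        {"name": "data_load___parquet___ufl2parquet",
         "args": {"save_path": "./tutorial_sample.parquet"}},
    ],
})

etl_pipeline = ETLPipeline()
spark, dataset = etl_pipeline.run(config=ETL_config, verbose=True)
```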

🙋 I want to use my custom function

: To use your own custom function, you have to register it on Dataverse. These tutorials guide you from registering the function to applying it in a pipeline.
  • ETL_03_create_new_etl_process.ipynb
  • ETL_04_add_new_etl_process.ipynb
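
The two notebooks above boil down to the pattern sketched here: decorate a function that follows the ETL process signature, then reference it by name in a pipeline config. The import path, decorator name, and the deduplication___custom___drop_short_rows example are assumptions for illustration; follow the notebooks for the exact convention.

```python
# Sketch of registering a custom ETL process so it can be referenced by name
# in a pipeline config. The import path, decorator, and (spark, data, ...)
# signature are assumptions modeled on the built-in ETL processes; the
# notebooks define the authoritative convention.
from dataverse.etl import register_etl

@register_etl
def deduplication___custom___drop_short_rows(spark, data, min_length=10, *args, **kwargs):
    # Rows are assumed to be dict-like records with a "text" field;
    # keep only rows whose text is at least `min_length` characters long.
    return data.filter(lambda row: len(row.get("text", "")) >= min_length)
```

Once registered, the process can be listed in an ETL config entry under its full `category___subcategory___name` string, just like the built-in ones.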

🙋 I need to test my ETL process with samples

: For when you want to get test (sample) data to quickly check your ETL process, or need data from a certain point in the pipeline for testing.
  • ETL_05_test_etl_process.ipynb

🙋 I want to run it on an EMR cluster

: Shows how to scale your ETL job out to an AWS EMR cluster. Check AWS S3 Support for settings.
  • ETL_06_scaleout_with_EMR.ipynb

🙋 Is there any real-world dataset to use with Dataverse?

: Shows how to use Common Crawl data.
  • EX_use_common_crawl_data.ipynb

🙋 I want to use the PySpark UI

: Helps you use the PySpark UI to monitor Spark jobs in a Docker environment.
  • EX_use_pyspark_ui.ipynb
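
Independent of the Docker setup that the notebook walks through, the address of the Spark UI can be read straight from the live SparkSession. The UI listens on port 4040 by default, so that port also has to be published from the container (for example with -p 4040:4040) to reach it from the host.

```python
# Print the URL of the Spark UI for the current session. uiWebUrl is a
# standard PySpark SparkContext property; in Docker, publish port 4040
# so the UI is reachable from the host browser.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataverse_ui_check").getOrCreate()
print(spark.sparkContext.uiWebUrl)  # e.g. http://<container-hostname>:4040
```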
