Tutorials

A guidebook for the tutorials. Check this page when you are not sure which tutorial suits your needs.

🙋 I'm very new to Dataverse

: Introduces the very basic, but core, steps to use Dataverse.
  • ETL_01_how_to_run.ipynb
  • ETL_02_one_cycle.ipynb
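
As a preview of what these two notebooks cover, here is a minimal sketch of a pipeline run. It assumes the ETLPipeline / OmegaConf config pattern from the Quickstart; the ETL process names and config keys below are illustrative and may differ from the ones used in the notebooks.

```python
# Minimal sketch of a Dataverse ETL run, assuming the Quickstart-style
# ETLPipeline + OmegaConf config pattern. The process names and arguments
# below are illustrative; take the exact ones from the notebooks.
from omegaconf import OmegaConf
from dataverse.etl import ETLPipeline

ETL_config = OmegaConf.create({
    "spark": {
        "appname": "dataverse_tutorial",
        "driver": {"memory": "4g"},
    },
    "etl": [
        # Each entry references a registered ETL process by name, plus its args.
        {"name": "data_ingestion___huggingface___hf2raw",
         "args": {"name_or_path": ["ai2_arc", "ARC-Challenge"]}},
        {"name": "utils___sampling___random",
         "args": {"sample_n_or_frac": 0.1}},
        {"name": "data_load___parquet___ufl2parquet",
         "args": {"save_path": "./tutorial_sample.parquet"}},
    ],
})

etl_pipeline = ETLPipeline()
spark, dataset = etl_pipeline.run(config=ETL_config, verbose=True)
```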

🙋 I want to use my custom function

: To use your own custom function, you have to register it on Dataverse. These tutorials guide you from registering the function to applying it in a pipeline.
  • ETL_03_create_new_etl_process.ipynb
  • ETL_04_add_new_etl_process.ipynb
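
The two notebooks above boil down to the pattern sketched here: decorate a function that follows the ETL process signature, then reference it by name in a pipeline config. The import path, decorator name, and the deduplication___custom___drop_short_rows example are assumptions for illustration; follow the notebooks for the exact convention.

```python
# Sketch of registering a custom ETL process so it can be referenced by name
# in a pipeline config. The import path, decorator, and (spark, data, ...)
# signature are assumptions modeled on the built-in ETL processes; the
# notebooks define the authoritative convention.
from dataverse.etl import register_etl

@register_etl
def deduplication___custom___drop_short_rows(spark, data, min_length=10, *args, **kwargs):
    # Rows are assumed to be dict-like records with a "text" field;
    # keep only rows whose text is at least `min_length` characters long.
    return data.filter(lambda row: len(row.get("text", "")) >= min_length)
```

Once registered, the process can be listed in an ETL config entry under its full `category___subcategory___name` string, just like the built-in ones.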

🙋 I need to test my ETL process with samples

: For when you want to get test (sample) data to quickly check your ETL process, or need data from a certain point in the pipeline for testing.
  • ETL_05_test_etl_process.ipynb

🙋 I want to run it on an EMR cluster

: Shows how to scale your ETL job out to an AWS EMR cluster. Check AWS S3 Support for settings.
  • ETL_06_scaleout_with_EMR.ipynb

🙋 Is there any real-world dataset to use with Dataverse?

: Shows how to use Common Crawl data.
  • EX_use_common_crawl_data.ipynb

🙋 I want to use the PySpark UI

: Helps you use the PySpark UI to monitor Spark jobs in a Docker environment.
  • EX_use_pyspark_ui.ipynb
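
Independent of the Docker setup that the notebook walks through, the address of the Spark UI can be read straight from the live SparkSession. The UI listens on port 4040 by default, so that port also has to be published from the container (for example with -p 4040:4040) to reach it from the host.

```python
# Print the URL of the Spark UI for the current session. uiWebUrl is a
# standard PySpark SparkContext property; in Docker, publish port 4040
# so the UI is reachable from the host browser.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataverse_ui_check").getOrCreate()
print(spark.sparkContext.uiWebUrl)  # e.g. http://<container-hostname>:4040
```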
