Installation
A step-by-step guide to setting up Dataverse.
pip
pip install dataverse
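Once the install finishes, you can confirm the package is importable with a small check (a minimal sketch; the helper name is ours, not part of Dataverse):

```python
import importlib.util

def is_installed(package: str) -> bool:
    # True if the package can be found on sys.path
    return importlib.util.find_spec(package) is not None

# After `pip install dataverse`, this should report True:
print(is_installed("dataverse"))
```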
To use Dataverse, you need the following prerequisites: Python, Spark, and Java. Below are guidelines for installing Apache Spark and the JDK.
Python (version 3.10 or 3.11)
JDK (version 11) & PySpark
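To sanity-check the Python requirement, a quick version gate can be sketched like this (the helper name is our own, not part of Dataverse):

```python
import sys

def python_version_ok(version=sys.version_info) -> bool:
    # Dataverse expects Python 3.10 or 3.11
    return (3, 10) <= version[:2] <= (3, 11)

print(python_version_ok())
```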
Download a pre-built version of Spark 3 with Apache Hadoop from the Apache Spark download page. After that, extract the downloaded archive to your home directory using the following command:
Alternatively, you can simply install it via Homebrew by running brew install apache-spark
After you save ~/.bash_profile, you can check that the variable is set properly by running java -version
. Also, make sure the JDK path matches the actual location on your machine.
Download a pre-built version of Spark 3 with Apache Hadoop from the Apache Spark download page. Since this is a .tgz file, download and install an extraction tool such as WinRAR if necessary.
After extracting the Spark archive, copy its contents into a proper directory, for example C:\Spark
. The extracted contents must sit directly under that folder.
Download the proper winutils.exe from here and move the bin folder into C:\Hadoop
.
You must set new environment variables to use Java and Spark on Windows. Open Environment Variables via the Windows menu.
JAVA_HOME: {YOUR-JAVA-DIRECTORY} (ex. C:\JDK), then add %JAVA_HOME%\bin into the Path variable.
SPARK_HOME: {YOUR-SPARK-DIRECTORY} (ex. C:\Spark\spark-3.5.0-bin-hadoop3), then add %SPARK_HOME%\bin into the Path variable.
HADOOP_HOME: {YOUR-HADOOP-DIRECTORY} (ex. C:\Hadoop), then add %HADOOP_HOME%\bin into the Path variable.
PYSPARK_PYTHON: {YOUR-PYTHON-PATH} (ex. C:\anaconda3\python.exe)
Note that the directories above are examples. Replace them with the actual locations on your machine.
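After setting the variables, open a fresh terminal and cross-check that all four are visible to Python. A minimal sketch (the helper name and REQUIRED_VARS list are ours):

```python
import os

REQUIRED_VARS = ["JAVA_HOME", "SPARK_HOME", "HADOOP_HOME", "PYSPARK_PYTHON"]

def missing_vars(env=os.environ):
    # Return the names of required variables that are not set
    return [name for name in REQUIRED_VARS if name not in env]

print(missing_vars())  # an empty list means everything is set
```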
sudo apt-get update
sudo apt-get install openjdk-11-jdk
echo "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64" >> ~/.bashrc
source ~/.bashrc
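To confirm the right JDK is active, you can parse the first line of java -version output. A sketch under the assumption that the output follows the usual `version "..."` format (the helper name is our own):

```python
import re

def jdk_major_version(version_line: str):
    # e.g. 'openjdk version "11.0.21" 2023-10-17' -> 11
    match = re.search(r'version "(\d+)', version_line)
    return int(match.group(1)) if match else None
```

Running java -version in a new shell and feeding its first line to this helper should yield 11 if the JDK installed above is the one in use.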
source ~/.bash_profile
vi ~/.bash_profile
export SPARK_HOME={YOUR-SPARK-DIRECTORY}
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_PYTHON={YOUR-PYTHON-DIRECTORY}
source ~/.bash_profile
pip install pyspark
echo "export SPARK_HOME=$(pip show pyspark | grep Location | awk '{print $2 "/pyspark"}')" >> ~/.bashrc
echo "export PYSPARK_PYTHON=python3" >> ~/.bashrc
source ~/.bashrc
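The one-liner above derives SPARK_HOME from the output of pip show pyspark. The same parsing can be sketched in Python, which may be easier to debug than the grep/awk pipeline (the helper name is ours):

```python
def spark_home_from_pip_show(pip_show_output: str):
    # Mirror the grep/awk pipeline: take the "Location:" line
    # and append "/pyspark" to the site-packages path.
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            return line.split(None, 1)[1] + "/pyspark"
    return None
```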
tar -zxvf {YOUR-DOWNLOADED-SPARK-FILE}
cd {YOUR-JAVA-DIRECTORY}
vi ~/.bash_profile
export JAVA_HOME={YOUR-JAVA-DIRECTORY}
export PATH=$PATH:$JAVA_HOME/bin