Spark

Kubernetes has been able to run Apache Spark applications natively since v1.8 (this also requires a Spark build with Kubernetes support; native support landed in Spark v2.3, while the example below uses the earlier apache-spark-on-k8s fork of Spark 2.2). You can submit Spark jobs to a Kubernetes cluster directly with the spark-submit command. Here is an example that computes Pi:

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

Or, the Python version:
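A PySpark submission follows the same pattern as the Scala example above; the sketch below assumes the fork's Python driver/executor images (kubespark/spark-driver-py, kubespark/spark-executor-py) and the bundled pi.py example, so adjust image names and versions to your build:

```shell
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
  local:///opt/spark/examples/src/main/python/pi.py 10
```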

Deploying Spark on Kubernetes

A detailed deployment walkthrough is provided in the Kubernetes examples on GitHub. The instructions below simplify some of those steps for an easier installation.

Deployment Prerequisites

Creating a Namespace

namespace-spark-cluster.yaml
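The referenced manifest is a plain Namespace object; a minimal sketch along the lines of the upstream Kubernetes example:

```yaml
# namespace-spark-cluster.yaml - an isolated namespace for the Spark cluster
apiVersion: v1
kind: Namespace
metadata:
  name: "spark-cluster"
  labels:
    name: "spark-cluster"
```

Apply it with kubectl create -f namespace-spark-cluster.yaml.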

For simplicity, we will not switch the kubectl context to spark-cluster; instead, we will specify the spark-cluster namespace explicitly in the subsequent deployment commands.

Deploying the Master Service

Create a replication controller to run the Spark Master service.
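A minimal sketch of such a controller, assuming the gcr.io/google_containers/spark image and /start-master entrypoint used by the upstream example (verify the image tag against the example you are following):

```yaml
# spark-master-controller.yaml - one replica running the Spark Master
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: gcr.io/google_containers/spark:1.5.2_v1
          command: ["/start-master"]
          ports:
            - containerPort: 7077   # master RPC port for workers
            - containerPort: 8080   # master web UI
```

Create it in the namespace from the previous step: kubectl create -f spark-master-controller.yaml --namespace=spark-cluster.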

Create the master service.

spark-master-service.yaml
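The service gives the Master a stable DNS name (spark-master) that workers can connect to. A sketch that matches the component: spark-master label assumed above (the port names are assumptions):

```yaml
# spark-master-service.yaml - exposes the Master's RPC and web UI ports
kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: spark
      port: 7077
      targetPort: 7077
    - name: http
      port: 8080
      targetPort: 8080
  selector:
    component: spark-master
```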

Check if the Master is running properly.
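One way to check, assuming the labels sketched above (the pod name is a placeholder to fill in from the first command's output):

```shell
# Wait for the master pod to reach the Running state
kubectl get pods --namespace=spark-cluster -l component=spark-master

# Inspect the master's logs for startup errors
kubectl logs --namespace=spark-cluster <spark-master-pod-name>
```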

To observe the Spark cluster through the web UI that ships with Spark, deploy a dedicated proxy.
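The upstream example runs a small proxy in front of the master UI; a sketch assuming the elsonrodriguez/spark-ui-proxy image it references (the image, tag, and args here are assumptions to verify):

```yaml
# spark-ui-proxy-controller.yaml - proxies the Spark web UIs through one endpoint
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-ui-proxy-controller
spec:
  replicas: 1
  selector:
    component: spark-ui-proxy
  template:
    metadata:
      labels:
        component: spark-ui-proxy
    spec:
      containers:
        - name: spark-ui-proxy
          image: elsonrodriguez/spark-ui-proxy:1.0
          ports:
            - containerPort: 80
          args:
            - spark-master:8080   # the master UI service to proxy
```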

Once the Master is confirmed to be up and running, deploy the Spark workers. Also create the Zeppelin UI, which lets you run jobs on the cluster directly from a web notebook; see Zeppelin UI and Spark architecture for background.
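Assuming the worker and Zeppelin manifests follow the same naming as the upstream example (the file names are assumptions), the deployment amounts to:

```shell
# Deploy the worker replication controller and the Zeppelin notebook
kubectl create -f spark-worker-controller.yaml --namespace=spark-cluster
kubectl create -f zeppelin-controller.yaml --namespace=spark-cluster

# All pods (master, workers, zeppelin) should eventually be Running
kubectl get pods --namespace=spark-cluster
```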

Once done, the Spark cluster is established.

Common Issues with Zeppelin

  • The Zeppelin image is quite large and takes some time to pull. Details at issue #17231.

  • kubectl port-forward may be unstable on the GKE platform; restart as needed. See issue #12179 for reference.
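If the forward drops, simply rerun it. A sketch that looks up the Zeppelin pod by label before forwarding (the component=zeppelin label is an assumption; match it to your manifest):

```shell
# Resolve the Zeppelin pod name, then forward its UI to localhost:8080
ZEPPELIN_POD=$(kubectl get pods --namespace=spark-cluster \
  -l component=zeppelin -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace=spark-cluster "$ZEPPELIN_POD" 8080:8080
```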

Reference Documents
