Spark

Kubernetes has offered native support for Apache Spark applications since v1.8 (this also requires a Spark build with Kubernetes support, such as v2.3). You can submit jobs to Kubernetes directly with the spark-submit command. Here is an example that computes Pi:
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar
Or, the Python version:
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace <k8s-namespace> \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
  --py-files local:///opt/spark/examples/src/main/python/sort.py \
  local:///opt/spark/examples/src/main/python/pi.py 10
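In cluster mode, spark-submit creates a driver pod in the target namespace, and the Pi result appears in the driver's logs. A quick check might look like the following sketch (the driver pod name is generated at submission time, so the name below is a placeholder):
$ kubectl get pods -n default
$ kubectl logs -f <spark-pi-driver-pod-name> -n default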
Deploying Spark on Kubernetes
A detailed walkthrough for deploying Spark is provided in the Kubernetes GitHub examples. The instructions below simplify some of those steps for an easier installation.
Deployment Prerequisites
A Kubernetes cluster; see Cluster Deployment
kube-dns functioning properly (a quick check is shown below)
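To verify kube-dns, you can check its pods in the kube-system namespace (assuming the standard k8s-app=kube-dns label):
$ kubectl get pods -n kube-system -l k8s-app=kube-dns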
Creating a Namespace
namespace-spark-cluster.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "spark-cluster"
  labels:
    name: "spark-cluster"
$ kubectl create -f examples/staging/spark/namespace-spark-cluster.yaml
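You can confirm that the namespace was created before proceeding:
$ kubectl get namespace spark-cluster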
For simplicity, we will not switch the kubectl context to spark-cluster; instead, we will pass the spark-cluster namespace explicitly in subsequent deployments.
Deploying the Master Service
Create a replication controller to run the Spark Master service.
spark-master-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
      - name: spark-master
        image: gcr.io/google_containers/spark:1.5.2_v1
        command: ["/start-master"]
        ports:
        - containerPort: 7077
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
$ kubectl create -f spark-master-controller.yaml
Create the master service.
spark-master-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-master
  namespace: spark-cluster
spec:
  ports:
  - port: 7077
    targetPort: 7077
    name: spark
  - port: 8080
    targetPort: 8080
    name: http
  selector:
    component: spark-master
$ kubectl create -f spark-master-service.yaml
Check if the Master is running properly.
$ kubectl get pod -n spark-cluster
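If the Master pod is not yet Running, its logs usually explain why; a minimal sketch (substitute the generated pod name reported by the previous command):
$ kubectl logs -n spark-cluster <spark-master-controller-pod-name>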
To observe the Spark cluster through Spark's built-in web UI, you need to deploy a dedicated proxy.
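As a lightweight alternative sketch (not the proxy shipped with the upstream example), the Master UI can be reached through the apiserver's service proxy, using the spark-master service and http port defined above; note that the exact proxy path varies across Kubernetes versions:
$ kubectl proxy --port=8001
Then open http://localhost:8001/api/v1/namespaces/spark-cluster/services/spark-master:http/proxy/ in a browser.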
After confirming that the Master is up and running, deploy the Spark workers (a sketch follows below). Then create the Zeppelin UI, which allows tasks to be executed directly on the cluster through a web notebook; see Zeppelin UI and Spark architecture.
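The following is a minimal sketch of a worker replication controller, modeled on the master controller above (the upstream example ships a similar spark-worker-controller.yaml; the replica count and resource request here are illustrative):
spark-worker-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: spark-cluster
spec:
  replicas: 2
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
      - name: spark-worker
        image: gcr.io/google_containers/spark:1.5.2_v1
        command: ["/start-worker"]
        ports:
        - containerPort: 8081
        resources:
          requests:
            cpu: 100m
$ kubectl create -f spark-worker-controller.yaml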
Once these steps are complete, the Spark cluster is up and running.
Common Issues with Zeppelin
The Zeppelin image is quite large and takes some time to pull; see issue #17231 for details.
kubectl port-forward may be unstable on the GKE platform; restart it as needed. See issue #12179 for reference.
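For reference, a typical port-forward invocation for Zeppelin looks like this (substitute the generated Zeppelin pod name):
$ kubectl port-forward -n spark-cluster <zeppelin-pod-name> 8080:8080
Then open http://localhost:8080 in a browser.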