# Spark

![](https://i.imgur.com/6zYTLL8.png)

**Since Kubernetes v1.8, native support for Apache Spark applications has been available** [**(requiring Spark to support Kubernetes, e.g., v2.3)**](https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html)**. You can submit Kubernetes tasks directly with the `spark-submit` command. Here's an example of computing Pi:**

```bash
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar
```

**Or, the Python version:**

```bash
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace <k8s-namespace> \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
  --py-files local:///opt/spark/examples/src/main/python/sort.py \
  local:///opt/spark/examples/src/main/python/pi.py 10
```

## Deploying Spark on Kubernetes

A detailed method for deploying Spark is provided on the [github Kubernetes examples](https://github.com/kubernetes/examples/tree/master/staging/spark). To simplify some steps for an easier installation, follow the instructions below.

### Deployment Prerequisites

* A Kubernetes cluster, refer to [Cluster Deployment](/en/setup/cluster.md)
* kube-dns functioning properly

### Creating a Namespace

namespace-spark-cluster.yaml

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "spark-cluster"
  labels:
    name: "spark-cluster"
```

```bash
$ kubectl create -f examples/staging/spark/namespace-spark-cluster.yaml
```

For simplicity, we will not switch the kubectl context to spark-cluster. Instead, we will add the spark-cluster namespace to subsequent deployments.

### Deploying the Master Service

Create a replication controller to run the Spark Master service.

```yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: gcr.io/google_containers/spark:1.5.2_v1
          command: ["/start-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
```

```bash
$ kubectl create -f spark-master-controller.yaml
```

**Create the master service.**

spark-master-service.yaml

```yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-master
  namespace: spark-cluster
spec:
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    component: spark-master
```

```bash
$ kubectl create -f spark-master-service.yaml
```

**Check if the Master is running properly.**

```bash
$ kubectl get pod -n spark-cluster
```

For observing our Spark cluster via the Spark-developed web UI, deploy [specialized proxy](https://github.com/aseigneurin/spark-ui-proxy).

Deploy Spark workers to ensure the Master is up and running. Also create the Zeppelin UI, which allows direct task execution on our cluster through a web notebook, seen at [Zeppelin UI](https://zeppelin.apache.org) and [Spark architecture](https://spark.apache.org/docs/latest/cluster-overview.html).

Once done, the Spark cluster is established.

### Common Issues with Zeppelin

* The Zeppelin image is quite large and takes some time to pull. Details at issue #17231.
* `kubectl port-forward` may be unstable on the GKE platform; restart as needed. See issue #12179 for reference.

## Reference Documents

* [Apache Spark on Kubernetes](https://apache-spark-on-k8s.github.io/userdocs/index.html)
* <https://github.com/kweisamx/spark-on-kubernetes>
* [Spark examples](https://github.com/kubernetes/examples/tree/master/staging/spark)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://kubernetes.feisky.xyz/en/practice/introduction/spark.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
