So now let’s talk about a service whose name says exactly what it does: AWS Batch. Batch is a fully managed batch processing service that lets you do batch processing at any scale. With Batch, you can efficiently run hundreds of thousands of computing batch jobs on AWS very easily.
So what is a batch job?
Well, a batch job is a job that has a start and an end. That is as opposed to, say, a continuous or streaming job, which never really ends; it’s always running. A batch job, for example, starts at 1 a.m. and finishes at 3 a.m.
So a batch job runs at a specific point in time, and the Batch service will dynamically launch EC2 instances or Spot Instances to accommodate the load of running these batch jobs. Batch provisions the right amount of compute and memory to deal with your batch queue. You just submit or schedule batch jobs into the batch queue, and the Batch service does the rest.
Now how do you define a batch job?
Well, it is simply a Docker image plus a job definition, and the job runs on the ECS service. This pretty much means that anything that can run on ECS can run on Batch, which makes Batch very convenient for running these batch jobs.
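As a rough illustration of "a Docker image plus a definition", the sketch below builds the keyword arguments you could pass to boto3's `register_job_definition` call for a container job. The job name, image URI, and command are hypothetical placeholders; the function only builds the request and does not call AWS.

```python
def job_definition_request(name, image, command, vcpus, memory_mib):
    """Build kwargs for boto3's batch register_job_definition (container job)."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            # The Docker image that does the actual work.
            "image": image,
            # The command run inside the container.
            "command": command,
            # Batch expects these resource values as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# Hypothetical names, for illustration only.
request = job_definition_request(
    name="image-filter",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/image-filter:latest",
    command=["python", "filter.py"],
    vcpus=1,
    memory_mib=2048,
)
```

Against a real account you would pass it along as `boto3.client("batch").register_job_definition(**request)`, assuming credentials and an ECR image already exist.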
And because Batch automatically scales the right number of EC2 instances or Spot Instances for these jobs, you get a lot of cost optimization, and you spend much less time on infrastructure so you can focus on your batch jobs.
Simplified Example
So for example, say we want to process images that users submit into Amazon S3, in a batch way. An image is put into Amazon S3, and this triggers a batch job.
And so Batch automatically manages an ECS cluster made of EC2 instances or Spot Instances, and it makes sure you have the right number of instances to handle the batch jobs in the batch queue. These instances run your Docker images, which do the actual work. For example, the job might apply a filter to the image and insert the processed object into another Amazon S3 bucket.
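One possible way to wire up "an image in S3 triggers a batch job" (assumed here, not the only option) is a small Lambda function subscribed to the bucket's S3 event notifications that submits a Batch job per uploaded object. The sketch below only turns an S3 event into `submit_job` keyword arguments; the queue name, job definition name, output bucket, and environment variable names are all hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical names, for illustration only.
JOB_QUEUE = "image-processing-queue"
JOB_DEFINITION = "image-filter"
OUTPUT_BUCKET = "my-processed-images"


def submit_requests_from_s3_event(event):
    """Build one boto3 batch submit_job kwargs dict per S3 record."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Job names allow letters, digits, hyphens and underscores.
        safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)
        requests.append({
            "jobName": f"filter-{safe_key}"[:128],
            "jobQueue": JOB_QUEUE,
            "jobDefinition": JOB_DEFINITION,
            # The container reads its input/output locations from env vars.
            "containerOverrides": {
                "environment": [
                    {"name": "INPUT_BUCKET", "value": bucket},
                    {"name": "INPUT_KEY", "value": key},
                    {"name": "OUTPUT_BUCKET", "value": OUTPUT_BUCKET},
                ]
            },
        })
    return requests
```

In a real handler, each dict would be passed to `boto3.client("batch").submit_job(**req)`; keeping the request-building separate makes the event parsing easy to test without touching AWS.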
Batch vs Lambda
So the question you may have is: what is the difference between Batch and Lambda, since they look similar?
So Lambda has a time limit of 15 minutes, and you only get access to a limited set of programming languages. On top of that, you have limited temporary disk space for running your jobs, and Lambda is serverless. Batch is very different. Batch has no time limit, because it relies on EC2 instances, and it supports any runtime you want, as long as you package it as a Docker image.
And for storage, you rely on the storage that comes with an EC2 instance: an EBS volume or an EC2 instance store, which can give you a lot more disk space than a Lambda function gets. And finally, Batch is not a serverless service.
It’s a managed service, but it relies on actual EC2 instances being created. Those EC2 instances are managed by AWS, so we don’t have to worry about auto scaling and so on.
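The two hard limits in this comparison can be turned into a quick rule of thumb. The sketch below is a hypothetical helper, not an AWS tool: it assumes Lambda's 15-minute maximum timeout and its maximum configurable ephemeral storage of 10,240 MB, and suggests Batch whenever a job would exceed either one.

```python
# Lambda limits assumed here: 15-minute max timeout, 10,240 MB max /tmp storage.
LAMBDA_MAX_MINUTES = 15
LAMBDA_MAX_TMP_MB = 10_240


def suggest_service(expected_minutes, scratch_disk_mb):
    """Hypothetical rule of thumb: Batch if a job exceeds Lambda's hard limits."""
    if expected_minutes > LAMBDA_MAX_MINUTES:
        return "batch"  # Lambda would time out; Batch has no time limit
    if scratch_disk_mb > LAMBDA_MAX_TMP_MB:
        return "batch"  # needs more disk than Lambda's temporary storage
    return "lambda"
```

For instance, a 5-minute job needing 500 MB of scratch space fits Lambda, while a 2-hour nightly job (like the 1 a.m. to 3 a.m. example) points to Batch. Other factors such as cost, runtime choice, and whether you want to manage a Docker image still matter, so treat this as a starting point only.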