The way we do it at my company is this:
We have a service called overseer that runs on an instance in the VPC for our environment. That script adjusts ASG target numbers and LB targets.
We have ASGs for frontend and backend everything, then for each of those, we have A and B (blue and green).
A new environment will see A with our target of 3-5 (depending on prod or dev) workers on the ASG. When the worker instances spin up, they ping the overseer that tells them to pull in the latest docker containers and spin up the services for that specific environment.
When we switch, we just tell overseer to crank up the B ASGs. When they are up (and updated), overseer changes the ALB targets and soft-kills the A workers. We have a large environment, so the whole process takes about 30 minutes, but we really only have approx 15 seconds of downtime, so it’s pretty much imperceptible to end users.
I would love to share code and whatnot, but I’m not sure my NDA would allow that.
Feel free to pick my brain though.
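
The "up (and updated)" gate before the load balancer switch, sketched as an added example that is not part of the original post and not the poster's code. The `Worker`/`Group` model, the function names and the digest strings are assumptions made for illustration; a real controller would fill them from the ASG and target-group APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    instance_id: str
    healthy: bool        # passed the load balancer health check
    image_digest: str    # digest the worker reports after pulling its containers

@dataclass
class Group:
    name: str            # "A" or "B"
    desired: int         # ASG target count
    workers: list = field(default_factory=list)

def ready_for_cutover(group, expected_digest):
    """Return (ok, reasons): is the standby group both up and updated?"""
    reasons = []
    healthy = [w for w in group.workers if w.healthy]
    if len(healthy) < group.desired:
        reasons.append(f"{len(healthy)}/{group.desired} workers healthy")
    # "latest" is a moving tag, so compare the digest each worker actually runs.
    stale = [w.instance_id for w in group.workers
             if w.image_digest != expected_digest]
    if stale:
        reasons.append("stale image on " + ",".join(stale))
    return (not reasons, reasons)

def plan_switch(active, standby, expected_digest):
    """Next steps for a switch; the LB is never retargeted before the gate passes."""
    if standby.desired == 0:
        return [f"scale {standby.name} to {active.desired}", "wait"]
    ok, reasons = ready_for_cutover(standby, expected_digest)
    if not ok:
        return ["wait: " + "; ".join(reasons)]
    return [f"point LB at {standby.name}",
            f"drain {active.name}",
            f"scale {active.name} to 0"]

# Constructed sample state: A is live on the old image, B is being brought up.
OLD, NEW = "sha256:old-placeholder", "sha256:new-placeholder"
A = Group("A", 3, [Worker(f"i-a{n}", True, OLD) for n in (1, 2, 3)])
B_MIXED = Group("B", 3, [Worker("i-b1", True, NEW), Worker("i-b2", True, NEW),
                         Worker("i-b3", True, OLD)])
B_READY = Group("B", 3, [Worker(f"i-b{n}", True, NEW) for n in (1, 2, 3)])

if __name__ == "__main__":
    for standby in (Group("B", 0), B_MIXED, B_READY):
        print(plan_switch(A, standby, NEW))
```

Checking the digest each worker reports, rather than trusting that every instance pulled the same "latest" tag, keeps a partially updated B group from taking traffic.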