Technical manual
iR Engine on AWS
02_EKS
## Create EKS cluster with four nodegroups

You first need to set up an EKS cluster for iR Engine to run on. While this can be done via AWS' web interface, the eksctl CLI will automatically provision more of the services you need, and is thus recommended.

First, follow these instructions: https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html to set up the AWS CLI and eksctl, and to configure the AWS CLI with your AWS credentials. You should also set up kubectl and Helm, as we will be using them to install multiple codebases from charts.

Next, run the following command (a filled-in example appears at the end of this section):

`eksctl create cluster --name <name> --version <version> --region <region> --managed --nodegroup-name <name> --node-type <instance type> --nodes <target node number> --nodes-min <minimum node number> --nodes-max <maximum node number> --spot`

This will create an EKS cluster with a managed nodegroup in the specified region, including automatically creating subnets, making a VPC, and more. It may take up to 15 minutes to complete.

You can also use the flag `--zones <zone1>,<zone2>` to specify which Availability Zones the cluster should be set up in. Some regions have zones that are unavailable, but which eksctl will try to use if `--zones` is not specified, causing the setup to fail. As an example, us-west-1 (as of this writing) does not have any resources available in us-west-1b; if you are setting up in us-west-1, you would want to use `--zones us-west-1a,us-west-1c`.

Note that the region matters for almost all services in AWS. The default region is 'us-east-1', but if you make the cluster in any other region, you'll need to make sure you're creating certs, DNS records, etc. in the same region.

As of this writing, the API and client are configured to run on a nodegroup named 'ng-1'. If you name it something else, be sure to change the nodeAffinity in the configuration file.

This is one of four nodegroups that will be created for various services to run on. Make sure to increase the maximum node limit, as by default the target, minimum, and maximum are all set to 2, and iR Engine's setup will definitely need more than two nodes if you've configured them to use relatively small instance types such as t3a.medium.
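As a concrete illustration, a filled-in invocation for a cluster in us-west-1 (using the zone workaround noted above and the default 'ng-1' nodegroup name) might look like the sketch below. The cluster name, Kubernetes version, and node counts are illustrative placeholders, not values this guide requires.

```bash
# Sketch with illustrative values; substitute your own name, version, region, and sizing.
eksctl create cluster \
  --name ir-engine-cluster \
  --version 1.28 \
  --region us-west-1 \
  --zones us-west-1a,us-west-1c \
  --managed \
  --nodegroup-name ng-1 \
  --node-type t3a.medium \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 6 \
  --spot
```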
## Enable EBS CSI addon (if EKS version is 1.23 or later)

Follow the instructions here: https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html to enable an EKS addon that's required for any cluster that will have persistent volumes, which an iR Engine deployment cluster will.

## Install Cluster Autoscaler (optional)

While not necessary, it can be useful to have an autoscaler installed in the cluster to increase the number of nodes available for pods when the cluster has high traffic and to decrease that number when traffic is low.

Follow these instructions: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html#cluster-autoscaler to set up the autoscaler. Any managed nodegroups created in the following steps should by default be tagged such that the autoscaler can control them, so no further action should be required.

Note that there is some lag time on scaling up and down. It generally takes about 5 minutes from the time the autoscaler sees the need for more nodes until those nodes have been spun up, the appropriate Docker image has been installed onto them, and they are ready to be used. It takes about 15 minutes for the autoscaler to actually remove nodes that are deemed superfluous, as a hedge against the recent high traffic picking back up.

The OIDC provider that was created in the prior step (installing the EBS CSI addon) can be re-used in this step.

## Create launch template

Go to EC2 > Launch Templates and make a new one. Name it something like 'ir-engine-production-instanceserver'. Most settings can be left as-is, except for the following:

- Storage > Add a volume, set the size to 20 GB, and for Device name select '/dev/xvda'.
- Network interfaces > Add one, and under 'Auto-assign public IP' select 'Enable'.

## Create nodegroup for instanceservers

Go to the AWS website, then go to EKS > Clusters > click on the cluster you just made > Configuration > Compute. You should see one managed nodegroup already there; clicking on its name will open up information and editing, though you can't change the instance type after it's been made.

Back at the Compute tab, click on Add Node Group. Pick a name (something like ng-instanceservers-1 is recommended), select the IAM role that was created with the cluster (it should be something like `eksctl-<cluster name>-node-NodeInstanceRole-<jumble of characters>`), toggle the Use launch template toggle and select the launch template you made in the previous step, then click Next.

On the second page, choose the instance type(s) you'd like for the group (t3(a).small instances are recommended), set the minimum/maximum/desired scaling sizes, and hit Next.

There may be connection issues with instanceserver instances in private subnets, so remove all of the private subnets from the list of subnets to use, and make sure that public subnets are being used (sometimes the workflow only selects private subnets by default). Hit Next, review everything, and click Create.

## Create nodegroup for redis

Redis should get its own nodegroup to isolate it from any other changes that might be made to your cluster. As with the instanceserver nodegroup, this isn't strictly necessary, but it can prevent various other things from going down due to the Redis servers getting interrupted.

Back at the Compute tab, click on Add Node Group. Pick a name (the default config in ir-engine-ops https://github.com/ir-engine/ir-engine-ops/blob/master/configs/local.minikube.template.values.yaml assumes a name of 'ng-redis-1'), select the IAM role that was created with the cluster (it should be something like `eksctl-<cluster name>-node-NodeInstanceRole-<jumble of characters>`), toggle the Use launch template toggle and select the launch template used to make the initial nodegroup, then click Next.

On the second page, choose the instance type(s) you'd like for the group, set the minimum/maximum/desired scaling sizes (you can probably get away with a single t3(a).small, but it's recommended to have at least two nodes so that one going down doesn't kill the entire deployment from a lack of Redis), and hit Next. The default subnets should be fine, so hit Next, review everything, and click Create.
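If you prefer to script nodegroup creation rather than click through the console, eksctl can create managed nodegroups directly. The sketch below is a rough CLI equivalent for the redis nodegroup only; the cluster name, region, and sizing are placeholder assumptions, and this plain form does not attach the custom launch template that the console steps above use.

```bash
# Rough sketch; the console workflow above is the documented path.
CLUSTER_NAME=my-ir-engine-cluster   # placeholder
REGION=us-west-1                    # placeholder
eksctl create nodegroup \
  --cluster "$CLUSTER_NAME" \
  --region "$REGION" \
  --name ng-redis-1 \
  --node-type t3a.small \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 3 \
  --node-volume-size 20 \
  --managed
```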
## Create nodegroup for builder

The full iR Engine stack needs a builder server within the cluster in order to bundle and build iR Engine projects into the codebase that will be deployed. This should run on its own nodegroup that has a single node. Only one copy of the builder should ever be running at a time, and due to the high memory needs of building the client service, a box with >8 GB of RAM is needed.

Back at the Compute tab, click on Add Node Group. Pick a name (something like ng-dev-builder-1 is recommended) and select the IAM role that was created with the cluster (it should be something like `eksctl-<cluster name>-node-NodeInstanceRole-<jumble of characters>`). You don't need to use any launch template for this nodegroup. Click Next.

On the second page, you can change the capacity type to Spot if you want to save money; the builder service will likely not be running very often or for very long, so the odds of it getting interrupted by Spot instance outages are low, and it can always re-build if that does happen.

Set the disk size to 50 GB; it takes a good deal of disk space to install and build the iR Engine codebase, and the default 20 GB will almost certainly not be enough.

For instance types, only select types that have more than 8 GB of RAM; t3a.xlarge is the cheapest type that fits this criterion. If you were to pick something with exactly 8 GB, it's highly likely that most builds would crash the node, as Kubernetes tends to restart nodes if they get anywhere near memory capacity.

Under Node Group scaling configuration, set all three node values to 1. We only want a single copy of the builder at any given time, and running multiple powerful boxes can get pricey. Click Next.

You can leave the subnets on the next page alone and click Next. On the last page, click Create.
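If you want to script this step as well, a roughly equivalent eksctl command might look like the following sketch. The nodegroup name, single-node scaling, and 50 GB disk match the guidance above; the cluster name, region, and the Spot instance-type list are illustrative assumptions.

```bash
# Sketch only; adjust the cluster name, region, and instance types for your setup.
CLUSTER_NAME=my-ir-engine-cluster   # placeholder
REGION=us-west-1                    # placeholder
eksctl create nodegroup \
  --cluster "$CLUSTER_NAME" \
  --region "$REGION" \
  --name ng-dev-builder-1 \
  --managed \
  --spot \
  --instance-types t3a.xlarge,t3.xlarge \
  --nodes 1 \
  --nodes-min 1 \
  --nodes-max 1 \
  --node-volume-size 50
```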