Vishal Desai’s Oracle Blog

May 8, 2019

On-Demand Data Science JupyterHub Environment Using AWS Service Catalog

Filed under: AWS, Data Science — vishaldesai @ 5:13 pm

Overview

In a previous blog, I demonstrated how to create a web-based data science environment using JupyterHub on Elastic Container Service. That architecture requires a minimum of one running task, so even when the environment is not in continuous use the customer still pays for one task per level of authorization they need. There are different ML stages, shown in the diagram below, and the customer wants to use the stage/tool patterns 1 and 3 for their use cases. Running GPU-based instances persistently in the web-based architecture would not be economical, so the customer wants a hybrid of web-based and on-demand environments. In this blog, I will demonstrate how data scientists can request such an on-demand environment using Service Catalog.

image

Architecture

image

0a – Create a temporary Ubuntu 18.04 EC2 instance and install Anaconda, R, Python, etc. Create an AMI image from the EC2 instance and terminate the temporary instance.

1a – Create an IAM role with a policy that allows read access to PII data in the S3 data lake. This role will be used by the EC2 instances created when data scientists request an on-demand environment.

1b – Create a CloudFormation template using the IAM role and AMI image. This template allows only CPU-based EC2 instances, intended for data exploration tasks.

1c – In Service Catalog, create a product using the CloudFormation template.

1d – In Service Catalog, create a PII portfolio and add the product to this portfolio.

1e – Create a CloudFormation template using the IAM role and AMI image. This template allows only GPU-based EC2 instances, intended for data exploration as well as creating, training, and evaluating models.

1f – In Service Catalog, create a product using the CloudFormation template.

1g – Add the product to the existing PII portfolio.

1i – Add the IAM users that will work on PII data to the PII portfolio.

2a to 2i – Follow similar steps as above, mapping to the non-PII IAM role.

1h – Users can launch the products assigned to them.

1j – Once the product is launched, users can access the JupyterHub environment.

Implementation

0a. Create AMI

Launch any t2 EC2 instance using Ubuntu 18.04, log in to the EC2 instance, and run the following commands. Once the packages are installed, reboot the EC2 instance and create an AMI image from it. After the image is created, terminate the EC2 instance.

# Ubuntu updates
sudo apt-get update -y
sudo apt-get dist-upgrade -y
sudo apt-get autoremove -y
sudo apt-get autoclean -y 

# Install Anaconda
sudo curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
sudo sh ./Anaconda3-5.2.0-Linux-x86_64.sh -b -p /home/ubuntu/anaconda3

# Install NFS client
sudo apt update
sudo apt-get install nfs-common -y

# Install R pre-requisites
sudo apt-get install -y --no-install-recommends \
fonts-dejavu \
unixodbc \
unixodbc-dev \
r-cran-rodbc \
gfortran \
gcc && \
rm -rf /var/lib/apt/lists/*


# Fix for devtools https://github.com/conda-forge/r-devtools-feedstock/issues/4
sudo ln -s /bin/tar /bin/gtar

# Install Conda, Jupyterhub and R packages
sudo su -
export PATH=$PATH:/home/ubuntu/anaconda3/bin
conda update -n base conda -y
conda create --name jupyter python=3.6 -y
source activate jupyter
conda install -c conda-forge jupyterhub -y
conda install -c conda-forge jupyter notebook -y
conda install -c r r-IRkernel -y
conda install -c r rstudio -y
conda install -c r/label/borked rstudio -y
conda install -c r r-devtools -y
conda install -c r r-ggplot2 r-dplyr -y
conda install -c plotly plotly -y
conda install -c plotly/label/test plotly -y
conda update curl -y
conda install -c bioconda bcftools -y
conda install -c bioconda/label/cf201901 bcftools -y
conda install -c anaconda boto3 -y
pip install boto3
R -e "devtools::install_github('IRkernel/IRkernel')"
R -e "IRkernel::installspec(user = FALSE)"

# Install Jupyterhub
#sudo python3 -m pip install jupyterhub

# Create Config file
mkdir /srv/jupyterhub
echo "c = get_config()" >> /srv/jupyterhub/jupyterhub_config.py
echo "c.Spawner.env_keep = ['AWS_DEFAULT_REGION','AWS_EXECUTION_ENV','AWS_REGION','AWS_CONTAINER_CREDENTIALS_RELATIVE_URI','ECS_CONTAINER_METADATA_URI']" >> /srv/jupyterhub/jupyterhub_config.py
echo "c.Spawner.cmd = ['/home/ubuntu/anaconda3/envs/jupyter/bin/jupyterhub-singleuser']" >> /srv/jupyterhub/jupyterhub_config.py
  

1a. Create IAM roles and policies.

aws iam create-role --role-name "jupyterpii" --description "Allows EC2 to call AWS services on your behalf." --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":["ec2.amazonaws.com"]},"Action":"sts:AssumeRole"}]}' --region us-east-1
aws iam put-role-policy --policy-name "pii" --policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"VisualEditor0","Effect":"Allow","Action":["s3:PutAccountPublicAccessBlock","s3:GetAccountPublicAccessBlock","s3:ListAllMyBuckets","s3:HeadBucket"],"Resource":"*"},{"Sid":"VisualEditor1","Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::vishaldatalake/pii/*"}]}' --role-name "jupyterpii" --region us-east-1
aws iam create-role --role-name "jupyternonpii" --description "Allows EC2 to call AWS services on your behalf." --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":["ec2.amazonaws.com"]},"Action":"sts:AssumeRole"}]}' --region us-east-1
aws iam put-role-policy --policy-name "nonpii" --policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"VisualEditor0","Effect":"Allow","Action":["s3:PutAccountPublicAccessBlock","s3:GetAccountPublicAccessBlock","s3:ListAllMyBuckets","s3:HeadBucket"],"Resource":"*"},{"Sid":"VisualEditor1","Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::vishaldatalake/nonpii/*"}]}' --role-name "jupyternonpii" --region us-east-1
    
    
  

1b, 1e. Create CloudFormation templates.

Create an EFS file system and mount target, and replace the EFS endpoint in the template. All notebooks will be stored on the shared EFS mount point. Review the default parameters and adjust them for your environment.

Templates
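If a shared file system does not exist yet, the EFS file system and mount target can be created from the CLI; a minimal sketch, with subnet and security group IDs as placeholders:

aws efs create-file-system --creation-token jupyterhub-efs --region us-east-1
# Note the FileSystemId (fs-xxxxxxxx) returned above and create a mount target in each subnet
aws efs create-mount-target --file-system-id fs-xxxxxxxx --subnet-id subnet-xxxxxxxx --security-groups sg-xxxxxxxx --region us-east-1
# Use the resulting endpoint fs-xxxxxxxx.efs.us-east-1.amazonaws.com as the EFS endpoint in the templates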

1c, 1f, 2c, 2f Create Service Catalog Products

Locate the Service Catalog service and click on Upload new product.

image

Click on Next. Enter the email contact of the product owner and click on Next.

image

Choose File, select the CloudFormation template, and click on Next. Review the details and create the product.

Below is the screenshot for all the products.

image
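The products can also be created from the CLI instead of the console; a minimal sketch, assuming the CloudFormation template has been uploaded to an S3 bucket of your choice (the product name, owner, and template URL below are placeholders):

aws servicecatalog create-product --name "JupyterHub-PII-CPU" --owner "dataplatform-team" \
--product-type CLOUD_FORMATION_TEMPLATE \
--provisioning-artifact-parameters '{"Name":"v1","Type":"CLOUD_FORMATION_TEMPLATE","Info":{"LoadTemplateFromURL":"https://s3.amazonaws.com/mybucket/jupyterhub-cpu.yaml"}}' \
--region us-east-1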

1d, 1g, 2d, 2g Create Service Catalog portfolio and add products.

Click on Create portfolio.

image

Click on Create.

image

Click on each portfolio and add the PII-specific products to the PII portfolio and the non-PII products to the non-PII portfolio.

Below is a screenshot of the PII portfolio.

image
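Portfolio creation and product association can be scripted as well; a minimal sketch, with the portfolio and product IDs returned by the previous commands as placeholders:

aws servicecatalog create-portfolio --display-name "PII" --provider-name "dataplatform-team" --region us-east-1
aws servicecatalog associate-product-with-portfolio --product-id prod-xxxxxxxx --portfolio-id port-xxxxxxxx --region us-east-1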

1i, 2i. Create IAM users and give them access to the Service Catalog portfolio.

Create IAM users for data scientists.

aws iam create-user --user-name "user1" --path "/" --region us-east-1
aws iam attach-user-policy --policy-arn "arn:aws:iam::aws:policy/AWSServiceCatalogEndUserFullAccess" --user-name "user1" --region us-east-1
aws iam attach-user-policy --policy-arn "arn:aws:iam::aws:policy/IAMUserChangePassword" --user-name "user1" --region us-east-1
aws iam create-login-profile --user-name "user1" --password "xxxxxxxx" --password-reset-required --region us-east-1

aws iam create-user --user-name "user2" --path "/" --region us-east-1
aws iam attach-user-policy --policy-arn "arn:aws:iam::aws:policy/AWSServiceCatalogEndUserFullAccess" --user-name "user2" --region us-east-1
aws iam attach-user-policy --policy-arn "arn:aws:iam::aws:policy/IAMUserChangePassword" --user-name "user2" --region us-east-1
aws iam create-login-profile --user-name "user2" --password "xxxxxxxx" --password-reset-required --region us-east-1
  

Click on the portfolio and, under Users, groups and roles, click on Add users, groups and roles.

image
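The console step above corresponds to associate-principal-with-portfolio; a minimal sketch, assuming user1 should be able to launch products from the PII portfolio:

aws servicecatalog associate-principal-with-portfolio --portfolio-id port-xxxxxxxx \
--principal-arn arn:aws:iam::xxxxxxxxx:user/user1 --principal-type IAM --region us-east-1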

1h, 2h. Log in as a data scientist IAM user and launch the product.

image

Click on the product and launch it.

image

Provide details and click next.

image

Change instance type as per need and click next.

Leave default for Tags and Notifications. Review details and launch product.

image

Once the product is launched, it will show the JupyterHub URL as a key-value pair.

Launch JupyterHub from browser.

image

Login using username and password.

1k, 2k Create notebook and test access.

Create a notebook; it will be stored on the EFS mount point.

image

As expected, the user can access data from the PII folder.

image

The user does not have access to non-PII data.

Once the user completes their data science or machine learning tasks, the product can be terminated by clicking on Actions and then Terminate. In the future, the user can launch the product again and the notebooks will be preserved, as they are stored on the persistent EFS mount.

Additional Considerations

Spot Instances

I have created products using CloudFormation templates that launch on-demand instances. If there is no urgency to complete data exploration or machine learning training, consider creating products that use Spot Instances, which can significantly reduce cost.

Security

Use certificates, and for security reasons consider launching the EC2 products in a private subnet and accessing them through a bastion.

Active Directory Integration

You can use the LDAP authenticator to authenticate users through AD.

SageMaker

The product offering can be extended using SageMaker, so data scientists have the flexibility to use JupyterHub or SageMaker depending on their requirements.

Cost and Reporting

If users don’t terminate the product, the EC2 instance will keep incurring cost. A Lambda function can be scheduled to terminate idle instances, or a CloudWatch alarm can be created so that EC2 instances idle for more than a certain period of time are terminated.
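For example, a CloudWatch alarm can terminate an instance whose CPU stays low for several hours; a minimal sketch, with the instance ID, threshold, and evaluation window as placeholders to tune:

aws cloudwatch put-metric-alarm --alarm-name "jupyter-idle-terminate" \
--namespace AWS/EC2 --metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-xxxxxxxx \
--statistic Average --period 3600 --evaluation-periods 6 \
--threshold 5 --comparison-operator LessThanThreshold \
--alarm-actions arn:aws:automate:us-east-1:ec2:terminate --region us-east-1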

Since users control what type of EC2 instances they launch, additional reporting can be created using Service Catalog, EC2, and cost metadata.


April 24, 2019

Multiuser Data Science Environment using JupyterHub on AWS Elastic Container Service

Filed under: AWS, Containers, Data Science, Uncategorized — vishaldesai @ 8:38 pm

Overview

The customer has built a multi-tenant data lake on S3 and has started ingesting different types of data. Now they want to build a data science environment for data exploration using JupyterHub. Below were the requirements:
  • Environment must be low cost.
  • Environment must scale with number of data scientists.
  • Environment should support authentication and authorization (S3 data lake).
  • Notebooks must be stored in a centralized location and should be sharable.
  • Environment must support installing custom packages and libraries such as R, pandas, etc.

The customer’s preference is to use JupyterHub, and they do not want to use EMR due to the additional cost.

Architecture

image

image

1a. Create IAM policies and roles with access to specific S3 folders. For simplicity, let’s assume the S3 bucket has two keys/folders called PII and Non-PII. Create a policy and role with access to each of PII and Non-PII.

2a. Create two Dockerfiles for authorization purposes. Each Dockerfile will have separate users for authentication, and later, while creating the ECS task definitions, each image will be attached to a different role for authorization. Store the Dockerfiles in CodeCommit.

2b. CodeBuild will trigger on commit.

2c. CodeBuild will build the images using the Dockerfiles.

2d. CodeBuild will push the images to Elastic Container Registry.

Web Based Environment

A single task can be shared by multiple users, and tasks can scale based on a scaling policy. A minimum of one task must be running, so the customer has to pay for at least one task per task group. CPU and memory limits per task are 4 vCPU and 30 GB of memory.

3a. Create ECS cluster

3b. Create one task definition using the role that has access to the PII folder in S3 and the image that contains the users who need access to PII data, and another task definition for Non-PII.

3c. Create Services for PII and Non-PII.

3d. Create Application load balancer with routing rules to different services.

3f. Create A-record in Route53 using one of the existing domains.

On Demand Environment

EC2 instances can be provisioned using Service Catalog, with one EC2 instance per user (users could also share an instance). The customer only pays for what they use, and a wide variety of EC2 options is available with much higher CPU and memory than ECS. Recommended for ad-hoc and very large data processing use cases.

In this blog, I will cover the implementation of the web-based environment and will cover the on-demand environment in part 2.

Walkthrough Dockerfile

Get the base image, update Ubuntu, and install jupyter, s3contents, and awscli. s3contents is required to store notebooks on S3.

#Base image
FROM jupyterhub/jupyterhub:latest
#USER root

# update Ubuntu
RUN apt-get update


# Install jupyter, awscli and s3contents (for storing notebooks on S3)
RUN pip install jupyter  && \
    pip install s3contents  && \
    pip install awscli --upgrade --user  && \
    mkdir /etc/jupyter
  

Install R and required packages.

# R pre-requisites
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    fonts-dejavu \
    unixodbc \
    unixodbc-dev \
    r-cran-rodbc \
    gfortran \
    gcc && \
    rm -rf /var/lib/apt/lists/*

# Fix for devtools https://github.com/conda-forge/r-devtools-feedstock/issues/4
RUN ln -s /bin/tar /bin/gtar


# R packages

RUN conda install -c r r-IRkernel && \
    conda install -c r rstudio && \
    conda install -c r/label/borked rstudio && \
    conda install -c r r-devtools  && \
    conda install -c r r-ggplot2 r-dplyr  && \
    conda install -c plotly plotly  && \
    conda install -c plotly/label/test plotly  && \ 
    conda update curl  && \
    conda install -c bioconda bcftools  && \
    conda install -c bioconda/label/cf201901 bcftools  

RUN R -e "devtools::install_github('IRkernel/IRkernel')"  && \
    R -e "IRkernel::installspec()"
  

Configure S3ContentsManager to store notebooks in a centralized S3 location. Although the GitHub documentation says it should work with an IAM role, I got some errors, so for now I’m using an access_key_id and secret_access_key that have read/write access to the S3 bucket.

#S3ContentManager Config
RUN echo 'from s3contents import S3ContentsManager' >> /etc/jupyter/jupyter_notebook_config.py  && \
    echo 'c = get_config()' >> /etc/jupyter/jupyter_notebook_config.py  && \
    echo 'c.NotebookApp.contents_manager_class = S3ContentsManager' >> /etc/jupyter/jupyter_notebook_config.py  && \
    echo 'c.S3ContentsManager.access_key_id = "xxxxxxxx"' >> /etc/jupyter/jupyter_notebook_config.py  && \
    echo 'c.S3ContentsManager.secret_access_key = "xxxxxxxx"' >> /etc/jupyter/jupyter_notebook_config.py  && \
    echo 'c.S3ContentsManager.bucket = "vishaljuypterhub"' >> /etc/jupyter/jupyter_notebook_config.py
  

JupyterHub Configuration File.

#JupyterHub Config
RUN echo "c = get_config()" >> /srv/jupyterhub/jupyterhub_config.py  && \
    echo "c.Spawner.env_keep = ['AWS_DEFAULT_REGION','AWS_EXECUTION_ENV','AWS_REGION','AWS_CONTAINER_CREDENTIALS_RELATIVE_URI','ECS_CONTAINER_METADATA_URI']" >> /srv/jupyterhub/jupyterhub_config.py  && \
    echo "c.Spawner.cmd = ['/opt/conda/bin/jupyterhub-singleuser']" >> /srv/jupyterhub/jupyterhub_config.py
  

Add PAM users

#Add PAM users
RUN useradd --create-home user3  && \
    echo "user3:user3"|chpasswd  && \
    echo "export PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" >> /home/user3/.profile  && \
    mkdir -p /home/user3/.local/share/jupyter/kernels/ir  && \
    cp /root/.local/share/jupyter/kernels/ir/* /home/user3/.local/share/jupyter/kernels/ir/  && \
    chown -R user3:user3 /home/user3
  

Start JupyterHub using configuration file created earlier.

## Start jupyterhub using config file
CMD ["jupyterhub","-f","/srv/jupyterhub/jupyterhub_config.py"]
  

Implementation of Web Based Environment

1a. Create IAM Roles and Policies

Create IAM role and policy with access to PII key/folder and non-PII key/folder.

aws iam create-role --role-name "pii" --description "Allows ECS tasks to call AWS services on your behalf." --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":["ecs-tasks.amazonaws.com"]},"Action":"sts:AssumeRole"}]}' --region us-east-1
aws iam put-role-policy --policy-name "pii" --policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"VisualEditor0","Effect":"Allow","Action":["s3:PutAccountPublicAccessBlock","s3:GetAccountPublicAccessBlock","s3:ListAllMyBuckets","s3:HeadBucket"],"Resource":"*"},{"Sid":"VisualEditor1","Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::vishaldatalake/pii/*"}]}' --role-name "pii" --region us-east-1
aws iam create-role --role-name "nonpii" --description "Allows ECS tasks to call AWS services on your behalf." --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":["ecs-tasks.amazonaws.com"]},"Action":"sts:AssumeRole"}]}' --region us-east-1
aws iam put-role-policy --policy-name "nonpii" --policy-document '{"Version":"2012-10-17","Statement":[{"Sid":"VisualEditor0","Effect":"Allow","Action":["s3:PutAccountPublicAccessBlock","s3:GetAccountPublicAccessBlock","s3:ListAllMyBuckets","s3:HeadBucket"],"Resource":"*"},{"Sid":"VisualEditor1","Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::vishaldatalake/nonpii/*"}]}' --role-name "nonpii" --region us-east-1

2c, 2d. Build Docker images and push it to ECR

For the sake of brevity, I will skip the CodeCommit and CodeBuild setup and show the commands CodeBuild has to run. There will be two images: one with the users that need access to PII data and another for non-PII. Instead of two repositories, you can also create a single repository and create two images with different tags.

cd jupyterhub1
aws ecr create-repository --repository-name jupyterhub/test1
# get-login prints a docker login command; execute its output to authenticate with ECR
$(aws ecr get-login --no-include-email --region us-east-1)
# Build the image from the Dockerfile in this directory, then tag and push it
docker build -t jupyterhub/test1:latest .
docker tag jupyterhub/test1:latest xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test1:latest
docker push xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test1:latest

cd ../jupyterhub2
aws ecr create-repository --repository-name jupyterhub/test2
$(aws ecr get-login --no-include-email --region us-east-1)
docker build -t jupyterhub/test2:latest .
docker tag jupyterhub/test2:latest xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test2:latest
docker push xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test2:latest
  

ecsjh1

3a. Create ECS Cluster

Go to the ECS service, click on Create Cluster, and choose Networking only. Click on Next Step, provide a cluster name, and click on Create. I have named the cluster jhpoc.

ecsjh2
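The equivalent CLI call is a one-liner:

aws ecs create-cluster --cluster-name jhpoc --region us-east-1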

3b. Create Task Definitions

Click on Task Definitions and Create new Task Definition. Select Fargate as the launch type compatibility. Click on Next Step and enter the following details.

Task Definition Name: jhpocpii

Task Role: pii

Task Memory: 2GB

Task CPU: 1 vCPU

Click on Add container and enter the following details.

Container Name: jhpocpii

Image*: xxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test1:latest

Port Mapping: Add 8000 and 80

Click on Add.

Click on Create.

Now follow the same steps and create another task definition for non-PII, using the nonpii role and the test2 image for the container.

ecsjh3
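If you prefer the CLI, the PII task definition can be registered as follows; a minimal sketch, assuming an ecsTaskExecutionRole already exists in the account:

aws ecs register-task-definition --family jhpocpii \
--requires-compatibilities FARGATE --network-mode awsvpc \
--cpu 1024 --memory 2048 \
--task-role-arn arn:aws:iam::xxxxxxxxx:role/pii \
--execution-role-arn arn:aws:iam::xxxxxxxxx:role/ecsTaskExecutionRole \
--container-definitions '[{"name":"jhpocpii","image":"xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/jupyterhub/test1:latest","portMappings":[{"containerPort":8000,"protocol":"tcp"},{"containerPort":80,"protocol":"tcp"}]}]' \
--region us-east-1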

3d. Create Application Load Balancer.

Create Target groups.

aws elbv2 create-target-group --health-check-interval-seconds 30 --health-check-path "/hub/login" --health-check-protocol "HTTP" --health-check-timeout-seconds 5 --healthy-threshold-count 5 --matcher '{"HttpCode":"200"}' --name "jhpocpii" --port 8000 --protocol "HTTP" --target-type "ip" --unhealthy-threshold-count 2 --vpc-id "vpc-0829259f1492b8986" --region us-east-1
aws elbv2 create-target-group --health-check-interval-seconds 30 --health-check-path "/hub/login" --health-check-protocol "HTTP" --health-check-timeout-seconds 5 --healthy-threshold-count 5 --matcher '{"HttpCode":"200"}' --name "jhpocnonpii" --port 8000 --protocol "HTTP" --target-type "ip" --unhealthy-threshold-count 2 --vpc-id "vpc-0829259f1492b8986" --region us-east-1

Create ALB.

aws elbv2 create-load-balancer --name "jhpocalb1" --scheme "internet-facing" --security-groups '["sg-065547ed77ac48d99"]' --subnets '["subnet-0c90f68bfcc784540","subnet-026d9b30457fcb121"]'  --ip-address-type "ipv4" --type "application" --region us-east-1
  

Create Routing Rules as follows:

ecsjh4
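The routing rules above can also be created from the CLI by attaching a listener and a host-based rule to the ALB; a minimal sketch, with the ARNs returned by the previous commands as placeholders:

# Default listener forwards to the PII target group
aws elbv2 create-listener --load-balancer-arn <jhpocalb1-arn> --protocol HTTP --port 8000 \
--default-actions Type=forward,TargetGroupArn=<jhpocpii-tg-arn> --region us-east-1
# Host-based rule sends the non-PII hostname to the non-PII target group
aws elbv2 create-rule --listener-arn <listener-arn> --priority 10 \
--conditions Field=host-header,Values=jhpocnonpii.vishalcloud.club \
--actions Type=forward,TargetGroupArn=<jhpocnonpii-tg-arn> --region us-east-1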

3c. Create ECS Services

Click on ECS cluster and on services tab click on create.

Choose Launch Type Fargate.

Choose Task Definition for PII.

Specify Service Name.

Specify Number of Tasks as 1.

Click on Next and uncheck Enable Service Discovery Integration.

Choose VPC, subnets and security group.

For Load Balancer choose ALB.

Choose Load Balancer Name from Drop down.

Choose Container to Load Balancer settings as follows:

ecsjh5

Click on Next Step.

Optionally set Auto Scaling as follows:

ecsjh17

Click on Create Service.

Create Service for PII and non-PII.
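The same services can be created from the CLI; a minimal sketch for the PII service, with subnets, security group, and target group ARN as placeholders:

aws ecs create-service --cluster jhpoc --service-name jhpocpii \
--task-definition jhpocpii --desired-count 1 --launch-type FARGATE \
--network-configuration 'awsvpcConfiguration={subnets=[subnet-xxxxxxxx],securityGroups=[sg-xxxxxxxx],assignPublicIp=ENABLED}' \
--load-balancers targetGroupArn=<jhpocpii-tg-arn>,containerName=jhpocpii,containerPort=8000 \
--region us-east-1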

After both services are created, wait a few minutes until there is one task running for each service.

ecsjh6

3f. Create A-records.

Create A-records in Route53.

ecsjh7
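The alias A-records can also be created from the CLI; a minimal sketch, with the hosted zone ID of vishalcloud.club, the ALB DNS name, and the ALB's canonical hosted zone ID (from describe-load-balancers) as placeholders:

aws route53 change-resource-record-sets --hosted-zone-id ZXXXXXXXXXXXXX --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "jhpocpii.vishalcloud.club",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "ZALBXXXXXXXXXX",
        "DNSName": "jhpocalb1-xxxxxxxx.us-east-1.elb.amazonaws.com",
        "EvaluateTargetHealth": false
      }
    }
  }]
}'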

Test it

Launch jhpocpii.vishalcloud.club:8000. Log in as user1 and notice that user1 can only access PII data. Trying to log in as user3 or user4 will result in an authentication error.

ecsjh8

ecsjh9

ecsjh10

ecsjh11

Launch jhpocnonpii.vishalcloud.club:8000. Log in as user4 and notice that user4 can only access non-PII data. Trying to log in as user1 or user2 will result in an authentication error.

ecsjh12

ecsjh13

ecsjh14

ecsjh15

Test R Program.

ecsjh18

Important Additional Considerations

Cost

ECS tasks can be launched using Fargate or EC2. The matrix below shows a cost comparison of similar CPU/memory configurations between Fargate and EC2. To save cost, depending on the usage pattern of the environment, use EC2 for persistent usage or Fargate for ad-hoc usage.

ecsjh16

Security

Use certificates and consider launching ECS tasks in a private subnet for security reasons.

Active Directory Integration

You can use the LDAP authenticator to authenticate users through AD. Create separate images with different AD groups to control authorization.

January 30, 2019

ElasticSearch Authentication and Authorization at Index Level on AWS

Filed under: AWS, ElasticSearch, Uncategorized — vishaldesai @ 10:10 pm

 

Customer Requirement:

Customer XYZ is planning to host a multi-tenant ElasticSearch domain to provide a log analytics service to multiple clients. Customer XYZ will receive logs from different clients into their data lake and selectively push the relevant logs to client-specific indices. Clients should be able to log in and authenticate to the Kibana portal and should only be authorized to access their client-specific indices. There will be a separate set of indices created for each client, starting with a standard prefix such as the client ID or client name.

Solution:

In this solution, I will demonstrate how to integrate an ElasticSearch domain with Cognito. Using Cognito, the customer can create users for the different clients in a user pool. The customer can also federate using SAML, provided the users are available or can be created in a hosted/cloud AD. Each user, whether from the user pool or federated, will map to one of the groups in Cognito, and each group will be associated with an IAM role, which in turn provides authorization to a set of client-specific indices.

Note: This solution will work only for Dev Tools in Kibana, and not for Discover, Dashboards, etc.

Prerequisite resources:

A KMS key, VPC, subnet IDs, and security group should be set up prior to the implementation steps.

Implementation Steps:

Step 1: Elasticsearch Domain

Create the ElasticSearch domain using the CloudFormation template from Appendix A.

1001
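Assuming the template from Appendix A has been downloaded locally, the stack can be launched from the CLI; a minimal sketch (the template file name and any parameters depend on the template):

aws cloudformation create-stack --stack-name esdomain \
--template-body file://esdomain.yaml \
--capabilities CAPABILITY_NAMED_IAM --region us-east-1
aws cloudformation wait stack-create-complete --stack-name esdomain --region us-east-1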

Step 2: Sample Indices

Log in to the Kibana portal and create client-specific indices for testing. The Kibana endpoint can be accessed using an EC2 Windows instance in the public subnet or via a proxy through the bastion.


PUT orders/_doc/1
{
  "user" : "order",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out order"
}

PUT customers/_doc/1
{
  "user" : "customers",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out customer"
}

Step 3: Cognito

Create user pool and identity pool using CloudFormation template from Appendix A.

1003

Step 4: Cognito Domain

Create a domain for the Cognito user pool.


aws cognito-idp list-user-pools --max-results 50 --output text | grep espool | awk '{print $3}'
us-east-1_ldkzTlRck

# Domain name value in the below CLI must be unique
aws cognito-idp create-user-pool-domain --domain espooldemo --user-pool-id us-east-1_ldkzTlRck

1004.png

Step 5: Update ElasticSearch domain with Cognito configuration.


aws es update-elasticsearch-domain-config --domain-name esdomain  --cognito-options Enabled=true,UserPoolId=us-east-1_ldkzTlRck,IdentityPoolId=us-east-1:fb6e132c-3711-4974-866f-cc4a3db7d6fa,RoleArn=arn:aws:iam::xxxxxxxx:role/service-role/CognitoAccessForAmazonES

The domain status will change to Processing. Once processing is complete, the status will change to Active and the Cognito configuration will be updated for the Elasticsearch domain.

1005

Step 6: Users, Roles and policies

Create IAM policies, roles, Cognito user pool users, and groups, and map the groups to IAM roles. The policy documents can be found in Appendix A. The IAM policy documents are the key to controlling authorization at the index level.
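Purely as an illustration, a client-scoped policy for client1 might restrict HTTP actions to indices prefixed with the client name; the domain name, account ID, and prefix below are assumptions, not the actual Appendix A document:

cat > client1_policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:us-east-1:xxxxxxxxxxx:domain/esdomain/client1*"
    }
  ]
}
EOF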


aws iam create-policy --policy-name client1_policy --policy-document file:///Users/xyz/Downloads/client1_policy.json
aws iam create-policy --policy-name client2_policy --policy-document file:///Users/xyz/Downloads/client2_policy.json

aws iam create-role --role-name client1_role --assume-role-policy-document file:///Users/desaivis/Downloads/client_policy_trust.json
aws iam create-role --role-name client2_role --assume-role-policy-document file:///Users/desaivis/Downloads/client_policy_trust.json

aws iam attach-role-policy --policy-arn arn:aws:iam::xxxxxxxxxxx:policy/client1_policy --role-name client1_role
aws iam attach-role-policy --policy-arn arn:aws:iam::xxxxxxxxxxx:policy/client2_policy --role-name client2_role

aws cognito-idp create-group --group-name client1_group --user-pool-id us-east-1_ldkzTlRck --role-arn arn:aws:iam::xxxxxxxxxxx:role/client1_role
aws cognito-idp create-group --group-name client2_group --user-pool-id us-east-1_ldkzTlRck --role-arn arn:aws:iam::xxxxxxxxxxx:role/client2_role

aws cognito-idp admin-create-user --user-pool-id us-east-1_ldkzTlRck --username client1_user --temporary-password Eskibana1#
aws cognito-idp admin-create-user --user-pool-id us-east-1_ldkzTlRck --username client2_user --temporary-password Eskibana2#

aws cognito-idp admin-add-user-to-group --user-pool-id us-east-1_ldkzTlRck --username client1_user --group-name client1_group
aws cognito-idp admin-add-user-to-group --user-pool-id us-east-1_ldkzTlRck --username client2_user --group-name client2_group

Step 7: Update ElasticSearch domain resource policy.

Find the role IDs using the following commands:


aws iam get-role --role-name client1_role

aws iam get-role --role-name client2_role

Create policy document as per Appendix A.


aws cognito-identity get-identity-pool-roles --identity-pool-id us-east-1:fb6e132c-3711-4974-866f-cc4a3db7d6fa | jq '.Roles.authenticated'

"arn:aws:iam::xxxxxx:role/cognito-CognitoAuthorizedRole-JEUO3KGR2STO"


aws es update-elasticsearch-domain-config --domain-name esdomain  --access-policies file:///Users/xyz/Downloads/espolicy.json

Wait for cluster status to be active.

Step 8: Testing

Log in as client1_user in Kibana.

1007

1008

1009

Log in as client2_user in Kibana.

1010

1011

SAML integration:

Further, if the customer wants to provide federation for AD users, an identity provider can be configured using SAML with the XML file obtained from ADFS.

1012

Enable the identity provider in the App client Settings.

1013

The Kibana login page will now look as follows:

1014

Federated identities may not map to any group initially, so they can still get an access denied message. Cognito pre/post-authentication triggers can be configured using Lambda to add external identities to Cognito groups depending on SAML attribute values, or they can be added after some verification/approval process.

Appendix A:

Download
