Build an AppSync API Using Elasticsearch and Lambda as Data Sources

May 13, 2020


In this post, I will demonstrate how to build an AppSync API that uses Elasticsearch and a Lambda function as its back end. The architecture is shown below.


Don't worry if you cannot open the Lucidchart diagram above. The image below shows the same architecture for the demo.


Create an IAM Role for AppSync to Access Lambda

Create an IAM role that allows the AppSync data source to invoke Lambda functions.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "[LambdaFunctionArn]"
            ]
        }
    ]
}

ATTN
The Lambda function ARN has a format similar to "arn:aws:lambda:us-west-2:123456789012:function:[FunctionName]". The example in Configure Data Source for AWS Lambda did not work when I tried it. The ARN must end with "function:[FunctionName]", not "function/[FunctionName]".
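
The ARN rule above can be checked before the policy is attached. The following is a minimal Python sketch, not part of the original setup: the helper names is_lambda_function_arn and build_invoke_policy are hypothetical, and the pattern assumes the standard "aws" partition and an unqualified ARN (no version or alias suffix).

```python
import json
import re

# Matches the unqualified Lambda ARN format shown above, e.g.
# arn:aws:lambda:us-west-2:123456789012:function:MyFunction
# (assumption: standard "aws" partition, no version or alias suffix).
LAMBDA_ARN_PATTERN = re.compile(
    r"^arn:aws:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_-]+$"
)


def is_lambda_function_arn(arn: str) -> bool:
    """Return True only if the ARN ends with 'function:[FunctionName]'."""
    return LAMBDA_ARN_PATTERN.match(arn) is not None


def build_invoke_policy(function_arn: str) -> str:
    """Render the lambda:InvokeFunction policy for one function (hypothetical helper)."""
    if not is_lambda_function_arn(function_arn):
        raise ValueError(f"Not a Lambda function ARN: {function_arn}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
            }
        ],
    }
    return json.dumps(policy, indent=4)
```

An ARN written with "function/[FunctionName]" is rejected by the check, so the mistake surfaces before the role is created rather than at resolver run time.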

Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "appsync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Meanwhile, I have an existing Lambda function that takes the primary key as input and responds with a hashed value as output. Associate this role with the data source designated for this Lambda function.


Create an IAM Role for AppSync to Access Amazon Elasticsearch

Create an IAM role that allows the AppSync data source to interact with Elasticsearch.

Inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpDelete",
                "es:ESHttpHead",
                "es:ESHttpGet",
                "es:ESHttpPost",
                "es:ESHttpPut"
            ],
            "Resource": [
                "arn:aws:es:us-west-2:123456789012:domain/[ElasticsearchDomainName]/*"
            ]
        }
    ]
}

Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "appsync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Associate this role with the data source designated for Elasticsearch.

Create a Data Source Object for AppSync

Create a data source object for AppSync with the AWS CLI.

$ aws appsync create-data-source --api-id [AppSync-API-ID] --name *** --type AMAZON_ELASTICSEARCH --service-role-arn arn:aws:iam::123456789012:role/service-role/### --elasticsearch-config endpoint=https://***.us-west-2.es.amazonaws.com,awsRegion=us-west-2 --region us-west-2
{
    "dataSource": {
        "serviceRoleArn": "arn:aws:iam::123456789012:role/service-role/###",
        "dataSourceArn": "arn:aws:appsync:us-west-2:123456789012:apis/[AppSync-API-ID]/datasources/***",
        "type": "AMAZON_ELASTICSEARCH",
        "name": "***",
        "elasticsearchConfig": {
            "endpoint": "https://***.us-west-2.es.amazonaws.com",
            "awsRegion": "us-west-2"
        }
    }
}
 
ATTN
  • The endpoint MUST be prefixed with "https://", i.e. it MUST be identical to the endpoint shown in the Amazon Elasticsearch Service console.
  • Because the AWS AppSync console did not let me specify the endpoint in this form, the data source could not be added via the console, which returned the error message below. The CLI command above added the Elasticsearch domain as a data source successfully.
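
The endpoint rule can be enforced in a small wrapper before the CLI is called. This is a sketch under assumptions: normalize_endpoint and create_data_source_command are hypothetical helper names, and the flags are exactly the ones used in the command above.

```python
def normalize_endpoint(endpoint: str) -> str:
    """Ensure the Elasticsearch endpoint carries the https:// prefix."""
    endpoint = endpoint.strip().rstrip("/")
    if endpoint.startswith("http://"):
        raise ValueError("The endpoint must use https://, not http://")
    if not endpoint.startswith("https://"):
        endpoint = "https://" + endpoint
    return endpoint


def create_data_source_command(api_id, name, role_arn, endpoint, region):
    """Build the argument list for the CLI call shown above (hypothetical helper)."""
    es_config = f"endpoint={normalize_endpoint(endpoint)},awsRegion={region}"
    return [
        "aws", "appsync", "create-data-source",
        "--api-id", api_id,
        "--name", name,
        "--type", "AMAZON_ELASTICSEARCH",
        "--service-role-arn", role_arn,
        "--elasticsearch-config", es_config,
        "--region", region,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where the AWS CLI is installed and configured.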


Then, import some data into the Elasticsearch cluster.

PS: For sample data, go to the end of this post, where one piece of sample data is shown.
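
The post does not say how the data was imported. One common option is the Elasticsearch _bulk API, which accepts newline-delimited JSON. The sketch below only builds the request body; build_bulk_payload is a hypothetical helper, and it assumes each post is a dict with id, title, and text fields. On Elasticsearch versions before 7, the action line may also need a "_type" entry.

```python
import json


def build_bulk_payload(index_name, posts):
    """Render posts as newline-delimited JSON for the Elasticsearch _bulk API."""
    lines = []
    for post in posts:
        # Action line: index the document under its own id.
        action = {"index": {"_index": index_name, "_id": str(post["id"])}}
        lines.append(json.dumps(action))
        # Source line: the document itself (id, title, text).
        lines.append(json.dumps(post))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of a POST to /_bulk with the Content-Type header set to application/x-ndjson.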


Create Schema


Define the schema.

Adding a Root Query Type
Create a SearchPost application. Add a root type named Query with a single searchPostsByEs field that returns a list of SearchPost objects (i.e. [SearchPost]). Add the following to your schema.graphql file:
...
type Query {
	searchPostsByEs(fieldname: String, keyword: String): [SearchPost]
}
...
This is much like defining a function in a programming language such as Java or Python. At the end of this demonstration, we will call this function by passing values for the parameters declared here.

PS: The test query is shown below for illustration only. You do not need to execute it at this stage.
query {
  searchPostsByEs(keyword: "Add ElasticSearch as a AppSync Data Source", fieldname: "title") {
    id
    text
    title
  }
}
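
To make the function-call analogy concrete, here is a minimal Python sketch of how a client could package the same query with its arguments as GraphQL variables. The operation name Search and the helper build_graphql_request are hypothetical; the field and argument names come from the schema above.

```python
import json

# The same query as above, with the two arguments passed as variables.
SEARCH_QUERY = """
query Search($fieldname: String, $keyword: String) {
  searchPostsByEs(fieldname: $fieldname, keyword: $keyword) {
    id
    text
    title
  }
}
"""


def build_graphql_request(fieldname, keyword):
    """Serialize the query and its variables as a GraphQL-over-HTTP POST body."""
    return json.dumps({
        "query": SEARCH_QUERY,
        "variables": {"fieldname": fieldname, "keyword": keyword},
    })
```

The JSON string is what would be posted to the API's GraphQL endpoint, together with whatever authorization the API is configured for.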


Defining a SearchPost Type

Create a type that contains the data for a SearchPost object.
...
type SearchPost {
	text: String
	title: String
	id: ID!
}
...
Any field whose type ends in an exclamation point is a required (non-nullable) field.
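
As an illustration of what the exclamation point means for the data, the following Python sketch checks an item against the SearchPost type. It is not how AppSync validates results internally; validate_search_post is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_search_post(post: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the item fits SearchPost."""
    problems = []
    # id is declared as ID!, so it must be present and non-null.
    if post.get("id") is None:
        problems.append("id is required (declared as ID!)")
    # text and title are plain String, so they may be missing or null.
    for field in ("text", "title"):
        value = post.get(field)
        if value is not None and not isinstance(value, str):
            problems.append(f"{field} must be a string or null")
    return problems
```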

The whole schema definition:
type Query {
	searchPostsByEs(fieldname: String, keyword: String): [SearchPost]
}

type SearchPost {
	text: String
	title: String
	id: ID!
}

schema {
	query: Query
}


Configure a Pipeline Resolver for This Schema

AWS AppSync resolvers use mapping templates, which are written in the Apache Velocity Template Language (VTL).

Pipeline resolvers execute operations against data sources serially. Create functions in your API and attach them to a pipeline resolver.

Before mapping template:
#set($result = { "fieldname": $ctx.args.fieldname, "keyword": $ctx.args.keyword  })
$util.toJson($result)
PS: The $context variable (or $ctx for short) is a map that holds all of the contextual information for your resolver invocation. It has the following structure:
{
   "arguments" : { ... },
   "source" : { ... },
   "result" : { ... },
   "identity" : { ... },
   "request" : { ... },
   "info": { ... }
}
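
For readers less familiar with VTL, the before mapping template can be mirrored in Python. This is only an emulation for illustration: before_mapping_template is a hypothetical name, and ctx is a plain dict with the structure shown above ($ctx.args in the template is shorthand for $ctx.arguments).

```python
def before_mapping_template(ctx):
    """Mirror the before template: pick the two arguments out of the context."""
    args = ctx.get("arguments", {})
    return {"fieldname": args.get("fieldname"), "keyword": args.get("keyword")}
```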

Add the first function to query Elasticsearch.
Request Mapping Template:
{
    "version":"2018-05-29",
    "operation":"GET",
    "path":"/[IndexName]/_search",
    "params":{
        "headers":{},
        "queryString":{},
        "body":{
          "query": {
            "match": {
              "$!{ctx.args.fieldname}": "$!{ctx.args.keyword}"
            }
          }
        }
    }
}
PS: For the equivalent request in the cURL-like syntax used in the Kibana console, refer to the appendix.

Use the Response Mapping Template below to return only the _source of each hit.
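
The request template interpolates the two arguments into the JSON body as plain strings, so a keyword that contains a double quote would likely produce malformed JSON. The Python sketch below shows the body the template is meant to produce and how a JSON serializer escapes such values; build_match_query is a hypothetical helper, not part of the resolver.

```python
import json


def build_match_query(fieldname, keyword):
    """Build the body of the _search request as a JSON string."""
    if not fieldname or not keyword:
        raise ValueError("fieldname and keyword are both required")
    body = {"query": {"match": {fieldname: keyword}}}
    # json.dumps escapes quotes and backslashes in the user-supplied values.
    return json.dumps(body)
```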
[
    #foreach($entry in $ctx.result.hits.hits)
    #if( $velocityCount > 1 ) , #end
    $utils.toJson($entry.get("_source"))
    #end
]
This extracts the "_source" field of each hit and builds a new list from the extracted objects. Each element of the new list has the fields id, title, and text at its top level.

Add the second function. It points to a Lambda function data source that replaces the primary key (id) value in each result returned by the previous function. For sample event data of the Lambda function, refer to the appendix of this post.
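
The same transformation, written in Python for comparison. This is an illustrative sketch: extract_sources is a hypothetical name, and result stands for the parsed Elasticsearch response that the template sees as $ctx.result.

```python
def extract_sources(result):
    """Return the _source document of every hit, in order."""
    hits = result.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits]
```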

Request Mapping Template:
{
  "version": "2018-05-29",
  "operation": "Invoke",
  "payload": $util.toJson($ctx.prev.result)
}
$ctx.prev.result holds the result of the previous operation executed in the pipeline.

This function responds with a payload whose "body" field contains the results with the id field hashed.
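
The post does not include the Lambda code, so the following is only a sketch of a handler with the behavior described: it receives the list from the previous function as its event and returns it under "body" with each id hashed. The hash itself is an assumption; hash_id (the first 8 hex characters of SHA-256) is a hypothetical stand-in for whatever the real function uses.

```python
import hashlib


def hash_id(raw_id):
    """Hypothetical stand-in for the real hash: first 8 hex chars of SHA-256."""
    return hashlib.sha256(str(raw_id).encode("utf-8")).hexdigest()[:8]


def lambda_handler(event, context):
    """Receive the list built by the first function and return it under 'body'."""
    # event is the payload from the request mapping template: a list of posts.
    hashed = [{**post, "id": hash_id(post["id"])} for post in event]
    return {"body": hashed}
```

Returning the list under a "body" key is what makes the response mapping template read $ctx.result.body rather than $ctx.result.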

Response Mapping Template:
#if($ctx.error)
    $util.error($ctx.error.message, $ctx.error.type)
#end
$util.toJson($ctx.result.body)
This takes out the data in the "body" field, leaving exactly the data we are searching for. In other words, the returned data is a list, and each element has the fields id, title, and text at its top level.

Because this Lambda function wraps the list of items in a "body" field, the template reads $ctx.result.body.
If your data source returns the item directly, without a wrapper, use $ctx.result (i.e. $context.result) instead.
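
The response template for the Lambda function can be read as the following Python emulation. It is illustrative only: ResolverError and lambda_response_template are hypothetical names, and ctx is a plain dict standing in for the resolver context.

```python
class ResolverError(Exception):
    """Raised where the template would call $util.error (hypothetical name)."""


def lambda_response_template(ctx):
    """Mirror the response template: fail on error, otherwise return result.body."""
    error = ctx.get("error")
    if error:
        raise ResolverError(f'{error.get("type")}: {error.get("message")}')
    return ctx["result"]["body"]
```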

After mapping template:
$util.toJson($ctx.result)


Test

Now navigate to the Queries tab of the AWS AppSync console and run a query against your Amazon Elasticsearch domain.
query {
  searchPostsByEs(keyword: "Add ElasticSearch as a AppSync Data Source", fieldname: "title") {
    id
    text
    title
  }
}
PS: For the equivalent request in the cURL-like syntax used in the Kibana console, refer to the appendix.

Choose Execute query (the orange play button).

The post retrieved from Elasticsearch should appear in the results pane to the right of the query pane. It looks similar to the following:
{
  "data": {
    "searchPostsByEs": [
      {
        "id": "f7631hpl",
        "text": "-<br /><br />$ <strong>aws appsync create-data-source --api-id [AppSync-API-ID] --name *** --type AMAZON_ELASTICSEARCH --service-role-arn arn:aws:iam::123456789012:role/service-role/### --elasticsearch-config endpoint=https://***.us-west-2.es.amazonaws.com,awsRegion=us-west-2 --region us-west-2</strong><br />\r\n<pre class=\"brush:plain;auto-links:false;toolbar:false\" contenteditable=\"false\">{\r\n    \"dataSource\": {\r\n        \"serviceRoleArn\": \"arn:aws:iam::123456789012:role/service-role/###\",\r\n        \"dataSourceArn\": \"arn:aws:appsync:us-west-2:123456789012:apis/[AppSync-API-ID]/datasources/***\",\r\n        \"type\": \"AMAZON_ELASTICSEARCH\",\r\n        \"name\": \"***\",\r\n        \"elasticsearchConfig\": {\r\n            \"endpoint\": \"https://***.us-west-2.es.amazonaws.com\",\r\n            \"awsRegion\": \"us-west-2\"\r\n        }\r\n    }\r\n}</pre>\r\n<br /><strong>ATTN</strong><br />\r\n<ul>\r\n<li>The endpoint MUST be prefixed with \"https://\", i.e. the endpoint MUST be the same as the one retrieved from AWS Elasticsearch console.</li>\r\n<li>Because from the AWS AppSync console it is not technical possible to specify the endpoint according to above rule, the datasource could not be added successfully via console. Below error message from console will happen. 
Using above CLI could successfully add an Elasticsearch domain as data source.</li>\r\n</ul>\r\n<img src=\"../../../img/202005/add_es_as_appsync_datasource_00.png\" alt=\"\" width=\"495\" height=\"109\" /><br /><br /><br /><strong>Related Documentation</strong><br /><br /><a title=\"create-data-source\" href=\"https://docs.aws.amazon.com/cli/latest/reference/appsync/create-data-source.html\" target=\"_blank\">create-data-source</a><br /><br /><br /><strong>Related Products</strong><br /><br /><a title=\"Amazon Elasticsearch Service\" href=\"https://aws.amazon.com/elasticsearch-service/\" target=\"_blank\">Amazon Elasticsearch Service</a><br /><br /><a title=\"AWS AppSync\" href=\"https://aws.amazon.com/appsync/\" target=\"_blank\">AWS AppSync</a><br />-",
        "title": "Add ElasticSearch as a AppSync Data Source"
      },
...
    ]
  }
}

The data we really care about is nested in the data.searchPostsByEs structure.
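
A client would typically unwrap that structure before using the posts. The Python sketch below assumes the standard GraphQL response shape, with an optional top-level "errors" list; extract_posts is a hypothetical helper.

```python
import json


def extract_posts(response_text):
    """Return the searchPostsByEs list, raising if the API reported errors."""
    response = json.loads(response_text)
    if response.get("errors"):
        messages = "; ".join(
            e.get("message", "unknown error") for e in response["errors"]
        )
        raise RuntimeError(messages)
    data = response.get("data") or {}
    return data.get("searchPostsByEs") or []
```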

Screenshot from the AppSync console.




Wrap Up

To wrap up, we used AppSync to build a GraphQL API and configured Elasticsearch and a Lambda function as its data sources.


Related Documentation

create-data-source


Related Products

Amazon Elasticsearch Service

AWS AppSync

Apache Velocity If / ElseIf / Else

Apache Velocity Foreach Loop

Resolver Mapping Template Context Reference


Appendix

Kibana console cURL-like syntax
Below is the equivalent request in the cURL-like syntax used in the Kibana console.
GET [IndexName]/_search
{
  "query": {
    "match": {
      "title": "Add ElasticSearch as a AppSync Data Source"
    }
  }
}

Sample input data for the Lambda function
Below is sample event data captured from the Lambda function. The function hashes the primary key field and returns a payload in which the id field holds the hashed value.
[
	{
		'id': ****, 
		'title': 'Add ElasticSearch as a AppSync Data Source', 
		'text': '-<br /><br />$ <strong>aws appsync create-data-source --api-id [AppSync-API-ID] --name *** --type AMAZON_ELASTICSEARCH --service-role-arn arn:aws:iam::123456789012:role/service-role/### --elasticsearch-config endpoint=https://***.us-west-2.es.amazonaws.com,awsRegion=us-west-2 --region us-west-2</strong><br />\r\n<pre class="brush:plain;auto-links:false;toolbar:false" contenteditable="false">{\r\n    "dataSource": {\r\n        "serviceRoleArn": "arn:aws:iam::123456789012:role/service-role/###",\r\n        "dataSourceArn": "arn:aws:appsync:us-west-2:123456789012:apis/[AppSync-API-ID]/datasources/***",\r\n        "type": "AMAZON_ELASTICSEARCH",\r\n        "name": "***",\r\n        "elasticsearchConfig": {\r\n            "endpoint": "https://***.us-west-2.es.amazonaws.com",\r\n            "awsRegion": "us-west-2"\r\n        }\r\n    }\r\n}</pre>\r\n<br /><strong>ATTN</strong><br />\r\n<ul>\r\n<li>The endpoint MUST be prefixed with "https://", i.e. the endpoint MUST be the same as the one retrieved from AWS Elasticsearch console.</li>\r\n<li>Because from the AWS AppSync console it is not technical possible to specify the endpoint according to above rule, the datasource could not be added successfully via console. Below error message from console will happen. 
Using above CLI could successfully add an Elasticsearch domain as data source.</li>\r\n</ul>\r\n<img src="../../../img/202005/add_es_as_appsync_datasource_00.png" alt="" width="495" height="109" /><br /><br /><br /><strong>Related Documentation</strong><br /><br /><a title="create-data-source" href="https://docs.aws.amazon.com/cli/latest/reference/appsync/create-data-source.html" target="_blank">create-data-source</a><br /><br /><br /><strong>Related Products</strong><br /><br /><a title="Amazon Elasticsearch Service" href="https://aws.amazon.com/elasticsearch-service/" target="_blank">Amazon Elasticsearch Service</a><br /><br /><a title="AWS AppSync" href="https://aws.amazon.com/appsync/" target="_blank">AWS AppSync</a><br />-'
	}, 
	...
]

Category: AWS Tags: public
