Extending terraform with external data sources
Sometimes my terraform modules depend on data not managed by a terraform provider but by a build step or a script in my repository. External data sources are an interface for running a command locally, on the machine that runs terraform, and providing the console output of that program as a data source.
What is an External Data Source
It’s a mechanism to allow a local script to act as a data source. All this local script needs to do to qualify as a valid data source is print JSON to standard out like this:
console.log(JSON.stringify({ hello: 'world' }))
I love it when things are simple 🙌
It’s meant to be used as an escape hatch and has limited capabilities. If this data source is complex and needs to be shared among projects, it might be a better choice to implement a provider instead.
Personally I’ve found external data sources to be very useful when I want to roll a small build step into the provisioning step. Following are two examples, a simple hello world to get you started and then a more practical example (provisioning a lambda runtime for deno)
All of the examples in this blogpost are in this repository.
Hello World external data source with node.js
Right, we have two files. First - main.tf
, which simply uses an external data source to run a node script
data "external" "js" {
program = ["node", "${path.module}/data-source.js"]
}
output "external_source" {
value = data.external.js.result
}
Our second file is data-source.js
, which simply writes a json string stdout
:
console.log(JSON.stringify({ hello: 'world' }))
Provided you have terraform installed, run terraform init
and then terraform apply
to see the following output:
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
external_source = {
"hello" = "world"
}
I’ve also added hello world examples for Go and Python to the example repository.
A more practical example: Provisioning a lambda layer with an external data source
In this example I want to provision a deno runtime for lambda and always use the latest release of the said runtime from github. Here are the steps to automate for this:
- Download the lambda runtime layer.
- Use the downloaded layer package as input for the aws resource
aws_lambda_layer_version
. - Associate the lambda layer with a lambda function resource.
- Deploy with terraform.
First here’s a simple node script to request the latest release (on github) of the deno lambda layer, download and write it to the file lambda_layer.zip
. When the program is finished a json object with the filename is written to the console. If the script doesn’t write json to the console, terraform commands will fail. Bear in mind that this is written using the node.js standard library, hence it’s a little verbose.
const https = require('https');
const fs = require('fs');
const file = fs.createWriteStream("lambda_layer.zip");
const options = { headers: { 'User-Agent': 'node-script' }}
https.get(
// github's api for getting the latest releases of a repository
"https://api.github.com/repos/hayd/deno-lambda/releases/latest",
options,
(res) => {
res.setEncoding('utf8');
let rawData = '';
// since we're using the standard node interface
res.on('data', (chunk) => { rawData += chunk; });
res.on('end', () => {
const parsedData = JSON.parse(rawData);
const { browser_download_url } = parsedData.assets.find((asset) =>
asset.name === 'deno-lambda-layer.zip'
)
https.get(
browser_download_url,
options,
(res) => {
// follow redirect if there is one
if (res.statusCode === 302) {
https.get(
res.headers.location,
options,
(res) => {
res.pipe(file)
res.on('end', () => {
console.log(JSON.stringify({ file: 'lambda_layer.zip' }))
})
})
} else {
res.pipe(file)
res.on('end', () => {
console.log(JSON.stringify({ file: 'lambda_layer.zip' }))
})
}
});
});
}
)
Now in our terraform file we can declare the above node script as an external data source:
data "external" "deno_layer" {
program = ["node", "${path.module}/get-release.js"]
}
The object that the node script has printed to the console can now be accessed via data.external.deno_layer.result
. Here’s how we use it with our aws_lambda_layer_version
resource:
resource "aws_lambda_layer_version" "deno_lambda_layer" {
filename = data.external.deno_layer.result.file
layer_name = "deno_lambda_layer"
source_code_hash = filebase64sha256(data.external.deno_layer.result.file)
}
There’s three attributes here:
- The
filename
attribute will be used to point to the actual layer deployment package for upload layer_name
is the name of the lambda layer, you need to specify a unique name for your account.source_code_hash
is the value that determines when the resource should be updated. That means that if the contents of the file change, the resource will be updated too, if the contents are the same time as when we last ran terraform, the update will not be performed.
That’s it, now we can use the lambda layer with a lambda function like so:
resource "aws_lambda_function" "deno_lambda" {
layers = [aws_lambda_layer_version.lambda_layer.arn]
}
That’s it. For a full example including lambda function and IAM policies, check out the repository.