Build a Serverless App with Cloud Run that Creates PDF Files - GSP644
Table of Contents
Overview
Twelve years ago, Lily started the Pet Theory chain of veterinary clinics. Pet Theory currently sends invoices in DOCX format to clients, but many clients have complained that they are unable to open them. To improve customer satisfaction, Lily has asked Patrick in IT to investigate an alternative to improve the current situation.
Pet Theory's Ops team is a single person, so they are keen to invest in a cost efficient solution that doesn't require a lot of ongoing maintenance. After analyzing the various processing options, Patrick decides to use Cloud Run.
Cloud Run is serverless, so it abstracts away all infrastructure management and lets you focus on building your application instead of worrying about overhead. As a Google serverless product, it is able to scale to zero, meaning it won't incur cost when not used. It also lets you use custom binary packages based on containers, which means building consistent isolated artifacts is now feasible.
In this lab you will build a PDF converter web app on Cloud Run that automatically converts files stored in Cloud Storage into PDFs stored in separate folders.
Architecture
This diagram gives you an overview of the services you will be using and how they connect to one another:
Objectives
In this lab, you will learn how to:
Convert a Node JS application to a container.
Build containers with Google Cloud Build.
Create a Cloud Run service that converts files to PDF files in the cloud.
Use event processing with Cloud Storage
Prerequisites
This is an intermediate level lab. This assumes familiarity with the console and shell environments. Experience with Firebase will be helpful, but it is not required. Before taking this lab it is recommended that you have completed the following Google Cloud Skills Boost labs before taking this one:
Once you're ready, scroll down and follow the steps below to set up your lab environment.
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges incurred to your personal account.
- Time to complete the lab---remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.
How to start your lab and sign in to the Google Cloud console
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:
The Open Google Cloud console button
Time remaining
The temporary credentials that you must use for this lab
Other information, if needed, to step through this lab
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account.
If necessary, copy the Username below and paste it into the Sign in dialog.
student-03-f84fa0b39cff@qwiklabs.net
You can also find the Username in the Lab Details panel.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
Yod1LHlV0QtH
You can also find the Password in the Lab Details panel.
Click Next.
Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.
Note: Using your own Google Cloud account for this lab may incur extra charges.
Click through the subsequent pages:
Accept the terms and conditions.
Do not add recovery options or two-factor authentication (because this is a temporary account).
Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
Note: To view a menu with a list of Google Cloud products and services, click the Navigation menu at the top-left.
Activate Cloud Shell
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
Click Activate Cloud Shell
at the top of the Google Cloud console.
When you are connected, you are already authenticated, and the project is set to your Project_ID, qwiklabs-gcp-03-4b6565f7a2ce
. The output contains a line that declares the Project_ID for this session:
Your Cloud Platform project in this session is set to qwiklabs-gcp-03-4b6565f7a2ce
gcloud
is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
- (Optional) You can list the active account name with this command:
gcloud auth list
- Click Authorize.
Output:
ACTIVE: *
ACCOUNT: student-03-f84fa0b39cff@qwiklabs.net
To set the active account, run:
$ gcloud config set account `ACCOUNT`
- (Optional) You can list the project ID with this command:
gcloud config list project
Output:
[core]
project = qwiklabs-gcp-03-4b6565f7a2ce
Note: For full documentation of gcloud
, in Google Cloud, refer to the gcloud CLI overview guide.
Task 1. Understanding the task
Pet theory would like to convert their invoices into PDFs so that customers can open them reliably. The team wants to accomplish this conversion automatically to minimize the workload for Lisa, the office manager.
Ruby, Pet Theory's computer consultant, gets a message from Patrick in IT...
Patrick, IT Administrator | Hi Ruby, |
Ruby, Software Consultant | Hey Patrick, |
Help Patrick set up and deploy Cloud Run.
Task 2. Enable the Cloud Run API
Open the Navigation menu () and click APIs & Services > Library. In the search bar, enter "Cloud Run" and select the Cloud Run Admin API from the results list.
Click Enable and then hit the back button in your browser twice. Your Console should now resemble the following:
Task 3. Deploy a simple Cloud Run service
Ruby has developed a Cloud Run prototype and would like Patrick to deploy it onto Google Cloud. Now help Patrick establish the PDF Cloud Run service for Pet Theory.
- Open a new Cloud Shell session and run the following command to clone the Pet Theory repository:
git clone https://github.com/rosera/pet-theory.git
- Then change your current working directory to lab03:
cd pet-theory/lab03
- Edit
package.json
with Cloud Shell Code Editor or your preferred text editor. In the "scripts" section, add"start": "node index.js",
as shown below:
...
"scripts": {
"start": "node index.js",
"test": "echo \"Error: no test specified\" && exit 1"
},
...
- Now run the following commands in Cloud Shell to install the packages that your conversion script will be using:
npm install express
npm install body-parser
npm install child_process
npm install @google-cloud/storage
- Now open the
lab03/index.js
file and review the code.
The application will be deployed as a Cloud Run service that accepts HTTP POSTs. If the POST request is a Pub/Sub notification about an uploaded file, the service writes the file details to the log. If not, the service simply returns the string "OK".
- Review the file named
lab03/Dockerfile
.
The above file is called a manifest and provides a recipe for the Docker command to build an image. Each line begins with a command that tells Docker how to process the following information:
The first list indicates the base image should use node as the template for the image to be created.
The last line indicates the command to be performed, which in this instance refers to "npm start".
- To build and deploy the REST API, use Google Cloud Build. Run this command to start the build process:
gcloud builds submit \
--tag gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter
The command builds a container with your code and puts it in the Artifact Registry of your project.
Return to the Cloud Console, on the Navigation menu (), click VIEW ALL PRODUCTS. In the CI/CD section, select Artifact Registry > Repositories. You should see your container hosted:
Open the gcr.io repository. You should see your container hosted:
Test completed task
Click Check my progress to verify that you've performed the above task.
Build simple a REST API
Check my progress
- Return to your code editor tab and in Cloud Shell run the following command to deploy your application:
gcloud run deploy pdf-converter \
--image gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter \
--platform managed \
--region us-east1 \
--no-allow-unauthenticated \
--max-instances=1
When the deployment is complete, you will see a message like this:
Service [pdf-converter] revision [pdf-converter-00001] has been deployed and is serving 100 percent of traffic at https://pdf-converter-[hash].a.run.app
- Create the environment variable
$SERVICE_URL
for the app so you can easily access it:
SERVICE_URL=$(gcloud beta run services describe pdf-converter --platform managed --region us-east1 --format="value(status.url)")
echo $SERVICE_URL
Test completed task
Click Check my progress to verify that you've performed the above task.
Create a Revision for Cloud Run
Check my progress
- Make an anonymous POST request to your new service:
curl -X POST $SERVICE_URL
This will result in an error message saying "Your client does not have permission to get the URL".
This is good; you don't want the service to be callable by anonymous users.
- Now try invoking the service as an authorized user:
curl -X POST -H "Authorization: Bearer $(gcloud auth print-identity-token)" $SERVICE_URL
If you get the response "OK"
you have successfully deployed a Cloud Run service. Well done!
Task 4. Trigger your Cloud Run service when a new file is uploaded
Now that the Cloud Run service has been successfully deployed, Ruby would like Patrick to create a staging area for the data to be converted. The Cloud Storage bucket will use an event trigger to notify the application when a file has been uploaded and needs to be processed.
- Run the following command to create a bucket in Cloud Storage for the uploaded docs:
gsutil mb gs://$GOOGLE_CLOUD_PROJECT-upload
- And another bucket for the processed PDFs:
gsutil mb gs://$GOOGLE_CLOUD_PROJECT-processed
- Now return to your Cloud Console tab, open the Navigation menu and select Cloud Storage. Verify that the buckets have been created (there will be other buckets there as well that are used by the platform.)
Test completed task
Click Check my progress to verify that you've performed the above task.
Create two cloud storage buckets
Check my progress
- In Cloud Shell run the following command to tell Cloud Storage to send a Pub/Sub notification whenever a new file has finished uploading to the docs bucket:
gsutil notification create -t new-doc -f json -e OBJECT_FINALIZE gs://$GOOGLE_CLOUD_PROJECT-upload
The notifications will be labeled with the topic "new-doc".
Test completed task
Click Check my progress to verify that you've performed the above task.
Create a Pub/Sub topic for handling notifications from storage bucket
Check my progress
- Then create a new service account which Pub/Sub will use to trigger the Cloud Run services:
gcloud iam service-accounts create pubsub-cloud-run-invoker --display-name "PubSub Cloud Run Invoker"
- Give the new service account permission to invoke the PDF converter service:
gcloud beta run services add-iam-policy-binding pdf-converter --member=serviceAccount:pubsub-cloud-run-invoker@$GOOGLE_CLOUD_PROJECT.iam.gserviceaccount.com --role=roles/run.invoker --platform managed --region us-east1
- Find your project number by running this command:
gcloud projects list
Look for the project whose name starts with "qwiklabs-gcp-". You will be using the value of the Project Number in the next command.
- Create a
PROJECT_NUMBER
environment variable, replacing [project number] with the Project Number from the last command:
PROJECT_NUMBER=[project number]
- Then enable your project to create Cloud Pub/Sub authentication tokens:
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com --role=roles/iam.serviceAccountTokenCreator
Note:
If you are getting an error as service account does not exist
on executing the above command. Enable the Cloud Pub/Sub API and if it is already enabled, first Disable it and then Enable it again. Then, re-run the above command.
- Finally, create a Pub/Sub subscription so that the PDF converter can run whenever a message is published on the topic "new-doc".
gcloud beta pubsub subscriptions create pdf-conv-sub --topic new-doc --push-endpoint=$SERVICE_URL --push-auth-service-account=pubsub-cloud-run-invoker@$GOOGLE_CLOUD_PROJECT.iam.gserviceaccount.com
Test completed task
Click Check my progress to verify that you've performed the above task.
Create a Pub/Sub subscription
Check my progress
Task 5. See if the Cloud Run service is triggered when files are uploaded to Cloud Storage
To verify the application is working as expected, Ruby asks Patrick to upload some test data to the named storage bucket and then check Cloud Logging.
- Copy some test files into your upload bucket:
gsutil -m cp gs://spls/gsp644/* gs://$GOOGLE_CLOUD_PROJECT-upload
Once the upload is done, return to your Cloud Console tab, open the Navigation menu and click VIEW ALL PRODUCTS. Select Logging under the Observability section.
In the All resources dropdown, filter your results to Cloud Run Revision and click Apply. Then click Run Query.
In the Query results, look for a log entry that starts with
file:
and click it. It shows a dump of the file data that Pub/Sub sends to your Cloud Run service when a new file is uploaded.Can you find the name of the file you uploaded in this object?
Note:
If you do not see any log entries that begin with "file", try clicking on the "load newer logs" button near the bottom of the page.
- Now return to the code editor tab and run the following command in Cloud Shell to clean up your
upload
directory by deleting the files in it:
gsutil -m rm gs://$GOOGLE_CLOUD_PROJECT-upload/*
Task 6. Containers
Patrick needs to convert a backlog of invoices to PDFs so all customers can open them. He emails Ruby for some help...
Patrick, IT Administrator | Hi Ruby |
Patrick sends Ruby the code fragment he wrote to produce a PDF from a file:
const {promisify} = require('util');
const exec = promisify(require('child_process').exec);
const cmd = 'libreoffice --headless --convert-to pdf --outdir ' +
`/tmp "/tmp/${fileName}"`;
const { stdout, stderr } = await exec(cmd);
if (stderr) {
throw stderr;
}
Ruby responds back to Patrick...
Ruby, Software Consultant | Hi Patrick |
Patrick, IT Administrator | Hi Ruby |
Building the container will require the integration of a number of components:
Update the Manifest
With all the files identified, the manifest can now be created. Help Ruby set up and deploy the container.
The package for LibreOffice was not included in the container before, which means it now needs to be added. Patrick has previously provided the commands he uses to build his application, Ruby will add these as a RUN
command within the Dockerfile.
Open the
Dockerfile
manifest and add the commandRUN apt-get update -y && apt-get install -y libreoffice && apt-get clean
line as shown below:FROM node:20 RUN apt-get update -y \ && apt-get install -y libreoffice \ && apt-get clean WORKDIR /usr/src/app COPY package.json package*.json ./ RUN npm install --only=production COPY . . CMD [ "npm", "start" ]
Deploy the new version of the pdf-conversion service
Open the
index.js
file and add the following package requirements at the top of the file:const {promisify} = require('util'); const {Storage} = require('@google-cloud/storage'); const exec = promisify(require('child_process').exec); const storage = new Storage();
Replace the
app.post('/', async (req, res)
with the following code:app.post('/', async (req, res) => { try { const file = decodeBase64Json(req.body.message.data); await downloadFile(file.bucket, file.name); const pdfFileName = await convertFile(file.name); await uploadFile(process.env.PDF_BUCKET, pdfFileName); await deleteFile(file.bucket, file.name); } catch (ex) { console.log(`Error: ${ex}`); } res.set('Content-Type', 'text/plain'); res.send('\n\nOK\n\n'); })
Now add the following code that processes LibreOffice documents to the bottom of the file:
// Helper function to check file existence (using fs.promises for async) async function fileExists(filePath) { try { await fs.promises.access(filePath); // Throws an error if the file doesn't exist return true; } catch (err) { return false; } } async function downloadFile(bucketName, fileName) { // 1. Check if the file exists const fileExistsLocally = await fileExists(`/tmp/${fileName}`); // 2. Delete if present if (fileExistsLocally) { console.log(`File exists locally. Deleting: ${fileName}`); await fs.promises.unlink(`/tmp/${fileName}`); // Use fs.promises for async file operations console.log(`File deleted.`); } else { console.log(`File does not exist locally: ${fileName}`); } // 3. Download from the storage bucket const options = { destination: `/tmp/${fileName}` }; await storage.bucket(bucketName).file(fileName).download(options); console.log(`File downloaded: ${fileName}`); } async function convertFile(fileName) { const cmd = 'libreoffice --headless --convert-to pdf --outdir /tmp ' + `"/tmp/${fileName}"`; console.log(cmd); const { stdout, stderr } = await exec(cmd); if (stderr) { throw stderr; } console.log(stdout); pdfFileName = fileName.replace(/\.\w+$/, '.pdf'); return pdfFileName; } async function deleteFile(bucketName, fileName) { await storage.bucket(bucketName).file(fileName).delete(); } async function uploadFile(bucketName, fileName) { await storage.bucket(bucketName).upload(`/tmp/${fileName}`); }
Ensure your
index.js
file looks like the following:Note:
To avoid any formatting errors, it's recommended you replace all of the code in yourindex.js
file with this example code.const {promisify} = require('util'); const {Storage} = require('@google-cloud/storage'); const exec = promisify(require('child_process').exec); const storage = new Storage(); const express = require('express'); const bodyParser = require('body-parser'); const app = express(); app.use(bodyParser.json()); const port = process.env.PORT || 8080; app.listen(port, () => { console.log('Listening on port', port); }); app.post('/', async (req, res) => { try { const file = decodeBase64Json(req.body.message.data); await downloadFile(file.bucket, file.name); const pdfFileName = await convertFile(file.name); await uploadFile(process.env.PDF_BUCKET, pdfFileName); await deleteFile(file.bucket, file.name); } catch (ex) { console.log(`Error: ${ex}`); } res.set('Content-Type', 'text/plain'); res.send('\n\nOK\n\n'); }) function decodeBase64Json(data) { return JSON.parse(Buffer.from(data, 'base64').toString()); } // Helper function to check file existence (using fs.promises for async) async function fileExists(filePath) { try { await fs.promises.access(filePath); // Throws an error if the file doesn't exist return true; } catch (err) { return false; } } async function downloadFile(bucketName, fileName) { // 1. Check if the file exists const fileExistsLocally = await fileExists(`/tmp/${fileName}`); // 2. Delete if present if (fileExistsLocally) { console.log(`File exists locally. Deleting: ${fileName}`); await fs.promises.unlink(`/tmp/${fileName}`); // Use fs.promises for async file operations console.log(`File deleted.`); } else { console.log(`File does not exist locally: ${fileName}`); } // 3. Download from the storage bucket const options = { destination: `/tmp/${fileName}` }; await storage.bucket(bucketName).file(fileName).download(options); console.log(`File downloaded: ${fileName}`); } async function convertFile(fileName) { const cmd = 'libreoffice --headless --convert-to pdf --outdir /tmp ' + `"/tmp/${fileName}"`; console.log(cmd); const { stdout, stderr } = await exec(cmd); if (stderr) { console.log(`Conversion Failed: ${stderr}`); throw stderr; } console.log(`Conversion Success: ${stdout}`); pdfFileName = fileName.replace(/\.\w+$/, '.pdf'); return pdfFileName; } async function deleteFile(bucketName, fileName) { await storage.bucket(bucketName).file(fileName).delete(); } async function uploadFile(bucketName, fileName) { await storage.bucket(bucketName).upload(`/tmp/${fileName}`); }
The main logic is housed in these functions:
const file = decodeBase64Json(req.body.message.data); await downloadFile(file.bucket, file.name); const pdfFileName = await convertFile(file.name); await uploadFile(process.env.PDF_BUCKET, pdfFileName); await deleteFile(file.bucket, file.name);
Whenever a file has been uploaded, this service gets triggered. It performs these tasks, one per line above:
Extracts the file details from the Pub/Sub notification.
Downloads the file from Cloud Storage to the local hard drive. This is actually not a physical disk, but a section of virtual memory that behaves like a disk.
Converts the downloaded file to PDF.
Uploads the PDF file to Cloud Storage. The environment variable
process.env.PDF_BUCKET
contains the name of the Cloud Storage bucket to write PDFs to. You will assign a value to this variable when you deploy the service below.Deletes the original file from Cloud Storage.
The rest of index.js
implements the functions called by this top-level code.
It's time to deploy the service, and to set the PDF_BUCKET
environment variable. It's also a good idea to give LibreOffice 2 GB of RAM to work with (see the line with the --memory
option).
Run the following command to build the container:
gcloud builds submit \ --tag gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter
Note:
EnterY
if you receive an pop to enable the Cloud Build API
Test completed task
Click Check my progress to verify that you've performed the above task.
Create another build for REST API
Check my progress
Now deploy the latest version of your application:
gcloud run deploy pdf-converter \ --image gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter \ --platform managed \ --region us-east1 \ --memory=2Gi \ --no-allow-unauthenticated \ --max-instances=1 \ --set-env-vars PDF_BUCKET=$GOOGLE_CLOUD_PROJECT-processed
With LibreOffice part of the container, this build will take longer than the previous one. This is a good time to get up and stretch for a few minutes.
Click Check my progress to verify the objective.
Create a new Revision
Check my progress
Task 7. Testing the pdf-conversion service
Once the deployment commands finish, make sure that the service was deployed correctly by running:
curl -X POST -H "Authorization: Bearer $(gcloud auth print-identity-token)" $SERVICE_URL
If you get the response
"OK"
you have successfully deployed the updated Cloud Run service. LibreOffice can convert many file types to PDF: DOCX, XLSX, JPG, PNG, GIF, etc.Create a script to perform the upload
cat <<'EOF' > copy_files.sh #!/bin/bash SOURCE_BUCKET="gs://spls/gsp644" DESTINATION_BUCKET="gs://${GOOGLE_CLOUD_PROJECT}-upload" # Replace with your actual bucket name DELAY=5 # Get a list of files in the source bucket files=$(gsutil ls "$SOURCE_BUCKET") # Loop through the files for file in $files; do # Construct the full path of the source file source_file_path="$file" # Copy the file to the destination bucket gsutil cp "$source_file_path" "$DESTINATION_BUCKET" # Check if the copy was successful if [ $? -eq 0 ]; then # $? is the exit status of the previous command echo "Copied: $source_file_path to $DESTINATION_BUCKET" else echo "Failed to copy: $source_file_path" fi # Sleep for 5 seconds sleep $DELAY done echo "All files copied!" EOF
Run the following command to upload some example files:
bash copy_files.sh
Return to the Cloud Console, open the Navigation menu and select Cloud Storage. Open the
-upload
bucket and click on the Refresh button a couple of times to see how the files are deleted, one by one, as they are converted to PDFs.Then click Buckets from the left menu, and click on the bucket whose name ends in "-processed". It should contain PDF versions of all files. Feel free to open the PDF files to make sure they were properly converted:
Note:
Re-run the command if you don't see all the converted PDF files in -processed
bucket.
Solution of Lab
export REGION=
curl -LO raw.githubusercontent.com/quiccklabs/Labs_solutions/master/Build%20a%20Serverless%20App%20with%20Cloud%20Run%20that%20Creates%20PDF%20Files/quicklabgsp644.sh
sudo chmod +x quicklabgsp644.sh
./quicklabgsp644.sh