How to Build Robust Integrations for your Application? The production way!
Third-party integrations sound easy, but building them in a way that doesn't degrade the performance of your app is hard.
Building integrations with third-party services, or even your own internal services, can create performance bottlenecks for your project and, by extension, for your clients. There are several ways to do them right, and I'll give you a high-level overview with real-life examples from one of my projects, Lopema. (If you're not a real estate agent from Bulgaria, the project itself won't interest you, but the integrations are applicable to all kinds of projects!)
The article will be structured in the following way:
- What type of integrations I'm talking about (ERPs, CRMs, etc)
- What are the possible types of data flow
- How to handle large objects and files (message brokers - Kafka, RabbitMQ, etc. Caching - Redis)
- Error handling and notifications
- Bonus performance advice
What to expect from an integration with ERP and CRM systems?
Most of those systems are old and probably written in PHP (not that it's bad). This usually means one thing: you'll have to deal with XML or XLS files. If you're using JS (NestJS in my case), you probably don't like dealing with anything other than JSON-structured data.
When dealing with XML, you'll most likely receive large files that you can't process fast enough if you load and iterate over the whole thing in one go. Syncing the data will be hard and resource-intensive.
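When the file is too big to hold in memory, a streaming parser lets you handle one record at a time. Here's a minimal sketch of that idea; the `sax` npm package, the `<offer>` element name, and the flat record structure are all assumptions about your specific feed:

```typescript
// A sketch of streaming a large XML export record by record instead of loading it all.
// The `sax` package, the <offer> element name, and the flat structure are assumptions.
import { createReadStream } from 'node:fs';
import * as sax from 'sax';

export function streamOffers(
  filePath: string,
  onOffer: (offer: Record<string, string>) => void,
) {
  const xmlStream = sax.createStream(true); // strict mode
  let currentOffer: Record<string, string> | null = null;
  let currentTag = '';

  xmlStream.on('opentag', (node: { name: string }) => {
    if (node.name === 'offer') currentOffer = {};
    else if (currentOffer) currentTag = node.name;
  });

  xmlStream.on('text', (text: string) => {
    if (currentOffer && currentTag) {
      currentOffer[currentTag] = (currentOffer[currentTag] ?? '') + text.trim();
    }
  });

  xmlStream.on('closetag', (name: string) => {
    if (name === 'offer' && currentOffer) {
      onOffer(currentOffer); // hand off a single offer, e.g. to a Kafka producer
      currentOffer = null;
    }
    currentTag = '';
  });

  createReadStream(filePath).pipe(xmlStream);
}
```

The `onOffer` callback is where you'd hand each record off to the queue, which we'll get to below.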
If you're lucky the integration will have a way for you to listen for a webhook and update only relevant parts in your database. I've gotten lucky with only one of the integrations, but you could think of Stripe webhooks in a similar manner.
What are the possible ways for the data to flow in your integration?
In my experience, these are the two most common ways for the data to flow:
- Scheduled integrations (cron jobs, schedulers, serverless functions)
- Webhooks (the most preferred way)
Let's start with scheduled integrations and what they involve.
- An ERP or other third-party service generates a file, for example XML, and uploads it to a given URL or an FTP server
- You schedule a cron job to run once, twice, or however many times needed to read the file and execute the sync
This is standard practice, but it's not a "live" integration with their services, and it can be slow, because delivering everything in a single file means they'll pack too much information into it.
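As a concrete example, here's a minimal sketch of the scheduled side in NestJS, assuming `@nestjs/schedule` is installed and a hypothetical `OffersSyncService` that downloads the file and processes it offer by offer:

```typescript
// A sketch of the scheduled side, assuming @nestjs/schedule is installed and a
// hypothetical OffersSyncService that downloads the file and streams it offer by offer.
import { Injectable, Logger } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';
import { OffersSyncService } from './offers-sync.service'; // hypothetical

@Injectable()
export class OffersSyncJob {
  private readonly logger = new Logger(OffersSyncJob.name);

  constructor(private readonly offersSyncService: OffersSyncService) {}

  @Cron(CronExpression.EVERY_HOUR)
  async syncOffers() {
    this.logger.log('Starting scheduled offers sync');
    // Reads the remote XML and emits one Kafka event per offer (see the next chapters).
    await this.offersSyncService.syncFromRemoteFile();
  }
}
```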
Short stop. If you like my articles, I would greatly appreciate it if you could subscribe to my newsletter. It's in the footer below, along with my social media accounts!
Continuing with webhook integrations.
- Set up a route that listens for requests from the service
- Validate the request with an API key or whatever other method their service describes
- Receive only partial data and update your database so your users stay in sync (see the sketch below)
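A minimal sketch of such a webhook endpoint in NestJS follows. The `x-api-key` header, the `INTEGRATION_API_KEY` env var, and `OffersService` are assumptions; use whatever validation scheme the provider documents:

```typescript
// A sketch of a webhook endpoint. The x-api-key header, INTEGRATION_API_KEY env var,
// and OffersService are assumptions; use whatever scheme the provider documents.
import {
  Body,
  Controller,
  Headers,
  Post,
  UnauthorizedException,
} from '@nestjs/common';
import { OffersService } from './offers.service'; // hypothetical

@Controller('webhooks')
export class OffersWebhookController {
  constructor(private readonly offersService: OffersService) {}

  @Post('offers')
  async handleOfferUpdate(
    @Headers('x-api-key') apiKey: string,
    @Body() payload: unknown,
  ) {
    if (apiKey !== process.env.INTEGRATION_API_KEY) {
      throw new UnauthorizedException('Invalid API key');
    }
    // Only partial data arrives, so we update just the affected records.
    await this.offersService.upsertFromWebhook(payload);
    return { received: true };
  }
}
```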
Both approaches can work fine, but each has drawbacks that should be considered while implementing it. What happens to your service/project while you're dealing with the large file? Its performance might degrade and your users might suffer.
In the next chapter, we'll discuss how to improve the performance.
How to handle large objects and files with message brokers and cache?
We can use message brokers so that users don't feel the heavy processing your service is doing. You could use RabbitMQ or any other broker, but I prefer Kafka.
I'll explain why we need it with an example.
Say you have a large file with real estate offers. You could have thousands of them, and processing them in one run would block the event loop in JS. It's not even smart to do so, because you won't have a good error handling and retry mechanism. Assume offer number 10 fails: instead of being able to retry just that one, you'd have to retry the whole file or stop processing altogether.
To avoid the above scenario, we can emit a Kafka event for each offer and let our Kafka controllers handle them one by one. We should try to generalize and abstract the handling logic so it can be used for both cron jobs and webhooks.
The flow would look like this for cron jobs (a sketch of the producing side follows the list):
- The hour hits and your job is triggered
- The file is read offer by offer
- On each iteration, an event with that single offer's information is sent to Kafka
- Information that rarely changes is cached in Redis to avoid querying your database too much
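Here's a minimal sketch of that producing side. The `OFFERS_SERVICE` and `REDIS_CLIENT` injection tokens are assumptions, and the Redis hash check is just one way to use the cache to skip offers that haven't changed:

```typescript
// A sketch of the producing side. The OFFERS_SERVICE and REDIS_CLIENT injection tokens
// are assumptions, and the Redis hash check is just one way to skip unchanged offers.
import { Inject, Injectable } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';
import Redis from 'ioredis';

@Injectable()
export class OffersProducer {
  constructor(
    @Inject('OFFERS_SERVICE') private readonly kafkaClient: ClientKafka,
    @Inject('REDIS_CLIENT') private readonly redis: Redis,
  ) {}

  async publishOffer(offer: { id: string; [key: string]: unknown }) {
    // Skip offers whose payload hasn't changed since the last run.
    const hash = JSON.stringify(offer);
    const cached = await this.redis.get(`offer:${offer.id}`);
    if (cached === hash) return;

    // One event per offer, so a single failure can be retried in isolation.
    this.kafkaClient.emit('offers.sync', offer);
    await this.redis.set(`offer:${offer.id}`, hash, 'EX', 60 * 60 * 24);
  }
}
```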
On the handling side (sketch after the list):
- Using our Kafka controller, we receive single-offer events
- Sync your database with the received offer and map the data to your entities
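And a minimal sketch of that handling side; the `offers.sync` topic and `OffersService` are the same assumptions as in the producer sketch above:

```typescript
// A sketch of the handling side. The offers.sync topic and OffersService are the
// same assumptions as in the producer sketch above.
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { OffersService } from './offers.service'; // hypothetical

@Controller()
export class OffersConsumer {
  constructor(private readonly offersService: OffersService) {}

  @EventPattern('offers.sync')
  async handleOffer(@Payload() offer: Record<string, unknown>) {
    // Each message carries a single offer, so the database is updated record by record.
    await this.offersService.upsertFromIntegration(offer);
  }
}
```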
This way the server never gets more than it can handle, since it processes only one message at a time, and you won't be hit with a burst of data. I'll give you a bonus idea at the end.
Error handling and notifications
To avoid getting your handlers stuck, we need good error handling and error monitoring.
What I like to do is avoid sending data to my "commands" without checking it. That means I validate all the input before even trying to process the event itself. If there isn't enough information for the request to pass, I ignore it and log it to Sentry. The reason I ignore it is to avoid getting my Kafka queue stuck, so all the other offers keep being handled.
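Here's a minimal sketch of that validation step, assuming `class-validator`, `class-transformer`, and the `@sentry/node` SDK are already set up in the project; the DTO fields are just examples:

```typescript
// A sketch of validating an event before processing it, assuming class-validator,
// class-transformer, and @sentry/node are already set up. The DTO fields are examples.
import { plainToInstance } from 'class-transformer';
import { IsNumber, IsString, validate } from 'class-validator';
import * as Sentry from '@sentry/node';

class IncomingOfferDto {
  @IsString()
  id!: string;

  @IsString()
  title!: string;

  @IsNumber()
  price!: number;
}

export async function validateOffer(raw: unknown): Promise<IncomingOfferDto | null> {
  const dto = plainToInstance(IncomingOfferDto, raw);
  const errors = await validate(dto);

  if (errors.length > 0) {
    // Ignore the event so the Kafka queue keeps moving, but record why it was skipped.
    Sentry.captureMessage(
      `Skipped offer event: ${JSON.stringify(errors.map((e) => e.constraints))}`,
    );
    return null;
  }

  return dto;
}
```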
Your integration needs error handling in every place where something can fail. Handle errors gracefully and always send them to your error monitoring tool, Sentry in my case. This saves you a lot of time early on. Integrations and services tend to change their data structures, their ways of handling data, etc., so you need to be prepared for that.
Bonus advice for performance optimization
Even with message queues, it's sometimes not enough for highly CPU-intensive operations. If you're using a k8s deployment on AWS, GCP, or anywhere else, it's much easier to do what I'm about to describe.
With k8s you could create another pod running your service that handles ONLY Kafka messages, so your main pod isn't bothered by that work at all.
The other way is with Docker image replicas. Assuming you're self-hosting, you could add another replica of your service that isn't exposed to the world and handles only Kafka messages. (You might have to make some code changes; see the sketch below.)
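Here's a minimal sketch of what those code changes could look like: one bootstrap file that starts either the HTTP app or a Kafka-only worker, driven by a hypothetical `KAFKA_ONLY` environment variable:

```typescript
// A sketch of the "code changes": one bootstrap file that starts either the HTTP app
// or a Kafka-only worker, driven by a hypothetical KAFKA_ONLY environment variable.
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  if (process.env.KAFKA_ONLY === 'true') {
    // Worker replica: connects only to Kafka and exposes no HTTP port to the world.
    const worker = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
      transport: Transport.KAFKA,
      options: {
        client: { brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'] },
      },
    });
    await worker.listen();
    return;
  }

  // Main replica: serves HTTP traffic and leaves the Kafka consumers to the worker.
  const app = await NestFactory.create(AppModule);
  await app.listen(3000);
}

bootstrap();
```

The worker replica gets the same image with `KAFKA_ONLY=true` set, so both deployment variants (an extra k8s pod or an extra Docker replica) reuse the same codebase.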
If there is interest, I'll provide an example in the future of how to do this with Coolify, as I'm slowly migrating all my projects there.
If you have any questions, reach out to me on Twitter/X or LinkedIn. I've recently created an empty Instagram account, but I'll try to start posting! A follow is highly appreciated at all places!
You can subscribe to my newsletter below to get notified about new articles as they come out, and I won't spam you!