I was recently involved in a project where my team and I had to migrate our NoSQL databases from another database platform to Azure Cosmos DB. The data in question is under a legal requirement to delete certain documents after a fixed time. Consequently, one of the challenges was to harmonize Time to Live (TTL) expiration handling between our current vendor and Cosmos DB and to find a reasonable strategy for meeting our legal obligations. We explored two different options, which I will describe in this post.

When Time to Live is enabled in Cosmos DB, the expiration is sliding: each time a document is modified, the Time to Live is reset and the document's lifetime is extended [1]. In detail, the _ts (timestamp) property is updated to reflect the last modification, and the document is set to expire n seconds after that timestamp, where n is the configured TTL. The deletion of documents based on TTL expiration runs automatically and does not consume additional Request Units [2].
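To make the sliding behaviour concrete, here is a minimal sketch of the arithmetic in Python. The function name and the example values are my own illustration, not part of any Cosmos DB SDK; the only assumption taken from the text is that a document expires n seconds after its _ts value.

```python
def expiry_time(ts: int, ttl: int) -> int:
    """Return the epoch second at which a document becomes eligible for deletion.

    ts  -- the document's _ts value (epoch seconds of the last modification)
    ttl -- the configured Time to Live in seconds
    """
    return ts + ttl


# Hypothetical example: a document created with a one-hour TTL.
created_ts = 1_700_000_000
ttl_seconds = 3600
first_expiry = expiry_time(created_ts, ttl_seconds)

# Thirty minutes later the document is modified, so _ts moves forward
# and the expiration slides by the same amount.
updated_ts = created_ts + 1800
second_expiry = expiry_time(updated_ts, ttl_seconds)
```

A document that should live for one hour in total now lives for an hour and a half, which is exactly the problem when the deletion deadline is fixed by law.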

Use Logic Apps to run a scheduled deletion job

The details of this strategy are pretty straightforward: delete expired documents using a scheduled job. In this case, a scheduled trigger in Logic Apps invokes an Azure Function, which in turn performs the deletion.

Diagram explaining trigger flow using Logic Apps

This approach would replace the built-in TTL expiration and would require the data model, the document, to keep track of its own expiration. This strategy would consume Request Units in Cosmos DB, both for finding the expired documents and for deleting them. Although the cost might be insignificant, be aware that your mileage will vary.
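As a rough sketch of what the scheduled function would have to decide, the Python below selects the documents whose expiration has passed. The expiresAt property, the query text, and the in-memory list are assumptions made for illustration; a real function would send the query to Cosmos DB and issue one delete per returned id.

```python
from typing import Iterable, List, Mapping

# Hypothetical query the job could send to the container. "expiresAt" is an
# assumed property holding the expiration as epoch seconds.
EXPIRED_QUERY = "SELECT c.id FROM c WHERE c.expiresAt <= @now"


def find_expired(documents: Iterable[Mapping], now: int) -> List[str]:
    """Return the ids of all documents whose expiresAt is at or before now.

    This mirrors EXPIRED_QUERY on plain dictionaries so the selection
    logic can be tried out without a database.
    """
    return [doc["id"] for doc in documents if doc["expiresAt"] <= now]


# Hypothetical sample data.
sample = [
    {"id": "a", "expiresAt": 1_000},
    {"id": "b", "expiresAt": 2_000},
    {"id": "c", "expiresAt": 3_000},
]
expired_ids = find_expired(sample, now=2_000)
```

Because the expiration is stored as an absolute time, later modifications to the document do not move the deadline.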


Subscribe to the Change Feed and decrement TTL on document updates

Consumers of the SQL API in Cosmos DB are able to subscribe to the Change Feed using Azure Functions. That makes it possible to react to changes and to use each change as an opportunity to decrease the remaining TTL period on the document.

Diagram explaining flow using Change Feed

In this case, we would implement a function which performs two operations:

  1. Calculate the remaining TTL period.
  2. Dispatch a change event with updated TTL.
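The first step can be sketched as follows in Python. The createdAt property and the lifetime parameter are assumptions for illustration. Since the update itself resets _ts to roughly the current time, writing the remaining seconds into ttl keeps the absolute deadline where it was. The sketch also assumes that a ttl value has to be a positive integer, so an overdue document gets a ttl of 1 second rather than 0.

```python
from typing import Any, Dict


def remaining_ttl(created_at: int, lifetime: int, now: int) -> int:
    """Return the seconds left until created_at + lifetime, never below 1."""
    remaining = lifetime - (now - created_at)
    # Assumption: ttl must be a positive integer, so clamp overdue
    # documents to the shortest possible lifetime.
    return max(remaining, 1)


def with_decremented_ttl(doc: Dict[str, Any], lifetime: int, now: int) -> Dict[str, Any]:
    """Return a copy of doc whose ttl reflects the time actually remaining.

    "createdAt" is a hypothetical property holding the creation time
    in epoch seconds.
    """
    return {**doc, "ttl": remaining_ttl(doc["createdAt"], lifetime, now)}


# Hypothetical example: a 600 second lifetime, modified 200 seconds in.
doc = {"id": "a", "createdAt": 1_000, "ttl": 600}
updated = with_decremented_ttl(doc, lifetime=600, now=1_200)
overdue = with_decremented_ttl(doc, lifetime=600, now=2_000)
```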

This would require the data model, the document, to keep track of its expiration or its creation time. It would also require an abort condition, since the decrement function itself triggers document changes, which in turn show up in the Change Feed. Without one, it is quite easy to end up in an infinite loop.

An ideal approach would be to inspect the differences in the update and only run our decrement update when specified properties have changed. At the time of writing, however, this is not supported [3]. Another strategy would be to implement a TTL grace period: check whether the current TTL value is within a specified range of the computed one, and abort the update if it is.
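A grace period check along those lines could look like the Python sketch below. The function name and the grace_seconds parameter are hypothetical. The idea is that the change event caused by our own update arrives moments after the write, so the stored and the computed TTL differ by only a few seconds and the function can stop there.

```python
def needs_ttl_update(current_ttl: int, computed_ttl: int, grace_seconds: int) -> bool:
    """Return True only if the stored TTL has drifted beyond the grace period.

    current_ttl   -- the ttl value currently stored on the document
    computed_ttl  -- the ttl the decrement function would write now
    grace_seconds -- tolerated difference before a new write is issued
    """
    return abs(current_ttl - computed_ttl) > grace_seconds


# Our own write echoes back 2 seconds later: within the grace period, abort.
echo_of_own_write = needs_ttl_update(current_ttl=400, computed_ttl=398, grace_seconds=5)

# A genuine user update 300 seconds later: the TTL must be decremented.
real_update = needs_ttl_update(current_ttl=400, computed_ttl=100, grace_seconds=5)
```

The trade-off is precision: a genuine update that arrives within the grace period of the previous write is skipped as well, so a document may outlive its deadline by up to grace_seconds.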

This strategy would utilize the built-in TTL deletion and not consume any RUs for the deletion, but it would add computational complexity with the diff calculation and the abort function. It would also consume RUs for the update operation.

Summary

In the end, we decided to try the TTL decrement strategy and to keep the scheduled job as a fallback option in case the selected strategy fails or proves too costly.

It is not hard to see that both strategies have advantages and disadvantages, and that both would add technical complexity and cost. One might also argue that Cosmos DB is a poor fit for our needs. In this case, however, another provider was not an option.

How would you solve it? Please reach out and tell me!