Michael Tsibelman
#1
Hi Yan, great course.

But I'm still not sure what the best approach is for managing DynamoDB configuration when I have a microservice.
The DynamoDB table is specific to this microservice and I expose it only via API functions, so it's not a shared resource.

On the one hand, data is the most valuable part of the application, and you recommend not putting it into the same stack as the serverless functions, to prevent accidental deletion. On the other hand, when multiple people are working in the same AWS account during development, they could interfere with one another if they use the same table.

Would you put it into the same stack as your functions, or into a separate stack that is managed independently?
Yan Cui
#2
Hi Michael,

Don't think of it as a hard and fast rule; your context matters the most at the end of the day.

As you said, when you have microservices, you shouldn't be exposing the underlying DynamoDB tables outside of the service anyway, so they're not "shared". Nonetheless, I find there are dangers in tying the lifecycle of the functions (the compute layer) to the data. Accidental deletion is one; another is when the functions (or the API itself) become obsolete. You *should* be deleting functions that you know are no longer used, otherwise they will continue to exist as an attack surface, even more so than active functions, since these obsolete functions are unlikely to be patched with the latest security updates. But you might want to keep the data around in case you have another use for it in the future, so now you can't just delete the CF stack to delete everything.

My preference is to manage the data layer separately, with another CloudFormation stack, with Terraform, or with whatever tool you're most familiar with. It does make setup more painful and creates friction in the development process, and honestly I constantly have to ask myself whether it's even worth it, but it's one of those things you never think you'd need until you actually need it. Unfortunately, with the way CF works, you can't just unlink a resource that was originally created with the stack. Because of this inflexibility, I have leaned towards "better safe than sorry".
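As a rough illustration of that split (a sketch, not a prescribed setup): the data stack owns the table and exports its name, and the function stack only imports it. The service name, the `OrdersTable` logical ID and the export name are hypothetical; the templates are built as Python dicts so they can be dumped to JSON for CloudFormation. `Outputs`/`Export` and `Fn::ImportValue` are standard CloudFormation features, and the same `Fn::ImportValue` object could go under a function's `environment` in serverless.yml.

```python
import json

# Hypothetical names, for illustration only.
SERVICE = "orders-service"
EXPORT_NAME = f"{SERVICE}-OrdersTableName"


def data_stack_template() -> dict:
    """Data layer: a stack that owns the DynamoDB table and exports its name."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "OrdersTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {
                    "AttributeDefinitions": [
                        {"AttributeName": "id", "AttributeType": "S"}
                    ],
                    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
                    "BillingMode": "PAY_PER_REQUEST",
                },
            }
        },
        "Outputs": {
            "OrdersTableName": {
                "Value": {"Ref": "OrdersTable"},  # Ref on a table yields its name
                "Export": {"Name": EXPORT_NAME},
            }
        },
    }


def function_environment() -> dict:
    """Compute layer: references the table by import instead of owning it."""
    return {"ORDERS_TABLE": {"Fn::ImportValue": EXPORT_NAME}}


if __name__ == "__main__":
    print(json.dumps(data_stack_template(), indent=2))
```

One side effect that works in favour of this layout: CloudFormation refuses to delete a stack while another stack still imports one of its exports, so the data stack can't be removed by accident while functions depend on it.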

I'm not sure what you mean by "On the other hand, when multiple people are working in the same AWS account during development they could interfere with one another if they use the same table."

How are they interfering with each other?
Michael Tsibelman
#3
Thank you, Yan

The interference I mentioned is when people step on each other's toes while developing or testing some functionality. They may want to delete or modify data in the table, so when the table is shared, people need to be mindful of this, which creates friction. But when a table is created in the context of a particular serverless stage, it can be created just for one person: we can have a stage named Yan and a stage named Michael, and we would be working fully independently while being in the same account.
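A minimal sketch of what per-stage tables amount to, assuming the table's physical name is derived from the service and stage (in serverless.yml this would typically be written with `${self:service}` and `${opt:stage}` variables). The `orders` suffix and key schema are hypothetical; the name check follows DynamoDB's rule of 3 to 255 characters from letters, digits, `_`, `.` and `-`.

```python
import re

# DynamoDB table names: 3-255 chars of letters, digits, underscore, dot, hyphen.
_VALID_TABLE_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def stage_table_resource(service: str, stage: str) -> dict:
    """Build a table resource whose physical name is unique per stage."""
    name = f"{service}-{stage}-orders"  # hypothetical naming scheme
    if not _VALID_TABLE_NAME.fullmatch(name):
        raise ValueError(f"invalid DynamoDB table name: {name!r}")
    return {
        "Type": "AWS::DynamoDB::Table",
        "Properties": {
            "TableName": name,
            "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
            "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


# Two developers deploying their own stages into the same account
# end up with separate tables and can't clobber each other's data.
yan = stage_table_resource("orders-service", "yan")
michael = stage_table_resource("orders-service", "michael")
```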

You also mentioned that separating the tables into different stacks can make development more painful, and I wholeheartedly agree; it's especially true when you have a microservices-based architecture. In my eyes, Amazon's most recent addition of continuous backup and point-in-time restore is a better solution to the issues you mentioned.

https://techcrunch.com/2018/04/04/aws-adds-automated-point-in-time-recovery-to-dynamodb/
Yan Cui
#4
Hi Michael,

To mitigate the interference in tests, each test should set up and delete its own test data, and the test data should be randomised. You should do that even when you're not working with Lambda, since the same problem occurs when you run tests in parallel.
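A minimal sketch of that pattern: each test gets an item with a random key and the item is removed afterwards, even if the test fails. `FakeTable` is a hypothetical in-memory stand-in so the example runs on its own; its method names loosely echo boto3's `Table` resource, but the signatures are simplified and are not boto3's.

```python
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator


class FakeTable:
    """Hypothetical in-memory stand-in for a DynamoDB table (illustration only)."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}

    def put_item(self, item: dict) -> None:
        self.items[item["id"]] = item

    def delete_item(self, key: str) -> None:
        self.items.pop(key, None)


@contextmanager
def seeded_order(table: FakeTable, **fields) -> Iterator[dict]:
    """Insert a randomly keyed test item and always delete it afterwards."""
    item = {"id": f"test-{uuid.uuid4()}", **fields}
    table.put_item(item)
    try:
        yield item
    finally:
        # Runs on success and on failure, so no test leaves data behind.
        table.delete_item(item["id"])
```

Because the keys are random, two tests (or two developers) running at the same time never touch each other's items.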

Continuous backup and restore is great; it prevents the disaster scenario. But restoring a large table still takes a long time (this might have changed since the closed beta), which means downtime should a table ever be deleted by accident. That said, it does mitigate the problem of not being able to delete a CF stack for unused functions: if you just want to keep the data around, you can restore the deleted table after the stack is deleted.
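If you go the continuous-backup route, it's worth declaring it in the template rather than remembering to tick a box in the console. A small sketch, assuming the table is defined as a CloudFormation resource dict: `PointInTimeRecoverySpecification` is the `AWS::DynamoDB::Table` property that turns it on. Note that a point-in-time restore creates a new table rather than rewinding the existing one, which is part of why recovery isn't instant.

```python
import copy


def with_point_in_time_recovery(table_resource: dict) -> dict:
    """Return a copy of an AWS::DynamoDB::Table resource with PITR enabled."""
    resource = copy.deepcopy(table_resource)  # leave the caller's dict untouched
    props = resource.setdefault("Properties", {})
    props["PointInTimeRecoverySpecification"] = {"PointInTimeRecoveryEnabled": True}
    return resource
```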

Also keep in mind that data is not limited to DynamoDB, even though it's perhaps the most popular option, and not everything offers the same backup options.

Another thing you could do is to include the tables as resources only in non-production environments. The Serverless framework lets you reference external files, so you can configure the resources section to reference a dev.json file that contains the table definitions when you're deploying to dev, whereas prod.json omits these table definitions because they're managed elsewhere.
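One way to keep dev.json and prod.json from drifting apart is to generate them. The sketch below assumes serverless.yml points its `resources` section at a per-stage file with something like `resources: ${file(./resources/${opt:stage}.json)}`; whether a variable nested inside `${file(...)}` resolves depends on your Serverless version, so treat that line as an assumption. The table definition, the `resources/` directory and the stage names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable

# Hypothetical table definition shared by every non-production stage.
ORDERS_TABLE = {
    "Type": "AWS::DynamoDB::Table",
    "Properties": {
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    },
}


def resources_for_stage(stage: str) -> dict:
    """Non-production stages own their tables; prod omits them (managed elsewhere)."""
    if stage == "prod":
        return {"Resources": {}}
    return {"Resources": {"OrdersTable": ORDERS_TABLE}}


def write_resource_files(out_dir: Path, stages: Iterable[str]) -> None:
    """Write one <stage>.json file per stage for serverless.yml to reference."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for stage in stages:
        (out_dir / f"{stage}.json").write_text(
            json.dumps(resources_for_stage(stage), indent=2)
        )


if __name__ == "__main__":
    write_resource_files(Path("resources"), ["dev", "prod"])
```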
Michael Tsibelman
#5
Having database resources only for the dev stage is an interesting approach; thanks for the suggestion, I need to think it over.

But in my eyes, robust backups and a separate account for production with strictly managed permissions are a better tool for preventing accidents. It's possible to delete your data even if you're not using Serverless; you could do it via the console or CLI commands as well.

As for cleaning up unwanted Lambda functions, can I remove them by deleting them from serverless.yml and redeploying?
Yan Cui
#6
You should use those backup options in production regardless of how you're managing those tables, and the same extends to other data you have as well. For example, with Kinesis, you'd want to use something like Firehose to stream all the raw data into S3 for safekeeping.
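A sketch of the Kinesis-to-S3 archive as a CloudFormation resource, built as a Python dict. The property names are those of `AWS::KinesisFirehose::DeliveryStream` with a Kinesis stream as its source; the logical IDs passed in are hypothetical, and the IAM role (which needs read access to the stream and write access to the bucket) is assumed to be defined elsewhere. The buffering values are arbitrary examples.

```python
def firehose_archive(stream_id: str, bucket_id: str, role_id: str) -> dict:
    """Delivery stream that copies every record from a Kinesis stream into S3."""

    def arn(logical_id: str) -> dict:
        return {"Fn::GetAtt": [logical_id, "Arn"]}

    return {
        "Type": "AWS::KinesisFirehose::DeliveryStream",
        "Properties": {
            "DeliveryStreamType": "KinesisStreamAsSource",
            "KinesisStreamSourceConfiguration": {
                "KinesisStreamARN": arn(stream_id),
                "RoleARN": arn(role_id),
            },
            "S3DestinationConfiguration": {
                "BucketARN": arn(bucket_id),
                "RoleARN": arn(role_id),
                # Flush every 5 minutes or 5 MB, whichever comes first (example values).
                "BufferingHints": {"IntervalInSeconds": 300, "SizeInMBs": 5},
                "CompressionFormat": "GZIP",
            },
        },
    }
```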

As for removing the functions with a redeploy, yes, you can. But it doesn't delete the CF stack, so it's not clean. Someone will always see the CF stack and think "mm.. why is that CF stack still there? I thought those functions were deleted?"

Granted, I'm probably worrying about these little things more than I thought!