jarias (5)
Hi, thank you for the course! I've been reading your blog for a while, and this is like finally having a long-wanted conversation with you :)

I would like to know your personal view on how to implement the many-repositories approach that you mention in unit 6. We've been using monorepos for our microservices for a while now, and we have tried to replicate this for our serverless applications.

However, I feel that the number of subprojects explodes when it comes to serverless, and that the repos tracked in the monorepo are heterogeneous in nature. We have three kinds of repos in our projects:

- Serverless services: collections of lambdas plus a serverless.yml file.
- Shared libraries: node modules.
- Resources: serverless.yml files defining only shared resources (we find this format more convenient than raw CloudFormation or Terraform, as it is easier to reference them from other templates).
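To make that third kind concrete, here is a minimal sketch of what such a resources-only serverless.yml might look like (the service, table, and export names are invented for illustration, not taken from this thread):

```yaml
# shared-resources/serverless.yml - deploys no functions, only shared
# infrastructure, and exports the values other stacks will need
service: shared-resources

provider:
  name: aws

resources:
  Resources:
    UsersTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: users-${opt:stage, 'dev'}
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1
  Outputs:
    UsersTableName:
      Value:
        Ref: UsersTable
      Export:
        Name: users-table-name-${opt:stage, 'dev'}
```

Because it is still a serverless.yml, other templates can pick up its outputs with the framework's usual variable syntax, which is presumably what makes it easier to reference than raw CloudFormation or Terraform.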

We track the changes to each subproject in our CI platform and run tests and deployments (via sls) as needed. We are using just a collection of scripts for this, and it is working fine for now, but I can't stop thinking that we are building something extremely fragile.
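For reference, the core of such a change-tracking script can be quite small. This is a hedged sketch, not the author's actual script; the services/libs/resources directory layout is an assumption based on the repo kinds listed above.

```shell
#!/bin/sh
# Sketch: map changed file paths to the top-level subprojects whose
# tests/deploys need to run. In CI you would feed this the output of
# `git diff --name-only <last-built-sha> HEAD`.

map_to_projects() {
  # keep the first two path segments (e.g. services/api) and de-duplicate;
  # files at the repo root (no "/" in the path) are ignored
  awk -F/ 'NF>1 {print $1 "/" $2}' | sort -u
}

# demo with a hard-coded change set
printf 'services/api/handler.js\nservices/api/serverless.yml\nlibs/logger/index.js\n' \
  | map_to_projects
# prints:
# libs/logger
# services/api
```

The fragility the post worries about usually lives around this mapping (renamed folders, cross-project dependencies the path prefix doesn't capture), not in the mapping itself.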

I haven't seen a single example of monorepos from the pros... Can you share some of your thoughts or experiences with this kind of model?
Yan Cui (46)
Hi ya,

Glad you're enjoying my posts on serverless, it's been a nice creative outlet for me to get lots of ideas out of my head and organise them in some coherent way.

My personal feeling towards monorepos is that, in general (not limited to serverless), they are really hard to pull off as the number of projects, and of people working on those projects, grows:

  • the amount of knowledge a new joiner needs to acquire grows with the overall complexity of the system (as opposed to the single project they need to touch, if that project were in its own repo); Michael Nygard's post on the coherence penalty offers a really good explanation for this

  • there's an increased chance for concepts and abstractions to leak across project boundaries, because accidental sharing (and therefore accidental coupling between services) is easy when sharing code inside the same repo offers less friction than sharing through a library that needs to be published to NPM first, etc.

  • release tracking and labelling become more difficult, because every release of every service is tagged and they all show up in the same GitHub repo

  • similarly, understanding the trail of changes for a particular service also becomes more difficult if you're sharing code in the same repo as opposed to via NPM packages; e.g. a service might have changed because one of its dependencies changed, but if that dependency change happened via a shared module in the same repo (as opposed to via an explicit package update) then it might not be reflected in the commit history for that service's project folder

  • Of course, it's possible to do monorepos well, but it relies on strong discipline across the team to avoid the many pitfalls of this approach. In my experience, when you're heavily reliant on discipline, and you mix in staff turnover and new joiners, it becomes a failure waiting to happen - any moment of ill-discipline or corner-cutting (say, taking on some tech debt in exchange for temporary velocity to meet a deadline) can have a lasting effect.

I have heard that some game companies use this approach, but they tend to have small, skillful, and stable teams, so it was probably OK for them to enforce and rely on discipline. Still, I think it's a tightrope to walk.

I think the reason you don't see any examples of monorepos from the pros is that pretty much all of these guys have come from the microservices world, where having individual repos for each service and shared library is the norm - and a serverless architecture is almost always a microservice architecture too. Microservices are an architectural style and shouldn't be conflated with implementation technologies like EC2 or Docker, so the same principles apply.
jarias (5)
Hi Yan,

Thanks for the valuable information. I'm actually applying a lot of the content from this course to our product; apart from many tests and some personal projects, this is the first fully serverless architecture I have designed. We are a small, focused startup team with a limited budget, and the adoption of this tech stack has allowed us to do a lot more with fewer resources.

I successfully implemented the monorepo pattern in another project very recently, and I can totally see your points about service-boundary relaxation and the difficulty newcomers have adapting to the project. We faced many issues as well with versioning and change visibility. However, we gained a lot of power in terms of integration, as we could fully test and deploy our system using the same CI pipeline (it was an edge device running microservices as Docker containers; each repo within the monorepo was an independent service/image, so we could spin all of them up as ephemeral containers during the same pipeline).

I had considered following the same approach with this new serverless project, but after your response I tried to implement the system using individual repos. Testing gets easier, since persistent test/dev stages make integration and acceptance tests straightforward (I really found that part of the course useful). It has been a pleasant journey so far; however, I am struggling a lot with shared resources and cross-stack references.

I would like to know your personal view on this matter: how do you handle these references? I find it difficult to maintain good awareness of the different resource names, which ones are deployed, and the post-deploy references to generated ARNs and names when the services reside in different repos. I am currently using CloudFormation outputs to retrieve this information during the CI pipeline, but I find it difficult to maintain a single source of truth.

jarias (5)
After a hard week I can answer myself, and save Yan the valuable time of elaborating a response to my new findings :)

The particular use cases I found difficult to integrate were those in which I needed to reference other resources in my handler's code, i.e. referencing a DynamoDB table owned by an external service. In such cases a CloudFormation cross-stack reference is not enough, as the imported value cannot escape the "Resources" section of the serverless.yml file.
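To illustrate the limitation (all identifiers below are invented for the sketch): Fn::ImportValue resolves fine inside the compiled CloudFormation template, e.g. in an IAM policy under Resources, but the handler code never gets to see the imported value.

```yaml
# consumer-service/serverless.yml (fragment)
resources:
  Resources:
    UsersTableWritePolicy:
      Type: AWS::IAM::ManagedPolicy
      Properties:
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: dynamodb:PutItem
              Resource:
                # works here, because we are inside the CloudFormation template
                Fn::ImportValue: users-table-arn-dev
```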

That is when I found this, which is only sparsely documented in the serverless docs:
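(The snippet itself is missing from the post; my best guess at the "missing piece" is the Serverless Framework's `${cf:stackName.outputKey}` variable syntax, which resolves a CloudFormation stack output at deploy time, so the value can be injected into a function's environment variables. The stack and output names below are invented.)

```yaml
# consumer-service/serverless.yml (fragment)
functions:
  createUser:
    handler: handler.createUser
    environment:
      # resolved at deploy time from the shared stack's CloudFormation output
      USERS_TABLE: ${cf:shared-resources-dev.UsersTableName}
```

The handler can then read the table name from process.env.USERS_TABLE, so the cross-stack reference finally reaches the code.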


With that missing piece, I can now easily develop each service in isolation and cross-reference anything I might need. There is, however, something I have not completely figured out: how would I orchestrate integration tests?

If I push a new commit to a repository, into a stage that requires an integration test to run, how can I be aware of the state of the referenced resources? I.e.:

I have a REST API that writes to a DynamoDB table. If my API and table are in different services... how can I create the specific environment, or check the outputs, within the context of the CI job? Perhaps the indivisible unit of service splitting is integration-test awareness?