jarias (6) [Avatar] Offline
#1
Hi, thank you for the course! I've been reading your blog for a while, and this is like having a long-awaited conversation with you :)

I would like to know your personal point of view on how to implement the many-repositories approach that you mention in unit 6. We've been using monorepos for our microservices for a while now, and we have tried to replicate this for our serverless applications.

However, I feel that the number of subprojects explodes when it comes to serverless, and that the different repos tracked in the monorepo feel heterogeneous in nature. We have three kinds of repos in our projects:

- Serverless services: each is a collection of lambdas and a serverless.yml file.
- Shared libraries: these are node modules.
- Resources: serverless.yml files that define only shared resources (we find this format more convenient than CloudFormation or Terraform, as it is easier to reference them from other templates).

We track the changes on each subproject in our CI platform and run tests and deployments (via sls) as needed. We are using just a collection of scripts for this and it is working fine for now, but I can't stop thinking that we are building something extremely fragile.
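
A minimal sketch of the kind of change-tracking script described above, assuming the CI job receives the changed file paths (for example from `git diff --name-only`). The folder names `services`, `libs` and `resources`, and the `depends_on` map, are hypothetical and only illustrate the three kinds of subprojects listed earlier. It is written in Python for brevity; the real scripts may well be shell or Node.js.

```python
from pathlib import PurePosixPath

# Hypothetical top-level folders, one per kind of subproject.
SUBPROJECT_ROOTS = ("services", "libs", "resources")


def changed_subprojects(changed_paths):
    """Map changed file paths to the set of subprojects that contain them."""
    touched = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Only paths shaped like <root>/<subproject>/<file...> belong to a subproject.
        if len(parts) >= 3 and parts[0] in SUBPROJECT_ROOTS:
            touched.add(f"{parts[0]}/{parts[1]}")
    return touched


def affected_subprojects(changed, depends_on):
    """Add every subproject that depends, directly or transitively, on a changed one."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for project, deps in depends_on.items():
            if project not in affected and affected.intersection(deps):
                affected.add(project)
                grew = True
    return affected
```

The second function is where the fragility tends to hide: the `depends_on` map has to be kept in sync with the real imports by hand, or be derived from each subproject's package.json.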

I haven't seen a single example of monorepos from the pros... Can you share some of your thoughts or experiences with this kind of model?
Yan Cui (68) [Avatar] Offline
#2
Hi ya,

Glad you're enjoying my posts on serverless, it's been a nice creative outlet for me to get lots of ideas out of my head and organise them in some coherent way.

My personal feeling towards monorepos is that, in general (not limited to serverless), they are really hard to pull off as the number of projects, and of people working on those projects, grows:

  • the amount of knowledge a new joiner needs to acquire grows with the overall complexity of the system (as opposed to that of the single project they need to touch, if that project were in its own repo). Michael Nygard's post on coherence penalty offers a really good explanation for this

  • there's an increased chance for concepts and abstractions to leak through project boundaries, because accidental sharing (and therefore accidental coupling between services) is easy when sharing code inside the same repo has less friction than sharing through a shared lib, which needs to be published to NPM first, etc.

  • release tracking and labelling become more difficult, because every release for every service is tagged and they all show up in the same GitHub repo

  • similarly, understanding the trail of changes for a particular service also becomes more difficult if you're sharing code in the same repo as opposed to via NPM packages, e.g. a service might have changed because one of its dependencies is changed, but if that dependency change happened via a shared module in the same repo (as opposed to via an explicit package update) then it might not be reflected in the commit history for that service's project folder


Of course, it's possible to do monorepos well, but it relies on strong discipline across the team to avoid the many pitfalls of this approach. In my experience, when you rely heavily on discipline and then mix in staff turnover and new joiners, it becomes a failure waiting to happen: any moment of ill-discipline or corner-cutting (or taking on some tech debt in exchange for temporary velocity to meet a deadline) can have a lasting effect.

I have heard that some game companies use this approach, but they tend to have small, skillful, and stable teams, so it was probably OK for them to enforce and rely on discipline. I still think it's a tightrope to walk.

I think the reason you don't see any examples of monorepos from the pros is that pretty much all of them have come from the microservices world, where having an individual repo for each service and shared library is the norm. A serverless architecture is almost always a microservice architecture too (microservices is an architectural style, and shouldn't be conflated with implementation technologies like EC2 or Docker), so the same principles apply.
    jarias (6) [Avatar] Offline
    #3
    Hi Yan,

    Thanks for the valuable information. I'm actually applying a lot of the content from this course to our product; apart from many tests and some personal projects, this is the first fully serverless architecture that I have designed. We are a small, focused startup team with a limited budget, and adopting this tech stack has allowed us to do a lot more with fewer resources.

    I implemented the monorepo pattern successfully in another project very recently, and I can totally see your point about service boundaries relaxing and the difficulty newcomers have adapting to the project. We also faced many issues with versioning and change visibility. However, we gained a lot of power in terms of integration, as we could fully test and deploy our system using the same CI pipeline (it was an edge device running microservices as Docker containers, and each repo within the monorepo was an independent service/image, so we could spin all of them up as ephemeral containers during the same pipeline).

    I considered following the same approach with this new serverless project. However, after your response I tried to implement the system using individual repos. Testing gets easier, as test/dev stages are very easy to keep around when performing integration and acceptance tests (I found that part of the course really useful). It has been a pleasant journey so far, but I am struggling a lot with shared resources and cross-stack references.

    I would like to know your personal view on this matter: how do you handle these references? When the services reside in different repos, I find it difficult to keep track of the different resource names, which ones are deployed, and the post-deploy references to procedural ARNs and names. I am currently using CloudFormation outputs to retrieve this information during the CI pipeline, but I have found it difficult to maintain a single source of truth.
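
A minimal sketch of the "retrieve outputs during the CI pipeline" step, assuming the pipeline captures the JSON printed by `aws cloudformation describe-stacks` and wants one flat lookup table from it. The function name is hypothetical, and this only reads the outputs; it does not by itself solve the single-source-of-truth problem.

```python
import json


def outputs_by_stack(describe_stacks_json):
    """Flatten describe-stacks JSON into {stack_name: {output_key: output_value}}."""
    document = json.loads(describe_stacks_json)
    result = {}
    for stack in document.get("Stacks", []):
        # Treat a stack without an "Outputs" key as having no outputs.
        outputs = stack.get("Outputs", [])
        result[stack["StackName"]] = {
            item["OutputKey"]: item["OutputValue"] for item in outputs
        }
    return result
```

A lookup such as `outputs_by_stack(raw)["orders-db-dev"]["TableName"]` (both names made up) then fails with a `KeyError` when the stack or output is missing, which is usually what a CI job wants.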

    Cheers!
    jarias (6) [Avatar] Offline
    #4
    After a hard week I can answer my own question, and save Yan the time of writing a response, by sharing my new findings :)

    The use cases I found difficult to integrate were those in which I needed to reference other resources from my handler's code, e.g. referencing a DynamoDB table owned by another service. In such cases a CloudFormation cross-stack reference is not enough, as it cannot be used outside the "Resources" section of the serverless.yml file.

    That is when I found this, which is only lightly documented in the Serverless docs:

    https://github.com/serverless/serverless/pull/3575
    https://serverless.com/framework/docs/providers/aws/guide/variables/#reference-cloudformation-outputs
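
A rough sketch of how this can look from the handler's side, assuming the consuming service maps a CloudFormation output to an environment variable with the `${cf:stackName.outputKey}` variable from the docs linked above. The stack name `orders-db-dev`, the output key `TableName` and the variable `ORDERS_TABLE` are all hypothetical, and the handler side is shown in Python only for illustration (in a Node.js lambda the same value would be read from `process.env`).

```python
import os

# Assumed serverless.yml fragment in the consuming service, shown as a comment:
#
#   provider:
#     environment:
#       ORDERS_TABLE: ${cf:orders-db-dev.TableName}


def orders_table_name(env=None):
    """Return the table name injected at deploy time, failing loudly if it is absent."""
    env = os.environ if env is None else env
    name = env.get("ORDERS_TABLE")
    if not name:
        raise RuntimeError("ORDERS_TABLE is not set; was the referenced stack deployed?")
    return name
```

The value is resolved when the consuming service is deployed, so the handler never needs to know which stack the table lives in.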

    With that missing piece, I can now easily develop each service in isolation and just cross-reference anything I might need. There is, however, something that I have not completely worked out: how would I orchestrate integration tests?

    If I push a new commit to a repository, into a stage that requires an integration test, how can I know the state of the referenced resources? For example:

    I have a REST API that writes to a DynamoDB table. If my API and my table are in different services, how can I create the specific environment, or check the output, within the context of the CI job? Perhaps the indivisible unit for splitting services is whatever the integration tests need to be aware of?
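
One possible way to frame that check, as a sketch only: before running integration tests, the CI job verifies that every output the service references is actually deployed in the stage under test. It assumes stacks follow the Serverless Framework's default `<service>-<stage>` naming, and that `deployed_outputs` has already been fetched (for example from CloudFormation); the service and output names in the usage note are hypothetical.

```python
def stack_name(service, stage):
    """Default Serverless Framework stack name for a service in a given stage."""
    return f"{service}-{stage}"


def missing_references(required, deployed_outputs, stage):
    """Return the (service, output_key) pairs not deployed in this stage.

    `required` is a list of (service, output_key) pairs the service depends on.
    `deployed_outputs` maps stack name to {output_key: output_value}.
    """
    missing = []
    for service, output_key in required:
        outputs = deployed_outputs.get(stack_name(service, stage), {})
        if output_key not in outputs:
            missing.append((service, output_key))
    return missing
```

With a per-build stage such as `ci-42`, a non-empty result tells the job that it must first deploy (or wait for) the table's service before the API's integration tests can run.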

    589343 (1) [Avatar] Offline
    #5
    This is my exact problem too: how do you orchestrate deployment? To be honest, this deserves to be a key part of this course. Large organizations can't be expected to put everything in a single serverless.yml file. Similarly, these different services can't be expected never to reference each other, e.g. a web service would want to know an S3 bucket reference. Therefore orchestration is necessary.
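
As a sketch of what that orchestration boils down to, assuming each stack declares which other stacks' outputs it reads: the deploy order is a topological sort of that reference graph. The stack names in the usage note are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def deployment_order(references):
    """Order stacks so each one comes after every stack it references.

    `references` maps a stack name to the stacks whose outputs it reads.
    """
    return list(TopologicalSorter(references).static_order())


def deployment_waves(references):
    """Group stacks into waves; stacks within a wave can be deployed in parallel."""
    sorter = TopologicalSorter(references)
    sorter.prepare()  # raises graphlib.CycleError on circular references
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For `{"web": ["assets-bucket"], "api": ["orders-table"]}` the bucket and table stacks form the first wave and the two services the second. A circular reference between stacks is reported as an error instead of producing an order.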
    jarias (6) [Avatar] Offline
    #6
    After a few months of intensive serverless work (architecting, developing and monitoring) on a production system, I have found that many of the problems you may face have already been solved (at least in terms of design and engineering) by other disciplines (cloud architecture, microservices...).

    Thanks to courses like this, and the reading kindly suggested by Yan, you can discover patterns that will solve your problems and change your mindset.

    Tooling, however, is another story... My most practical advice would be: learn to be fluent in AWS CloudFormation and your life will be easier. The Serverless Framework is a nice entry point to developing serverless applications, especially when it comes to FaaS platforms such as AWS Lambda... but if you are not proficient with the rest of the ecosystem, then you will run into these "already solved" problems, such as "How can I create microservices with cross-stack references?!". I learnt it the hard way, and then I invested all my sleep time in properly developing not-so-serverless systems on my preferred cloud vendor.
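
For readers who have not met them yet, here is a sketch of the two halves of a plain CloudFormation cross-stack reference, built as Python dicts so they can be dumped to JSON template fragments. The logical ID `OrdersTable`, the output key `OrdersTableName` and the export name are hypothetical; only `Export` and `Fn::ImportValue` are CloudFormation's own.

```python
import json


def producer_outputs(export_name):
    """Outputs section for the stack that owns the table and exports its name."""
    return {
        "OrdersTableName": {
            # Ref on a DynamoDB table resource resolves to the table name.
            "Value": {"Ref": "OrdersTable"},
            "Export": {"Name": export_name},
        }
    }


def import_value(export_name):
    """Intrinsic function a consuming stack uses to read the exported value."""
    return {"Fn::ImportValue": export_name}


if __name__ == "__main__":
    name = "orders-db-dev-TableName"
    print(json.dumps({"Outputs": producer_outputs(name)}, indent=2))
    print(json.dumps(import_value(name)))
```

CloudFormation refuses to delete or change an export while another stack still imports it, which gives some protection against breaking a consumer but also couples the deploy order of the two stacks.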

    We already have lots of unsolved tooling problems, as Yan mentions in his blog and courses.
    joebowbeer (14) [Avatar] Offline
    #7
    Yan, I'm surprised that you're not on board with monorepos :) I clicked on this topic looking for some guidance as well. My impression (based on some experience) is that there are plenty of pros (e.g., Google) who are using monorepos, but that in the Node.js world in particular the pros are arriving late to the party.

    The biggest missing piece for monorepos is CI/CD support in the standard tools such as Travis, CircleCI and CodePipeline.

    I'm not interested in debating the issue here, though, in brief, my view is that there are many burdens faced by dev teams that must be balanced. Coherence penalty is one. (Thanks for the reference!) But this may be offset by the cost of maintaining lots of separate repos and coordinating changes among them. I favor a rule of thumb such as: "one repo per team", or "as few repos as possible". If the coherence penalty becomes too large then split the team (and repo)...

    I do suggest that you try not to exclude monorepos (and workspaces) from your coverage, given that they are increasingly popular in Node.js land.