317529
#1
Hi Will,

I wrapped up Capstone 1, and got to section C1.3, "Why Stateless programming matters". I could use your help understanding this a little more.

Coming from an OOP background, I definitely understand the importance of managing state carefully – if a value is mutable across threads, for example, you have to take care to avoid race conditions. I also understand how, in some languages, immutable references can allow the compiler to make certain optimizations.

But I've had a hard time getting concrete answers about what characteristics of statelessness make it generally preferable to stateful approaches. When I've talked with other developers about this, I've always gotten vague and abstract answers, such as "it prevents all manner of problems" and "it makes it easier to reason about" -- answers so vague that I'm not entirely sure they actually know either.

So I'm hoping you can help me understand this more.

In the example with the fast and slow robots duking it out, you said, "Because we don't have state in functional programming, we have complete control over how computation happens." But we could do the same thing in an imperative language by just sequencing the operations like we want, right? We can control how the computation happens by controlling the sequence.

In Listing C.1.16, you demonstrate how the order of the operations doesn't matter, but I'm thinking that, in a way, it still does – yes, you can declare them in any order you want in the source code, but Haskell figures out the order to execute them based on the references. So you're still indicating the order of operations, just based on the references instead of the order that they appear in the source code. I think that also means that the robot fight couldn't actually ever execute in a concurrent/parallel way, because of the nature of the dependencies between the functions.

So the best I've been able to conclude about stateless programming so far is that it allows a declarative style of programming where the code can be written in any order you want... which seems like I'm coming up short, because it's kind of recursive – i.e., stateless programming is good because it allows you to do stateless programming.

So I feel like I must be missing something here. Can you help me with this?

Thanks!
Will Kurt
#2
To understand statelessness, I think it is important to realize its relationship to referential transparency. As a quick refresher, referential transparency means that given the same argument, a function always returns the same value. This makes code more predictable: you can always reason about how code will behave, because it always behaves the same way.


My favorite example of this is the code sample in Listing 1.2, which shows the following code, valid in three different languages (Ruby, Python, and JavaScript).


myList = [1,2,3]
myList.reverse()
newList = myList.reverse()



This gives three different answers depending on whether it is executed as JavaScript, Python, or Ruby. In this case only Ruby exhibits referential transparency; for JavaScript and Python you have to know how the language works in order to predict the result. This is a classic case of a "leaky abstraction". A programming language is supposed to abstract away the implementation details, and if you have to know how the language operates under the hood to predict its results, then that language has flaws in its abstraction. Even in stateful OOP, proper encapsulation implies referential transparency.
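
To make that contrast concrete in a single language, here is a small illustrative sketch (the method names and class are mine, not from the book or any standard library): one reverse behaves like Ruby's and is referentially transparent, the other behaves like Python's and JavaScript's and is not.

import java.util.Arrays;

public class ReverseDemo {
    // Referentially transparent: returns a new array and never touches its input.
    static int[] pureReverse(int[] xs) {
        int[] out = new int[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = xs[xs.length - 1 - i];
        }
        return out;
    }

    // Not referentially transparent: reverses the array in place, so the
    // "same" argument is a different value by the time of the second call.
    static int[] mutatingReverse(int[] xs) {
        for (int i = 0, j = xs.length - 1; i < j; i++, j--) {
            int tmp = xs[i];
            xs[i] = xs[j];
            xs[j] = tmp;
        }
        return xs;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        System.out.println(Arrays.toString(pureReverse(a)));     // [3, 2, 1]
        System.out.println(Arrays.toString(pureReverse(a)));     // [3, 2, 1] again

        int[] b = {1, 2, 3};
        System.out.println(Arrays.toString(mutatingReverse(b))); // [3, 2, 1]
        System.out.println(Arrays.toString(mutatingReverse(b))); // [1, 2, 3] -- same call, different result
    }
}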


If I had written a new language, told you the reverse example was valid in it as well, and then told you that all the core libraries uphold referential transparency, you would know how this code works even if you had no idea how the internals are built. If I said "not all the functions uphold referential transparency," you would need to ask me about implementation details in order to predict the output in my language. Here are some of the benefits of referential transparency:

  • The code is easier to reason about because you can always predict the results

  • Additionally, referential transparency means that if we're programming asynchronously or in a distributed setting, where the order of execution is not guaranteed, we can still predict the results.

  • Because any time a function is called with the same argument we get the same result, the compiler only needs to calculate the value of f(10) once, even if f(10) is used 1000 times in your code (see the caching sketch just after this list).
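
To illustrate that last point by hand, here is a rough sketch of my own (not from the book; in practice the compiler or runtime may do this kind of sharing for you). Because f is referentially transparent, remembering its first answer and reusing it can never change the program's behavior.

import java.util.HashMap;
import java.util.Map;

public class CacheDemo {
    // Stands in for an expensive but referentially transparent computation.
    static int f(int x) {
        return x * x + 1;
    }

    private static final Map<Integer, Integer> cache = new HashMap<>();

    // Safe only because f is referentially transparent:
    // serving a remembered result is indistinguishable from recomputing it.
    static int cachedF(int x) {
        return cache.computeIfAbsent(x, CacheDemo::f);
    }

    public static void main(String[] args) {
        System.out.println(cachedF(10)); // computes f(10) once: 101
        System.out.println(cachedF(10)); // reuses the cached 101
    }
}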



Now, why does statelessness matter? Suppose I wrote my own Ruby methods and gave you the following code:


myList = [1,2,3]
myList.myReverse()
newList = myList.myReverse()



Do you still know the answer? No. Because Ruby allows stateful programming, there's no way to predict how this code behaves. However, if stateful programming is not allowed by the language, then referential transparency is guaranteed, and you can trust that every library you use follows these rules. Additionally, the compiler cannot make optimizations that assume referential transparency if referential transparency is not guaranteed.
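
To show the kind of thing statefulness makes possible, here is a deliberately nasty, purely hypothetical example (written in Java just to have something concrete; myReverse and its hidden counter are my own invention): two calls with the same list give different answers.

import java.util.Arrays;

public class MyReverseDemo {
    // Hidden state: nothing in the method's signature warns you about this.
    private static int callCount = 0;

    // Hypothetical "reverse" that only does its job on every other call.
    static int[] myReverse(int[] xs) {
        callCount++;
        int[] out = xs.clone();
        if (callCount % 2 == 1) {
            for (int i = 0, j = out.length - 1; i < j; i++, j--) {
                int tmp = out[i];
                out[i] = out[j];
                out[j] = tmp;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] myList = {1, 2, 3};
        System.out.println(Arrays.toString(myReverse(myList))); // [3, 2, 1]
        System.out.println(Arrays.toString(myReverse(myList))); // [1, 2, 3] -- same argument, different result
    }
}

Leaving escape hatches aside, a Haskell function of type [a] -> [a] simply cannot behave this way, which is exactly the guarantee being described here.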


Of course, you don't need statelessness to preserve referential transparency. If you follow correct abstraction and encapsulation, an object in OOP can manipulate state invisibly and you can still have reliable code. The most extreme example is that, at the end of the day, even Haskell is compiled to machine code. Each time you run a program, a list you use is assigned a different block of memory, and so on. However, you never, ever have to worry about how the machine code uses state, because it is well enough abstracted away. Imagine if each time you compiled your program there was a chance it behaved differently!


You can write safe stateful code, just like you can safely use global variables. In practice, though, stateful code means we have no guarantees and leaky abstractions are abundant, just like global variables in the early days of JavaScript were a nightmare.

I hope that clarifies things a bit. Let me know if you need more clarification!
317529
#3
Thanks for taking the time to write back. I've been tossing all this around in my head for a little bit, and I think I'm gradually working my way there. Here's what I'm thinking – let me know if I'm on track:

Statelessness is really just a step to achieve the real goal, which is referential transparency. The valuable characteristics of referential transparency are:

1. Reduces cognitive load on the developer because they can consider a function without needing to regard the dimension of time. That is, "when" a function is called during a program's lifetime is inconsequential because there's no state that can change over time.

2. Allows for concurrent processing. Again, this is because results cannot change over time.

3. Reduces cognitive load on the developer because sequencing is made clearer (as in the robot fight example).

4. Enables easy caching of results.

Does that sound about right?


Also, I'd like you to elaborate on these quotes:
"Even in stateful OOP, proper encapsulation implies referential transparency"

and
"You don't need statelessness to preserve referential transparency"

I know you gave the example of compiled Haskell machine code, but maybe you can explain this with some Ruby or Java code? How can I achieve referential transparency without statelessness?

Thanks!
Will Kurt
#4
I know you gave the example of compiled Haskell machine code, but maybe you can explain this with some Ruby or Java code? How can I achieve referential transparency without statelessness?


Absolutely! Here's some very straightforward Java that modifies state but preserves referential transparency:

public static int sum(int[] nums){
    int total = 0;
    for(int i = 0; i < nums.length; i++){
        total += nums[i];
    }
    return total;
}


In this case both the variables total and i are updated repeatedly, meaning that we are changing state. However, if you are using sum and have absolutely no idea how it's implemented, every time you call it on the same array you'll get the exact same result. The thing about referential transparency (which I should probably mention in the book) is that it is perfectly aligned with proper encapsulation in OOP.
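
For instance, dropping sum into a little scaffold of my own and calling it twice on the same array (the class name and the numbers are just made up for the demo):

public class SumDemo {
    public static int sum(int[] nums){
        int total = 0;
        for(int i = 0; i < nums.length; i++){
            total += nums[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] nums = {3, 1, 4, 1, 5};
        System.out.println(sum(nums)); // 14
        System.out.println(sum(nums)); // 14 again: same argument, same result
    }
}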

Counterexamples can also be helpful. There are times when we actually need state that violates referential transparency. A great example is Java's Arrays.sort method. Arrays.sort does not return a value; it modifies the array that you pass into it. If you think of this in terms of encapsulation, it's a bad idea: you have to have a sense of what Arrays.sort is doing internally to understand how to use it, and that can lead to bugs. If there is a natural order to the values in an array, for example employee IDs in the order employees arrived at a meeting, calling Arrays.sort will destroy that information (if you don't have a copy of the array lying around). But Arrays.sort is not bad code. From a performance standpoint, performing an in-place array sort can be absolutely essential in many applications.
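
Here is a small sketch of that employee-ID scenario (the IDs are made up for the demo): after Arrays.sort runs, the arrival order is simply gone.

import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        // Employee IDs in the order people arrived at the meeting (made-up data).
        int[] arrivals = {42, 7, 19, 3};

        Arrays.sort(arrivals);  // sorts the array in place, returns nothing

        // The array is now [3, 7, 19, 42]; the arrival order is lost
        // unless we kept a copy before sorting.
        System.out.println(Arrays.toString(arrivals));
    }
}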

State in code is very similar to global variables. There are times when a global variable is helpful (though very rarely absolutely necessary), but the ways you can accidentally shoot yourself in the foot with global variables are far more common. Many languages treat global variables the way Haskell treats state: if you absolutely must, there is a way, but you have to prove to me that you mean it. For example, Ruby requires global variables to be prefixed with $. JavaScript, as mentioned before, does not prevent global variables, in the same way that most programming languages do not prevent stateful code. The result in both cases is that the burden is on the programmer to program defensively, to avoid errors that may occur because someone else abused the language feature.

A final example of the usefulness of entirely stateless programs is AWS Lambda. From their documentation:

The code must be written in a “stateless” style i.e. it should assume there is no affinity to the underlying compute infrastructure. Local file system access, child processes, and similar artifacts may not extend beyond the lifetime of the request.


And again from their documentation:

Q: Why must AWS Lambda functions be stateless?

Keeping functions stateless enables AWS Lambda to rapidly launch as many copies of the function as needed to scale to the rate of incoming events. While AWS Lambda’s programming model is stateless, your code can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB.


I definitely think you have the right idea overall. The big win in Haskell is that statelessness guarantees referential transparency. And in places where you need state (such as IO and in-place array operations), Haskell's type system forces you to separate stateful code from the rest of your code (unit 4 covers IO, and makes plenty of stateful changes to the world).

Hope that helps, and thanks so much for your questions!