I understand that one book can’t fit all deep learning wisdom.

I saw the NOTE about it on page 213, but I would appreciate a tiny sequence masking demo. I googled a little, but I still haven't found a clean explanation of this technique or best practices for it.

Also, what about stateful LSTMs? The book doesn't say a word about them.