Hi! First, I would like to thank you very much for your book. I loved your work!
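I would like to make a comment regarding sampling and k-fold validation.
On page 83, in the R code section titled "Listing 3.29 Kfold validation", you first shuffle the data with sample(), which gives an unordered vector of indices. I understand that this is what we want. You then generate K groups with cut(), but cut() bins the index values rather than their positions in the shuffled vector, so each fold still ends up holding a sorted, contiguous block of indices. For example: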
library(tidyverse)
k <- 4
indices <- sample(1:100)
folds <- cut(indices, breaks = k, labels = FALSE)
# At first glance the k folds look unordered, as if the sampling were random
data.frame(indices, folds)
# but once sorted by index, each fold turns out to be a block of consecutive indices
data.frame(indices, folds) %>% arrange(indices)
If random, non-consecutive sampling is desired, one option is to cut the positions of the shuffled vector instead of its values:
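# assign folds by position in the shuffled vector, not by index value
folds <- cut(seq_along(indices), breaks = k, labels = FALSE)
# sort by index to check that the folds are now mixed
df <- data.frame(indices, folds) %>% arrange(indices)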
df
table(df$folds)
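To make the difference easier to see, here is a small base-R sketch that puts the two approaches side by side. The seed and the names folds_by_value, folds_by_position, min_by_value and sizes_by_position are my own and are not from the book; this is only an illustration of the point above.

```r
set.seed(42)  # assumption: fixed seed only so the run is repeatable
k <- 4
indices <- sample(1:100)

# Approach from the listing: cut() bins the shuffled VALUES
folds_by_value <- cut(indices, breaks = k, labels = FALSE)

# Proposed approach: cut() bins the POSITIONS in the shuffled vector
folds_by_position <- cut(seq_along(indices), breaks = k, labels = FALSE)

# Smallest index in each fold when binning by value
min_by_value <- unname(sapply(split(indices, folds_by_value), min))

# Fold sizes when binning by position
sizes_by_position <- table(folds_by_position)

# Range of indices covered by each fold under both approaches
tapply(indices, folds_by_value, range)
tapply(indices, folds_by_position, range)
```

With 100 indices and k = 4, binning by value gives folds that cover 1 to 25, 26 to 50, 51 to 75 and 76 to 100, whatever the shuffle was. Binning by position still gives four folds of 25, but each one holds indices scattered across the whole range.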
Regards!
