534742 (1)
#1
Hi! First, I would like to thank you very much for your book. I loved your work!

I would like to make a comment regarding sampling and k-fold validation.

On page 83, in the R code of Listing 3.29 (K-fold validation), you first shuffle the data with sample(), which gives an unordered list of indices. I understand that this is what we want. You then generate K groups with cut(), but cut() bins the index values themselves, so the folds end up ordered anyway. For example:

library(tidyverse)
k <- 4
indices <- sample(1:100)
folds <- cut(indices, breaks = k, labels = FALSE)

# At first glance the k folds look unordered and the sampling looks random
data.frame(indices, folds)

# But sorted by index, each fold turns out to be a consecutive block of indices
data.frame(indices, folds) %>% arrange(indices)


If random, non-consecutive sampling is desired, one option is the following:

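# Cut the positions in the shuffled vector instead of the index values
folds <- cut(seq_along(indices), breaks = k, labels = FALSE)

# Check: sorted by index, the fold labels are now mixed and the folds stay balanced
df <- data.frame(indices, folds) %>% arrange(indices)
df
table(df$folds)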
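
The same comparison can be made in base R, without tidyverse. This is only a minimal sketch: the names folds_by_value and folds_by_position and the seed are illustrative and do not come from the book. It looks at the range of original indices that ends up in each fold under both approaches.

```r
set.seed(42)  # arbitrary seed (an assumption), only to make the run reproducible
k <- 4
indices <- sample(1:100)

# Call as in the post above: bins the shuffled index *values*
folds_by_value <- cut(indices, breaks = k, labels = FALSE)

# Suggested alternative: bins the *positions* in the shuffled vector
folds_by_position <- cut(seq_along(indices), breaks = k, labels = FALSE)

# Smallest and largest original index that lands in each fold
ranges_by_value <- tapply(indices, folds_by_value, range)
ranges_by_position <- tapply(indices, folds_by_position, range)

print(ranges_by_value)     # each fold is one consecutive block: 1-25, 26-50, ...
print(ranges_by_position)  # each fold draws from across the whole index range

# Both approaches give folds of 25 elements each
print(table(folds_by_value))
print(table(folds_by_position))
```

With 100 indices and k = 4, both versions produce folds of equal size, so the fold sizes alone do not reveal the difference. Only the per-fold ranges do.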



Regards!
256385 (44)
#2
Oh, that's a great catch! Unfortunately, the book just went to print yesterday.

I will absolutely include this in the errata, though. Thanks again for reporting.