531586 (2)
#1
I've run into an error trying to run the code in "Listing 6.9 Tokenizing the text of the raw IMDB data" (p. 178).

I did not see this error mentioned elsewhere in the forum, so I wanted to post it to see if anyone has insight into its cause. I'm running the latest versions of R and RStudio on macOS High Sierra, and the failure is in:

"keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.)"

Here is my code up to the point of failure. Any help in resolving this would be greatly appreciated. Thanks!


> library(keras)
> imdb_dir <- "~/Downloads/aclImdb"
> train_dir <- file.path(imdb_dir, "train")
> labels <- c()
> texts <- c()
> for (label_type in c("neg", "pos")) {
+   label <- switch(label_type, neg = 0, pos = 1)
+   dir_name <- file.path(train_dir, label_type)
+   for (fname in list.files(dir_name, pattern = glob2rx("*.txt"),
+                            full.names = TRUE)) {
+     texts <- c(texts, readChar(fname, file.info(fname)$size))
+     labels <- c(labels, label)
+   }
+ }

> maxlen <- 100
> training_samples <- 200
> validation_samples <- 10000
> max_words <- 10000
> tokenizer <- text_tokenizer(num_words = max_words) %>%
+ fit_text_tokenizer(texts)
> sequences <- texts_to_sequences(tokenizer, texts)
> word_index <- tokenizer$word_index
> cat("Found", length(word_index), "unique tokens.\n")
Found 88584 unique tokens.
> data <- pad_sequences(sequences, maxlen = maxlen)
Error in as.vector(x, "list") :
cannot coerce type 'environment' to vector of type 'list'
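
In case it helps anyone hitting the same wall: until I sort out the cause, a manual fallback in base R seems like it should sidestep the failing call. This is only a minimal sketch, assuming texts_to_sequences() returned an ordinary R list of integer vectors (the helper name pad_manually is mine); if sequences is itself a reticulate/environment object, the same coercion error will presumably hit this code too.

# Reproduces the pad_sequences() defaults: "pre" padding and "pre"
# truncating, producing an n-by-maxlen integer matrix.
pad_manually <- function(sequences, maxlen, value = 0L) {
  t(vapply(sequences, function(s) {
    s <- as.integer(s)
    if (length(s) >= maxlen) {
      tail(s, maxlen)                       # keep the last maxlen tokens
    } else {
      c(rep(value, maxlen - length(s)), s)  # prepend zeros
    }
  }, integer(maxlen)))
}
data <- pad_manually(sequences, maxlen)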
256385 (46)
#2
If you install the very latest version of the keras R package from GitHub, this error shouldn't occur (it doesn't occur in my configuration). Try
devtools::install_github("rstudio/keras")
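
One detail that's easy to miss: restart your R session after the install so the new build actually gets loaded. A quick sanity check (nothing specific to this bug, just confirming which package version is in use):

# After restarting R, confirm the newly installed version is the one loaded.
packageVersion("keras")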
531586 (2)
#3
Thanks! That does the trick...