
nwchrisb (1)
Under Listing 2.3, the text states:

"Note how we return both the normalized data and the factor with which the data was normalized. We do this because any new data, for example for prediction, will have to be normalized in the same way in order to yield meaningful results."

The factor returned in Listing 2.3 will be incorrect for new data normalization if the new data's value falls outside the range calculated in the function. Maybe this is something that will come up in the chapter on feature engineering (along with a clever way to deal with it?), but wouldn't a cautionary note of some kind be helpful here?
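The concern can be sketched concretely. This is a minimal example assuming standard min-max scaling to [0, 1] (not the book's exact listing): the factor and minimum learned from the training data are reused on new data, and a new value outside the original range lands outside [0, 1].

```python
import numpy as np

def normalize(data):
    """Min-max scale data to [0, 1]; return the parameters needed to
    apply the same transformation to new data."""
    d_min, d_max = data.min(), data.max()
    factor = 1.0 / (d_max - d_min)
    return (data - d_min) * factor, d_min, factor

train = np.array([0.0, 5.0, 10.0])
_, d_min, factor = normalize(train)

new = np.array([12.0])            # outside the training range [0, 10]
scaled = (new - d_min) * factor   # 1.2 -- outside [0, 1]
```

Whether to clip such values, leave them out of range, or refit the scaler is exactly the kind of decision a cautionary note (or the feature-engineering chapter) could address.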
henrik.brink (22)
Re: Note on feature normalization in Listing 2.3
Hi, thank you for the feedback! We'll check the listing and clarify this section in the next MEAP update.
shaolang (3)
Re: Note on feature normalization in Listing 2.3

I may be missing something, but the normalize_feature function does not normalize data correctly. If d_min turns out to be 1 rather than 0, the normalization is wrong: factor would be computed as 2/9, and the output would no longer span [-1, 1].
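One way to reproduce this (an assumed reconstruction of the listing, consistent with the numbers quoted later in this thread: scale by 2/(d_max - d_min) and shift by -1 without first subtracting d_min):

```python
import numpy as np

def normalize_feature_buggy(data):
    # Assumed behavior of the listing under discussion: the offset
    # d_min is never subtracted, so the result only spans [-1, 1]
    # when d_min happens to be 0.
    factor = 2.0 / (data.max() - data.min())
    return data * factor - 1, factor

data = np.array([1.0, 10.0])
normalized, factor = normalize_feature_buggy(data)
# factor is 2/9, but normalized is [-7/9, 11/9] rather than [-1, 1]
```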

202502 (1)
Sorry if I'm missing something obvious, but as of version 7 of the MEAP, isn't normalize_feature incorrectly defined? That is, when I feed it these values:

data = np.array([-383.0, 4.0, 14.0, 18.0, 76.0, 1024.0])

...I get the following normalized values:

[-1.54442075 -0.99431414 -0.9800995 -0.97441365 -0.89196873 0.45557925]

...which fall outside the allowed range of [-1, 1]. Seems like this might be better?:

def normalize_feature(data, x_min_new=-1, x_max_new=1):
    # Map the data's old min/max onto the new min/max
    x_min_old, x_max_old = min(data), max(data)
    factor = (x_max_new - x_min_new) / (x_max_old - x_min_old)
    normalized = x_min_new + factor * (data - x_min_old)
    return normalized, factor
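For reference, running this version on the sample data from the post above maps the minimum and maximum exactly onto -1 and 1 (up to floating point), a quick sanity check one can run directly:

```python
import numpy as np

def normalize_feature(data, x_min_new=-1, x_max_new=1):
    # Map the data's old min/max onto the new min/max
    x_min_old, x_max_old = min(data), max(data)
    factor = (x_max_new - x_min_new) / (x_max_old - x_min_old)
    normalized = x_min_new + factor * (data - x_min_old)
    return normalized, factor

data = np.array([-383.0, 4.0, 14.0, 18.0, 76.0, 1024.0])
normalized, factor = normalize_feature(data)
# normalized.min() is -1 and normalized.max() is 1 (up to rounding)
```

Note that factor alone is still not enough to normalize new data the same way: the caller also needs x_min_old and x_min_new, which echoes nwchrisb's original point about new data.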