Gavin (41)
#1
Perhaps mention that you should never, ever expose your TEST data to any training exercises.


This evaluation is conducted in a formal way, splitting the data available into an 80% training set and a 20% testing set. Another important objective is to determine if there are any important business issues that have not been sufficiently considered.
Alessandro Negro (10)
#2
I don't get what you mean. Can you give me a concrete example?
Gavin (41)
#3
Hi again,

For all my comments, please read them with an initial:
"I am happy to be wrong..."


In your statement here:

This evaluation is conducted in a formal way, splitting the data available into an 80% training set and a 20% testing set. Another important objective is to determine if there are any important business issues that have not been sufficiently considered.

I would add a sentence saying that the data you use for training and the data you use for testing should never, ever be the same data.
Don't pollute one data-set with items from the other.
Not just for "this" iteration of your solution - but always.
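
To make that concrete, here is a minimal sketch of an 80/20 split in plain Python. It is only an illustration under stated assumptions, not the book's method: the names split_train_test, records, test_fraction and seed are hypothetical, and the list of integers stands in for real data items.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data once, then cut it into disjoint train/test lists.

    Hypothetical helper for illustration; the fixed seed is an assumption
    made so that the same items land in the test set on every run.
    """
    shuffled = list(records)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]               # first slice is held out for testing
    train = shuffled[n_test:]              # the remainder is used for training
    return train, test


records = list(range(100))  # stand-in for 100 real data items
train, test = split_train_test(records)

# No item may appear in both sets.
assert set(train).isdisjoint(test)
print(len(train), len(test))  # 80 20
```

Keeping the seed fixed means the same items end up in the test set on every run, provided the input list itself does not change. That is the "always" part: once an item is in the test set, it stays out of every later training run.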
Alessandro Negro (10)
#4
Is this not clear in the word "split"?
Gavin (41)
#5
Hi Alessandro,

It is clear, but I know the topic already, as do you.

But not every reader will have the appropriate base knowledge.
Only because this is the intro, "I" would be a little more specific about the requirement that the individual chunks of data in your 80/20 split are NEVER used outside of the original tasks you assign them.


Gavin.