This could also be a teaching moment about bad chart design.

The legend at the bottom says: "fraction of defaulted loans" but the fractions don't add up as you run your eye up the chart.

The top bar has a pointer: "The dark area represents the fraction of loans *in this category* that defaulted."

I assume "in this category" means the credit history at the far left.

That contradicts the legend at the bottom, or at least makes it very confusing. There are three dimensions: the fraction of defaulted loans, the fraction of loans in a category, and the category itself. And then Figure 1.2 agrees with the top bar pointer.

One suggestion would be to represent the percentage that each category contributed to all loan failures.
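
To make that suggestion concrete, here is a minimal Python sketch of the calculation. The counts are invented for illustration and are not the book's data; the category names and the function name `share_of_all_defaults` are hypothetical.

```python
# Hypothetical (defaulted, total) loan counts per credit-history category.
# These numbers are invented for illustration, not taken from the book.
counts = {
    "category A": (30, 50),
    "category B": (50, 300),
    "category C": (20, 150),
}

def share_of_all_defaults(counts):
    """Return each category's share of all defaulted loans."""
    total_defaults = sum(defaulted for defaulted, _ in counts.values())
    return {name: defaulted / total_defaults
            for name, (defaulted, _) in counts.items()}

shares = share_of_all_defaults(counts)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of all defaults")
```

Plotted this way, the bars would add up to 100% as you run your eye up the chart, because every bar is measured against the same total.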

Re: 1.2.2 Data Collection and Management - Figure 1.2
That graph abstracts out the relative proportions of loans in each category (i.e., each category is normalized to one). So each bar represents all of the loans in that category, and the dark blue region represents the fraction of loans in that category that defaulted.
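
As a rough sketch of that normalization in Python (the counts are invented for illustration, not the actual loan data, and `default_rate_by_category` is a hypothetical helper name):

```python
# Hypothetical (defaulted, total) loan counts per category; invented numbers.
counts = {
    "category A": (30, 50),
    "category B": (50, 300),
    "category C": (20, 150),
}

def default_rate_by_category(counts):
    """Normalize each category to one: defaulted loans / all loans in that category."""
    return {name: defaulted / total
            for name, (defaulted, total) in counts.items()}

rates = default_rate_by_category(counts)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of its loans defaulted")
```

Because each bar is divided by its own category total, the dark fractions are not expected to sum to one across bars, and the chart says nothing about how many loans each category holds.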

So would it help if the x-axis label said "fraction of defaulted loans in each category"? Or some other, better, label? Suggestions welcome.

The point of the graph is supposed to be that the rate of default is higher among certain groups of borrowers that you would think would be safe (those that have paid off all previous loans), and that's surprising. So you have to think about why that might be, and if it could give misleading results, before barrelling ahead with the modeling.

Re: 1.2.2 Data Collection and Management - Figure 1.2
I'm not sure re-labeling addresses the problem.

The supplemental annotation to the top bar says:

"The dark area represents the fraction of loans in this category that defaulted."

So more than 50% of the loans with no credits/all paid back defaulted? That is what the note says.

It is mixing the total loans made (not represented by this chart), or the percentage of those loans, with the fraction of defaulted loans.

Yes? It mixes two different scales: one is represented (% of defaulted loans), the other is not (% of total loans).

Granted, it is a graphic that raises questions, but not because of the data.

Re: 1.2.2 Data Collection and Management - Figure 1.2
I think once we have all the text and labels in sync the graph will be much clearer.