Mini rant:

Not to call out anyone in particular, but I've noticed an extremely misleading pattern in these "getting started with ML" type guides. 😬

Learning how to do data visualization isn't a "fun extra" if you're getting into ML: it's absolutely essential.

Like I'm not too bothered about what sort of tool you're using but at a *minimum* you should be able to make:

1. Frequency plots (finding outliers/anomalies, corr. matrices, etc.)
2. Time series visualizations (test/train error)
3. Plot errors (AUC is a *standard measure*)
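To make that list concrete, here's a rough sketch of all three in Python. Assumptions on my part, not a recommendation: matplotlib as the plotting tool, an ROC curve as the "plot errors" example (since AUC is the area under it), and made-up helper names (`roc_points`, `auc`, `plot_basics`). The ROC helper also ignores tied scores to keep things short.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points,
    sweeping the decision threshold from the highest score down.
    Assumes binary 0/1 labels and no tied scores (a simplification)."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def plot_basics(values, train_err, test_err, labels, scores):
    """Draw the three plots from the list above, side by side."""
    import matplotlib.pyplot as plt  # assumption: matplotlib is installed

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3.5))

    # 1. Frequency plot: outliers show up as isolated bars far from the bulk.
    ax1.hist(values, bins=20)
    ax1.set_title("Frequency plot")

    # 2. Train/test error over time: a widening gap hints at overfitting.
    ax2.plot(train_err, label="train")
    ax2.plot(test_err, label="test")
    ax2.set_xlabel("epoch")
    ax2.set_ylabel("error")
    ax2.set_title("Error over time")
    ax2.legend()

    # 3. ROC curve, with the AUC in the legend.
    pts = roc_points(labels, scores)
    fpr, tpr = zip(*pts)
    ax3.plot(fpr, tpr, label=f"AUC = {auc(pts):.2f}")
    ax3.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax3.set_xlabel("false positive rate")
    ax3.set_ylabel("true positive rate")
    ax3.set_title("ROC curve")
    ax3.legend()

    fig.tight_layout()
    plt.show()
```

Call `plot_basics(...)` with your own feature values, per-epoch error lists, and labels/scores to see all three panels.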

And arguably even more important: you should be able to read all these plots. This isn't ~secret advanced stuff~ it's what I'd expect a junior ML engineer to know in a job interview.

Look through literally any system-based ML paper. They've all got figures! Because they're necessary! And many, many, many more were likely created during the research and development process.

(To be fair, some research papers will only have system diagrams and put all the system performance info in tables, but outside of pure proof-based papers I genuinely can't remember the last time I read a paper with no figures at all.)

WHY are folks telling people they can learn to do ML without learning how to use one of the most essential tools for looking at data quality and model performance? 😩

Sure, there are workarounds (especially if needed for visual accessibility) but plotting is a standard tool.

Anyway, if you're looking for a good, very intro-level guide, I like:

The R for Data Science chapter for R: r4ds.had.co.nz/data-visualisat
The @swcarpentry@twitter.com intro to Python: swcarpentry.github.io/python-n

(Notice that plotting is basically the first thing taught in these courses! Not a coincidence!)
