@dashdsrdash @mhoye slightly aside from your main point here, I quite like that as a definition of "big data": if it's big data, it doesn't fit on a single computer and you need to do something distributed to get it to fit. It has a nice side benefit too: it accurately tracks how the definition of "big" changes every year as new components with larger storage capacity become available. By the same logic, it's "medium data" if it fits on the SSDs of a single computer, "small data" if it fits in RAM (yes, this can be multiple TBs ofc) and "tiny data" if it fits in RAM on a cheap computer.
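
Rough sketch of those tiers in Python, in case it helps make the definition concrete. The `Machine` type, the `classify` function and every capacity number below are my own placeholders, not anything official; swap in whatever "cheap computer" and "biggest single computer" mean this year.

```python
from dataclasses import dataclass

GB = 10**9
TB = 10**12


@dataclass(frozen=True)
class Machine:
    """Capacity of one computer. All figures used below are illustrative assumptions."""
    ram_bytes: int
    ssd_bytes: int


def classify(dataset_bytes: int, cheap: Machine, largest: Machine) -> str:
    """Return the tier name for a dataset, checking the smallest tier first."""
    if dataset_bytes <= cheap.ram_bytes:
        return "tiny"    # fits in RAM on a cheap computer
    if dataset_bytes <= largest.ram_bytes:
        return "small"   # fits in RAM on a single (large) computer
    if dataset_bytes <= largest.ssd_bytes:
        return "medium"  # fits on the SSDs of a single computer
    return "big"         # does not fit on one machine, so it has to be distributed


# Hypothetical machines: adjust as hardware improves.
cheap_laptop = Machine(ram_bytes=16 * GB, ssd_bytes=1 * TB)
large_server = Machine(ram_bytes=4 * TB, ssd_bytes=100 * TB)

for size in (8 * GB, 1 * TB, 50 * TB, 1000 * TB):
    print(size, classify(size, cheap_laptop, large_server))
```

The thresholds are passed in rather than hard-coded, which is the whole point of the definition: the same dataset drops a tier when the hardware catches up.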