The best process for effective group decisions

First post in a series on group decision-making.

more ...

The many and varied audiences for Big Data training

There’s a wide range of Big Data training on offer, from full in-person university degree programs (in the North American context, typically Master’s degrees), to certificates from online course platforms (such as Coursera or edX), to short professional development courses from for-profit trainers. A common element of the sales pitches is the presumed universality of the need. Every discipline and nearly every profession, it is claimed, will benefit from the technologies of Big Data. Some pitches make an even stronger claim, that incorporating data analysis is now essential for success.

more ...

Random access to long string bodies incurs cache misses

As I described in the last post, shuffling the items in the source vector leaves the string handles in contiguous sequential order while introducing disorder into the sequence of string bodies. The NShuffled “data set” is really a family of 11 related data sets, differing in the fraction of their items that were shuffled. All sets start with the same vector src of 50,000 elements, then std::shuffle() is run on the first N items in the vector, where N ranges from 0 (no items shuffled) to 50,000 (all items shuffled) in increments of 5,000. All the data sets will have 37,500 (75% of 50,000) of their source elements copied to the result vector.
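The construction of the NShuffled family can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the helper names `make_source`, `shuffle_prefix`, and `shuffle_counts`, the string contents, and the choice of `std::mt19937` as the random engine are all assumptions made for the example. The filtering step that copies 75% of the elements is omitted, since its predicate is not described here.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: build `count` strings of `length` characters each.
// The strings are created in order, so their bodies are requested from the
// allocator in the same sequence as their handles sit in the vector.
std::vector<std::string> make_source(std::size_t count, std::size_t length) {
    std::vector<std::string> src;
    src.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        src.emplace_back(length, static_cast<char>('a' + i % 26));
    }
    return src;
}

// Hypothetical helper: shuffle only the first n items of src, leaving the
// tail untouched. The handles stay contiguous in the vector; only which
// body each handle refers to is permuted.
void shuffle_prefix(std::vector<std::string>& src, std::size_t n,
                    std::mt19937& rng) {
    n = std::min(n, src.size());
    std::shuffle(src.begin(),
                 src.begin() + static_cast<std::ptrdiff_t>(n), rng);
}

// Hypothetical helper: the values of N for the family, from 0 to total
// inclusive. Assumes step > 0.
std::vector<std::size_t> shuffle_counts(std::size_t total, std::size_t step) {
    std::vector<std::size_t> counts;
    for (std::size_t n = 0; n <= total; n += step) {
        counts.push_back(n);
    }
    return counts;
}
```

Calling `shuffle_counts(50000, 5000)` yields the 11 values of N, and each data set would be produced by applying `shuffle_prefix` with one of those values to a fresh copy of the same source vector.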

more ...

For long strings, access order determines performance

In the last post, I described several possible causes of the slowdown for long strings in the by-value version of the idioms. I concluded the most likely actual cause was the order in which the string bodies are accessed.

more ...

Sidling up to the long string slowdown

In the last post, I highlighted an anomaly in the results for filtering long strings using the by-value implementation of the loop. The usual suspect for such outcomes, heap fragmentation, didn’t seem to apply in this case. What might be the actual source of the problem?

more ...