R Programming

Efficiently De-Duplicate a Dataset

Jeff Jones

The duplicated function in R can handle an entire data.frame, or several of its columns, in a single pass, which makes it a very useful tool for eliminating redundancy within datasets. The most common use is to de-duplicate the entire data.frame; however, there are other very useful variations, such as de-duplicating on only a subset of columns.
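
As a minimal sketch of these variations, the example below uses a small made-up data.frame (the column names id, name, and value are assumptions for illustration, not taken from any real dataset). It shows de-duplicating on all columns, on a subset of columns, and keeping the last occurrence rather than the first via the fromLast argument of duplicated.

```r
# Hypothetical example data: rows 1 and 2 are identical,
# and rows 1-3 share the same id/name pair.
df <- data.frame(
  id    = c(1, 1, 1, 2, 3),
  name  = c("a", "a", "a", "b", "c"),
  value = c(10, 10, 20, 30, 40)
)

# De-duplicate on the entire data.frame:
# duplicated() flags every repeat of an earlier row, so negate it to keep firsts.
dedup_all <- df[!duplicated(df), ]

# De-duplicate on a subset of columns:
# keep the first row seen for each id/name pair.
dedup_cols <- df[!duplicated(df[, c("id", "name")]), ]

# Same subset, but keep the last row seen for each id/name pair.
dedup_last <- df[!duplicated(df[, c("id", "name")], fromLast = TRUE), ]
```

Because duplicated returns a logical vector with one element per row, the same vector can also be used to inspect the redundant rows (for example, df[duplicated(df), ]) before dropping them.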

Compare this to methods described elsewhere: Stack Overflow (Jun 2014) and Stack Overflow (Jul 2015).