I have a new tool on getButterfly: a data de-duplicator for emails, URLs, IDs, names and more. It’s written in JavaScript, with no external (or server-side) dependencies.
Full-featured de-duplicator
Here’s how I did it. The HTML is simple. It has two main elements, a <textarea> for the initial data and a <textarea> for the de-duplicated results, plus three option checkboxes and a button.
Test the de-duplicator live here.
HTML
<p>
<textarea name="masterlist" id="masterlist" rows="16" style="width:100%"></textarea>
</p>
<p>
<label><input type="checkbox" name="caps" id="caps" value="" checked>Ignore capitals (results are generated in lower case)</label>
<br>
<label><input type="checkbox" name="kpblanks" id="kpblanks" value="">Keep blanks at line starts</label>
<br>
<label><input type="checkbox" name="sort" id="sort" value="">Sort results</label>
</p>
<input type="submit" class="button" alt="Submit" title="Submit" value="De-duplicate" onclick="deduplicate()">
<a name="startresults"></a>
<p name="removed" id="removed"></p>
<textarea name="output" id="output" rows="16" style="display:none;width:100%" onclick="this.focus();this.select()"></textarea>
The JavaScript consists of a single function. Based on your settings (ignore capitals, keep blanks at line starts, sort results), it splits the data into an array of lines, cleans up each line, drops duplicates and empty lines, and optionally sorts what is left:
JavaScript
function deduplicate() {
  var txt = document.getElementById("masterlist").value;
  var masterarray = txt.split("\n");
  var itemsInArray = masterarray.length;
  var keepBlanks = document.getElementById("kpblanks").checked;
  var ignoreCaps = document.getElementById("caps").checked;
  // Null-prototype object used as a lookup table, so lines such as
  // "length" or "42" are safe keys and cannot clash with built-ins.
  var seen = Object.create(null);
  var uniques = [];
  for (var i = 0; i < itemsInArray; i++) {
    // Strip trailing whitespace, then turn any remaining tabs into spaces.
    var line = masterarray[i].replace(/\s+$/, "").replace(/\t/g, " ");
    if (!keepBlanks) {
      // Strip leading whitespace unless the user asked to keep it.
      line = line.replace(/^\s+/, "");
    }
    if (ignoreCaps) {
      line = line.toLowerCase();
    }
    // Skip empty lines and lines already seen; first occurrences keep their order.
    if (line !== "" && !seen[line]) {
      seen[line] = true;
      uniques.push(line);
    }
  }
  if (document.getElementById("sort").checked) {
    // Case-insensitive sort.
    uniques.sort(function (x, y) {
      var a = x.toUpperCase();
      var b = y.toUpperCase();
      if (a > b) return 1;
      if (a < b) return -1;
      return 0;
    });
  }
  var ulen = uniques.length;
  var thelist = uniques.join("\n");
  // Empty lines count as removed, as well as duplicates.
  var rmvd = itemsInArray - ulen;
  document.getElementById("removed").innerHTML =
    itemsInArray +
    " original lines, " +
    rmvd +
    " removed, " +
    ulen +
    " remaining.";
  document.getElementById("output").value = thelist;
  document.getElementById("output").style.display = "block";
  // Jump to the results anchor.
  window.location = "#startresults";
}
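The cleanup and de-duplication steps can also be pulled out of the DOM code into a function that takes text and returns an array, which makes them easy to try in a browser console or in Node. Below is a minimal sketch of that idea, not the code the tool runs: dedupeLines and its option names (ignoreCase, keepLeadingBlanks, sort) are hypothetical, and it uses a Set where the function above uses keyed lookups.

```javascript
// Hypothetical helper, not part of the original tool.
// Returns the unique, non-empty lines of `text` in first-seen order.
function dedupeLines(text, options) {
  const opts = options || {};
  const seen = new Set();
  const uniques = [];
  text.split("\n").forEach(function (rawLine) {
    // Strip trailing whitespace, then turn remaining tabs into spaces.
    let line = rawLine.replace(/\s+$/, "").replace(/\t/g, " ");
    if (!opts.keepLeadingBlanks) {
      line = line.replace(/^\s+/, "");
    }
    if (opts.ignoreCase) {
      line = line.toLowerCase();
    }
    // Keep only the first occurrence of each non-empty line.
    if (line !== "" && !seen.has(line)) {
      seen.add(line);
      uniques.push(line);
    }
  });
  if (opts.sort) {
    // Case-insensitive comparison, mirroring the tool's sort option.
    uniques.sort(function (x, y) {
      const a = x.toUpperCase();
      const b = y.toUpperCase();
      if (a > b) return 1;
      if (a < b) return -1;
      return 0;
    });
  }
  return uniques;
}

console.log(dedupeLines("b\nA\na\n\n b\t", { ignoreCase: true })); // [ 'b', 'a' ]
console.log(dedupeLines("b\nc\na\nb", { sort: true })); // [ 'a', 'b', 'c' ]
```

With a function like this, the click handler only has to read the textarea and checkboxes, call the helper, and write the joined result back.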
Alternate, light version
If you want to keep it small and you need to treat lowercase and uppercase as distinct, here’s a lighter, one-line version. I did this mostly for fun and code golf.
HTML
<p><textarea id="j" rows="16" style="width:100%"></textarea></p>
<p><button onclick="d();">Process</button></p>
<p><textarea id="l" rows="16" style="width:100%"></textarea></p>
JavaScript
function d(){var a=document.getElementById("j").value.split("\n"),b=Object.create(null),k;a.forEach(function(c){b[c]=1});a=[];for(k in b)a.push(k);document.getElementById("l").value=a.join("\n")}
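One thing to keep in mind with a one-liner like d(): collecting lines as keys and reading them back with for...in lists keys that look like non-negative whole numbers first, in ascending order, so a list of numeric IDs can come out reordered. If first-seen order matters, a Set is one way around that. This is only a sketch, assuming a browser with ES2015 support; uniqueLines is a hypothetical name, not part of the original post.

```javascript
// Hypothetical alternative to d(); not part of the original post.
function uniqueLines(text) {
  // A Set keeps one copy of each line, in first-seen order,
  // regardless of what the line contains.
  return Array.from(new Set(text.split("\n"))).join("\n");
}

console.log(uniqueLines("10\nb\n2\nb\n10")); // prints 10, b, 2 on separate lines
```

In the page above it would be wired up as document.getElementById("l").value = uniqueLines(document.getElementById("j").value);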