How to code your own JavaScript de-duplicator


I have a new tool on getButterfly: a data de-duplicator for emails, URLs, IDs, names and more. It’s written in JavaScript, with no external (or server-side) dependencies.

Full-featured de-duplicator

Here’s how I did it. The HTML is simple: a <textarea> for the initial data, three checkboxes for the options, a button, and a second <textarea> (hidden until the first run) for the de-duplicated results.

Test the de-duplicator live here.

HTML

<p>
    <textarea name="masterlist" id="masterlist" rows="16" style="width:100%"></textarea>
</p>

<p>
    <label><input type="checkbox" name="caps" id="caps" value="" checked>Ignore capitals (results are generated in lower case)</label>
    <br>
    <label><input type="checkbox" name="kpblanks" id="kpblanks" value="">Keep blanks at line starts</label>
    <br>
    <label><input type="checkbox" name="sort" id="sort" value="">Sort results</label>
</p>

<input type="submit" class="button" alt="Submit" title="Submit" value="De-duplicate" onclick="deduplicate()">

<a name="startresults"></a>

<p name="removed" id="removed"></p>
<textarea name="output" id="output" rows="16" style="display:none;width:100%" onclick="this.focus();this.select()"></textarea>

The JavaScript consists of a single function. It splits the input into an array of lines, cleans up each line according to your settings (ignore capitals, keep blanks at line starts), drops duplicates and empty lines, and optionally sorts the results:

JavaScript

function deduplicate() {
    // Read the options once, rather than querying the DOM for every line.
    const ignoreCase = document.getElementById("caps").checked;
    const keepBlanks = document.getElementById("kpblanks").checked;
    const sortResults = document.getElementById("sort").checked;

    // No HTML escaping is needed: the results are assigned to a textarea's
    // value, which the browser never parses as HTML.
    const lines = document.getElementById("masterlist").value.split("\n");

    // A Set remembers first-seen order and accepts any string as an entry.
    // An Array used as a lookup table breaks on a line such as "length"
    // and enumerates numeric-looking lines before everything else.
    const seen = new Set();
    lines.forEach(function (line) {
        // Trim trailing whitespace, then expand tabs to four spaces.
        line = line.replace(/\s+$/, "").replace(/\t/g, "    ");
        // Leading whitespace is kept only when the user asked for it.
        if (!keepBlanks) {
            line = line.replace(/^\s+/, "");
        }
        if (ignoreCase) {
            line = line.toLowerCase();
        }
        // Empty lines never reach the results.
        if (line !== "") {
            seen.add(line);
        }
    });

    const uniques = Array.from(seen);
    if (sortResults) {
        // Case-insensitive alphabetical order.
        uniques.sort(function (x, y) {
            const a = x.toUpperCase();
            const b = y.toUpperCase();
            if (a > b) return 1;
            if (a < b) return -1;
            return 0;
        });
    }

    // "Removed" counts dropped blank lines as well as duplicates.
    const removedCount = lines.length - uniques.length;
    document.getElementById("removed").textContent =
        lines.length +
        " original lines, " +
        removedCount +
        " removed, " +
        uniques.length +
        " remaining.";

    const output = document.getElementById("output");
    output.value = uniques.join("\n");
    output.style.display = "block";
    window.location = "#startresults";
}
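
The clean-up and de-duplication steps can also be separated from the page, which makes them easy to try outside a browser (for example in Node.js). What follows is only a minimal sketch: the helper name dedupeLines and its option names (ignoreCase, keepBlanks, sort) are my own assumptions and not part of the tool itself, and it uses a Set for the bookkeeping.

```javascript
// Hypothetical helper (not part of the original tool): the same clean-up
// and de-duplication steps, working on plain strings instead of the DOM.
function dedupeLines(text, options) {
    // Assumed defaults mirror the checkboxes: only "ignore capitals" is on.
    const opts = Object.assign(
        { ignoreCase: true, keepBlanks: false, sort: false },
        options
    );
    const lines = text.split("\n");
    const seen = new Set(); // remembers first-seen order

    for (let line of lines) {
        // Trim trailing whitespace, then expand tabs to four spaces.
        line = line.replace(/\s+$/, "").replace(/\t/g, "    ");
        if (!opts.keepBlanks) line = line.replace(/^\s+/, "");
        if (opts.ignoreCase) line = line.toLowerCase();
        if (line !== "") seen.add(line); // empty lines are dropped
    }

    const uniques = Array.from(seen);
    if (opts.sort) {
        // Case-insensitive alphabetical order.
        uniques.sort(function (x, y) {
            const a = x.toUpperCase();
            const b = y.toUpperCase();
            return a > b ? 1 : a < b ? -1 : 0;
        });
    }

    return {
        total: lines.length,
        removed: lines.length - uniques.length,
        uniques: uniques
    };
}

const result = dedupeLines(
    "Bob@example.com\nbob@example.com \n\nalice@example.com",
    { sort: true }
);
console.log(result.uniques.join("\n"));
// alice@example.com
// bob@example.com
console.log(result.total + " original lines, " + result.removed + " removed.");
// 4 original lines, 2 removed.
```

Returning the counts together with the list keeps the summary text out of the helper, so the page code only has to copy the values into the right elements.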

Alternate, light version

If you want to keep it small and case-sensitive (so that “Apple” and “apple” count as different entries), here’s a lighter, one-line version. I did this mostly for fun and code-golfing.

HTML

<p><textarea id="j" rows="16" style="width:100%"></textarea></p>
<p><button onclick="d();">Process</button></p>
<p><textarea id="l" rows="16" style="width:100%"></textarea></p>

JavaScript

// A Set drops repeats and keeps first-seen order, even for numeric lines or the word "length".
function d(){document.getElementById("l").value=[...new Set(document.getElementById("j").value.split("\n"))].join("\n")}
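
If the golfed version is hard to read, here is an expanded sketch of the same idea. The function name uniqueLines is hypothetical, and the sketch assumes a Set for the bookkeeping rather than property keys. The difference matters for two kinds of input: keys that look like integers are enumerated first, in ascending order, in current engines, and on an array the key length is special and does not show up when looping over the keys. A Set has neither quirk.

```javascript
// Hypothetical readable counterpart to d(): case-sensitive and
// order-preserving, working on a plain string.
function uniqueLines(text) {
    const seen = new Set();
    for (const line of text.split("\n")) {
        seen.add(line); // adding a value that is already present does nothing
    }
    return Array.from(seen).join("\n");
}

console.log(uniqueLines("b\n2\nB\n1\nb\n2"));
// prints b, 2, B and 1, each on its own line
```

In the page, it could be wired to the two textareas with document.getElementById("l").value = uniqueLines(document.getElementById("j").value).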
