Active Analytics and optimal data storage techniques



I have released a new version of the Active Analytics plugin for WordPress. This version focuses entirely on data storage, data retention and data optimization.

Data Retention

The plugin keeps data indefinitely unless it is set to purge old records. In line with European GDPR requirements, I have extended the maximum retention limit from 1 year to 2 years. The purge runs once a day via the native WordPress cron API (WP-Cron).
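
To illustrate the purge logic, here is a minimal Python sketch. It is not the plugin's actual code, which is PHP and is not shown in this post. The table name `pageviews`, the column `created_at`, and the function `purge_old_records` are assumptions made for the example, and SQLite stands in for MySQL so the snippet runs on its own. In the plugin, the daily WP-Cron event would be what triggers a routine like this.

```python
import sqlite3
import time

DAY = 86_400  # seconds in one day


def purge_old_records(conn, retention_days, now=None):
    """Delete rows older than the retention window; return how many were removed."""
    if retention_days is None:  # no limit configured: keep data indefinitely
        return 0
    now = int(time.time()) if now is None else now
    cutoff = now - retention_days * DAY
    cur = conn.execute("DELETE FROM pageviews WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Tiny in-memory demonstration with a fixed "now" so the result is repeatable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (url TEXT, created_at INTEGER)")
now = 1_700_000_000
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?)",
    [("/old", now - 800 * DAY), ("/recent", now - 10 * DAY)],
)
removed = purge_old_records(conn, retention_days=730, now=now)
remaining = [row[0] for row in conn.execute("SELECT url FROM pageviews")]
```

With a 2-year (730-day) limit, only the 800-day-old row is removed. Passing `None` as the limit models the default "keep indefinitely" behaviour.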

Data Storage

Until now, I was saving the data using specific MySQL column types. Having reviewed the kind of data each column actually holds, I have changed the database structure to match it more closely. The breakdown is below:

  1. User IP
    • Description: Saves the current user’s IP address, which can be either IPv4 or IPv6.
    • Data Type: VARCHAR
    • Minimum Length: 7 (for IPv4 addresses, e.g., “0.0.0.0”)
    • Maximum Length: 45 (a full-notation IPv6 address such as “2001:0db8:85a3:0000:0000:8a2e:0370:7334” is 39 characters; the 45-character limit also covers IPv4-mapped IPv6 addresses)
  2. URL
    • Description: Represents the current URL, including any query string parameters.
    • Data Type: VARCHAR
    • Minimum Length: 1 (a single character, e.g., “/”)
    • Average Length: 40-200
  3. Device Type (simplified user agent)
    • Description: Stores the device type based on the user agent string, which indicates the client device, software, and OS used to access the website.
    • Data Type: VARCHAR
    • Minimum Length: 0 (empty user agent)
    • Maximum Length: 10
  4. Referrer
    • Description: Represents the referring URL, i.e., the URL of the page that linked to the current page.
    • Data Type: VARCHAR
    • Minimum Length: 0 (empty referrer)
    • Average Length: 40-200
  5. Timestamp
    • Description: Represents the timestamp of the page view, stored as a Unix timestamp.
    • Data Type: INT
    • Minimum Value: 0 in practice (the Unix epoch, 1970-01-01 00:00:00 UTC).
    • Maximum Value: 2147483647 if the column is a signed INT (2038-01-19 03:14:07 UTC), or 4294967295 if it is UNSIGNED.
  6. Session
    • Description: Stores a unique string saved as a cookie, which expires after 30 minutes. It is used to group multiple pageviews from the same user.
    • Data Type: VARCHAR
    • Minimum Length: 0 (empty session ID)
    • Maximum Length: 64 (maximum length of the session ID)
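
The limits above can be expressed as a small validation routine. This is a hedged Python sketch, not the plugin's code: the class name `Pageview`, its field names, and the signed 32-bit bound on the timestamp are assumptions for illustration. Only the maximum IP length is enforced, because compressed IPv6 addresses such as “::1” are valid yet shorter than 7 characters.

```python
import ipaddress
from dataclasses import dataclass

INT_MAX = 2**31 - 1  # upper bound of a signed 32-bit INT (assumed column type)


@dataclass
class Pageview:
    user_ip: str
    url: str
    device_type: str
    referrer: str
    timestamp: int
    session: str

    def validate(self):
        """Return a list of problems; an empty list means the row fits the columns."""
        problems = []
        try:
            ipaddress.ip_address(self.user_ip)  # accepts IPv4 and IPv6
        except ValueError:
            problems.append("user_ip is not a valid IP address")
        if len(self.user_ip) > 45:
            problems.append("user_ip longer than 45 characters")
        if len(self.url) < 1:
            problems.append("url is empty")
        if len(self.device_type) > 10:
            problems.append("device_type longer than 10 characters")
        if not 0 <= self.timestamp <= INT_MAX:
            problems.append("timestamp outside the assumed INT range")
        if len(self.session) > 64:
            problems.append("session longer than 64 characters")
        return problems


good = Pageview("203.0.113.7", "/pricing?plan=pro", "mobile", "", 1_700_000_000, "a" * 64)
bad = Pageview("not-an-ip", "/", "smart-refrigerator", "", -5, "b" * 65)
```

Here `good.validate()` returns an empty list, while `bad.validate()` reports the invalid IP, the over-long device type, the negative timestamp, and the over-long session ID.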

Pageviews, Unique Users, and Sessions

A pageview represents a single instance of a page being loaded or reloaded by a user. Each time a user visits a page, it generates a pageview.

Unique users refer to the distinct individuals who have visited a website within a given timeframe. Unique users are determined based on IP address.

A session represents a series of interactions a user has with a website within a specific timeframe. A session typically begins when a user visits the website and ends after 30 minutes. Sessions help track user engagement and behaviour during a visit to the website.
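
The three definitions can be sketched in a few lines of Python. This is an illustration under assumptions, not how the plugin queries its table: the `Hit` tuple, its field names, and the `summarize` function are hypothetical, and skipping empty session IDs is my own choice for the example.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "user_ip url session timestamp")


def summarize(hits):
    """Count pageviews, unique users (by IP) and sessions (by cookie value)."""
    return {
        "pageviews": len(hits),
        "unique_users": len({h.user_ip for h in hits}),
        "sessions": len({h.session for h in hits if h.session}),
    }


hits = [
    Hit("203.0.113.7", "/", "s1", 1_700_000_000),
    Hit("203.0.113.7", "/pricing", "s1", 1_700_000_060),
    Hit("203.0.113.7", "/", "s2", 1_700_007_200),  # same IP, new cookie two hours later
    Hit("2001:db8::1", "/", "s3", 1_700_000_030),
]
summary = summarize(hits)
```

The four rows count as four pageviews, two unique users (two distinct IP addresses), and three sessions, because the first visitor came back after the 30-minute cookie had expired and received a new session ID.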

What’s Next

I have some potential improvements in mind, but GDPR restricts what I can add to the core plugin. I am thinking of releasing them as individual modules so that the main Active Analytics plugin stays GDPR-safe.

One such improvement is saving email addresses site-wide, provided that every email submission has explicit consent. This is tricky to generalize, though, and it should be handled on a website-by-website basis.
