
How Statlas Builds Your Data

Overview

Statlas builds data from the ground up, which keeps the different reports and models consistent with one another. Order data is aggregated up from the order details; ad data is aggregated up from the ad or creative level (in some cases we pull only campaign-level data).

Order Data

Data is rolled up into Revenue

The simplest application of order data is aggregating all orders and displaying the total as revenue. There are some caveats when comparing against the source platform:

  1. Statlas excludes zero-dollar orders. Giveaways are considered marketing-driven orders in Statlas and often throw off other metrics. We always exclude these orders for data integrity purposes. For this reason, you should not expect Statlas to match gross and discounts if there are free orders. Free orders generally overstate gross sales and overstate discounts.
  2. Channel exclusions. Stores often exclude certain Shopify channels. It's generally best practice to exclude POS, wholesale, and draft orders, and many stores also exclude other irrelevant channels.
  3. API differences. For example, Shopify has some known issues when trying to replicate its reporting using the Orders API:
     a. Duties, VAT rebates, and taxes are often unevenly applied and will show differently in Shopify and Statlas.
     b. Exchanges are often treated differently depending on how they interact with the system.
     c. Discounts are not evenly applied. Certain types of discounts are not captured. Shopify will try to make some adjustments for these, which are not always consistent with the API data we pull.
     d. Returns are often off. Some returns don't appear in the API.
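
The exclusion rules above can be sketched as a simple filter followed by a sum. This is a minimal illustration, not Statlas's actual implementation: the `Order` fields, the channel names in `EXCLUDED_CHANNELS`, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical channel names; the real exclusion list is configured per store.
EXCLUDED_CHANNELS = {"pos", "wholesale", "draft_order"}


@dataclass(frozen=True)
class Order:
    order_id: str
    channel: str
    total: Decimal  # order total in store currency


def is_reportable(order: Order) -> bool:
    """Keep an order only if it is non-zero and on an included channel."""
    return order.total > 0 and order.channel not in EXCLUDED_CHANNELS


def revenue(orders: list[Order]) -> Decimal:
    """Sum the totals of all reportable orders."""
    return sum((o.total for o in orders if is_reportable(o)), Decimal("0"))


orders = [
    Order("1001", "online_store", Decimal("80.00")),
    Order("1002", "online_store", Decimal("0.00")),  # giveaway: excluded
    Order("1003", "pos", Decimal("45.00")),          # excluded channel
    Order("1004", "online_store", Decimal("20.50")),
]
print(revenue(orders))  # 100.50
```

A source report that includes the giveaway and the POS order would show a different total, which is the kind of gap the caveats describe.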

Contribution Margin

The previous section showed how order data rolls up to revenue. Order data also feeds the contribution margin calculation: order counts and products are used to calculate COGS and Cost of Delivery, which are combined with Statlas settings and ad data to get contribution margin. Other settings may also determine how refunds and shipping interact with the calculation.
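
As a rough illustration of how these inputs might combine, the sketch below subtracts COGS, Cost of Delivery, and ad spend from net revenue. The exact formula is an assumption, as are the `MarginSettings` fields and their defaults; the real Statlas settings may differ.

```python
from dataclasses import dataclass


@dataclass
class MarginSettings:
    # Hypothetical settings, named for illustration only.
    cost_of_delivery_per_order: float = 5.0
    subtract_refunds: bool = True
    include_shipping_revenue: bool = False


def contribution_margin(revenue, shipping_revenue, refunds, cogs,
                        order_count, ad_spend, settings):
    """Assumed formula: net revenue - COGS - cost of delivery - ad spend."""
    net = revenue
    if settings.include_shipping_revenue:
        net += shipping_revenue
    if settings.subtract_refunds:
        net -= refunds
    delivery = order_count * settings.cost_of_delivery_per_order
    return net - cogs - delivery - ad_spend


margin = contribution_margin(
    revenue=10000.0, shipping_revenue=500.0, refunds=400.0,
    cogs=3000.0, order_count=200, ad_spend=2500.0,
    settings=MarginSettings(),
)
print(margin)  # 3100.0
```

Changing a setting changes the result without touching the order data, which is why two accounts with identical orders can report different margins.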

Order History is kept for each customer to calculate new vs returning and LTV

Statlas uses order data to rebuild its new versus returning databases. This sometimes causes discrepancies with other platforms for these reasons:

  1. Platforms will consider the free order / giveaway as a new customer whereas Statlas will not. This causes the customer's next order to be considered a new-customer order in Statlas, while the platform counts it as returning.
  2. Platforms may use different logic when calculating first orders. One example is that Statlas uses the process date to determine the first order whereas Shopify uses the create date. We have found that using the create date causes issues when the first order is never processed or is cancelled. Although this is generally immaterial, it does cause discrepancies.
  3. Statlas makes some adjustments for known issues. For example, TikTok Shop returns the same email address for every order, even though we know the orders come from different customers.
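
The first two points can be sketched as a small labeling routine: skip zero-dollar and unprocessed orders, sort by process date, and mark each customer's first remaining order as new. The `Order` fields and `label_orders` function are hypothetical names for this example, not Statlas's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Order:
    order_id: str
    customer_id: str
    total: float
    created_at: datetime
    processed_at: Optional[datetime]  # None if never processed or cancelled


def label_orders(orders):
    """Return {order_id: 'new' | 'returning'} for paid, processed orders."""
    eligible = [o for o in orders if o.total > 0 and o.processed_at is not None]
    eligible.sort(key=lambda o: o.processed_at)  # process date, not create date
    seen = set()
    labels = {}
    for o in eligible:
        labels[o.order_id] = "returning" if o.customer_id in seen else "new"
        seen.add(o.customer_id)
    return labels


orders = [
    Order("A", "c1", 0.0, datetime(2024, 1, 1), datetime(2024, 1, 1)),    # giveaway
    Order("B", "c1", 40.0, datetime(2024, 1, 10), datetime(2024, 1, 10)),
    Order("C", "c1", 25.0, datetime(2024, 2, 1), datetime(2024, 2, 1)),
    Order("D", "c2", 30.0, datetime(2024, 1, 5), None),                   # never processed
]
print(label_orders(orders))  # {'B': 'new', 'C': 'returning'}
```

A platform that counts the giveaway as the first order would label B as returning, which is the discrepancy described in the first point.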

Order Line Items are catalogued for product analysis

Statlas also keeps track of products at the line-item level. This data is combined with the product and variant feed to provide product-level insights. We're currently working on building this system into an even more powerful database. The product database keeps track of customer cohorts as well as products.

Other Data Sources

Statlas has other analytical databases that tap into order data and complementary marketing data sources. These include but are not limited to:

  1. Geographic data for geo-studies
  2. Hourly data for marketing operations during large sales
  3. Payment information
  4. Attribution information
  5. Google Analytics data for traffic, sessions, landing pages, and on-site engagement analysis
  6. Email/SMS platform data (e.g., Klaviyo) for flow performance, campaign sends, and lifecycle marketing analysis

Ad Data

Varies

Ad data varies considerably by platform. Although platforms share similar stats such as clicks, CPM, and spend, Statlas treats each platform as its own domain.

Ingestion Architecture

The majority of ad data flows through a central database where it is normalized, deduplicated, and rolled up into a common schema across platforms. This central layer powers most reports, dashboards, and downstream models (such as contribution margin and attribution).
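
A minimal sketch of the normalize, deduplicate, and roll-up steps follows. The platform names, source field names, and the deduplication key are assumptions chosen for illustration; they do not reflect any real ad platform's API or the actual Statlas schema.

```python
from collections import defaultdict

# Hypothetical mapping: common field name -> source field name, per platform.
FIELD_MAPS = {
    "platform_a": {"ad_id": "id", "campaign_id": "camp", "date": "day",
                   "spend": "cost", "clicks": "clicks", "impressions": "imps"},
    "platform_b": {"ad_id": "creative_id", "campaign_id": "campaign", "date": "date",
                   "spend": "spend", "clicks": "clicks", "impressions": "impressions"},
}
METRICS = ("spend", "clicks", "impressions")


def normalize(platform, row):
    """Rename one platform row's fields into the common schema."""
    out = {common: row[source] for common, source in FIELD_MAPS[platform].items()}
    out["platform"] = platform
    return out


def dedupe(rows):
    """Keep the last row seen per (platform, ad_id, date)."""
    latest = {}
    for r in rows:
        latest[(r["platform"], r["ad_id"], r["date"])] = r
    return list(latest.values())


def rollup_by_campaign(rows):
    """Aggregate ad-level rows up to (platform, campaign_id)."""
    totals = defaultdict(lambda: {m: 0 for m in METRICS})
    for r in rows:
        for m in METRICS:
            totals[(r["platform"], r["campaign_id"])][m] += r[m]
    return dict(totals)


raw_a = [
    {"id": "a1", "camp": "c1", "day": "2024-01-01", "cost": 10.0, "clicks": 5, "imps": 100},
    {"id": "a1", "camp": "c1", "day": "2024-01-01", "cost": 12.5, "clicks": 6, "imps": 120},  # re-pull
    {"id": "a2", "camp": "c1", "day": "2024-01-01", "cost": 7.5, "clicks": 2, "imps": 50},
]
raw_b = [{"creative_id": "b1", "campaign": "c9", "date": "2024-01-01",
          "spend": 4.0, "clicks": 1, "impressions": 30}]

rows = [normalize("platform_a", r) for r in raw_a] + [normalize("platform_b", r) for r in raw_b]
campaigns = rollup_by_campaign(dedupe(rows))
print(campaigns[("platform_a", "c1")])  # {'spend': 20.0, 'clicks': 8, 'impressions': 170}
```

Keeping the per-platform mapping separate from the roll-up logic is one way to treat each platform as its own domain while still reporting on a common schema.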

In parallel, Statlas also issues direct analytic queries against certain platforms for use cases where freshness, granularity, or query flexibility outweigh the benefits of going through the central database. Examples include:

  1. Near-real-time pacing during large sales windows
  2. Creative-level deep dives that require pulling fields not yet materialized in the central database
  3. Ad-hoc diagnostics where the central database is being rebuilt or backfilled

Both paths ultimately surface into the same reporting layer, but the routing decision is made per use case to balance reliability with speed.

AI Usage

Overview

Statlas treats AI access to client and account data as a tightly controlled surface. All AI-accessible data resides behind the Custom Statlas MCP (Model Context Protocol) server. Claude is the only AI assistant authorized to query this data, and access is gated by Single Sign-On (SSO) authentication. No AI system may access Statlas data through any other channel.

Access Controls

  1. Single point of access. AI interactions with Statlas data occur exclusively through the Custom Statlas MCP. Direct database access by AI systems, exported datasets fed into third-party AI tools, and unsanctioned AI integrations are not permitted.
  2. SSO authentication. Every MCP session requires a valid SSO-authenticated user. User permissions established within the Statlas application are inherited and enforced at the MCP layer.
  3. Approved AI assistant. Claude is the only AI model approved to connect to the Statlas MCP. Other AI assistants and models are not authorized to query the MCP endpoint.
  4. Scoped data access. Users can only query data for clients, accounts, and brands they are already authorized to view in Statlas. The MCP does not expand a user's permissions beyond what they have in the underlying platform.
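
The four controls above amount to a set of checks that run before any tool call returns data. The sketch below shows one way such a gate could look in plain Python. It does not use any MCP SDK, and the `Session` fields, the `APPROVED_CLIENTS` identifier, and the way the client is identified are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical identifier for the single approved AI assistant.
APPROVED_CLIENTS = {"claude"}


@dataclass(frozen=True)
class Session:
    user_id: str
    sso_verified: bool
    client_name: str
    allowed_brands: frozenset  # inherited from the user's Statlas permissions


def authorize(session: Session, brand_id: str) -> None:
    """Raise PermissionError unless every access control passes."""
    if not session.sso_verified:
        raise PermissionError("SSO authentication required")
    if session.client_name not in APPROVED_CLIENTS:
        raise PermissionError("AI client is not approved")
    if brand_id not in session.allowed_brands:
        raise PermissionError("brand is outside the user's permissions")


session = Session("u42", True, "claude", frozenset({"brand_1"}))
authorize(session, "brand_1")  # passes silently

try:
    authorize(session, "brand_2")
except PermissionError as exc:
    print(exc)  # brand is outside the user's permissions
```

Because `allowed_brands` is copied from existing permissions rather than defined at the MCP layer, the gate can only narrow access, never widen it.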

Data Handling

  1. No model training on client data. Data retrieved through the Statlas MCP is not used to train, fine-tune, or otherwise improve any AI model. This is enforced by Anthropic's enterprise terms with Statlas.
  2. Query-time retrieval only. AI responses are generated from data fetched through MCP tool calls at query time. Data is not pre-loaded into the AI system and is not persisted beyond the active conversation context.
  3. PII treatment. Personally identifiable information (customer names, emails, addresses) is subject to the same access controls as the rest of the Statlas platform. Aggregate analysis is preferred wherever the use case allows.
  4. Data residency. All data continues to reside in Statlas's controlled infrastructure. The MCP returns query results to the authenticated user's AI session; it does not export data to external storage.

Audit and Compliance

  1. Logging. Every MCP tool call is logged with the authenticated user identity, timestamp, tool name, query parameters, and response metadata. Logs are retained for compliance review and incident investigation.
  2. Monitoring. Unusual access patterns (volume spikes, off-hours activity, queries outside a user's typical scope) are flagged for review.
  3. Revocation. Access can be revoked at the user, role, or organization level at any time through the SSO provider. Revocation takes effect immediately for new MCP sessions.
  4. Incident response. Suspected misuse or compromise of MCP credentials should be reported through Statlas's standard security incident channel. Affected sessions can be terminated and credentials rotated.
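
To make the logging and monitoring items concrete, the sketch below builds one audit record with the fields listed above and applies two simple flags. The record layout, the business-hours window, and the spike threshold are assumptions chosen for illustration, not Statlas's actual rules.

```python
import json
from datetime import datetime, timezone


def audit_record(user_id, tool_name, params, response_meta, when):
    """Build one log entry with the fields named in the logging item."""
    return {
        "user_id": user_id,
        "timestamp": when.isoformat(),
        "tool_name": tool_name,
        "query_parameters": params,
        "response_metadata": response_meta,
    }


def review_flags(when, calls_last_hour, typical_calls_per_hour,
                 business_hours=(8, 20), spike_factor=3):
    """Return reasons this call should be flagged (hypothetical thresholds)."""
    reasons = []
    if not business_hours[0] <= when.hour < business_hours[1]:
        reasons.append("off_hours")
    if calls_last_hour > spike_factor * typical_calls_per_hour:
        reasons.append("volume_spike")
    return reasons


when = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
record = audit_record("u42", "get_revenue", {"brand_id": "brand_1"},
                      {"row_count": 31}, when)
print(json.dumps(record, sort_keys=True))
print(review_flags(when, calls_last_hour=50, typical_calls_per_hour=10))
# ['off_hours', 'volume_spike']
```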

Permitted Use

AI access through the Statlas MCP is intended to support analysis, reporting, and operational workflows for authorized Statlas users. AI-generated outputs are advisory in nature; users remain responsible for verifying conclusions and for any business decisions made based on AI-assisted analysis. Users may not use AI access to circumvent role-based permissions, exfiltrate data outside approved workflows, or generate outputs that violate Statlas's contractual obligations to clients.