Donor Data Pipeline — Multi-Source Integration Simulator

Before any donor data can be loaded into a CRM — Salesforce, Microsoft Dynamics, Blackbaud, or otherwise — it has to be cleaned. Every nonprofit technology implementation runs this gauntlet: data arrives from three or four sources, each with different field names, different formatting conventions, and duplicate records for the same person across systems. Someone has to normalize it, resolve the duplicates, document every decision, and produce a validated output file the import tool will actually accept.

This simulator models that workflow end to end. The pipeline isn't decorative: the JavaScript actually runs each step against the source data, detecting that "Massachusetts" should be "MA," recognizing that James Patel appears in all three source systems under the same email address and needs to be merged into one master record, and flagging Betty Taylor's phone number as invalid rather than silently passing it through. The output CSV it produces is real and downloads correctly, and the audit log timestamps every transformation.
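The cleaning steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the simulator's source: the function names (normalizeState, validatePhone, cleanRecord), the three-entry state table, the 10-digit US phone rule, and the audit entry fields are all assumptions made for the example. What it shows is the principle the simulator follows: an invalid value is flagged and logged, never silently corrected or passed through.

```javascript
// Hypothetical lookup table; a real one would cover every state, DC, and territories.
const STATE_CODES = {
  massachusetts: "MA",
  "new york": "NY",
  california: "CA",
};
const VALID_CODES = new Set(Object.values(STATE_CODES));

// Map a free-text state value to its two-letter code, or report it as unrecognized.
function normalizeState(raw) {
  const text = String(raw ?? "").trim();
  const upper = text.toUpperCase();
  if (VALID_CODES.has(upper)) return { value: upper, ok: true };
  const code = STATE_CODES[text.toLowerCase()];
  return code ? { value: code, ok: true } : { value: text, ok: false };
}

// Assumed rule: accept 10-digit US numbers (optionally prefixed with 1); flag anything else.
function validatePhone(raw) {
  let digits = String(raw ?? "").replace(/\D/g, "");
  if (digits.length === 11 && digits.startsWith("1")) digits = digits.slice(1);
  if (digits.length !== 10) {
    return { value: raw, ok: false, reason: `expected 10 digits, found ${digits.length}` };
  }
  return { value: `(${digits.slice(0, 3)}) ${digits.slice(3, 6)}-${digits.slice(6)}`, ok: true };
}

// Clean one record, logging every change or flag instead of correcting silently.
function cleanRecord(record, auditLog) {
  const out = { ...record };
  const log = (field, action, detail) =>
    auditLog.push({ at: new Date().toISOString(), id: record.id, field, action, detail });

  const state = normalizeState(record.state);
  if (!state.ok) {
    log("state", "flagged", `unrecognized state "${record.state}"`);
  } else if (state.value !== record.state) {
    out.state = state.value;
    log("state", "normalized", `"${record.state}" -> "${state.value}"`);
  }

  const phone = validatePhone(record.phone);
  if (!phone.ok) {
    out.phoneValid = false; // keep the raw value, but mark it for human review
    log("phone", "flagged", phone.reason);
  } else {
    out.phone = phone.value;
    out.phoneValid = true;
  }
  return out;
}

// Illustrative input only; these values are made up for the example.
const audit = [];
const cleaned = cleanRecord({ id: "crm-001", state: "Massachusetts", phone: "555-012" }, audit);
console.log(cleaned.state, cleaned.phoneValid); // MA false
console.log(audit.map((e) => `${e.field}: ${e.action}`)); // [ 'state: normalized', 'phone: flagged' ]
```

Returning a result object with an ok flag, rather than throwing or returning null, keeps the raw value available so the audit log can show exactly what was rejected.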

Design decisions worth explaining

Why three specific source systems

The combination of CRM export, email platform, and event sign-in sheet is intentional — not arbitrary. These are the three most common sources a nonprofit data team encounters when a new client brings their data for a CRM implementation or migration. Each one has a different schema, a different level of data discipline, and a different set of quality problems. The CRM export has NPSP-specific field names (npo02__TotalOppAmount__c) that reflect real Salesforce NPSP vocabulary. The email platform export uses abbreviated field names (fname, state_abbr). The event sign-in sheet uses plain English headers with spaces — the kind of thing a volunteer typed into Google Sheets. Handling all three in one pipeline is the actual job.
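Mapping each source onto one canonical schema is the first concrete step. The sketch below assumes a hypothetical canonical record (firstName, lastName, email, state, phone, totalGiving, engagementScore, interestLevel) and per-source column maps. Only npo02__TotalOppAmount__c, fname, and state_abbr come from the description above; FirstName, LastName, Email, MailingState, and Phone are standard Salesforce Contact fields, and every other column name is invented for illustration. Columns with no mapping are reported rather than dropped, so the audit log can account for them.

```javascript
// Hypothetical canonical schema: every source is mapped onto these field names.
const CANONICAL_FIELDS = [
  "firstName", "lastName", "email", "state", "phone",
  "totalGiving", "engagementScore", "interestLevel",
];

// Per-source maps from source column to canonical field (partly assumed, see lead-in).
const FIELD_MAPS = {
  crm: {
    FirstName: "firstName",
    LastName: "lastName",
    Email: "email",
    MailingState: "state",
    Phone: "phone",
    npo02__TotalOppAmount__c: "totalGiving",
  },
  email: {
    fname: "firstName",
    lname: "lastName",
    email_address: "email",
    state_abbr: "state",
    engagement_score: "engagementScore",
  },
  event: {
    "First Name": "firstName",
    "Last Name": "lastName",
    "Email": "email",
    "State": "state",
    "Phone Number": "phone",
    "Interest Level": "interestLevel",
  },
};

// Map one source row to the canonical schema and report unmapped columns.
function mapRow(source, row) {
  const map = FIELD_MAPS[source];
  if (!map) throw new Error(`no field map for source "${source}"`);
  // Start with every canonical field present, so gaps are explicit nulls.
  const record = Object.fromEntries(CANONICAL_FIELDS.map((f) => [f, null]));
  const unmapped = [];
  for (const [column, value] of Object.entries(row)) {
    const field = Object.prototype.hasOwnProperty.call(map, column) ? map[column] : undefined;
    if (field) record[field] = typeof value === "string" ? value.trim() : value;
    else unmapped.push(column);
  }
  record.source = source;
  return { record, unmapped };
}

// "utm_campaign" is a made-up extra column to show unmapped reporting.
const { record, unmapped } = mapRow("email", { fname: " Pat ", state_abbr: "MA", utm_campaign: "spring" });
console.log(record.firstName, record.state, unmapped); // Pat MA [ 'utm_campaign' ]
```

Keeping the maps as data rather than code means a fourth source would be one more entry in FIELD_MAPS, not a new branch in the pipeline.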

Why the audit log matters

The animated pipeline is the demo. The audit log is the signal. In production data work, every transformation has to be documented — not because it looks good, but because when a client asks "why does this record show MA instead of Massachusetts?" or "why did we end up with 39 records instead of 47?" there has to be an answer. The audit log here captures what changed, why, and which source record was treated as the master in each merge. That accountability layer is what distinguishes a data operations professional from someone who just ran a script and hoped for the best.
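A minimal sketch of an audit log that can answer the "why did the record count change?" question, assuming merges are the only way records leave the pipeline (any other exclusion would need its own logged action). The class name, action names, and entry fields are hypothetical, and the record IDs and counts in the usage are made up.

```javascript
// Minimal audit log sketch. Action names and entry fields are hypothetical.
class AuditLog {
  constructor() {
    this.entries = [];
  }

  // Timestamp every entry so each transformation can be traced later.
  record(action, details) {
    const entry = { at: new Date().toISOString(), action, ...details };
    this.entries.push(entry);
    return entry;
  }

  // Explain a change in record count. Assumes merges are the only way records
  // leave the pipeline; other exclusions would need their own logged action.
  reconcile(inputCount, outputCount) {
    const absorbed = this.entries
      .filter((e) => e.action === "merge")
      .reduce((sum, e) => sum + e.absorbedIds.length, 0);
    return { inputCount, absorbed, outputCount, balanced: inputCount - absorbed === outputCount };
  }
}

// Made-up entries for illustration.
const audit = new AuditLog();
audit.record("normalize", {
  id: "crm-001",
  field: "state",
  from: "Massachusetts",
  to: "MA",
  reason: "converted state name to two-letter code",
});
audit.record("merge", {
  masterId: "crm-002",
  absorbedIds: ["email-014", "event-007"],
  reason: "same email address; CRM record kept as master for giving history",
});
console.log(audit.reconcile(5, 3).balanced); // true
```

If reconcile reports balanced: false, some record left the pipeline without a logged reason, which is exactly the gap a client question would expose.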

The merge logic is a deliberate choice

When James Patel appears in all three sources, the pipeline doesn't just pick one record and discard the rest — it retains the CRM record as the master (because it has giving history, which is the most valuable field) and merges the engagement score from the email platform and the interest level from the event sign-in into that master record. That's best-value field selection: the output record is richer than any single source. The Patricia Harris case surfaces a name variant — "Pat" in the email platform vs. "Patricia" in the CRM — and the pipeline resolves it in favor of the formal name. These aren't accidents; they're documented decisions.
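The merge policy described above (CRM record as master, gaps filled from the other sources, the formal name kept over a variant) can be sketched as follows. Two simplifications are assumptions, not the simulator's documented behavior: duplicates are matched on exact, case-insensitive email only, and a first name that is a prefix of a longer one ("Pat" / "Patricia") is treated as a nickname. A production pipeline would more likely use a nickname table and fuzzy matching. Function names and the sample records are illustrative, not the simulator's data.

```javascript
// Source priority for picking the master record: CRM first, because it holds giving history.
const SOURCE_PRIORITY = ["crm", "email", "event"];

// Simplifying assumption: a shorter first name that prefixes a longer one
// ("Pat" vs "Patricia") is a nickname, so the longer, formal form wins.
function preferFormalName(a, b) {
  if (!a) return b;
  if (!b) return a;
  if (b.length > a.length && b.toLowerCase().startsWith(a.toLowerCase())) return b;
  return a;
}

// Group canonical records by normalized email and merge each group into one master record.
function mergeDuplicates(records, auditLog) {
  const groups = new Map();
  records.forEach((r, i) => {
    const email = (r.email ?? "").trim().toLowerCase();
    const key = email || `no-email:${i}`; // records without an email are never merged
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(r);
  });

  const merged = [];
  for (const group of groups.values()) {
    if (group.length === 1) {
      merged.push({ ...group[0] });
      continue;
    }
    // The highest-priority source becomes the master; the rest only fill its gaps.
    const sorted = [...group].sort(
      (a, b) => SOURCE_PRIORITY.indexOf(a.source) - SOURCE_PRIORITY.indexOf(b.source)
    );
    const [master, ...others] = sorted;
    const result = { ...master };
    for (const other of others) {
      for (const [field, value] of Object.entries(other)) {
        if (field === "id" || field === "source") continue;
        if (field === "firstName") {
          const kept = preferFormalName(result.firstName, value);
          if (value && result.firstName && value !== result.firstName) {
            auditLog.push({ action: "name-variant", masterId: master.id, variants: [result.firstName, value], kept });
          }
          result.firstName = kept;
        } else if (result[field] == null && value != null) {
          // Best-value selection: fill an empty master field from a lower-priority source.
          result[field] = value;
        }
      }
    }
    auditLog.push({
      action: "merge",
      masterId: master.id,
      absorbedIds: others.map((o) => o.id),
      reason: `shared email; ${master.source} record kept as master`,
    });
    merged.push(result);
  }
  return merged;
}

// Illustrative records only (made-up IDs and emails), already mapped to a canonical schema.
const audit = [];
const donors = mergeDuplicates(
  [
    { id: "event-7", source: "event", firstName: "James", email: "JPatel@example.org", interestLevel: "High" },
    { id: "crm-2", source: "crm", firstName: "James", email: "jpatel@example.org", totalGiving: 1200, engagementScore: null },
    { id: "email-14", source: "email", firstName: "James", email: "jpatel@example.org", engagementScore: 87 },
    { id: "crm-5", source: "crm", firstName: "Patricia", email: "pharris@example.org" },
    { id: "email-21", source: "email", firstName: "Pat", email: "pharris@example.org" },
  ],
  audit
);
console.log(donors.length); // 2
console.log(audit.map((e) => e.action)); // [ 'merge', 'name-variant', 'merge' ]
```

The merged James Patel record keeps the CRM giving total and picks up the engagement score and interest level from the other two sources, so it is richer than any single input row.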

What this is and isn't

This is a working model of how I think about data operations problems, built in JavaScript so it runs in a browser without infrastructure. It isn't production ETL code, a Salesforce implementation, or a claim that this is how I'd engineer a real pipeline at scale; that would use Python, pandas, a proper dedup library, and a validated import process with error-row review. What it demonstrates is the underlying methodology: define the canonical schema first, map sources to it explicitly, treat every quality issue as a documented decision rather than a silent correction, and produce output your downstream system can trust. That approach holds whether the tool is written in FoxPro, Python, or JavaScript.
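The final step, producing output the downstream system can trust, can be sketched as a validation pass that separates import-ready rows from error rows for review and then writes RFC 4180-style CSV (fields quoted when they contain commas, quotes, or line breaks; embedded quotes doubled; CRLF line endings). The column list, required fields, and sample records are assumptions; a real import target defines its own requirements.

```javascript
// Hypothetical import requirements; the real target system defines its own.
const OUTPUT_COLUMNS = ["firstName", "lastName", "email", "state", "phone", "totalGiving"];
const REQUIRED = ["lastName", "email"];

// Quote a field per RFC 4180: wrap it in double quotes when it contains a comma,
// quote, or line break, and double any embedded quotes.
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Split records into import-ready rows and error rows for human review,
// then serialize only the good rows to CSV with CRLF line endings.
function buildImportFile(records) {
  const good = [];
  const errors = [];
  for (const r of records) {
    const missing = REQUIRED.filter((f) => r[f] == null || r[f] === "");
    if (missing.length) errors.push({ id: r.id, missing });
    else good.push(r);
  }
  const lines = [OUTPUT_COLUMNS.join(",")];
  for (const r of good) lines.push(OUTPUT_COLUMNS.map((c) => csvField(r[c])).join(","));
  return { csv: lines.join("\r\n") + "\r\n", errors };
}

// Made-up records for illustration.
const { csv, errors } = buildImportFile([
  { id: "crm-2", firstName: "James", lastName: "Patel", email: "jpatel@example.org", state: "MA", totalGiving: 1200 },
  { id: "crm-8", firstName: "Ana", lastName: "Lopez, Jr.", email: "alopez@example.org" },
  { id: "event-9", firstName: "Sam", lastName: "Lee", email: "" },
]);
console.log(errors); // event-9 is held back because its email is empty
```

Rows that fail validation never reach the CSV; they are returned separately so someone can review and fix them rather than letting the import tool reject them one at a time.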

Three sources.
One campaign-ready file.
