How this archive works
Methodology, sourcing and corrections
The Madlanga Commission Archive is an independent public-record project. It gathers, indexes and cross-references testimony given before the Madlanga Commission of Inquiry. It reports the record; it does not pronounce guilt. This page explains exactly how the archive is built, where every fact comes from, how the AI-assisted parts work, and how to ask us to fix something.
Where the record comes from
The spine of the archive is the official public record of the Madlanga Commission: the hearing schedule, the transcript records published in the Commission’s open uploads directory, and the official announcements feed (rulings, orders, records, interim reports, statements, letters and notices). These are ingested by script rather than typed by hand, so the day-by-day record matches the source. The official Commission site, criminaljusticecommission.org.za, and its video record remain authoritative; wherever the archive differs from them, the official record prevails.
How the archive is layered
The record is built in three tiers so it is clear what is source and what is editorial:
- Ingested (never hand-typed): the hearing days, the official documents feed, and the per-day transcript records. Re-run the ingestion and the record refreshes from source.
- Authored as data: our editorial layer - case files, witness and official profiles, and the network maps. This is where we add narrative, context and connections, always cited and attributed.
- Derived (never stored): counts, recency ordering, and the links between witnesses, hearings and documents are computed from the tiers above, so they cannot drift out of date.
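To illustrate the derived tier, here is a minimal Python sketch. It assumes hypothetical record shapes (`HearingDay`, `Document`) and made-up witness identifiers; the archive's actual schema is not described on this page. The point it shows is that counts, recency ordering and witness links are recomputed from the ingested records each time they are read, rather than stored alongside them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


# Hypothetical shapes for ingested records; illustrative only, not the archive's schema.
@dataclass(frozen=True)
class HearingDay:
    day_date: date
    witnesses: tuple[str, ...]  # identifiers of witnesses heard that day


@dataclass(frozen=True)
class Document:
    title: str
    published: date
    witnesses: tuple[str, ...]  # identifiers of witnesses the document concerns


def appearance_counts(days: list[HearingDay]) -> Counter:
    """Number of distinct hearing days on which each witness appeared."""
    return Counter(w for day in days for w in set(day.witnesses))


def recent_documents(docs: list[Document], limit: int = 5) -> list[Document]:
    """Newest documents first; recomputed on every call, never stored."""
    return sorted(docs, key=lambda doc: doc.published, reverse=True)[:limit]


def witness_links(days: list[HearingDay], docs: list[Document]) -> dict[str, dict[str, list]]:
    """Map each witness to the hearing dates and document titles linked to them."""
    links: dict[str, dict[str, list]] = {}
    for day in days:
        for w in day.witnesses:
            links.setdefault(w, {"days": [], "documents": []})["days"].append(day.day_date)
    for doc in docs:
        for w in doc.witnesses:
            links.setdefault(w, {"days": [], "documents": []})["documents"].append(doc.title)
    return links
```

Because nothing in this sketch is persisted, a corrected or newly ingested hearing day changes every count and link the next time they are computed.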
How testimony is verified and attributed
Everything in a profile or case file restates evidence given on the public record. Weakly supported or contested claims are attributed to their source and framed as alleged, testified or “according to” - never as established fact. Where a claim comes from reporting rather than the transcript, we look for corroboration across multiple independent outlets before publishing it, and we omit what we cannot source. We do not publish thin filler, and we do not treat commentary or social-media clips as part of the record - at most they are leads to verify.
The AI-assisted Investigation Room
The Investigation Room is a pre-computed “evidence board” built over the full set of hearing transcripts. A language model extracts named entities, each passage is converted into an embedding, and for every passage the most similar passages by other speakers are paired and compared by a second model to surface possible corroborations and contradictions. Three principles govern it:
- It is a source, not a chatbot. Every node, connection and lead is pre-computed and pinned; nothing is generated live in response to a prompt.
- Everything is sourced. Each lead cites the specific hearing day, page and speaker on both sides, so you can check it against the transcript.
- Leads are suggestive, never conclusive. A “contradiction” is an apparent tension between two accounts to follow up - not a finding that anyone lied. People who run the inquiry (commissioners and evidence leaders) are not treated as independent witnesses, and the presumption of innocence is preserved throughout.
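The cross-speaker pairing step described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the archive's own pipeline: embeddings are given as plain tuples (in practice they would come from an embedding model), the role labels `commissioner` and `evidence_leader` are hypothetical, and the `k` and `min_similarity` values are arbitrary. It shows three things the text describes: inquiry officials are excluded before pairing, a passage is only paired with passages by a different speaker, and every candidate carries the speaker, hearing day and page on both sides. The second-model comparison that labels a pair as a possible corroboration or contradiction is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    speaker: str
    role: str                     # hypothetical labels, e.g. "witness", "commissioner"
    hearing_day: str              # ISO date of the hearing day
    page: int                     # transcript page
    embedding: tuple[float, ...]  # assumed to come from an embedding model


# Hypothetical role labels for people who run the inquiry; never paired as witnesses.
INQUIRY_ROLES = {"commissioner", "evidence_leader"}


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def candidate_pairs(passages: list[Passage], k: int = 3, min_similarity: float = 0.8) -> list[dict]:
    """For each witness passage, pair it with up to k most similar passages by other speakers."""
    witnesses = [p for p in passages if p.role not in INQUIRY_ROLES]
    seen: set[tuple[int, int]] = set()
    leads = []
    for i, p in enumerate(witnesses):
        # Score only passages by a different speaker, highest similarity first.
        scored = sorted(
            ((cosine(p.embedding, q.embedding), j)
             for j, q in enumerate(witnesses) if q.speaker != p.speaker),
            reverse=True,
        )
        for score, j in scored[:k]:
            if score < min_similarity:
                break  # list is sorted, so no later candidate can qualify
            key = (min(i, j), max(i, j))
            if key in seen:
                continue  # the same pair was already found from the other side
            seen.add(key)
            q = witnesses[j]
            # Cite speaker, hearing day and page on both sides of the lead.
            leads.append({
                "similarity": round(score, 3),
                "a": (p.speaker, p.hearing_day, p.page),
                "b": (q.speaker, q.hearing_day, q.page),
            })
    return leads
```

A high similarity only means two passages address similar matter; whether they corroborate or conflict is left to the second model and, ultimately, to a reader checking both transcript pages.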
Editorial standards and the presumption of innocence
All persons named in this archive are presumed innocent unless and until convicted by a competent court. Allegations are reported as allegations and attributed to their source. People who have been charged are distinguished from witnesses, and a person who has not yet appeared is shown as awaiting appearance rather than implied to have testified. Officials who run the inquiry are kept separate from the witnesses who appear before it.
Corrections
This is an independent civic-record project, not affiliated with the Commission or any state body, and the official record is authoritative - we correct against it. If you believe something here is inaccurate, out of date, or unfairly framed, tell us and we will check it against the record and fix or remove it. Email [email protected] with the page and the specific point, and a source where possible.
Contact
For corrections, sourcing questions, or to flag a missing hearing day or document, email [email protected]. For the official record, terms of reference and live video, see the Commission’s official site.