Methodology
How Veraflux generates supplement reports, what role AI plays, and how we verify what the system produces.
What Veraflux is
Veraflux is a research tool that turns peer-reviewed PubMed clinical literature into structured, source-cited supplement reports personalized to the user's goal, age, and sex. Every claim in a report is engineered to trace back to a specific cited study, even when that means clunkier prose.
Veraflux is not a healthcare provider and does not provide medical advice. Reports are for educational and informational purposes only.
How a report is generated
Each report runs through a multi-stage pipeline. No single language-model call produces a finished report end to end; narrow models handle narrow jobs, and deterministic code carries data between them.
- Screener
Validates that the supplement name is real, resolves synonyms (for example, "vitamin D3" and "cholecalciferol"), and decides whether enough peer-reviewed literature exists to justify a full report. If it doesn't, the screener says so up front rather than synthesizing confidence from nothing.
- PubMed retrieval
Multiple specialized queries run in parallel against the public PubMed API. Different queries target different facets of the question: the supplement and the user's goal in general, the supplement filtered to the user's demographics, the underlying mechanism of action, and so on. Results are deduplicated and pooled, which limits how much any single query's bias can shape the candidate set.
- Landmark inclusion
A dedicated pass identifies consensus-foundational studies for the supplement: large RCTs, key meta-analyses, papers a reasonable clinician would expect to see addressed. These are forced into the candidate set regardless of how generic ranking scored them, so the report cannot quietly omit a study that a domain expert would consider canonical.
- Selector
The selector chooses a curated subset of the candidate pool over multiple passes: it categorizes candidates by the kind of claim each could support, ensures every report section has direct evidence behind it, then prunes for redundancy. Typically 25 to 35 studies are selected from 100 to 200 candidates. Selection is goal- and demographic-aware, so a sleep-focused report for a 60-year-old surfaces different studies than a strength-focused report for a 25-year-old on the same supplement.
- Synthesizer
The narrative report is written under two constraints: every claim must trace to a specific study from the selected set, and the only PubMed IDs available to the model are the ones in that set. A study "remembered" from training data has no valid ID to attach to it, so a fabricated citation is structurally harder to produce. Each section opens with a plain-language paragraph stating the finding, followed by the supporting evidence with inline citations. A separate post-synthesis pass adds a short summary of two to three sentences at the top of the cover so the reader can orient before drilling in.
- Citation verifier
After generation, every citation in the output is matched against the actual list of PubMed IDs retrieved earlier in the run. Anything that does not map to a real, retrieved study is stripped before the report is returned. This is a second, independent line of defense beyond the synthesizer's grounding constraints; either layer alone would be weaker than both together.
- Safety pipeline
A parallel pipeline runs alongside the goal pipeline. It retrieves abstracts focused on side effects, adverse events, contraindications, drug interactions, and tolerable upper limits. Where appropriate it also draws on established pharmacology and regulatory references; those statements are explicitly tagged with their source category so the user always knows whether a claim came from a study, a pharmacology reference, or a regulatory body.
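The pooling and landmark-inclusion steps above can be sketched as deterministic code. This is a minimal illustration, not Veraflux's implementation: the `Candidate` type, the `pool_candidates` function, and the PubMed IDs and titles are all hypothetical placeholders, and the real pipeline's data model is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pmid: str               # PubMed ID, used as the deduplication key
    title: str
    landmark: bool = False  # True if forced in by the landmark pass


def pool_candidates(query_results, landmarks):
    """Merge per-query result lists into one pool keyed by PubMed ID.

    query_results: one list of Candidate per query facet.
    landmarks: Candidates that must be in the pool regardless of ranking.
    """
    pool = {}
    # Landmarks go in first so a later duplicate cannot displace them.
    for study in landmarks:
        pool[study.pmid] = Candidate(study.pmid, study.title, landmark=True)
    for results in query_results:
        for study in results:
            # setdefault keeps the first record seen for each PubMed ID.
            pool.setdefault(study.pmid, study)
    return list(pool.values())


# Placeholder data: two query facets that overlap on one study.
goal_hits = [Candidate("111", "Trial A"), Candidate("222", "Trial B")]
demographic_hits = [Candidate("222", "Trial B"), Candidate("333", "Trial C")]
landmark_hits = [Candidate("444", "Large RCT")]

pool = pool_candidates([goal_hits, demographic_hits], landmark_hits)
print([c.pmid for c in pool])  # ['444', '111', '222', '333']
```

Keying the pool on the PubMed ID means a study returned by several queries appears once, and inserting landmarks before anything else means no ranking step is needed to keep them in the candidate set.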
What AI does and does not do
What AI does
- Decides which abstracts are relevant (the selector).
- Writes the narrative summary of selected abstracts (the synthesizer).
- Rates the strength of evidence behind each conclusion.
- Disambiguates supplement names (the screener and synonym generator).
What AI does not do
- Invent citations. The synthesizer is constrained to a fixed set of retrieved studies.
- Recommend treatments.
- Freely generate content from training memory. The citation verifier independently strips any citation that does not map to a retrieved study.
Retrieval, citation tracking, verification, and pipeline orchestration are deterministic code, not language models. The two grounding mechanisms above are intentionally independent, so a failure in one does not silently propagate.
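As an illustration of how a deterministic citation check might work, the sketch below strips any citation whose PubMed ID is not in the retrieved set. It assumes citations appear inline as `[PMID:12345]`; that format, the `strip_unverified` function, and the IDs used are hypothetical, since this page does not describe the report markup.

```python
import re

# Hypothetical inline citation format, assumed for illustration only.
CITATION = re.compile(r"\s*\[PMID:(\d+)\]")


def strip_unverified(text, retrieved_pmids):
    """Remove every citation whose PubMed ID is not in the retrieved set.

    Returns the cleaned text and the list of IDs that were stripped.
    """
    stripped = []

    def check(match):
        pmid = match.group(1)
        if pmid in retrieved_pmids:
            return match.group(0)  # keep verified citations untouched
        stripped.append(pmid)
        return ""  # drop the citation together with its leading whitespace

    return CITATION.sub(check, text), stripped


retrieved = {"1001", "1002"}
draft = "Magnesium improved sleep onset [PMID:1001] [PMID:9999]."
clean, removed = strip_unverified(draft, retrieved)
print(clean)    # Magnesium improved sleep onset [PMID:1001].
print(removed)  # ['9999']
```

Because the check is a plain set lookup, it does not depend on how the text was generated, which is what lets it act as a layer independent of the synthesizer's constraints.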
How we verify accuracy
- Every paragraph cites its sources
Each paragraph carries the specific studies that informed it, not a single bibliography at the end. If a sentence says "magnesium improved sleep onset," the studies supporting it are listed at the bottom of the same paragraph. Sentences without traceable support are flagged and rewritten or removed before the user sees the report.
- Evidence-quality grading
A separate pass tags each section as strong, moderate, limited, or preliminary based on the type and weight of studies supporting it. A claim built on a single small open-label trial is labeled differently than one built on multiple independent RCTs, and the user sees both the conclusion and the strength label that qualifies it.
- Source-category labeling
Safety-pipeline statements that do not come from a retrieved study (for example, a drug interaction documented in standard pharmacology references) are explicitly tagged with their source, so the user always knows whether a claim came from a study, a pharmacology reference, or a regulatory body.
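One way to picture the labels described above is as tags on a small data structure. The sketch below is an assumption-laden illustration: the class names, fields, and the `untraceable` check are hypothetical, and it deliberately does not encode grading rules, which this page does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceGrade(Enum):
    STRONG = "strong"
    MODERATE = "moderate"
    LIMITED = "limited"
    PRELIMINARY = "preliminary"


class SourceCategory(Enum):
    STUDY = "study"
    PHARMACOLOGY_REFERENCE = "pharmacology reference"
    REGULATORY_BODY = "regulatory body"


@dataclass
class Statement:
    text: str
    source: SourceCategory
    pmids: list = field(default_factory=list)

    def is_traceable(self):
        # A study-sourced statement needs at least one PubMed ID;
        # other categories are identified by their source tag instead.
        return self.source is not SourceCategory.STUDY or bool(self.pmids)


@dataclass
class Section:
    title: str
    grade: EvidenceGrade
    statements: list

    def untraceable(self):
        # Statements to flag for rewriting or removal.
        return [s for s in self.statements if not s.is_traceable()]


# Placeholder content, not real findings.
section = Section(
    title="Sleep onset",
    grade=EvidenceGrade.LIMITED,
    statements=[
        Statement("Placeholder finding.", SourceCategory.STUDY, pmids=["1001"]),
        Statement("Placeholder interaction.", SourceCategory.PHARMACOLOGY_REFERENCE),
        Statement("Placeholder unsupported claim.", SourceCategory.STUDY),
    ],
)
flagged = section.untraceable()
print([s.text for s in flagged])  # ['Placeholder unsupported claim.']
```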
Keeping reports current
Veraflux runs an automated surveillance worker that re-checks PubMed for new evidence on supplements that have already been reported on. When new high-quality evidence appears, the affected reports are refreshed to reflect it. The same worker runs periodic retraction sweeps: if a previously cited study is marked retracted in PubMed, the report is flagged so an outdated conclusion does not silently persist.
Surveillance distinguishes between content-material changes (new evidence the report should reflect on its next refresh) and user-material changes (changes important enough to notify subscribed users about). This separation keeps real signal from being buried in noise and avoids notifying users about trivial updates.
Each report carries a "last refreshed" timestamp so users always know how recent the underlying evidence is.
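The retraction sweep described in this section reduces to a set intersection once the retracted IDs are known. The sketch below assumes the set of retracted PubMed IDs has already been gathered elsewhere; the `retraction_sweep` function, the report identifiers, and the IDs are hypothetical.

```python
def retraction_sweep(reports, retracted_pmids):
    """Find reports that cite a retracted study.

    reports: mapping of report ID to the PubMed IDs that report cites.
    retracted_pmids: PubMed IDs currently marked as retracted.
    Returns a mapping of report ID to the sorted retracted IDs it cites.
    """
    retracted = set(retracted_pmids)
    flagged = {}
    for report_id, cited in reports.items():
        hits = set(cited) & retracted
        if hits:
            flagged[report_id] = sorted(hits)
    return flagged


# Placeholder data: one report cites a retracted study, one does not.
reports = {"report-a": ["1001", "1002"], "report-b": ["1003"]}
retracted = {"1002", "2000"}

to_flag = retraction_sweep(reports, retracted)
print(to_flag)  # {'report-a': ['1002']}
```

Returning the specific IDs, not just a yes/no flag, would let a later step decide which conclusions in the report depend on the retracted study.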
Known limitations
- Bounded by PubMed and the retrieval cap
Reports are bounded by what PubMed indexes and by a cap on how many abstracts are retrieved. A statement that an effect or safety concern is "not addressed" means it was not found in the specific set of abstracts retrieved for that query, not that the effect does not exist.
- Abstracts, not full text
The headline results of most clinical studies are visible in the abstract, but secondary endpoints, subgroup analyses, and methodological caveats that appear only in the full paper are not currently part of the evidence base.
- Population skew
Most clinical research on supplements skews toward narrow populations (typically younger, male, or otherwise non-representative). Reports for users outside those populations carry an inherent gap. The evidence-quality pass accounts for this where it can, but no software can manufacture data the underlying literature never collected.
Read the Medical & AI Disclaimer for the full list of limitations and the user's verification responsibilities.
Corrections and feedback
If you find a citation that does not support a claim, an outdated conclusion, or a safety section that misses an important interaction, email us at support@veraflux.org. We review every report in which an error is reported and update the underlying pipeline rules where appropriate.
Editorial standards for site content
Site content outside of generated reports (the glossary, articles, this methodology page, About, and similar) is hand-authored by the Veraflux team. We use plain language, link to primary sources where possible, and avoid claims that go beyond what the cited evidence supports. When we are uncertain, we say so.