Methodology

How Veraflux generates supplement reports, what role AI plays, and how we verify what the system produces.

What Veraflux is

Veraflux is a research tool that turns peer-reviewed PubMed clinical literature into structured, source-cited supplement reports personalized to the user's goal, age, and sex. Every claim in a report is engineered to trace back to a specific cited study; the pipeline is built around verifiability rather than fluency. Veraflux is not a healthcare provider and does not provide medical advice. Reports are for educational and informational purposes only.

How a report is generated

Each report runs through a multi-stage pipeline. No single language-model call produces a finished report end to end; narrow models handle narrow jobs, and deterministic code carries data between them.

  1. Screener. Validates that the supplement name is real, resolves synonyms (for example, "vitamin D" vs "cholecalciferol"), and decides whether enough peer-reviewed literature exists to justify a full report. If it doesn't, the screener says so up front rather than synthesizing confidence from nothing.
  2. PubMed retrieval. Multiple parallel specialized queries run against the public PubMed API. Different queries target different facets of the question: the supplement and the user's goal in general, the supplement filtered to the user's demographics, the underlying mechanism of action, and so on. Results are deduplicated and pooled, which prevents any one query's bias from dominating the candidate set. A sketch of this retrieval-and-pooling step appears after this list.
  3. Landmark inclusion. A dedicated pass identifies consensus-foundational studies for the supplement: large RCTs, key meta-analyses, papers a reasonable clinician would expect to see addressed. These are forced into the candidate set regardless of how generic ranking scored them, so the report cannot quietly omit a study that a domain expert would consider canonical.
  4. Selector. A curated subset of the candidate pool is chosen across multiple passes. The selector first categorizes candidates by what kind of claim each could support, then makes sure every section of the report has direct evidence behind it, then prunes for redundancy. Selection is goal-aware and demographic-aware: a sleep-focused report for a 60-year-old surfaces different studies than a strength-focused report for a 25-year-old, even when both ask about the same supplement.
  5. Synthesizer. The narrative report is written under two constraints: every claim must trace to a specific study from the selected set, and the only PubMed IDs available to the model are the ones in that set. Any "remembered" study from training memory has no valid ID to attach to it, so a fabricated citation is structurally harder to produce.
  6. Citation verifier. After generation, every citation in the output is matched against the actual list of PubMed IDs retrieved earlier in the run. Anything that does not map to a real, retrieved study is stripped before the report is returned. This is a second, independent line of defense beyond the synthesizer's grounding constraints; either layer alone would be weaker than both together. A sketch of this check appears after this list.
  7. Safety pipeline. A parallel pipeline runs alongside the goal pipeline. It retrieves abstracts focused on side effects, adverse events, contraindications, drug interactions, and tolerable upper limits. Where appropriate it also draws on established pharmacology and regulatory references; those statements are explicitly tagged with their source category so the user always knows whether a claim came from a study, a pharmacology reference, or a regulatory body.
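
To make step 2 concrete, the sketch below shows one way to run several facet queries in parallel against the public PubMed E-utilities search endpoint and pool the deduplicated results. It is illustrative, not our production code; the facet queries, the result cap, and the use of a thread pool are assumptions made for the example.

    # Illustrative only: parallel facet queries against PubMed, pooled and
    # deduplicated so no single query's bias dominates the candidate set.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def search_pubmed(term: str, retmax: int = 50) -> list[str]:
        """Return PubMed IDs (PMIDs) matching one facet query."""
        resp = requests.get(
            ESEARCH,
            params={"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["esearchresult"]["idlist"]

    # Hypothetical facet queries for a magnesium + sleep report: the goal in
    # general, the goal filtered to the user's demographics, and the mechanism.
    facet_queries = [
        "magnesium AND sleep quality AND randomized controlled trial[pt]",
        "magnesium AND insomnia AND aged[mh]",
        "magnesium AND sleep AND mechanism",
    ]

    with ThreadPoolExecutor() as pool:
        per_query = list(pool.map(search_pubmed, facet_queries))

    # Pool and deduplicate the PMIDs into a single candidate set.
    candidate_pmids = sorted({pmid for ids in per_query for pmid in ids})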
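
Steps 5 and 6 are easiest to see together: the synthesizer may only cite PMIDs from the selected set, and the verifier then strips anything that slipped through. The sketch below shows a minimal version of that post-generation check; the bracketed "[PMID: ...]" citation format and the placeholder IDs are assumptions for the example.

    # Illustrative only: drop any cited PMID that was not retrieved this run.
    import re

    CITATION = re.compile(r"\[PMID:\s*(\d+)\]")

    def verify_citations(report_text: str, retrieved_pmids: set[str]) -> str:
        """Keep a citation only if its PMID maps to a study retrieved this run."""
        def keep_or_strip(match: re.Match) -> str:
            return match.group(0) if match.group(1) in retrieved_pmids else ""
        return CITATION.sub(keep_or_strip, report_text)

    draft = "Magnesium shortened sleep onset latency [PMID: 11111111] [PMID: 99999999]."
    retrieved = {"11111111"}  # placeholder PMIDs pooled earlier in the run
    print(verify_citations(draft, retrieved))
    # -> "Magnesium shortened sleep onset latency [PMID: 11111111] ."
    # (a production verifier would also tidy the leftover whitespace)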

What AI does and does not do

AI handles a narrow set of jobs: deciding which abstracts are relevant (the selector), writing the narrative summary of selected abstracts (the synthesizer), rating the strength of evidence behind each conclusion, and disambiguating supplement names (the screener and synonym generator). Retrieval, citation tracking, verification, and pipeline orchestration are deterministic code, not language models.

AI does not invent citations, recommend treatments, or freely generate substantive content from training memory. Treatment recommendations are excluded by design: reports are educational, not medical advice. The other two failure modes are made structurally hard to produce and catchable when they slip through: the synthesizer is constrained to a fixed set of retrieved studies, and the citation verifier independently strips any output citation that does not map to a study actually retrieved during the run. These two mechanisms are intentionally independent, so a failure in one does not silently propagate.

How we verify accuracy

Every report is cite-grounded at the paragraph level: each paragraph carries the specific studies that informed that paragraph, not a single bibliography at the end. If a sentence in a paragraph says "magnesium improved sleep onset," the studies that support that sentence are listed at the bottom of the same paragraph. Sentences without traceable support are flagged and rewritten or removed before the user sees the report.

A separate evidence-quality pass tags each section as strong, moderate, limited, or preliminary based on the type and weight of studies supporting it. A claim built on a single small open-label trial is labeled differently than a claim built on multiple independent RCTs, and the user sees both the conclusion and the strength label that qualifies it.
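
The sketch below is one possible rule-of-thumb for deriving such a grade from the mix of supporting studies. The thresholds are assumptions made for illustration, not our actual grading rules.

    # Illustrative only: map the types of supporting studies to a grade label.
    def evidence_grade(study_types: list[str]) -> str:
        rcts = study_types.count("randomized controlled trial")
        metas = study_types.count("meta-analysis")
        if metas >= 1 and rcts >= 2:
            return "strong"
        if rcts >= 2:
            return "moderate"
        if rcts == 1 or "cohort study" in study_types:
            return "limited"
        return "preliminary"

    print(evidence_grade(["meta-analysis",
                          "randomized controlled trial",
                          "randomized controlled trial"]))  # strong
    print(evidence_grade(["open-label trial"]))              # preliminary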

Safety-pipeline statements that do not come from a retrieved study (for example, a known drug interaction documented in standard pharmacology references) are explicitly labeled with their source category, whether a study, a pharmacology reference, or a regulatory body, rather than presented as study findings.
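
One way to represent that labeling is a small tagged record per statement, as in the sketch below; the field names and example wording are assumptions, not our actual schema.

    # Illustrative only: each safety statement carries its source category so a
    # reader can tell a study finding from a reference-book or regulatory fact.
    from dataclasses import dataclass
    from typing import Literal, Optional

    SourceCategory = Literal["study", "pharmacology reference", "regulatory body"]

    @dataclass
    class SafetyStatement:
        text: str
        source_category: SourceCategory
        pmid: Optional[str] = None  # set only when the source is a retrieved study

    note = SafetyStatement(
        text="May add to the effect of some blood-pressure medications.",
        source_category="pharmacology reference",
    )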

Keeping reports current

Veraflux runs an automated surveillance worker that re-checks PubMed for new evidence on supplements that have already been reported on. When new high-quality evidence appears, the affected reports are refreshed to reflect it. The same worker runs periodic retraction sweeps: if a previously cited study is marked retracted in PubMed, the report is flagged so an outdated conclusion does not silently persist.
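
A retraction sweep can be as simple as re-checking the publication types of previously cited PMIDs, as in the sketch below. PubMed marks retracted articles with the "Retracted Publication" publication type; this is illustrative rather than our actual worker, and batching, rate limiting, and error handling are omitted.

    # Illustrative only: flag previously cited PMIDs that PubMed now marks
    # with the "Retracted Publication" publication type.
    import requests

    ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

    def retracted_pmids(cited_pmids: list[str]) -> set[str]:
        resp = requests.get(
            ESUMMARY,
            params={"db": "pubmed", "id": ",".join(cited_pmids), "retmode": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        result = resp.json()["result"]
        return {
            pmid for pmid in cited_pmids
            if "Retracted Publication" in result.get(pmid, {}).get("pubtype", [])
        }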

Surveillance distinguishes between content-material changes (new evidence the report should reflect on its next refresh) and user-material changes (changes important enough to notify subscribed users about). This separation keeps real signal from being buried in noise and avoids notifying users about trivial updates.

Each report carries a "last refreshed" timestamp so users always know how recent the underlying evidence is.

Known limitations

Veraflux's outputs are bounded by what PubMed indexes and by the retrieval cap per report. A statement that an effect or safety concern is "not addressed" means it was not found in the specific set of abstracts retrieved for that query, not that the effect does not exist.

Reports are based on study abstracts, not full text. The headline results of most clinical studies are visible in the abstract, but secondary endpoints, subgroup analyses, and methodological caveats that appear only in the full paper are not currently part of the evidence base.

Most clinical research on supplements skews toward narrow populations (typically younger, male, or otherwise non-representative). Reports for users outside those populations carry an inherent gap. The evidence-quality pass accounts for this where it can, but no software can manufacture data that the underlying literature never collected.

Read the Medical & AI Disclaimer for the full list of limitations and the user's verification responsibilities.

Corrections and feedback

If you find a citation that does not support a claim, an outdated conclusion, or a safety section that misses an important interaction, email us at support@veraflux.org. We review every reported error and update the underlying pipeline rules where appropriate.

Editorial standards for site content

Site content outside of generated reports (the glossary, articles, this methodology page, About, and similar) is hand-authored by the Veraflux team. We use plain language, link to primary sources where possible, and avoid claims that go beyond what the cited evidence supports. When we are uncertain, we say so.