Concerns have been raised about access to a scientific trove containing the genetic data and medical records of more than 500,000 people, after an investigation revealed that "race scientists" appeared to claim to have obtained the data.
A senior scientist has warned that the leadership responsible for the data held by UK Biobank "have to be very careful with ensuring that correct processes are followed" around access to the information in order to maintain public confidence.
Biobank holds the genetic data and medical records of more than 500,000 participants, which it shares in anonymised form with academics and researchers to support new scientific discoveries and medical advances.
Last week the Guardian reported that a group called the Human Diversity Foundation (HDF), which carries out pseudoscientific research purporting to prove fundamental differences between races, had been covertly filmed discussing UK Biobank data.
Mainstream geneticists consider such research to be a racist pseudoscience without supporting evidence. The footage was obtained by an undercover activist from the anti-racism group Hope Not Hate and shared with journalists.
On the day of the Guardian's publication, Biobank issued a statement criticising the report and dismissing the findings. It said it had concluded what it called a "full" and "extensive" investigation that had found no evidence of misuse of UK Biobank data.
Biobank said it believed the group was discussing access to publicly available statistics that summarise the results of studies, rather than the anonymised data of the volunteers themselves.
However, in correspondence with a senior medic the following day, which has been seen by the Guardian, the Biobank chief executive, Prof Sir Rory Collins, said its inquiries were continuing.
"Out of an abundance of caution, we are pursuing further investigations to confirm whether or not there has been any misuse of UK Biobank data," he said. "If we discover that participant-level data have been obtained illegitimately or that unapproved analyses have been conducted, we will use all available sanctions available to us (including legal measures)."
The comments appeared at odds with Biobank's public announcement about the conclusion of its investigation. Asked about the discrepancy, a spokesperson said: "There is no contradiction between our statements. We launched an extensive investigation, including a third-party search of the internet and dark web, and found no evidence of these data being available to unapproved researchers. However, if we were to get new information it would enable us to investigate further."
Biobank's initial conclusions were partly based on analysis of a portion of the transcript of the undercover footage released by the Guardian. It said technical details in the transcript, such as file type, cast doubt on the suggestion that participant-level data, which is available only to approved researchers, had been obtained.
However, two senior geneticists and two health data experts who reviewed the same transcript said terms used by the HDF researchers in the undercover footage could refer to their having accessed such sensitive data.
David Curtis, a professor in genetics, evolution and environment at University College London, warned that any suggestion of the group accessing sensitive genetic data could affect public trust not only in Biobank but in science more generally. He questioned whether Biobank had been too quick to dismiss concerns.
"Maybe an appropriate response would be that these allegations are concerning and we're looking into it, or that we've requested that an external person investigate this," he said. "For them to say we've had our data scientist look at it and they think everything's fine isn't really good enough."
Separately, the Hope Not Hate investigation also recorded representatives of a US startup, Heliospect Genomics, describing Biobank data as a "godsend" that had allowed it to develop a system to predict traits such as IQ, sex and height, as well as risk of obesity or mental illness, in human embryos.
The company offers to help couples test their embryos as part of IVF treatment and has worked with more than a dozen families, according to the undercover footage. Experts say such practices would raise a host of moral and medical questions.
Biobank's position on Heliospect's use of its data changed over the course of the Guardian's inquiries and there remains a degree of confusion about Biobank's access policies.
Biobank's spokespeople told the Guardian that Heliospect did not disclose screening of embryos for IQ as an intended commercial application. "All researchers, whether academic or commercial, applying to UK Biobank are required to make the purpose of their research explicit in their access application and subsequent annual reports," a spokesperson said.
However, the following day, apparently after receiving new information from Heliospect, Biobank amended its position and issued a new statement. "Heliospect confirmed that its analyses of our data have been used solely for their approved purpose to generate genetic risk scores for particular conditions and characteristics, and are exploring the use of their findings for pre-implantation screening in accordance with relevant regulation in the US where Heliospect is based," it said.
Heliospect told the Guardian that Biobank did not require companies to disclose the precise commercial applications of research.
Curtis questioned Biobank's response. "I think they've got to have approval processes which are more rigorous," he said.
Dr Francesca Forzano, the chair of the European Society of Human Genetics policy and ethics committee, called for stronger security processes around such datasets. She said: "We call on those who hold genomic datasets legitimately to ensure that access procedures are governed by robust and transparent processes, including about how decisions are made on whether or not the proposed research is in the public interest. Secondary use of data should be strictly prohibited and the dataset provided only used for the original, approved purpose."