wanna commit some scientific misconduct for a laugh? just slap AI on it and it's good enough for Nature

disclaimer: as i’m an organic chemist and this is inorganic chemistry, with all their techniques working in solid state, this is slightly out of my ballpark

some time ago i found this el reg story which claims that AI predicted new compounds, another AI predicted synthesis of these compounds, robotic chemist cooked them and according to another AI, some of that, most of the time, worked and provided new compounds, 43 of them, with 70%+ success rate, all in 17 days.

as you can expect

we might be running into rapidly compounding garbage in garbage out problems.

paper was published in nature, so great success, at least according to google’s press release. google was very proud of it because the model in question comes from deepmind. however, after some time, per el reg:

Google has, to us, distanced itself a little from the Berkeley study, telling The Register that the materials produced by the A-Lab were proposed by the university’s researchers. The web giant’s reps said the Berkeley scientists “checked their predictions using a Google DeepMind tool,” ie: GNoMe.

For what it’s worth, at the time of the Nature paper going live, Google boasted in an announcement that the Berkeley-DeepMind study “shows how our AI predictions can be leveraged for autonomous material synthesis.” Two people at DeepMind, who are listed as co-authors of the Nature paper, are credited for using Google AI in the “filtering pipeline for novel-materials identification.”

this is because at some point somebody with actual domain knowledge looked into it all in detail and things started looking weird to them. so weird, in fact, that it all resulted in a preprint which states that none of these compounds are actually new; only 3 of 58 syntheses were successful, of which only 1 has convincing receipts to back it up; most of the time whatever they made was dirty (up to 4 separate known compounds identified per sample); and the fit to experimental data was mediocre to nonexistent. taken together, this means that the premise of the first paper doesn’t hold at all. this preprint is written clearly, without unnecessary jargon, and as the authors state right at the beginning, it was written to be accessible to a “multi-disciplinary” (ie not exactly experts in the field) audience

of these three compounds that were actually synthesized, two were discovered between the time when google took snapshot of xrd database (2021) and now (2024). just one year of powering automated wisdom woodchipper per single inorganic compound. how efficient!

the problem 1:

el reg has a snippet that explains it well: ai model predicted a material with higher ordering, but what was formed irl and what is known in literature, sometimes from 70s, sometimes from 2003, has some degree of disorder in these sites. elreg explanation:

“On the computational side, they couldn’t deal with something called ‘compositional disorder,’ which is a very important feature of inorganic materials. A crystal is an ordered arrangement of atoms. But even within that order there can be disorder. Imagine you have a set of children’s building blocks, all the same size and shape, and they are arranged in a perfectly ordered pattern on the floor. The blocks are like atoms in a crystal,” Professor Palgrave told us.

“But now imagine that there are two colors of block, red and blue. We have an ordered pattern of colors, say alternating red, blue, red, blue etc. You might end up with a chess board type arrangement. But it is also possible for the colors to be mixed up randomly. In this case the blocks themselves are ordered, but the colors are disordered.”

why did it happen? maybe because someone cut corners along the way, because simulating the ordered version is much easier and somebody just had a genius idea that it’s the same thing anyway. i suspect that getting out of this problem involves throwing much larger pieces spanning multiple unit cells (supercells) into DFT, because now we have to deal with some degree of disorder. probably there’s some nice trickery to deal with this problem, but it wasn’t used for whatever reason. this shouldn’t have happened if there was an actual crystallographer on the team
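to get a feel for why brute-forcing disorder gets expensive, here’s a back-of-envelope sketch of my own (not from the paper or the preprint). assumptions, all mine: two cations share the mixed sites 50:50, each unit cell has two such sites, and `arrangements` is a made-up helper. symmetry makes many of these arrangements equivalent, so treat the numbers as an upper bound on how many distinct supercells you’d have to look at.

```python
from math import comb


def arrangements(n_sites: int) -> int:
    """Ways to place equal numbers of two cations on n_sites mixed sites."""
    if n_sites % 2:
        raise ValueError("need an even number of sites for a 50:50 mix")
    return comb(n_sites, n_sites // 2)


for cells in (1, 2, 4, 8, 16):
    sites = 2 * cells  # assumption: two mixed sites per unit cell
    print(f"{cells:>2} cells, {sites:>2} sites: {arrangements(sites):>12,} arrangements")
```

the perfectly ordered structure is exactly one of those arrangements, which is presumably why it’s the convenient thing to compute.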

the problem 2:

the data they have is powder xrd, which means it can’t really be interpreted directly and instead what is needed is a fit to known or predicted compounds. as it happens, the additional ordering predicted by ai provides a new testable prediction: sometimes there should have been an additional peak in pxrd, but it doesn’t appear where it should. sometimes, when disorder/order happens between metals that are sufficiently similar, the difference in pxrd is also negligible, and so the authors of the preprint state that some better proof (one using a different technique or better quality data) is needed to tell which is which. otherwise, if the synthetic procedure is almost exactly the same as one from a paper published 40 years ago, why should the product be different?
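a toy version of that argument, as a sketch under loud assumptions: a 1D chain of point scatterers instead of a real crystal, atomic numbers standing in for x-ray scattering strength, and disorder modelled as one averaged scatterer per site (roughly how mixed occupancy is treated in a refinement). the Co/Ni pair is just my illustrative "sufficiently similar metals" example, not something from the preprint. with unit spacing between sites, the main peak sits at q = 1; strict alternation doubles the repeat distance and adds an extra peak at q = 1/2.

```python
import cmath


def intensity(scatterers, q):
    """|F(q)|^2 for point scatterers on a 1D chain with unit spacing."""
    amplitude = sum(f * cmath.exp(2j * cmath.pi * q * n)
                    for n, f in enumerate(scatterers))
    return abs(amplitude) ** 2


def ordered_chain(f_a, f_b, n_sites):
    """Strictly alternating A, B, A, B, ... (the red/blue chessboard)."""
    return [f_a if n % 2 == 0 else f_b for n in range(n_sites)]


def disordered_chain(f_a, f_b, n_sites):
    """Random 50:50 mixing, modelled as one averaged scatterer per site."""
    return [(f_a + f_b) / 2] * n_sites


def superlattice_ratio(chain):
    """Intensity of the extra peak (q = 1/2) relative to the main one (q = 1)."""
    return intensity(chain, 0.5) / intensity(chain, 1.0)


N_SITES = 200
for label, f_a, f_b in [("Mg/Ni", 12, 28), ("Co/Ni", 27, 28)]:
    ordered = superlattice_ratio(ordered_chain(f_a, f_b, N_SITES))
    mixed = superlattice_ratio(disordered_chain(f_a, f_b, N_SITES))
    print(f"{label}: ordered {ordered:.5f}, disordered {mixed:.5f}")
```

in this toy model the extra peak scales as ((f_a - f_b) / (f_a + f_b))^2: clearly visible for an ordered Mg/Ni pair, gone for the disordered one, and close to nothing for neighbours in the periodic table whether ordered or not. which is the whole point: a missing peak argues against ordering, and a pair of near-identical metals needs a different technique or better data.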

(this is not my field, but in my field, generally, a single technique is not enough to show that what you made is what you claim it is. usually two different ones are required, like NMR and MS, or one to confirm identity and another to confirm purity. maybe it’s not the case here)

additional issue is that the authors used another ai to interpret pxrd data, which for some mysterious reason always conformed to what they wanted. for example, in case of the phosphate series there was a possibility of overfitting the model to what they wanted to get, but in all cases the preprint shows clearly that all of these compounds are already known and provides better fits to experimental data.

i wanted to write an elaborate sneer about this shitshow, but i don’t have to. authors of preprint already did that, so i’ll just paste some snippets from it:

We discuss all 43 synthetic products and point out four common shortfalls in the analysis. These errors unfortunately lead to the conclusion that no new materials have been discovered in that work.

Many aspects of this work are impressive: the fact that robots can take over labor intensive steps, that AI can predict reasonable synthetic routes based on literature precedent, and that a full circle of materials synthesis and characterization without human intervention can be carried out. Unfortunately, we found that the central claim of the A-lab paper, namely that a large number of previously unknown materials were synthesized, does not hold. As we will explain below, we believe that at time of publication, none of the materials produced by A-lab were new: the large majority were misclassified, and a smaller number were correctly identified but already known.

Notably, all these materials are related to the famous “Naples Yellow” pigment, which derives from Pb2Sb2O7 [27]. Variants of Naples Yellow, including those with Sn(IV) substitution on the B site, were used by the ancient Egyptians, and have been lost and then rediscovered periodically throughout history, by different ancient civilisations, in the middle ages, at various points in the renaissance, and most recently by the A-lab.

Within the 36 samples classified as successes, we found that the analysis presented for 35 of them suffered from one or more of the error types described below.

Very poor and obviously incorrect fits. This means models that are such poor fits to the data, often missing intense diffraction peaks, that they cannot be relied upon either for proof of the structure of the compounds, nor their purity. The poor fitting leads to the inability to identify impurity phases. Since the authors aim to have >50 wt% of their product, it is important to identify what other materials are present in order to assess if the 50% threshold has been met. (emphasis mine) Additionally, the presence of unreacted starting materials is symptomatic of an incomplete reaction and incorrect reaction conditions. This error type is present in 18/36 compounds.

Using different structures for refinement than were claimed in the paper. In several cases the CIF supplied in the SI is not the same structure (or composition) as that claimed in the main paper. In several examples even the space group between the two differs. An example is Mg3NiO4 which we discuss below. This error is present in 8/36 compounds.

No evidence for cation ordering. The most common error is prediction of compounds which are ordered versions of known disordered compounds. For example, as we will show in detail below, the existence of MgTi2NiO6 is claimed, which is the same as the known ilmenite structure of the same composition, but the predicted structure has ordered Mg and Ni cations, whereas the known structure has those cations disordered. However, no consideration is given by the authors to the possibility that they may have in fact made the known disordered compound instead of their intended compound. We show below that this is in fact the most likely situation. This error type is present in 24/36 compounds.

Reporting existing compounds as new. In several cases the claimed new compounds are in fact already reported in the ICSD. This error type is present in 3/36 compounds.
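quick tally of those four error types, using only the counts quoted above (the short labels and the `share` helper are mine). the counts add up to more than 36 because a single sample can carry more than one error.

```python
CLASSIFIED_SUCCESSES = 36  # samples the A-lab paper counted as successes
FLAGGED = 35               # of those, samples with at least one error type

# counts as quoted from the preprint; labels shortened by me
error_counts = {
    "very poor / obviously incorrect fits": 18,
    "refined structure differs from claimed one": 8,
    "no evidence for cation ordering": 24,
    "existing compound reported as new": 3,
}


def share(count, total=CLASSIFIED_SUCCESSES):
    """Percentage of the classified successes showing a given error type."""
    return 100 * count / total


for name, count in error_counts.items():
    print(f"{name:<45} {count:>2}/{CLASSIFIED_SUCCESSES} = {share(count):4.1f}%")

total_flags = sum(error_counts.values())
print(f"error flags: {total_flags} across {FLAGGED} samples, "
      f"about {total_flags / FLAGGED:.1f} per flagged sample")
```

so two thirds of the "successes" have the ordering problem alone, and the average flagged sample has roughly one and a half distinct things wrong with it.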

oh and also no actual experimental data was available, authors of preprint dug it out of charts in pdfs and still got better fits than whatever the third ai cooked

For the analysis, the original published experimental XRD patterns were obtained by digitalizing the data provided in the A-lab paper supplementary information using GetData Graph Digitalizer. […] This process is certainly not ideal and yields data of lower quality than the original. Nevertheless, we found it was possible to carry out Rietveld refinement on these datasets […] We do not claim our fits are definitive or cannot be improved upon, but we highlight in each case the features that make us believe the fits we propose are superior to those provided in the original paper.

The compound K2TiCr(PO4)3 was predicted to exist as a new cubic phase in the space group P2₁3. Fig. 10(d) shows our refinement of the provided PXRD pattern, using known cubic K2Ti2(PO4)3 (P2₁3; ICSD # 202888) and Cr2O3, a common impurity in high temperature synthesis of oxides containing chromium [34]. The refinement provided in the A-lab paper had several unfitted peaks, which all correspond to the Cr2O3 impurity phase as marked by red arrows in Fig. 10(c).

The example of K2TiCr(PO4)3 shows that there are serious issues with the supposed synthesis of the phosphates, in fact we could index and preliminarily match all 18 PXRD patterns to materials that are reported in the ICSD […] We consider it to be the responsibility of the authors of the A-lab paper [4] to unambiguously prove the synthesis of the target materials in all cases and will refrain from providing alternative refinements of all 43 materials in this comment. We will however discuss each compound and possible alternatives briefly below.

In our view, three materials have been successfully synthesized as predicted. All of them, however, have been reported in the literature before. They are MnAgO2, Y3In2Ga3O12 and CaFe2(PO4)2O, which have been reported in the following references respectively [42–44]. Of those CaFe2(PO4)2O seems to have been convincingly synthesized based on the provided PXRD data, whereas the other two’s PXRD patterns are fitted so poorly that it is difficult to state whether the materials indeed have been synthesized.

but gotta give it to them

In any case, the compounds in question were reported relatively recently, between 2021 and 2023. In fact, the authors of the Google DeepMind paper [3] clarified that they took snapshots of the ICSD in 2021 and thus did not include materials discovered since in their training set. They rightfully view it as a success that materials they predicted based on a 2021 snapshot were since discovered.

Since we raised issues in the paper shortly after publication, the Ceder group has conceded that A-lab does not live up to human standards, but still claim that “the system offers a rapid way to prove that a substance can be made — before human chemists take over to improve the synthesis and study the material in more detail.” [45] We hope that our comment made it clear that this statement is not justified - the A-lab paper does not provide proof that the new materials can be made.

now, tell me, how on god’s green earth is this nature paper still up and not retracted? it absolutely would have been if the authors weren’t partially automated; lots of papers were retracted for less. apparently you can commit any volume of scientific misconduct if you smear it in enough hype. in the meantime, developments were spun for nontechnical audience, stocks pumped and deals signed, and when it all turns out to be trash, the people who funded it all and burned some square kilometers of amazon just to train and then run a three-layered “ai chemist” that only spouts garbage “distance themselves from findings”. whew, i never knew it was that easy!

update: typos, wording

update 2: no matter what LIES computational/quantum/theoretical chemists tell you, chemistry is still an experimental science. i’ve also noticed there’s no actual experimental data, which is weird, and this alone could very well be grounds for retraction. you’d expect a section in the supplementary information with something like:

Compound 1a: In a round-bottom flask, 2a (XX mg, YY mmol, 0.5 M), 3a (XX mg, YY mmol, W.W eq), catalyst 4j (XX mg, YY umol, W mol %) and toluene (XX ml) were placed. The homogeneous reaction mixture was heated to 90 °C for 4 h, washed with water, dried and subjected to column chromatography, providing product 5a (XX mg, YY mmol, W% yield)

followed by the full set of analytical data necessary to confirm identity and purity of the compound.

so, actual instructions needed to replicate their findings: synthesis, purification method if any, and all analytical data to check if they match. this supplementary info can easily run into 50-100 pages. i’ve only seen one xlsx file with one compound per line and “success” or “failure”, which hardly cuts it. also, xrd data and simulations mean that there are pretty pictures to include, so why wouldn’t you make it into a nice readable pdf (i have some suspicions as to why)