Tuesday, February 25, 2014

Spoofing Peer-Reviewed Science With Gibberish

Headline in Nature:
"Publishers withdraw more than 120 gibberish papers"

"The publishers Springer and IEEE are removing more than 120 papers from their subscription services after a French researcher discovered that the works were computer-generated nonsense.

Over the past two years, computer scientist Cyril Labbé of Joseph Fourier University in Grenoble, France, has catalogued computer-generated papers that made it into more than 30 published conference proceedings between 2008 and 2013. Sixteen appeared in publications by Springer, which is headquartered in Heidelberg, Germany, and more than 100 were published by the Institute of Electrical and Electronic Engineers (IEEE), based in New York. Both publishers, which were privately informed by Labbé, say that they are now removing the papers.

Among the works were, for example, a paper published as a proceeding from the 2013 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, held in Chengdu, China. (The conference website says that all manuscripts are “reviewed for merits and contents”.) The authors of the paper, entitled ‘TIC: a methodology for the construction of e-commerce’, write in the abstract that they “concentrate our efforts on disproving that spreadsheets can be made knowledge-based, empathic, and compact”. (Nature News has attempted to contact the conference organizers and named authors of the paper but received no reply; however at least some of the names belong to real people. The IEEE has now removed the paper).

How to create a nonsense paper
Labbé developed a way to automatically detect manuscripts composed by a piece of software called SCIgen, which randomly combines strings of words to produce fake computer-science papers. SCIgen was invented in 2005 by researchers at the Massachusetts Institute of Technology (MIT) in Cambridge to prove that conferences would accept meaningless papers — and, as they put it, “to maximize amusement” (see ‘Computer conference welcomes gobbledegook paper’). A related program generates random physics manuscript titles on the satirical website arXiv vs. snarXiv. SCIgen is free to download and use, and it is unclear how many people have done so, or for what purposes. SCIgen’s output has occasionally popped up at conferences, when researchers have submitted nonsense papers and then revealed the trick.
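The idea behind a generator like SCIgen can be sketched in a few lines: pick productions at random from a small hand-written grammar and expand them into plausible-sounding sentences. This is a toy illustration only, not SCIgen's actual grammar or code; the vocabulary below is borrowed from the abstract quoted above.

```python
import random

# A toy context-free grammar in the spirit of SCIgen (not its real rules).
# Uppercase keys are nonterminals; anything not in the dict is a terminal.
GRAMMAR = {
    "SENTENCE": [["We", "VERB", "that", "NOUN", "can be made", "ADJ", "and", "ADJ"]],
    "VERB": [["disprove"], ["confirm"], ["demonstrate"]],
    "NOUN": [["spreadsheets"], ["e-commerce"], ["the Turing machine"]],
    "ADJ": [["knowledge-based"], ["empathic"], ["compact"], ["scalable"]],
}

def expand(symbol):
    """Recursively expand a grammar symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word itself
    production = random.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part))
    return words

def sentence():
    """Produce one grammatical-looking but meaningless sentence."""
    return " ".join(expand("SENTENCE")) + "."

print(sentence())
```

Every output parses as English, which is exactly why such papers slip past reviewers who skim for form rather than content.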


How many more are out there is unknown. Yes, science is somewhat self-correcting; but it is also prone to foolishness, even in papers that are semi-legitimate.

4 comments:

Robert Coble said...

One must always take the claim of "peer reviewed" with a massive dose of salt.

While working as a professional software engineer, I belonged to the Association for Computing Machinery (ACM), the premier organization for software professionals. (Never mind the emphasis on hardware in the name.) The monthly ACM magazine is called "Communications of the ACM".

While working as an expert on the Year 2000 problem, I found a technical article on the Y2K problem written by a professor at Duke University. My eagerness to find some academic assistance quickly turned into severe disappointment. The esteemed professor seemed to think that the "problem" was merely one of finding a date representation compact enough to include the century, instead of the existing practice of encoding only the last two digits of the year. He was quite proud to announce that his 56-bit binary scheme would enable dates for +/-30 BILLION YEARS to be encoded in the same space currently used by dates. Voila! End of Y2K problem!

I wrote to the ACM Editor-in-Chief about my disappointment with the article AND my disillusionment regarding the "peer review" that supposedly had been applied prior to publication of such a naive paper. I pointed out the following:

1. The professionals in the field had created the problem INTENTIONALLY because of the inordinate expense of mass storage in the early years of the computer business. If you couldn't fit the representation into the tiny memory available, you couldn't program a solution.

In the early days, if you could find a way to save 2 bytes per record, you were a "golden child" in the eyes of the "suits". The assumption (naive, I'll admit) was that as mass storage became less expensive, the software would be modified to accommodate the century. It didn't happen - because of a simple business "rule": there is no "value added" in rewriting existing software to fix a "potential" problem that is not going to add anything to this quarter's bottom line.

2. The professionals in the field were perfectly aware that a different representation (encoding the entire century) could be shoehorned into the same space. The "problem" is that all of the PROGRAMS would still have to be combed for the old representation and modified to accommodate the new representation. THAT (in a nutshell) WAS the Y2K problem.
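The failure mode described in these two points can be sketched in a few lines. This is an illustrative example, not code from any actual Y2K system; the function name is hypothetical.

```python
# Two-digit year storage, the common space-saving practice described above.
def years_elapsed_2digit(start_yy, end_yy):
    """Naive elapsed-time calculation using only the last two digits
    of the year, as countless legacy programs did."""
    return end_yy - start_yy

# Fine as long as both dates fall within the same century:
age = years_elapsed_2digit(65, 99)   # 1965 -> 1999 gives 34, correct

# Breaks the moment a date crosses into 2000:
age = years_elapsed_2digit(65, 0)    # 1965 -> 2000 gives -65, nonsense
```

Swapping in a wider representation is the easy part; the actual Y2K effort was hunting down every program that embedded the two-digit assumption, as point 2 says.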

The response was "Wow! I guess we were asleep when we reviewed that article! Do you want space to respond in next month's magazine?" I replied "No" but did allow her to send a copy of my email to the professor. I got a very snippy (frosty) reply from Herr Professor stating that he was "just trying to help." With "help" like that, we needed no help.

Major disillusionment with "peer reviewed" articles after that.

Stan said...

MmHm. And with professors.

Anonymous said...

What really surprises me is that IEEE has not deleted their publication of Remote Viewing Through Computer Conferencing.
For those who are not familiar with the term "remote viewing" or ESP, it is the ability to retrieve information at a distance without the use of the five physical senses, instead utilizing the mind.
More evidence for dualism!

Anonymous said...

Sorry, forgot to give the link

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1454641