
Contemporary evidence

Preliminary remarks: A case for the Rhine psychokinesis experiments was put forth by Gardner Murphy in The Challenge of Psychical Research, ch. VI, and by Pratt (Journal of Parapsychology, Volume 24 (1960), pp. 171-188). The case for criticism was put forth by Edward Girden in his 1962 article "A review of psychokinesis (PK)"; Murphy rebutted Girden, who then reaffirmed his position. A debate followed: Girden, E., Murphy, G., Beloff, J., Flew, A., Rush, J. H., Schmeidler, G., & Thouless, R. H. (1964). A discussion of "A review of psychokinesis (PK)." International Journal of Parapsychology, 6, 26-137. Rush, in Rush, J. H. (1977). Problems and methods in psychokinesis research. In S. Krippner (Ed.), Advances in Parapsychological Research 1. Psychokinesis (pp. 15-78). New York: Plenum, noted on p. 40 that "Murphy conceded that the early experiments were faulty, but noted that they were exploratory, not definitive. He charged that Girden had magnified the defects by ignoring the instances in which they had been avoided. He considered that Girden had exaggerated the value of the formal empirical control tests to compare the results of "wishing" with "not-wishing" for the target faces. The equivalent control, Murphy noted, was achieved by comparing scoring on the target face of the dice with that on nontarget faces in the same series of PK trials." Rush then noted that Girden raised the issue of dice bias on certain faces. This very issue was dealt with by Radin in his overview of the subject in The Noetic Universe (pp. 146-147), which accounted for it and put forth positive evidence nonetheless, and Rush himself noted in the aforementioned article that "as Murphy noted, it was applicable to a few series only, and had been recognized and allowed for in the original analyses (Rhine and Humphrey, 1944)."

Stanford wrote a process-oriented overview accepting the phenomena as valid in the 1977 Handbook of Parapsychology; Girden wrote a negative overview in the 1985 A Skeptic's Handbook of Parapsychology [though see Gerd Hövelmann's extremely important article in that text, "Skeptical Literature in Parapsychology: An Annotated Bibliography" (pp. 449-491, on p. 470), for some relevant comments on Girden and his misrepresentations of PK work].

Evidence for the phenomena incorporating relevant quality criteria was put forth by Radin & Ferrari in the 1991 article Effects of consciousness on the fall of dice: A meta-analysis.

Regarding random number generator (RNG) micro-PK experiments: first, the claim that Helmut Schmidt's experiments have not been independently replicated is utterly false. In his 1987 article "The strange properties of psychokinesis," he wrote, "So far, 27 other researchers at many different institutions have published PK studies with binary random generators." Later meta-analyses identified an even greater number of replications showing weak but consistent effects. Radin noted on p. 150 of The Noetic Universe that "[T]oday, most RNG experiments are based on Schmidt's original ideas and are completely automated [...]"

James Alcock wrote a great deal of critical commentary in the National Research Council report, though his views on Schmidt's work as expressed there have been disputed, including in the mainstream literature. Krippner noted a degree of sloppiness suggesting bigotry in Alcock's dismissal of the Maimonides dream telepathy work in New Frontiers of Human Science: A Festschrift for K. Ramakrishna Rao (McFarland, 2002), p. 134 - this should be kept in mind when evaluating his dismissal of other parapsychology work. Regarding the NRC report's criticisms of Helmut Schmidt's work, John Palmer noted, in his review of Alcock's Science and Supernature (Journal of Parapsychology, Volume 55, 1991, pp. 84-89), a book which contains the text of Alcock's NRC report: "The critiques of the remote-viewing and RNG research in the NRC background paper draw heavily on the writings of other critics. [of this, see earlier in this review for some counter-criticism] The most thorough and original treatment is given to Schmidt's research. Only Schmidt's first set of experiments with prerecorded targets and the recent experiment with independent observers get grudgingly good reviews, which considering the source, is tantamount to high praise. Many of Alcock's criticisms are more aesthetic than substantive. For example, he complains that the designs are sometimes needlessly complex and the same type of RNG is not used consistently. The most serious criticisms by Alcock concern the randomness of the RNG output. Alcock echoes Hyman's concern that Schmidt's randomicity tests may not be powerful enough to detect short periods of nonrandomness. He then adds a new twist that explains how this tendency could bias the results. He notes that Schmidt often waits until a subject begins scoring well before starting a formal session. If there is a short-term bias, Schmidt could innocently begin a formal test part way through the bias period and end it soon after the bias vanishes and the scoring rate declines. These biased segments could then accumulate and lead to a significant total score. This potential "optional-starting" artifact does not apply to many of Schmidt's experiments, particularly the more recent ones. For example, it is hard to see how it could affect studies in which the target sequences are derived algorithmically from random seed numbers because these numbers bear no simple relation to the relevant qualities of the output string. As Alcock points out in his book, one helpful approach in evaluating Schmidt's ESP studies would be to examine the actual target sequences. Happily, Schmidt saved the raw data from his early precognition experiment, which Alcock and Rao both featured in their BBS papers. Schmidt kindly supplied me with copies of these data on disk, and I am currently undertaking several analyses of the randomicity of the experimental target sequences. These analyses include (a) application of Good's Generalized Serial Test for the total trials completed by each subject in each of the two series up to the sextuplet level, and (b) summed chi-squares at both the singlet and doublet levels for successive blocks of sizes ranging from 24 to 200. I plan to report these analyses in due course, after consultation with Schmidt. So far, they provide little support for the optional-starting hypothesis."

Helmut Schmidt responded to some of the criticisms Alcock issued in a paper that followed his NRC report, in the 1987 Behavioral and Brain Sciences debate (p. 609), as follows: "Alcock's discussion of my work contains factual errors and misleading statements that might have been avoided by a more careful reading of Schmidt (1969b), the paper that forms the center of his arguments. Schmidt (1969b) describes the provisions against subject fraud, such as the automatic double recording by external punch tape and internal nonreset counters. Subject fraud would have required, apart from specific electronic knowledge, much undisturbed time for opening the bottom plate of the test machine and feeding in electric pulses in order to fool the internal counters as well as the external recorder. My paper also reports that I was personally present in all tests, with the exception of a small part of the sessions with one subject, and that the scores in these few sessions were not higher than the other scores. In none of my subsequently reported experiments was there any less stringent subject supervision. Nevertheless, Alcock flatly states that "subjects were usually unsupervised." Due to editorial restrictions, there was no space in my paper to go into the details of the randomness tests. The paper indicates, however, in footnote 4, that a detailed report on the random generator and the randomness tests is available (Schmidt 1969c). This report answers in particular Alcock's question about the frequency of 4s in the different experiments: Only in the part where a subject was trying to enforce the generation of 4s (lighting of the red lamp) did the generator produce a significant excess of 4s. The report provides an additional argument against temporary preferences of the machine for one target. A sequence of more than 4 million random numbers (1, 2, 3, and 4) had been collected from the machine in automated runs made between the sessions with the subjects. The frequencies of the individual events as well as of the next-neighbor correlations are listed on page 17 of the report. These numbers give no indication of nonrandomness. If the machine had a tendency toward a temporary bias for 4s, this should have led to an increased total count of 4s. And even if the bias occasionally shifted between the four outcomes, this should be evident in the correlation matrix. If the random generator had been a "black box" of unknown structure, it would have been desirable to extend the randomness tests to higher correlations. From the known, simple structure of the generator, however, one can argue that a malfunction leading to nonrandomness must already lead to anomalies in the counts of single events or next-neighbor correlations. Alcock's statement that I frequently changed the components and the design of my random generator also needs some comment: My random generator uses the random timing of radioactive decays as the basic source of randomness. A rapidly advancing binary counter (1 million steps per second) is stopped at the random arrival time of a signal from a Geiger counter. Then the lower bits of the stopped counter provide practically ideal random numbers. The randomness features of this generator have been discussed in detail (Schmidt 1970d). Because of its reliability and conceptual simplicity, I have used this type of random generator for nearly all of my experiments.
Following the progress of technology, the original circuit elements (not even manufactured anymore) were naturally replaced by more modern, integrated circuits that simplified the construction and maintenance. Alcock's feeling that my work lacks continuity - that I often "move on to a totally different situation" without refining the present measurements - may be more understandable. What needs to be done first, and what can be considered as a logical next step, depends much on personal taste and educational background. I certainly do not see this work as Alcock does: an "attempt to establish the reality of a nonmaterial aspect of human existence." For me, the underlying questions are very specific: Can we produce experimental evidence against a universality claim of current physics, in the sense that present physics gives the best possible description, even of systems that include human subjects? And if so, can we specifically say which of the conventionally accepted laws of physics fail in such systems, so that we can provide a solid foundation for a future theoretical framework? Even in a research effort that is very well focused, there will be some side roads. We will want to explore, for example, other random generators and other forms of feedback to be reasonably sure that we don't overlook other, possibly more efficient, approaches to psi testing. On the other hand, we have to be selective because each study takes much time and effort. It therefore often seems more reasonable to postpone the study of some details until we have pursued the main questions that should contribute most to our understanding of the overall picture. How my many experiments fit into the search for the overall picture may be seen best from review articles (Schmidt 1977; in press)."
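To make Schmidt's description concrete for readers: the mechanism he describes (a fast binary counter latched at the random arrival times of Geiger-counter pulses, with the low bits taken as output) can be sketched in a few lines of simulation. This is a minimal illustration under assumed parameters (a Poisson decay process, hypothetical rates and function names), not Schmidt's actual circuit.

```python
import random

def schmidt_style_rng(n_numbers, counter_hz=1_000_000, decay_rate_hz=100.0):
    """Illustrative simulation of a Schmidt-type RNG: a binary counter
    advancing at counter_hz is "stopped" at the random arrival time of a
    Geiger-counter pulse, and the low-order bits of the stopped counter
    are taken as the output (here 2 bits, giving four outcomes)."""
    t = 0.0
    numbers = []
    for _ in range(n_numbers):
        t += random.expovariate(decay_rate_hz)  # exponential gap to next decay
        counter = int(t * counter_hz)           # counter value when the pulse arrives
        numbers.append((counter & 0b11) + 1)    # low 2 bits -> outcomes 1..4
    return numbers

# Example: ten four-valued outputs, as in Schmidt's four-lamp machine
print(schmidt_style_rng(10))
```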

Alcock on pp. 631-632 addresses some of this and apologizes for his statement that the Schmidt work was unsupervised. He stated, "although Schmidt has been pretty consistent in using radioactive decay (or sometimes electronic noise) as the ultimate source of randomness, the apparatus used to "capture" this randomness has varied considerably, leading me to reiterate that the REG, which is more than just radioactive decay, has varied from experiment to experiment. Not only is a given generator not explored in a consistent way, but the series of experiments themselves do not build upon one another. Furthermore, in many of these studies, there are clear methodological shortcomings apart from concerns about randomness. For example, Schmidt has often served as his own subject, and in one case was really the only subject. In many of the studies, there are varying numbers of trials or sessions per subject, and these are usually combined. Schmidt has generally worked in isolation from others, and his raw data, with little exception, have not been available for scrutiny." He wrote on p. 632: "In his commentary, Schmidt describes all sorts of correlations and so on regarding the control runs of his generator. This does not tell us that at the time the subject was responding there was no bias. Would it not be considerably easier just to listen to critics like Hyman and Hansel (see my target article), and now Gilmore and Dawes, and do the studies in a way that eliminates this bias?"

Parker & Brusewitz, in their 2003 "Compendium of Evidence for Psi", stated: "Hansel and later James Alcock in a more specific form proposed that Schmidt’s results might have been due to his participants capitalizing on local biases in the target sequences. A study by John Palmer analyzed these sequences and rejected this hypothesis:

Palmer, J. (1996) Evaluation of a conventional interpretation of Helmut Schmidt’s automated precognitive experiments. Journal of Parapsychology, 60, 149-170. [this is in the precognition section - Palmer later tested another one of Alcock's hypotheses in a way more critical of Schmidt, but that still ended up rejecting Alcock's views]

The other main criticism, made by Hansel (1980), that Schmidt worked alone, was answered by Schmidt (1993, below), in which his highly successful results were independently observed and replicated.

Schmidt, H. (1993) Observation of a psychokinetic effect under highly controlled conditions. Journal of Parapsychology, 57, 357-372. [this summarized 5 such replications]

As for the Hyman and Hansel criticisms of the Schmidt experiments, they are dealt with by Rao & Palmer in their target article for the 1987 Behavioral and Brain Sciences debate, "The Anomaly Called Psi: Recent Research and Criticism," on pp. 543-544.

Mishlove noted, when discussing the experiments of the Princeton Engineering Anomalies Research laboratory, in the chapter of his book Roots of Consciousness regarding PK, "The Princeton team has gone to great lengths to try to ensure that their equipment is unbiased. Internal circuits are continually monitored with regard to internal temperature, input voltage, etc. Successive switching of the relationship between the sign of the noise and the sign of the output pulse on a trial-to-trial basis was done to provide a further safeguard against machine bias. Results were automatically recorded and analyzed. Extensive tests of the machine's output and its individual components were also carried out at times separate from the test sessions. The provision of baseline trials interspersed with test trials provided a randomization check which overcame some of the weaknesses of Schmidt's procedure.

Psi researcher John Palmer has drawn attention to the fact that there is no documentation regarding measures to prevent data tampering by subjects. This is of concern since the subject was left alone in the room during the formal sessions along with the REG.

In evaluating these studies, skeptic James Alcock claimed that only one subject (Operator 10) accounted for virtually all the significant departures from chance in the Princeton studies. Noting that details regarding precautions against subject cheating were not specified, Alcock stated:

"I am not trying to suggest that this subject cheated; I am only pointing out that it would appear that such a possibility is not ruled out. Had the subject been monitored at all times, such a worry could have been avoided or at least reduced."

The Princeton team has chosen a policy of keeping the identity of all experimental subjects anonymous -- among other reasons, in order to eliminate motivation for subjects to cheat. However, the fact that Subject 10 contributed considerably more to the database than any other subject suggests that this individual was either a member of the experimental team or someone who had become a close friend of the experimenters. As such, Subject 10 might well have had access to information which would make it possible to tamper with the data recording system.

In response to the criticisms of Palmer and Alcock, the Princeton researchers have prepared a detailed analysis of the equipment, calibration procedures and various precautions against data-tampering. According to the researchers, the automated and redundant on-line recording of data preclude data tampering -- as does the protocol requirement that the printer record be on one continuous, unbroken paper strip. It would appear that all necessary precautions have been taken, short of submitting subjects to constant visual observation. The subjects are submitted to intermittent visual observation which the researchers believe is sufficient to control against tampering with the equipment, given their particular setup.

In further response to Alcock's critique, the Princeton team conducted further analyses of the data which show that the anomalous RNG effects were contributed by most of the subjects, and were not dependent upon the scores of Subject 10. Several other subjects, who participated in fewer experimental trials, actually had scores with greater chance deviations. By analyzing the data from only the first series of 7,500 trials (1,500,000 binary digits) from each subject, it was possible to level the influence that Subject 10 exerted on the database. In this analysis, with each subject carrying an equal weight, the results were significantly beyond chance. Another analysis was conducted which eliminated all of the data from Subject 10. This, too, was statistically significant.

A comprehensive meta-analytic review of the RNG research literature encompassing all known RNG studies between 1959 and 1987 has been reported by Radin and Nelson, comprising over 800 experimental and control studies conducted by a total of 68 different investigators. The overall probability for the 597 experimental series was p < 10^-35, whereas 235 control series yielded an overall score well within the range of chance fluctuation. In order to account for the observed experimental results on the basis of selective reporting (assuming no other methodological flaws), it would require "file drawers" full of more than 50,000 unreported studies averaging chance results." (emphasis added)
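For readers wondering how a "file drawer" figure of this kind is obtained: one standard tool is Rosenthal's fail-safe N, which asks how many unreported null studies would have to sit in file drawers to drag a combined Stouffer Z down to the significance threshold. A minimal sketch under placeholder inputs (the numbers below are illustrative, not the actual Radin and Nelson data):

```python
import math

def fail_safe_n(z_scores, threshold_z=1.645):
    """Rosenthal's fail-safe N: the number of unreported null (z = 0)
    studies required to reduce the combined Stouffer Z of the observed
    studies to threshold_z (1.645 ~ p = .05, one-tailed).

    Combined Z with n extra null studies: sum(z) / sqrt(k + n)."""
    k = len(z_scores)
    total = sum(z_scores)
    return max(0.0, (total / threshold_z) ** 2 - k)

# Placeholder example: 597 studies averaging z = 0.5 apiece (illustrative only)
print(round(fail_safe_n([0.5] * 597)))  # tens of thousands of null studies needed
```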

Other criticisms of the Princeton Engineering Anomalies Research laboratory work were launched by CEM Hansel, and some minor criticisms were launched by others. Regarding these, the psychologist Roger Nelson, who knew the details of the project intimately, wrote to me in a personal communication, which he allowed me to post, addressing the criticisms on Wikipedia as of this edit (12/13/2014, 11:18 AM PST; Nelson's initial communication was on Sun., Nov. 9, 2014, at 11:04 AM PST):

"You can find good and useful material at http://www.skepticalaboutskeptics.org and https://sites.google.com/site/skepticalconcepts/ and at links these sites contain.

I'll go back to your earlier letter, but respond here to the present item. I will note that we now have long experience with wikipedia's cadre of dedicated skeptical editors. It is not possible in general to introduce corrections because they will be reverted by individuals who have a clear agenda to trash any articles related to parapsychology, anomalies research, etc., as well as articles about individual researchers.

Hansel's criticisms of the PEAR work were rebutted at the time. He did not know the experimental designs, and his grasp of statistical design was evidently weak. In particular, the PEAR psychokinesis experiments all had a tripolar design, with a high intention, low intention, and a baseline (no intention). The latter is a "control" series with exactly the same conditions as the experimental series -- but with no intention. In addition, the RNGs were calibrated outside (before, after) the experiments to determine and confirm their correct functioning and the empirical distribution parameters.

As for replication, the PEAR RNG experiments actually were already replications of other researchers' work, notably Helmut Schmidt. There were many other replications, as documented in meta-analyses which addressed the complete available (mostly published) databases in RNG and other kinds of psi or psychokinesis experiments.

Hansel's comments about lack of detail reflect his lack of scholarship. The PEAR lab produced dozens of published articles and dozens of technical reports on the experiments. Many of these reports focus exactly on parameters such as individual participants, their gender, and their choices (e.g., volitional vs. random assignment of intention conditions). For serious scholars, it is even possible for original laboratory notebooks to be examined. A case can be made that there is no better documented set of experiments in the literature.

Best, Roger"

Nelson continued (personal communication, Sun Nov. 19, 2014 at 1:00 PM PST): "As I have said elsewhere, we now have long experience with Wikipedia's cadre of dedicated skeptical editors. It has proven to be not possible in general to introduce corrections because they are all or nearly all reverted by individuals whose agenda is to trash any articles related to parapsychology, anomalies research, etc., as well as articles about individual researchers.

You ask in a separate note: Also, there is this criticism which is probably ridiculous, though it would be good to have a response to it, "Pigliucci has written that the statistical analysis used by Jahn and the PEAR group relied on a quantity called a "p-value" but a problem with p-values is that if the sample size (number of trials) is very large like PEAR then one is guaranteed to find artificially low p-values indicating a statistically "significant" result even though nothing was occurring other than small biases in the experimental apparatus."

I agree this is ridiculous, and anyone with a decent statistical education could answer it. As used by PEAR, a "p-value" is simply a descriptive answer to the question: What is the probability the result is due to chance fluctuation? As for the "guarantee" Pigliucci makes, it is a false statement. The deviation statistics do not change as a function of N, only the precision of estimates. While the variance of the test distribution becomes smaller, that simply increases the accuracy of estimates of summary statistics including the p-value. Pigliucci is suggesting a magical production of something out of nothing. That does not happen, but if there is a small but real something buried in a large amount of nothing (noise), then the large N can help identify it. This is commonly discussed as a signal to noise ratio. The S/N ratio in psi experiments is small, and that is why large N experiments are useful.
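Nelson's statistical point is easy to verify directly: under a true null the z-score stays of order 1 no matter how large N gets (large samples do not manufacture significance), whereas a real but tiny effect yields a z that grows roughly as the square root of N. A minimal simulation sketch, with an arbitrary toy effect size:

```python
import math
import random

def z_score(n_trials, p_hit):
    """z for n_trials binary trials against a fair-coin null."""
    hits = sum(random.random() < p_hit for _ in range(n_trials))
    return (hits - 0.5 * n_trials) / math.sqrt(0.25 * n_trials)

random.seed(1)
for n in (1_000, 100_000, 1_000_000):
    null_z = z_score(n, 0.5)    # true null: z stays of order 1 at any n
    psi_z = z_score(n, 0.505)   # tiny real effect: z grows ~ sqrt(n)
    print(f"n={n:>9}  null z={null_z:+.2f}  small-effect z={psi_z:+.2f}")
```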

I also note that all the PEAR experimental apparatus were well designed, competently calibrated, and employed in unimpeachable experimental designs. A salient example is the tripolar protocol: The same apparatus was subject to both a high and a low intention (and baselines). The figure of merit ultimately was a comparison of high minus low, which absolutely nullifies any effects of "small biases in the experimental apparatus" should any exist.
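The differencing logic of the tripolar protocol can likewise be checked numerically: a constant apparatus bias shifts the high and low conditions equally and cancels in the high-minus-low comparison, while a genuine intention effect (modeled here, purely hypothetically, as opposite shifts in hit probability) survives. A sketch with made-up bias and effect sizes:

```python
import random
import statistics

def condition_mean(n_trials, bias, effect, flips_per_trial=200):
    """Mean trial score (sum of binary samples) under a hit probability
    of 0.5 + bias + effect, where bias is a constant apparatus artifact
    and effect is a (hypothetical) intention effect."""
    p = 0.5 + bias + effect
    return statistics.mean(
        sum(random.random() < p for _ in range(flips_per_trial))
        for _ in range(n_trials))

random.seed(2)
BIAS = 0.002                             # small constant hardware bias
hi = condition_mean(5000, BIAS, +0.001)  # high-intention condition
lo = condition_mean(5000, BIAS, -0.001)  # low-intention condition
print(hi - lo)  # bias cancels; ~0.4 counts of "effect" remain (200 * 0.002)
```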

Now to your original questions, inline below: [...]

"CEM Hansel has a critique of Robert Jahn's PEAR work in "The

Search for Psychic Power" (1989), ch. 16 - He critiques "An experiment with large data base capability, III: Operator Related Anomalies" (Technical Note PEAR 84003) [which he refers to as Experiment B] and Robert Jahn's article "The Persistent Paradox of Psychic Phenomena" [which he refers to as Experiment A, or the first experiment].

His critique is as follows, and I wonder if any publication has addressed it:

"1. It would appear from the table of results that the experiment was not adequately preplanned. The number of trials varied with each series and with each section in each series. The numbers of trials vary in blocks of 50 as supplied by the automatic provision of 50 trials. The smallest number of trials for conditions PK+ or PK- was 1,750, the largest 8,900, or in terms of blocks of 50, varied between 35 and 178. Since the running average "from a preset origin" was available and seen by the operator, it would have been a simple matter to terminate trials at an appropriate point where either positive or negative

deviation was present."

The PEAR REG experiment was conceived as a long-term replication series. That is, it was not designed with a fixed number of trials. Instead, we specified the number of trials for each "series". This number was originally 5000 per intention, but that was changed to 2500 and ultimately (for the vast majority of the database) to 1000 trials per intention. I do not know where Hansel's numbers in this comment come from, but I can state that PEAR's experiment did not allow "optional stopping". Moreover, it can easily be demonstrated that, assuming the REG device is actually random and there is no effect of intention, optional stopping for individual experiments in the long-term replication cannot introduce a bias. See for example: https://sites.google.com/a/mindmattermapping.org/exchange/home/research-courses/pear-proposition and http://noosphere.princeton.edu/rdnelson/anova.html and Data Selection and Optional Stopping, York H Dobyns, Princeton University -- Presentation at the 24th Annual SSE meeting, May 19-21 2005
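The optional-stopping point can be illustrated as well: if every trial from every series is retained in the pooled database, the optional stopping theorem implies that even a "stop when ahead" rule cannot bias the pooled hit rate under the null. A minimal sketch, using a stop-when-ahead rule of my own invention rather than any actual PEAR protocol:

```python
import random

def stopped_series(max_trials=1000, stop_lead=10):
    """One series of fair binary trials, ended early if the running
    deviation from chance reaches +stop_lead (an optional-stopping rule).
    Returns (hits, trials); all data are kept, as in a pooled database."""
    hits = trials = 0
    while trials < max_trials:
        hits += random.random() < 0.5
        trials += 1
        if 2 * hits - trials >= stop_lead:
            break
    return hits, trials

random.seed(3)
total_hits = total_trials = 0
for _ in range(20_000):
    h, t = stopped_series()
    total_hits += h
    total_trials += t
print(total_hits / total_trials)  # stays ~0.500: no pooled bias from stopping
```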

"2. A satisfactory control series was not employed.If the

experiment had been confined to the one condition (PK+) and the trials interspersed randomly in which the subject did not see the indicators or attempt to exert his PK efforts, the condition being employed could have been unknown to whoever was obtaining the data until it had been compiled and made public. Alternatively, information regarding the series being presented could have been signaled to the computer and print-out by the subject before the start of

each run."

As noted before, the tripolar design provides unimpeachable "control" by requiring participants to produce deviations in both directions away from expectation. In addition, the internal baseline (the third pole) provides "control" data within exactly the same conditions as the experimental data. Finally, the PEAR devices were subjected to rigorous calibrations, to determine empirical distribution parameters and to confirm operation to meet design criteria.

"3. The procedure employed was unsatisfactory in the first

experiment, where a single operator was employed who was both experimenter and subject. Instructions (verbatim) as given to the subject are not stated. The subject was confronted with a complete assemblage of apparatus containing a number of dials and switches. He had to make at least one adjustment or make a record before starting each trial or block of trials to show the condition under which he was operating. No details are provided of how this was

done or the forms used to record large amounts of data."

"In the earliest experiments, before the scale of the databases that would eventually be produced was appreciated, operators were allowed to choose secondary parameters (run length, sampling rate, instructed or volitional intention assignment) ad libitum from session to session within a given series." (Dobyns & Nelson, 1998) However, all conditions and parameters were pre-specified, as was the length of the series, so these qualify as formal experiments. In any case these data are included in the complete database, as a matter of proper procedures. They have been analysed separately as well as an included part of the full database. Independent analyses have been made to ascertain whether there is any statistical evidence these data are compromised and the answer is no. See Dobyns, Y.H. (2000). Overview of several theoretical models on PEAR data. Journal of Scientific Exploration 14 (2), pp. 163–194. and Dobyns, Y.H., and Nelson, R.D. (1998). Empirical evidence against Decision Augmentation Theory. Journal of Scientific Exploration 12 (2), pp. 231–257.

He writes in his conclusion on PK experiments, "Jahn's experiment at Princeton required a vast increase in the number of events that were to be influenced in order to achieve a result comparable to that of Schmidt. In addition, it was unsatisfactory in its method and its procedure. In the case of the second experiment at Princeton, it appears to have been realized that it is necessary to include more than one investigator; but in the course of improving the experimental conditions the number of observations then had to be increased into the millions rather than the thousands in order to obtain a diminished effect."

There seems to be no question here beyond those treated above, but there is an incorrect inference: "required a vast increase in the number of events". The large number of trials was a design intention from the beginning, and a recognition of the small S/N ratio expected in such experiments based on prior art and experience.

Best wishes, Roger"

Critics are also fond of quoting Robert Park's dismissals of Radin, Jahn, and PEAR as put forth in Voodoo Science: The Road from Foolishness to Fraud. I corresponded with Dean Radin on this; as he has specific expertise in this field, his comments are noteworthy. He stated to me (personal communication, May 21, 2014): "I've [sic] love to have an ultramicrobalance to play with. No one has used one in this domain (to my knowledge) because such instruments are expensive. I can justify a $30K EEG system in my lab because it can be (and has been) used for many experiments. I can't justify spending that amount on a speculative experiment that may be used once.

Lack of testable theory is partially correct. The correct part is the lack of a general theory that directly links physics with psi, or for that matter with subjective awareness. The wrong part is that there are plenty of small scale theories that test physical and psychological factors thought to modulate psi performance.

The "trueness" of an RNG is assessed using standardized testing suites such as DIEHARD and the NIST tests. This is mentioned in many studies that used RNGs. In addition, beyond such tests many RNG experiments use the same RNGs under no-influence conditions as empirical controls."

As regards skeptical replication, Radin wrote in Entangled Minds (Simon and Schuster, 2009), pp. 284-285:

Consider the case of Stanley Jeffers, a skeptical physicist from York University. In 1992, Jeffers tried to repeat PK experiments similar to those reported by the Princeton Engineering Anomalies Research (PEAR) Laboratory. He wasn’t successful. His skepticism was fueled by another PK study he reported in 1998, which also failed. Then, in 2003 Jeffers coauthored a third study in which he finally reported a repeatable, significant PK effect. So, can skeptics produce successful experiments? Yes, they can. They just hardly ever try.

As regards the Global Consciousness Project, which developed out of the RNG-PK experiments, Richard Samson provided insight: http://www.wfs.org/blogs/richard-samson/signs-connected-consciousness-detected-global-scale

"After 16 years of monitoring more than 480 world events, researchers report strong evidence of some kind of transpersonal mentality that seems to emerge when many people share a common concern or experience. At such times, a global network of devices employing quantum tunneling has found weak but definite signs of coherence arising out of background "noise" or randomness. [...] The detection system is a global network of random number generators (RNGs) based on quantum tunneling. Up to 70 are active at any one time. Each RNG outputs a continuous stream of completely unpredictable zeroes and ones. The stream ordinarily averages out at 50% ones and 50% zeros, just as flipping coins tends to produce roughly equal heads and tails over time. The RNG data are transmitted to a central archive for later analysis.

When events engage millions of minds and hearts at once, structure seems to emerge out of what would otherwise be randomness. The global network departs a bit from the normal generation of random ones and zeros. The RNGs' behaviors become slightly correlated and the system as a whole appears to shift toward coherence. [...] All efforts to invalidate the data or the conclusions have so far failed. For example, the team compared earthquakes that occurred under the ocean to those occurring on land. The prediction was that only the land-based quakes would produce a significant effect, since quakes at sea have hardly any impact on people. The RNG readings validated this prediction. [...] Nelson supplies a more nuanced picture of the research in a Q-and-A exchange with the author:

SAMSON: You have said your research reveals signs of a conscious influence, but it's not something that should be called mental "signals." Could you explain?

NELSON: Technically, we analyze the random data produced by the devices, and we discover weak or subtle signals emerging from the noise that the devices are designed to produce. The actual nature of the signals is correlations that should not exist between the devices. So the devices have not actually "found" signals; they are the source of the noise or signals that sometimes arise during moments of shared emotion and functional mass consciousness.

SAMSON: How, exactly, are the periods of mass consciousness manifested in the data?

NELSON: What distinguishes such moments is that the behaviors of the far-flung RNGs become slightly correlated. We don't look directly at the proportion of ones or zeros. We just calculate a composite measure of variance across all the RNGs in the network. That variance measure responds to mean shifts in either direction and more so if two or more RNGs shift in the same direction. It is as if a (non-energetic) field blanketed the whole earth and thus all the RNGs in the GCP network, and subtly modulated their behavior.

The basic effects we see in the data are more complicated than changing the percentage of ones and zeros, and indeed we don't look for such changes directly. What we measure is best described as correlated changes of the RNG behavior. That is, the RNGs, which are designed to be fundamentally independent random sources, and which are moreover separated by global distances (average separation 7,000 kilometers), ought to have completely independent behaviors. BUT, what we see is that they tend to become very slightly correlated during great world events.

These "random" devices become slightly synchronized, apparently as if in sympathy with the synchronized mass consciousness. In many ways, this is more remarkable and more instructive than simple deviations or extra bits here and there. But it is considerably more complicated to explain than an excess of heads when flipping coins.

SAMSON: What's the overall importance of the findings, in your view?

NELSON: In the end, the important thing is that the network of physical instruments becomes different; it changes, apparently because of interconnected or synchronized consciousness. The essential thing is that consciousness is not separate from the world, just looking on. It is enmeshed and instrumental, and we should know that so we can use it, to get on with becoming the sheath of intelligence (noosphere) for the earth. That is our terribly important work as human beings -- our evolutionary destiny, as Teilhard de Chardin put it."

The "skeptical" literature has as usual brought about confusion on this issue. Simon Fraser, in a dialogue with Dean Radin, whose expertise is in this area, stated: "Hi Dr Radin. I've been looking at the global consciousness project, and one argument I often see levelled at it is that it apparently wasn't independently replicable or the effects were due to cherry picking. I'm guessing this is nonsense."

To which Radin replied, "> I'm guessing this is nonsense.

Your guess is correct.

The new GCP website (http://global-mind.org/) has an enormous amount of information about the project. The following page in particular describes the scientific background and design: http://global-mind.org/science2.html

Cherry picking and/or data snooping does not take place in the formal hypothesis testing. There are a number of exploratory analyses, but those are clearly labeled as such.

The jury is out regarding independent replication of the GCP, but for a very simple reason: There aren't any other projects like this. However, conceptual replication for smaller scale events has been achieved and documented in many publications."

The article by Nelson & Bancel, "Effects of Mass Consciousness: Changes in Random Data During Global Events" (Explore (NY). 2011 Nov-Dec;7(6):373-83. doi: 10.1016/j.explore.2011.08.003), clarified this, noting on p. 375: "PROCEDURE: To set up a formal test, we first identify an engaging event. The criteria for event selection are that the event provides a focus of collective attention or emotion, and that it engages people across the world. Thus, we select events of global character but allow for variation in their type, duration, intensity, and emotional tone. In practice, events are chosen because they capture news headlines, involve or engage millions of people, or represent emotionally potent categories (eg, great tragedies and great celebrations). Once an event is identified, the simple hypothesis test is constructed by fixing the start and end times for the event and specifying a statistical analysis to be performed on the corresponding data. The statistic used for most events is a measure of network variance. It is calculated as the squared Stouffer’s Z across RNGs per second, summed across all seconds in the event. These details are entered into a formal registry before the data are extracted from the archive. We select and analyze an average of two or three events per month. The selection procedure allows exploration, whereas the replication design provides rigorous hypothesis specification for each event.

Because the project is unique, with no precedents to provide information on relevant parameters, we began with guesses and intuitions about what might characterize suitable, informative events. Field research on group consciousness [11-14] suggests that synchronization or coherence of thought and emotion may be important factors, so we typically select major tragedies and traditional celebratory events that bring large numbers of people together in a common focus. Although many observers assume we can and should follow a fixed prescription to identify “global events,” this is not straightforward. To give specific examples, we could select a disaster if it results in, say, more than 500 fatalities. But this would likely exclude slow-moving but powerfully engaging events such as volcanic eruptions or major hurricanes, and it would fail to identify emotionally powerful, extremely important incidents like the politically disruptive attack that destroyed the Golden Dome Mosque in Iraq in February 2006, but killed relatively few people. What we try to do is to identify, with the help of correspondents around the world, events that can be expected to bring large numbers of people to a shared or coherent emotional state. The following is a partial, illustrative list of criteria that we use for event selection, with examples:

1. Suddenness or surprise. Terror attacks, especially when they galvanize attention globally.
2. Fear and compassion. Large natural disasters, typhoons, tsunamis, earthquakes.
3. Love and sharing. Celebrations and ceremonies like New Years, religious gatherings.
4. Powerful interest. Political and social events like elections, protests, demonstrations.
5. Deliberate focus. Organized meetings and meditations like Earth Day, World Peace Day.

Experience has led to considerable standardization, and for some kinds of events, predefined parameters can be applied. For example, events that repeat, such as New Years, Kumbh Mela, or Earth Day, are registered with the same specifications in each instance. For unexpected events, such as earthquakes, crashes, bombings, the protocol typically identifies a period beginning at or near the moment of occurrence, followed by time (typically six hours) for the spreading of news reports. About half the events in the formal series are identifiable before the fact. Accidents, disasters, and other unpredictable events must, of course, be identified after they occur.

To eliminate a frequent misconception, we do not look for “spikes” in the data and then try to find what caused them. Such a procedure, given the unconstrained degrees of freedom, is not statistically viable. There is no data mining, and there is no post hoc inclusion or exclusion of events. All events are entered into the formal experiment registry before the corresponding data are extracted from the archive. For details, see http://noosphere.princeton.edu/pred_formal.html. The analysis for an event then proceeds according to the registry specifications. All registered events are reported, whatever the outcome." (emphasis added)
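To make the registered statistic concrete: as the quoted passage states, the network variance measure is the squared Stouffer's Z across the reporting RNGs for each second, summed over all seconds of the event; under the null it is chi-square distributed with one degree of freedom per second. A minimal sketch of that computation (the input layout is my assumption, not GCP code):

```python
import math

def network_variance(z_by_second):
    """GCP-style event statistic: z_by_second[s] holds the per-RNG
    z-scores for second s. Returns (statistic, df); under the null the
    statistic is chi-square with one degree of freedom per second."""
    stat = 0.0
    for second in z_by_second:
        stouffer = sum(second) / math.sqrt(len(second))  # Stouffer's Z across RNGs
        stat += stouffer ** 2                            # squared, summed over seconds
    return stat, len(z_by_second)

# Toy example: 3 seconds x 4 RNGs of per-second z-scores
z = [[ 0.1, -0.3,  0.2, 0.0],
     [ 1.1,  0.9,  1.3, 0.7],   # a second where the RNGs shift together
     [-0.2,  0.4, -0.1, 0.3]]
print(network_variance(z))
```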

A list of conceptual replications is provided here.

On to important papers:

Radin & Ferrari (1991). Effects of consciousness on the fall of dice: A meta-analysis.

May et al (1980). Electronic system perturbation techniques.

Jahn (1982). The persistent paradox of psychic phenomena: An engineering perspective.

Schmidt (1987). The strange properties of psychokinesis.

Radin & Nelson (1989). Evidence for consciousness-related anomalies in random physical systems.

Schmidt (1990). Correlation between mental processes and external random events.

Schmidt (1993). Observation of a psychokinetic effect under highly controlled conditions.

Bierman (1996). A statistical database of all major RPK experiments.

Jahn et al (1997). Correlations of Random Binary Sequences with Pre-Stated Operator Intention: A Review of a 12-Year Program.

Nelson et al (2002). Correlations of continuous random data with major world events.

Crawford et al (2003). Alterations in Random Event Measures Associated with a Healing Practice.

Freedman et al (2003). Effects of Frontal Lobe Lesions on Intentionality and Random Physical Phenomena.

Bierman (2004). Does consciousness collapse the wave function?

Dobyns (2004). The megaREG experiment: Replication and interpretation. (in "Reexamining Psychokinesis" (below), Radin et al note of this study, "the experiments in question unequivocally demonstrate the existence of a PK effect that is contrary to conscious intention, of high methodological quality, and established to very high levels of statistical confidence.")

Jahn & Dunne (2005). The PEAR Proposition.

Bösch et al (2006). Examining Psychokinesis: The Interaction of Human Intention With Random Number Generators—A Meta-Analysis (the authors refute their own proposal that publication bias could explain the results on p. 515 of this text)

Radin et al (2006). Reexamining psychokinesis: Commentary on the Bösch, Steinkamp and Boller meta-analysis.

Kugel (2011). A Faulty PK Meta-Analysis (argues that the Bösch et al meta-analysis is faulty, contains the following relevant comments - p. 59: "As mentioned in the beginning of this paper, ESP studies need target sequences that correspond to chance expectation, whereas PK studies measure the deviation of target sequences from chance expectation. Therefore, if ESP studies are merged with PK studies in a PK-MA, which measures the deviation from chance, the overall result should more closely approximate random expectation when more ESP studies are added. In the BSB-MA, 40 of the 302 studies were ESP studies. This significantly reduces the overall result of the PK studies", and on p. 61: "BSB arbitrarily excluded genuine “intentional” PK studies and included “nonintentional” ESP studies (such as mine). This might explain a z-score of +3.59 (excluding the 2004 Dobyns et al. data), which is much lower than in the previous three PK meta-analyses")

Radin et al (2006). Assessing the Evidence for Mind-Matter Interaction Effects

Radin (2006). Experiments testing models of mind-matter interaction.

Radin et al (2008). Effects of Distant Intention on Water Crystal Formation: A Triple-Blind Replication. (as regards the reception of this study, see this).

Radin (2008). Testing nonlocal observation as a source of intuitive knowledge.

Nelson & Bancel (2011). Effects of mass consciousness: Changes in random data during global events.

Radin et al (2012). Consciousness and the double-slit interference pattern: Six experiments

Shiah & Radin (2013). Metaphysics of the tea ceremony: A randomized trial investigating the roles of intention and belief on mood while drinking tea.

Radin et al (2013). Psychophysical interactions with a double-slit interference pattern

Baer (2015). Independent verification of psychophysical interactions with a double-slit interference pattern. (ABSTRACT: "Approximately 50 GB of data from interference patterns from double slit experiments suggesting psychophysical interactions recently reported in this journal [Radin et al., Phys. Essays 26, 4 (2013)] were independently analyzed at our Nascent Systems Inc. (NSI) facilities. The method of analysis sought to avoid complex statistical analysis as much as possible and look for simple correlations between mental efforts by participants and directly observable effects on the photon counts of minima seen in the interference patterns. The mean and standard deviation of the normalized percent difference counts between mental concentration and relaxation periods in 1435 trial files analyzed was 0.00185% and 0.0574, respectively. The same difference effect in control data generated by Robots with no Human involved was - 0.000475% and 0.06, respectively. These analyses of the interference pattern showed a small but consistent bias that indicates human thought interaction with material may be measurable and real. However, physical noise in the system is too high to justify such conclusions. In addition, a speculative outline of a psychophysical theory is provided that suggests specific instructions designed to guide the mental concentration, which may amplify the physical interference effect in future experiments. Before concluding that a psychophysical effect has been measured by these experiments, it is recommended that experiments using a substantially lower noise power supply, timing control, and instructions based on applicable quantum interpretations be conducted.")

Radin et al (2015). Reassessment of an independent verification of psychophysical interactions with a double-slit interference pattern. (Abstract: "In an independent analysis of data from a double-slit experiment designed to investigate von Neumann-like psychophysical interactions, Baer [Phys. Essays 28, 47 (2015)] concluded that shifts in interference pattern minima showed a small but consistent effect in alignment with what we had previously reported. But because the standard deviation of those measurements was large compared with the mean, Baer concluded that the optical system was not sufficiently sensitive to provide convincing evidence of a psychophysical effect. However, this type of assessment should rely on standard error, not standard deviation. When the proper statistic is employed, Baer's calculations show a modest but statistically significant deviation of the central minima in data contributed by human observers (p = 0.05, one-tail), but not in sessions contributed by robot “observers” (p = 0.26, one-tail). In addition, when considering the central minimum along with eight other minima, the human-observed grand mean was significantly larger than the robot-observed grand mean (p = 0.008). Thus, Baer's independent analysis confirmed that the optical apparatus used in this experiment was indeed sensitive enough to provide evidence for a psychophysical effect.") [on the standard error point, see the sketch at the end of this list]

Radin et al (2015). Psychophysical interactions with a single photon double-slit optical system.

Radin et al (2016). Psychophysical modulation of fringe visibility in a distant double-slit optical system.
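Finally, on the standard error point in the Baer and Radin et al (2015) exchange above: whether a mean shift is statistically reliable is judged against the standard error, SD divided by the square root of n, which shrinks as trials accumulate, not against the raw standard deviation. A sketch with placeholder numbers (not the actual experimental values):

```python
import math

def one_sample_z(mean, sd, n):
    """z-score of a sample mean against a null mean of zero, using the
    standard error sd / sqrt(n) rather than the raw standard deviation."""
    standard_error = sd / math.sqrt(n)
    return mean / standard_error

# Placeholder numbers: a shift far smaller than the SD can still be
# reliable once n is large, because the SE shrinks with sqrt(n).
print(one_sample_z(mean=1.0, sd=10.0, n=400))  # SE = 0.5, z = 2.0
```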