The following is the text
of Department of Defense Polygraph Institute report DODPI94-R-0008, A
Comparison of Psychophysiological Detection of Deception Accuracy Rates
Obtained Using the Counterintelligence Scope Polygraph and the Test for
Espionage and Sabotage Question Formats. This report is available from the
Defense Technical Information Center (DTIC) as report #ADA319333, and was also
reprinted in the American Polygraph Association quarterly, Polygraph,
Vol. 26, No. 2 (1997), pp. 79-106.
The text presented here is
taken from the Polygraph reprint. Page numbers are provided between
curly braces for citation purposes. Typographical errors in the reprint
have been corrected.
{79}
A COMPARISON OF PSYCHOPHYSIOLOGICAL DETECTION
OF DECEPTION ACCURACY RATES OBTAINED USING THE COUNTERINTELLIGENCE SCOPE
POLYGRAPH AND THE TEST FOR ESPIONAGE AND SABOTAGE QUESTION FORMATS
By
Department of Defense Polygraph Institute
Research Division Staff
June 1995
Abstract
This study was designed to compare the decision accuracy rates obtained using a
new psychophysiological detection of deception test, the Test for Espionage and
Sabotage (TES), to those obtained using two versions of the counterintelligence
scope polygraph (CSP) format: the CSP format using probable lie control (PLC)
questions (CSP-PLC), and the CSP format using directed lie control (DLC) questions
(CSP-DLC). The TES format differs from the CSP formats in that: (a) the number
of issues being tested in a question series is reduced; (b) a maximum of three
question repetitions is used to calculate question scores; (c) between-test
stimulation is eliminated; (d) the order of questions within the question
sequence cannot be altered; (e) each relevant question is compared to the same
control questions; (f) the pretest is brief, more standardized, and follows a
logical sequence of information presentation; and (g) problems associated with
PLC questions are reduced by using DLC questions. The 277 examinees included in
the analyses were recruited from the communities surrounding Ft. McClellan, Alabama.
{80}
Federal agencies use three
basic types of psychophysiological detection of deception (PDD) examinations:
pre-employment, security screening, and specific-issue criminal examinations. Several
authors have summarized the research conducted to assess the validity of
specific-issue criminal examinations (Ansley, 1990; Kircher, Horowitz, &
Raskin, 1988; McCauley & Forman, 1988; Raskin, 1989). Little research has
been conducted with either pre-employment or security screening examinations. For
security screening examinations, the majority of Department of Defense (DoD) agencies utilize the counterintelligence scope
polygraph (CSP) format. Although CSP security screening examinations are widely
used--the DoD reported 17,970 examinations conducted during
fiscal year 1993 (Department of Defense, 1993)--the analog studies to date
suggest that when the CSP format is utilized, 94.9% of the programmed innocent
(PI) examinees are correctly identified, but only 43.2% of the programmed
guilty (PG) examinees are correctly identified.
Using four different
security screening formats, Barland, Honts, and Barger (1989) assessed the
accuracy of decisions identifying examinees as guilty or innocent of enacting a
mock crime. Examiners from four government agencies conducted PDD examinations
utilizing their agency's format. The formats included a CSP format in which
standard probable lie control (PLC) questions were asked (CSP-PLC), a CSP
format in which directed lie control (DLC) questions were asked (CSP-DLC), and
two variations of relevant-irrelevant (R/I) formats. The authors reported that
97.2% of all the PI examinees were correctly identified but only 33.7% of all
the PG examinees were correctly identified. Differences among the decision
accuracies obtained by examiners utilizing the four formats were significant. Decisions
based on the results of CSP-PLC tests were the least accurate (8%) in
identifying PG examinees, and decisions based on the results of CSP-DLC tests
were the most accurate (48%) in identifying PG examinees.
However, there were several
flaws in the design and analyses of the study. The most critical flaw concerns
how the authors reported correct decisions. If, based on the test results, the
examiner's decision was that deception was indicated, the examiner attempted to
obtain a confession from the examinee. If the examiner was unsuccessful in
obtaining a confession, another examination was conducted. If the examiner's
decision, based on the second examination, was that no deception was indicated,
then the reported decision for that examinee was no deception indicated. Therefore,
if a PG examinee was correctly identified but did not confess, and the result
of the second test indicated the examinee was truthful, the examinee was
reported as a miss. The rationale for this procedure was to simulate field
situations. In most screening examinations, when an examinee's physiological
responses to the relevant questions indicated deception, the examinee was
questioned and then the examination was rerun.
There is concern regarding
whether the psychological significance of the relevant questions for an
examinee in a mock laboratory situation is equivalent to the psychological
significance of the relevant questions for an examinee in an actual field
examination (Furedy, 1986; Iacono & Patrick, 1988). Furedy (1986) and
Iacono and Patrick (1988) suggest that the psychological significance of the
relevant questions is less for examinees in a mock laboratory situation. There
is little research concerning the effect of retesting, in a mock
situation, on the
{81}
accuracy of PDD test decisions. However, it
is well established that physiological responses to a less significant stimulus
habituate faster (O'Gorman, 1977; Sokolov, 1963). Therefore, due to rapid
habituation of a less significant stimulus, fewer PG examinees might be
identified with repeated testing. In the Barland et al. (1989) study,
once the PG examinees knew they had been caught (i.e., confronted with
the initial deceptive decision)--even if they did not confess--the
psychological significance of the relevant questions may have been reduced. Therefore,
if the results of only the first test had been considered, more PG examinees would have been
correctly identified than the reported results indicate.
Another criticism of the
study concerns the wording of the relevant questions. Examinees were asked if
they had committed espionage or sabotage "against the United States." Because
the PG examinees had enacted only mock scenarios, they had committed no actual
offense against the United States, and the relevant questions, so worded, may
have lacked psychological significance for them.
The only other laboratory
study concerned with screening, conducted by Honts (1989), was not designed to
test the validity of the CSP format but to compare the accuracies of decisions
identifying PI and PG examinees using two different sets of relevant questions.
Eighty-nine percent of the PI examinees were correctly identified but only 58%
of the PG examinees were correctly identified. No difference was found between
the accuracies of the decisions as a function of the two sets of questions. A
detailed report of the research was not written; therefore, it is difficult
to evaluate the study. However, one possible problem
with the design was that the PG examinees were allowed only 10 minutes to
execute a complex scenario that included memorizing a lengthy article. Another
problem with the study may have been the way the examiners' decisions were reported.
The report states, "The CSP examinations were administered just as if they
were being given in the field." (Honts, 1989, p. 4).
This suggests, but does not state, that decisions were reported in the same
manner as in the Barland et al. (1989) study. If decisions were reported
in that manner, then the accuracy of the decisions identifying the PG examinees
may have been higher than indicated by the report.
Although the reported results
of the first study are suspect, combined with the results of the second study
they suggest that decisions based on CSP test data are not highly accurate in
identifying PG examinees--at least in a laboratory situation. This study was
therefore completed to compare the accuracy of decisions obtained concerning PG
and PI examinees' veracity using a new screening test format, the Test
for Espionage and Sabotage (TES), to that obtained using the CSP-PLC and
CSP-DLC question formats.
{82}
TES Development
Theoretical Basis: Significance/attention model
The relationship between arousal and attention is well established for
respiratory (Obrist, 1981; Sokolov, 1963), electrodermal (Dawson & Schell,
1982; Dawson, Schell, Beers, & Kelly, 1982; Kilpatrick, 1972; Kimmel, van
Olst, & Orlebeke, 1979; Nikula, 1991; Ohman, 1979), and cardiovascular
(Coles & Duncan-Johnson, 1975; Coles & Strayer, 1985; Jennings, 1986a;
Jennings, 1986b) measures of arousal. When a significant change in sensory
stimulation occurs, attention shifts to focus on the input. If the significant
change is perceptual (increased volume, novel stimulus, etc.), the attention
shift is referred to as an orienting response (OR). The physiological arousal
associated with an OR is well documented.
Several studies have
indicated clearly that the act of lying is not a necessary condition for a PDD
examination to yield accurate results (Davis, 1961; Dawson, 1980; Gustafson
& Orne, 1965; Kugelmass, Lieblich, & Bergman, 1967; Orne, Thackray,
& Paskewitz, 1972; Thackray & Orne, 1968; Waid, Orne, & Wilson,
1979). This author would argue, further, that PDD tests detect neither deception
nor guilt, but merely reflect relative degrees of physiological arousal. The
scores assigned based on the physiological arousal indicate only that the
examinee focused attention more when one type of question (relevant or control)
was asked than when the other type of question was asked. The greater
focusing of attention indicates a more significant stimulus. Therefore, the
physiological responses are used to infer a focusing of attention due to the
significance of the questions. However, inferences about why the relevant
questions are significant must be made cautiously. One reason (the most probable
one) that the relevant questions are significant to an examinee is that the
examinee is being deceptive.
The design of this study
and the development of the TES format were based on the previous
hypothesis--the test does not assess deception, but merely indicates a relative
degree of arousal from which significance of the stimulus is inferred. Therefore,
the decisions for the TES are: (a) significant responding (SR) occurred
following the relevant questions, (b) no significant response (NSR) occurred
following the relevant questions, or (c) the responding following the relevant
questions was inconclusive (INC). It is the examiner's job to eliminate,
{83}
during the pretest, possible confounding reasons that would cause the relevant
questions to be significant to the examinee. Then if the examinee does respond
physiologically when the relevant questions are asked, the examiner must
ascertain why the relevant questions were significant to the examinee.
Control Questions
The standard control
question is the PLC, in which the examiner manipulates the examinee so the
examinee's answer to the question probably is a lie. The major difficulty with
using the PLC is the need to increase the psychological significance of the PLC
for the examinee. The examinee must believe that responses elicited by the PLC
and relevant question are equally important. The examiner must be skillful
enough to increase sufficiently, but not too much, the significance of the
control question. This requires the examiner to be able to "read" the
examinee's stress level and know what level is appropriate. Additional problems
associated with the PLC include: (a) the perception that they are intrusive,
offensive, and/or embarrassing to some examinees (due to the nature of the
associated psychological manipulations); (b) the occasional difficulty of
developing a PLC which excludes all aspects of the relevant issue; and (c)
possible difficulty associated with maintaining the psychological significance
of PLC questions during repeated testing (as sometimes occurs with security
screening examinations). DLCs eliminate most, if not all, of the problems
associated with PLCs because: (a) they require little or no psychological
manipulation, (b) they are easy to explain and it is easy to justify their
purpose, (c) the examinee can readily answer them and the veracity of the
answer is not in question, (d) they are less sensitive to examiner competence
(no psychological manipulation is required), (e) the questions and the procedures
for introducing them to the examinee are easily standardized, (f) they are not
personally intrusive, so they are not offensive or embarrassing, and (g) they
can be constructed so they do not overlap the relevant issues.
Based on the positive
results of research with DLC questions (Abrams, 1993; Barland, 1981; Honts
& Raskin, 1988; Horowitz, 1989; Raskin & Kircher, 1990; Reed, 1990;
Reed, 1995), and the numerous advantages of the DLCs, it was decided to
include, in the TES format, DLC questions rather than PLC questions. However,
each study which used DLCs implemented the DLCs differently regarding: (a)
when, during the examination, the acquaintance test was conducted; (b) the
rationale for conducting the acquaintance test; (c) the rationale for including
the DLC questions; (d) the construction of the DLC questions; and (e) how the
DLC questions were pretested. With respect to these issues, the following
decisions were made, and incorporated into the TES testing procedures. The
acquaintance test (ACQT) is a standard known solution numbers test (Department
of Defense Polygraph Institute, 1994a). During a TES examination, the ACQT is
conducted prior to the review of the relevant and control questions. The
rationale for this is that, when the ACQT is presented immediately after the
explanations of the instrumentation and the physiology and prior to question
review, it enhances the logical flow of the pretest. The examinee is told that
the purpose of the ACQT is to: (a) demonstrate the examination process to the
examinee, (b) allow the examinee to become accustomed to the components and
procedures, (c) allow the examiner an opportunity to adjust the instrument, and
(d) allow the examiner to make sure the examinee is physiologically capable of
responding when lying. In
{84}
addition, the rationale for including the
DLCs can be explained more easily and efficiently if the ACQT has been conducted
previously. The examiner refers to the ACQT during the explanation of the
purpose of the DLC questions--to make sure the examinee continues to respond
physiologically when lying, just as occurred on the ACQT. The examinee is told
that if she or he does not continue to respond physiologically when lying, the
test results will be inconclusive.
The DLCs are pretested in
the following standardized manner. The examiner minimizes the behavior to be
discussed ("this is something we all have done"). The examiner asks
the DLC question being pretested ("Have you ever violated a minor traffic
law?") and obtains a verbal commitment from the examinee that the examinee
has, in fact, engaged in such behavior. If the examinee denies having engaged
in the behavior, the examiner is required to utilize a different DLC. Examiners
are not permitted to try to convince the examinee, with examples or suggestions, that the examinee had engaged in the behavior. Once
a verbal commitment is obtained, the examinee is asked to think of a specific
occasion during which the examinee engaged in the behavior. The examinee is
instructed not to tell the examiner about the incident but only to think about
it. Examiners are not allowed to suggest that the examinee think about the most
recent or most significant incident, but only an incident. After the examiner
has obtained a verbal commitment that the examinee has a specific incident in
mind, the examiner repeats the question and instructs the examinee to think
about the specific incident and then to lie by answering "no" to the
question. Finally, the examinee is instructed that when the question is asked
during the test, the examinee is to think about the incident and then lie by
answering "no."
Examiners are instructed to
be sure that the examinee actually thinks about the incident, because cognitive
processing results in increased physiological activity.
Question Sequence
Reed (1995) reported
research on a new format in which the relevant and control questions were
repeated within the same question sequence. However, unlike standard control
question PDD tests, the question sequence was asked only once. The results
indicated that, when using a mock screening paradigm to program examinees
guilty, 81.5% of the PI examinees and 73.9% of the PG examinees were correctly identified.
With minor revisions, the TES format was developed directly from this previous
research. The sequence contains two different irrelevant (IR1 and IR2)
questions, two different control (C1 and C2) questions, two different relevant
(R1 and R2) questions and a sacrifice relevant (Sr) question. The question
sequence is IR1 IR2 Sr C1 R1 R2 C2 R1 R2 C1 R1 R2 C2.
{85}
Pretest Phase
A standardized pretest was
developed. The examinee is given a brief introduction to the procedures and
asked to sign a form indicating his or her consent to the PDD examination. Next,
the examinee's physical suitability to undergo the examination is assessed. Then
the operation of the polygraph instrument is explained and a brief explanation
of physiological responding is given. Next, the ACQT is introduced as an
opportunity to demonstrate the procedures to the examinee and to assess the
examinee's physiological suitability. After the ACQT is conducted and the
results are presented to the examinee, the test questions are reviewed with the
examinee. The examiner reviews, with the examinee, the three relevant questions
(including the sacrifice relevant question), the two control questions, and the
two irrelevant questions, in that order. The precise meaning and intent of each
relevant question is explained so the examinee fully understands what behaviors
the question includes. If the examinee has any problem understanding a relevant
question, alternative relevant questions are available.
Testing Phase
During the administration
of the TES, the examiner is not allowed to insert, into the question sequence,
more than two irrelevant questions in succession. The inter-question interval
(question onset to question onset) is 20 to 30 seconds, with an average of 25
seconds. If a physiological response occurs at the appropriate time following a
question but could have been caused by other factors (movement, orienting
response to an outside noise, etc.), it is referred to as an artifact and the
question cannot be scored. If an artifact occurs during the asking of a
TES relevant question, in order to provide three scorable physiological
responses for that question, the examiner is required to conduct a "short
test" with the following question sequence: IR1 IR2 Sr C1 R1 R2 C2.
Test Scoring
Tests are scored using the
7-point scoring criteria taught at the Department of Defense Polygraph
Institute (DoDPI), in which the relative strength of the physiological
responses to a relevant question is compared to the relative strength of the
physiological responses to a control question (DoDPI, 1994b). A positive score
is assigned if the physiological responses to the control question are greater
than those to the relevant question. A negative score is assigned if the
physiological responses to the relevant question are greater than those to the
control question. In most PDD security screening tests, each relevant question
is compared to the stronger (relatively more responding) of the two control
questions that bracket it. However, based on previous research (Reed, 1995), the
first repetition of the first control question is not used when scoring a TES
examination. The physiological responses to the first repetition of R1 and R2
are compared only to the physiological responses to the first repetition of the
second control question. There are three scores for each relevant question,
because each relevant question is repeated three times. The three scores are
summed to provide one score for each relevant question. If a short test was
conducted, only the relevant question to which the artifact occurred is scored.
The other relevant question on the short test is not scored.
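To make the scoring rules concrete, the following minimal sketch (an illustration in Python, not code or procedure from the study) encodes which control-question repetition each repetition of a relevant question is compared against, and how the three per-repetition scores are summed. The notation 1C1, 1C2, 2C1, and 2C2 denotes the first and second repetitions of the two control questions; the assumption that the second and third repetitions are compared to the stronger of their bracketing controls follows the general screening-test practice described above.

# A minimal sketch of TES question scoring, based on the description above.
# Notation: "1C2" = first repetition of control question C2, etc.

# Control repetition(s) against which each repetition of a relevant
# question (R1 or R2) is scored. 1C1 is never used for scoring; the
# first repetition of R1/R2 is compared only to 1C2, and later
# repetitions are assumed to be compared to the stronger (more
# responsive) of the two bracketing controls.
COMPARISONS = {
    1: ("1C2",),
    2: ("1C2", "2C1"),
    3: ("2C1", "2C2"),
}

def question_score(rep_scores):
    """Sum the three per-repetition 7-point scores (each -3 to +3)
    into a single score for one relevant question."""
    if len(rep_scores) != 3:
        raise ValueError("three artifact-free repetitions are required")
    return sum(rep_scores)

# Example: repetition scores of +1, 0, and +2 yield a question score of +3.
print(question_score([1, 0, 2]))  # -> 3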
{86}
Decision Criteria
Multi-issue
examinations, in which different relevant questions address separate issues,
typically require a score of +3 or greater for each relevant question for an
NSR decision to be rendered. This criterion is based on the belief that each question is related to
a separate issue and therefore should be treated separately. However, research
suggests that when an SR decision is rendered, the strongest physiological
responses are not always to the question to which the examinee is being
deceptive (Barland, 1981; Barland et al., 1989; Correa & Adams,
1981; Raskin, Kircher, Honts, & Horowitz, 1988). These studies reported
that the accuracies of the decisions for detecting deception decreased when
responding to specific questions was assessed. Thus, an examinee who committed
sabotage might respond physiologically to a question regarding the disclosure
of classified information, but not to a question regarding sabotage. Therefore,
decision criteria should be based on the test as a whole, not on responses to
individual questions. Reed (1995), thus, adopted the following decision
criteria for scoring TES examinations. An NSR decision is rendered if the
scores for both questions are positive and they sum to +4 or greater. An SR
decision is rendered if the score for either question is -3 or less or if the
scores for both questions are -2 (total score of -4). If the scores do not meet
either the NSR or the SR criteria, the decision is INC.
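Expressed as a small function (a sketch of the stated rules, not code from the study), the original TES decision criteria are:

def tes_decision(r1, r2):
    """Apply the original TES decision criteria to the summed scores
    for the two relevant questions on a sub-test."""
    if r1 > 0 and r2 > 0 and r1 + r2 >= 4:
        return "NSR"  # both scores positive and their sum is +4 or greater
    if r1 <= -3 or r2 <= -3 or (r1 == -2 and r2 == -2):
        return "SR"   # either score is -3 or less, or both scores are -2
    return "INC"      # neither criterion is met

# Examples of the stated rules:
print(tes_decision(2, 2))    # NSR (sum of +4, both positive)
print(tes_decision(2, -3))   # SR  (one score of -3)
print(tes_decision(-2, -2))  # SR  (both scores -2, total score of -4)
print(tes_decision(3, 0))    # INC (one score not positive)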
Standardization
Other aspects of the TES
format also were standardized. First, the number of artifact-free questions
required to calculate a score was standardized. With many PDD formats, the same
decision criteria (-3 or less for a deceptive decision) are utilized to reach a
decision, whether the score was calculated from two repetitions of the
questions or from five repetitions (Department of Defense Polygraph Institute,
1994c; Honts & Raskin, 1988; Horowitz, 1989; Raskin, 1982). Examiners using
the TES format are required to calculate scores from the physiological
responses to three artifact-free repetitions of each question. Second, the
sequence in which the questions are asked was standardized. With many PDD formats,
the sequence of questions is repeated multiple times (usually three). With each
repetition, the examiner might change the sequence in which the questions are
asked. Federal examiners are allowed to modify the sequence of the questions
based on their subjective opinions (Department of Defense Polygraph Institute,
1992), whereas Raskin and his colleagues (Honts & Raskin, 1988; Horowitz,
1989; Kircher & Raskin, 1988) systematically and objectively modify their
question sequence. The sequence of TES questions is not repeated. Therefore,
there is no option to modify the question sequence. Third, between successive
repetitions of the question sequence, some examiners interact with the examinee
by discussing the examinee's perception of the questions (Horowitz, 1989;
Podlesny & Raskin, 1977; Raskin, 1982). This form of interaction is not
standardized. The question sequence is not repeated with the TES format; therefore,
there is no opportunity for between-test interaction. Finally, the dialogue for
administering each of the individual components of the pretest was standardized
by providing explicit outlines and examples. This includes: (a) the
administration of the ACQT (as described above), (b) the rationale and
presentation of the DLC questions (as described above), (c) the explanations
regarding the polygraph instrument and the
{87}
physiological responses, and (d) the logical
sequencing of the presentation of these components of the pretest.
Methods
Examinees
Three hundred and six
examinees were recruited by a local employment agency under contract to the
Department of Defense Polygraph Institute and were paid $30.00 for their
participation. Individuals who met the following criteria were excluded from
participation: (a) less than 19 or more than 60 years of age, (b) not in good
health, (c) pregnant, or (d) did not have the equivalent of a high school
diploma. One hundred thirty-nine male (age: M = 26.7 years, SD = 7.8) and
167 female (age: M = 28.2 years, SD = 8.8) examinees were scheduled for
testing. There were 69 PI and 33 PG examinees assigned to the CSP-PLC group, 70
PI and 32 PG examinees assigned to the CSP-DLC group, and 67 PI and 35 PG
examinees assigned to the TES group.
Examiners
Twelve certified examiners
(11 males and 1 female) from the Office of the Secretary of the Air Force
(OSAF) and 6 (5 males and 1 female) from the United States Army Intelligence
and Security Command (USAINSCOM) conducted the examinations. The examiners had
an average of 6.5 years of experience, with a range of 1.5 to 19 years. Selection
of the examiners was determined by the agencies. Although examiner selection
was not random (selection criteria generally involve availability and
experience), the examiners were considered representative of the CSP examiner
population. Examiners were assigned randomly to administer one of the three PDD
formats, with the restriction that a format was utilized by two INSCOM and 4
OSAF examiners. Examiners received four hours of training to familiarize them
with the format, pretest, scoring rules and control questions to be used. They
conducted two practice examinations before conducting an examination for the
project. Each examiner completed two 4-hour examinations (morning and
afternoon) on seven days and one 4-hour examination on three days for a total
of 17 examinations each. The examiners were not given any information regarding
the base rates. They did not receive feedback regarding the accuracy of their
decisions until the end of the study, and they were blind as to whether the
examinee was PG.
Apparatus
The examiners used standard
field polygraph instruments manufactured by either Lafayette or Stoelting. Standard
respiratory, electrodermal, and cardiovascular responses were recorded. The
electrodermal component was operated in the manual mode. The examinations were
conducted individually in large (20 x 20) rooms in a building located on Ft. McClellan.
The scenarios used to program examinees guilty were enacted in another building
located approximately two miles from the examination building. There were no video recording devices or one-way mirrors in the
examination rooms. The examinations were audiotaped.
{88}
Scenarios
The PG examinees enacted
one of four mock scenarios. Each scenario was representative of one of the four
relevant questions. The "espionage" scenario required one examinee to
steal a classified document from an office and give the document to a second
examinee. The second examinee received the document and placed it inside a
vehicle located in the parking lot. Examinees who enacted the
"sabotage" scenario stole either a classified document or a
classified computer disk. The examinee either put the document through a paper
shredder or cut the disk into pieces with a pair of scissors. An examinee who enacted the "unauthorized contact" scenario
was asked to meet with a German agent who was sitting in a car in the parking
lot. The agent requested that the examinee obtain some classified information
to be given to the agent at a later time. During the enactment of the
"unauthorized disclosure" scenario, the scenario setter was called
out of his office midway through briefing the examinee regarding some
classified computer information. A third person, who appeared to be fixing a
window screen, entered the office and attempted to engage the examinee in
conversation regarding what the examinee had been told. All PG examinees
received $100.00 as payment for their participation in the "crime." In
addition, all PG examinees wrote a statement indicating that "for the
purposes of this project" they had engaged in espionage, sabotage,
unauthorized contact, or unauthorized disclosure, depending on which scenario
they enacted.
The author did not believe that fear or guilt could be instilled in the
examinees. Therefore, the scenarios were not intended to convince the examinees
that they had done anything wrong. However, because it is assumed that
physiological responses occur during PDD examinations due to the
significance of the questions, an attempt was made to make the relevant
questions significant to the examinees through cognitive means as well as
through the behavioral component (i.e., their actual participation in
the scenario). Therefore, the scenario setters colluded with the examinees to
"beat" the examiners.
Formats
Three separate PDD formats
were employed. Currently, four different relevant questions, each of which is
asked once within the sequence of questions, are included in the CSP-PLC
format. The sequence of questions is repeated three times, with a short break
between each repetition. Probable lie control questions are included in the
question sequence. The CSP-DLC format is identical to the CSP-PLC format except
the control questions are DLCs rather than PLCs. During this study, examiners
who utilized the CSP formats (PLC or DLC) followed the guidelines established
and taught at the DoDPI (1992) with respect to the administration and scoring
of the test, and the decision criteria applied to the scores.
Four relevant questions are
also repeated three times during the TES examination format, but the sequence
of questions differs substantially from that used during the CSP examination. Two
of the four relevant questions are repeated three times each during the first
series of questions (TES sub-test A) and the other two relevant questions are
repeated three times each
{89}
during the second series of questions (TES
sub-test B). The TES examination format requires two question series, rather
than the three series used during a CSP examination. The examiner does not
mention the sub-test B relevant questions until after sub-test A testing has been completed (including any post-test
discussions). Either the same or different DLCs may be utilized on the two sub-tests. During
this study, if the decision based on the results of the first sub-test was SR,
then the second sub-test was not conducted, and if the decision for either
sub-test was SR, the examinee was considered SR.
Questions.
Barland et al. (1989) were criticized for using relevant questions
containing the phrase "against the United States." Based on the
significance/attention model proposed earlier, a test will be more accurate if
the examinee's attention is focused on the actual issues being tested. Field
examiners apply this principle to test question construction by including
"time bars" or "situation bars" which are caveats to a
question. The caveats are intended to direct the examinee's attention to a
specific time or situation (e.g., prior to 1993 have you ever ... ). Therefore, in order to focus the PG examinee's
attention on the mock scenario, the caveat "during this project" was
added to each question. The same four relevant questions were asked with each
format. The relevant questions were: (a) Sabotage,
"Have you committed an act of sabotage during this project?" (b)
Espionage, "Have you committed an act of espionage during this
project?" (c) Disclosure, "During this project, have you provided
classified material to an unauthorized person?["]
and (d) Contact, "During this project, have you had unauthorized contact
with a foreign national?" During the pretest phase of each examination,
the precise meaning of each relevant question was explained to the examinee. In
order to standardize the control questions, a list of ten PLC questions was
developed for use with the CSP-PLC format and a list of ten DLC questions was
developed for use with the CSP-DLC and TES formats. The specific DLCs were
chosen because they: (a) concerned trivial behaviors (e.g., minor
traffic violations), (b) were questions that were not likely to appear
personally intrusive, and (c) were questions that did not overlap with the
relevant issue. The same sacrifice relevant (Sr) question ("Regarding the
project security questions, do you intend to answer truthfully?") and the same
list of four irrelevant questions were used by all examiners.
Procedures
During each session,
eighteen examinees were given information regarding the research project, their
participation, and the PDD examination. If they agreed to participate, they
signed a form indicating their consent to participate in the research project. The
examinees were taken in groups of two either to another building to be
programmed guilty, or to the testing site. The PG examinees received
information regarding the purpose of the scenario and signed an additional consent
form indicating their agreement to participate in the scenario. After they
enacted one of the scenarios, they were transported to the testing site. The
transportation of the examinees to the testing site was timed so that the examiners
were not able to discern which examinees were PI and which were PG.
The examinations were
conducted, and each examiner provided a numeric score and a decision (SR, INC,
NSR) based on the numeric score for each test. The decisions were rendered
{90}
according to the decision criteria for the
format utilized. An NSR decision concluded the examination (NSR to both
sub-tests for the TES format). If the decision was INC, the examiner briefly
discussed the questions with the examinee to determine if the examinee
understood the questions. Then, the test was administered again. If, based on
the data from the second test, the examiner's decision was INC, then the decision for that examinee was INC. When the
examiner rendered an SR decision, the examiner confronted the examinee with the
results.
Programmed guilty examinees
were instructed to confess their guilt if they were confronted by the examiner,
but not to reveal any details of their activities. Once a PG examinee
confessed, the examination was concluded. However, a PI examinee who responded
significantly to the relevant questions--a false positive (FP) decision--was
questioned by the examiner to determine if there was a legitimate,
"real-world" explanation for the examinee's physiological response to
the relevant questions. The examiner recorded any information provided by the
examinee and concluded the examination. Two examiners, otherwise not involved
with the study, independently evaluated the information obtained from the
examinees who received FP decisions. If the two examiners agreed that the
information was significant enough to justify the examinee's physiological
responding--a false positive decision with justification (FPWJ)--then that
examinee's data were not included in the original data analyses. All of the
examinees tested during a session were debriefed simultaneously. Examinees who
participated in mock scenarios returned the $100.00.
Data Reduction and Analyses
The data from 277 examinees
were included in the analyses. The remaining 29 examinees were excluded for the
following reasons: Eight PG examinees confessed their guilt to the examiner
prior to the examination; six examinees were not medically suitable to be
tested; four examinations were incomplete; three examinees were DoDPI employees;
and eight FPWJ examinees were excluded. The differences in the number of
excluded examinees in each of the three groups were not significant.
If the scoring based on the
physiological responding during an initial test resulted in an inconclusive
decision and a second test was conducted, unless otherwise indicated, only the
result of the second test was included in the analyses. The percentages of
correct and incorrect decisions were calculated for each group. Simple
proportionality tests were conducted to determine if differences between sets
of percentages were significant. Unless otherwise stated, the significance
criterion was set at .05 using a two-tailed probability distribution.
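The report does not give the formula for these proportionality tests. The following sketch assumes a standard pooled two-proportion z-test; under that assumption it closely reproduces the statistics reported below (e.g., z = 2.28 for the TES versus CSP-PLC comparison of PG detection with inconclusive decisions excluded, using the Table 1 frequencies of 25/30 and 15/27).

from math import sqrt, erf

def two_proportion_z(hits1, n1, hits2, n2):
    """Two-tailed z-test for the difference between two independent
    proportions, using the pooled estimate of the common proportion."""
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF
    return z, 2 * (1 - phi)                  # two-tailed p value

z, p = two_proportion_z(25, 30, 15, 27)      # TES vs. CSP-PLC, PG hit rates
print(f"z = {z:.2f}, p = {p:.3f}")           # z = 2.29, p = 0.022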
Three examiners who did not
conduct any of the examinations each scored a different third of the tests
conducted with each of the three formats. The blind raters rendered decisions
based solely on their scoring of the recorded physiological reactions, whereas
the original examiners' scoring, and therefore their decisions, might
have been influenced by their interactions with the examinees.
{91}
Results
The major finding was that, when a conclusive decision was made (i.e.,
inconclusive decisions were excluded), the decisions of the examiners who administered
the TES format were significantly more accurate (83.3%) in identifying the PG
examinees than were the decisions of the examiners who administered either the
CSP-PLC (55.6%) or the CSP-DLC (58.6%) format. There were no significant
differences among the accuracies of the examiners' decisions in identifying the PI
examinees.
Original Examiners' Decisions
Table 1
Number of Correct Decisions, Inconclusive (INC) Decisions, and Errors Made by
the Examiners in Identifying Programmed Guilty and Programmed Innocent Examinees

                                Decisions
Format            Correct         INC         Errors

Programmed Guilty Examinees
CSP-PLC             15a            2            12
CSP-DLC             17a            2            12
TES                 25b            0             5

Programmed Innocent Examinees
CSP-PLC             61             1             3
CSP-DLC             59             4             3
TES                 48             2             6

Note: Frequencies within columns with different subscripts are significantly
different from each other at p < .05.
The number of correct
decisions, inconclusive decisions, and errors made by the examiners are
presented in Table 1. The accuracies of the decisions obtained using the three
formats to identify PG examinees (inconclusive decisions excluded) were 83.3%,
55.6%, and 58.6% for the TES, CSP-PLC, and CSP-DLC formats, respectively. The
accuracies of the
{92}
decisions identifying PG examinees were
significantly different among the three formats. When inconclusive decisions
were excluded from the analyses, the decisions made by the examiners who
administered the TES format were significantly more accurate in identifying the PG
examinees than were the decisions made both by examiners who administered the
CSP-PLC format (z = 2.28, p = .022) and by examiners who
administered the CSP-DLC format (z = 2.09, p = .036). Similarly,
when inconclusive decisions were included in the analyses, the decisions made
by examiners who administered the TES format were significantly more accurate
in identifying the PG examinees than were the decisions made both by examiners who
administered the CSP-PLC format (z = 2.60, p = .009) and by
examiners who administered the CSP-DLC format (z = 2.40, p =
.016). The accuracies of the decisions identifying PG examinees for the two CSP
formats, including and excluding inconclusive decisions, were not significantly
different. The accuracies of the examiners' decisions identifying the PI
examinees were not significantly different among the three formats. The
accuracies of the decisions of the three formats identifying PI examinees
(inconclusive decisions excluded) were 88.9%, 95.3%, and 95.2% for the TES,
CSP-PLC, and CSP-DLC formats, respectively.
Eight FPWJ examinees were
excluded from the previous analyses because two independent judges agreed that
the examinees' explanations regarding their cognitive activities during the test
were sufficient to justify physiological responding to the relevant questions. When
those eight examinees are included in the analyses of the accuracies of
decisions identifying PI examinees, the percent accuracies are 81.4%, 95.3%,
and 90.8% for the TES, CSP-PLC, and CSP-DLC formats, respectively. The
accuracies of the examiners' decisions in identifying PI examinees were
significantly different among the three formats, both when inconclusive
decisions were excluded from the analyses, and when inconclusive decisions were
included in the analyses. Decisions made by examiners who administered the
CSP-PLC format were significantly more accurate identifying the PI examinees
than were the decisions made by examiners who administered the TES format
(excluding inconclusives, z = 2.43, p = .015; including
inconclusives, z = 2.48, p = .013). The accuracies of the
decisions identifying PI examinees were not significantly different for
examiners who administered the CSP-DLC format compared to the decisions of
examiners who administered either the TES or the CSP-PLC formats.
Blind Raters' Decisions
The number of correct
decisions, inconclusive decisions, and errors made by the blind raters are
presented in Table 2. Sample sizes are smaller than in Table 1 because the blind
raters scored as INC some examinations that the original examiners scored as
conclusive (SR or NSR). Additional testing would have been required for the
blind raters to reach a decision; therefore, these examinations were not
included in the blind raters' decisions. There was no statistically significant
difference among the numbers of examinations omitted from each format
group.
The decisions of the blind
raters were significantly more accurate in correctly identifying PG examinees
when the data were collected with the TES format (81.0%) than were their
decisions when the data were collected with either the CSP-PLC format (57.2%)
or the CSP-DLC
{93}
format (42.9%). The differences among the
accuracies of the blind raters' decisions identifying the PG examinees were
significant both when inconclusive decisions were excluded from the analyses
(TES vs. PLC, z = 2.01, p = .04; TES vs. DLC, z
= 2.54, p = .011) and when inconclusive decisions were included in the
analyses (TES vs. PLC, z = 1.97, p = .05; TES vs.
DLC, z = 2.84, p = .004). The accuracies of the blind raters'
decisions in identifying the PI examinees were not significantly different
among the three formats. The accuracies, based on the blind raters' decisions,
of the three formats in identifying PI examinees (inconclusive decisions
excluded) were 88.5%, 93.2%, and 94.4% for the TES, CSP-PLC, and CSP-DLC
formats, respectively.
Table 2
Number of Correct Decisions, Inconclusive (INC) Decisions, and Errors Made by
the Blind Raters in Identifying Programmed Guilty and Programmed Innocent Examinees

                                Decisions
Format            Correct         INC         Errors

Programmed Guilty Examinees
CSP-PLC             12a            1            11
CSP-DLC              9a            4            12
TES                 17b            1             4

Programmed Innocent Examinees
CSP-PLC             55             3             4
CSP-DLC             51             5             3
TES                 46             0             6

Note: Frequencies within columns with different subscripts are significantly
different from each other at p < .05.
Interrater Reliability
Pearson correlation
coefficients were calculated between the numeric scores of the original
examiners and the numeric scores of the blind raters, for each format, to
determine interrater reliability. Within each format, a separate correlation
coefficient was calculated using the data
{94}
from each of the four relevant
questions. The correlation coefficients are listed, by format and question, in
Table 3. In addition, the reliability of the categorical decisions (SR, NSR,
INC), based on the numerical scores of the original examiners and the blind
raters, was high for each format. The percent agreements were 89% (Kappa = .76,
t = 6.9), 89.5% (Kappa = .70, t = 7.7), and 89% (Kappa = .73, t
= 6.7) for the TES, CSP-PLC, and CSP-DLC formats, respectively. All of the
reliability measures were significant (p < .0001).
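Both reliability measures are standard and can be computed with common statistical libraries. The sketch below assumes the reported Kappa is Cohen's kappa and that scipy and scikit-learn are available; the score and decision arrays are hypothetical placeholders, not the study's data.

from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired numerical scores for one format and question.
examiner_scores = [4, -3, 5, 0, -6, 2, 7, -1]
rater_scores = [3, -4, 6, 1, -5, 2, 5, -2]
r, p = pearsonr(examiner_scores, rater_scores)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")

# Hypothetical paired categorical decisions (SR, NSR, INC).
examiner_calls = ["NSR", "SR", "NSR", "INC", "SR", "NSR", "NSR", "INC"]
rater_calls = ["NSR", "SR", "NSR", "INC", "SR", "NSR", "NSR", "SR"]
print(f"Cohen's kappa = {cohen_kappa_score(examiner_calls, rater_calls):.2f}")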
Inconclusive Decisions
The
percentages of PI examinees who were retested due to INC decisions when the
examiners administered the TES (either sub-test), CSP-PLC, and CSP-DLC formats
were 21.4%, 23.1%, and 19.7%, respectively. The percentages of PG examinees who
were retested due to INC decisions when the examiners administered the TES
(either sub-test), CSP-PLC, and CSP-DLC formats were 13.3%, 10.3%, and 29.0%,
respectively. The percentages of INC decisions were not significantly different
among the three formats.
Table 3
Pearson Product Moment Correlation Coefficients Calculated Between the Original
Examiners' Numerical Scores and the Blind Raters' Numerical Scores to Each
Question

                              Question
Format      Espionage    Sabotage    Disclosure    Contact
TES            .82*         .84*        .77*         .78*
CSP-PLC        .82*         .88*        .80*         .87*
CSP-DLC        .78*         .89*        .86*         .87*

*p < .0001
Confounding Variables
There were no significant
differences in the distributions of PG examinees or PI examinees among the
examination formats as a function of either ethnic origin or gender. In
addition, inferential statistical analyses conducted to determine whether the
number of PG examinees
{95}
participating in each scenario differed
significantly among testing formats indicated that the differences were not
significant.
Physiological Response
Scores to Specific Questions
To ensure that no question
elicited stronger physiological responses from the examinees than any other
question, the PI examinees' numerical scores for each question were analyzed
with a Quade non-parametric repeated-measures analysis. The relative strengths
of the PI examinees' physiological responses to the four questions were not
significantly different from one another. To determine if the PG examinees'
physiological responses were greatest to the question specific to the scenario
they previously enacted, the PG examinees' numerical scores for each question
were analyzed with a Quade non-parametric repeated-measures analysis. The data
from PG examinees who had enacted different scenarios
were analyzed separately. Therefore, four separate analyses were performed, one
for each scenario. The data from PG examinees who were administered the TES
format were not included in these analyses, because many of those examinees
were not administered the second sub-test. The physiological responses to the
question specific to the scenario previously enacted were significantly
stronger, relative to the responses to the other three questions, only when the
sabotage scenario had been enacted [Quade (3, 15) = 5.39, p < .01].
Table 4
The Number of Programmed Guilty Examinees, Administered a CSP Examination,
with the Most Negative Score for Each Question

                                 Question
Scenario       Espionage    Sabotage    Disclosure    Contact
Espionage*         1            0            1           6
Sabotage**         0            6            0           0
Contact*           0            1            7           2
Disclosure         0            1            2           5

Note: Analyses tested the significance of the distribution within each
scenario.
* p < .01. ** p < .001.
{96}
The data in Table 4 are
frequency distributions in which the columns are the question to which the PG
examinee received the most negative score (strongest physiological response),
and the rows are the scenario in which the PG examinee participated. The data
include only CSP (PLC and DLC) examinations and only examinations in which the
strongest negative score was -3 or less (i.e., true positive results). To
determine whether PG examinees' physiological responses were stronger to the
question specific to the scenario they had enacted, rather than to any other
question, the data in Table 4 were analyzed using the chi-square test. Four separate
chi-square statistics were calculated, one for each scenario. The distributions
were significantly different from chance for the espionage [χ²(3)
= 11, p < .015], sabotage [χ²(3) = 19, p < .001], and contact
[χ²(3) = 11.6, p < .01] statistics.
When the PG examinees had enacted either the sabotage or the contact scenario, their
strongest physiological responses were usually to the question related to the
scenario they had enacted. The same trend was true for the disclosure scenario,
but the effect was not significant. However, when the PG
examinees had enacted the espionage scenario, their physiological responses were
usually stronger to the "disclosure" question than they were to the
"espionage" question. Overall, 59% (75%, if the data from the
espionage scenario are not included) of the examinees responded most strongly
to the question specific to the scenario previously enacted. However, 8 of the
32 examinees received a score of -3 or less to at least two questions, and for 3
of those examinees neither response was to the question specific to the
scenario (espionage) previously enacted.
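The per-scenario chi-square tests can be reproduced from the Table 4 frequencies. The sketch below assumes a one-way chi-square test against a uniform chance distribution over the four questions (the report does not state its expected frequencies); under that assumption the espionage and contact statistics match the reported values exactly and the sabotage statistic approximately.

from scipy.stats import chisquare

# Observed frequencies from Table 4; question order:
# espionage, sabotage, disclosure, contact.
table4 = {
    "Espionage": [1, 0, 1, 6],
    "Sabotage": [0, 6, 0, 0],
    "Contact": [0, 1, 7, 2],
    "Disclosure": [0, 1, 2, 5],
}

for scenario, observed in table4.items():
    stat, p = chisquare(observed)  # expected frequencies default to uniform
    print(f"{scenario:10s} chi-square(3) = {stat:5.2f}, p = {p:.3f}")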
Development of New TES
Scoring and Decision Criteria
The data from the current
study were utilized to determine if different scoring and decision criteria
would yield more accurate results and/or fewer inconclusive decisions. The sets
of decision criteria are listed in Table 5. The data were reevaluated, using
each set of decision criteria, once when the data were scored using the
physiological responses following the first repetition of the first control
question (1C1) and again when the physiological
responses to 1C1 were not used for scoring purposes. In general, decision
criteria that were less stringent for assigning an NSR decision resulted in
slightly higher accuracies in identifying PI examinees and slightly lower
accuracies in identifying PG examinees. The opposite was true for decision
criteria that were less stringent for assigning an SR decision. Similarly,
using the physiological responses to 1C1 for scoring purposes
resulted in slightly higher accuracies in identifying PI examinees and slightly
lower accuracies in identifying PG examinees. The opposite was true when the physiological
responses to 1C1 were not used for scoring purposes. The accuracies of the
decisions using the different decision criteria were not significantly
different from the original decision accuracies.
Because each set of
decision criteria increased the detection rate of one category of examinee (PI
or PG) and decreased the detection rate of the other category of examinee, it
was decided to keep the original decision criteria but to try to retain the
benefits of scoring the data with or without the physiological responses to
1C1. Because including the physiological data from 1C1 increased the detection
rate of PI examinees and excluding the physiological data from
{97}
1C1 increased the detection rate of PG examinees, the combined detection rate might
be increased if both approaches were utilized.
The detection rate of the
PI examinees was increased first: the initial scoring of the test used the
physiological responses to 1C1. If the decision was
conclusive (SR or NSR), then the decision was final. However, if a
conclusive decision could not be made, then the physiological responses to the
first repetitions of the two relevant questions were reevaluated using only the
physiological responses to the first repetition of the second control question
(1C2) as a comparison. The rescoring results in the same or less positive
scores, because the physiological responses to 1C1 typically are stronger than
the physiological responses to 1C2. The new scoring method thus identified the
PI examinees first; then, if a conclusive decision could not be made, the
rescore identified more of the PG examinees.
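A minimal sketch of this two-stage procedure follows, reusing the decision function sketched under Decision Criteria; the example score pairs are hypothetical inputs, not study data.

def tes_decision(r1, r2):
    """Original TES decision criteria (see Decision Criteria, above)."""
    if r1 > 0 and r2 > 0 and r1 + r2 >= 4:
        return "NSR"
    if r1 <= -3 or r2 <= -3 or (r1 == -2 and r2 == -2):
        return "SR"
    return "INC"

def two_stage_decision(scores_with_1c1, scores_without_1c1):
    """Score using the responses to 1C1 first; only if that pass is
    inconclusive, fall back to the rescore in which the first
    repetitions of R1 and R2 are compared to 1C2 alone."""
    first = tes_decision(*scores_with_1c1)
    if first != "INC":
        return first
    return tes_decision(*scores_without_1c1)

# Hypothetical example: inconclusive on the first pass, SR after rescoring.
print(two_stage_decision((1, 2), (-1, -3)))  # -> SR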
The new scoring method did
not result in significant differences in the accuracies of detection. However,
it reduced the number of initial INC decisions. With the original scoring and
decision criteria, 13 PI and 4 PG examinees received INC decisions. With the
new scoring method, only 6 PI and 1 PG examinees received INC decisions. However,
statistically, the decreases in the numbers of inconclusive decisions were not
significant.
Table 5
Sets of Decision Criteria Used to Evaluate the Data

Set        NSR Decision                         SR Decision
Original   R1 + R2 >= +4 and R1 and R2 > 0      R1 or R2 <= -3
1          R1 + R2 >= +4 and R1 and R2 >= 0     R1 or R2 <= -3
2          R1 + R2 >= +3 and R1 and R2 > 0      R1 or R2 <= -3
3          R1 + R2 >= +3 and R1 and R2 >= 0     R1 or R2 <= -3
4          R1 + R2 >= +4 and R1 and R2 > 0      R1 or R2 <= -2
5          R1 + R2 >= +4 and R1 and R2 >= 0     R1 or R2 <= -2
6          R1 + R2 >= +3 and R1 and R2 > 0      R1 or R2 <= -2
7          R1 + R2 >= +3 and R1 and R2 >= 0     R1 or R2 <= -2

Note: Any test score which did not meet either the NSR or the SR decision
criteria resulted in an "inconclusive" (INC) decision.
{98}
Discussion
The decisions of the examiners who administered the TES format were
significantly more accurate (83.3%) at identifying the PG examinees than were
the decisions of the examiners who administered either the CSP-PLC (55.6%) or
the CSP-DLC (58.6%) format. There were no significant differences among the
accuracies of the examiners' decisions at identifying the PI examinees. The
accuracies of the decisions obtained using the three formats to identify the PI
examinees were 88.9%, 95.3%, and 95.2% for the TES, CSP-PLC, and CSP-DLC
formats, respectively. The results were supported by the accuracies obtained
from blind scoring of the examinations. The accuracies of the blind raters'
decisions with the TES format were similar to the accuracies of the original
examiners' decisions. When the data were collected with the TES format, the
decisions of the blind raters were significantly more accurate (81.0%) in
correctly identifying PG examinees than the decisions obtained when the data
were collected with either the CSP-PLC format (57.2%) or the CSP-DLC format
(42.9%). The accuracies of the blind raters' decisions identifying the PI
examinees were not significantly different among the three formats.
One possible explanation,
consistent with the significance/attention model, for the significant
differences among the decisions made using the formats to identify PG examinees
is the amount of information to which the examinee was required to attend
during the examination. Four relevant questions, each of which addresses a separate
issue, are asked during the administration of a CSP test (PLC or DLC). Therefore,
these examinees are given information and questioned regarding four separate
issues. Perhaps having so much information to process and focus on diffuses
the examinee's attention, reducing the physiological responses and thereby
reducing the accuracy of PG identification. Only two relevant questions are
asked during each TES test, which reduces the amount of information presented
to the examinee during a test. A proponent of the significance/attention model
would predict higher detection rates when fewer issues are involved. This also
could explain why detection accuracies typically are higher for specific-issue
criminal examinations (single-issue examinations) than for security screening
examinations (multiple-issue examinations). However, there is little research
assessing the effect of the number of issues addressed during an
examination on the detection accuracy of the test.
Barland et al. (1989)
assessed the differences in detection rates between single and multiple issue
examinations. The authors reported that accuracies of the decisions obtained
using single and multiple issue tests were not significantly different. However,
the study did not test the issue adequately. The principal investigator (G.
Barland, personal communication, September 1993) stated that the examiners
conducting the single issue examination were instructed to conduct the three
examinations as separate examinations (i.e., pretest only the two
relevant questions for the first exam, conduct the exam, and so on). However, a
random sample of the single issue examinations administered during that study
indicated that the time between the examinations was only 1 minute and 8
seconds longer than the time between the tests (charts) within an examination. One
minute and 8 seconds is not sufficient time to pretest two relevant and three
control questions. Therefore, it is possible that some of the examiners,
contrary to
{99}
instructions, were pretesting all of the
relevant questions prior to conducting any of the examinations. If all six
relevant questions were discussed with the examinee prior to any testing, the
examinee could have been thinking about all six relevant questions, even though
only two relevant questions were asked on any one test. In addition, the number
of PG examinees for whom INC decisions were rendered was significantly greater
when the multiple issue examination (28.3%) was administered than when the
single issue examinations (10.5%) were administered (test of proportionality, z
= 1.96, p < .05). Raskin, et al. (1988) reviewed multiple
issue field examinations conducted by a federal agency and found there was a
negative relationship between the number of issues and test accuracy. They
concluded that the agency should minimize the number of issues on a test to
maximize decision accuracy. Studies should be conducted to compare the accuracy
of decisions identifying PI and PG examinees when different numbers of relevant
issues are addressed.
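The test of proportionality reported above is a standard two-sample z test for the difference between two proportions. The following is a minimal sketch of that computation; the group sizes used here are hypothetical placeholders chosen only to reproduce the reported percentages (the actual per-condition counts are not given in this section), so the statistic printed is near, but not exactly, the reported z = 1.96.

    from math import sqrt

    def two_proportion_z(x1, n1, x2, n2):
        # Pooled two-sample z statistic for the null hypothesis p1 == p2.
        p1, p2 = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    # Hypothetical counts: 13/46 = 28.3% (multiple issue INC rate) and
    # 4/38 = 10.5% (single issue INC rate); placeholders for illustration.
    print(round(two_proportion_z(13, 46, 4, 38), 2))  # approximately 2.01

Any pair of counts matching the reported percentages yields a z statistic in the vicinity of 2, consistent with the reported significance at p < .05.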
An additional complicating factor with multiple issue tests is that the examinee does not always respond physiologically to the question to which she or he is being deceptive. Whether a deceptive examinee's greatest physiological responses occur following the question to which the examinee is being deceptive has implications both for the number and type of relevant questions asked on a PDD test and for the criteria used to render a decision based on those responses. Barland (1981) reported that the accuracy of PG examinee identification decreased when responding to specific questions was assessed. He concluded that the correctly identified PG examinees were responding to questions other than the one to which they were lying. Also, Correa and Adams (1981), using an R/I format, reported better detection rates when the test was evaluated as a whole than when detection rates were based on individual questions. Barland et al. (1989) also concluded that the examinees were not always responding to the specific question to which they were deceptive. Raskin et al. (1988) reported similar results with field examinations conducted by a federal law enforcement agency. They concluded that the tests did not detect deception at the level of the individual crime, which suggests that numerical scores associated with individual relevant issues may be a poor guide in choosing the issue for interrogation.
The data from the current study support the previous findings. Although there was a relationship between the scenario enacted and the specific question to which the examinee responded physiologically, the relationship was modest. In fact, 41% of the PG examinees did not have strong physiological responses to the question related to the scenario in which they participated. Therefore, decision criteria should not be based only on the physiological responses to individual questions but also on the relevant questions as a group. It should be noted that strong physiological responses to one relevant question do not indicate that it is the most significant, or the only significant, question for the examinee.
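To make the suggestion concrete, a decision rule of this kind might combine per-question scores with an overall total, flagging an examinee if either criterion is met. The sketch below is hypothetical; it is not the scoring algorithm used in the study, and the cutoff values are placeholders.

    def flag_examinee(question_scores, per_question_cutoff=-3, total_cutoff=-4):
        # Hypothetical two-part rule: flag the examinee if any single relevant
        # question's score is strongly negative OR the relevant questions as a
        # group are. By numerical-scoring convention, negative values reflect
        # stronger reactions to relevant questions; both cutoffs are placeholders.
        any_single = any(s <= per_question_cutoff for s in question_scores)
        as_a_group = sum(question_scores) <= total_cutoff
        return any_single or as_a_group

    # Diffuse, moderate reactions across both TES relevant questions are
    # caught by the group criterion even though neither question stands out.
    print(flag_examinee([-2, -2]))  # True (total -4), though no single score <= -3

The point of the two-part rule is exactly the one argued above: an examinee whose responses are spread across several relevant questions is missed by a per-question criterion alone.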
The results of this study indicate that a proportion of individuals have strong physiological responses to one or more relevant questions because the question is significant to the individual for reasons other than deception. Forty-five percent of the PI examinees who received FP decisions (9% of all PI examinees) following a TES examination were deemed to have concerns sufficient to expect strong physiological responding to one or
{100}
more relevant questions (the overall FP rate implied by these percentages is derived in the note following this paragraph). In addition, field experience indicates that examinees often have concerns about a question, or the question brings to mind something not directly related to it, which they do not initially discuss with the examiner. During the examination, the examinee may focus more attention on that question, thereby producing physiological responses to it (consistent with the significance/attention model). Because the TES format is more sensitive in identifying the PG examinees, it also will be more sensitive in identifying individuals with "outside" issues. This was apparent from the larger (although not significantly larger) number of FPWJ examinees identified when the TES was administered compared to the number identified when either CSP format was administered. Therefore, it is important to determine why an examinee responds physiologically to a relevant question. Future studies need to assess: (a) what proportion of PI examinees have concerns related to the relevant questions, (b) what proportion of those examinees actually respond to the relevant questions, and (c) what effect pretest disclosure of information has on the likelihood that the examinee will respond to the questions (e.g., is the examinee less likely to respond to the relevant questions if the personal concerns are discussed prior to the test?).
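As an editorial check on these percentages, note what they jointly imply. On the reading that the 45% subset of FP-decision PI examinees constitutes the 9% of all PI examinees, the overall FP rate in the TES condition is

    0.45 \times p_{FP} = 0.09 \implies p_{FP} = 0.09 / 0.45 = 0.20,

that is, roughly one PI examinee in five received an FP decision. This figure is derived here for illustration only and is not stated in the text.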
It also is possible that the decision accuracies obtained with the TES format might have been attenuated by examiner unfamiliarity with the format. The examiners who administered the TES format were not familiar with it, whereas the examiners who administered either of the CSP formats were familiar with the CSP format. There are many differences (including the pretest and the actual conduct of the examination) between the TES and the standard CSP format. Tape recordings of the early TES examinations are interpreted as indicating that the examiners were not comfortable with what to say and often did not pretest the relevant questions sufficiently. If an examinee was thinking about something not specifically related to the relevant question and the examiner did not adequately deal with the issue, the examinee might have responded to the questions during the test. Once examiners become more familiar with the format, accuracy rates might increase.
The numbers of male and female examinees in the three conditions were not significantly different, nor were the numbers of African-American and Caucasian examinees in the three conditions. Therefore, it is unlikely that the significant differences among the accuracy rates obtained using the three formats to identify PG examinees are attributable to gender or racial differences. In addition, the number of PG examinees who participated in each scenario did not differ significantly among the testing formats. Therefore, the significant differences among the accuracy rates obtained using the three formats to identify PG examinees are not attributable to differences among the scenarios.
The significant differences among the accuracy rates obtained using the three formats to identify PG examinees do not appear to have been due to the different types of control questions. If the DLC questions had contributed significantly to the higher detection rate of the PG examinees who were administered the TES format, then the detection rate of the PG examinees tested with the CSP-DLC format should have been higher than the detection rate of the PG examinees tested with the CSP-PLC format. It was not. Similarly, the DLCs do not appear to
{101}
have affected the detection rates for PI examinees. Although the differences were not significant, PI examinees who were administered the TES format were correctly identified less frequently than PI examinees who were administered either CSP format. In addition, when the FPWJ examinees were included in the analyses, significantly more PI examinees were correctly identified using the CSP-PLC format than using the TES format. However, in both sets of analyses, the number of PI examinees correctly identified using the CSP-DLC format was not significantly different from the number identified when either the CSP-PLC or the TES was administered. Therefore, any differences among the accuracy rates in detecting PI or PG examinees are not attributable to differences between the PLCs and the DLCs.
The syntax of the relevant questions is an issue that affects the generalizability of the results. A previous study (Barland et al., 1989) was criticized for using relevant questions that included the phrase "against the United States," because the examinee did not commit a crime "against the United States." Proponents of a significance/attention model would argue that a test would be more accurate if the examinee's attention were focused on the actual issues being tested. In fact, field examiners have been applying this principle for years in the development of control questions and sometimes relevant questions (DoDPI, 1994d). Field examiners qualify their questions with "time bars" or "situation bars" to narrow the examinee's attention to a specific time or situation (e.g., prior to 1993 have you ever ...). In the current study, it was decided to qualify the relevant questions with the phrase "during this project" and to omit the phrase "against the United States" to ensure that the subjects' attention was focused on the test issues.
There is no reason to
expect that the caveat added to the relevant questions would differentially
affect the accuracies of decisions obtained using the three different formats. Therefore,
it is unlikely that the differences in decision accuracies among the three
formats are attributable to the caveat. However, because the relevant questions were designed to focus the examinee's attention on the project, it is possible that the accuracies obtained during the study may not accurately reflect those that would occur in the field. Studies should be conducted to assess the impact of "time" or "situation" bars on PDD test accuracy. This is an important question because the practice is so widespread in the field.
In conclusion, the new TES format may be a viable alternative to the CSP format currently utilized for security examinations. The TES format differs from the CSP formats in that: (a) the number of issues being tested in a question series is reduced; (b) a maximum of three question repetitions are used to calculate question scores; (c) between-test stimulation is eliminated; (d) the order of questions within the question sequence cannot be altered; (e) each relevant question is compared to the same control questions; (f) the pretest is brief, more standardized, and follows a logical sequence of information presentation; and (g) problems associated with PLC questions are reduced by using DLC questions (these differences are restated schematically after this paragraph). Some of these differences might account for the fact that in a laboratory mock situation, the decisions of examiners who administered the TES format were significantly more accurate at identifying PG examinees than were the decisions of examiners who administered either CSP format. If future testing with the TES format continues to demonstrate high accuracy rates for discriminating between PI and PG
{102}
examinees, the federal government should consider changing its security screening programs to utilize the TES as the primary PDD examination.
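Read as a specification, the enumerated differences lend themselves to a compact side-by-side encoding. The sketch below is an editorial illustration only: the field names are assumptions, and the CSP values not stated explicitly in the text are inferred from the contrast with the TES.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class ScreeningFormat:
        # Illustrative encoding of points (a)-(g) above; field names are
        # editorial, not terminology from the formats themselves.
        name: str
        relevant_issues_per_series: int         # (a)
        max_repetitions_scored: Optional[int]   # (b); None = not stated here
        between_test_stimulation: bool          # (c)
        question_order_fixed: bool              # (d)
        same_controls_for_each_relevant: bool   # (e)
        standardized_pretest: bool              # (f)
        control_question_type: str              # (g): "PLC" or "DLC"

    TES = ScreeningFormat("TES", 2, 3, False, True, True, True, "DLC")
    CSP_PLC = ScreeningFormat("CSP-PLC", 4, None, True, False, False, False, "PLC")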
References
Abrams, S. (1993). The directed lie control question. Unpublished
manuscript.
Ansley, N. (1990). The validity and reliability of polygraph decisions in real cases.
Polygraph, 19, 169-181.
Barland, G.H. (1981). A validation and reliability study of counterintelligence screening tests. Unpublished manuscript, Security Support Battalion, 902nd Military Intelligence Group.
Barland, G.H., Honts, C.R., & Barger, S.D. (1989). Studies of the accuracy of screening polygraph examinations. Unpublished report, Department of Defense Polygraph Institute.
Carver, C.S., Blaney, P.H., & Scheier, M.F. (1979). Focus of attention, chronic expectancy, and responses to a feared stimulus. Journal of Personality and Social Psychology, 37, 1186-1195.
Coles, M.G.H., & Duncan-Johnson, C.C. (1975). Cardiac activity and information processing: The effects of stimulus significance and detection and response requirements. Journal of Experimental Psychology, 1, 418-428.
Coles, M.G.H., & Strayer, D.L. (1985). The psychophysiology of the cardiac cycle time effect. In J.R. Orlebeke, G. Mulder, & L.J.P. van Doornen (Eds.), Psychophysiology of cardiovascular control: Models, methods, and data (pp. 517-534).
Correa, E.I., & Adams, H.E. (1981). The validity of the pre-employment polygraph examination and the effects of motivation. Polygraph, 20, 143-155.
{103}
Department of Defense Polygraph Institute (1992). Counter Intelligence Scope Polygraph.
Department of Defense Polygraph Institute (1994a). Stimulation test.
Department of Defense Polygraph Institute (1994b). Test data analysis.
Department of Defense Polygraph Institute (1994c). Modified General Question Technique (MGQT).
Department of Defense Polygraph Institute (1994d). Test question construction.
Easterbrook, J.A. (1959). The effect of emotion on cue utilization and the organization of behavior. Psychological Review, 66, 183-201.
Furedy, J.J. (1986). Lie detection as psychophysiological differentiation: Some fine lines. In M.G.H. Coles, E. Donchin, & S.W. Porges (Eds.), Psychophysiology: Systems, processes, and applications.
Germana, J., & Chernault, G. (1968). Patterns of galvanic skin responses to signal and non-signal stimuli. Psychophysiology, 5, 284-292.
Glass, A.L., Holyoak, K.J., & Santa, J.L. (1979). Cognition. Menlo Park, CA: Addison-Wesley Publishing Co.
Gustafson,
Honts, C.R. (1989). The relative validity of two CSP question series. Unpublished manuscript, Department of Defense Polygraph Institute.
{104}
Honts, C.R., & Horowitz, S.W. (1989). The role of control questions in physiological detection of deception. Unpublished doctoral dissertation.
Iacono, W.G., & Patrick, C.J. (1988). Polygraph techniques. In R. Rogers (Ed.), Clinical assessment of malingering and deception (pp. 205-233).
Kahneman, D. (1973). Attention and effort.
Kilpatrick, D. (1972). Differential responsiveness of two electrodermal indices to psychological stress and performance of a complex cognitive task. Psychophysiology, 9, 218-226.
Kimmel, H., Olst, E.H., van, & Orlebeke, J.F. (1979). The orienting reflex in humans.
Kircher, J.C., Horowitz, S.W., &
Kircher, J.C., &
Kugelmass, S., Lieblich,
Maltman,
McCauley, C., & Forman, R.F. (1988). A review of the Office of Technology Assessment report on polygraph validity. Basic and Applied Social Psychology, 9, 73-84.
McLean, P.D. (1969). Induced arousal and time of recall as determinants of paired associate recall. British Journal of Psychology, 60, 57-62.
{105}
Nikula, R. (1991). Psychological correlates of nonspecific skin conductance responses. Psychophysiology, 28, 86-90.
Obrist, P.A. (1981). Cardiovascular psychophysiology: A perspective.
O'Gorman, J.G. (1977). Individual differences in habituation of human physiological responses: A review of theory, method and findings in the study of personality correlates of non-clinical populations. Biological Psychology, 5, 257-318.
Ohman, A. (1979). The orienting response, attention and learning: An information processing perspective. In H.D. Kimmel, E.H. van Olst, & J.F. Orlebeke (Eds.), The orienting reflex in humans (pp. 443-471).
Olst, E.H., van, Hemstra, M.L., & ten Kortenaar, T. (1979). Stimulus significance and the orienting reaction. In H.D. Kimmel, E.H. van Olst, & J.F. Orlebeke (Eds.), The orienting reflex in humans (pp. 521-547).
Orne, M.T.,
Podlesny, J.A., &
{106}
Reed, S.D. (1990). Counter narcotics polygraph: NCP 2. Unpublished manuscript, Department of Defense Polygraph Institute.
Reed, S.D. (1995). Psychophysiological detection of deception--Single test interview. Paper presented at the meeting of the
Sampson, J.R. (1969). Further study of encoding and arousal factors in free recall of verbal and visual material. Psychonomic Science, 16, 221-222.
Sokolov, E.N. (1963). Perception and the conditioned reflex.
Waid, W.M., Orne, E.C., & Orne, M.T. (1981). Selective memory for social information, alertness, and physiological arousal in the detection of deception. Journal of Applied Psychology, 66, 224-232.
Waid, W.M., Orne, M.T., &
Acknowledgements
Sheila D. Reed, Ph.D., served as principal investigator throughout planning, data collection, and drafting of this manuscript. Final editing was completed by members of the
Department of Defense Polygraph Institute Research Division. We would like to
thank the Office of the Secretary of the Air Force (OSAF) and the United States
Army Intelligence and Security Command (USAINSCOM) polygraph programs for their
support and their input into the design of the study. Specifically, we would
like to thank Bruce Thompson and Jim Morrison for their valuable contributions,
including the time they spent monitoring the examinations and helping with the
scenarios. A special thanks is due to all of the people who made the study
possible: The OSAF and INSCOM examiners (Edith Andreasen, Douglas Blake,
Tawainia Barrera, Ray Brafford, Dave Cameron, David Case, Greg Eggleston,
Richard Giraud, Ronald Herring, Otto Jackson, Bryan Ladeaux, Russ Nichols,
Michael Rhodes, Donald Schupp, Ed Stoval, James Vaughan, Michael Walker, and
Harrison Wright), the scenario setters (Earl Taylor, Sam Braddock, and Gordon Barland),
the research assistants (Jeff St. Cyr, Linda Knickerbocker, and Joan Harrison),
and the DoDPI support staff (Frank Ragan and Randy Reynolds). Appreciation is
also extended to the Anniston Employment and Temporary Services, Inc. and Judy
Manners for the high-quality examinees they provided. Gratitude is also
expressed to Andrew Dollins, John Schwartz, and Don Weinstein for diligently
reading and editing earlier drafts of this manuscript.
Funds for this research were provided by the DoDPI under project
DoDPI93-P-0044. The views expressed in this article do not reflect the official
policy or position of the Department of Defense or the U.S. Government.