Class, Oct. 22, 2001
Chapter 1 of book
-->Later:
specific effects of conditioning – applied. Empirical test – open to retesting – can be proven wrong -->objective
Scientific method: tries to avoid distortions made by perception/judgment
-->reduces but doesn’t avoid mistakes of perception/judgment
-need to be able to answer the following:
-is there a specific/unique methodology to a social science? (i.e. observation/questionnaire)
-->not really! You can use many diff. kinds of methods
views:
-->but is it universal?
-->i.e. more people on the street-->more likely to help us in an emergency
-->but it is not true -->the opposite is true
-can’t be based on a single case: could be the exception
Tools: need to be reliable and applicable
terms
Reliability: it always gives the same results -->the watch could be fast, but it is always fast by the same amount.
-->the ruler isn’t reliable since the temperature always changes its length a bit.
Hypothesis: a tentative explanation of a thing or of a relation b/w things.
-in daily life, we tend to be ‘large’ (general) -->we like to generalize
-->yet in science, we’re skeptical
 | Scientific | Daily |
General approach | Empiric | Intuitional |
Observation | Critical/methodological | Random/non-critical |
Report | Objective | Subjective/colored |
Terms | Well defined/univocal terms | Vague/multi-meaning |
Tools/measurements | | Non-reliable/inapplicable |
Hypothesis | | Not necessarily testable |
Attitude | Skeptical | Accepting |
Charles Peirce (philosopher):
4 sources of knowledge:
-scientific truth: probability… doesn’t have to happen: it is just probable it would happen.
-->Tylenol doesn’t always help
4 goals of science
Term | Details |
Description | Needs to be a full description! Not a vague, short description. No half-truths (it is not better [but not worse]) |
Prediction | I can predict w/o understanding the intrinsic details |
Understanding | To give reasons, so that we can perhaps have alternative, parallel cures |
Control | We control how the participant behaves |
Tools of the scientific method
-->i.e. if I think that all bald people are active, I’ll write ‘he’s tired’ instead of ‘he’s inactive’ -->it infl. how I observe!!! (bad!)
Measures:
-in psychology, the measures aren’t as clear as in other sciences
-->measures don’t exactly measure what we mean
-->sometimes have to make own measures and test whether those measures are correct
-psychologists try to measure how accurate their terms are
-it doesn’t really matter how people use the term as long as they define it properly – so that at least everyone knows what they mean when they use the terms
Experimentation
-we try to experiment
-in order to run an experiment and prove something, we need to have a situation
--
-science: tries to look for general rules
Experimentation
Labs – help isolate individual causes/effects (causality)
-->more possible factors -->less certain what the real answer is
things that give experimentation their strength
-hard to define
-has to have a logic
-->won’t be infl. by things like our biases/thoughts/etc…
-Shortest/most inclusive rule
-->i.e. you want to explain all the diff. outcomes of diff. populations w/i the experiment w/ 1 rule, and not many diff. rules – 1 for each group
Open mindedness
-you take into consideration unexpected things.
-gov’t involvement – usually hinders the science
Replication
-that is why you need to write the process
-in psych – hard to replicate some things: like studying dreams – can’t necessarily repeat dreams
Basic rule: the person won’t be hurt physically/emotionally in any way
-->especially when the damage is irreversible
Example:
APA’s standards: you’ve got to weigh the benefits vs. the ethical expense
-->ethical boards
-ethics also apply to experiments on animals
Class, Dec. 24, 2001
theories/hypotheses
-the more cases/issues/variance that are explained by the theory – more inclusive – the better the theory
Theory: a system of principles/assumptions that explains/predicts b/h.
-explains relationships b/w things
-->an organizing scheme
-->a point of origin for future research
-info are just building blocks. The theory arranges them into a structure
-internal vs. external locus of control:
-Rotter: you can divide people into 1 of 2 groups:
-when 1 is promoted to a managerial job from the group – there is a feeling of inequity, yet if a manager is brought from the outside -->higher motivation to listen to him
If a person does something which another person does as well, I do it better/faster:
-->coaction: almost instinctual tendency to become aroused and do it faster
-->this instinctual energy is called dynamogenesis
Psychologists don’t like to speak of instincts:
-->can’t necessarily be instinct
(cousin of the one who defines Milgram’s experiment as the Eichmann experiment, b/c they blamed everyone else)
-->changed name ‘coaction’ to social facilitation
-change of name = usually shows a change in direction
F. Allport: gets people to do simple vs. hard tasks (i.e. math/puzzles/etc.)
-->social facilitation helps when the task is simple/distracts when the task is hard
-->vs. Triplett, who thought that coaction only helps
Zajonc:
-the mere presence of others – not only if they’re doing the same act
-->also happens to animals as simple as ants!
Zajonc (psychologist) notes:
-highly practiced/instinctive responses = facilitated in presence of coaction/audience
-newly learnt responses/actions = impaired by presence of coaction
-everything has a hierarchy of responses
-->simple tasks = the higher responses of the hierarchy are easily accessible (i.e. pressing the pedal of the car at a red light)
-->yet when there are 3 buttons, each having to be pressed for a red/yellow/green light, then until it is well learned, social facilitation will make you make more wrong responses, since pressing the right button is not the highest on the hierarchy of responses to seeing those colored lights
2-step study:
-people had to read sounds w/o meanings, each w/ various rates of repetition (of it returning)
-->the more it returned, the more dominant the response
-then they were told that they were shown flashes of those nonsensical words (yet in reality, they were just shown flashes of scribbles)
-->they answered that they were shown flashes of the repeating words more
-->but the curve was steeper w/ social facilitation
-Zajonc: only people who could appreciate our actions (as opposed to people who can’t, like sleeping people)
-->the uncertainty of whether we’re ok, which dev. when someone is looking at us, leads to the social facilitation
Study:
Scenarios:
Results: order of social facilitation, from highest to lowest:
-Conclusion: the mere knowledge of being watched, w/o being watched, has almost the same effect
Study
Duval/Wicklund
Objective self-awareness: it is an issue of self-awareness – not others’ evaluations
-->here, the ind. checks his ‘standard of correctness’ – how in-line he is w/ being correct. -->yet people find a gap b/w what they want to be and what they are.
-->so they try to bridge the gap b/w reality and ideal, or escape the gap
[subjective self-awareness: how I look at someone else.]
Study
-waiting for results for a test:
-->evaluation of oneself
-->development of theory of how we change our actions
-->after all, Triplett’s theory can explain cases like video or mirrors
Class, Jan 7th, 2002
-each paper has to define its terms, since each person has diff. usage of the terms
-->like a recipe – has to tell what exactly the researcher meant
i.e. ‘anxiety’ is unclear unless you clearly state the meaning of the word
-->I might use anxiety diff., but it doesn’t matter, since I know what he means, since he told me
-i.e. ‘learning’ = too vague
-->so I might define it as 3 times right in a row(?)
Internal vs. external validity
Internal validity – to what degree can we claim that the change is b/c of the indp. variable and not b/c of other reasons
-->in other words, the researcher had control over the results
-as opposed to having rival/alternate explanations
3 criteria for internal validity
-->The standard to which we compare the dep. variable.
-->makes the baseline – w/o it, we don’t know what to relate the differences in the results to.
-->the diff. b/w this and extraneous variables is that here, the variables are mixed in from the beginning. Otherwise, they are the same.
-I need at least 2 levels in the indp. variable to make an experiment
-->one is given to the control group/condition
-->the other is given to the experimental group/condition
External validity
Can it be generalized to other cases?
Jan 14, 2002
The ideal is to test the WHOLE population.
-->since this is impossible, you take a representative sample, not just a small population
-->i.e. Bar-Ilan population is not representative of I. society
Table of random numbers: a chart made by a computer where all the # are random
-the external validity is hurt if you didn’t choose randomly
Important:
-->subject variable: w/o random division of the (randomly chosen) participants, there is possibly going to be one trait on one side and another trait on the other side
Example - Brady
Question
-Why do managers dev. more ulcers than others
Answer
-studies on monkeys – 1 is a leader monkey -->has to press a lever in order to stop the shock to all monkeys
-->manager monkeys had more ulcers than regular ones
-->learned helplessness
-->tells us the opposite: that more control = better health
-but then people started realizing that Brady didn’t have randomness: faster learners were managers/slower learners were ‘other’ monkeys
-->NO RANDOMNESS
-->therefore, we have an alternative answer: the monkeys went through testing for sensitivity to electricity/etc.
-->random repetition of the above experiment w/ rats
=>helpless rats = more ulcers
Size of sample
-the bigger the # – the more reliable the results are.
-the weaker the strength of the diff. (how diff. the 2 variables are), the harder to tell the diff. -->need a bigger #
-never less than 10 participants
The experimental design
=the structure of the experiment
-there are many diff. kinds of structures
The differences lie in:
Example
Good vs. bad news
Between subjects design
-diff. participants in each of the categories
Within subject design
-the participants participate in some/all of the experimental situations
-->also called repeated measures design
Mixed design
-one variable is between subject design and another variable is within subject design
between subject design:
-simplest experimentation: 2-group-design
Matched groups
-only do matching (hat’ama) when the literature says that there is an involved variable
-making pairs of things that are similar -->then putting them in opposite groups (control/experimental), based on, say, a toss of a coin
-->can do this b/f or after the setting up of the experiment
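The matching procedure above can be sketched in Python. This is only an illustration under assumptions: the participant ids, the scores and the helper name `matched_assignment` are made up, and "similar" is taken to mean neighbours after sorting on one matching variable.

```python
import random

def matched_assignment(scores, seed=None):
    """Pair participants with similar scores, then split each pair by a coin toss.

    scores: dict mapping participant id -> value of the matching variable.
    Returns (control, experimental) lists of participant ids.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)    # similar scores end up adjacent
    control, experimental = [], []
    for i in range(0, len(ranked) - 1, 2):     # an unpaired last participant is left out
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                      # the coin toss
        control.append(pair[0])
        experimental.append(pair[1])
    return control, experimental

# Hypothetical data: participant -> score on the matching variable
scores = {"p1": 12, "p2": 30, "p3": 14, "p4": 29, "p5": 21, "p6": 20}
control, experimental = matched_assignment(scores, seed=0)
print(control, experimental)
```

Each matched pair contributes exactly one member to each group, so the groups are balanced on the matching variable while the coin toss keeps the assignment random.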
1) Multiple group design
-some groups
-->you need more than 2 groups, since you also want to take into account the amount of the variable
-->i.e. if you want to measure the effect of coffee on schooling, you also want to take amount into account: diff. amounts of coffee
F-test – to see if there is a diff. b/w individual diff. and diff. b/w it and the average
-->analysis of variance: is the diff. b/c of ind. differences w/i the group, or is the diff. actually b/w groups?
-->look up chart in stats class of last year
-if calculated F is:
-deal w/ charts of h.m. people are in the experiment
2)Factorial design
Source | SS | DF | MS | F |
A | ||||
B | ||||
AxB | ||||
Within | -->the df of all the cells | |||
Total |
example
Source | SS | DF | MS (variance) (SS/DF) | F (MS/MS within) |
A | 32 | 1 | 32 | 6.86 |
B | 8 | 1 | 8 | 1.71 |
AxB | 72 | 1 | 72 | 15.45 |
Within | 14 | 3 | 4.67 | |
Total | 126 | 7 |
Report: F(1,3) = 15.45, p<0.05
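As a check on the arithmetic in the table above, here is a minimal Python sketch that recomputes MS = SS/DF and F = MS/MS within from the SS and DF columns. The function name `anova_table` is made up for illustration, and the DF values are taken as written in the notes. Results can differ from the hand-calculated table in the second decimal because of rounding.

```python
def anova_table(ss, df, error="Within"):
    """Return {source: (SS, df, MS, F)}; F is None for the error row."""
    ms = {src: ss[src] / df[src] for src in ss}
    table = {}
    for src in ss:
        f = None if src == error else ms[src] / ms[error]
        table[src] = (ss[src], df[src], ms[src], f)
    return table

# SS and DF copied from the example table
ss = {"A": 32, "B": 8, "AxB": 72, "Within": 14}
df = {"A": 1, "B": 1, "AxB": 1, "Within": 3}

for src, (s, d, m, f) in anova_table(ss, df).items():
    print(src, s, d, round(m, 2), "" if f is None else round(f, 2))
```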
--
factor = same thing as independent variable in factorial analysis
Example 2x4 experiment
B1 | B2 | B3 | B4 | |
A1 | 12 | 12 | 12 | 12 |
A2 | 12 | 12 | 12 | 12 |
-->since there are 2 levels of the independent variable A: 1 df
-->4 levels of B = 3 df
Source | SS | DF | MS(variance) (ss/DF) | F (ms/wi) | Minimum f |
A | 1 | 3.96 | |||
B | 3 | 2.72 | |||
AxB | 3 | 2.72 | |||
Within | 88 | ||||
Total | 95 |
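The DF column above follows from the number of levels and the number of participants per cell. A small Python sketch, assuming a between-subjects design with equal cell sizes (the function name `factorial_df` is made up for illustration):

```python
def factorial_df(a_levels, b_levels, n_per_cell):
    """Degrees of freedom for a between-subjects a x b design with equal cells."""
    cells = a_levels * b_levels
    n_total = cells * n_per_cell
    return {
        "A": a_levels - 1,
        "B": b_levels - 1,
        "AxB": (a_levels - 1) * (b_levels - 1),
        "Within": n_total - cells,   # participants minus number of cells
        "Total": n_total - 1,
    }

print(factorial_df(2, 4, 12))
# {'A': 1, 'B': 3, 'AxB': 3, 'Within': 88, 'Total': 95}
```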
Questions to do in an experiment
Reporting formula:
F(df AxB, df WI) = F value, p<sig. level
example – a researcher reports that in a 2-factor experiment, there is a sig. diff.
F(2,54) = 11.8, p<.05
Source | SS | DF | MS(variance) (ss/DF) | F (ms/wi) | Minimum f |
A | 2 | ||||
B | 1 | ||||
AxB | 2 | ||||
Within | 54 | ||||
Total | -->60 |
answers
3x2
6 cells
3 factors
-->10 in each group
Source | SS | DF | MS(variance) (ss/DF) | F (ms/wi) | Minimum f |
A | 2, 1 | ||||
B | 2, 4 | ||||
AxB | 4 | ||||
Within | 90 | ||||
Total | 99 or 98 |
100 or 99 participated
2X5
or
3 X 3
# of groups:
9 or 10
factors 2 4 or 5
-->important: there is no single correct answer since there are alternative answers
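Working backwards from a reported F(df AxB, df WI) to the possible designs can be sketched as follows. This is an illustration under assumptions: a two-factor between-subjects design with equal cell sizes; the function name `possible_designs` is made up.

```python
def possible_designs(df_axb, df_within):
    """List (a, b, N, n_per_cell) for two-factor designs matching the reported df.

    Assumes a <= b and equal cell sizes; designs whose n per cell
    would not be a whole number are skipped.
    """
    designs = []
    for a in range(2, df_axb + 2):
        if df_axb % (a - 1):
            continue                      # (a-1)(b-1) must equal df_axb
        b = df_axb // (a - 1) + 1
        if b < a:
            continue                      # already listed as (b, a)
        n_total = df_within + a * b       # within df = N - number of cells
        if n_total % (a * b) == 0:
            designs.append((a, b, n_total, n_total // (a * b)))
    return designs

print(possible_designs(2, 54))   # [(2, 3, 60, 10)]
print(possible_designs(4, 90))   # [(2, 5, 100, 10), (3, 3, 99, 11)]
```

The second call shows why there are alternative answers: df AxB = 4 fits both a 2x5 and a 3x3 design.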
--
from here on, unless noted:
2X2
example
B1 | B2 | ||
A1 | 8.0 | 8.5 | 8.25 |
A2 | 1.5 | 1.0 | 1.25 |
4.75 | 4.75 |
-we decide that any diff. b/w averages (x̄) bigger than 1.5 needs to be clarified
Main effect
Interaction
Subtract A1B2 from A1B1 and A2B2 from A2B1 and see if the diff. b/w the 2 #’s is more than 1.5 – if yes -->interaction. 8-8.5 = -0.5 and 1.5-1 = 0.5 -->diff. = 1
-->no interaction
-the question is: is AxB sig.? Is x̄A sig. diff. from x̄B?
-->no – in each case (A1-A2 / B1-B2), the diff. b/w them is smaller than 1.5
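The comparisons on the 2x2 table can be sketched in Python. This is only an illustration of the class rule of thumb (a difference bigger than 1.5 counts), not a real significance test; the function name `effects_2x2` is made up.

```python
def effects_2x2(cells, threshold=1.5):
    """cells = [[A1B1, A1B2], [A2B1, A2B2]] of cell averages."""
    (a1b1, a1b2), (a2b1, a2b2) = cells
    mean_a1, mean_a2 = (a1b1 + a1b2) / 2, (a2b1 + a2b2) / 2
    mean_b1, mean_b2 = (a1b1 + a2b1) / 2, (a1b2 + a2b2) / 2
    # Interaction: does the B1-B2 difference change from row A1 to row A2?
    interaction = abs((a1b1 - a1b2) - (a2b1 - a2b2))
    return {
        "main_A": abs(mean_a1 - mean_a2) > threshold,
        "main_B": abs(mean_b1 - mean_b2) > threshold,
        "interaction": interaction > threshold,
    }

print(effects_2x2([[8.0, 8.5], [1.5, 1.0]]))
```

For the table above the interaction term is 1, below the 1.5 cut-off, which matches the "no interaction" conclusion; the row averages (8.25 vs. 1.25) differ by far more than 1.5.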
**-w/ graphs thynggy
To find on a line graph:
Important: if 1 is significant – then it is all significant
In other words:
A = the diff. b/w A and B -->sig. if their diff. is sig.
B = the diff. b/w averages of all of (interaction)
-->if the diff. of averages stays constant = no interaction
-->find if there is a diff. b/w the A and B lines in diff. parts of the graph
** all of the above
-to find the source of interaction: see if A rows is sig or B columns is sig
--
** Graphs/DF
March 11 2002
 | B1 | B2 | B3 | Average |
A1 | 2 | 2 | 4 | 2.66 |
A2 | 6 | 6 | 8 | 6.66 |
Average | 4 | 4 | 6 |
-->6.66-2.66 > 1.5 -->significant
-->we know how to compare 2, but in cases of more than 2, we pair 2 at a time of the many variables (B1B2, B1B3, B2B3); as long as 2 of the many variables are significantly diff., then we say it is sig.
--
within subjects design
-until now, we spoke of b/w subject design – how diff. people react in diff. situations
-->now, we’re speaking about the same person in 2 diff. cases.
Benefits of w/i subject design:
Downsides of w/i subject design
Small N design
-kind of w/i subject design.
ABBA design
Where you reverse the order of the manipulation to see the effects.
-->can also be done on a group as a unit
Example: behavior modification: teaching a person new ways of b/h as an alternative to maladaptive b/h. In those cases, you don’t want to return to the maladaptive situation -->it is not ethical
questions
-table of random numbers -->what is the story here?
March 18, 2002
-external control, i.e. dealing w/ things like social factors.
Physical factors:
Resolution:
-->this doesn’t negate the infl. of the external factors, but it makes sure that it doesn’t infl. one side and not the other (control group/experimental group)
Within-subject experiment: has a unique problem: order effect
-Order effect: with 2 stimuli, perhaps the order in which you gave them makes a difference to the result. I.e. perhaps it affects participant’s moods/cognitive process/etc.
-->you need to counterbalance
-Progressive error: same thing as order effect – just with more than 2 stimuli
-Counterbalance: give all the stimuli more than once, and see what the infl. is of giving the same stimuli. 0) S1 1) S2 2) S2 3) S1 -->might over-do the stimuli. [0 – hunger, 1 – hunger…] (0+3, 1+2 = balances out)
Between subject counter-balancing: compare
-->formula: (to choose orders to minimize # of orders needed to have both above rules) A B L C L-1 D L-2 E -->ensures the 2 aforementioned rules.
Explanation of formula
A B L C L-1 D L-2 E
Take A, B, Last, C, Last–1, D, Last–2, E etc….
-then fill out downwards
example of formula
-in case of A, B, C, D, E, F
First:
A B F (Last) C E (Last-1) D
ABFCED
BCADFE
CDBEAF
DECFBA
EFDACB
FAEBDC
-in a Latin square, you really need double as many orders as letters when the number of stimuli is odd
example (of 5)
ABECD
BCADE
CDBEA
DECAB
EADBC
-now reverse the orders:
DCEBA
EDACB
AEBDC
BACED
CBDAE
-->those are the 10 orders really necessary
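The A, B, L, C, L-1, D, L-2 … rule and the "fill out downwards" step can be sketched in Python. This is a minimal illustration; the function name `balanced_orders` is made up, and stimuli are simply labelled A, B, C, … (so at most 26 of them).

```python
from string import ascii_uppercase

def balanced_orders(n):
    """Orders for n stimuli (n <= 26) from the A, B, L, C, L-1, D, L-2 ... rule."""
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # "Fill out downwards": each new row moves every letter one step forward.
    rows = [[(x + shift) % n for x in first] for shift in range(n)]
    if n % 2 == 1:
        # Odd number of stimuli: add the reverse of every order.
        rows += [row[::-1] for row in rows]
    return ["".join(ascii_uppercase[x] for x in row) for row in rows]

for order in balanced_orders(6):
    print(order)
```

`balanced_orders(6)` gives the 6 orders of the A-F example, and `balanced_orders(5)` gives the 10 orders of the 5-stimuli example (5 orders plus their reverses).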
control over miscellaneous variables:
i.e. personality/causal
-infl. internal and sometimes external validity
Personality variables
-there is a response style: giving consistent answers (i.e. ‘yes’) regardless of the content of the question
-->to narrow down extreme scores b/c of this, you can ask the question in a reversed manner, thereby narrowing the infl. of those who have an extreme response style
MMPI = tests personality things like OCD/depression/mania/etc.
Response set: tendency to answer in the way one ought to answer and not truthfully, in order to give a certain impression
Crowne/Marlowe: social desirability scale:
Social desirability scale: checks if the person lies on the tests
Example of questions:
-do I sometimes get angry
-I know my voted party’s statement of mission
-->high score = high response set
solution
-you can phrase the question in a way in which people don’t know how they ought to answer.
-->i.e. ‘pets are fun to raise but some people don’t enjoy it’
situational/social factors
Demand characteristics: just like in the response set: people wonder what the experiment is really about and therefore try to fulfil the experimenter’s hypothesis (they don’t want to screw it up since they want to look normal and not crazy)
-->no external validity, since they don’t act in a spontaneous way
-->studied by Orne
way to deal w/ demand characteristics:
Single blind experiment
-experiments, especially in medicine, where all the info is given, except that the participant doesn’t know what situation he is in.
-->you’re told that you’re in a situation, i.e. headache medicine, where one group is getting the medicine and one group is getting a placebo. The participant doesn’t know which he is getting
cover story:
-we distract the person from the real issues of the experiment
accident
-make them think that what is happening is an accident, but it is really the experiment’s issue
example
-Darley/Latané:
-see how long it takes them to leave/report (experiment helpers were supposed to stay)
Accomplice/confederate
-accomplice doing the important thing, while the experimenter is just doing the measuring. The participant just doesn’t know that the experiment is really done mostly by the person who is really an accomplice.
Example:
-while waiting for the experiment to start, the accomplice nods at a certain word. Then, when the participant enters the experiment, he is interviewed, where the measurement is really how many times he said that word after being conditioned earlier (by the accomplice)
Detachment between the experiment and the measurement
-i.e. to check cognitive dissonance, see if the attitude change is there the day after, when the participant doesn’t know that he’s really asked about the thing he took part in earlier (i.e. day b/f)
Experimenter bias: the resulting bias made by the experimenter: even the instructions have an effect on the participant, even though what he said never happens
chapter 4 of book
4 categories of variables
Operational definition: how the researcher defines the variable, in order to use it/manipulate it.
Positive linear relationship: positive correlation
Negative linear relationship: negative correlation
Curvilinear relationship: the correlation changes: like a U or an upside-down U.
-->also called non-monotonic function
Non-experimental methods: observations/measures of variables
-->can’t assume causality
-->there might be a cause not observable!
-->third variable problem: there might be a link b/w the variables, but there might be a third variable causing both variables
-->alternative explanations of the link b/w the dependent/independent variables
-->fewer third variables: causal relationship b/w variables is stronger
-->disadvantage: less control of all factors
-->this is only in a non-experimental method
Experimental method: manipulation of variables in a controlled setting
-->1 variable is manipulated and then the other is measured
Independent variable: causing the dependent variable’s change(?)
-->always on horizontal axis
Dependent variable: always on vertical axis.
-in some cases, where the study is of future predictions of b/h, no cause-and-effect needs to be studied
Correlation coefficient: how strongly 2 variables are linked
Chapter 5
Validity
Validity: the truth of the experiment
-->cause-and-effect correctly used
Reliability: consistency/stability of a measure. I.e. if the measurement is similar every time it is measured.
2 components of measure:
-lower reliability: higher variability
-->could still be normal distribution with high variability.
Pearson product-moment correlation coefficient: used to calculate the correlation coefficient (how strongly 2 variables are linked). Measured from –1.00 to +1.00. 0 correlation = no relation. +1.00 = perfect positive relationship and –1.00 = perfect negative relationship.
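A minimal Python sketch of the Pearson product-moment coefficient, using the standard textbook formula (covariance divided by the product of the spreads). The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear, increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly linear, decreasing
```

The two calls illustrate the ends of the scale: a perfect positive and a perfect negative linear relationship.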
Test-retest reliability: 2 testings of the same person at diff. times
-should be used for measures expected to be consistent over time (i.e. intelligence)
-for most measures, in order to be reliable, the correlation has got to be over .80
-in cases where things change, i.e. mood, one must use other methods to find reliability, using a single test
Internal consistency reliability: many questions really checking 1 thing – the reliability is acquired through the sum of all the answers
Split-half reliability: comparing 1 half of the test’s items to the other half. The items are split randomly.
Cronbach’s Alpha: each item is correlated to each other item. An average of all those correlations is taken.
Item-total: all the aforementioned ways to compare the item to the total. Allows you to eliminate an item which doesn’t quite correlate w/ other items
Interrater reliability: in cases of observations where you have a rater, 1 rater might be unreliable, so you have several raters and see the reliability of the raters to each other.
Construct validity of measures:
Construct validity: how ‘true’ the operational definition is.
-->usually built up over many studies.
-->with time, measures are changed, to fix problems w/i them
-->person estimates whether the given question really measures the variable
-->Problems: 1) not an empirical way 2) many things measured don’t have an obvious face validity
Convergent / discriminant validity
Convergent validity: when the variable is finally proven to be related to other variables. I.e. self-esteem and well-being
Discriminant validity: variables known not to be related to each other.
Criterion validity
Criterion validity: measure of construct validity of measures which predict future b/h (i.e. SAT) – see if the test (predictor variable) is related to future b/h. If criterion validity is high, then the test predicts future b/h well
Predictor variable: the test/measure which sets out to predict future b/h
Criterion variable: the future b/h being predicted.
Reactivity of measures
Reactivity: if results might change when the participant knows he’s being measured.
Unobtrusive/non-reactive: measuring variables indirectly, i.e. seeing near which paintings in a museum the tiles are most worn – you’d know that they’re the most popular paintings.
Variables
April 22, 2002
Experimenter bias: diff. terms for the same thing: the way the experimenter behaves that affects the results
Also called:
-->even on animals!
Examples
-->They even interpreted ambiguous stimuli diff.
Double blind experiment: both the participant and the experimenter don’t know which of the experimental settings the participant is in
Quasi-experimental design
-studies similar to experiments
-->i.e. we run them in labs, where we can control variables/their intensities/order of events/right control groups
-->in field research, we are limited in our ability to control.
-->i.e. in comparing effects on divorced people’s kids: you can’t make people divorce for the sake of the experiment
-natural setting experiments are used to:
-->kinds of ways to help released criminals
Differences b/w lab and natural settings:
hardships in field research
Threats: things that might weaken the validity of the experiment
Possible threats:
-->I.e. after announcing a new economic plan, people are asked how satisfied they are w/ the gov’t economic plan. Their answer is problematic, b/c it could be that the $ is low and the mortgage rates went down, and it does not only have to do with the recent announcement
-->i.e. measuring something at the beginning of grade 7 and at the end of grade 7 is problematic since they mature a lot at that age, independently of your experimental manipulation
-->also: person might be more tired/more relaxed through the experiment
-->history w/i the participant
-->i.e. the interviewer got tired/the exercise machine got old
-->question: is it me that got better, or the tool which got worse
difference:
-testing is change in the person. Instrumentation is change in the tools (interviewer could also be a tool)
-->now, I don’t know whether the change is b/c of statistical regression or the experimental manipulation
scatter diagram
Pretest scores↓ / posttest scores→ | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Average |
13 | | | | 1 | 1 | 1 | 1 | 11.5 |
12 | | | 1 | 1 | 2 | 1 | 1 | 11 |
11 | | 1 | 2 | 3 | 3 | 2 | 1 | 10.5 |
10 | 1 | 1 | 3 | 4 | 3 | 1 | 1 | 10 |
9 | 1 | 2 | 3 | 3 | 2 | 1 | | 9.5 |
8 | 1 | 1 | 2 | 1 | 1 | | | 9 |
7 | 1 | 1 | 1 | 1 | | | | 8.5 |
Mean pretest | 8.5 | 9 | 9.5 | 10 | 10.5 | 11 | 11.5 |
-regression line (line that goes through the most predictive scores)
Pretest mean | Mean post test |
13 | 11.5 |
12 | 11 |
11 | 10.5 |
10 | 10 |
9 | 9.5 |
8 | 9 |
7 | 8.5 |
-->shows the people moving to the center (average)
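The averages in the scatter diagram can be recomputed with a short Python sketch. The counts below are an assumption: they place each count of the diagram in the column that reproduces the printed row and column averages (rows 9 and 7 are taken as mirror images of rows 11 and 13). The names `counts` and `mean_posttest` are made up for illustration.

```python
# counts[pretest][posttest] = number of participants with that pair of scores
counts = {
    13: {10: 1, 11: 1, 12: 1, 13: 1},
    12: {9: 1, 10: 1, 11: 2, 12: 1, 13: 1},
    11: {8: 1, 9: 2, 10: 3, 11: 3, 12: 2, 13: 1},
    10: {7: 1, 8: 1, 9: 3, 10: 4, 11: 3, 12: 1, 13: 1},
    9: {7: 1, 8: 2, 9: 3, 10: 3, 11: 2, 12: 1},
    8: {7: 1, 8: 1, 9: 2, 10: 1, 11: 1},
    7: {7: 1, 8: 1, 9: 1, 10: 1},
}

def mean_posttest(row):
    """Average posttest score for one pretest score (one row of the diagram)."""
    total = sum(row.values())
    return sum(score * k for score, k in row.items()) / total

for pretest, row in counts.items():
    print(pretest, mean_posttest(row))
```

Every extreme pretest score comes out with an average posttest closer to 10 (13 gives 11.5, 7 gives 8.5), which is the regression to the average that the table shows.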
-->grade 7 kids chosen, the manipulation worked, though if it was grade 5 kids, the manipulation wouldn’t work b/c they are diff. populations
-->interaction b/w manipulation and maturity
3 kinds of experiments
Bad types of experimental setup
1 shot case study
XO (X = manipulation/O = measurement)
-->we measure 1 group – after manipulation, we measure its effect
problems:
1 group pretest-posttest design
O1XO2
Problems
Static group comparisons
-I have 2 groups:
-XO and O
-->1 group w/ manipulation and 1 w/o
-->not random (i.e. 1 underwent the course and 1 didn’t)
Problems:
29.4.02
Quasi-experimental studies, weaknesses of the experiment – continued
5) Statistical regression – regression to the mean: the tendency of the extremes to move toward the average.
The average height of the children of very tall parents will be lower than the parents’ average, and the average height of the children of very short parents will be higher than the parents’ average. Why is this a weakness? Is the improvement the result of repeated practice, or of regression to the mean, i.e. the tendency of those w/ a low score to improve their scores on another try?
The assumption behind the idea of regression is that every test has an error, and at the extremes the errors are bigger. In the middle of the distribution the error can go both toward the high side and toward the low side, and therefore scores stay more or less in the middle. The low scorers have errors that tend upward, and the high scorers the opposite.
Two-way table (scattergram) – the spread of the scores by 1st and 2nd measurement
The line is the regression line: it is the line I predict with, the line around which the sum of the errors is the smallest.
Regression to the mean – the phenomenon of the tendency to converge to the average.
The correlation line is the average of the two regression lines.
Threats/weaknesses of the study, continued:
6. Selection
The results we get are not b/c of the manipulation, but b/c the people were already like that b/f the manipulation.
7. Mortality (attrition)
Dropout of participants. A certain kind of participant dropped out, and b/c of that I got a diff. in results, and not b/c of what I thought.
8. Interaction
Interaction b/w the diff. threats. It could be that there is a selection of participants: the manipulation worked, but only on a selective group; on another it would not have worked. A curriculum for grades 5 and 7 has a diff. effect on each grade – it could be that this is b/c there is more maturity in grade 7, so the treatment infl. the diff. age groups differently.
Research designs at diff. levels
1. The first level – Preexperimental design
These are low levels of research, like:
a. One shot case study (XO). X = the course, O = the attitude scale that I gave afterwards. A study w/ one group where, after the manipulation, we test one effect that we think the manipulation caused. Example: did the course in Judaism bring about a high % of secular people w/ positive opinions of religious people? The more weaknesses a study has, the worse it is. Doesn’t overcome history (something other than the manipulation caused the results), doesn’t overcome maturation, nor the measurement. Instrumentation is a problem of 2 measurements, and here I don’t have 2 measurements. Regression is not relevant. Selection: the students could have had positive opinions to begin with, so it doesn’t overcome selection. Mortality: everyone who didn’t like religious people left the course, so it doesn’t overcome it. Interaction: the idea is that the course may have infl. only those who chose it, and not all people in a similar way.
b. One group pretest post test design (O1XO2). I added a measurement b/f the course. Doesn’t overcome history or maturation. Testing: I have 2 measurements, so it is possible that they figured out the questionnaire. Instrumentation: the interviewer knows the students and therefore speaks to them differently, and they improve. Regression: can’t know if the attitudes are extreme. Selection: overcomes it, since I know how the participant was beforehand. Mortality: overcomes it, b/c w/ 2 measurements you can know if people dropped out or not, unlike w/ 1 measurement where you don’t know. Doesn’t overcome interaction, b/c here too the manipulation may infl. this particular group, which came to the course in the first place b/c of certain attitudes or tendencies.
c. Static group comparison (XO / O). The researcher didn’t sort the participants into the diff. groups, but took them as they were. I compare 2 groups: secular people who took the course in Judaism and a group of secular people who didn’t take the course. One group w/ manipulation and the other w/o. I ask the secular people’s opinion of religious people. Overcomes history, overcomes maturation. Testing is not relevant (there aren’t 2 measurements), and the same goes for instrumentation and regression. Doesn’t overcome selection (b/c I don’t have another measurement beforehand), mortality, or interaction.
The eight weaknesses / the design (N/R = not relevant) | History | Maturation | Testing | Instrumentation | Regression | Selection | Mortality | Interaction |
One shot case study (XO) | - | - | N/R | N/R | N/R | - | - | - |
One group pretest post test design | - | - | - | - | ? | + | + | + |
Static group comparison | - | + | N/R | N/R | ? | - | - | - |
2 experimental design
Solomon four-group design
R = random
4 patterns of manipulation/observation
R O1 X O3
O2 O4
X O5
O6
-->you see if there is
History | Maturation | Testing | Instrumentation |
If you have 2 measurements: no prob., since you compare an O to an XO to see if it is the X that infl. or not | No prob., since you can’t say that 1 group matured more than the other groups | No prob., since 1) comparing the 1st and 2nd rows: if there is a diff., it must be b/c of X and no other reason 2) seeing the diff. b/w O3/O5 and O4/O6 to see that there is no infl. of the testing | Instrumentation: something changes in the measurement tools. If you compare O2 and O4 -->if diff.: prob. w/ instrumentation (but could also be tools) |
Regression | Selection | Mortality | Interaction |
Regression: I tested extreme groups that had to even down to the average: no prob. 1) you split the groups randomly 2) if it was an extreme group, you would have a diff. b/w O1 and O2 | No prob.: I choose who gets the manipulation and who doesn’t. I don’t choose a scenario as a manipulation | If I can show that no-one left -->no prob. In case of 2 measurements – even w/ same group – I can see if people left | No prob.: I can show that there is no interaction b/w those who had the manipulation (b/f or after) and those who didn’t (b/f or after) -->if yes interaction: prob. In order to say that X’s infl. on the later O is not only interaction w/ the first O, you’ve got to have the same experiment w/o the former O and see if there is interaction b/w the 1st set of trials and the 2nd. If interaction -->prob!!! |
Solomon four-group design is really comparing 2 experiments
Pretest-post test control group design
R O1 X O3
O2 O4
History | Maturation | Testing | Instrumentation |
Once
you have 2 measure-
ments: -no prob, since you compared a O to an XO to see if it is the X that infl. or not |
No prob, since no reason to
say that 1 matured more than other groups
-once you have 2 groups, you have to say that the 2 groups are the same (i.e. randomness), and not b/c of specific maturation |
No prob. since
1)comparing 1st and 2nd rows: see that if there is a diff: must be b/c of x and no other reasons |
Instrumentation: something
changes in measurement tools. If you compare O2 and O4 ->if diff: prob. w/ instrumentation (but could also be tools) |
Regression | Selection | Mortality | Interaction |
-regression: since the results of O3 and O4 are diff. --> you must assume that there is no regression ** | -no prob. if I compare O1 and O2 and see that they are the same | If I can show that no-one left --> no prob. -in case of 2 measurements, even w/ the same group, I can see if people left |
Problem: I do not have enough info to see if there is internal interaction --> prob.!!! |
Randomized-two-group design
R X O5
O6
History | Maturation | Testing | Instrumentation |
+ No prob., since once I have 2 groups, I can see the diff. b/w them to check whether external variables are influencing the results |
+ No prob., since there is no reason to say that one group matured more than the others -once you have 2 groups, you have to say that the 2 groups are the same (i.e. randomness), and that any diff. is not b/c of specific maturation |
? Not relevant -only 1 measurement --> no change in the measure! (?) |
? Not relevant -ibid. (?) |
Regression | Selection | Mortality | Interaction |
(?) I can't prove that it is regression (despite randomization) b/c you don't have an initial group to see the diff. b/w them |
(?) You don't know the prior state, b/c you don't have the 1st measurement (O), and therefore you are left w/ a question mark | + If I can show that no-one left --> no prob. -I know how many people came in, so I know how many people left (unlike questionnaires, where I don't know how many people refused to take the questionnaire once they saw its content) |
- Problem: I do not have enough info to see if there is internal interaction --> prob.!!! |
--
Quasi-experimental design
-(interrupted) time-series design
imp: --> NO RANDOMNESS IN quasi-experimental design
-as seen above: if you want to reduce weakness --> add observations or measurements
-quasi-experimental design: some measurements beforehand / manipulation / a few measurements afterwards:
i.e. O1, O2, O3, O4, X O5, O6, O7, O8
example:
-astronauts on space-walks = nervous. Therefore, more swearing. I.e. O1-4, X = space-walk [O5 measured during X (the space-walk)], and then O6-8
O1 O2 O3 O4 X O5 O6 O7 O8
History | Maturation | Testing | Instrumentation |
- b/c you can't measure all factors, i.e. he might have been upset at election results |
+ - |
+ if there is a testing effect, it would show as a certain 1-directional slope |
Ibid. |
Regression | Selection | Mortality | Interaction |
+ since there are no zigzags in the curve |
+ selection: from the beginning the since you compare diff. observations and see if |
+ same people in the whole experiment | - Problem: I do not have enough info to see if there is internal interaction --> prob.!!! -only 3 groups -interaction b/w the situation |
Variations of the time-series response
--> effect is lasting (i.e. vaccination)
--> delay b/w manipulation and reaction
Multiple time series design
-no randomization/no groups
-i.e. after a certain thing happens in 1 city, compare it to another place
--> not a random place, i.e. after a movie was screened in 1 city but not in another city, see the change in that city and compare it to what happens in a city where the movie wasn't screened
Group 1
O1 O2 O3 O4 O5 O6 O7 O8
Group 2
O1 O2 O3 O4 O5 O6 O7 O8
Multiple time series design:
History | Maturation | Testing | Instrumentation |
+ -if you compare O1 of each group and see that they are the same, then you see that there are no intervening variables (which rules out history) |
+ - |
+ if there is a testing effect, it would show as a certain 1-directional slope |
Ibid. |
Regression | Selection | Mortality | Interaction |
+ since there are no zigzags in the curve |
+ selection: from the beginning the since you compare diff. observations and see if |
+ same people in the whole experiment | - I don't have the info to see if there is interaction b/w place and manipulation. They could be more violent b/c of other things, such as terror, or other reasons. |
--> unlike time series, multiple time series can deal w/
regression discontinuity
-continuation of static group comparison
Cross-sectional study: comparison of diff. groups at the same time
Longitudinal study: take a group over a long time
X1 X2 X3 O4 O5 O6
History | Maturation | Testing | Instrumentation |
+ once you have more than 1 group, you can compare b/w them: if they are the same in the beginning --> no prob. | + no reason to assume that 1 group matures (learns) faster than another |
- only 1 measure, therefore can't compare diff. in testing b/w 2 observations |
Ibid. |
Regression | Selection | Mortality | Interaction |
+ Usually, the regression line seems to get people closer to average, yet here the weaker (extreme) groups aren't pulled closer to the average, but rather are moving upwards |
+ -we see if people left or not |
- no! it might only work on weaker students --> selection |
Pretest/posttest non-equivalent control group
O1 X O2
O1 O2
--> dotted line = non-compared [b/c you didn't choose them] --> unlike the Solomon, where they are equivalent (controlled)
--> for example, the 1st is a frontal-learning classroom. The other is a jigsaw classroom
History | Maturation | Testing | Instrumentation |
+ b/c both groups have all the same variables except the teaching style. Therefore, there aren't external variables | + no reason to assume that 1 group matures (learns) faster than another |
+ -we don't think that there is a testing diff., b/c we see 2 diff. groups reacting in diff. ways. Thus the test didn't change w/ time, since you would have seen it equally in both groups. Since the result slopes are diff. --> must be b/c of X |
Ibid. |
Regression | Selection | Mortality | Interaction |
Depends! If 1 group crosses the consistent line of the other group --> no regression, since you can't regress beyond the average. If the 2nd group's slope comes close to, but not over, the horizontal slope --> might be regression |
+ -you're sure that you didn't select a group prone to those results, since you measured the group b/f X to see if they were prone to the results before the X |
+ -we see if people left or not |
- -you might have interaction b/w the manipulation and the specific teacher --> other teachers might teach better/worse in each manipulation type |
Measurements
-Giving #'s systematically to phenomena
-#'s have no meaning w/o a definition of what each # on the scale means
-you have a set of #'s: set A (say participants) a1, a2, a3, a4, a5
-you have a set B: gender (b1, b2). You classify A into set B = that is a kind of measurement
measurement = turning results into #'s that rank
--> you have to have isomorphism: connection/logic in the arrangement of the numbers:
--> need to have a connection b/w the numbers and what you're measuring
--> though there might be mistakes, i.e. exceptions: not all males are masculine to the same extent, even though all males are classified 1 on a gender question
--> if no consistency = lower isomorphism
for example: if you have test results, and several participants have the same mark, you add up all the ranking #'s those participants occupy and divide by the # of participants w/ that score
Real mark: the accurate ranking
Used mark: taking same results into account (as seen above)
--
90 --> 1.5
90 --> 1.5
70 --> 7
80 --> 3.5
80 --> 3.5
70 --> 7
70 --> 7
70 --> 7
70 --> 7
60 --> 10
(the five 70's occupy places 5 to 9: (5+6+7+8+9)/5 = 7)
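The tie rule described above (add up the places the tied marks occupy and divide by how many share the mark) can be sketched in a few lines. This is only an illustration: the function name `average_ranks` is made up for this sketch, and it assumes the highest mark gets rank 1.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores all receive the mean of the places they occupy.
    """
    ordered = sorted(scores, reverse=True)
    places = {}
    for place, score in enumerate(ordered, start=1):
        places.setdefault(score, []).append(place)
    # Return the ranks in the same order as the input scores.
    return [sum(places[s]) / len(places[s]) for s in scores]


marks = [90, 90, 70, 80, 80, 70, 70, 70, 70, 60]
for mark, rank in zip(marks, average_ranks(marks)):
    print(mark, "-->", rank)
```

A quick check on any such ranking: the ranks of N marks must add up to N(N+1)/2, here 55.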
Correlation: to check isomorphism
$\rho = 1 - \frac{6\sum D^2}{N^3 - N}$
$D^2$ = squared diff. b/w the original mark and the measurement-given #
--> $\sum D^2$ = the sum of all the squared diff. b/w the original results and the given ones
N = # of pairs
--> the answer of the formula: how good the correlation is
ranking/real ranking D2
1 1 0
2 2 0
4 3 1
5 4 1
6 5 1
8 6 4
$\sum D^2 = 7$, N = 6
$\rho = 1 - \frac{6(7)}{216 - 6}$
= 1 - 42/210
= 1 - 0.2
= 0.8
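As a way to check the arithmetic of the rank-correlation formula, here is a minimal sketch that applies it to the six pairs in the table above. The function name `spearman_rho` is an assumption of this sketch, not something from the course.

```python
def spearman_rho(rank_a, rank_b):
    """Rank correlation: 1 - 6 * sum(D^2) / (N^3 - N)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


given_ranking = [1, 2, 4, 5, 6, 8]
real_ranking = [1, 2, 3, 4, 5, 6]

# sum(D^2) = 0 + 0 + 1 + 1 + 1 + 4 = 7, N = 6
print(round(spearman_rho(given_ranking, real_ranking), 3))
```

Note that the factor 6 multiplies the sum of squared differences before dividing by N^3 - N; leaving it out gives a correlation that looks much better than it is.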
Measurement Rules
--> a is diff. than b
--> not all things in life are transitive, but then it is not a tool
scales/levels of measurements
--> i.e. the Richter scale
--> rule #2
--> rule #3
May 27, 2002
-last class, we saw that there is a diff. b/w real measurement and wanted measurement
--> each term is measured/expressed/defined in diff. ways
examples:
-intelligence: speed of answering/amount of knowledge/how much math you know
-gender: what people are biologically/what they define themselves as socially
-now that we decided how to measure a variable, we need to measure:
-reliability is a necessary but not sufficient condition for measurement.
--> if it is not reliable, then it is not valid.
Reliability
Reliability: accuracy/consistency of the term/variable across situations
3 approaches to measure reliability
--> $X_\infty$ = true score = $(X_1 + X_2 + \dots)/N$
--
***
--> get help for when/how to use formula
To know for exam:
Reliability formula:
$R = \frac{V_t - V_e}{V_t}$
Or
$R_{tt} = 1 - \frac{V_e}{V_t}$
Error = also called residual (given on the test)
Source | SS | Df | MS |
Item --> ignore | | | |
Individual | 30.17 --> Vt, also V(ind) | | |
Residual (also error) | 7.14 = Ve | | |
So plug into the formula:
$R = \frac{30.17 - 7.14}{30.17} \approx 0.76$
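The two ways of writing the reliability formula give the same number, which a short sketch makes easy to confirm. The function names are invented for this illustration; the values 30.17 and 7.14 are the Vt and Ve quoted in the table above.

```python
def reliability(v_total, v_error):
    """R = (Vt - Ve) / Vt."""
    return (v_total - v_error) / v_total


def reliability_alt(v_total, v_error):
    """Rtt = 1 - Ve / Vt (the same quantity, written differently)."""
    return 1 - v_error / v_total


vt, ve = 30.17, 7.14
print(round(reliability(vt, ve), 2))
print(round(reliability_alt(vt, ve), 2))
```

Both lines print 0.76: about three quarters of the variance b/w individuals is not error.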
Example
Respondent | a | b | c | d | Σind |
1 | 6 | 6 | 5 | 4 | 21 |
2 | 4 | 6 | 5 | 3 | 18 |
3 | 4 | 4 | 4 | 2 | 14 |
4 | 3 | 1 | 4 | 2 | 10 |
5 | 1 | 2 | 1 | 1 | 5 |
Σt | 18 | 19 | 19 | 12 | 68 |
Note: in d, everyone is more
reserved about answering: systematic error
Variance: the average of each ind.'s squared diff. from the average
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$
--
C=total squared –n
**get help
Subject | Item a | b | c | d | Σind |
1 | 6 | 6 | 5 | 4 | 21 |
2 | 4 | 6 | 5 | 3 | 18 |
3 | 4 | 4 | 4 | 2 | 14 |
4 | 3 | 1 | 4 | 2 | 10 |
5 | 1 | 2 | 1 | 1 | 5 |
Σt | 18 | 19 | 19 | 12 | 68 |
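One possible way to get a Vt and a Ve out of the 5 x 4 table above is a two-way analysis of variance (individuals x items), reading Vt as the mean square b/w individuals and Ve as the residual mean square. That reading is an assumption of this sketch, based on the Source/SS/Df/MS layout in the notes; the function name is made up, and the numbers it gives for this table are not the 30.17 and 7.14 quoted earlier, which appear to belong to a different data set.

```python
scores = [
    [6, 6, 5, 4],
    [4, 6, 5, 3],
    [4, 4, 4, 2],
    [3, 1, 4, 2],
    [1, 2, 1, 1],
]


def anova_reliability(table):
    """Return (MS individuals, MS residual, reliability) for a subjects x items table."""
    n_ind = len(table)
    n_items = len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (n_ind * n_items)

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_ind = sum(sum(row) ** 2 for row in table) / n_items - correction
    ss_item = sum(sum(col) ** 2 for col in zip(*table)) / n_ind - correction
    ss_res = ss_total - ss_ind - ss_item  # what is left after individuals and items

    ms_ind = ss_ind / (n_ind - 1)
    ms_res = ss_res / ((n_ind - 1) * (n_items - 1))
    return ms_ind, ms_res, (ms_ind - ms_res) / ms_ind


ms_ind, ms_res, r = anova_reliability(scores)
print(round(ms_ind, 3), round(ms_res, 3), round(r, 2))
```

The item sum of squares is computed only so that it can be taken out of the residual, which matches the "Item --> ignore" row.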
-I can compare diff. parts of the testing items to get variance
--> i.e. 1st half w/ 2nd half
--> split-half reliability: compare the odds w/ the evens (people might get tired, so don't compare the 1st half w/ the 2nd half)
Subject | Item a | b | c | d | Σind | Σo(dd) | Σe(ven) |
1 | 6 | 6 | 5 | 4 | 21 | 11 | 10 |
2 | 4 | 6 | 5 | 3 | 18 | 9 | 9 |
3 | 4 | 4 | 4 | 2 | 14 | 8 | 6 |
4 | 3 | 1 | 4 | 2 | 10 | 7 | 3 |
5 | 1 | 2 | 1 | 1 | 5 | 2 | 3 |
Σt | 18 | 19 | 19 | 12 | 68 |
--> comparing the Σo and the Σe gets reliability
$\rho = 1 - \frac{6\sum D^2}{n^3 - n}$
n = # of pairs
Ro | Re | D | D2 |
1 | 1 | 0 | 0 |
2 | 2 | 0 | 0 |
3 | 3 | 0 | 0 |
4 | 4.5 | 0.5 | 0.25 |
5 | 4.5 | 0.5 | 0.25 |
ΣD2 = 0.5 |
--
Subject | Item a | b | c | d | Σind | Σo(dd) | Σe(ven) |
1 | 6 | 4 | 5 | 1 | 16 | 11 | 5 |
2 | 4 | 1 | 5 | 4 | 14 | 9 | 5 |
3 | 4 | 6 | 4 | 2 | 16 | 8 | 8 |
4 | 3 | 6 | 4 | 3 | 16 | 7 | 9 |
5 | 1 | 2 | 1 | 2 | 6 | 2 | 4 |
Σt | 18 | 19 | 19 | 12 | 68 |
$\rho = 1 - \frac{6\sum D^2}{n^3 - n}$
n = # of pairs
Ro | Re | D | D2 |
1 | 3.5 | 2.5 | 6.25 |
2 | 3.5 | 1.5 | 2.25 |
3 | 2 | 1 | 1 |
4 | 1 | 3 | 9 |
5 | 5 | 0 | 0 |
ΣD2 = 18.5 |
$\rho = 1 - \frac{6(18.5)}{120} = 1 - 0.925 = 0.075$
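The whole split-half procedure (sum the odd items, sum the even items, rank both sums, then apply the rank-correlation formula) can be sketched end to end. The helper names are invented for this sketch, and it assumes items a and c count as "odd" and b and d as "even", as in the Σo/Σe columns of the tables.

```python
def average_ranks(scores):
    """Rank from highest (1) to lowest; ties share the mean of their places."""
    ordered = sorted(scores, reverse=True)
    places = {}
    for place, score in enumerate(ordered, start=1):
        places.setdefault(score, []).append(place)
    return [sum(places[s]) / len(places[s]) for s in scores]


def spearman_rho(rank_a, rank_b):
    """1 - 6 * sum(D^2) / (n^3 - n)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def split_half_rho(table):
    """Correlate the ranks of the odd-item sums with the ranks of the even-item sums."""
    odd_sums = [sum(row[0::2]) for row in table]   # items a, c
    even_sums = [sum(row[1::2]) for row in table]  # items b, d
    return spearman_rho(average_ranks(odd_sums), average_ranks(even_sums))


first_table = [[6, 6, 5, 4], [4, 6, 5, 3], [4, 4, 4, 2], [3, 1, 4, 2], [1, 2, 1, 1]]
second_table = [[6, 4, 5, 1], [4, 1, 5, 4], [4, 6, 4, 2], [3, 6, 4, 3], [1, 2, 1, 2]]

print(round(split_half_rho(first_table), 3))   # 0.975
print(round(split_half_rho(second_table), 3))  # 0.075
```

The first table is almost perfectly consistent b/w its halves; in the second, the even items have been shuffled and the correlation drops to nearly zero.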
coefficient varient??**
r2 = the common part of both tests
-there is a rule in reliability: the more items, the more chance of being similar
--> but in split-half reliability, we reduced the # of items
--> so there is a way to fix it: the Spearman-Brown formula
Spearman-Brown formula:
$R_n = \frac{n r}{1 + (n - 1) r}$
$R_n$ = wanted correlation (how much there really is); r = the correlation found b/w the halves
n = wanted # of items (the # that really exists) / existing # of items in our split-half test
Example: 10 items in each half (20 in all), r = 0.6
$R_n = \frac{(20/10)(0.6)}{1 + (20/10 - 1)(0.6)} = \frac{1.2}{1.6} = \frac{6}{8} = 0.75$
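The Spearman-Brown correction is a one-line calculation, sketched here with the numbers of the example (r = 0.6 b/w the halves, 20 items in the full test, 10 in each half). The function name is an assumption of this sketch.

```python
def spearman_brown(r, n):
    """Predicted reliability when the test is made n times as long.

    r: correlation found on the shorter test (here, b/w the two halves)
    n: wanted # of items divided by the existing # of items
    """
    return n * r / (1 + (n - 1) * r)


print(round(spearman_brown(0.6, 20 / 10), 2))  # 0.75
```

With n = 1 the formula returns r unchanged, and with n > 1 and a positive r it always returns something larger than r, which fits the rule that more items give more reliability.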
--
Ro | Re | D | D2 |
1 | 1 | 0 | 0 |
2 | 2 | 0 | 0 |
3 | 3 | 0 | 0 |
4 | 4.5 | 0.5 | 0.25 |
5 | 4.5 | 0.5 | 0.25 |
ΣD2 = 0.5 |
Question: why not compare all of them to each other?
N = n(n-1)/2
for n = 4 items: N = 4(3)/2 = 6 pairs
Cronbach alpha / internal consistency: ranking each one against each other: a vs. b's ranking; a --> b, c, d; b --> c, d; c --> d.
--> when using correlation as a score: the correlation of a pair = 1 score. 2 pairs = 2 scores
-question on the test: where is there most reliability
Cronbach alpha:
$\alpha = \frac{n \bar{r}}{1 + (n - 1)\bar{r}}$
n = # of TOTAL items
$\bar{r}$ = the average of the correlations of the N [= n(n-1)/2] pairs
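A minimal sketch of the alpha formula as given here: correlate every pair of items, average the correlations, and plug the average into the formula. Two assumptions are made for illustration only: the pairwise correlations are ordinary Pearson correlations computed on the raw item scores (the notes do not say which correlation to use), and the data are the 5 x 4 table from the earlier example. All function names are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation b/w two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def alpha_from_mean_r(mean_r, n_items):
    """alpha = n * r_bar / (1 + (n - 1) * r_bar)."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


scores = [[6, 6, 5, 4], [4, 6, 5, 3], [4, 4, 4, 2], [3, 1, 4, 2], [1, 2, 1, 1]]
items = list(zip(*scores))              # one tuple of scores per item a, b, c, d
pairs = list(combinations(items, 2))    # n(n-1)/2 = 6 pairs for 4 items
mean_r = sum(pearson(x, y) for x, y in pairs) / len(pairs)

print(len(pairs), round(alpha_from_mean_r(mean_r, len(items)), 2))
```

The formula has the same shape as Spearman-Brown: the average pair correlation is stepped up to a test of n items.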
test-retest reliability: measured against all of the items
--> very similar in reliability to Cronbach,
--> since it tests twice, and each item is considered, since Xt = X + e[rror] **
alternate forms reliability/parallel forms: when the same question is rephrased when retested