Do social workers really need to
understand research?
To be helpful in a better
way, and because
empirical support attracts more funding
To do no harm
How do we know if what we are doing
really helps?
Should social workers team
up with police in responding to elder abuse
a study showed that police or social worker intervention actually increased
Is multisystemic therapy
more effective than service as usual?
Are lay home visitors as
effective as nurses in peri-natal parental training?
Are school-based sexual
abuse prevention programs effective?
Are women as violent as
Does spanking have different
effects in African-American families? i.e. spanking has been shown to
be ineffective at best, yet might have a differential effect in various
Was welfare reform really
a success?
Are parents of children
with chronic health problems overprotective?
Does foster care increase
the risk of juvenile delinquency?
Is harm reduction ethical
when working with pregnant drug users?
Why quantify?
Qualitative research is
used to explore and understand
Quantitative research is
used to:
Establish causality
Measurement is however fundamentally
reductionistic and must maintain an iterative
course outline
search and read – i.e.
know how to read descriptive statistics
numeracy and basic statistics
experimental designs
observational designs
systematic review and meta-analysis
searching and reading
Boolean search
– i.e. use of the AND and OR operators in your search
who –i.e. children
what – sexual abuse
why - prevention
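A minimal sketch of how the who/what/why terms could be combined into one Boolean search string. The helper name `build_query` and the extra synonym "child" are assumptions added for illustration; the exact syntax a given database accepts may differ.

```python
def build_query(who, what, why):
    """Join synonyms with OR inside each concept, then join the concepts with AND."""
    groups = []
    for terms in (who, what, why):
        # Multi-word phrases are quoted so they are searched as a phrase.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(
    who=["children", "child"],   # "child" is an added synonym (assumption)
    what=["sexual abuse"],
    why=["prevention"],
)
print(query)
```

OR widens the search within one concept, while AND narrows it by requiring every concept to appear.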
reading tips
read title and abstract:
is it a study/review/theoretical/policy piece? The abstract is meant to identify design,
sample, major concepts measured, and major findings
scan through the headings -> skim
through the intro and lit. review [first and last sentences of each
paragraph], and the last sentence of the literature review [it usually describes
the objectives] – compare with the abstract and title.
Sample/design usually in
the first table
Try to visualize the sample
from a clinical perspective – how do participants compare to people
you have worked with? – psychology researchers know the population
in a very limited spectrum
Imagine yourself recruiting
the study participants – who is missing? Who dropped out? Any significant
concerns about sampling bias? ->social services tend to refer those who they
think will be most successful. This is a bias!!!
Go to tables – they show
you all the variables measured
Go to the text describing
the measures – try to visualize them. Observation? Worker report?
Self report? How many questions or items? Do these make sense?
What was left out?
Compare results to your
Discussion section
= first paragraph summarize finding
Identify strengths and
1 or 2 limitations [this skill improves with practice] -> i.e.
judge what is more relevant
Study summary table
-a table to summarize an article –
generally not more than a page
Columns include:
Sample and design
Major findings
Gibbs, L. (1991). Chapter 4. Does a method cause change? (pp. 61-81). In Leonard
Gibbs. Scientific reasoning for social workers: Bridging the gap
between research and practice. Indiana: Prentice Hall
Often, people use
cues-to-causality: shortcuts or rules of thumbs to infer causality.
They fail to rule out alternate causes and confounding effects of various
other variables
Kinds of cues-to-causalities
Causal chain strength:
i.e. questioning how reasonable the assumed link is. For example, looking
at someone who looks disoriented: we don’t know exactly what this
person is feeling or why. We can reasonably assume that this is because
of her Alzheimer’s – but it could be anything else too.
Contiguity:
i.e. when two things occur close together, they are assumed to be associated.
Nevertheless, there could be other factors causing the change. i.e.
when a depression lifted after therapy, we don’t know if
it was the therapy or something else, such as the cause of the depression
ceasing. The two problems with relying on contiguity logic are:
We do not know whether it
was the intervention or another variable which caused the change
Effects of intervention
are not always immediately seen
Temporal order:
the assumption that if one thing came before the other thing, then it
caused the other. This is problematic since temporal order is necessary but not
sufficient grounds to infer causality. Yes, therapy must come before
the depression is alleviated in order to be its cause, but coming first only means that it
could have helped; we still need to see that it was the therapy
and not something else. In Latin, this faulty logic is called post hoc ergo
propter hoc. But after all, there are accounts of spontaneous
when things occur together, they are assumed to be causally linked.
This is not necessarily true because there could be a confounding variable.
An interesting study found that people misestimate the non-events, i.e.
what happens in cases when no treatment was given or no success occurred
Ruling out alternate
explanations: as mentioned, in other cases, not considering
alternative or confounding causes of the results of the study
Confounding/alternative causes
Client-by-treatment interaction:
the results were caused by the specific interaction between the specific
client and the specific intervention
Participation in other
treatments concurrently
Placebo effect: a
similar effect is called the Hawthorne effect: people change
behavior, or even recover simply because of the researcher/therapist’s
Maturation: a natural
process of change over time within the participants. i.e. (for the sake
of example): let’s say that people get less depressed with time, then
the study of effectiveness of therapeutic intervention will be confounded
by a natural healing process
Client selection
Self-selection: the
research group differed in some way from the general population
Purposeful selection
by researcher: you just skewed your results
Regression towards the
mean: people who got high readings on a measure were probably at
a peak, and tend to revert to more normal readings on future measurements.
So it was not the studied intervention, but a high first reading and
a more normal, lower second reading on measurements of X, which happened
because the guy's readings normally go up and down, or the measurements are not
accurate or reliable, or whatever…
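A small simulation can make regression towards the mean concrete. This is only a sketch under assumed numbers: every person gets a stable "true" level plus random noise at each reading, nobody receives any intervention, and we follow only those with the highest first readings.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical model: stable true level (mean 50) plus noise at each measurement.
true_levels = [random.gauss(50, 10) for _ in range(10_000)]
first = [t + random.gauss(0, 10) for t in true_levels]
second = [t + random.gauss(0, 10) for t in true_levels]

# Select the people whose FIRST reading was in the top 10%. No treatment is given.
cutoff = sorted(first)[int(0.9 * len(first))]
selected = [i for i, score in enumerate(first) if score >= cutoff]

mean_first = statistics.mean(first[i] for i in selected)
mean_second = statistics.mean(second[i] for i in selected)

print(round(mean_first, 1), round(mean_second, 1))
```

The selected group scores lower on the second reading even though nothing was done to them, which is why a comparison group is needed before crediting an intervention.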
Mortality: people
quit studies/die/etc… this could skew the results
Resentful demoralization
of control or comparison group: the participants might get grumpy
because they are getting different treatment –or a “lesser” treatment
Compensatory rivalry:
people in the control group might try harder at the task to compensate
for their lack of treatment within the experiment
Spontaneous recovery:
sometimes seen as a special case of maturation
The socialworker/researcher
Reliability of treatment
implementation: how the intervention which is being studied is implemented
influences the outcome, beyond the pure intervention which is being
studied. For example, a student of social work will implement the intervention
differently than an experienced social worker
Outcome measure
Inadequate preoperational
explication of constructs: gotta define your criteria
and methods accurately
Reliability: gotta
make sure that your measurement tool is accurate and reliable – and
not constantly fluctuating
Testing: the mere
taking of a test could make a difference in the results. So it is not
the intervention but the testing, i.e. the mere measurement instruments used in the
study, which made the changes in the participants
History: if anything
external to the experiment took place between the experiment sessions
Chapel, T. (2004).
Chapter 70. Constructing and using logical models in program evaluations
(pp. 636-647). In Roberts, A. Yeager, K. (eds.) evidence-based practice
manual: research and outcome measures in health and human sciences.
New York, Oxford University Press
Logic models:
a way to chart graphically the logic of a study. You need to show:
Relationship: between
components and effects/outcomes of your study
Intended: that the
results are expected from your study manipulation, or your intervention
A logic model should show:
Inputs: the
resources which you need for your program
Activities: what takes place in the program being evaluated
Outputs: the
product of your activities
Effects: also
called outcomes or impacts – the result from your activities
and outputs
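The four parts of a logic model, plus the arrows between them, can be sketched as a small data structure. Everything here is an assumption for illustration: the class name `LogicModel`, its methods, and the example program entries are hypothetical, not taken from the Chapel chapter.

```python
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    inputs: list = field(default_factory=list)      # resources the program needs
    activities: list = field(default_factory=list)  # what takes place in the program
    outputs: list = field(default_factory=list)     # products of the activities
    effects: list = field(default_factory=list)     # outcomes / impacts
    links: list = field(default_factory=list)       # (source, target) causal arrows

    def components(self):
        return set(self.inputs + self.activities + self.outputs + self.effects)

    def add_link(self, source, target):
        for name in (source, target):
            if name not in self.components():
                raise ValueError(f"unknown component: {name}")
        self.links.append((source, target))

    def unlinked(self):
        """Components with no arrow in or out: possible gaps in the logic."""
        linked = {name for pair in self.links for name in pair}
        return sorted(self.components() - linked)


# Hypothetical program used only as an example.
model = LogicModel(
    inputs=["trained staff"],
    activities=["identify participants", "home visits"],
    outputs=["number of visits completed"],
    effects=["better quality of life"],
)
model.add_link("trained staff", "home visits")
model.add_link("home visits", "number of visits completed")
model.add_link("number of visits completed", "better quality of life")

print(model.unlinked())  # components not yet connected by any arrow
```

Listing the unconnected components is one simple way to spot where the program logic still needs to be thought through.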
Developing a simple logic model:
Develop a list of activities
and intended effects. The following approaches could help
Review of information on
the program
Work backwards from effects
– i.e. asking “how to”
Work forwards from activities
– i.e. asking “so what?”/”then what happens?”
Subdivide the lists to display
any time sequence: i.e. a column for activities and another one for
[optional]: add any inputs
and outputs
Draw arrows to depict causal
Clean the logical model:
i.e. think through and revise when you have to figure out the logic
more clearly
Early activities
Later activities
Early outcomes
Later outcomes
Pool of # of participants
Better quality of life
Training staff for
Relationship with
Identification of participants
Legal authority
-you can also make the above into a
flowchart [at no extra cost]
logic model flowcharts help map out the implied causality and
where logic/evidence gaps exist, and which one of those gaps are most
urgent in the flow of the experiment logic.
make sure that all parties involved in your project agree with the logic
of the model that you are using for your study design
-determining the correct evaluation
focus depends on the specific case, but some general questions include:
who will use the evaluation of the project, and for which purpose? Which
effects will need to be measured then in order to satisfy the needs
of the readers?
Implementation fidelity includes:
Transfers of accountability
– when program needs an external organization to take action in order
for study to continue
Dosage: is
the research/activity too lax or demanding on a critical variable?
– an increased for a certain product due to the research
Next 3 weeks – numeracy - numerical
concepts, average, percentages, standard deviations
is not always clear – i.e. in cross-sectional studies
-i.e. whatever you study might
have an impact on, or be influenced by, various factors
So, you might have:
[effects influencing the causes, i.e.
personal dispositions] ->Cause[s] ->moderating variables ->effect[s] +[and effects of effects
– i.e. changes in arrest policy]
-measuring how strong the cause-effect
connection is gets complicated when you look at various causes and
moderating variables. You probably need a much bigger sample
Control variables and
confounders influence the relationship between independent and
dependent variables
Conditions for causality
Temporal order
and dose response, contiguity, consistence, etc.
Pretest->post test: allows us to see how things
were before the intervention
course, you also need a control group
-statistical significance – a measure
that shows how likely it is that the results arose by chance
-proportion chart – asks how likely
you are to get the charted results by chance
3 ways to see association
difference in means
distribution in a chart
to measure if they are statistically significant
Temporal order
-asking which came first – the temporal
How do you measure temporal order?
some things, you know –i.e.
gender came before the depression
Pre-post design
/ non-spuriousness
to validity [i.e. if there are alternative explanations]
– and: spontaneous recovery – you need a similar
comparison group.
Reactive effects:
such as placebo effect or Hawthorne effect [people who do well just
because they are in a certain program]
– spurious association [i.e. people do not break hips because of their
winter clothes, but because of the winter]; co-interventions
– volunteer bias, healthy worker bias [i.e. more people call in sick
on Monday], purposeful therapist selection [i.e. we refer the more successful
clients to the study]
Ecological fallacy
– mixing up levels of variables, i.e. studying crime in neighborhood,
and one neighborhood which has a lot of immigrants move in – you cannot
assume that immigrants are criminal – you do not know if they moved
into crime neighborhoods, etc…
Experimenter expectation
/ observer bias
Reliability – instrumentation
[is measure accurate and consistent?], calibration testing
Validity -does the instrument
measure the right thing?
Regression towards the mean
Theoretical plausibility
– does the causality make sense
Social work model
-people ->problem
you need to do a needs assessment and see:
What is the problem - need
What are the causes of the
What needs to be done -
How do we do this
Is it actually happening
Is it having an effect?
Theory – a way of explaining
-hypothesis – you are testing something.
Versus assumptions, which are generally not being tested
-cause and effect is empirically studied,
without a third intervening factor
Causal theory
Person/situation ->intervening variable [which creates the problem] ->problem
will bypass the intervening variable to avoid the problem
Intervention hypothesis:
Action hypothesis: the action will
solve the problem
need the right program to solve the problem
Why a program might fail:
Is it because the theory
was wrong?
Was it because the program
was wrong?
Implied program theory:
Serious juvenile effects ->family
relations ->youth
social competence ->re-arrest and incarceration
Family therapy will try to work family
relations and social competence
3 readings: Norman and streiner –
first 2 chapters – read/ olds study of nurses
Make a logic model – look at one
of the interventions in your program - look for ST and LT outcomes
-explain what you typically do –
meet with patients/staff. This is what you expect st/lt outcomes in
your practice.
Norman, G., & Streiner, D. (2003). PDQ Statistics (3rd
edition). Hamilton, ON: B.C. Decker. Chapter
-a variable is something
that could be measured, or manipulated in a study
In a study design:
Independent variable:
the thing that the researcher changes. i.e. one group gets the medication
while the other does not. So the independent variable would be the treatment
Dependent variable
– is the variable that is not manipulated, but is measured to see the
effect of the independent variable. For example, in medication research,
the dependent variable is the health improvement [based on the medication,
which is the independent variable]. A way to remember this is
that the dependent variable depends on what happens with the independent
variable, but the independent variable is independent of the dependent
Kinds of variables:
Nominal:
a kind of variable which merely categorizes: i.e. gender, diagnosis,
etc. No meaning to order or # value
Ordinal:
a kind of variable indicating order: i.e. ranks in the army,
stage of disease
Interval:
a kind of variable where there is significance to the interval – the
diff. b/w the numbers. I.e. the difference between 5 and 6, and 8 and
9 is the same 1 unit of a difference, though there is no meaning to
an absolute 0. i.e. there is no absolute 0 in Celsius: 0 is not the
absence of temperature. So, because there is no absolute 0 in this kind
of variable, there are no ratio capabilities to such a variable.
So, for example, 20 degrees is not half of 40 degrees.
Ratio:
intervals with significance of the interval and there is an absolute
0! Examples include time and weight. 100 pounds is double 50 pounds.
-so, you cannot do an average
on ordinal and nominal variables. There is no average gender,
stage of disease, rank, diagnosis, etc… but you can do averages in
interval and ratio variables.
-qualitative research
uses more nominal and ordinal variables while quantitative research
statistics – talk about categories –and thus when the results
are graphed, they are graphed on stick graphs, where each vertical stick
is a category and its height is the numbers which fall in each
category – and thus, those kinds of studies use nominal or ordinal
statistics – talk about axis and where results lie on it (and
not discrete categories). They use curves and lines to graph results,
and not categorical sticks. Interval and ratio variables are used in
such statistics.
Norman, G., & Streiner, D. (2003). PDQ Statistics (3rd
edition). Hamilton, ON: B.C. Decker. Chapter
2 kinds of statistics:
Descriptive statistics: describing
the results without trying to generalize them
Inferential statistics:
statistical procedures meant to establish that the results generalize to a broader
population, and that they did not occur by chance
Frequencies and distributions
Kinds of variables:
a kind of variable which merely categorizes: i.e. gender. No meaning
to order or # value
a kind of variable indicating order: i.e. ranks in the army
a kind of variable where there is significance to the interval – the
diff. b/w the numbers. I.e. the diff b/w 5 and 6, and 8 and 9 is the
same 1 of a diff., though there is no meaning to an absolute 0. i.e.
there is no absolute 0 in Celsius: 0 is not the absence of temperature.
intervals with significance of the interval and there is an absolute
0! Examples include time and weight.
Another distinction:
Continuous variables:
those on an axis – like height or weight. You can weigh 2 pounds,
or also 2.345434985345345 pounds, etc…
Discrete variables:
that there is no space – no range between the categories. I.e. you
can’t have 1 and a half kid – only 1, 2 , 3, etc…
More terms:
Distribution – the
data as displayed in a chart – i.e. how it is “distributed” in
a graph
Frequency distribution:
making a chart/graph of the frequencies of a certain variable values
Probability distribution:
charting the probability of each of the variable values. Thus you put
in the % of each value instead of the frequency
Kinds of centers of distribution:
Mode: the value with
the highest frequency -good for nominal variables – you can have 2 modes
in a distribution – i.e. 2 answers were equally common in a variable
(called bimodal)
Median: the value
in the middle of the distribution -good for ordinal variables
Mean / average: the number
resulting from total of values/n -good for interval/ratio variables.
It is influenced by how skewed the values of the distribution are. So, in a positively
skewed distribution, a few extra-high values will unduly raise
the mean. A negatively skewed distribution will unduly lower the average.
The average is the most efficient measure in symmetrically distributed data
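The three centers can be computed with Python's standard `statistics` module. The numbers below are made up for illustration; the one very high value shows how a positive skew pulls the mean up while the median barely moves.

```python
import statistics

# Hypothetical incomes (in thousands); 200 is an extreme high value.
incomes = [20, 22, 22, 25, 27, 30, 200]

mode = statistics.mode(incomes)      # most frequent value
median = statistics.median(incomes)  # middle value once sorted
mean = statistics.mean(incomes)      # total of values / n

print(mode, median, round(mean, 1))

# A bimodal nominal variable: two answers are equally common.
answers = ["yes", "no", "yes", "no", "maybe"]
modes = statistics.multimode(answers)
print(modes)
```

With skewed data like this, the median is usually the more representative center; the mode is the only one of the three that works for the nominal `answers` list.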
Measures of variation: range,
percentiles, standard deviation
-this part will speak of how much people
differ. This is called the range. In nominal variables,
this is the number of variables with at least one respondent replying
to this category
In ordinal, interval and ratio:
there are several definitions of how to measure a range
The range between 5th
percentile and 95th percentile
See the quartiles [25%,
50% and 75%] and see the inter-quartile range – the difference
between the 25th percentile and 75th percentile
Standard deviations are used for interval/ratio variables:
$SD = \sqrt{\sum (x_i - \bar{x})^2 / N}$
[x_i = individual value, \bar{x} = mean value, N = number of values]
The top is squared to undo
the negative values -those lower than the average, and the squaring
is undone by also taking the square root of the whole formula. By the
way, the result before taking the square root is called the
variance. So, the smaller the SD, the closer
the participants are to the mean.
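The formula can be followed step by step in code. This is a sketch with made-up scores; it also shows the n-1 version used when the data are a sample rather than the whole population, and checks both against the standard library.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean = sum(values) / len(values)
squared_deviations = [(x - mean) ** 2 for x in values]  # squaring removes the minus signs

variance = sum(squared_deviations) / len(values)  # divide by N: whole population
sd = math.sqrt(variance)                          # square root undoes the squaring

# Dividing by N - 1 instead is the adjustment for a sample.
sample_sd = math.sqrt(sum(squared_deviations) / (len(values) - 1))

print(mean, variance, sd, round(sample_sd, 3))
```

`statistics.pstdev` and `statistics.stdev` give the same two results directly.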
Normal distribution: a symmetrical, bell-curve distribution. In a normal
distribution, 68% of the distribution falls within 1 SD unit
to each side of the mean, 95.5% within 2 SDs, and 2.3% falls in each of the tails
of the bell-curve
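These percentages can be checked against the standard normal curve. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)   # share within 1 SD of the mean
within_2sd = z.cdf(2) - z.cdf(-2)   # share within 2 SDs of the mean
each_tail = 1 - z.cdf(2)            # share beyond 2 SDs on one side

print(round(within_1sd, 3), round(within_2sd, 3), round(each_tail, 3))
```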
Standard scores:
When comparing 2 results from different
tests with different scales, you convert each score into a standard
score (z), which makes those two scores comparable:
Standard score (z) =
(raw score – mean) / SD
-this also allows us to see how
many people (%) got higher or lower than the given score
[if the scores are normally distributed],
since z scores are in SD units
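A short sketch of the z-score calculation, with made-up test means and SDs. Converting a z score into "share of people scoring lower" assumes the scores follow a normal distribution.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Standard score: distance from the mean in SD units."""
    return (raw - mean) / sd


# Hypothetical: the same person on two tests with different scales.
z_a = z_score(raw=65, mean=50, sd=10)
z_b = z_score(raw=120, mean=100, sd=15)

# Share of people expected to score below test A's result (normality assumed).
share_below_a = NormalDist().cdf(z_a)

print(z_a, round(z_b, 2), round(share_below_a, 3))
```

Here the person did relatively better on test A, even though the raw score on test B is the larger number.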
– January 20th, 2009
-nominal variable also sometimes
called categorical
Nominal – no order in
the variable – i.e. gender – you are one or another – but there
is no gender-order – need exhaustive and distinct categories – mode
is used
Ordinal – a variable with
ranks – i.e. army ranks – there is an order, but no meaning to the
distance between them - median or range or interquartile range are used
Interval – there is a
scale and a set distance between the numbers– so you know the distance
between the variables, but no absolute 0 means that you cannot compare/double
the figures
Ratio – you know the difference
between them – you know the proportions
Yes =150
No = 50
Problem – raw counts are hard to compare because
if you have 70/400 unemployed, then 70 is more than 50 [the 50 out of 200 above], but it
is proportionally less [17.5% vs. 25%] – so it is better to give proportions
Histogram – tells the data with graphs
Constant – a number
that is not changing (according to user’s wishes)
Variable – a unit that
changes according to what people plug in (i.e. x)
Dependent variable – a variable
that changes depending on another variable in the equation
Independent variable –
a variable that does not change depending on other variables
Example: Y=f(X)
Qualitative- something
relating to a quality of something – not #’s
Quantitative- something
relating to #
Continuous –
a variable that has subunits: i.e. 160cm has subunits (i.e. 160.5)
Discrete – a
variable w/o subunits: i.e. # of people in class – there can be 13
people in the class but there can’t be 13.5 people in the class
Finite -
Population the full group
being tested on
Sample the random group
from the population chosen to collect data
Graphs tables that portray
#’s in an arranged order
are shown in form of bars – too many bars = richer data, but at some
point, is more difficult to understand than the smaller categories in
the histogram
Bar graph – counting
and showing discrete categories
Polygon (frequency polygon) – a line graph
Central tendency
Mode – the most frequent
category in the variable
Median – good for ordinal
and up – need to know what to rank it by, in order to do the median
Range – allows to take
into account also the spread [i.e. the extreme scores] and also
the average
Range does not allow to
tell you what happens between the extremes – so you have deviation
from the mean
Standard deviation –
sum of [each value – mean]
- average deviation about the mean
– by dividing it by N (has to be capital!), you get to compare various
sized populations, and you get the standard deviation
-dividing by N = average deviation
about the mean – each [value – mean] has to be squared first, and the
squaring is then undone by taking the square root.
n-1 takes into account that this is
a sample – this is an adjustment that you make when it is not the
whole population
Within 1 SD of the mean [on each side] = 34.1%, between 1 and 2 SDs – another 13.5%,
between 2 and 3 SDs = another 2.1% - rest = 0.1%
You cannot take an average on nominal or
ordinal scales – so you do not have the average religion in Canada.
January 27, 2009
bivariate inferential statistics
-comparing two things = bi-variate
analysis = needed for association between 2 variables. i.e. cause and
effect, and differences between 2 factors. –i.e. does A influence
B? is a more than b?
-we’re going to speak about tests
of significance
Significance: how likely is it that these results
reflect reality [rather than chance]
-if you have access to the whole population
[which the sample is taken from] then you can compare it to the
population but when you do not, and you want to test significance, then
you need other techniques.
Depending on the costs and ramifications,
some have a stricter or looser p value [significance level]
Statistical tests used for significance:
T-test – nominal
[only dichotomous] and ratio – you are comparing means
Purpose: To compare
2 independent random samples with normal distribution making no assumptions
about the direction of difference
Concept: to see if
the difference between the means of those
two groups is statistically significant
Anova – nominal
[3 or more levels of the variable] and ratio - you get an F score
Chi square – nominal
vs. nominal
Chi squared
– Nominal vs. ordinal
Pearson correlation
– ratio vs. ratio
Spearman – ordinal
vs. ratio
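The list above works like a lookup table: the pair of variable types decides the test. A minimal sketch of that lookup, covering only the pairings listed in these notes; the type labels such as "nominal-2" and the function name `choose_test` are made up for illustration.

```python
# Pairings as listed in the notes; labels like "nominal-2" are hypothetical shorthand.
TESTS = {
    ("nominal-2", "ratio"): "t-test",                # dichotomous nominal vs. ratio
    ("nominal-3+", "ratio"): "ANOVA (F)",            # 3 or more levels vs. ratio
    ("nominal", "nominal"): "chi-squared",
    ("nominal", "ordinal"): "chi-squared",
    ("ratio", "ratio"): "Pearson correlation",
    ("ordinal", "ratio"): "Spearman correlation",
}


def choose_test(first, second):
    """Return the test listed for a pair of variable types, in either order."""
    if (first, second) in TESTS:
        return TESTS[(first, second)]
    if (second, first) in TESTS:
        return TESTS[(second, first)]
    raise KeyError(f"no test listed for {first} vs. {second}")


print(choose_test("ratio", "nominal-2"))
print(choose_test("nominal", "nominal"))
```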
Stats Canada homepage: left
hand side: community profiles: enter your postal code – compare it
to city of montreal
Make table or chart with
the comparison and add your commentary on the data
February 3, 2009
readings – homework
– for the 10th. do readings +2 articles on webct
Exam – march 3rd
Biases that alter statistics
-Issues in the article read:
struggle to find statistics
shocking data!
Problem: sampling bias
You want to make sure that
you compare the same thing
Sampling bias – non-equivalent
Measurement biases
If biases are ruled out,
we are left to see if the differences found are significant
-larger sample – increases statistical significance
Clinical significance = “does it really matter” – this measure is stronger
than the statistical significance measure
Is 35% different than 32%
35% different than 13%
Gotta look at
Effect size
Sample size ->i.e.
if there is more variation, then you want to increase the sample in
order to see the pattern in a clearer way [i.e. we also gotta look at
the standard deviation]. If you see the pattern after a sample, then
enlarging the sample won’t change the fact that it is significant
Standard deviation
What does P value mean?
Probability that the observed
difference or association is just a coincidence of the sample
If p is lower than 0.05
(5%) – then, if there were really no difference in the population, a difference
this large would turn up by chance in fewer than 5% of samples
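One way to see what a p value measures is an exact permutation test: assume there is no real difference, reshuffle the group labels in every possible way, and count how often a difference at least as large as the observed one appears. This is a swapped-in illustration with made-up scores, not one of the tests covered in the course.

```python
from itertools import combinations
from statistics import mean

treatment = [12, 14, 15, 17]  # hypothetical outcome scores
control = [8, 9, 11, 13]

observed = mean(treatment) - mean(control)
pooled = treatment + control

count = 0
total = 0
# Every possible way to relabel 4 of the 8 scores as "treatment".
for idx in combinations(range(len(pooled)), len(treatment)):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = mean(group_a) - mean(group_b)
    if abs(diff) >= abs(observed) - 1e-12:  # at least as extreme, in either direction
        count += 1
    total += 1

p_value = count / total
print(observed, total, round(p_value, 4))
```

The p value is the share of relabelings that produce a difference as big as the real one when the labels carry no meaning.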
Systematic/design biases
–those are validity/reliability problems
Tests of significance assume
that the other methodological issues are resolved, and the statistics
are ok – now we have to look at the results and see if they are ok
Hypothesis testing
Probability calculations
are based on stating in advance your explanation in the form of a hypothesis
– i.e. there is a difference in the samples, which represents a real
difference in the population
The null hypothesis: there
is no difference between the 2 groups across the variable which is being
tested – i.e. that there is no difference in the sample, which is
also reflected in the general populations
Type I/Type II errors
Research hypothesis is supported:
- if the research hypothesis is true: Correct decision
- if the research hypothesis is false: Type 1 error
5% chance that it will happen
Risk: using an ineffective thing.
Research hypothesis is not supported:
- if the research hypothesis is true: Type 2 error
Risk: throwing out a good thing.
Usually happens with smaller sample: when the results are small but
In the past, this was not reported.
It is now increasing
- if the research hypothesis is false: Correct decision
Effect size
(in SD units of the population)
ES1 = (mean of treatment group
- mean of control group) / standard deviation of control group
-0.2 is a small effect size. 0.5 is
considered medium and 0.8 and up is considered a large effect
ES2 = proportion improved in treatment group
versus proportion improved in control group
-odds ratio - another
way to speak about effect size – it is the ratio of the odds of an event
occurring in 1 group to the odds of it occurring in the other
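A sketch of both effect-size measures with made-up data. Using the sample SD (n-1) of the control group is an assumption; the notes only say "standard deviation of control group". The function names are hypothetical.

```python
import statistics


def effect_size(treatment, control):
    """(mean of treatment - mean of control) / SD of the control group."""
    difference = statistics.mean(treatment) - statistics.mean(control)
    return difference / statistics.stdev(control)  # sample SD: an assumption


def size_label(es):
    """Rule of thumb from the notes: 0.2 small, 0.5 medium, 0.8 and up large."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    return (events_a / non_events_a) / (events_b / non_events_b)


treatment = [14, 16, 18, 20, 22]  # hypothetical scores
control = [10, 12, 14, 16, 18]

es = effect_size(treatment, control)
# Hypothetical counts: 30 of 100 improved in one group, 15 of 100 in the other.
ratio = odds_ratio(30, 70, 15, 85)

print(round(es, 2), size_label(es), round(ratio, 2))
```

Neither number says anything about statistical significance; a large effect in a tiny sample can still be due to chance.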
Standard error
-SE is the standard deviation divided
by the square root of the sample size -> the larger the sample, the more precise the mean
-as the sample size gets larger, the
SE gets lower, but the mean/SD stays the same
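A quick illustration of how the standard error shrinks as the sample grows while the SD is held fixed. The SD of 10 and the sample sizes are made up.

```python
import math


def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)


sd = 10  # hypothetical, assumed to stay the same as the sample grows
for n in (25, 100, 400):
    print(n, standard_error(sd, n))
```

Quadrupling the sample size only halves the standard error, so extra precision gets expensive quickly.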
Nominal 2+
Chi squared x2
Chi squared x2
T-test –t
Nominal 2+ levels
Anova – F
Spearman rho
Spearman rho
Logistic regression
February 10, 2009
Homework – choose test and discuss
it – 1 page
The grand scheme = what is the idea?
How does it relate to other concepts?
Defining what measures the concept
– what we’re actually looking at
Boundaries – i.e. what
is it not [despite being close to]?
-how you measure the operational definitions
Nominal – categorical
Factor, consistency definition
and what is not
Dimension – i.e. abuse
only sexual? Also physical? Etc…
Boundaries – i.e. “hitting
out of self defense is not abuse” – that is a boundary of abuse
– i.e. in Straus’ article, abuse is differentiated from emotional/sexual/self-defense/power
differentials, etc… this is the differentiating between what is in
Kurz vs.
Two articles discuss intimate partner
violence (IPV). They take very different approaches, based on a different
conceptualization of IPV. We will compare the two. This is in order
to show how different conceptualizations lead to very different results.
Therefore, IPV, or anything for that matter, could look very, very,
very different, based on your original mindset.
causal factors that explain IPV
Feminist perspective:
Gender inequality: financial, division
of labour, poverty->childcare/marriage/work$ ->IPV – persistent
Inequality + poverty ->family dynamics ->IPV
Sexual, emotional, economic abuse
Physical abuse
Contexts: inequality
Police records
Easy to administer, quick,
cheap, easy language to understand, frequency, can compare partners,
a lot of different acts are included
Recorded live
Lots of details
Gets a better feel for severity
than CTS
Availability –to compare
different samples
Does not take into account
context [motivation], self-reporting bias [social desirability] shame
and guilt from reporting physical harm
Weighting: giving same weight
to acts of various severities
Legal changes – i.e. change
in laws
Perceptual changes
Purpose of data collection
is different: if you collect it for legal reasons, it is different than
collecting it for research
Differences in definitions
Sampling bias – people
can report many times – so you might have the same guy appearing as
several participants
CTS is supposed to give higher scores
to more frequent assaults, rather than severity of assaults
Police reports have lots of violence
February 17, 2009
Study review
–part 1:
Identify 1 article in your group’s
topic and put it into a table
+ introduction about why you are looking
into this topic
+ Search strategy – and say if this
is one out of how many in the search
Statistical significance
P is the probability that
an observed difference is a coincidence due to an atypical sample
P<0.05 (5%) means that,
if there were really no difference in the population, there would be less than a 5% chance
of seeing a difference this large
Less than 5% probability
that findings [null hypothesis rejected] were due to chance alone
Research hypothesis
is supported – null is rejected:
- real difference in population: Correct decision, power 1-b
- no difference in population: Type 1 error: p=a
-thinking that there is a difference
when there is really not. Called alpha
5% threshold
Inconclusive –
fail to reject null hypothesis:
- real difference in population: Type 2 error: thinking
that there is no difference when there really is. Called beta
20% threshold
- no difference in population: Correct decision
-sample size affects the representativeness
of the sample. You need a sample of about 1000 to get representativeness of the Canadian
population – you need enough people to detect the significant effects
– to avoid a type 2 error
Effect size
ES = d = (mean of treatment group – mean of
control group) / standard deviation of control
->effect size does not mean it is significant
can also speak about proportion’s effect size: proportion who improved
vs. those in the control who improved
same thing with odds ratio
Clinical vs. statistical significance.
Clinical significance:
significance: “is this a large enough difference to be worthwhile”?
Statistical significance:
likelihood that the results could be by chance alone.
Tests of significance:
Chi test – proportions
Pearson – 2 continuous
Bivariate to multivariate
Bivariate – comparing two variables
– the dependent and independent
Multivariate – compare more variables
You need to:
-cause and effect- temporal order ->so
you can do a pre-test to see baseline
-rule out confounders
-theoretical plausibility
i.e. race and child abuse –
but it could also be income, which also correlates with abuse, and
with race.
For example:
Reported child abuse
Poor whites
Poor African Americans
once you controlled for income, you see that it is the low income and
not race. It just happened that the African Americans were poorer
other variables also influence the dependent variable – you have to account for
control and confounding variables
If you add another variable
and the original correlation goes down, then you assume
that the original correlation was an artifact
– you try to account for more variables, and control for
them by looking at their relative contributions
->if, despite controlling for the other variables, the
intervention still has an effect, that supports the claim that your independent variable
is the one which caused the effect [you basically want to see how
much the confounders influence the outcome]
Conditions for establishing causality:
Temporal order (cause comes before effect)
Ruling out confounding variables
Theoretical plausibility
Reading 2.1: Locating Measurement
Tools and Instruments for individuals and Couples
– Kevin Corcoran
-in sciences, progress followed improvements
in measurements. This was also true for behavioral sciences
Recent trend in assessment:
move from broad (broadband)
questionnaires (i.e. MMPI) to more specific questionnaires (narrowband)
– such as a depression test
in the past, you needed
a specialist to decipher the questionnaire. Today, the clinician can
do it.
-the questionnaires give immediate
feedback, and they quantify observations
-such instruments help the clinician
get a direction.
-called Rapid Assessment Instrument
it uses the client’s self-observation
it can be made into a rating
can show the client’s
change, and help monitor it
a study run by a MSW student
shows that the study of questionnaires is ridiculous
to be useful/probative:
reliability – how consistent is the test – score from 0.0 to 1.0. Over 0.8 is
good and indicates that the test is consistent
-3 forms of consistency:
between items within the
instrument – called internal consistency
different forms of the same
instrument: alternative or parallel form reliability
over course of time –
test-retest reliability
validity –
accuracy of the test-
validity: does it actually measure the domain of the variable?
Concurrent validity:
compare it to a known, established criterion
Predictive validity:
see how much the test is able to predict future –i.e. how much does
SAT predict college success
Construct validity:
are the test’s results similar to things that are supposed to be similar,
yet different from things that are supposed to be different?
Utility: does
the test help improve the clinical work? Does it help plan, monitor,
evaluate? – too long of a test will reduce utility.
Self-referenced comparison:
compare the scores of the same person over time – to see the changes.
Norm-referenced comparison:
comparing one’s scores to general population – could be used to
push for therapy – to show that there is a real problem
is it suitable for the client’s emotionality and comprehension? i.e.
the measure is influenced by the client’s ability to read. If the client
is psychotic, he or she may not understand it either.
how the content of the questionnaire is accepted by the client – i.e.
some couples will have a hard time filling out a questionnaire about
sex – as this topic is not a topic which they accept talking about..
the test should be able to measure changes, but stay stable (reliable) when
no change occurred.
does the actual act of measuring cause the change that the questionnaire
shows? The book’s example: did increased sexual arousal occur because
of a better relationship or because of the arousing questions in the questionnaire?
So when a test is reactive, you do not know why there is a change
in the scores.
Locating tools
people used to know tests
– or have to look them up for hours on end
other places include:
internet. Problems with
internet include
human error
websites change often
information overload
Exemplary instruments
Broadband: general
questionnaires that allow you to isolate the problem:
SQ: symptom questionnaire:
measures 4 dimensions:
And 4 parallel wellbeing
Somatic health
known groups : it differentiates
between healthy and non healthy populations. Also, it has high reliability
on test-retest and internal consistency
Narrowband: specific problem:
-the article lists questionnaires for
specific problems in the following issues:
A., Babbie, E.R. (2007). Research Methods for Social Work. Chapter 7.
6th Edition. Pacific Grove, CA: Brooks Cole
End stage of problem formulation-The
process of moving from vague ideas about what you want to study to being
able to recognize and measure what you want to study. This is the conceptualization
and operationalization process.
Conceptual Explication
a variable is a concept we
are investigating
concept is a mental image
that symbolizes an idea
concepts that make up a
broader concept are called attributes
variables vary…ex. Male
is a concept that cannot vary and can take on only one value therefore
it is an attribute
concept is a variable if
1) comprises more than one attribute or value and thus is capable of
varying; and 2) is chosen for investigation in a research study
Developing a Proper Hypothesis
a variable that is postulated
to explain another variable is called the independent variable
the variable being explained
is the dependent variable
the statement that postulates
the relationship between the variables is the hypothesis
hypothesis should be value-free
and testable
Extraneous Variables
Extraneous variables
represent alternative explanations for relationships that are observed
between independent and dependent variables
Control variables
on the other hand check on the possibility of the relationship between
variables which may be misleading
Control variables can also
be called moderating variables and these can affect the
strength or direction of the relationship between the variables
Spurious relationship
is one that no longer exists when a third variable is controlled
Mediating Variables
Mediating variable
is the mechanism by which an independent variable affects a dependent
variable. If we think an intervention reduces recidivism among criminal
offenders by first increasing prisoner empathy for crime victims, then
our “level of empathy” would be the mediating variable
Can also be called
intervening variables
Moderating variables reside
outside the causal chain
Mediating variables reside
in the middle of the causal chain and have no impact on the independent
Types of Relationships between Variables
positive relationship:
the dependent variable increases as the independent variable increases
negative or inverse
relationship: the two variables move in opposite directions
curvilinear relationship:
the nature of the relationship changes at certain levels of the variables
(p. 156)
Operational Definitions
operational definition:
operations or indicators we will use to determine the quantity or attribute
we observe about a particular variable
nominal definitions
use a set of words to help us understand what a term means but do not
tell us what indicators to use in observing the term
Operationally Defining Anything that
most of the variables in
social work research don’t actually exist in the way that a rock exists
seldom have a single unambiguous
the technical term for mental
images (racism, homophobia, etc.) is conception
idiosyncratic conceptions
of the mental images is what we observe
direct observables:
those things we can observe rather simply and directly like color or
check marks made in a questionnaire
indirect observables:
if someone puts a check beside female in the questionnaire, we indirectly
observe the gender
all we can measure are the
direct and indirect observables that we think imply the conception (i.e.:
depression observed through certain behaviours that aren’t really
depression in themselves)
conceptualization is the
process through which we specify precisely what we will mean when we
use particular terms
Indicators and Dimensions
the end product of the conceptualization
process is the specification of a set of indicators of
what we have in mind, markers that indicate presence or absence of the
concept we are studying
a dimension is a specifiable aspect or facet of a concept (i.e.: economic dimension
and civil rights dimension of social justice)
Conceptions and Reality
the process of regarding
unreal things as real is called reification, and the reification of
concepts in day to day life is common
Creating Conceptual Order
clarification of concepts
is a continuing process in social research
refining meanings goes well
into the attempt to communicate findings to others in a final report
hermeneutic circle
is a cyclical process of ever-deeper understanding
The Influence of Operational Definitions
how we choose to operationally
define a variable can greatly influence our research findings
Gender and Cultural Bias in Operational
special care is needed to
avoid gender and cultural bias in choosing operational definitions
Operationalization Choices
although social workers
have a wide variety of options available to them when it comes to measuring
a concept, operationalization does not proceed through a systematic
checklist
Range of Variation
range of variation may not
need be fully measured at all times but will in light of the research
your decision on the range
of variation should also be governed by the expected distribution of
attributes among your subjects of study
range depends on whom you
are studying
Variations between the Extremes
* whenever you are not sure how much
detail to get in a measurement, get too much rather than too little
* it will always be possible to combine
precise attributes into more general categories, but it will never be
possible to separate out the variations that were lumped together during
observation and measurement
A Note on Dimensions
* It is essential to be clear about
which dimensions are important in your inquiry and direct the interviews
accordingly. Otherwise you may end up measuring one dimension when you
really wanted to know about another one
Examples of Operationalization in Social
Three broad categories: self-reports,
direct observation and examination of available records
Existing Scales
existing scales are popular
way to operationally define variables
most thorough procedure
is through a literature review
reference volumes also list
and describe many existing measures
existing self-report scales
may be practical, but they may not be the best way to operationalize
a particular variable in a particular study
issues when choosing an
existing scale: how lengthy is the scale? Will it be too difficult for
the participants to complete? Will it be sensitive to small changes
over relatively short periods?
Two critical issues: reliability
and validity (usually reference literature will report on reliability
and validity of a scale)
You may want to go beyond
the reference sourcebook that gives an overview of an existing scale
and examine firsthand the studies that reported the development and
testing of the scale
Operationalization Goes on and on
measure a given variable
in several different ways in a research
examine alternative operational
definitions during your analysis. You will have several single indicators
to choose from and many ways to create different composite measures
Qualitative Perspective on Operational
researchers conducting purely
qualitative studies do not restrict their observations to predetermined
operational indicators
the problem with defining variables
in advance is threefold: 1) we may not know in advance what the salient
variables are 2) limitations in our understanding of the variables we
think are important may keep us from anticipating the best way to operationally
define those variables 3) even the best operational definitions are
necessarily superficial because they are specified only in terms of
observable indicators
A., Babbie, E.R. (2007). Research Methods for Social Work. Chapter 8.
6th Edition. Pacific Grove, CA: Brooks Cole
Sources of Measurement Error
Measurement error occurs
when we obtain data that do not accurately portray the concept we are
attempting to measure.
Systematic Error
Occurs when the information
we collect consistently reflects a false picture of the concept we seek
to measure, either because of the way we collect the data or the dynamics
of those who are providing the data
The most common way our measures
systematically measure something other than what we think they do is when biases
are involved in the data collection
Acquiescent response
set: agreeing or disagreeing with most or all statements regardless
of their content
Social desirability
bias: tendency of people to say or do things that will make
them or their reference group look good.
Cultural bias
Random Error
have no consistent pattern
of effects
do not bias our measures,
they make them inconsistent from one to the next
things we are measuring
do not change over time, but our measures keep coming up with different
Errors in Alternate Forms of Measurement
written self-reports:
problem in measurement is that people’s words don’t always match their
social desirability adds to problems already noted for written self-reports.
Different interviewers can also cause errors due to inconsistencies
among them, or their characteristics that may influence how respondents
behavioural observation: can be highly vulnerable to social
desirability; observers themselves might be biased to perceive behaviors
that support their study’s hypothesis. Also there may be errors in
recording and observations
Examining available
records: possibility that some practitioners exaggerate their
records in the belief that someone might use those records to evaluate
their performance (systematic error). Perhaps they aren’t careful
in documenting their tasks (random error)
Avoiding Measurement Error
it is virtually impossible
to avoid all possible sources of measurement error
must try to focus on minimizing
any major measurement errors that would destroy the credibility and
utility of your findings and that you assess how well your measures
appear to have kept those errors from exceeding a reasonable level
obtain collegial feedback
to help spot biases or ambiguities
make sure any team members
involved in the observation are consistent in how they perform their
unobtrusive observation
is used to minimize social desirability
triangulation deals with
systematic error by using several different research methods to collect
whether a particular technique,
applied repeatedly to the same object, would yield the same result each
does not ensure accuracy
to avoid errors, use measures
that have proven their reliability in previous research
clarity, specificity, training
and practice will avoid a great deal of unreliability and grief
Interobserver and Interrater Reliability
degree of agreement or consistency
between or among observers or raters
to assess interrater
reliability you would train two raters; then you would have
them view the same videotapes and independently rate their responses.
Some researchers argue that 70% and up of agreement would be acceptable
random error
you might want to calculate
the correlation between two sets of ratings instead of percentages
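A minimal sketch of the percent-agreement check described above, assuming two raters coded the same ten videotaped segments. The ratings are invented and the function name is an assumption.

```python
def percent_agreement(rater_1, rater_2):
    """Share of cases on which the two raters gave the same rating."""
    if len(rater_1) != len(rater_2):
        raise ValueError("both raters must rate the same cases")
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


# Invented ratings of the same ten segments by two trained raters.
rater_1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_2 = ["yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

agreement = percent_agreement(rater_1, rater_2)
print(agreement, agreement >= 0.70)  # compared with the 70% rule of thumb
```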
Test-Retest Reliability
term for assessing a measure’s
stability over time
administer the same measurement
instrument to the same individuals on two separate occasions. If the
correlation between the two sets of responses to the instrument is above
.70 or .80 then the instrument may be deemed to have acceptable stability.
In assessing test-retest
reliability, you must be certain that both tests occur under
identical conditions, and the time lapse between test and retest should
be long enough that the individuals will not recall their answers
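A sketch of the test-retest check, assuming the same five people filled out the same instrument on two occasions. The scores are invented and pearson_r is a hand-written helper rather than a library call.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Invented scores: same five clients, two administrations.
time_1 = [10, 12, 14, 16, 18]
time_2 = [11, 12, 15, 15, 19]

r = pearson_r(time_1, time_2)
print(round(r, 2), r > 0.80)  # compared with the .70 or .80 rule of thumb
```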
Internal Consistency Reliability
to assess whether the various
items that make up the measure are internally consistent
Internal consistency
reliability assumes that the instrument contains multiple items,
each of which is scored and combined with the scores of the other items
to produce an overall score
Assess the correlation of
the scores on each item with the scores on the rest of the items
Split-halves method:
assess the correlations of sub-scores among different subsets of half
of the items
Most common and most practical
method for assessing reliability
Parallel-forms reliability:
constructing a second measuring instrument that is thought to be equivalent
to the first
Coefficient alpha:
equals the average of all the correlations (calculates the total subscore
of each possible split half for each subject, and then calculates the
correlations of all possible pairs of split half subscores)
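A sketch of computing coefficient alpha. Note the assumption: instead of literally averaging every possible split-half correlation as described above, this uses the usual variance-based formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The item scores are invented.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha from a list of items, each a list of respondent scores."""
    k = len(items)
    # Total score per respondent = sum of that respondent's item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented data: 3 items answered by 5 respondents.
items = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 4, 5, 5],
]

alpha = cronbach_alpha(items)
print(round(alpha, 2))
```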
refers to the extent to
which an empirical measure adequately reflects the real meaning of the
concept under consideration
Face Validity
* face validity: when
measure appears to measure what the researcher intended
Content Validity
refers to the degree to
which a measure covers the range of meanings included within the concept
to ascertain whether the
measure indeed measures what it’s intended to measure, we need empirical
Criterion-Related Validity
select an external criterion
that we believe is another indicator or measure of the same variable
that our instrument intends to measure
two subtypes: predictive
validity-its ability to predict a criterion that will occur
in the future. Concurrent validity-its correspondence
to a criterion that is known concurrently
when a measure is assessed according
to its ability to differentiate between known groups, this is known
groups validity (another subtype)
ability to detect subtle
differences is termed the sensitivity
we need to be mindful of
the issue of sensitivity and of whether the instrument’s known groups
validity is based on groups whose differences are more extreme than
the differences we expect to detect in our own study
Construct Validity
the way a measure relates
to other variables within a system of theoretical relationships
construct validation can
involve assessing whether the measure has both convergent validity and
discriminant validity
convergent validity:
when its results correspond to the results of other methods of measuring
the same construct
discriminant validity:
when results do not correspond as highly with measures of other constructs
as they do with other measures of the same construct
Factorial Validity
refers to how many different
constructs a scale measures and whether the number of constructs and
the items that make up those constructs are what the researcher intends
to assess your scale’s
factorial validity, you would use a statistical procedure called factor analysis
if your results showed that
you had three factors, and that the items making up each factor were
for the most part the ones you intended to correlate most highly with
each other, then your scale would have factorial validity
An illustration of Reliable and Valid
Measurement in SW: The Clinical Measurement Package
9 short, standardized scales
that were designed for repeated use by clinical social workers to assess
client problems and monitor and evaluate progress in treatment (Hudson,
See pages 189 and 192 for
more text on this package
Relationship Between Reliability and Validity
reliability does not ensure validity
you can’t have validity
without also having reliability
Neuman, W.L., (2002). Chapter
5. The Literature Review and Ethical Concerns (pp 95-136). In W. Laurence
Neuman. Social Research Methods: Qualitative and quantitative approaches.
5th Edition.
Allyn & Bacon.
of a literature review:
Knowledge accumulates and that people
learn from and build on what others have done. Scientific research is
a collective effort of many researchers who share their results with
one another and who pursue knowledge as a community.
4 Goals of literature reviews
To demonstrate familiarity
with a body of knowledge and establish credibility. All reviews have
this in common.
To show the path of prior
research and how a current project is linked to it.
To integrate and summarize
what is known in an area
To learn from others and
stimulate new ideas.
6 types of literature reviews
1. Self-study review: increases reader’s
confidence (combines first part of goal 1 and 4).
2. Context review: places a specific
project in a big picture. Introduces rest of the research-establishes
significance of research question.
3. Historical review: traces the development
of an issue over time.
4. Theoretical review: compares how
different theories address an issue.
5. Integrative review: summarizes what
is known at a point in time.
6. Methodological review: points out
how design, samples, measures account for differences.
Meta-analysis: technique used in an
integrative or methodological review.
gathering details from a
large number of research projects and statistically analyzing them
doesn’t have to use stats
to summarize findings
Where to find research literature
mass market publications:
source on current events but do not provide material in a form needed
for a lit. review
popularized social science
magazines and prof. publications-at best can supplement to other sources
opinion magazines-arena
for intellectual debate as opposed to presenting findings.
Scholarly journals
primary source of periodical
for lit review-usually found in college or university
can be specialized
no “seal of approval”
so reader must use judgement
see pages 205-209 for more
Citation formats
key to locating articles
ASR –american sociological
Chicago manual of style
difficult to distinguish
books on research reports from other books
some types of social research
more likely to be in book form
detailed theoretical or
philosophical discussions usually appear in books
Three kinds of books contain collections
of articles/research reports
Readers: designed for teaching
Collections: designed for
scholars-may gather journal articles, etc.
Annual research books: hybrids
between scholarly journals and collections of articles.
Original PhD research project
Specialized index lists
available through university library
Government documents
Sponsor studies and publish
Must use specialized lists
in libraries to find them
Policy Reports and Presented Papers
A thorough lit review examines
these two sources
Difficult to obtain
Found in libraries or by
writing to institute or center for a list of reports.
How to conduct a systematic lit review
Define a topic
well-focused research question
a context review will be
broader than research question
sometimes research question
finalized after the thorough review is finished.
Design a search
plan a search strategy (type,
extensiveness, material to include, etc.)
set parameters (time line,
number of reports to look at, etc.)
learn how to take notes
and how to record citations
develop a schedule
Locate research reports
general rule: use multiple
strategies (articles in scholarly journals, books, dissertations, gov.
documents, policy reports, etc.)
Taking notes
have one file for sources
that will be divided in two: those you have acquired and those that
are potential leads
content file-remember to
record specific quotes and the specific page number associated to it
Organizing notes
organize by theme
create a mental map of how
they fit together
context review-organising
around a specific research question
historical review-by theme
and date of publication
integrative review-core
common findings in a field
methodological-by topic
and within topic by design or method
theoretical review-by theories
and major thinkers being examined
Writing the review
be prepared to do a lot
of rewriting
keep your purpose in mind
read source material critically
What does a good review look like?
not just a list of summaries of findings;
that fails to communicate a sense of purpose
organize common findings
and or arguments together
approach most important
ideas first
note discrepancies and weakness
in findings
easy, fast, cheap-source
material from anywhere, easy to access, always open, easy to store found
“links” page helps to
connect to other potential sources
Democratization of access
to information
wide range of information
no quality control
much of the most important
resource material for social research not available on the net
finding sources can be time
consuming and difficult
sources can be unstable
and difficult to document-specific addresses may change or cease to
Ethics in Social Research
sound ethical practice needs
to be integrated into study design
moral and professional obligation
to be ethical
ethical issues may involve
finding a balance between scientific knowledge and rights of those being
potential benefits of study
must be weighed against potential human costs
Individual researcher
moral code=best defence
against unethical behaviour
reflect on research actions-consult
your conscience
Why be ethical?
most unethical behaviour
results from lack of awareness, and pressures to take short cuts
vague descriptions of ethical
standards make the odds of getting caught being unethical pretty small
ethical concerns usually
internalized during professional socialization
you can be “legal” without
being ethical
Scientific misconduct
includes research fraud,
plagiarism, falsification
Origins of research participation protection:
Gross violation of human rights in
name of science during Nazi Germany
Secure prior voluntary consent
when possible
Never cause unnecessary
or irreversible harm to subjects
never unnecessarily humiliate,
degrade, or release specific individuals information that was collected
for research
Examples of research and ethical issues
Laud Humphreys: Tearoom Trade study
deception and absence of consent
of participants engaging in casual homosexual encounters in a public
washroom. The researcher followed participants to their vehicles, found their home
addresses, and then went to interview them in their homes on a supposedly
unrelated topic.
Milgram: Obedience study
social pressure to obey
emotional stress caused
to the “teachers” (the participants), who believed they were sending electric shocks to students that gave the
wrong answers to the questions. Participants were unaware they were
not actually sending an electric charge. The recipient actually faked
being shocked.
Zimbardo: Role play of prisoners/guards
subjects were too caught
up in the roles and the study had to be stopped. The prisoners became
passive and disorganized while the guards became aggressive, arbitrary,
and dehumanizing. The participants were students.
acceptable ONLY if there
is a specific methodological reason for it
most often it happens in
experimental research
misrepresentation of self
and actions or true intention: only if not doing so would skew results
by participants modifying their behaviors if they are fully aware
must still have informed
debrief participants afterwards
Covert observation: only if it is essential
combined with a gradual disclosure (exceptions possible, like cults, extremist
political sects, etc.)
Informed Consent
principle: never coerce
someone into participating, it must be voluntary
permission of subjects and
info on what they are asked to participate in
not a requirement of the
some may agree to participate
without really being able to give true consent
legal guardians must sign
for “incompetent” subjects
Creating new inequalities
being denied a benefit or
service as a result of participation in a study (ie.: participants in
a control group that are denied services/treatments)
Privacy: probe into beliefs, background,
and behaviours that reveal intimate details
Protection of privacy: not
disclosing a participant’s identity after info is gathered. Two forms
of protection:
Anonymity: people remain
nameless; usually code numbers are given to identify them
Confidentiality: holds in
confidence the identities of those who give information
May protect subjects from
physical harm
Protection of research participants
US Dept. of Health and Human
Services office: Protection from Research Risks
National Research Act
National Commission for
the Protection of Human Subjects of Biomedical and Behavioral Research
Universities and research
institutions are the safe guard of ethical standards
Ethics committees, peer
review within the profession
Longo, P. (2004). Chapter 88.
Application of Logic Models in Rural Program Development (pp796-804).
In Roberts, A, Yeager, K. (eds). Evidence-Based Practice Manual: Research
Outcomes, Measures in Health and Human Sciences. NY: Oxford University
Ongoing
Performance Measurement and Management Model (OPM&M)
After 5 years as a grant-funded project
focused on implementing policy changes:
became a state-wide technical
assistance contract
primary aim to enhance administrative
infrastructure and capacity
Tools: Performance blueprint-non
linear logic model
instrumental in documenting
identifying needs
learning from service delivery
leads to performance measurement
OPM&M: 4 phases
visioning and revisioning
(ongoing phase) community assets assessed in relation to federal and
state outcome expenditure
Performing (ongoing phase)
performance of strategies and strategists are focal point
Measuring (ongoing phase)
focus on efficacy and effectiveness of strategies, services and activities
Learning (ongoing phase)
performance measurement and evaluation used to adjust program and services.
OPM&M: rooted in theory of change-based
evaluation traditions
Performance blueprint: added benefit
of looking at the sociocultural and political variables
requires identification
and inclusion of direct and indirect beneficiaries
also direct and indirect
service providers
incorporates Friedman’s
4 quadrant approach
offers a transparent strategy
for identifying and prioritizing 4 types of performance measures (effort
related, effect related, quantity, quality)
Individual components of the blueprint
Inputs: the resources needed
to achieve desired outcomes
Activities, Strategies,
Services: “effort” part of program (operational elements)
Providers, Vendors, Collaborators:
paid and unpaid personnel
Clients and Customers
Outputs (most important
distinction of the blueprint) specific impacts that the service providers
actually have on the clients or customers
Outcomes: using the 4 quadrants
4 Characteristics to operationalize
performance measurement in the OPM&M approach
1) systematically collecting and strategically
using performance information…
2) …on an ongoing basis…
3) in an intra and interorganizational
4)…for a variety of internal and
external purposes.
Longo, P. (2004). Chapter 89.
Amplifying Performance Measurement Literacy; Reflections from the Appalachian
Partnership for Welfare Reform (pp.804-812). In Roberts, A, Yeager,
K. (eds). Evidence-Based Practice Manual: Research Outcomes, Measures
in Health and Human Sciences. NY: Oxford University Press.
Performance Blueprint:
is an enhanced, non linear logic model
marshal important program-related
promotes stakeholder involvement
engender a more productive
and collegial type of collaboration
Two distinctions of the PB:
Two outputs
Those associated with the
“performance” of the program personnel
Those associated with “performance”
of the program’s clients and/or customers once they have come in contact
with the program.
Two outcomes
service delivery outcomes
community outcomes
the community in which the
clients and/or customers live
the service providing organization’s
capacity to manage its resources and to improve service delivery on
a continual and accountable basis
6 Step sequence for navigating through
the Performance Blueprint
1. Organize, collect, and chart outcomes
2. Identify targeted populations.
3. Define the results that clients
or customers can expect in terms of output “effects.”
4. Determine which activities, strategies,
and services are needed to achieve step 3 and identify who will initiate,
execute, provide, and/or monitor these efforts.
5. Define and set the performance measures
to assess the “effects” and “efforts” in relation to the chosen
activities, strategies, and services.
6. Use available resources and find
additional needed resources.
March 3, 2009
Levels of design
State of knowledge:
literature review
– what are the issues
who/how often/when/what
– causality – risk and protective factors, sequelae and treatment
– no control over IV – when you can’t control the independent
variable – i.e. can’t control h.m. violence
Cross-sectional – one
Case-control retrospective
– compare those exposed to violence vs. those who were not
Longitudinal perspective
– follow over time
– control over IV
Randomized controlled trial
Quasi experimental design
– you have control over Independent variable but it is not randomized
Experimental design
One-group posttest
X – o1
– you do not know what the baseline was, so you cannot tell whether, or how much,
improvement there was
One-group pretest-posttest
O1 X O2
History: factors
other than the experiment may have influenced the result
-->whenever there is a problem w/ testing, there usually is a problem
w/ testing
Regression: you
don’t know whether it is extreme
Interaction: the treatment
may still influence only this group because they are unique.
Comparison group post test
X o1
History: factors
other than the experiment may have influenced the result
Testing: no 2 measurements
-->can’t compare the two
Regression: can’t
know if they’re extreme
Selection: maybe
they were different beforehand
Maturation: perhaps
people left the course (manipulation) b/f
Interaction: perhaps
this group is unique
Comparison group pretest-posttest
O1 X o2
O1 o2
groups were not equivalent – e.g. one group cannot read – so you
have to match the 2 groups
Randomized control trial
– classical experiment
R->o1 x o2
->o1 o2
Interrupted time series
O1 O2 O3 X O4 O5 O6
Solomon 4 group
R O1 X O3
R O2   O4
R    X O5
R      O6
Components of
experimental design
Population + selection =sample
Sample ->assignment
X treatment and comparison
[control] group -> observation 1 [before] –o 2 [after]
-there is a difference between random
selection and random assignment
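To make the distinction concrete, here is a minimal Python sketch. The client list, the sample size and the function name `select_and_assign` are made up for illustration; they are not from Gibbs or any study discussed in class. Random selection decides who gets into the study (generalizing to the population), while random assignment decides which group each sampled person lands in (making the groups comparable).

```python
import random


def select_and_assign(population, sample_size, seed=0):
    """Hypothetical helper: random selection followed by random assignment."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable

    # Random selection: every member of the population has the same
    # chance of entering the sample.
    sample = rng.sample(population, sample_size)

    # Random assignment: shuffle the sample, then split it in half, so
    # chance alone decides who gets the treatment.
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Made-up sampling frame of 200 clients.
population = [f"client_{i:03d}" for i in range(200)]
groups = select_and_assign(population, sample_size=40)
print(len(groups["treatment"]), len(groups["control"]))
```

A study can have one without the other: most trials randomly assign a convenience sample, so the groups are comparable but the sample may not represent the population.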
Read Gibbs – Quality of Study Rating
Form – 3.2
your group article and fill out the form mentioned in Gibbs
+ study summary, practical implications,
etc. – 500 words / bring Henggeler article for next class
Gibbs: How to evaluate
studies to guide practice systematically
Filter approach
to evaluating evidence
-problem: many studies some stronger
and some weaker
sponge approach:
use them all and integrate strengths from all studies
filter approach:
leave the less substantive arguments behind while accepting the better ones
-problem with sponge approach: you
accept gold and garbage alike – “reward everything fallacy”
Quality of Study Rating Form
-gives objective criteria of filtering
evidence and choosing the practice methods that are supported by the
strongest empirical data.
Wacko study: students were split
into 2 groups – those who are for the death penalty and
those who are against it. Both groups were given
articles for and against the death penalty, yet no one really changed their
mind; they kept finding more reasons to support their own opinions. In
other words: bummer. In more other words: people use what they agree
with, without thinking about the issue at hand, and despite new data.
In other words #3: people are biased – and so are social workers. They
even prefer positive over negative results in studies. They are “skewed
Quality of Study Rating Form
quick systematic and reliable
guide for busy practitioners who want to know practical implication
of a report
summarize predominant features
of an evaluation study
help those with limited
exposure to research methods to reliably rate an index of study quality
help those with limited
exposure to concepts of meta-analysis compute 2 simple indices of a
treatment method’s effect size
compare and synthesize these
indices in order to select a method
waste our time
-the point of the QSRF is to help us
compare various studies that evaluate interventions, in order to help
us choose the best one
-based on principles of meta-analysis
– or “data synthesis” – analyzing data from large chunks of
data [whole studies and not individual participants] with the purpose
of integrating it.
Quality of study rating form
– actual form
client type[s]
treatment method[s]
outcome measure to compute
outcome measure to compute
source [APA format]
criteria for rating study
clear definition of treatment
who – 4 points
what – 4 points
where - 4 points
when – 4 points
why – 4 points
subjects randomly assigned
to treatment or control – 20 points
subjects randomly selected
– 4 points
non-treatment control group
– 4 points
number of subjects in the smallest
treatment group is larger than 20 – 4 points
outcome measure has face
validity – 4 points
treatment outcome was checked
for reliability – 5 points
reliability measure has a value
greater than .70 or percent of rater agreement is greater than 70% – 5 points
outcome of treatment was
measured after treatment was completed – 4 points
test of statistical significance
was made and p<0.05 – 20 points
follow-up greater than
75% -10 points
total quality points [TQP]
Effect Size (ES1) = (mean
of treatment – mean of control or alternate treatment) / SD of control
or alternate treatment
Effect Size (ES2) = proportion
improved in treatment – proportion improved in control or alternate treatment
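The scoring and the two effect size indices can be sketched in Python as follows. The point values are copied from the criteria listed above (they add up to 100); the dictionary keys, function names and the example study numbers are made up for illustration and are not Gibbs's wording.

```python
# Point values as listed in the notes above; key names are hypothetical labels.
QSRF_POINTS = {
    "who": 4, "what": 4, "where": 4, "when": 4, "why": 4,
    "random_assignment": 20,
    "random_selection": 4,
    "non_treatment_control": 4,
    "smallest_group_over_20": 4,
    "face_validity": 4,
    "reliability_checked": 5,
    "reliability_over_70": 5,
    "measured_after_treatment": 4,
    "significant_p_under_05": 20,
    "follow_up_over_75_percent": 10,
}


def total_quality_points(criteria_met):
    """TQP: add up the points for every criterion the study meets."""
    return sum(QSRF_POINTS[name] for name in criteria_met)


def es1(mean_treatment, mean_comparison, sd_comparison):
    """ES1: difference in means, in units of the comparison group's SD."""
    return (mean_treatment - mean_comparison) / sd_comparison


def es2(prop_improved_treatment, prop_improved_comparison):
    """ES2: difference in the proportion of clients who improved."""
    return prop_improved_treatment - prop_improved_comparison


# Made-up study, for illustration only.
met = ["who", "what", "random_assignment", "non_treatment_control",
       "measured_after_treatment", "significant_p_under_05"]
print(total_quality_points(met))   # 4 + 4 + 20 + 4 + 4 + 20 points
print(es1(24.0, 20.0, 8.0))        # treatment mean is half an SD higher
print(es2(0.60, 0.45))             # 15 percentage points more improved
```

Note that random assignment and the significance test alone carry 40 of the 100 points, so a study without them cannot score above 60.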
March 10, 2009
Journal watch in study analysis: max
600 words: comment on implications for practice, and on methodology
-study analysis – March 24th
-study review – March 31st
-for next week – read Henggeler –
the second one
You can combine effect sizes
Campbell Collaboration
– way to combine [sometimes very] different results
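The notes do not say how the combining is done. One simple possibility, shown here only as an assumption, is a sample-size-weighted average, so that larger studies count more. Published meta-analyses typically weight by inverse variance instead. The study values below are made up.

```python
def weighted_mean_effect_size(studies):
    """Combine effect sizes, weighting each study by its sample size.

    studies: list of (effect_size, n_participants) pairs.
    This is a simplification; it is not the method of any particular review.
    """
    total_n = sum(n for _, n in studies)
    return sum(es * n for es, n in studies) / total_n


# Three made-up studies: (effect size, number of participants).
studies = [(0.50, 40), (0.20, 120), (0.80, 40)]
combined = weighted_mean_effect_size(studies)
print(round(combined, 2))
```

The large study with the small effect pulls the combined estimate below the plain average of the three effect sizes.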
March 17, 2009
Study analysis
-demographic: details!!
-exact # in groups
-allocation procedure
-methodological critique i.e.
some do not have much followup data
-good to have some detail in the finding
Formulation: what is your
Search strategies
New set of tables/
3-4 page summary – not
a restatement! But themes across studies, e.g. strengths of studies:
6/7 studies had a comparison group, 3 were randomized, etc.
Study analysis:
-quality of study rating form + your
-research archive ->look for summaries related to your topic- 500-600
words – summary of findings, design, methodology – see why it works
Observational design:
when you have no control over independent variable because of ethical
or other reasons
cross sectional design
single data collection point
no causality
Current events – very
common – like surveys
Cross sectional with
access to data in the past
Problem: Retrospective
bias: memory changes
People sad in the present
remember sad things
Longitudinal (prospective)
Multiple data collection
O ->e ->o->o
-e ->o->o
March 24, 2009
Randomized control trial -RCT– is
it feasible for the program that we described in logic model ->i.e.
methodologically/ethically +think of an outcome measure
Observational studies
- when you can’t manipulate the independent variable.
plausibility – in first few paragraphs of the article
– is it coincidence or statistically significant?
Temporal order
Rule out confounders
– i.e. systematic biases, etc.
Sampling frame
Sampling influences
internal validity – i.e. random. ->allows you to see that the experiment really
checks what it is supposed to
->cross-sectional has lower internal validity.
So you need to make sure that the sample is representative of the larger
population [external validity]
you need to define your population clearly ->sample frame = what the sample precisely represents
[and what it does not] – so you know what you can and cannot generalize
-frame: the list from
which you are sampling – i.e. phonebooks etc…
-population – the whole
group which you want to generalize to
-note: random is not disorganized.
You randomly sample in an organized way!!!
Probability sampling
Simple random sample
take several cases
Systematic sample:
take every 50th case on the list – not random but close, since
there is no reason to believe that this is skewed
Stratified sample:
randomly sample within strata, in proportions similar to the general population
Cluster:
randomly sample whole clusters, e.g. neighborhoods… you need enough
clusters to be representative. Weaker design since it is
less random!
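The four probability sampling methods above can be sketched in Python. The sampling frame (1,000 made-up people with a neighborhood and an age group) and all function names are hypothetical, chosen only to show how the methods differ.

```python
import random


def simple_random_sample(frame, n, rng):
    # Every unit on the list has the same chance of being drawn.
    return rng.sample(frame, n)


def systematic_sample(frame, step, rng):
    # Random starting point, then every `step`-th unit on the list.
    start = rng.randrange(step)
    return frame[start::step]


def stratified_sample(frame, stratum_of, fraction, rng):
    # Group units into strata, then draw the same fraction from each one,
    # so the sample mirrors the population's make-up.
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(frame, cluster_of, n_clusters, rng):
    # Randomly pick whole clusters and keep everyone inside them.
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(1)
frame = [
    {"id": i,
     "neighborhood": f"N{i % 10}",
     "age_group": "senior" if i % 4 == 0 else "adult"}
    for i in range(1000)
]

srs = simple_random_sample(frame, 50, rng)
systematic = systematic_sample(frame, 50, rng)
stratified = stratified_sample(frame, lambda u: u["age_group"], 0.05, rng)
cluster = cluster_sample(frame, lambda u: u["neighborhood"], 3, rng)
print(len(srs), len(systematic), len(stratified), len(cluster))
```

The cluster sample is much larger than the others but contains only 3 of the 10 neighborhoods, which is why more clusters are needed before it becomes representative.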
Non probabilistic sampling
Purposive –
good for thematic analysis
Quota –
you want x of this and x of that
Snowball –
helps when people aren’t going to come to the study to begin with.
Problem: volunteer bias, no representation – usually just get a certain
-less representative samples are OK,
as long as you are not generalizing
Sample size:
-accuracy of estimates increases with
sample size. If you do not have a lot of variability, then you can use
a smaller sample
-error: variability / square root of sample size: SE = SD / SQRT(N)
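A minimal Python sketch of the standard error formula SE = SD / SQRT(N). The function names and the scores are made up for illustration.

```python
import math
import statistics


def standard_error(sd, n):
    """SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)


def standard_error_of_mean(values):
    """Same formula, starting from raw scores (uses the sample SD)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Same variability, growing sample: the error shrinks, but only with sqrt(N).
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))

# Made-up scores, for illustration only.
scores = [12, 15, 9, 14, 10, 13, 11, 16]
print(standard_error_of_mean(scores))
```

Quadrupling the sample only halves the error, which is why each extra participant adds less than the one before.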
Things you can
statistically do:
=confidence interval of proportions
– i.e. see how reliable your sampling was – e.g. I am 95% sure that
the population value is 55 ± 2 (between 53 and 57). The bigger n gets,
the narrower the confidence interval, i.e. the smaller the range of the estimate
is! [But the more you increase n, the less added benefit – the difference between samples
of 10 and 30 is important. Between 200 and 210 it is less significant.]
Also, you can use the same sample size for any population larger than 20-30,000
regardless of its size. The proportion of the population sampled is irrelevant beyond that size
of population
=sample size for estimating proportions
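Both calculations can be sketched in Python with the usual normal-approximation formulas. These textbook formulas are an assumption here, since the notes do not give them, and the function names are made up. The finite population correction shows how little the required sample depends on population size once the population is large.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_proportion(margin, population=None, p=0.5, confidence=0.95):
    """Sample size needed to estimate a proportion within +/- margin.

    p=0.5 is the most conservative guess. If a population size is given,
    the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# The "55 plus or minus 2" example: about 2,400 respondents are needed.
low, high = proportion_ci(0.55, 2400)
print(round(low, 3), round(high, 3))

# Diminishing returns: interval width at different sample sizes.
for n in (10, 30, 200, 210):
    lo_n, hi_n = proportion_ci(0.55, n)
    print(n, round(hi_n - lo_n, 3))

# Required sample for +/- 5 points barely changes with population size.
for population in (30_000, 1_000_000, None):
    print(population, sample_size_for_proportion(0.05, population))
```

Going from 10 to 30 respondents shrinks the interval far more than going from 200 to 210. The required sample for a population of 30,000 is only slightly smaller than for an effectively unlimited one.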
2 sample size questions
Do you have enough sampling
power to tell the difference between the groups? Minimum 20 [related
to gibbs!!!] = power
Do you have enough people
in the study to be confident that the sample is representative of the
population =representativeness
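For the power question, a rough Python sketch using a normal approximation for comparing two group means. This is an assumption: it ignores the small-sample t correction, and `approx_power` is a made-up name. The effect size is a standardized mean difference in the style of ES1.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation only; treat the result as a ballpark figure.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic if the effect is real.
    expected_z = effect_size * math.sqrt(n_per_group / 2)
    return NormalDist().cdf(expected_z - z_crit)


# A medium effect (0.5 SD) with different group sizes.
for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, 20 per group gives well under a 50% chance of detecting a medium effect, so the minimum of 20 is a floor rather than a guarantee of adequate power.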
Response rate
Sample frame error – you
lose representativeness
Response / refusal rates
– you lose representativeness/power
Item completion rates –
you also lose representativeness/power
Methods of data collection
Mail – looks more intimidating
– looks longer than it takes!
Face-to-face – downside
– has to be clustered! – easier to do complex interviews – i.e.
answer something if you answered yes to something. Easier to do open-ended
Opening sentence describing the abstract
of the paper
-target population
-methods – who participated, sample
size, random assignment, design,
-measures used – and follow up times
-is it statistically significant? And
is there also clinical significance? Are they actually feeling better?
-conclusion: is it more effective?
“promising results”
-you have to say that X is more effective
than Y not just X is effective.
Service as usual is
not the same as no treatment
Long follow up
-comparison group – randomized control
-large sample size
-more than one measure
-random assignment = internal validity
– and random selection speaks about recruitment [random selection is
rare and you cannot assume that it happened]
-co-intervention – may have reduced
effect size
-lack of significance is a sign of
what? Measures not sensitive enough, not enough power [i.e. sample
seems big but another intervention is affecting the control group, so
they have also improved somewhat]
-because of the complexity of this
population, you may need an even larger sample
-only one site of the study = less
are you not doing harm?
[i.e. critics of harm reduction approaches]
how do you deal with those
in the control group – i.e. those not getting the treatment
i.e. if you fund 1 study
– you do not fund another