Quantitative Research Methods – SWRK 643

Class – January 6, 2009

Do social workers really need to understand research?


How do we know if what we are doing really helps?


Why quantify?

  1. Qualitative research is used to explore and understand
  2. Quantitative research is used to:
    1. Establish causality
    2. Generalize
  3. Measurement, however, is fundamentally reductionistic, so research must maintain an iterative loop:
    1. Explore
    2. Understand
    3. Hypothesize
    4. Test
    5. Generalize

course outline

  1. search and read – i.e. know how to read descriptive statistics
  2. causality
  3. measurement
  4. numeracy and basic statistics
  5. experimental designs
  6. observational designs
  7. ethics
  8. systematic review and meta-analysis

searching and reading

Boolean search – i.e. using AND/OR operators in your search

  1. who – i.e. children
  2. what – sexual abuse
  3. why – prevention

reading tips

  1. read the title and abstract: is it a study/review/theoretical/policy piece? The abstract is meant to identify the design, sample, major concepts measured, and major findings
  2. scan the headings -> skim the intro and lit. review [first and last sentences of each paragraph] – and the last sentence of the literature review [it usually describes the objectives] – compare against the abstract and title.
  3. Sample/design is usually in the first table
  4. Try to visualize the sample from a clinical perspective – how do participants compare to people you have worked with? – psychology researchers know the population in a very limited spectrum
  5. Imagine yourself recruiting the study participants – who is missing? Who dropped out? Any significant concerns about sampling bias? -> social services tend to refer those who they think will be most successful. This is a bias!!!
  6. Go to the tables – they show you all the variables measured
  7. Go to the text describing the measures – try to visualize them. Observation? Worker report? Self-report? How many questions or items? Do these make sense?
  8. Compare results to your knowledge
  9. Discussion section – the first paragraph summarizes the findings
  10. Identify strengths and 1 or 2 limitations [this skill improves with practice] -> i.e. judge what is more relevant

Study summary table

-a table to summarize an article – generally not more than a page

Columns include:

  1. Author
  2. Sample and design
  3. Measures
  4. Major findings
  5. Comments

Gibbs, L. (1991). Does a method cause change? (Chapter 4, pp. 61-81). In L. Gibbs, Scientific reasoning for social workers: Bridging the gap between research and practice. Indiana: Prentice Hall.
Often, people use cues-to-causality: shortcuts or rules of thumb to infer causality. These fail to rule out alternative causes and the confounding effects of other variables.

Kinds of cues-to-causality

  1. Causal chain strength: i.e. questioning how reasonable the inferred link is. For example, looking at someone who seems disoriented: we don’t know exactly what this person is feeling or why. We can reasonably assume that it is because of her Alzheimer’s – but it could be anything else too.
  2. Contiguity: i.e. when two things come together, they are assumed to be associated. Nevertheless, there could be other factors causing the change. i.e. when a depression lifted during therapy, we don’t know if it was the therapy or something else, such as the cause of the depression ceasing. The two problems with relying on contiguity logic are:
    1. We do not know whether it was the intervention or another variable that caused the change
    2. Effects of an intervention are not always immediately seen
  3. Temporal order: the assumption that if one thing came before another, then the first caused the second. This is problematic since temporal order is necessary but not sufficient grounds to infer causality. Yes, therapy must come before the depression is alleviated, so it could have helped – but we still need to see that it was the therapy and not something else. In Latin, this fallacy is called post hoc ergo propter hoc. After all, there are accounts of spontaneous recovery.
  4. Covariation: when things occur together, they are assumed to be causally linked. This is not necessarily true, because there could be a confounding variable. An interesting study found that people misestimate non-events – what happens in the cases with no treatment or no success.
  5. Ruling out alternative explanations: as mentioned, failing to consider alternative or confounding causes of the results of the study.

Confounding/alternative causes include:

  • Method:
    • Client-by-treatment interaction: the results were caused by the specific interaction between the specific client and the specific intervention
    • Participation in other treatments concurrently
  • Client
    • Placebo effect: people improve simply because they expect the treatment to work. A related effect is the Hawthorne effect: people change behavior, or even recover, simply because they know they are being observed/studied
    • Maturation: a natural process of change over time within the participants. i.e. (for the sake of example): if people get less depressed with time, then a study of the effectiveness of a therapeutic intervention will be confounded by this natural healing process
    • Client selection
      • Self-selection: the research group differed in some way from the general population
      • Purposeful selection by researcher: you just skewed your results
      • Regression towards the mean: people who got extreme readings on a measure were probably at a peak, and tend to revert to more normal readings later. So it was not the studied intervention, but a high first reading followed by a more normal, lower second reading – which happens because the person’s readings naturally fluctuate, or the measurements are not accurate or reliable, or whatever…
    • Mortality: people quit studies/die/etc… this could skew the results
  • Control/comparison group
    • Mortality
    • Resentful demoralization of control or comparison group: the participants might get grumpy because they are getting different treatment –or a “lesser” treatment
    • Compensatory rivalry: people in the control group might try harder at the task to compensate for their lack of treatment within the experiment
    • Spontaneous recovery: sometimes seen as a special case of maturation
  • The social worker/researcher
    • Reliability of treatment implementation: how the studied intervention is implemented influences the outcome, beyond the intervention itself. For example, a social work student will implement the intervention differently than an experienced social worker
  • Outcome measure
    • Inadequate preoperational explication of constructs: gotta define your criteria and methods accurately
    • Reliability: gotta make sure that your measurement tool is accurate and reliable – and not constantly fluctuating
    • Testing: the mere taking of a test could make a difference in the results. So it is not the intervention but the testing which made the changes in the participants
    • Instrumentation: not the intervention but the mere measurement instruments used in the study which made the changes in the participants
  • Setting
    • History: if anything external to the experiment took place between the experiment sessions
Chapel, T. (2004). Constructing and using logic models in program evaluations (Chapter 70, pp. 636-647). In A. Roberts & K. Yeager (Eds.), Evidence-based practice manual: Research and outcome measures in health and human sciences. New York: Oxford University Press.
Logic models: a way to chart graphically the logic of a study or program. You need to show:
  • Relationships: between the components and the effects/outcomes of your study
  • Intent: that the results are the expected products of your study manipulation or your intervention

A logic model should show:

  • Inputs: the resources which you need for your program
  • Activities: what takes place in the program being evaluated
  • Outputs: the product of your activities
  • Effects: also called outcomes or impacts – the result from your activities and outputs

Developing a simple logic model:

  1. Develop a list of activities and intended effects. The following approaches could help
    1. Review of information on the program
    2. Work backwards from effects – i.e. asking “how to”
    3. Work forwards from activities – i.e. asking “so what?”/”then what happens?”
  2. Subdivide the lists to display any time sequence: i.e. a column for activities and another one for effects/outcomes
  3. [optional]: add any inputs and outputs
  4. Draw arrows to depict causal relationships
  5. Clean the logical model: i.e. think through and revise when you have to figure out the logic more clearly

Example

| Inputs | Early activities | Later activities | Outputs | Early outcomes | Later outcomes |
|---|---|---|---|---|---|
| Funds | Outreach | A | Pool / # of participants | Intervention | Better quality of life |
| Training staff for screening | Screening | B | | | |
| Relationship with orgs. | Identification of participants | C | | | |
| Legal authority | | | | | |


-you can also make the above into a flowchart [at no extra cost]

->those logic model flowcharts help map out the implied causality, where logic/evidence gaps exist, and which of those gaps are most urgent in the flow of the program logic.
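Since a logic model is basically a directed graph from inputs to effects, it can also be sketched as data. A minimal Python illustration, using hypothetical program elements loosely modeled on the example table above (all names are illustrative, not from the chapter):

```python
# A logic model as a directed graph: each element points to what it leads to.
# All program elements below are hypothetical examples.
logic_model = {
    "funds":                ["outreach"],
    "trained staff":        ["screening"],
    "outreach":             ["pool of participants"],
    "screening":            ["pool of participants"],
    "pool of participants": ["intervention"],
    "intervention":         ["better quality of life"],  # intended effect
}

# Print the implied causal arrows, i.e. the flowchart in text form.
for cause, effects in logic_model.items():
    for effect in effects:
        print(f"{cause} -> {effect}")
```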

->gotta make sure that all parties involved in your project agree with the logic of the model that you are using for your study design

-determining the correct evaluation focus depends on the specific case, but some general questions include: who will use the evaluation of the project, and for which purpose? Which effects will need to be measured then in order to satisfy the needs of the readers?

Implementation fidelity issues include:

  1. Transfers of accountability – when the program needs an external organization to take action in order for the study to continue
  2. Dosage: is the research/activity too lax or too demanding on a critical variable?
  3. Access – increased demand for a certain product due to the research
  4. Self-competency



Next 3 weeks – numeracy – numerical concepts, averages, percentages, standard deviations

3 readings for next week:


Today, we’re continuing to speak about causality

Independent [classification/treatment/cause] ->dependent [response/outcome/effect]

->causality is not always clear – i.e. in cross-sectional studies

-i.e. whatever you study might have an impact on, or be influenced by, various factors

So, you might have:

[effects influencing the causes, i.e. personal dispositions] ->Cause[s] ->moderating variables ->effect[s] +[and effects of effects – i.e. changes in arrest policy]

-measuring how strong the cause-effect connection is gets complicated once you look at various causes and moderating variables. You probably need a much bigger sample

Control variables and confounders influence the relationship between independent and dependent variables

Conditions for causality

  1. Association
  2. Temporal order
  3. Specificity/non-spuriousness and dose response, contiguity, consistency, etc.
  4. Theoretical plausibility

Association

Pretest->post test: allows us to see how things were before the intervention

->of course, you also need a control group

-statistical significance – a measure of the chance that the results occurred by chance alone

-proportion chart – asks how likely you are to get the charted results by chance

3 ways to see association

->need to measure if they are statistically significant

Temporal order

-asking which came first – the temporal order

How do you measure temporal order?



Specificity / non-spuriousness

Threats to validity [i.e. if there are alternative explanations]
  1. Confounding
  2. Sampling
  3. Measurement

Confounding

  1. Maturation – and: spontaneous recovery – you need a similar comparison group.
  2. Reactive effects: such as the placebo effect or the Hawthorne effect [people who do well just because they are in a certain program]
  3. History – spurious association [i.e. people do not break hips because of their winter clothes, but because of the winter itself]; co-interventions


Sampling

  • Selection – volunteer bias; healthy worker bias [i.e. more people call in sick on Mondays]; purposeful therapist selection [i.e. therapists refer their more successful clients to the study]
  • Attrition
  • Ecological fallacy – mixing up levels of analysis. i.e. studying crime by neighborhood: if a neighborhood with many immigrants has high crime, you cannot assume that immigrants are criminals – you do not know whether they simply moved into high-crime neighborhoods, etc…

Measurement

  • Experimenter expectation / observer bias
  • Reliability – instrumentation [is measure accurate and consistent?], calibration testing
  • Validity -does the instrument measure the right thing?
  • Regression towards the mean

Theoretical plausibility – does the causality make sense

Social work model

-people ->problem

->so you need to do a needs assessment and see:


Theory – a way of explaining

-hypothesis – you are testing something, versus assumptions, which are generally not tested

-cause and effect is empirically studied, without a third intervening factor

Causal theory

Person/situation ->intervening variable [which creates the problem] ->problem

->program will bypass the intervening variable to avoid the problem

Intervention hypothesis:

Action hypothesis: the action will solve the problem

->you need the right program to solve the problem

Why a program might fail:



Implied program theory:

Serious juvenile offending -> family relations -> youth social competence -> re-arrest and incarceration

Family therapy will try to work on family relations and social competence

Homework

3 readings: Norman and Streiner – first 2 chapters – read; the Olds study of nurses

Make a logic model – look at one of the interventions in your program – look for short-term (ST) and long-term (LT) outcomes

-explain what you typically do – meet with patients/staff – and what ST/LT outcomes you expect in your practice.

Norman, G., & Streiner, D. (2003). PDQ Statistics (3rd edition). Hamilton, ON: B.C. Decker. Chapter 1
Variables:

-a variable is something that could be measured, or manipulated in a study

In a study design:

  • Independent variable: the thing that the researcher changes. i.e. one group gets the medication while the other does not, so the independent variable is the treatment given
  • Dependent variable: the variable that is not manipulated, but is measured to see the effect of the independent variable. For example, in medication research, the dependent variable is the health improvement [based on the medication, which is the independent variable]. A way to remember this: the dependent variable depends on what happens with the independent variable, but the independent variable is independent of the dependent variable.

Kinds of variables:

  • Nominal: a kind of variable which merely categorizes: i.e. gender, diagnosis, etc…. No meaning to order or # value
  • Ordinal: a kind of variable indicating order: i.e. ranks in the army, stage of disease
  • Interval: a kind of variable where the interval – the difference between the numbers – is meaningful. i.e. the difference between 5 and 6 and between 8 and 9 is the same 1 unit, though there is no meaning to an absolute 0. i.e. there is no absolute 0 in Celsius: 0 is not the absence of temperature. Because there is no absolute 0 in this kind of variable, it has no ratio capabilities: for example, 20 degrees is not half of 40 degrees.
  • Ratio: meaningful intervals plus an absolute 0! Examples include time and weight. 100 pounds is double 50 pounds.

-so, you cannot take an average of ordinal or nominal variables. There is no average gender, stage of disease, rank, diagnosis, etc… – but you can take averages of interval and ratio variables.

-qualitative research uses more nominal and ordinal variables while quantitative research uses more interval and ratio variables

  • Nonparametric statistics – deal with categories; when the results are graphed, they are graphed as stick (bar) graphs, where each vertical stick is a category and its height is the number of cases that fall in that category. Such studies use nominal or ordinal scales.
  • Parametric statistics – deal with an axis and where results lie on it (not discrete categories). They use curves and lines to graph results, not categorical sticks. Interval and ratio variables are used in such statistics.
Norman, G., & Streiner, D. (2003). PDQ Statistics (3rd edition). Hamilton, ON: B.C. Decker. Chapter 2
Describing data

2 kinds of statistics:

Descriptive statistics: describing the results without trying to generalize them

Inferential statistics: statistical procedures meant to establish that the results generalize to a broader population and did not occur by chance

Frequencies and distributions

Kinds of variables:

Variables

  • Nominal: a kind of variable which merely categorizes: i.e. gender. No meaning to order or # value
  • Ordinal: a kind of variable indicating order: i.e. ranks in the army
  • Interval: a kind of variable where there is significance to the interval – the diff. b/w the numbers. I.e. the diff b/w 5 and 6, and 8 and 9 is the same 1 of a diff., though there is no meaning to an absolute 0. i.e. there is no absolute 0 in Celsius: 0 is not the absence of temperature.
  • Ratio: intervals with significance of the interval and there is an absolute 0! Examples include time and weight.

Another distinction:

  • Continuous variables: those on an axis – like height or weight. You can weigh 2 pounds, or also 2.345434985345345 pounds, etc…
  • Discrete variables: there is no range between the categories. i.e. you can’t have 1.5 kids – only 1, 2, 3, etc…


More terms:

Distribution – the data as displayed in a chart – i.e. how it is “distributed” in a graph

Frequency distribution: a chart/graph of the frequency with which each value of a variable occurs

Probability distribution: charting the probability of each of the variable’s values – you put in the % of each value instead of the frequency

Kinds of centers of distribution:

Mode: the value with the highest frequency – good for nominal variables. You can have 2 modes in a distribution – i.e. 2 answers were equally common (called bimodal)

Median: the value in the middle of the distribution – good for ordinal variables

Mean / average: the sum of the values divided by n – good for interval/ratio variables

->the mean is influenced by how skewed the distribution is. In a positively skewed distribution, a few extra-high values will unfairly raise the mean; a negatively skewed distribution will unfairly lower it. The mean is the most efficient measure for symmetrically distributed data
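A minimal illustration with Python’s standard statistics module (the scores are made up), showing how one extreme value pulls the mean but not the median:

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]  # made-up data with one extreme value

print(statistics.mode(scores))    # 5    -> the most frequent value
print(statistics.median(scores))  # 5    -> the middle value, unaffected by the 40
print(statistics.mean(scores))    # ~8.1 -> pulled upward by the positive skew
```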

Measures of variation: range, percentiles, standard deviation

-this part speaks of how much people differ. This is called the range. In nominal variables, the “range” is the number of categories with at least one respondent

In ordinal, interval, and ratio variables, there are several definitions of how to measure a range:

  • The range between the 5th percentile and the 95th percentile
  • Take the quartiles [25%, 50%, and 75%] and use the inter-quartile range – the difference between the 25th percentile and the 75th percentile
  • The standard deviation, used for interval/ratio variables:
  • SD = √[ Σ(individual value − mean)² / number of values ]
  • The deviations are squared so that negative values – those lower than the average – don’t cancel out the positive ones, and the squaring is undone by taking the square root of the whole formula. By the way, the value before taking the square root is called the variance. The smaller an individual’s deviation, the closer that participant is to the mean.
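The same formula step by step in Python, with made-up values:

```python
import math

values = [4, 6, 8, 10, 12]        # made-up scores
mean = sum(values) / len(values)  # 8.0

# Square each deviation so negatives (below the mean) don't cancel positives.
squared_devs = [(v - mean) ** 2 for v in values]

variance = sum(squared_devs) / len(values)  # the value before the square root
sd = math.sqrt(variance)                    # the square root undoes the squaring

print(variance, sd)  # 8.0 2.828...
```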

->normal distribution: a symmetrical, bell-curve distribution. In a normal distribution, 68% of cases fall within 1 SD of the mean (on either side), 95.5% within 2 SDs, and 2.3% fall in each tail beyond 2 SDs

Standard scores:

When comparing 2 results from different tests with different scales, you convert each score into a standard score (z), which makes the two comparable:

Standard score (z) = (raw score – mean) / SD

-this also lets us see what % of people scored higher or lower than a given score, since z-scores are in SD units
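A worked example using Python’s statistics.NormalDist (the score and norms are made up):

```python
from statistics import NormalDist

raw, mean, sd = 130, 100, 15  # made-up raw score and test norms
z = (raw - mean) / sd         # standard score in SD units: 2.0

# Because z-scores are in SD units, the normal curve gives the
# percentage of people who scored below this raw score.
pct_below = NormalDist().cdf(z) * 100
print(z, round(pct_below, 1))  # 2.0 97.7
```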



Class – January 20th, 2009

Numeracy

-nominal variable also sometimes called categorical


Example:

Unemployed:


Problem – raw counts are hard to compare: 70 unemployed out of 400 is a bigger count than, say, 30 out of 100, but it is proportionally smaller (17.5% vs. 30%) – so it is better to give proportions

Histogram – displays the data as a graph

Terms

Constant – a number that does not change

Variable – a unit that changes according to what is plugged in (i.e. x)

Dependent variable – a variable that changes depending on another number in the equation

Independent variable – a variable that does not necessarily change depending on other variables

Example: Y=f(X)

Dependent: Y

Independent: X

Qualitative- something relating to a quality of something – not #’s

Quantitative- something relating to #

Continuous – a variable that has subunits: i.e. 160cm has subunits (i.e. 160.5)

Discrete – a variable w/o subunits: i.e. # of people in class – there can be 13 people in the class but there can’t be 13.5 people in the class

Infinite-

Finite -

Population – the full group being studied

Sample – the group drawn (ideally at random) from the population to collect data on

Graphs – tables/figures that portray #’s in an arranged order

Histograms

-frequencies are shown in the form of bars – more bars = richer data, but at some point a histogram with many bars becomes more difficult to understand than one with fewer, broader categories

Bar graph – counting and showing discrete categories

Polygon [frequency polygon] – a line graph



Central tendency

-summing the squared deviations about the mean and dividing by N (has to be capital N – the whole population!) gives the variance; dividing by N is what lets you compare various-sized populations, and taking the square root gives the standard deviation

-the deviations are squared so that values below the mean don’t cancel those above it; the square root then undoes the squaring

-dividing by n − 1 instead takes into account that this is a sample – an adjustment you make when you do not have the whole population

Under the normal curve: mean to 1 SD = 34.1%, 1 to 2 SDs = another 13.6%, 2 to 3 SDs = another 2.1%, beyond = 0.1% (per side)
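The N versus n − 1 distinction in code: Python’s statistics module has both versions (the data is made up):

```python
import statistics

data = [3, 5, 7, 9]  # made-up values

# pstdev divides by N: use it when the data IS the whole population.
print(statistics.pstdev(data))  # 2.236...

# stdev divides by n - 1: the adjustment for working with a sample.
print(statistics.stdev(data))   # 2.582...
```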

--

You cannot compute an average on nominal or ordinal scales – so you do not have the average religion in Canada.

January 27, 2009

Basic bivariate inferential statistics

-comparing two things = bivariate analysis = needed for association between 2 variables, i.e. cause and effect, and differences between 2 factors – i.e. does A influence B? Is A more than B?

-we’re going to speak about tests of significance

Significance: how likely is it that these results would happen in reality [rather than by chance]

-if you have access to the whole population [from which the sample is taken], you can compare the sample to the population; when you do not, and you want to test significance, you need other techniques.

Depending on the costs and ramifications, some studies use a stricter or looser p value [significance level]

Statistical tests used for significance:


Homework:
  • Reading:
  • Stats Canada homepage: left-hand side: community profiles: enter your postal code – compare it to the City of Montreal
  • Make table or chart with the comparison and add your commentary on the data


February 3, 2009

readings/homework for the 10th: do readings + 2 articles on WebCT

Exam – March 3rd



Biases that alter statistics

-Issues in the article read:

Biases

-larger sample – increases significance

->clinical significance = “does it really matter?” – this is a stronger measure than statistical significance

Significant:



Gotta look at


What does P value mean?

Mistakes


Hypothesis testing



Type I/Type II errors

|  | Research hypothesis is true | Research hypothesis is false |
|---|---|---|
| Research hypothesis is supported | Correct decision | Type I error (p = alpha) |
| Research hypothesis is not supported | Type II error (p = beta) | Correct decision |

Type I error: p = alpha -> i.e. a 5% chance of it happening. Risk: using an ineffective thing.

Type II error: p = beta. Risk: throwing out a good thing. Usually happens with smaller samples, when the effects are small but important. In the past this was not reported; reporting is now increasing.


Effect size

d = (x̄1 − x̄2) / s    [s = standard deviation]

ES1 = (mean of treatment group − mean of control group) / standard deviation of control group

-0.2 is a small effect size, 0.5 is considered medium, and 0.8 and up is considered a large effect

ES2 = proportion improved: the proportion who improved in the treatment group versus the proportion who improved in the control group

-odds ratio – another way to express effect size: the ratio of the odds of an event occurring in one group to the odds of it occurring in the other group
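A sketch of the ES1 computation (Cohen’s d) with made-up group scores:

```python
import statistics

treatment = [12, 14, 15, 16, 18]  # made-up outcome scores
control   = [10, 11, 12, 13, 14]

# d = (mean of treatment - mean of control) / SD of control
d = (statistics.mean(treatment) - statistics.mean(control)) / statistics.stdev(control)
print(round(d, 2))  # 1.9 -> large by the 0.2 / 0.5 / 0.8 rule of thumb
```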

Standard error

-SE is the standard deviation divided by the square root of the sample size -> the estimate of the mean is more precise when the sample is larger

-as the sample size gets larger, the SE gets smaller, but the mean and SD stay the same
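In code, with an illustrative SD of 10: quadrupling the sample halves the SE.

```python
import math

sd = 10.0                   # assume the SD stays the same
for n in (25, 100, 400):
    se = sd / math.sqrt(n)  # standard error of the mean
    print(n, se)            # 25 -> 2.0, 100 -> 1.0, 400 -> 0.5
```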

|  | Dichotomous | Nominal 2+ | Ordinal | Interval/ratio |
|---|---|---|---|---|
| Dichotomous | Chi-squared (χ²) | Chi-squared (χ²) | Mann-Whitney U | t-test (t) |
| Nominal 2+ levels |  |  | Kruskal-Wallis | ANOVA (F) |
| Ordinal | Mann-Whitney U |  | Spearman rho (rs) | Spearman rho (rs) |
| Interval/ratio | Logistic regression |  |  | Pearson r |
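A hedged sketch of two cells of this table using scipy.stats (assumed installed; all data made up): a dichotomous grouping vs. an interval outcome calls for a t-test, and two dichotomous variables call for chi-squared.

```python
from scipy import stats

# Dichotomous group vs. interval/ratio outcome -> t-test.
treated = [12, 14, 15, 16, 18]  # made-up scores
control = [10, 11, 12, 13, 14]
t, p = stats.ttest_ind(treated, control)
print("t-test:", round(t, 2), round(p, 4))

# Dichotomous vs. dichotomous -> chi-squared on a 2x2 table of counts.
counts = [[30, 10],   # improved / not improved in the treatment group
          [20, 20]]   # improved / not improved in the control group (made up)
chi2, p, dof, expected = stats.chi2_contingency(counts)
print("chi-squared:", round(chi2, 2), round(p, 4))
```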

February 10, 2009

Homework – choose test and discuss it – 1 page

Concept

The grand scheme = what is the idea? How does it relate to other concepts?

Operationalizing:

Defining what measures the concept – what we’re actually looking at


Measures

-how you measure the operational definitions



--

Operationalization


Kurz vs. Strauss

The two articles discuss intimate partner violence (IPV). They take very different approaches, based on different conceptualizations of IPV. We will compare the two to show how different conceptualizations lead to very different results. IPV – or anything, for that matter – can look very different based on your original mindset.

IPV

| Author | Kurz | Strauss |
|---|---|---|
| Concept: causal factors that explain IPV | Feminist perspective – patriarchy: gender inequality (financial, division of labour, poverty) -> childcare/marriage/work $ -> persistent IPV | Inequality + poverty -> family dynamics -> IPV |
| Operationalization | Sexual, emotional, economic abuse | Physical abuse |
| Operationalization characteristics | Context: inequality | Chronicity, frequency, injury |
| Measures | CTS | Police records |
| Pros | Easy to administer; quick; cheap; easy language to understand; captures frequency; can compare partners; a lot of different acts are included | Recorded live; lots of detail; gets a better feel for severity than the CTS; availability – easy to compare different samples |
| Cons | Does not take context [motivation] into account; self-reporting bias [social desirability] – shame and guilt about reporting physical harm; weighting – gives the same weight to acts of varying severity; self-editing | Legal changes (i.e. changes in laws); perceptual changes; purpose of data collection differs (data collected for legal reasons differs from data collected for research); differences in definitions; sampling bias – the same person can be reported many times and so appear as several cases |
| Comments | The CTS gives higher scores to more frequent assaults rather than to more severe ones | Police reports capture a lot of violence |


February 17, 2009

Study review –part 1:

Identify 1 article in your group’s topic and put it into a table

+ introduction about why you are looking into this topic

+ Search strategy – and note that this article is one out of how many found in the search

Statistical significance


|  | Real difference in population | No difference in population |
|---|---|---|
| Research hypothesis is supported – null is rejected | Correct decision: power = 1 − β | Type I error: p = α |
| Inconclusive – fail to reject null hypothesis | Type II error: p = β | Correct decision |

Type I error (α): thinking that there is a difference when there really is not. 5% threshold.

Type II error (β): thinking that there is no difference when there really is. 20% threshold.

-sample size affects representativeness: you need a sample of about 1,000 to be representative of the Canadian population – and you need enough people to detect the significant effects, to avoid a Type II error
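A quick simulation of that point (illustrative parameters; scipy assumed available): a true but small effect is usually missed with a small sample (a Type II error) and usually caught with a larger one.

```python
import random
from scipy import stats

random.seed(1)

def miss_rate(n, trials=500):
    """Fraction of simulated studies that fail to detect a true small effect."""
    misses = 0
    for _ in range(trials):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(0.3, 1.0) for _ in range(n)]  # true effect d = 0.3
        _, p = stats.ttest_ind(treated, control)
        misses += p >= 0.05
    return misses / trials

print(miss_rate(20))   # high miss rate: frequent Type II errors
print(miss_rate(200))  # much lower miss rate: adequate power
```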

Effect size

ES (d) = (mean of treatment group − mean of control group) / standard deviation of control group

->a large effect size does not mean the result is statistically significant

->you can also compute an effect size for proportions: the proportion who improved in the treatment group vs. the proportion who improved in the control group

->same thing with odds ratio

Clinical vs. statistical significance.

Clinical significance:

Clinical significance: “is this a large enough difference to be worthwhile”?

Statistical significance: likelihood that the results could be by chance alone.

Tests of significance:


Bivariate to multivariate

Bivariate – comparing two variables – the dependent and the independent

Multivariate – comparing more variables

You need to:

-association

-cause and effect – temporal order -> so you can do a pre-test to see the baseline

-rule out confounders

-theoretical plausibility

Confounders:

i.e. race and child abuse – but it could also be income, which correlates both with race and with abuse.

For example:

Race Reported child abuse
White 2%
African 5%
Poor whites 8%
Poor African Americans 8%

->so once you control for income, you see that it is the low income and not race. It just happened that the African Americans in the sample were poorer

->independent variables influencing dependent variables – you gotta account for control and confounding variables
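A sketch of controlling for a confounder by stratification, with made-up records shaped like the race/income example above: the crude rates differ by race only because income differs by race.

```python
# Made-up records of (race, income, abuse_reported).
records = (
    [("white", "poor", True)] + [("white", "poor", False)]
    + [("white", "not poor", False)] * 4
    + [("black", "poor", True)] * 2 + [("black", "poor", False)] * 2
    + [("black", "not poor", False)] * 2
)

def rate(rows):
    return sum(abused for _, _, abused in rows) / len(rows)

# Crude comparison: looks like a race effect (17% vs. 33%).
for race in ("white", "black"):
    print(race, f"{rate([r for r in records if r[0] == race]):.0%}")

# Stratified by income: within each income level the race gap disappears.
for income in ("poor", "not poor"):
    for race in ("white", "black"):
        rows = [r for r in records if r[0] == race and r[1] == income]
        print(income, race, f"{rate(rows):.0%}")
```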


Multivariable

If you add another variable and see that the original correlation goes down, then you assume that the original association was an artifact

->so – you try to account for more variables – and really control for the variables by seeing their relative contributions

->so, if despite controlling for the other variables the intervention still has an effect, the effect is credible – you basically want to see how the [confounders] influence the outcome, to check that your independent variable is the one that caused the effect

--

Conditions for establishing causality:

  1. Association
  2. Temporal
  3. Ruling out confounding variables
  4. Theoretical plausibility

Reading 2.1: Locating Measurement Tools and Instruments for individuals and Couples – Kevin Corcoran

-in the sciences, progress has followed improvements in measurement. This has also been true for the behavioral sciences

Recent trend in assessment:

  1. a move from broad (broadband) questionnaires (i.e. the MMPI) to more specific (narrowband) questionnaires – such as a depression test
  2. in the past, you needed a specialist to decipher the questionnaire. Today, the clinician can do it.


-the questionnaires give immediate feedback – and they quantify observations

-such instruments help the clinician get a direction.

-called Rapid Assessment Instrument – RAI

  • it uses the client’s self-observation
  • it can be made into a rating scale
  • can show the client’s change, and help monitor it
  • a study run by a MSW student shows that the study of questionnaires is ridiculous

to be useful/probative:

  1. reliability – how consistent is the test – scored from 0.0 to 1.0; over 0.8 is good and indicates that the test is consistent over time

    -3 forms of consistency:

    1. between items within the instrument – called internal consistency
    2. between different forms of the same instrument: alternative or parallel forms reliability
    3. over the course of time – test-retest reliability
  2. validity – accuracy of the test
    1. content validity: does it actually measure the domain of the variable?
    2. concurrent validity: compare it to a known, established criterion
    3. predictive validity: see how well the test predicts the future – i.e. how well does the SAT predict college success?
    4. construct validity: are the test’s results similar to things that are supposed to be similar, yet different from things that are supposed to be different?
  3. utility: does the test help improve the clinical work? Does it help plan, monitor, evaluate? – too long a test will reduce utility.

Self-referenced comparison: compare the scores of the same person over time – to see the changes.

Norm-referenced comparison: comparing one’s scores to general population – could be used to push for therapy – to show that there is a real problem

  4. Suitability: is it suitable for the client’s emotionality and comprehension? i.e. this measure is influenced by the client’s ability to read. If the client is psychotic, he won’t get it either.
  5. Acceptability: how the content of the questionnaire is accepted by the client – i.e. some couples will have a hard time filling out a questionnaire about sex, as this is not a topic they accept talking about.
  6. Sensitivity: the test should be able to measure changes, but be stable (reliable) when no change has occurred.
  7. Non-reactivity: does the actual act of measuring cause the change that the questionnaire shows? The book’s example: did increased sexual arousal occur because of a better relationship or because of the arousing questions in the questionnaire? When a test is reactive, you do not know why there is a change in the scores.


Locating tools

  • people used to know tests – or have to look them up for hours on end
  • other places include:
    • books
    • journals
    • internet. Problems with internet include
      • human error
      • websites change often
      • information overload
    • databases


Exemplary instruments

Broadband: general questionnaires that allow you to isolate the problem:

  • SQ: symptom questionnaire:
    • measures 4 dimensions:
      • Depression
      • Anxiety
      • Somatization
      • Anger-hostility
    • And 4 parallel wellbeing scales
      • Relaxation
      • Contentment
      • Somatic health
      • Friendliness

    • known-groups validity: it differentiates between healthy and non-healthy populations. It also has high test-retest reliability and internal consistency

Narrowband: specific problem:

-the article lists questionnaires for specific problems in the following issues:

  • depression
  • anxiety
  • alcohol
Rubin, A., Babbie, E.R. (2007). Research Methods for Social Work. Chapter 7. 6th Edition. Pacific Grove, CA: Brooks Cole
Introduction
  • End stage of problem formulation: the process of moving from vague ideas about what you want to study to being able to recognize and measure what you want to study. This is the conceptualization and operationalization process.

Conceptual Explication

  • a variable is a concept we are investigating
  • a concept is a mental image that symbolizes an idea
  • concepts that make up a broader concept are called attributes
  • variables vary… e.g. “male” is a concept that cannot vary – it can take on only one value – and therefore it is an attribute
  • a concept is a variable if it 1) comprises more than one attribute or value and thus is capable of varying; and 2) is chosen for investigation in a research study

Developing a Proper Hypothesis

  • a variable that is postulated to explain another variable is called the independent variable
  • the variable being explained is the dependent variable
  • the statement that postulates the relationship between the variables is the hypothesis
  • hypothesis should be value-free and testable

Extraneous Variables

  • Extraneous variables represent alternative explanations for relationships that are observed between independent and dependent variables
  • Control variables, on the other hand, are used to check whether an observed relationship between variables may be misleading
  • Control variables can also be called moderating variables; these can affect the strength or direction of the relationship between the variables
  • Spurious relationship is one that no longer exists when a third variable is controlled

Mediating Variables

  • Mediating variable is the mechanism by which an independent variable affects a dependent variable. If we think an intervention reduces recidivism among criminal offenders by first increasing prisoner empathy for crime victims, then our “level of empathy” would be the mediating variable
  • Can also be called intervening variables
  • Moderating variables reside outside the causal chain
  • Mediating variables reside in the middle of the causal chain and have no impact on the independent variables



Types of Relationships between Variables

  • positive relationship: the dependent variable increases as the independent variable increases
  • negative or inverse relationship: the two variables move in opposite directions
  • curvilinear relationship: the nature of the relationship changes at certain levels of the variables (p. 156)

Operational Definitions

  • operational definition: operations or indicators we will use to determine the quantity or attribute we observe about a particular variable
  • nominal definitions use a set of words to help us understand what a term means but do not tell us what indicators to use in observing the term

Operationally Defining Anything that Exists

  • most of the variables in social work research don’t actually exist in the way that a rock exists
  • seldom have a single unambiguous meaning
  • the technical term for mental images (racism, homophobia, etc.) is conception
  • our idiosyncratic conceptions of the mental images are what we observe
  • direct observables: those things we can observe rather simply and directly like color or check marks made in a questionnaire
  • indirect observables: if someone puts a check beside female in the questionnaire, we indirectly observe the gender
  • all we can measure are the direct and indirect observables that we think imply the conception (i.e.: depression observed through certain behaviours that aren’t really depression in themselves)

Conceptualization

  • conceptualization is the process through which we specify precisely what we will mean when we use particular terms

Indicators and Dimensions

  • the end product of the conceptualization process is the specification of a set of indicators of what we have in mind – markers that indicate the presence or absence of the concept we are studying
  • dimension is a specifiable aspect or facet of a concept (ie.:economic dimension and civil rights dimension of social justice)

Conceptions and Reality

  • the process of regarding unreal things as real is called reification, and the reification of concepts in day to day life is common

Creating Conceptual Order

  • clarification of concepts is a continuing process in social research
  • refining meanings goes well into the attempt to communicate findings to others in a final report
  • hermeneutic circle is a cyclical process of ever-deeper understanding

The Influence of Operational Definitions

  • how we choose to operationally define a variable can greatly influence our research findings

Gender and Cultural Bias in Operational Definitions

  • special care is needed to avoid gender and cultural bias in choosing operational definitions

Operationalization Choices

  • although social workers have a wide variety of options available to them when it comes to measuring a concept, operationalization does not proceed through a systematic checklist

Range of Variation

  • the range of variation need not always be fully measured – decide in light of the research purpose
  • your decision on the range of variation should also be governed by the expected distribution of attributes among your subjects of study
  • range depends on whom you are studying

Variations between the Extremes

* whenever you are not sure how much detail to get in a measurement, get too much rather than too little

* it will always be possible to combine precise attributes into more general categories, but it will never be possible to separate out the variations that were lumped together during observation and measurement

A Note on Dimensions

* It is essential to be clear about which dimensions are important in your inquiry and direct the interviews accordingly. Otherwise you may end up measuring one dimension when you really wanted to know about another one

Examples of Operationalization in Social Work

Three broad categories: self-reports, direct observation and examination of available records

Existing Scales

  • existing scales are a popular way to operationally define variables
  • most thorough procedure is through a literature review
  • reference volumes also list and describe many existing measures
  • existing self-report scales may be practical, but they may not be the best way to operationalize a particular variable in a particular study
  • issues when choosing an existing scale: how lengthy is the scale? Will it be too difficult for the participants to complete? Will it be sensitive to small changes over relatively short periods?
  • Two critical issues: reliability and validity (usually reference literature will report on reliability and validity of a scale)
  • You may want to go beyond the reference sourcebook that gives an overview of an existing scale and examine firsthand the studies that reported the development and testing of the scale

Operationalization Goes on and on

  • measure a given variable in several different ways in a study
  • examine alternative operational definitions during your analysis. You will have several single indicators to choose from and many ways to create different composite measures

Qualitative Perspective on Operational Definitions

  • researchers conducting purely qualitative studies do not restrict their observations to predetermined operational indicators
  • the problem with defining variables in advance is threefold: 1) we may not know in advance what the salient variables are; 2) limitations in our understanding of the variables we think are important may keep us from anticipating the best way to operationally define those variables; 3) even the best operational definitions are necessarily superficial, because they are specified only in terms of observable indicators
Rubin, A., Babbie, E.R. (2007). Research Methods for Social Work. Chapter 8. 6th Edition. Pacific Grove, CA: Brooks Cole

Common Sources of Measurement Error
  • Measurement error occurs when we obtain data that do not accurately portray the concept we are attempting to measure.

Systematic Error

  • Occurs when the information we collect consistently reflects a false picture of the concept we seek to measure, either because of the way we collect the data or the dynamics of those who are providing the data
  • The most common way our measures systematically measure something other than what we think they do is when biases are involved in the data collection
  • Acquiescent response set: agreeing or disagreeing with most or all statements regardless of their content
  • Social desirability bias: tendency of people to say or do things that will make them or their reference group look good.
  • Cultural bias

Random Error

  • have no consistent pattern of effects
  • do not bias our measures, they make them inconsistent from one to the next
  • things we are measuring do not change over time, but our measures keep coming up with different results

Errors in Alternate Forms of Measurement

  • written self-reports: the problem in measurement is that people’s words don’t always match their deeds
  • interviews: social desirability adds to problems already noted for written self-reports. Different interviewers can also cause errors due to inconsistencies among them, or their characteristics that may influence how respondents answer.
  • Direct behavioural observation: can be highly vulnerable to social desirability; observers themselves might be biased to perceive behaviors that support their study’s hypothesis. Also there may be errors in recording and observations
  • Examining available records: possibility that some practitioners exaggerate their records in the belief that someone might use those records to evaluate their performance (systematic error). Perhaps they aren’t careful in documenting their tasks (random error)

Avoiding Measurement Error

  • it is virtually impossible to avoid all possible sources of measurement error
  • must try to focus on minimizing any major measurement errors that would destroy the credibility and utility of your findings and that you assess how well your measures appear to have kept those errors from exceeding a reasonable level
  • obtain collegial feedback to help spot biases or ambiguities
  • make sure any team members involved in the observation are consistent in how they perform their tasks
  • unobtrusive observation is used to minimize social desirability
  • triangulation deals with systematic error by using several different research methods to collect data

Reliability

  • whether a particular technique, applied repeatedly to the same object, would yield the same result each time
  • does not ensure accuracy
  • to avoid errors, use measures that have proven their reliability in previous research
  • clarity, specificity, training and practice will avoid a great deal of unreliability and grief

Interobserver and Interrater Reliability

  • degree of agreement or consistency between or among observers or raters
  • to assess interrater reliability you would train two raters; then you would have them view the same videotapes and independently rate their responses. Some researchers argue that 70% agreement or more would be an acceptable level of random error
  • you might want to calculate the correlation between two sets of ratings instead of percentages
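Both versions in a short Python sketch (the ratings are made up; statistics.correlation needs Python 3.10+):

```python
import statistics

rater_a = [3, 4, 2, 5, 4, 3, 2, 4]  # made-up ratings of 8 taped sessions
rater_b = [3, 4, 3, 5, 4, 3, 2, 5]

# Percent agreement: how often the raters gave the identical rating.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"agreement: {agreement:.0%}")  # 75%

# Correlation between the two sets of ratings (Python 3.10+).
r = statistics.correlation(rater_a, rater_b)
print(f"correlation: {r:.2f}")
```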

Test-Retest Reliability

  • term for assessing a measure’s stability over time
  • administer the same measurement instrument to the same individuals on two separate occasions. If the correlation between the two sets of responses to the instrument is above .70 or .80 then the instrument may be deemed to have acceptable stability.
  • In assessing test-retest reliability, you must be certain that both tests occur under identical conditions, and the time lapse between test and retest should be long enough that the individuals will not recall their answers

Internal Consistency Reliability

  • to assess whether the various items that make up the measure are internally consistent
  • Internal consistency reliability assumes that the instrument contains multiple items, each of which is scored and combined with the scores of the other items to produce an overall score
  • Assess the correlation of the scores on each item with the scores on the rest of the items
  • Split-halves method: assess the correlations of sub-scores among different subsets of half of the items
  • Most common and most practical method for assessing reliability
  • Parallel-forms reliability: constructing a second measuring instrument that is thought to be equivalent to the first
  • Coefficient alpha: equals the average of all the split-half correlations (calculate the total subscore of each possible split half for each subject, then calculate the correlations of all possible pairs of split-half subscores)
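A sketch of coefficient alpha with made-up item scores. This uses the common variance-based computing formula rather than literally averaging split halves, but it yields the same kind of internal-consistency figure described above:

```python
import statistics

# Made-up scores: each row is one respondent, each column one scale item.
items = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]

k = len(items[0])  # number of items
item_vars = [statistics.variance(col) for col in zip(*items)]
total_var = statistics.variance([sum(row) for row in items])

# Variance form of coefficient alpha.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # ~0.95: these made-up items hang together well
```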

Validity

  • refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration

Face Validity

* face validity: when measure appears to measure what the researcher intended

Content Validity

  • refers to the degree to which a measure covers the range of meanings included within the concept
  • to ascertain whether the measure indeed measures what it’s intended to measure, we need empirical evidence

Criterion-Related Validity

  • select an external criterion that we believe is another indicator or measure of the same variable that our instrument intends to measure
  • two subtypes: predictive validity-its ability to predict a criterion that will occur in the future. Concurrent validity-its correspondence to a criterion that is known concurrently
  • a measure assessed according to its ability to differentiate between known groups has known-groups validity (another subtype)
  • the ability to detect subtle differences is termed sensitivity
  • we need to be mindful of the issue of sensitivity and of whether the instrument’s known-groups validity is based on groups whose differences are more extreme than the differences we expect to detect in our own study

Construct Validity

  • the way a measure relates to other variables within a system of theoretical relationships
  • construct validation can involve assessing whether the measure has both convergent validity and discriminant validity
  • convergent validity: when its results correspond to the results of other methods of measuring the same construct
  • discriminant validity: when results do not correspond as highly with measures of other constructs as they do with other measures of the same construct




Factorial Validity

  • refers to how many different constructs a scale measures and whether the number of constructs and the items that make up those constructs are what the researcher intends
  • to assess your scale’s factorial validity, you would use a statistical procedure called factor analysis
  • if your results showed that you had three factors, and that the items making up each factor were for the most part the ones you intended to correlate most highly with each other, then your scale would have factorial validity

An illustration of Reliable and Valid Measurement in SW: The Clinical Measurement Package

  • 9 short, standardized scales that were designed for repeated use by clinical social workers to assess client problems and monitor and evaluate progress in treatment (Hudson, 1982)
  • See pages 189 and 192 for more text on this package

Relationship Between Reliability and Validity

  • reliability does not ensure validity
  • you can’t have validity without also having reliability
Reading 5.1

Neuman, W.L., (2002). Chapter 5. The Literature Review and Ethical Concerns (pp 95-136). In W. Laurence Neuman. Social Research Methods: Qualitative and quantitative approaches. 5th Edition. Allyn & Bacon.


Assumption of a literature review:

Knowledge accumulates and that people learn from and build on what others have done. Scientific research is a collective effort of many researchers who share their results with one another and who pursue knowledge as a community.

4 Goals of literature reviews

  1. To demonstrate familiarity with a body of knowledge and establish credibility. All reviews have this in common.
  2. To show the path of prior research and how a current project is linked to it.
  3. To integrate and summarize what is known in an area
  4. To learn from others and stimulate new ideas.

6 types of literature reviews

1. Self-study review: increases reader’s confidence (combines first part of goal 1 and 4).

2. Context review: places a specific project in a big picture. Introduces rest of the research-establishes significance of research question.

3. Historical review: traces the development of an issue over time.

4. Theoretical review: compares how different theories address an issue.

5. Integrative review: summarizes what is known at a point in time.

6. Methodological review: points out how design, samples, measures account for differences.

Meta-analysis: technique used in an integrative or methodological review.

  • gathering details from a large number of research projects and statistically analyzing them
  • doesn’t have to use stats to summarize findings

Where to find research literature

Periodicals

  • mass market publications: source on current events but do not provide material in a form needed for a lit. review
  • popularized social science magazines and prof. publications-at best can supplement to other sources
  • opinion magazines-arena for intellectual debate as opposed to presenting findings.

Scholarly journals

  • primary source of periodical for lit review-usually found in college or university
  • can be specialized
  • no “seal of approval” so reader must use judgement
  • see pages 205-209 for more details.

Citation formats

  • key to locating articles
  • APA
  • ASR – American Sociological Review
  • Chicago manual of style

Books

  • difficult to distinguish books on research reports from other books
  • some types of social research more likely to be in book form
  • detailed theoretical or philosophical discussions usually appear in books

    Three kinds of books contain collections of articles/research reports

    1. Readers: designed for teaching purposes
    2. Collections: designed for scholars-may gather journal articles, etc.
    3. Annual research books: hybrids between scholarly journals and collections of articles.

Dissertations

  • Original PhD research project
  • Specialized index lists available through university library

Government documents

  • Sponsor studies and publish reports
  • Must use specialized lists in libraries to find them

Policy Reports and Presented Papers

  • A thorough lit review examines these two sources
  • Difficult to obtain
  • Found in libraries or by writing to institute or center for a list of reports.

How to conduct a systematic lit review

Define a topic

  • well-focused research question
  • a context review will be broader than research question
  • sometimes research question finalized after the thorough review is finished.

    Design the search

  • plan a search strategy (type, extensiveness, material to include, etc.)
  • set parameters (time line, number of reports to look at, etc.)
  • learn how to take notes and how to record citations
  • develop a schedule

    Locate research reports

  • general rule: use multiple strategies (articles in scholarly journals, books, dissertations, gov. documents, policy reports, etc.)

Taking notes

  • have one file for sources that will be divided in two: those you have acquired and those that are potential leads
  • content file-remember to record specific quotes and the specific page number associated to it

Organizing notes

  • organize by theme
  • create a mental map of how they fit together
  • context review-organising around a specific research question
  • historical review-by theme and date of publication
  • integrative review-core common findings in a field
  • methodological – by topic, and within topic by design or method
  • theoretical review-by theories and major thinkers being examined

Writing the review

  • be prepared to do a lot of rewriting
  • keep your purpose in mind
  • read source material critically

What does a good review look like?

  • a good review is not just a summary of findings – that fails to communicate purpose
  • organize common findings and or arguments together
  • approach most important ideas first
  • note discrepancies and weakness in findings

Internet-Upside

  • easy, fast, cheap-source material from anywhere, easy to access, always open, easy to store found material
  • “links” page helps to connect to other potential sources
  • Democratization of access to information
  • wide range of information sources

Internet-Downside

  • no quality control
  • much of the most important resource material for social research not available on the net
  • finding sources can be time consuming and difficult
  • sources can be unstable and difficult to document-specific addresses may change or cease to exist

Ethics in Social Research

  • sound ethical practice needs to be integrated into study design
  • moral and professional obligation to be ethical
  • ethical issues may involve finding a balance between scientific knowledge and rights of those being studied
  • potential benefits of study must be weighed against potential human costs

Individual researcher

  • moral code=best defence against unethical behaviour
  • reflect on research actions-consult your conscience

Why be ethical?

  • most unethical behaviour results from lack of awareness, and pressures to take short cuts
  • vague descriptions of ethical standards make the odds of getting caught being unethical pretty small
  • ethical concerns usually internalized during professional socialization
  • you can be “legal” without being ethical

Scientific misconduct

  • includes research fraud, plagiarism, falsification

Origins of research participation protection:

Gross violation of human rights in name of science during Nazi Germany

  1. Secure prior voluntary consent when possible
  2. Never cause unnecessary or irreversible harm to subjects
  3. never unnecessarily humiliate, degrade, or release specific individuals’ information that was collected for research

Examples of research and ethical issues

Laud Humphreys: Tearoom Trade Study

  • deceit and absence of consent: participants engaging in casual homosexual encounters in a public washroom were followed to their vehicles, their home addresses were traced, and they were then interviewed in their homes on a supposedly unrelated topic.

Milgram: Obedience study

  • social pressure to obey authority
  • emotional stress experienced by participants acting as “teachers,” who believed they were sending electric shocks to a learner whenever he gave a wrong answer. Participants were unaware that no electric charge was actually being sent; the recipient faked being shocked.


Zimbardo: Role play of prisoners/guards

  • subjects became too caught up in their roles and the study had to be stopped. The prisoners became passive and disorganized while the guards became aggressive, arbitrary, and dehumanizing. The participants were students.

Deception

  • acceptable ONLY if there is a specific methodological reason for it
  • most often it happens in experimental research
  • misrepresentation of self, actions, or true intentions: only if full awareness would skew the results by leading participants to modify their behaviour
  • must still have informed consent
  • debrief participants afterwards

Covert observation: only if it is essential, combined with gradual disclosure (exceptions possible, like cults, extremist political sects, etc.)

Informed Consent

  • principle: never coerce someone into participating; it must be voluntary
  • requires the subjects’ permission and information about what they are being asked to participate in
  • not always a requirement of the law

Coercion

  • some may agree to participate without really being able to give true consent
  • legal guardians must sign for “incompetent” subjects

Creating new inequalities

  • being denied a benefit or service as a result of participation in a study (i.e., participants in a control group who are denied services/treatments)

Privacy: research can probe into beliefs, backgrounds, and behaviours that reveal intimate details

Protecting privacy means not disclosing a participant’s identity after information is gathered. Two forms of protection:

  • Anonymity: people remain nameless; usually code numbers are given to identify them
  • Confidentiality: the identities of those who give information are held in confidence
  • May protect subjects from physical harm

Protection of research participants

  • US Department of Health and Human Services: Office for Protection from Research Risks
  • National Research Act
  • National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research
  • universities and research institutions are the safeguards of ethical standards
  • Ethics committees, peer review within the profession
Reading 1.3

Longo, P. (2004). Chapter 88: Application of Logic Models in Rural Program Development (pp. 796-804). In Roberts, A., & Yeager, K. (Eds.), Evidence-Based Practice Manual: Research Outcomes, Measures in Health and Human Sciences. New York: Oxford University Press.


Ongoing Performance Measurement and Management Model (OPM&M)

After 5 years as a grant-funded project focused on implementing policy changes:

  • became a state-wide technical assistance contract
  • primary aim to enhance administrative infrastructure and capacity

    Tools: the Performance Blueprint – a non-linear logic model

  • evaluation
  • planning
  • instrumental in documenting success
  • identifying needs
  • learning from service delivery
  • leads to performance measurement literacy

OPM&M: 4 phases

  1. Visioning and revisioning (ongoing phase): community assets are assessed in relation to federal and state outcome expenditures
  2. Performing (ongoing phase): the performance of strategies and strategists is the focal point
  3. Measuring (ongoing phase): focus on the efficacy and effectiveness of strategies, services, and activities performed
  4. Learning (ongoing phase): performance measurement and evaluation are used to adjust programs and services.

OPM&M: rooted in theory of change-based evaluation traditions

Performance blueprint: added benefit of looking at the sociocultural and political variables

  • requires identification and inclusion of direct and indirect beneficiaries
  • also direct and indirect service providers
  • incorporates Friedman’s 4-quadrant approach
  • offers a transparent strategy for identifying and prioritizing 4 types of performance measures (effort related, effect related, quantity, quality)

    Individual components of the blueprint

  • Inputs: the resources needed to achieve desired outcomes
  • Activities, Strategies, Services: “effort” part of program (operational elements)
  • Providers, Vendors, Collaborators: paid and unpaid personnel
  • Clients and Customers
  • Outputs (the most important distinction of the blueprint): the specific impacts that the service providers actually have on the clients or customers
  • Outcomes: using the 4 quadrants

4 Characteristics to operationalize performance measurement in the OPM&M approach

1) systematically collecting and strategically using performance information…

2) …on an ongoing basis…

3) in an intra and interorganizational fashion…

4)…for a variety of internal and external purposes.

Reading 1.4

Longo, P. (2004). Chapter 89: Amplifying Performance Measurement Literacy: Reflections from the Appalachian Partnership for Welfare Reform (pp. 804-812). In Roberts, A., & Yeager, K. (Eds.), Evidence-Based Practice Manual: Research Outcomes, Measures in Health and Human Sciences. New York: Oxford University Press.

Performance Blueprint: an enhanced, non-linear logic model

Permits:

  • marshalling important program-related evidence
  • promoting stakeholder involvement
  • engendering a more productive and collegial type of collaboration

Two distinctions of the PB:

  1. output-outcome
  2. effort-effect

Two outputs

  1. Those associated with the “performance” of the program personnel
  2. Those associated with “performance” of the program’s clients and/or customers once they have come in contact with the program.

Two outcomes

  1. service delivery outcomes
  2. community outcomes
    1. the community in which the clients and/or customers live
    2. the service providing organization’s capacity to manage its resources and to improve service delivery on a continual and accountable basis

Friedman’s quadrant: effort = input, effect = output

Outputs are mapped on a 2 x 2 grid crossing effort with effect, and quantity with quality:

  • Effort: how hard did we try?
  • Quantity: how much did we do?
  • Quality: how well did we do it?
  • Effect: what change did we produce?

Crossing the two pairs yields four cells: quantity of effort, quality of effort, quantity of effect, and quality of effect.

4 measures

a. measures of the quality of the effect

b. measures of the quality of the effort

c. measures of the quantity of the effect

d. measures of the quantity of effort

6 Step sequence for navigating through the Performance Blueprint

1. Organize, collect, and chart outcomes

2. Identify targeted populations.

3. Define the results that clients or customers can expect in terms of output “effects.”

4. Determine which activities, strategies, and services are needed to achieve step 3 and identify who will initiate, execute, provide, and/or monitor these efforts.

5. Define and set the performance measures to assess the “effects” and “efforts” in relation to the chosen activities, strategies, and services.

6. Use available resources and find additional needed resources.


March 3, 2009

Levels of design

Experimental design

  1. One-group posttest-only
    1. X O1
  2. One-group pretest-posttest
    1. O1 X O2
  3. Comparison-group posttest-only
    1. X O1
    2. O1
       Threats this design cannot rule out:
      1. History: factors other than the experiment may have influenced the outcome
      2. Testing: there are no two measurements, so you can’t compare change over time
      3. Instrumentation: same problem
      4. Regression: you can’t know whether scores were extreme to begin with
      5. Selection: maybe the groups were different beforehand
      6. Maturation: perhaps participants changed on their own, or left the course (the manipulation), beforehand
      7. Interaction: perhaps this group is unique
  4. Comparison-group pretest-posttest
    1. O1 X O2
    2. O1    O2

-> if the groups were not equivalent (e.g., one group cannot read), you have to match the two groups up

  5. Randomized controlled trial – the classical experiment

R -> O1 X O2

R -> O1    O2

  6. Interrupted time series
    1. O1 O2 O3 X O4 O5 O6
  7. Solomon four-group design

Components of experimental design

  1. Population + selection = sample
  2. Sample -> assignment
  3. X = treatment vs. comparison [control] group -> observation O1 [before] – O2 [after]

-there is a difference between random selection (who gets into the sample – external validity) and random assignment (who gets which condition – internal validity)
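
Not from the lecture: a minimal Python sketch, using a hypothetical list of client IDs, of how the two differ – selection draws the sample from the population; assignment splits that sample into conditions:

    import random

    # Hypothetical sampling frame of client IDs.
    population = ["client_%d" % i for i in range(1000)]

    # Random SELECTION: draw the sample from the population (external validity).
    sample = random.sample(population, 60)

    # Random ASSIGNMENT: split the drawn sample into conditions (internal validity).
    random.shuffle(sample)
    treatment, control = sample[:30], sample[30:]

    print(len(treatment), len(control))  # 30 30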

Read Gibbs – Quality of Study Rating Form – 3.2

-> take your group’s article and fill out the form described in Gibbs

+ study summary: practical implications, etc. – 500 words / bring the Henggeler article for next class

Gibbs: How to evaluate studies to guide practice systematically
Filter approach to evaluating evidence

-problem: many studies, some stronger and some weaker

  • sponge approach: use them all and integrate strengths from all studies
  • filter approach: discard the less substantive arguments while accepting the better ones

-problem with the sponge approach: you accept gold and garbage alike – the “reward everything” fallacy

Quality of Study Rating Form

-gives objective criteria of filtering evidence and choosing the practice methods that are supported by the strongest empirical data.

“Wacko” study: students were split into 2 groups – those for the death penalty and those against it. Both groups were given articles for and against the death penalty, yet no one really changed their mind; each side kept finding more reasons to support its own opinion. In other words: bummer. In more other words: people use what they agree with, without thinking about the issue at hand, and despite new data. In other words #3: people are biased – and so are social workers. They even prefer positive over negative results in studies. They are “skewed up!”

Quality of Study Rating Form –QSRF

  1. a quick, systematic, and reliable guide for busy practitioners who want to know the practical implications of a report
  2. summarizes the predominant features of an evaluation study
  3. helps those with limited exposure to research methods rate an index of study quality reliably
  4. helps those with limited exposure to concepts of meta-analysis compute 2 simple indices of a treatment method’s effect size
  5. compares and synthesizes these indices in order to select a method
  6. wastes our time

-the point of the QSRF is to help us compare the various studies that evaluate interventions, in order to help us choose the best one

-based on principles of meta-analysis – or “data synthesis” – analyzing whole studies rather than individual participants, with the purpose of integrating their findings

Quality of study rating form – actual form

  1. client type[s]
  2. treatment method[s]
  3. outcome measure to compute ES1
  4. outcome measure to compute ES2
  5. source [APA format]
  6. criteria for rating study
    1. clear definition of treatment
      1. who – 4 points
      2. what – 4 points
      3. where - 4 points
      4. when – 4 points
      5. why – 4 points
    2. subjects randomly assigned to treatment or control – 20 points
    3. subjects randomly selected – 4 points
    4. non-treatment control group – 4 points
    5. number of subjects on smallest treatment group is larger than 20 – 4 points
    6. outcome measure has face validity – 4 points
    7. treatment outcome was checked for reliability – 5 points
    8. reliability coefficient is greater than .70 or percent of rater agreement is greater than 70% – 5 points
    9. outcome of treatment was measured after treatment was completed – 4 points
    10. test of statistical significance was made and p<0.05 – 20 points
    11. follow-up greater than 75% -10 points
    12. total quality points [TQP]
    13. Effect Size (ES1) = (mean of treatment – mean of control or alternate treatment) / SD of control or alternate treatment
    14. Effect Size (ES2) = proportion improved in treatment – proportion improved in control or alternate treatment (see the sketch below)
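
A minimal Python sketch of the two QSRF indices, using made-up numbers (the function names are mine, not Gibbs’):

    def es1(mean_treatment, mean_control, sd_control):
        # ES1: standardized mean difference, divided by the control group's SD
        return (mean_treatment - mean_control) / sd_control

    def es2(prop_improved_treatment, prop_improved_control):
        # ES2: difference in the proportion of clients who improved
        return prop_improved_treatment - prop_improved_control

    print(es1(24.0, 18.0, 8.0))   # 0.75
    print(es2(0.75, 0.50))        # 0.25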

March 10, 2009

Journal watch in study analysis: max 600 words; comment on implications for practice, and on methodology

-study analysis –march 24th

-study review – march 31st.


-for next week: read Henggeler – the second article

You can combine effect sizes

Campbell Collaboration – a way to combine [sometimes very] different results
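
The notes give no formula for combining; one common approach – an assumption here, not necessarily the one taught in class – is inverse-variance weighting, sketched in Python with made-up study values:

    # Each entry: (effect size, variance) from one study -- made-up values.
    studies = [(0.50, 0.04), (0.30, 0.09), (0.70, 0.02)]

    # Fixed-effect pooling: weight each study by the inverse of its variance,
    # so more precise studies count for more.
    weights = [1.0 / var for _, var in studies]
    pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)

    print(round(pooled, 2))  # about 0.59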

--

March 17, 2009

Study analysis

-demographics: details!!

-exact # in each group

-allocation procedure

-methodological critique – e.g., some studies do not have much follow-up data

-good to have some detail in the findings section

-

  • Formulation: what is your question
  • Search strategies
  • New set of tables
  • 3-4 page summary – not a restatement, but themes across studies, e.g. strengths of studies: 6 of 7 studies had comparison groups, 3 were randomized, etc.

Study analysis:

-quality of study rating form + your study

-research archive ->look for summaries related to your topic- 500-600 words – summary of findings, design, methodology – see why it works

Observational design: used when you have no control over the independent variable, for ethical or other reasons

  • Cross-sectional design
    • single data collection point
    • exposure (E) and outcome (O) measured at the same time, comparing exposed vs. not exposed (–E) and outcome vs. no outcome (–O)
    • no causality – association only
    • current events – very common – like surveys
  • Case-control (retrospective)
    • cross-sectional with access to data about the past: start from those with the outcome (O) and those without (–O), and look back for exposure (E or –E)
    • problem – retrospective bias: memory changes
    • people sad in the present remember sad things
  • Longitudinal (prospective)
    • multiple data collection points
      • exposed: O -> E -> O -> O
      • not exposed: O -> –E -> O -> O

March 24, 2009

Randomized controlled trial (RCT) – is it feasible for the program we described in the logic model? i.e., methodologically/ethically + think of an outcome measure

Observational studies - when you can’t manipulate the independent variable.

Causality

  • Theoretical plausibility – in first few paragraphs of the article
  • Association – is it coincidence or statistically significant?
  • Temporal order
  • Rule out confounders – i.e. systematic biases, etc.

Sampling frame

  • Population->frame->sample

Sampling influences internal validity – i.e., randomization -> allows you to see that the experiment really tests what it is supposed to

-> cross-sectional designs have lower internal validity, so you need to make sure that the sample is representative of the larger population [external validity]

-> therefore you need to define your population clearly -> the sampling frame = what the sample precisely represents [and what it does not] – so you know what you can and cannot generalize to

-frame: the list from which you are sampling – i.e. phonebooks etc…

-population – the whole group which you want to generalize to

-note: random is not disorganized. You randomly sample in an organized way!!!

Probability sampling

  • Simple random sample: draw cases at random, each with an equal chance
  • Systematic sample: take every 50th on the list – not strictly random but close, since there is no reason to believe the list is skewed
  • Stratified sample: randomly sample within strata so the sample mirrors the general population
  • Cluster sample: randomly sample clusters, e.g. neighbourhoods… you need enough clusters to be representative. A weaker design, since randomness is reduced! (a sketch of the first three strategies follows)
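
A minimal Python sketch of the first three strategies, assuming a hypothetical 5,000-case frame:

    import random

    # Hypothetical sampling frame with a stratification variable.
    frame = [{"id": i, "region": "urban" if i % 3 else "rural"}
             for i in range(5000)]

    # Simple random sample: every case has an equal chance.
    srs = random.sample(frame, 100)

    # Systematic sample: every 50th case from a random start.
    start = random.randrange(50)
    systematic = frame[start::50]

    # Stratified sample: sample within each stratum in proportion
    # to its share of the frame.
    stratified = []
    for region in ("urban", "rural"):
        stratum = [case for case in frame if case["region"] == region]
        n = round(100 * len(stratum) / len(frame))
        stratified.extend(random.sample(stratum, n))

    print(len(srs), len(systematic), len(stratified))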

-

Non-probability sampling

  • Convenience
  • Purposive – good for thematic analysis
  • Quota – you want x of this and x of that
  • Snowball – helps when people aren’t going to come to the study on their own. Problem: volunteer bias, no representation – you usually just get a certain group!

-a less representative sample is OK, as long as you are not generalizing

Sample size:

-accuracy of estimates increases with sample size. If there is not a lot of variability, you can use a smaller sample

-standard error = variability / square root of sample size: SE = SD / √N

Things you can do statistically:

= confidence interval of proportions – i.e., see how reliable your sampling was – e.g., “I am 95% sure that the population value is 55% ± 2% (between 53% and 57%)”. The bigger the n, the narrower the confidence interval, i.e. the smaller the range of the estimate! [But there are diminishing returns: the difference between samples of 10 and 30 matters a lot; between 200 and 210 it is far less significant.] Also, you can use the same sample size for any population larger than about 20,000-30,000: beyond that size, the proportion of the population you sample is irrelevant

= sample size for estimating proportions (see the sketch below)
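
A minimal Python sketch of both calculations (z = 1.96 corresponds to 95% confidence; the numbers echo the 55% ± 2% example above):

    import math

    def prop_ci(p, n, z=1.96):
        # 95% confidence interval for a proportion: p +/- z * SE,
        # where SE = sqrt(p * (1 - p) / n)
        se = math.sqrt(p * (1 - p) / n)
        return p - z * se, p + z * se

    def sample_size(margin, p=0.5, z=1.96):
        # n = z^2 * p(1-p) / margin^2 ; p = 0.5 is the most conservative guess
        return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

    print(prop_ci(0.55, 2400))   # about (0.53, 0.57), i.e. 55% +/- 2%
    print(sample_size(0.02))     # 2401 people for a +/- 2% margin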

2 sample size questions

  1. Do you have enough statistical power to tell the difference between the groups? Minimum 20 per group [related to Gibbs!!!] = power (see the sketch after this list)
  2. Do you have enough people in the study to be confident that the sample is representative of the population? = representativeness
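
For the power question, a sketch using the statsmodels library (not mentioned in class), assuming a two-sample t-test and a medium effect size:

    from statsmodels.stats.power import TTestIndPower

    # How many people per group are needed to detect a medium effect
    # (Cohen's d = 0.5) with 80% power at alpha = .05?
    n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                              alpha=0.05, power=0.80)
    print(round(n_per_group))  # about 64 per group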

Response rate

  • Sample frame error – you lose representativeness
  • Response / refusal rates – you lose representativeness/power
  • Item completion rates – you also lose representativeness/power

Methods of data collection

  • Mail – looks more intimidating: the questionnaire looks longer than it takes!
  • Telephone
  • Face-to-face – downside: the sample has to be clustered! Easier to do complex interviews (e.g., ask a question only if the previous answer was yes) and open-ended questions
  • Internet – harder to tell whether the sample is representative


Aspect of survey               Mailed questionnaires   Telephone interviews       Face-to-face (in-home) interviews

Cost                           Low                     Low/Medium                 High
Geographic distribution        May be wide             May be wide                Must be clustered
Length of questionnaire        Short/medium            Medium/long                Long
Complexity of questionnaire    Must be simple          May be complex             May be complex
Control of question order      Simple to moderate      Must be short and simple   May be complex
Use of open-ended questions    Poor                    Fair                       Good
Sensitive topics               Good                    Fair/good                  Fair
Response rates                 45% - 75%               60% - 90%                  65% - 95%
Quality of recorded response   Fair/good               Very good                  Very good
NOTE: Adapted from Designing Surveys: A Guide to Decisions and Procedures (p.32) by R. Czaja and J. Blair, 1995, Thousand Oaks, CA: Pine Forge Press. Copyright © 1996 by Pine Forge Press.


Question design

  • open vs. closed questions
  • length & sequence
  • response categories
    • avoid overlapping categories, middle positions and unknowns
  • wording
    • Avoid jargon, ambiguous, double-barreled, leading, & hypothetical questions

Procedures

  • Design phase
    • plan, pre-test, final design
  • Data collection phase
    • train, sample, & collect
  • Data analysis phase
    • code, clean, & analyze

--

March 31, 2009

Abstract

Opening sentence describing the purpose of the paper

-target population

-methods – who participated, sample size, random assignment, design,

-measures used – and follow up times

-is it statistically significant? And is there also clinical significance – are they actually feeling better?

-conclusion: is it more effective? “promising results”

-you have to say that X is more effective than Y, not just that X is effective

Service as usual is not the same as no treatment

Strengths:

-long follow-up

-comparison group – randomized control group

-large sample size

-more than one measure

-random assignment = internal validity – and random selection speaks to recruitment [random selection is rare and you cannot assume that it happened]

Limitation:

-co-intervention – may have reduced effect size

-lack of significance is a sign of what? Measures not sensitive enough, or not enough power [i.e., the sample seems big, but the other intervention is affecting the control group, so they have also improved somewhat]

-because of the complexity of this population, you may need an even larger sample

-only one study site = less generalizability

Ethics

  1. are you doing no harm? [i.e. critiques of harm-reduction approaches]
  2. how do you deal with those in the control group – i.e. those not getting the treatment?
  3. opportunity cost: if you fund one study, you do not fund another
