There are two Perl modules available on CPAN that deal with Chi-squared analysis (`Statistics::ChiSquare` and `Statistics::Distributions`). However, neither one outputs the Chi-squared value for a comparison of two binary populations.

We can use the formula below to calculate the Chi-squared value with one degree of freedom.

χ^{2} = [n(ad – bc)^{2}] / [(a + b) (c + d) (a + c) (b + d)]

n = a + b + c + d

Where:

variable | population 1 | population 2 |
---|---|---|
+ | a | b |
– | c | d |

Example:

Suppose we wish to determine whether the prevalence of a disease differs between two species. Both disease status and species are binary variables, so the Chi-squared test can be applied:

Diseased | species 1 | species 2 |
---|---|---|
No | 57 | 36 |
Yes | 63 | 88 |

n = (57 + 36 + 63 + 88) = 244

χ^{2} = [244*(57*88 – 36*63)^{2}] / [(57 + 36) (63 + 88) (57 + 63) (36 + 88)]

χ^{2} = 8.82
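For readers who want to check the arithmetic by hand, the intermediate values work out as follows. Each step uses only the counts from the table, with a = 57, b = 36, c = 63 and d = 88:

$$
\begin{aligned}
ad - bc &= 57 \cdot 88 - 36 \cdot 63 = 5016 - 2268 = 2748 \\
n(ad - bc)^{2} &= 244 \cdot 2748^{2} = 1\,842\,566\,976 \\
(a+b)(c+d)(a+c)(b+d) &= 93 \cdot 151 \cdot 120 \cdot 124 = 208\,959\,840 \\
\chi^{2} &= \frac{1\,842\,566\,976}{208\,959\,840} \approx 8.818
\end{aligned}
$$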

The critical values of the Chi-squared distribution at 1 degree of freedom, for a range of P-values, are:

D.F. | P = 0.1 | P = 0.05 | P = 0.025 | P = 0.01 | P = 0.005 |
---|---|---|---|---|---|
1 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |

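The χ^{2} value (8.82) exceeds 7.88, the critical value for P = 0.005, so the corresponding P-value is below 0.005.

Since the P-value is less than 0.05 (P<0.05), we reject the null hypothesis that disease status is independent of species. The data suggest that the prevalence of disease is significantly higher in species 2 (88 of 124 individuals, versus 63 of 120 in species 1).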
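The table lookup described above can also be scripted. The following is only a minimal sketch: the helper name `p_upper_bound` is hypothetical, and the critical values are simply the five entries from the table, so the result is a bound on the P-value rather than an exact figure.

```perl
use strict;
use warnings;

# Critical values for 1 degree of freedom, copied from the table above
# and ordered from the largest P-value to the smallest.
my @critical = (
    [0.1,   2.71],
    [0.05,  3.84],
    [0.025, 5.02],
    [0.01,  6.63],
    [0.005, 7.88],
);

# Hypothetical helper: returns the smallest tabulated P-value whose
# critical value the statistic exceeds, or undef if it exceeds none.
sub p_upper_bound {
    my ($chi2) = @_;
    my $bound;
    for my $row (@critical) {
        my ($p, $cutoff) = @$row;
        $bound = $p if $chi2 > $cutoff;
    }
    return $bound;
}

my $p = p_upper_bound(8.82);
print defined $p ? "P < $p\n" : "P > 0.1\n";    # prints "P < 0.005"
```

A statistic below 2.71 exceeds none of the tabulated values, so the helper returns `undef` and the script reports `P > 0.1` instead.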

Below is a Perl subroutine to automatically calculate Chi-squared.

```
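use strict;
use warnings;

# Chi-squared statistic (1 degree of freedom) for the 2x2 table
#   row 1: $a, $b    row 2: $c, $d
sub chi_squared {
    my ($a, $b, $c, $d) = @_;
    my $n = $a + $b + $c + $d;
    # Product of the four marginal totals; zero means an empty row or column.
    my $margins = ($a + $b) * ($c + $d) * ($a + $c) * ($b + $d);
    return 0 if $margins == 0;    # avoid division by zero
    return $n * ($a * $d - $b * $c)**2 / $margins;
}
print chi_squared(57, 36, 63, 88), "\n";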
```

Output:

`8.81780430153469`
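If an exact P-value is wanted rather than a table lookup, one option for the 1-degree-of-freedom case is the complementary error function, since a Chi-squared variable with 1 degree of freedom is the square of a standard normal variable and so P = erfc(√(χ^{2}/2)). The sketch below rests on two assumptions: the helper name `p_value_1df` is hypothetical, and `POSIX::erfc` is only available from Perl 5.22 onward.

```perl
use strict;
use warnings;
use POSIX ();    # POSIX::erfc requires Perl 5.22 or later

# Hypothetical helper: upper-tail P-value for a Chi-squared statistic
# with exactly 1 degree of freedom. Not valid for other degrees of freedom.
sub p_value_1df {
    my ($chi2) = @_;
    return POSIX::erfc(sqrt($chi2 / 2));
}

# Statistic taken from the output printed above.
printf "P = %.4f\n", p_value_1df(8.81780430153469);
```

For the example data this gives a P-value of about 0.003, consistent with the table-based conclusion that P is below 0.005.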