Clojure for Data Science
The t-statistic

When using the t-distribution, we look up the t-statistic. Like the z-statistic, this value quantifies how unlikely a particular observed deviation is. For a two-sample t-test, the t-statistic is calculated in the following way:

$$t = \frac{\bar{a} - \bar{b}}{S_{ab}}$$

Here, $S_{ab}$ is the pooled standard error. We could calculate the pooled standard error in the same way as we did earlier:

$$S_{ab} = \sqrt{\frac{\sigma_a^2}{n_a} + \frac{\sigma_b^2}{n_b}}$$

However, this equation assumes knowledge of the population parameters $\sigma_a$ and $\sigma_b$, which can only be approximated from large samples. The t-test is designed for small samples and does not require us to make assumptions about population variance.

As a result, for the t-test, we write the pooled standard error as the square root of the sum of the squared standard errors:

$$S_{ab} = \sqrt{S_a^2 + S_b^2}$$

In practice, the earlier two equations for the pooled standard error yield identical results, given the same input sequences. The difference in notation just serves to illustrate that with the t-test, we depend only on sample statistics as input. The pooled standard error $S_{ab}$ can be calculated in the following way:

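;; Assumes i aliases incanter.core and that standard-error is the
;; helper defined earlier in the chapter.
(defn pooled-standard-error [a b]
  (i/sqrt (+ (i/sq (standard-error a))
             (i/sq (standard-error b)))))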
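The claim that both forms of the pooled standard error agree can be checked numerically. What follows is a minimal sketch in plain Clojure rather than the book's own code. The helper names (`sample-mean`, `sample-variance`, `se`) and the input values are illustrative assumptions, and the variance uses the n - 1 denominator, which may differ from the definition used earlier in the text. The equivalence holds for either denominator.

```clojure
;; Hypothetical helpers, written without Incanter so the sketch is self-contained.
(defn sample-mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Variance with an n - 1 denominator (an assumption for this sketch)."
  [xs]
  (let [m (sample-mean xs)]
    (/ (reduce + (map #(Math/pow (- % m) 2) xs))
       (dec (count xs)))))

(defn se
  "Standard error of the mean: standard deviation divided by sqrt(n)."
  [xs]
  (Math/sqrt (/ (sample-variance xs) (count xs))))

;; First form: variance of each sample divided by its size.
(defn pooled-se-from-variances [a b]
  (Math/sqrt (+ (/ (sample-variance a) (count a))
                (/ (sample-variance b) (count b)))))

;; Second form: sum of the squared standard errors.
(defn pooled-se-from-standard-errors [a b]
  (Math/sqrt (+ (Math/pow (se a) 2)
                (Math/pow (se b) 2))))

;; Made-up values, for illustration only.
(def sample-a [10.0 12.0 9.0 11.0 13.0])
(def sample-b [14.0 15.0 13.0 16.0])

;; The two forms differ by at most floating-point rounding.
(def difference
  (Math/abs (- (pooled-se-from-variances sample-a sample-b)
               (pooled-se-from-standard-errors sample-a sample-b))))
```

Comparing with a small tolerance rather than `=` avoids spurious mismatches from squaring a value that was just square-rooted.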

Although they are represented differently in mathematical notation, in practice, the calculation of the t-statistic is identical to that of the z-statistic:

(def t-stat z-stat)

(defn ex-2-15 []
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        a (get data 0)
        b (get data 1)]
    (t-stat a b)))

;; -1.647
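To see how the pieces fit together without the dataset or the earlier `z-stat` definition, the calculation can be sketched end to end on two small sequences. This is an illustrative sketch under stated assumptions, not the book's implementation: every name here (`sample-mean`, `sample-variance`, `se`, `pooled-se`, `two-sample-t`) is hypothetical, the inputs are made up, and the variance uses an n - 1 denominator.

```clojure
;; Hypothetical helpers, written in plain Clojure so the sketch is self-contained.
(defn sample-mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Variance with an n - 1 denominator (an assumption for this sketch)."
  [xs]
  (let [m (sample-mean xs)]
    (/ (reduce + (map #(Math/pow (- % m) 2) xs))
       (dec (count xs)))))

(defn se
  "Standard error of the mean for one sample."
  [xs]
  (Math/sqrt (/ (sample-variance xs) (count xs))))

(defn pooled-se
  "Square root of the sum of the squared standard errors."
  [a b]
  (Math/sqrt (+ (Math/pow (se a) 2)
                (Math/pow (se b) 2))))

(defn two-sample-t
  "Difference of the sample means divided by the pooled standard error."
  [a b]
  (/ (- (sample-mean a) (sample-mean b))
     (pooled-se a b)))

;; Made-up values, for illustration only.
(def sample-a [10.0 12.0 9.0 11.0 13.0])
(def sample-b [14.0 15.0 13.0 16.0])

(def t-value (two-sample-t sample-a sample-b))
;; t-value is approximately -3.66; it is negative because the mean of
;; sample-a (11.0) is lower than the mean of sample-b (14.5).
```

The sign follows the order of the arguments, so swapping the two samples flips the sign of the result without changing its magnitude.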

The difference between the two statistics is conceptual rather than algorithmic—the z-statistic is only applicable when the samples follow a normal distribution.