Clojure for Data Science

The t-statistic

When using the t-distribution, we look up the t-statistic. Like the z-statistic, this value quantifies how unlikely a particular observed deviation is. For a two-sample t-test, the t-statistic is calculated in the following way:

$$t = \frac{\bar{a} - \bar{b}}{S_{\bar{a}\bar{b}}}$$

Here, $S_{\bar{a}\bar{b}}$ is the pooled standard error. We could calculate the pooled standard error in the same way as we did earlier:

$$S_{\bar{a}\bar{b}} = \sqrt{\frac{\sigma_a^2}{n_a} + \frac{\sigma_b^2}{n_b}}$$

However, the equation assumes knowledge of the population parameters σ_a and σ_b, which can only be approximated from large samples. The t-test is designed for small samples and does not require us to make assumptions about population variance.

As a result, for the t-test, we write the pooled standard error as the square root of the sum of the squared standard errors:

$$S_{\bar{a}\bar{b}} = \sqrt{S_{\bar{a}}^2 + S_{\bar{b}}^2}$$

In practice, the two equations for the pooled standard error yield identical results given the same input sequences, because the standard deviations are estimated from the samples in both cases. The difference in notation just serves to illustrate that with the t-test, we depend only on sample statistics as input. The pooled standard error $S_{\bar{a}\bar{b}}$ can be calculated in the following way:

(defn pooled-standard-error [a b]
  (i/sqrt (+ (i/sq (standard-error a))
             (i/sq (standard-error b)))))
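
The claim that both forms of the pooled standard error agree can be checked with a small standalone sketch. It does not use Incanter or the book's helpers: `mean`, `sample-variance`, `standard-error*`, the two `pooled-se-...` functions and the sample data are all hypothetical names and values introduced here for illustration, and the sketch assumes the standard deviation is the sample standard deviation (dividing by n - 1).

```clojure
;; Standalone sketch; every name below is illustrative, not the book's API.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Sample variance, assuming Bessel's correction (divide by n - 1)."
  [xs]
  (let [m (mean xs)
        squared-deviations (map #(let [d (- % m)] (* d d)) xs)]
    (/ (reduce + squared-deviations) (dec (count xs)))))

(defn standard-error*
  "Standard error of the mean: sqrt(variance / n)."
  [xs]
  (Math/sqrt (/ (sample-variance xs) (count xs))))

(defn pooled-se-from-variances
  "First form: sqrt(var_a / n_a + var_b / n_b)."
  [a b]
  (Math/sqrt (+ (/ (sample-variance a) (count a))
                (/ (sample-variance b) (count b)))))

(defn pooled-se-from-standard-errors
  "Second form: square root of the sum of the squared standard errors."
  [a b]
  (let [se-a (standard-error* a)
        se-b (standard-error* b)]
    (Math/sqrt (+ (* se-a se-a) (* se-b se-b)))))

;; Made-up samples, used only to compare the two forms.
(def sample-a [10.0 12.0 9.0 11.0 13.0])
(def sample-b [14.0 15.0 12.0 16.0])

(def pooled-se-difference
  (Math/abs (- (pooled-se-from-variances sample-a sample-b)
               (pooled-se-from-standard-errors sample-a sample-b))))
```

With these definitions, `pooled-se-difference` should be zero up to floating-point rounding, since squaring a standard error simply recovers the variance divided by the sample size.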

Although they are represented differently in mathematical notation, in practice, the calculation of the t-statistic is identical to that of the z-statistic:

(def t-stat z-stat)

(defn ex-2-15 []
  ;; Map each :site value to its sequence of dwell times.
  (let [data (->> (load-data "new-site.tsv")
                  (:rows)
                  (group-by :site)
                  (map-vals (partial map :dwell-time)))
        ;; Dwell times for site 0 and site 1.
        a (get data 0)
        b (get data 1)]
    (t-stat a b)))

;; -1.647
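
For readers without the dwell-time data to hand, the whole calculation can be sketched in plain Clojure. This is a minimal illustration under stated assumptions rather than the book's implementation: the helpers are redefined locally without Incanter, the standard error is assumed to use the sample standard deviation (n - 1), and the two input vectors are made-up values.

```clojure
;; Standalone sketch of the two-sample t-statistic; helper definitions
;; and data are assumptions made for illustration.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn standard-error
  "Standard error of the mean, using the sample variance (n - 1)."
  [xs]
  (let [m  (mean xs)
        n  (count xs)
        ss (reduce + (map #(Math/pow (- % m) 2) xs))]
    (Math/sqrt (/ ss (dec n) n))))

(defn pooled-standard-error [a b]
  (Math/sqrt (+ (Math/pow (standard-error a) 2)
                (Math/pow (standard-error b) 2))))

(defn t-stat
  "Difference in sample means divided by the pooled standard error."
  [a b]
  (/ (- (mean a) (mean b))
     (pooled-standard-error a b)))

;; Hypothetical samples: the second has the larger mean, so t is negative.
(def example-t
  (t-stat [10.0 12.0 9.0 11.0 13.0]
          [14.0 15.0 12.0 16.0]))
```

The sign of the result depends only on the order of the arguments: swapping the two samples negates the statistic without changing its magnitude.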

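The difference between the two statistics is conceptual rather than algorithmic: the z-statistic is only applicable when the population standard deviations can be well approximated, which requires large samples, so that the statistic follows a normal distribution. The t-statistic is instead compared against the t-distribution, which is designed for small samples.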