Descriptive Statistics

Mean

$$ \bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i} $$
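As a quick check of the formula, here is a minimal sketch in base R. The vector `x` is a made-up sample used only for illustration; the sketch divides the sum by $n$ and compares the result with the built-in `mean()`.

```r
# Hypothetical sample, chosen only for illustration
x <- c(2, 4, 4, 5, 10)
n <- length(x)

manual_mean <- sum(x) / n   # (1/n) * sum of x_i, as in the formula
builtin_mean <- mean(x)     # base R equivalent

# TRUE when both computations agree
isTRUE(all.equal(manual_mean, builtin_mean))
```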

Median

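The value in the middle position of the data once it has been sorted. In the formula below, $x_{(k)}$ denotes the $k$-th smallest observation.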

$$ x_{0.5}=\left\{\begin{array}{ll}x_{\left(\frac{n+1}{2}\right)}, & n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)}+x_{\left(\frac{n}{2}+1\right)}\right), & n \text{ is even}\end{array}\right. $$
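The piecewise definition can be sketched directly in base R. The helper name `median_by_definition` and the two sample vectors are hypothetical, introduced only for illustration. In practice the built-in `median()` gives the same result.

```r
# Median computed from the piecewise definition (illustrative helper)
median_by_definition <- function(x) {
  s <- sort(x)          # order statistics x_(1) <= ... <= x_(n)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                      # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2       # even n: average of the two middle values
  }
}

odd_sample  <- c(7, 1, 3)       # sorted: 1, 3, 7
even_sample <- c(7, 1, 3, 5)    # sorted: 1, 3, 5, 7

median_by_definition(odd_sample)
median_by_definition(even_sample)
```

Both calls should agree with `median(odd_sample)` and `median(even_sample)`.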

Mode

The value that occurs most often in a dataset.
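Base R has no built-in function for the statistical mode (`mode()` reports an object's storage mode instead), so one possible sketch is shown here. The helper name `stat_mode` and the sample vectors are hypothetical. On ties it returns every value that reaches the maximum count.

```r
# Most frequent value(s) in a vector (illustrative helper)
stat_mode <- function(x) {
  values <- unique(x)                      # distinct values, in order of first appearance
  counts <- tabulate(match(x, values))     # how often each distinct value occurs
  values[counts == max(counts)]            # keep all values tied for the top count
}

single_mode <- stat_mode(c(1, 2, 2, 3, 3, 3))
tied_modes  <- stat_mode(c("a", "b", "a", "b", "c"))
```

A dataset can have more than one mode, which is why the helper returns a vector rather than a single value.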

Frequency and Density

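library(tibble)
library(ggplot2)

set.seed(123)
# Simulate 10,000 heights from a normal distribution (mean 170, sd 2.5)
df <- tibble(heights = rnorm(10000, 170, 2.5))
ggplot(df, aes(x = heights)) +
  geom_histogram(fill = "steelblue", color = "black", binwidth = 0.5) +
  # Scale the density by binwidth * n so the curve matches the count axis
  stat_function(fun = ~ dnorm(.x, mean = 170, sd = 2.5) * 0.5 * 10000, color = "red")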
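The factor `0.5 * 10000` in the plotting code reflects the relation count ≈ n × bin width × density. A minimal base R sketch of that relation follows, assuming the same simulated data. The `breaks` grid is an illustrative choice and need not match the bin edges ggplot2 picks.

```r
set.seed(123)
heights <- rnorm(10000, 170, 2.5)
binwidth <- 0.5

# Evenly spaced bin edges that cover the whole sample (illustrative choice)
breaks <- seq(floor(min(heights)), ceiling(max(heights)), by = binwidth)
h <- hist(heights, breaks = breaks, plot = FALSE)

# Density is the count rescaled by n * binwidth
density_from_counts <- h$counts / (length(heights) * binwidth)
densities_match <- isTRUE(all.equal(h$density, density_from_counts))

# The density bars enclose a total area of 1
total_area <- sum(h$density * binwidth)
```

Frequencies sum to the sample size, while densities integrate to 1. That is why the normal density curve has to be scaled up before it is drawn over a histogram of counts.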


Range / Domain