easystats 是 R 包集合,可提供统一且easy的代码进行 统计分析。


一、与 tidyverse 有何不同?

tidyverse包集合,是 R语言 中非常流行的软件包集合(ggplot、dplyr、tidyr……),它也使 R 的使用更加容易。那么,您应该选择 tidyverse 还是 easystats?

两个都选!

事实上,这两个生态系统的设计考虑了非常不同的目标。 tidyverse 包主要用于 数据操作和探索。而 easystats 更关注分析的最后阶段:理解和解释您的结果,并在手稿或报告中报告它们。您绝对可以在 tidyverse 工作流程中使用 easystats 功能!

easystats + tidyverse = ❤️


二、easystats生态圈

每个 easystats 包都有不同的范围和用途。这意味着您最好的开始方式是探索并选择您认为可能对您有用的那些。但是,由于它们的构建考虑了"更大的图景",您会意识到使用更多它们会创建一个流畅的工作流程,因为这些包旨在协同工作。理想情况下,这些软件包可以协同工作,涵盖统计分析和数据可视化的所有方面。

  • report:📜 🎉 生成自动统计报告
  • correlation:🔗 相关性的多合一包
  • modelbased:📈 基于统计模型估计效果、组平均值和组间对比
  • bayestestR:👻 非常适合贝叶斯统计的初学者或专家
  • effectize: 🐉 计算、转换、解释和使用效果大小和标准化参数的索引
  • see :🎨 创建漂亮结果可视化的绘图
  • parameters:📊 获取有关模型包含参数所有信息的表格
  • performance:💪模型的质量和性能指标(R2、ICC、LOO、AIC、BF、…)
  • insight:🔮 对于开发人员来说,一个包可以帮助您使用不同的模型和包
  • datawizard:🧙 清理和转换数据的魔法药水

截止2022-09-21, easystats生态中各个包的下载情况

Total insight bayestestR parameters performance datawizard effectsize correlation see modelbased report easystats
10,001,095 3,085,739 1,373,379 1,363,678 1,285,506 1,159,097 1,060,079 277,541 251,038 94,089 48,876 2,073



三、实验

3.1 report简介

easystats到底如何好用,今天只实验report包,该包号称 “从 R 到你的手稿”

report包以两步方式工作。首先,使用该函数创建一个report对象report()。然后,该报表对象可以使用文本方式(默认输出)或以表格形式显示as.data.frame()summary()此外,您还可以使用报表对象访问更摘要和紧凑的报表版本。


report包的主要目标是弥合 R 的输出与手稿中包含的格式化结果之间的差距。 它根据最佳实践指南(例如 APA 期刊风格)自动生成模型和数据框架的报告,确保结果报告的标准化和质量。安装report

install.packages("easystats")

3.2 数据

生成iris数据集描述性统计信息的英文报告

report(iris)

Run

# The data contains 150 observations of the following 5 variables:
# 
#   - Sepal.Length: n = 150, Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04,
# range: [4.30, 7.90], Skewness = 0.31, Kurtosis = -0.55, 0% missing
#   - Sepal.Width: n = 150, Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44,
# range: [2, 4.40], Skewness = 0.32, Kurtosis = 0.23, 0% missing
#   - Petal.Length: n = 150, Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85,
# range: [1, 6.90], Skewness = -0.27, Kurtosis = -1.40, 0% missing
#   - Petal.Width: n = 150, Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04,
# range: [0.10, 2.50], Skewness = -0.10, Kurtosis = -1.34, 0% missing
#   - Species: 3 levels, namely setosa (n = 50, 33.33%), versicolor (n = 50,
# 33.33%) and virginica (n = 50, 33.33%)

iris %>%
  select(-starts_with("Sepal")) %>%
  group_by(Species) %>%
  report() %>%
  summary()

Run

# The data contains 150 observations, grouped by Species, of the following 3
# variables:
# 
# - setosa (n = 50):
#   - Petal.Length: Mean = 1.46, SD = 0.17, range: [1, 1.90]
#   - Petal.Width: Mean = 0.25, SD = 0.11, range: [0.10, 0.60]
# 
# - versicolor (n = 50):
#   - Petal.Length: Mean = 4.26, SD = 0.47, range: [3, 5.10]
#   - Petal.Width: Mean = 1.33, SD = 0.20, range: [1, 1.80]
# 
# - virginica (n = 50):
#   - Petal.Length: Mean = 5.55, SD = 0.55, range: [4.50, 6.90]
#   - Petal.Width: Mean = 2.03, SD = 0.27, range: [1.40, 2.50]

3.3 t检验和相关性

report(t.test(mtcars$mpg ~ mtcars$am))

Run

# Effect sizes were labelled following Cohen's (1988) recommendations.
# 
# The Welch Two Sample t-test testing the difference of mtcars$mpg by mtcars$am
# (mean in group 0 = 17.15, mean in group 1 = 24.39) suggests that the effect is
# negative, statistically significant, and large (difference = -7.24, 95% CI
# [-11.28, -3.21], t(18.33) = -3.77, p = 0.001; Cohen's d = -1.41, 95% CI [-2.26,
# -0.53])

您还可以使用as.data.frame()函数创建表,例如使用此相关性测试:

cor.test(iris$Sepal.Length, iris$Sepal.Width) %>%
  report() %>%
  as.data.frame()

Run

# Pearson's product-moment correlation
# 
# Parameter1        |       Parameter2 |     r |        95% CI | t(148) |     p
# -----------------------------------------------------------------------------
# iris$Sepal.Length | iris$Sepal.Width | -0.12 | [-0.27, 0.04] |  -1.44 | 0.152
# 
# Alternative hypothesis: two.sided

3.4 方差分析

aov(Sepal.Length ~ Species, data = iris) %>%
  report()

Run

# The ANOVA (formula: Sepal.Length ~ Species) suggests that:
# 
#   - The main effect of Species is statistically significant and large (F(2, 147)
# = 119.26, p < .001; Eta2 = 0.62, 95% CI [0.54, 1.00])
# 
# Effect sizes were labelled following Field's (2013) recommendations.

3.5 广义线性模型

model <- glm(vs ~ mpg * drat, data = mtcars, family = "binomial")

report(model)

Run

# We fitted a logistic model (estimated using ML) to predict vs with mpg and drat
# (formula: vs ~ mpg * drat). The model's explanatory power is substantial
# (Tjur's R2 = 0.51). The model's intercept, corresponding to mpg = 0 and drat =
# 0, is at -33.43 (95% CI [-77.90, 3.25], p = 0.083). Within this model:
# 
#   - The effect of mpg is statistically non-significant and positive (beta = 1.79,
# 95% CI [-0.10, 4.05], p = 0.066; Std. beta = 3.63, 95% CI [1.36, 7.50])
#   - The effect of drat is statistically non-significant and positive (beta =
# 5.96, 95% CI [-3.75, 16.26], p = 0.205; Std. beta = -0.36, 95% CI [-1.96,
# 0.98])
#   - The effect of mpg × drat is statistically non-significant and negative (beta
# = -0.33, 95% CI [-0.83, 0.15], p = 0.141; Std. beta = -1.07, 95% CI [-2.66,
# 0.48])
# 
# Standardized parameters were obtained by fitting the model on a standardized
# version of the dataset. 95% Confidence Intervals (CIs) and p-values were
# computed using a Wald z-distribution approximation.

更多内容请阅读 report包 的文档。


广而告之