Posted: 2024-09-05 21:21:01
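Bilibili: the walkthrough video for this article is on the 熊大学习社 channel, https://space.bilibili.com/475774512
WeChat official account | Bilibili | same name on all platforms: 熊大学习社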
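Course notes:
(1) The code examples for multiple imputation of missing data are fully public. Follow the WeChat official account 熊大学习社 and reply med004 to get the materials.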
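##### WeChat official account: 熊大学习社 #####
# Manually set the working directory to the folder that holds the code and data
# In RStudio: menu bar "Session" -> "Set Working Directory" -> "Choose Directory..."
# then pick the folder that holds the code and data
# Show the current working directory
getwd()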
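# Check whether the required packages are installed; install any that are missing
if(!require('readxl')) install.packages('readxl')
if(!require('dplyr')) install.packages('dplyr')
if(!require('tidyr')) install.packages('tidyr')
if(!require('mice')) install.packages('mice')
# Packages called below through the :: operator
for (pkg in c('VIM', 'naniar', 'visdat')) {
  if(!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
# Load packages
library(readxl) # reading Excel files
library(dplyr)  # data manipulation
library(tidyr)  # replace_na()
library(mice)   # missing-value imputation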
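# Code directory: change this to the folder that holds the code and data on your machine
# (on Windows, write the path with forward slashes or doubled backslashes)
code_path <- "D:/path/to/Day3/code"
setwd(code_path)
# Load the saved workspace; it is expected to contain the data frame data_raw
load("data_1.Rdata")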
## 3.4 Multiple imputation ------
baseline <- data_raw %>%
  select(
    # select the baseline variables and rename the columns
    Age = admission_age,
    Weight = weight,
    Height = height,
    Gender = gender,
    Race = race,
    Hypertension = co_hypertension,
    Diabetes = co_diabetes,
    Neoplasm = co_neoplasm,
    COPD = co_COPD,
    `History of CAS` = co_CA_surgery,
    INR = inr,
    PT = pt
  )
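# Explore missingness graphically
baseline %>% VIM::aggr(prop = TRUE, numbers = FALSE, cex.lab = 1.2, cex.axis = 2,
                       gap = 3, labels = names(.), oma = c(8, 6, 2, 3))
# Number and proportion of missing values per variable
baseline %>% naniar::miss_var_summary()
# Visualize the missing values
baseline %>% visdat::vis_miss()
# Little's test for missing completely at random (MCAR);
# a small p-value is evidence against MCAR
baseline %>% naniar::mcar_test()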
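# Sketch (not part of the original script): a base R version of the per-variable
# missingness table, in case naniar is unavailable. The helper name
# na_summary_sketch and the toy data frame are hypothetical, for illustration only.
na_summary_sketch <- function(df) {
  n_miss <- colSums(is.na(df))  # number of NA values in each column
  out <- data.frame(
    variable = names(n_miss),
    n_miss   = as.integer(n_miss),
    pct_miss = 100 * as.numeric(n_miss) / nrow(df),  # percentage of rows missing
    stringsAsFactors = FALSE
  )
  out[order(-out$n_miss), , drop = FALSE]  # most incomplete variables first
}
# Toy usage: column a has 2 of 4 values missing, b has 1, c has none
print(na_summary_sketch(data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"), c = 1:4)))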
# Median imputation: weight/inr/pt
data_simple_impute <- data_raw %>%
  tidyr::replace_na(
    list(
      # with na.rm = TRUE, median() ignores missing values when computing the median
      weight = median(.$weight, na.rm = TRUE),
      inr = median(.$inr, na.rm = TRUE),
      pt = median(.$pt, na.rm = TRUE)
    )
  )
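# Sketch (not part of the original script): why the median may be preferred over the
# mean for skewed lab values. The helper impute_median_sketch and the toy INR values
# are hypothetical.
impute_median_sketch <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)  # fill NA with the median of the observed values
  x
}
# One extreme value (9.0) pulls the mean of the observed values up to 3.075,
# while the median stays at 1.15, so the filled value is 1.15
print(impute_median_sketch(c(1.0, 1.1, 1.2, 9.0, NA)))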
# Random forest imputation: race/height
imp <- data_simple_impute %>%
  select(admission_age, weight, height, gender, race, co_diabetes, co_hypertension, co_neoplasm,
         co_COPD, co_CA_surgery, inr, pt) %>%
  mutate(race = as.factor(race)) %>%
  # random forest imputation ("rf"); m = 1 produces a single completed data set
  mice(method = "rf", m = 1, seed = 2024)
# Inspect which variables are imputed and by which method
imp$method
# Build the completed data, keeping only the imputed versions of height and race
data_imputed <- mice::complete(imp) %>%
  select(height, race) %>%
  rename_with(~ paste0(.x, "_imputed")) %>%
  as_tibble()
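# Sketch (not part of the original script): with m = 1 the call above yields one
# completed data set. A multiple-imputation workflow in mice usually creates several
# data sets (m > 1), fits the model in each, and pools the estimates with Rubin's rules.
# The example uses the nhanes data shipped with mice and predictive mean matching;
# the helper name mi_pool_sketch and the model bmi ~ age + chl are illustrative assumptions.
mi_pool_sketch <- function(m = 5, seed = 2024) {
  imp_demo <- mice::mice(mice::nhanes, m = m, method = "pmm",
                         seed = seed, printFlag = FALSE)
  fit_demo <- with(imp_demo, lm(bmi ~ age + chl))  # one fit per imputed data set
  mice::pool(fit_demo)                             # combine the m fits
}
print(summary(mi_pool_sketch()))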
# Zero imputation: the duration_pres_* variables (a missing duration is set to 0),
# applied in the last step below
# Build the final data
final_data <- data_simple_impute %>%
  bind_cols(data_imputed) %>%
  mutate(
    gender = as.factor(gender),
    race = factor(race, levels = c("White", "Black", "Asian", "Hispanic", "Others")),
    co_diabetes = as.factor(co_diabetes),
    co_hypertension = as.factor(co_hypertension),
    co_neoplasm = as.factor(co_neoplasm),
    co_COPD = as.factor(co_COPD),
    co_CA_surgery = as.factor(co_CA_surgery),
    co_VTE = as.factor(co_VTE),
    co_CI = as.factor(co_CI),
    co_GI = as.factor(co_GI),
    co_ICH = as.factor(co_ICH),
    co_bleeding = as.factor(co_bleeding),
    group = as.factor(group)
  ) %>%
  tidyr::replace_na(
    list(
      duration_pres_aspirin = 0,
      duration_pres_clopidogrel = 0,
      duration_pres_ticagrelor = 0,
      duration_pres_heparin = 0
    )
  )
# Set the working directory
setwd(code_path)
# Save the whole workspace (this overwrites the data_1.Rdata file loaded above)
save.image(file = "data_1.Rdata", compress = TRUE)