Question

Suppose I have this list:

listexample = list(books = list(list(
                    title="Book 1",
                    entry = "entry 1",
                    publisher = "Books Unlimited",
                    authors = list(
                                list(name="bob", location="north dakota"),
                                list(name="susan", location="california"),
                                list(name="tim")),
                    isbn = "1358",
                    universities = list(
                                list(univ="univ1"),
                                list(univ="univ2"))
                    ),
                    list(
                        title="Book 2",
                        entry = "entry 2",
                        publisher = "Books Unified",
                        authors = list(
                            list(name="tom", location="north dakota"),
                            list(name="sally", location="california"),
                            list(name="erica", location="berlin")),
                        isbn = "1258",
                        universities = list(
                            list(univ="univ5"),
                            list(univ="univ2"),
                            list(univ="univ99"),
                            list(univ="univ2"),
                            list(univ="univ3"))
                    )   
                ),
     misc = list(name="Jim Smith", location="Alaska"))

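How do I create a data frame (a tibble is fine too) in which each row is one author? I want to ignore the second element of the main list (misc) completely. I also want to ignore universities, isbn and publisher. I still want to keep name, location, title and books (the name of the first element of the main list).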

I know rrapply can be used to do things recursively, but I'm not sure whether it is appropriate in this case.

library(rrapply)
rrapply(listexample, how = "bind")

Wouldn't this work, as long as the list isn't very large? You might want to run listexample[[2L]] = NULL first. Then I guess you only need to rename columns and drop rows.

Answer 1

1) Use tibblify to create a tibble and select the title and authors columns from it. The latter is a list column, so unnest it.

library(dplyr)
library(tidyr)
library(tibblify)

listexample %>%
  .$books %>%
  tibblify %>%
  select(title, authors) %>%
  unnest(authors)

giving

# A tibble: 6 × 3
  title  name  location    
  <chr>  <chr> <chr>       
1 Book 1 bob   north dakota
2 Book 1 susan california  
3 Book 1 tim   <NA>        
4 Book 2 tom   north dakota
5 Book 2 sally california  
6 Book 2 erica berlin

2) A variation of the above is to use a tibblify spec as shown below. The spec can be created by running guess_tspec_df(listexample$books) and then editing the result as desired.

spec <- tspec_df(
  tib_chr("title"),
  tib_df(
    "authors",
    tib_chr("name"),
    tib_chr("location", required = FALSE)
  )
)
tibblify(listexample$books, spec) %>% unnest(authors)

Accepted Answer

You can use unnest_longer and unnest_wider from tidyr.

listexample |> 
  tibble::enframe() |> 
  dplyr::filter(name == "books") |> 
  tidyr::unnest_longer(value) |> 
  tidyr::unnest_wider(value) |> 
  dplyr::select(title, authors) |> 
  tidyr::unnest_longer(authors) |> 
  tidyr::unnest_wider(authors)

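You can run the code line by line to see what each step does. In short, we turn the list into a two-row tibble (the first row is books, the second is misc), keep only the books row, and then unnest the nested information.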
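
To make the intermediate shapes concrete, here is a minimal sketch of the first three steps of the pipeline above, assuming tibble, dplyr and tidyr are installed. The names step1, step2 and step3 are hypothetical and only added for inspection; the data is a compact copy of listexample from the question.

```r
# Compact copy of listexample from the question (same data)
listexample <- list(
  books = list(
    list(title = "Book 1", entry = "entry 1", publisher = "Books Unlimited",
         authors = list(list(name = "bob", location = "north dakota"),
                        list(name = "susan", location = "california"),
                        list(name = "tim")),
         isbn = "1358",
         universities = list(list(univ = "univ1"), list(univ = "univ2"))),
    list(title = "Book 2", entry = "entry 2", publisher = "Books Unified",
         authors = list(list(name = "tom", location = "north dakota"),
                        list(name = "sally", location = "california"),
                        list(name = "erica", location = "berlin")),
         isbn = "1258",
         universities = list(list(univ = "univ5"), list(univ = "univ2"),
                             list(univ = "univ99"), list(univ = "univ2"),
                             list(univ = "univ3")))),
  misc = list(name = "Jim Smith", location = "Alaska"))

# Step 1: one row per top-level element; `value` is a list column
step1 <- tibble::enframe(listexample)
print(step1)

# Step 2: keep only the "books" row, which drops "misc"
step2 <- dplyr::filter(step1, name == "books")

# Step 3: one row per book; each `value` now holds one book's named list
step3 <- tidyr::unnest_longer(step2, value)
print(step3)
```

From here, the remaining unnest_wider()/unnest_longer() calls spread each book into columns and then each author into rows.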

Read the tidyr vignette to learn more. In fact, you can shorten the code here by using tidyr::hoist().
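
A minimal sketch of the tidyr::hoist() shortcut, assuming a recent tidyr (with dplyr and tibble) is installed. This is one possible way to use hoist() here, not code from the answer; books_tbl and authors_tbl are hypothetical names.

```r
# Compact copy of listexample from the question (same data)
listexample <- list(
  books = list(
    list(title = "Book 1", entry = "entry 1", publisher = "Books Unlimited",
         authors = list(list(name = "bob", location = "north dakota"),
                        list(name = "susan", location = "california"),
                        list(name = "tim")),
         isbn = "1358",
         universities = list(list(univ = "univ1"), list(univ = "univ2"))),
    list(title = "Book 2", entry = "entry 2", publisher = "Books Unified",
         authors = list(list(name = "tom", location = "north dakota"),
                        list(name = "sally", location = "california"),
                        list(name = "erica", location = "berlin")),
         isbn = "1258",
         universities = list(list(univ = "univ5"), list(univ = "univ2"),
                             list(univ = "univ99"), list(univ = "univ2"),
                             list(univ = "univ3")))),
  misc = list(name = "Jim Smith", location = "Alaska"))

# One row per book; `book` is a list column (misc is never touched)
books_tbl <- tibble::tibble(book = listexample$books)

authors_tbl <- books_tbl |>
  # pull title and authors out of each book; other fields stay in `book`
  tidyr::hoist(book, title = "title", authors = "authors") |>
  # drop the leftover `book` column (entry, publisher, isbn, universities)
  dplyr::select(title, authors) |>
  # one row per author, then one column per author field
  tidyr::unnest_longer(authors) |>
  tidyr::unnest_wider(authors)

print(authors_tbl)
```

hoist() replaces the enframe()/filter()/unnest_longer()/unnest_wider() steps that only serve to reach the title and authors fields.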

Answer 3

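In base R, using lapply and do.call, you might do:

lapply(listexample[[1L]], \(i) {
  # keep only the title and authors fields of this book
  tmp = i[names(i) %in% c("authors", "title")]
  # pad every author list to the same length, then stack them as rows
  tmp2 = do.call("rbind", lapply(l <- tmp[["authors"]], `length<-`, max(lengths(l))))
  # repeat the title once per author and bind it to the author rows
  cbind.data.frame("title" = rep(tmp[["title"]], nrow(tmp2)), tmp2)
}) |> do.call(what = "rbind")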

   title  name     location
1 Book 1   bob north dakota
2 Book 1 susan   california
3 Book 1   tim         NULL
4 Book 2   tom north dakota
5 Book 2 sally   california
6 Book 2 erica       berlin
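
In the base R result above, name and location appear to be list columns, which is why tim's missing location prints as NULL. If plain character columns with NA are preferred, the following is a minimal sketch of a variation (not the answerer's code) that builds one small data frame per author; per_book and authors_df are hypothetical names.

```r
# Compact copy of listexample from the question (same data)
listexample <- list(
  books = list(
    list(title = "Book 1", entry = "entry 1", publisher = "Books Unlimited",
         authors = list(list(name = "bob", location = "north dakota"),
                        list(name = "susan", location = "california"),
                        list(name = "tim")),
         isbn = "1358",
         universities = list(list(univ = "univ1"), list(univ = "univ2"))),
    list(title = "Book 2", entry = "entry 2", publisher = "Books Unified",
         authors = list(list(name = "tom", location = "north dakota"),
                        list(name = "sally", location = "california"),
                        list(name = "erica", location = "berlin")),
         isbn = "1258",
         universities = list(list(univ = "univ5"), list(univ = "univ2"),
                             list(univ = "univ99"), list(univ = "univ2"),
                             list(univ = "univ3")))),
  misc = list(name = "Jim Smith", location = "Alaska"))

# One data frame per book (misc, isbn, publisher, universities are ignored)
per_book <- lapply(listexample$books, function(book) {
  # one single-row data frame per author; a missing location becomes NA
  per_author <- lapply(book$authors, function(a) {
    data.frame(
      title = book$title,
      name = a$name,
      location = if (is.null(a$location)) NA_character_ else a$location
    )
  })
  do.call(rbind, per_author)
})

# Stack the per-book data frames into one
authors_df <- do.call(rbind, per_book)
print(authors_df)
```

Handling the missing field explicitly costs a few lines but yields ordinary character columns.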

r – How to create a data frame from a list, choosing which sublists to focus on – Stack Overflow