2014-09-08 3 views
7

Odd behavior with select in dplyr

I am seeing odd behavior with dplyr's select function: it is not dropping the variable from the data frame.

Here is the original data:

orig <- structure(list(park = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("miss", 
"piro", "sacn", "slbe"), class = "factor"), year = c(2006L, 2009L, 
2006L, 2008L, 2009L), agent = structure(c(5L, 5L, 5L, 7L, 5L), .Label = c("agriculture", 
"beaver", "development", "flooding", "forest_pathogen", "harvest_00_20", 
"harvest_30_60", "harvest_70_90", "none"), class = "factor"), 
    ha = c(4.32, 1.17, 3.51, 2.07, 9.18), loc_01 = structure(c(9L, 
    5L, 9L, 5L, 5L), .Label = c("miss", "non_miss", "non_piro", 
    "non_sacn", "non_slbe", "none", "piro", "sacn", "slbe"), class = "factor"), 
    loc_02 = structure(c(5L, 1L, 5L, 1L, 1L), .Label = c("none", 
    "piro_core", "piro_ibz", "slbe_mainland", "slbe_southmanitou" 
    ), class = "factor"), loc_03 = structure(c(1L, 1L, 1L, 1L, 
    1L), .Label = "none", class = "factor"), cross_valid = c(1L, 
    1L, 1L, 1L, 1L)), .Names = c("park", "year", "agent", "ha", 
"loc_01", "loc_02", "loc_03", "cross_valid"), row.names = c(NA, 
5L), class = "data.frame") 

It looks like this:

> orig 
  park year           agent   ha   loc_01            loc_02 loc_03 cross_valid
1 slbe 2006 forest_pathogen 4.32     slbe slbe_southmanitou   none           1
2 slbe 2009 forest_pathogen 1.17 non_slbe              none   none           1
3 slbe 2006 forest_pathogen 3.51     slbe slbe_southmanitou   none           1
4 slbe 2008   harvest_30_60 2.07 non_slbe              none   none           1
5 slbe 2009 forest_pathogen 9.18 non_slbe              none   none           1
> str(orig) 
'data.frame': 5 obs. of 8 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 4 4 4 
$ year  : int 2006 2009 2006 2008 2009 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 5 5 7 5 
$ ha   : num 4.32 1.17 3.51 2.07 9.18 
$ loc_01  : Factor w/ 9 levels "miss","non_miss",..: 9 5 9 5 5 
$ loc_02  : Factor w/ 5 levels "none","piro_core",..: 5 1 5 1 1 
$ loc_03  : Factor w/ 1 level "none": 1 1 1 1 1 
$ cross_valid: int 1 1 1 1 1 

Then I do a quick summary...

    library(dplyr)
    # Sum ha within each park / cross_valid / agent combination
    summ <- orig %>%
      group_by(park, cross_valid, agent) %>%
      summarise(ha_dist = sum(ha))
    summ
    Source: local data frame [2 x 4] 
    Groups: park, cross_valid 

      park cross_valid           agent ha_dist
    1 slbe           1 forest_pathogen   18.18
    2 slbe           1   harvest_30_60    2.07
    str(summ) 
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
    $ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
    $ cross_valid: int 1 1 
    $ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
    $ ha_dist : num 18.18 2.07 
    - attr(*, "vars")=List of 2 
     ..$ : symbol park 
     ..$ : symbol cross_valid 
    - attr(*, "drop")= logi TRUE 

Then I try to drop 'cross_valid'...

sel <- select (summ,-cross_valid) 
summ 
Source: local data frame [2 x 4] 
Groups: park, cross_valid 

  park cross_valid           agent ha_dist
1 slbe           1 forest_pathogen   18.18
2 slbe           1   harvest_30_60    2.07
str(summ) 
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
$ cross_valid: int 1 1 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
$ ha_dist : num 18.18 2.07 
- attr(*, "vars")=List of 2 
    ..$ : symbol park 
    ..$ : symbol cross_valid 
- attr(*, "drop")= logi TRUE 
- attr(*, "indices")=List of 1 
    ..$ : int 0 1 
- attr(*, "group_sizes")= int 2 
- attr(*, "biggest_group_size")= int 2 
- attr(*, "labels")='data.frame': 1 obs. of 2 variables: 
    ..$ park  : Factor w/ 4 levels "miss","piro",..: 4 
    ..$ cross_valid: int 1 
    ..- attr(*, "vars")=List of 2 
    .. ..$ : symbol park 
    .. ..$ : symbol cross_valid 

And it will not drop summ$cross_valid.

If I use base R to drop cross_valid, it works...

base.sel <- summ[-2] 
base.sel 
Source: local data frame [2 x 3] 
Groups: 

  park           agent ha_dist
1 slbe forest_pathogen   18.18
2 slbe   harvest_30_60    2.07

I can drop orig$cross_valid using select...

drop.orig <- select (orig,-cross_valid) 
drop.orig 
  park year           agent   ha   loc_01            loc_02 loc_03
1 slbe 2006 forest_pathogen 4.32     slbe slbe_southmanitou   none
2 slbe 2009 forest_pathogen 1.17 non_slbe              none   none
3 slbe 2006 forest_pathogen 3.51     slbe slbe_southmanitou   none
4 slbe 2008   harvest_30_60 2.07 non_slbe              none   none
5 slbe 2009 forest_pathogen 9.18 non_slbe              none   none

Since I can drop the variable with base R, it is not a big problem, but I thought there might be an issue with dplyr. It probably has something to do with the structure of the variable, but I don't know what that would be.

Thanks.

-cherrytree

Answer

8

Try ungroup() before select():

summ%>% 
ungroup() %>% 
select(-cross_valid) 
# park   agent ha_dist 
#1 slbe forest_pathogen 18.18 
#2 slbe harvest_30_60 2.07 



groups(summ) 
#[[1]] 
#park 

#[[2]] 
#cross_valid 
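
For anyone who wants to reproduce this without the full data, here is a minimal sketch. The `toy` data frame and its values are made up for illustration (only the column names come from the question), and it assumes a dplyr version that provides group_vars(). It shows that select() keeps a grouping variable even when asked to drop it, and that the column goes away once the grouping is removed.

```r
library(dplyr)

# Hypothetical stand-in for `orig`: same column names, invented rows.
toy <- data.frame(
  park = c("slbe", "slbe", "slbe"),
  cross_valid = c(1L, 1L, 1L),
  agent = c("forest_pathogen", "forest_pathogen", "harvest_30_60"),
  ha = c(4.32, 1.17, 2.07)
)

summ <- toy %>%
  group_by(park, cross_valid, agent) %>%
  summarise(ha_dist = sum(ha))

# summarise() removes only the last grouping level (agent),
# so the result is still grouped by park and cross_valid.
group_vars(summ)

# select() on a grouped table retains the grouping variables,
# so cross_valid is still present here.
kept <- select(summ, -cross_valid)
names(kept)

# After ungroup(), select() can drop the column.
dropped <- summ %>%
  ungroup() %>%
  select(-cross_valid)
names(dropped)
```

In newer dplyr releases (1.0.0 and later, as far as I know) summarise() also accepts `.groups = "drop"`, which returns an ungrouped result and makes the separate ungroup() call unnecessary.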
+1

Yes, you can't '-select' a grouping variable. – Ajar

+1

Thanks @akrun. I didn't know that you couldn't remove a grouping variable... very interesting and good to know. – cherrytree

+0

@cherrytree No problem. Glad it helped. – akrun