20 Jul 2021

How to remove duplicate values in a DataFrame?

In the table, each value of the column is a single string of comma-separated items:

import pandas as pd
df = pd.DataFrame({'a': ['female, female, female, female, male, female', 'female, male, female, female', 'female, female, female', 'male, male, male']})

       a
0   female, female, female, female, male, female
1   female, male, female, female
2   female, female, female
3   male, male, male

My solution, using the built-in set():

f = lambda x: [set(y) for y in x.split('; ')]
df['b'] = df['a'].apply(f)

This gives the following result:

      a                                              b
0   female, female, female, female, male, female    [{m, , e, ,, a, f, l}]
1   female, male, female, female                    [{m, , e, ,, a, f, l}]
2   female, female, female                          [{m, , e, ,, a, f, l}]
3   male, male, male                                [{m, , e, ,, a, l}]
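The cause of this output can be reproduced in plain Python, without pandas. The sketch below (variable names `s`, `parts`, `chars`, `words` are illustrative only) shows two things: the string contains no `'; '`, so `split('; ')` returns a one-element list holding the whole string, and `set()` applied to a string collects its individual characters rather than its words. Splitting on the actual separator `', '` gives words instead.

```python
s = "male, male, male"

# '; ' never occurs in the string, so split() returns a one-element list
parts = s.split('; ')
print(parts)          # ['male, male, male']

# set() over a string iterates its characters, not its words
chars = set(parts[0])
print(sorted(chars))  # [' ', ',', 'a', 'e', 'l', 'm']

# splitting on the real separator ', ' yields whole words
words = set(s.split(', '))
print(words)          # {'male'}
```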

But what I need is:

     a                                               b
0   female, female, female, female, male, female    female, male
1   female, male, female, female                    female, male
2   female, female, female                          female
3   male, male, male                                male
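A minimal sketch of one way to get this output, assuming every cell uses exactly `', '` as the separator (this is not necessarily the answer accepted in the source thread; the helper name `unique_items` is hypothetical). Each cell is split into items, duplicates are dropped with `dict.fromkeys`, and the remaining items are joined back. `dict.fromkeys` is used instead of `set()` because dict keys keep insertion order (guaranteed since Python 3.7), so the items appear in first-seen order, as in the desired table.

```python
import pandas as pd

df = pd.DataFrame({'a': ['female, female, female, female, male, female',
                         'female, male, female, female',
                         'female, female, female',
                         'male, male, male']})

def unique_items(cell: str, sep: str = ', ') -> str:
    """Split a cell on sep, drop repeated items, keep first-seen order."""
    # dict keys are unique and preserve insertion order, unlike set()
    return sep.join(dict.fromkeys(cell.split(sep)))

df['b'] = df['a'].apply(unique_items)
print(df)
```

If order does not matter, `', '.join(sorted(set(cell.split(', '))))` is an alternative that gives alphabetical, deterministic output; a bare `set()` alone would not guarantee any particular order of the joined items.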

Source: https://ru.stackoverflow.com/questions/1306954/%D0%9A%D0%B0%D0%BA-%D1%83%D0%B1%D1%80%D0%B0%D1%82%D1%8C-%D0%BF%D0%BE%D0%B2%D1%82%D0%BE%D1%80%D1%8F%D1%8E%D1%89%D0%B8%D0%B5%D1%81%D1%8F-%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F-%D0%B2-dataframe
