Convert a Pandas DataFrame to a dictionary

itbloger 2020. 7. 12. 10:14

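I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary, with the elements of the first column as the keys and the elements of the other columns in the same row as the values.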

DataFrame:

    ID   A   B   C
0   p    1   3   2
1   q    4   3   2
2   r    4   0   9

The output should be this dictionary:

{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}

The to_dict() method uses the column names as dictionary keys, so you need to reshape the DataFrame slightly. You can do this by setting the 'ID' column as the index and then transposing the DataFrame.

to_dict() also accepts an 'orient' argument, which you need here to get a list of values for each column. Otherwise, you get a dictionary of the form {index: value} for each column.

Both steps can be done in one line:

>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
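To see why the transpose is needed, here is a minimal sketch that rebuilds the question's DataFrame inline (the construction is an assumption; the question only shows the printed table) and compares the result with and without `.T`:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

indexed = df.set_index("ID")

# Without the transpose, the column names become the keys.
by_column = indexed.to_dict("list")
# {'A': [1, 4, 4], 'B': [3, 3, 0], 'C': [2, 2, 9]}

# After the transpose, each ID is a column label, so the IDs become the keys.
by_id = indexed.T.to_dict("list")
# {'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}

print(by_column)
print(by_id)
```

One thing to check on your own data: `.T` has to put every row into a single column, so a frame that mixes integer and float columns comes back as float64, and one that includes strings comes back as object. The values in the resulting lists change type accordingly.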

If you need a different dictionary format, here are examples of the possible orient arguments. Consider this simple DataFrame:

>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

The options are then:

dict - the default: column names are keys, values are dictionaries of index:data pairs

>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'}, 
 'b': {0: 0.5, 1: 0.25, 2: 0.125}}

list - keys are column names, values are lists of column data

>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'], 
 'b': [0.5, 0.25, 0.125]}

series - like 'list', but the values are Series

>>> df.to_dict('series')
{'a': 0       red
      1    yellow
      2      blue
      Name: a, dtype: object, 

 'b': 0    0.500
      1    0.250
      2    0.125
      Name: b, dtype: float64}

split - splits into columns/data/index as keys, whose values are the column names, the data values by row, and the index labels, respectively

>>> df.to_dict('split')
{'columns': ['a', 'b'],
 'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
 'index': [0, 1, 2]}

records - each row becomes a dictionary whose keys are the column names and whose values are the cell data

>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5}, 
 {'a': 'yellow', 'b': 0.25}, 
 {'a': 'blue', 'b': 0.125}]

index - like 'records', but a dictionary of dictionaries keyed by index label (rather than a list)

>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
 1: {'a': 'yellow', 'b': 0.25},
 2: {'a': 'blue', 'b': 0.125}}
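Several of these orientations can be turned back into a DataFrame. The sketch below reuses the small `a`/`b` frame from above and rebuilds it from the 'records', 'index', and 'split' outputs; passing the record list to the constructor, `DataFrame.from_dict(..., orient='index')`, and unpacking the 'split' keys as constructor keyword arguments are the reverse operations assumed here.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})

# 'records': the constructor accepts a list of row dicts directly.
from_records = pd.DataFrame(df.to_dict('records'))

# 'index': a dict of row dicts keyed by index label.
from_index = pd.DataFrame.from_dict(df.to_dict('index'), orient='index')

# 'split': the keys 'index', 'columns' and 'data' match the constructor's parameter names.
from_split = pd.DataFrame(**df.to_dict('split'))

for rebuilt in (from_records, from_index, from_split):
    # equals() compares labels and values; a RangeIndex and an int64 Index
    # with the same labels count as equal.
    print(rebuilt.equals(df))
```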

Try using zip:

import pandas as pd

df = pd.read_csv("file")
# Pair each ID with the values from the same row.
d = {i: [a, b, c] for i, a, b, c in zip(df.ID, df.A, df.B, df.C)}
print(d)

Output:

{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
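The zip version above hardcodes the column names A, B and C. A sketch that works for any number of value columns, assuming the key column is named `ID` and the frame is rebuilt inline from the question's table, zips the key column with the remaining rows converted to lists:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# Every column except the key, in the frame's current column order.
values = df.drop(columns="ID")

# to_numpy().tolist() turns each row into a plain Python list.
d = dict(zip(df["ID"], values.to_numpy().tolist()))
print(d)
```

The order of the values in each list follows the DataFrame's column order, so reorder the columns first if you need a different one.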

Follow these steps:

Suppose your dataframe is as follows:

>>> df
   A  B  C ID
0  1  3  2  p
1  4  3  2  q
2  4  0  9  r

1. Use `set_index` to set the `ID` column as the DataFrame index.

    df.set_index("ID", drop=True, inplace=True)

2. Use the `orient="index"` parameter so that the index labels become the dictionary keys.

    dictionary = df.to_dict(orient="index")

The results will be as follows:

    >>> dictionary
    {'p': {'A': 1, 'B': 3, 'C': 2}, 'q': {'A': 4, 'B': 3, 'C': 2}, 'r': {'A': 4, 'B': 0, 'C': 9}}

3. If you need each row's values as a list, run the following code, setting `column_order` to the column order you prefer:

column_order= ["A", "B", "C"] #  Determine your preferred order of columns
d = {} #  Initialize the new dictionary as an empty dictionary
for k in dictionary:
    d[k] = [dictionary[k][column_name] for column_name in column_order]
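Put together, steps 1 to 3 can be sketched as one runnable script. The inline frame reproduces the table above (with `ID` as the last column), and the dict comprehension in step 3 is an equivalent of the loop, not the answer's original code:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
    "ID": ["p", "q", "r"],
})

# Step 1: make ID the index (drop=True removes it from the columns).
df = df.set_index("ID", drop=True)

# Step 2: one inner dict per row, keyed by index label.
dictionary = df.to_dict(orient="index")

# Step 3: pick the values out in a fixed column order.
column_order = ["A", "B", "C"]
d = {key: [row[col] for col in column_order] for key, row in dictionary.items()}
print(d)
```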

If you don't mind the dictionary values being tuples, you can use itertuples:

>>> {x[0]: x[1:] for x in df.itertuples(index=False)}
{'p': (1, 3, 2), 'q': (4, 3, 2), 'r': (4, 0, 9)}
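By default `itertuples` yields namedtuples; `name=None` makes it yield plain tuples instead, and unpacking the key from the rest gives list values if tuples are not wanted. A sketch, again rebuilding the question's frame inline with `ID` as the first column:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# index=False leaves out the row index, so the first element of each tuple is the ID.
as_tuples = {row[0]: row[1:] for row in df.itertuples(index=False, name=None)}

# Starred unpacking collects the remaining values into a list.
as_lists = {key: rest for key, *rest in df.itertuples(index=False, name=None)}

print(as_tuples)
print(as_lists)
```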

For my use (node names with xy positions) I found @user4179775's answer to be the most helpful and intuitive:

import pandas as pd

df = pd.read_csv('glycolysis_nodes_xy.tsv', sep='\t')

df.head()
    nodes    x    y
0  c00033  146  958
1  c00031  601  195
...

xy_dict_list=dict([(i,[a,b]) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_list
{'c00022': [483, 868],
 'c00024': [146, 868],
 ... }

xy_dict_tuples=dict([(i,(a,b)) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_tuples
{'c00022': (483, 868),
 'c00024': (146, 868),
 ... }
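The two versions above differ only in the container used for each value. A sketch of the same idea written as dict comprehensions, using only the two rows printed by `df.head()` above in place of the TSV file (which is not available here):

```python
import pandas as pd

# The two rows shown by df.head() above; the real data comes from the TSV file.
df = pd.DataFrame({
    "nodes": ["c00033", "c00031"],
    "x": [146, 601],
    "y": [958, 195],
})

# Lists as values.
xy_dict_list = {node: [x, y] for node, x, y in zip(df.nodes, df.x, df.y)}

# Tuples as values.
xy_dict_tuples = {node: (x, y) for node, x, y in zip(df.nodes, df.x, df.y)}

print(xy_dict_list)
print(xy_dict_tuples)
```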

Addendum

I later returned to this issue, for other, but related, work. Here is an approach that more closely mirrors the [excellent] accepted answer.

node_df = pd.read_csv('node_prop-glycolysis_tca-from_pg.tsv', sep='\t')

node_df.head()
   node  kegg_id kegg_cid            name  wt  vis
0  22    22       c00022   pyruvate        1   1
1  24    24       c00024   acetyl-CoA      1   1
...

Convert Pandas dataframe to a [list], {dict}, {dict of {dict}}, ...

Per accepted answer:

node_df.set_index('kegg_cid').T.to_dict('list')

{'c00022': [22, 22, 'pyruvate', 1, 1],
 'c00024': [24, 24, 'acetyl-CoA', 1, 1],
 ... }

node_df.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'kegg_id': 22, 'name': 'pyruvate', 'node': 22, 'vis': 1, 'wt': 1},
 'c00024': {'kegg_id': 24, 'name': 'acetyl-CoA', 'node': 24, 'vis': 1, 'wt': 1},
 ... }
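Transposing and then using orient 'dict' gives the same nesting as orient 'index' applied without the transpose. A sketch that checks this on the two rows printed by `node_df.head()` above (values copied from that output; the real frame comes from the TSV file):

```python
import pandas as pd

# The two rows shown by node_df.head() above.
node_df = pd.DataFrame({
    "node": [22, 24],
    "kegg_id": [22, 24],
    "kegg_cid": ["c00022", "c00024"],
    "name": ["pyruvate", "acetyl-CoA"],
    "wt": [1, 1],
    "vis": [1, 1],
})

indexed = node_df.set_index("kegg_cid")

# Transposed route: each row becomes an object-dtype column, since it mixes ints and strings.
via_transpose = indexed.T.to_dict("dict")

# orient='index' builds the same dict of row dicts without the transpose.
via_index = indexed.to_dict("index")

print(via_transpose == via_index)
```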

In my case, I wanted to do the same thing but with selected columns from the Pandas dataframe, so I needed to slice the columns. There are two approaches.

Directly:

(see: Convert pandas to dictionary defining the columns used fo the key values)

node_df.set_index('kegg_cid')[['name', 'wt', 'vis']].T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }

Indirectly: first, slice the desired columns/data from the Pandas dataframe (again, there are two approaches),

node_df_sliced = node_df[['kegg_cid', 'name', 'wt', 'vis']]

node_df_sliced2 = node_df.loc[:, ['kegg_cid', 'name', 'wt', 'vis']]

which can then be used to create a dictionary of dictionaries:

node_df_sliced.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }
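The direct and indirect routes should give the same dictionary. A sketch that checks this, again using only the two rows printed by `node_df.head()` above as stand-in data:

```python
import pandas as pd

# The two rows shown by node_df.head() above.
node_df = pd.DataFrame({
    "node": [22, 24],
    "kegg_id": [22, 24],
    "kegg_cid": ["c00022", "c00024"],
    "name": ["pyruvate", "acetyl-CoA"],
    "wt": [1, 1],
    "vis": [1, 1],
})

cols = ["name", "wt", "vis"]

# Directly: set the index, then select the columns.
direct = node_df.set_index("kegg_cid")[cols].T.to_dict("dict")

# Indirectly: slice first (two equivalent ways), then set the index.
node_df_sliced = node_df[["kegg_cid"] + cols]
node_df_sliced2 = node_df.loc[:, ["kegg_cid"] + cols]
indirect = node_df_sliced.set_index("kegg_cid").T.to_dict("dict")
indirect2 = node_df_sliced2.set_index("kegg_cid").T.to_dict("dict")

print(direct == indirect == indirect2)
```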

DataFrame.to_dict() converts a DataFrame to a dictionary.

Example

>>> df = pd.DataFrame(
    {'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])
>>> df
   col1  col2
a     1  0.50
b     2  0.75
>>> df.to_dict()
{'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}

See the pandas documentation for DataFrame.to_dict for details.

Source URL: https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary
