
psycopg2: insert multiple rows with one query

itbloger 2020. 7. 16. 19:43



I need to insert multiple rows with one query (the number of rows is not constant), so I need to execute a query like this:

INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);

The only way I know is:

args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)

But I want a simpler way.


I built a program that inserts multiple rows to a server located in another city.

I found out that using this method is about 10 times faster than executemany. In my case, tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str) 

and 2 minutes when using this method:

cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)

The new execute_values method in Psycopg 2.7:

import psycopg2.extras

data = [(1, 'x'), (2, 'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values(
    cursor, insert_query, data, template=None, page_size=100
)
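
The template and page_size arguments shown above are worth a note: template is a per-row snippet (one %s per column) that controls how each row is rendered, and page_size caps how many rows are folded into each statement. A small sketch reusing data (and the import) from above:

psycopg2.extras.execute_values(
    cursor,
    'insert into t (a, b) values %s',
    data,
    template='(%s, %s)',  # explicit per-row template, one %s per column
    page_size=1000        # fold up to 1000 rows into each statement
)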

The pythonic way of doing it in Psycopg 2.6:

data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)

Explanation: if the data to be inserted is given as a list of tuples like

data = [(1,'x'), (2,'y')]

then it is already in the exact format needed, since:

  1. the values syntax of the insert clause expects a list of records, as in

    insert into t (a, b) values (1, 'x'),(2, 'y')

  2. Psycopg adapts a Python tuple to a Postgresql record.

The only necessary work is to provide a records list template to be filled by psycopg:

# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))

and to place it in the insert query:

insert_query = 'insert into t (a, b) values {}'.format(records_list_template)

Printing insert_query outputs:

insert into t (a, b) values %s,%s

Now on to the usual Psycopg arguments substitution:

cursor.execute(insert_query, data)

Or just testing what will be sent to the server:

print (cursor.mogrify(insert_query, data).decode('utf8'))

Output:

insert into t (a, b) values (1, 'x'),(2, 'y')

Update with psycopg2 2.7:

The classic executemany() is about 60 times slower than @ant32's implementation (called "folding"), as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com

This implementation was added to psycopg2 in version 2.7 and is called execute_values():

from psycopg2.extras import execute_values
execute_values(cur,
    "INSERT INTO test (id, v1, v2) VALUES %s",
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
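
A rough sketch to check the ratio on your own setup (it assumes an open cursor cur and a test table with columns id, v1, v2; absolute numbers depend heavily on the network round trip):

import time
from psycopg2.extras import execute_values

rows = [(i, i * 2, i * 3) for i in range(2000)]

t0 = time.time()
cur.executemany("INSERT INTO test (id, v1, v2) VALUES (%s, %s, %s)", rows)
print("executemany:    %.2f s" % (time.time() - t0))

cur.connection.rollback()  # discard, so the second run can insert the same rows

t0 = time.time()
execute_values(cur, "INSERT INTO test (id, v1, v2) VALUES %s", rows)
print("execute_values: %.2f s" % (time.time() - t0))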

Previous answer:

To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2's executemany(). Indeed, executemany() just runs many individual INSERT statements.

@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instances.

So in Python 3 you may need to modify @ant32's code by adding .decode('utf-8'):

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)

Or by using bytes (as b'' or b"") only:

args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes) 

A snippet from the Psycopg2 tutorial page at Postgresql.org (see below):

A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:

namedict = ({"first_name":"Joshua", "last_name":"Drake"},
            {"first_name":"Steven", "last_name":"Foo"},
            {"first_name":"David", "last_name":"Bar"})

You could easily insert all three rows within the dictionary by using:

cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)

It doesn't save much code, but it definitely looks better.


cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile, which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression, so the solution would be:

args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))

For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands of rows or more. It is also more memory-efficient than building a giant query string: an iterator only ever holds one input record in memory at a time, whereas building the query string can eventually exhaust memory in your Python process or in Postgres.
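
The gist itself isn't reproduced in this post, so here is a minimal approximation of how such a class can work (an illustrative sketch, not the gist's actual code): a file-like object implementing just enough of read() for copy_from(), pulling rows from the iterator on demand.

import io

class IteratorFile(io.TextIOBase):
    """File-like wrapper: reads lines lazily from an iterator of strings."""
    def __init__(self, it):
        self._it = it
        self._buf = ''

    def readable(self):
        return True

    def read(self, size=-1):
        # copy_from() calls read(size) repeatedly; top up the buffer
        # from the iterator until we can satisfy the request.
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._it) + '\n'  # one line per record
            except StopIteration:
                break
        if size < 0:
            chunk, self._buf = self._buf, ''
        else:
            chunk, self._buf = self._buf[:size], self._buf[size:]
        return chunk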


All of these techniques are called "extended inserts" in Postgres terminology, and as of the 24th of November 2016, they are still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).

Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:

valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
    # row == [1, 'a', 'yolo', ... ]
    sqlrows += row
    if (len(sqlrows) // len(valueSQL)) % rowsPerInsert == 0:
        # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []
# flush any leftover rows that didn't fill a whole batch
if sqlrows:
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * (len(sqlrows) // len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()

But it should be noted that if you can use copy_from(), you should use copy_from ;)
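
For data that already fits in memory you don't even need a custom class; a sketch with the t (a, b) table from the question:

import io

# COPY's default text format: tab-separated columns, one row per line.
buf = io.StringIO("1\t2\n3\t4\n5\t6\n")
cursor.copy_from(buf, 't', columns=('a', 'b'))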


To nicely insert a list of rows into the DB with a user-given batch size, using psycopg2:

def get_batch(iterable, size=100):
    for i in range(0, len(iterable), size):
        yield iterable[i: i + size]


def insert_rows_batch(table, rows, batch_size=500, target_fields=None):
    """
    A utility method to insert batch of tuples(rows) into a table
    NOTE: Handle data type for fields in rows yourself as per your table 
    columns' type.

    :param table: Name of the target table
    :type table: str

    :param rows: The rows to insert into the table
    :type rows: iterable of tuples

    :param batch_size: The size of batch of rows to insert at a time
    :type batch_size: int

    :param target_fields: The names of the columns to fill in the table
    :type target_fields: iterable of strings
    """
    conn = cur = None
    if target_fields:
        target_fields = ", ".join(target_fields)
        target_fields = "({})".format(target_fields)
    else:
        target_fields = ''

    conn = get_conn() # get connection using psycopg2
    if conn:
        cur = conn.cursor()
    count = 0

    for mini_batch in get_batch(rows, batch_size):
        mini_batch_size = len(mini_batch)
        count += mini_batch_size
        record_template = ','.join(["%s"] * mini_batch_size)
        sql = "INSERT INTO {0} {1} VALUES {2};".format(
            table,
            target_fields,
            record_template)
        cur.execute(sql, mini_batch)
        conn.commit()
        print("Loaded {} rows into {} so far".format(count, table))
    print("Done loading. Loaded a total of {} rows".format(count))
    if cur:
        cur.close()
    if conn:
        conn.close()
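
For illustration, a hypothetical call (the people table and its two columns are made up for the example; get_conn() must be defined, as noted above):

rows = [("Joshua", "Drake"), ("Steven", "Foo"), ("David", "Bar")]
insert_rows_batch("people", rows, batch_size=2,
                  target_fields=["first_name", "last_name"])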

If you want UPSERT (Insert+Update) as well in postgres with batches: postgres_utilities


Another nice and efficient approach is to pass the rows for insertion as a single argument, which is an array of JSON objects.

E.g., you pass this argument:

[ {"id": 18, "score": 1}, {"id": 19, "score": 5} ]

It is an array that may contain any number of objects inside. Then your SQL looks like:

INSERT INTO links (parent_id, child_id, score) 
SELECT 123, (r->>'id')::int, (r->>'score')::int 
FROM unnest($1::json[]) as r 

Note: your Postgres must be new enough to support json.
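
The $1 placeholder above is the server-side prepared-statement form; with psycopg2 the same idea looks like the sketch below (a Python list of JSON strings adapts to a text array, and the ::json[] cast turns it into the array the query expects; the links table and columns are taken from the example above):

import json

rows = [{"id": 18, "score": 1}, {"id": 19, "score": 5}]
cur.execute(
    """
    INSERT INTO links (parent_id, child_id, score)
    SELECT 123, (r->>'id')::int, (r->>'score')::int
    FROM unnest(%s::json[]) AS r
    """,
    ([json.dumps(r) for r in rows],)
)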


I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3, because mogrify returns a byte string.

Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible.

args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup) 
cur.execute(b"INSERT INTO table VALUES " + args_str)

If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:

rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)
if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)

If you want to insert multiple rows within one insert statement (assuming you are not using the ORM), the easiest way so far for me would be to use a list of dictionaries. Here is an example:

 t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
      {'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
      {'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]

conn.execute("insert into campaign_dates
             (id, start_date, end_date, campaignid) 
              values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);",
             t)

As you can see only one query will be executed:

INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT

Using aiopg, the snippet below works perfectly fine:

    # items = [10, 11, 12, 13]
    # gid = 1
    tup = [(gid, pid) for pid in items]
    args_str = ",".join([str(s) for s in tup])
    # insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
    yield from cur.execute("INSERT INTO group VALUES " + args_str)
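
Note that str(s) on each tuple only happens to produce valid SQL for plain numbers. For other types, letting the driver quote the values is safer; a sketch (assuming the aiopg cursor exposes psycopg2's mogrify(), and decoding because mogrify returns bytes on Python 3):

    args_str = ",".join(cur.mogrify("(%s,%s)", t).decode("utf-8") for t in tup)
    yield from cur.execute("INSERT INTO group VALUES " + args_str)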

Finally, in SQLAlchemy 1.2, a new implementation was added that uses psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True, like:

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    use_batch_mode=True)

http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109

Then someone who has to use SQLAlchemy won't need to bother trying different combinations of sqla, psycopg2, and direct SQL together.

Reference URL: https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query
