Solución:
Hay varias formas de conectarse a Teradata y exportar tablas a Pandas. Aquí hay cuatro +:
Usando el módulo teradata
# You can install teradata via PIP: pip install teradata
# to get a list of your odbc drivers names, you could do: teradata.tdodbc.drivers
# You don’t need to install teradata odbc driver if using method='rest'.
# See sending data from df to teradata for connection example
import teradata
import pandas as pd
host,username,password = 'HOST','UID', 'PWD'
#Make a connection
udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="odbc",system=host, username=username,
password=password, driver="DRIVERNAME") as connect:
query = "SELECT * FROM DATABASEX.TABLENAMEX;"
#Reading query to df
df = pd.read_sql(query,connect)
# do something with df,e.g.
print(df.head()) #to see the first 5 rows
Usando TeradataSQL
de @ymzkala: este paquete no requiere que instale controladores de Teradata (aparte de este paquete).
# Installing python -m pip install teradatasql
import teradatasql
with teradatasql.connect(host="host", user="username", password='password') as connect:
df = pd.read_sql(query, connect)
Usando el módulo pyodbc
import pyodbc
#You can install teradata via PIP: pip install pyodbc
#to get a list of your odbc drivers names, you could do: pyodbc.drivers()
#Make a connection
link = 'DRIVER={DRIVERNAME};DBCNAME={hostname};UID={uid};PWD={pwd}'.format(
DRIVERNAME=DRIVERNAME,hostname=hostname,
uid=username, pwd=password)
with pyodbc.connect(link,autocommit=True) as connect:
#Reading query to df
df = pd.read_sql(query,connect)
Usando el módulo sqlalchemy
#You can install sqlalchemy via PIP: pip install sqlalchemy-teradata
#Note: It is not pip install sqlalchemy. If you already have sqlalchemy, you still need sqlalchemy-teradata to get teradata dialects
from sqlalchemy import create_engine
#Make a connection
link = 'teradata://{username}:{password}@{hostname}/?driver={DRIVERNAME}'.format(
username=username,hostname=hostname,DRIVERNAME=DRIVERNAME)
with create_engine(link) as connect:
#Reading query to df
df = pd.read_sql(query,connect)
Hay una quinta forma, usando el módulo giraffez. Disfruto usando este módulo, ya que viene con MLOAD, FASTLOAD, BULKEXPORT, etc. El único problema para los principiantes son sus requisitos (por ejemplo, compilador C / C ++, Teradata CLIv2 y encabezados / archivos lib TPT API).
Nota: Actualizado el 13-07-2018, usando el administrador de contexto para asegurar el cierre de las sesiones
Actualización: 31-10-2018: uso de teradata para enviar datos de df a teradata
Podemos enviar datos de df a Teradata. Para evitar el límite de 1 MB de ‘odbc’ y la dependencia del controlador odbc, podemos usar el método ‘rest’. Necesitamos host ip_address, en lugar del argumento del controlador. NÓTESE BIEN: El orden de las columnas en df debe coincidir con el orden de las columnas en la tabla de Teradata.
import teradata
import pandas as pd
# HOST_IP can be found by executing *>>nslookup viewpoint* or *ping viewpoint*
udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="rest",system="DBName", username="UserName",
password="Password", host="HOST_IP_ADDRESS") as connect:
data = [tuple(x) for x in df.to_records(index=False)]
connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)
Al usar ‘odbc’, debe dividir sus datos en fragmentos de menos de 1 MB para evitar “[HY001][Teradata][ODBC Teradata Driver] Error de asignación de memoria “error: Ej.
import teradata
import pandas as pd
import numpy as np
udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="odbc",system="DBName", username="UserName",
password="Password", driver="DriverName") as connect:
#We can divide our huge_df to small chuncks. E.g. 100 churchs
chunks_df = np.array_split(huge_df, 100)
#Import chuncks to Teradata
for i,_ in enumerate(chunks_df):
data = [tuple(x) for x in chuncks_df[i].to_records(index=False)]
connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)
Para agregar a la respuesta de Prayson, puede usar el paquete teradatasql (que se encuentra en pypi). Este paquete no requiere que instale controladores de Teradata (aparte de este paquete). Úselo así:
import teradatasql
import pandas as pd
with teradatasql.connect(host="host", user="username", password='password') as connect:
data = pd.read_sql('select top 5 * from table_name;', connect)
Descargue el módulo Teradata Python y python pyodbc.pyd de Internet. Instale usando cmd install setup.py.
Aquí está el script de muestra para conectarse a teradata y extraer datos:
import teradata
import pyodbc
import sys
udaExec = teradata.UdaExec (appName="HelloWorld", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", dsn="prod32",
username="PRODRUN", password="PRODRUN");
i = 0
REJECTED = 'R';
f = file("output.txt","w");sys.stdout=f
cursor = session.cursor();
ff_remaining = 0;
cnt = cursor.execute("SELECT SEQ_NO,FRQFBKDC,PNR_RELOC FROM ttemp.ffremaining ORDER BY 1,2,3 ").rowcount;
rows = cursor.execute("SELECT SEQ_NO,FRQFBKDC,PNR_RELOC FROM ttemp.ffremaining ORDER BY 1,2,3 ").fetchall();
for i in range(cnt):
ff_remaining = cursor.execute("select count(*) as coun from ttemp.ffretroq_paxoff where seq_no=? and status <> ?",(rows[i].seq_no,REJECTED)).fetchall();
print ff_remaining[0].coun, rows[i].seq_no, REJECTED;