2009-06-19

Data transfer from PostgreSQL to MySQL

Based on a tip from codesmell.org blog I eventually transferred data from PostgreSQL to MySQL. My client - what is unusual - decided to stay at MySQL as homogeneous environment and drop PostgreSQL.

The main problem is to
  • preserve encoding
  • keep all quoted characters
  • change boolean from T/F to 1/0, and
  • auto_increment what was serialized.

Transfering schema is planned for future post. The main guidelines are:

  • change type inet to text
  • remove serial and nextval
  • remove evrything after "::" in field description using this command:


echo "default 'USD'::bpchar" | awk '{ i=index($0, ":"); print i; print substr($0, 0, i); }'


Exporting data


The first step is to compose string of the definition of every table for export to CSV:

table_name(first_column,second_column...)


Script parses all tables definitions and composes the desired string:
users(id,created,login,name,email,description)
ads(id,user_id,subject,text)

PostgreSQL versions


To parse PostgreSQL data I needed psql client in the same version as server. After download proper version I compiled and installed only client and necessary libraries. Create short script in source directory to automate it:

#!/bin/bashmake -C src/bin installmake -C src/bin/psql installmake -C src/include installmake -C src/interfaces installmake -C doc install


Then run ./configure --prefix=$HOME/local

Client in ~/local/bin/psql is ready to use. Execute it using full path or supplement $PATH:
export PATH=$HOME/local/bin:$PATH

Actual function:
function definitions {
while read table; do
if [[ $table != '' ]]; then
# weird empty lines
definition="$table("        # start the string with table name
while read column_def; do
column=`echo "$column_def" | awk '{ print $1; }'`
definition="${definition}${column}" # add column name
definition="$definition,"  # add comma
done < <($PGSQLCONN -t -c "\d $table" | awk '{ print $1; }')
echo "$definition)" | sed 's/,,)/)/' # add closing bracket and remove double commas
fi
done < <($PGSQLCONN -t -c "\dt" | grep -v "pg_" | awk '{ print $3; }') > $TABLE_DEFS
# for the while loop use all tables list from command psql> \dt


Some tables depend on data from other. To put FOREIGN KEY with reference to user's ids, I needed to put users table first. Identify all such tables and move them on top of imported tables.

tmpf="/tmp/tmp_table_definitions.txt"
grep -E "^user" $TABLE_DEFINITIONS > $PRIORITY_DEFINITIONS
grep -E "^ad" $TABLE_DEFINITIONS >> $PRIORITY_DEFINITIONS
grep -E -v "(^user|^ad)" $TABLE_DEFINITIONS > $tmpf
mv $PRIORITY_DEFINITIONS $TABLE_DEFINITIONScat $tmpf >> $TABLE_DEFINITIONS


Generation of a list table from table definition.

function tables {
DIR="$PROJHOME/csv"rm -rf $DIRmkdir $DIRtmpf="/tmp/tmp.csv"
# Ensure there's enough disk space on /tmp
rm $LISTA_TABEL 2>/dev/null
for definition in `cat $TABLE_DEFINITIONS`; do
echo "$definition" | \
awk 'BEGIN { FS = "(" } ; { print $1; }' >> $TABLE_LIST
done}


Dump of the data from PostgreSQL:

function dump {
### Data dump from PostgreSQL to CSV
for definition in `cat $TABLE_DEFINITIONS`; do
tabela=`echo $definition | \
awk 'BEGIN { FS = "(" } ; { print $1; }'`
file="${DIR}/${tabela}.csv"
nice ~/local/bin/psql --host=ip_address \
--user=account_name \
-c "COPY $definition TO STDOUT" db_name > $file
rows=`wc -l $file`
echo ">>File of the table $table has $rows rows"
done
}


Statements of table creation:

function create {
$MYSQLCONN < $RECREATE_TABLE
}


RECREATE_FILE contains complete database definition. Its based on PostgreSQL dump of schema with modifications to meet MySQL requirements.

Pouring the data into MySQL:

function tip {
tmpf="/tmp/tmp.csv"
for table in `cat $TABLE_LIST`; do
csv_file="$PROJHOME/csv/${table}.csv"
nice cat "$csv_file" | nice sed 's/\tt\t/\t1\t/g' | \
nice sed 's/\tf\t/\t0\t/g' > "$tmpf"
mv "$tmpf" "$csv_file"
# echo "TRUNCATE TABLE $tabela;" | $MYSQLCONN
echo "LOAD DATA LOCAL INFILE '$csv_file' INTO TABLE $table;" | \
$MYSQLCONN
done
}


Adding auto_increment behaviour to id columns:

function auto_increment {
$MYSQLCONN < $AUTO_INCREMENT


Where AUTO_INCREMENT file looks like:

ALTER TABLE user_activity MODIFY COLUMN `id` INTEGER NOT NULL AUTO_INCREMENT;
ALTER TABLE add_statistics MODIFY COLUMN `id` INTEGER NOT NULL AUTO_INCREMENT;
ALTER TABLE search_history MODIFY COLUMN `id` INTEGER NOT NULL AUTO_INCREMENT;


I added the above statements for all tables with id.

No comments: