Migrating from MySQL to Postgres

tips on configuring postgres and migrating data from mysql

postgres configuration

pg_hba.conf

Find the file

pg_hba.conf
normally located in
/var/lib/pgsql/data
Edit this file such that it contains:
# TYPE DATABASE USER CIDR-ADDRESS METHOD

# "local" is for Unix domain socket connections only
# heikki : replaced this line:
# local all all ident sameuser
# by this line:
local all all trust
host all all 127.0.0.1/32 trust

# IPv4 local connections:
host all all 127.0.0.1/32 ident sameuser
# IPv6 local connections:
# host all all ::1/128 ident sameuser
host all all 127.0.0.1 255.0.0.0 trust
#Allow connections from the server to connect providing a password
host all all 127.0.0.1/32 md5
Now, Postgres should accept connections both from JDBC (the GeoNetwork application) and from a user using psql on the command line.

After you've made changes to this file you may need to restart Postgres. One way to do this is :

/etc/init.d/postgresql stop
/etc/init.d/postgresql start

postgresql.conf

Find the file

postgresql.conf
normally located in
/var/lib/pgsql/data
Edit this file such that you uncomment (enable) the lines :
listen_addresses = 'localhost'
port = 5432

After you've made changes to this file you may need to restart Postgres. One way to do this is :

/etc/init.d/postgresql stop
/etc/init.d/postgresql start

creating the db

Create user:

createuser -SRIP [username]
Point your command line to the create scripts in your GN installation:
cd [installation-dir]/gast/setup/sql
Create the db in Postgres:
createdb geonetwork
psql [databasename] [username] < create-db-postgres.sql

grants

After setting up the DB you need to give grants to the user specified in GeoNetwork's JDBC connection. Unfortunately in Postgres it is impossible to give all grants on a particular DB schema, you need to do it separately for each object in the schema, and you have to do it again in future if a new object is introduced to the schema. Here is a semi-automatic way to do this.

Grant all on the database:

grant all privileges on database [databasename] to [username];
Find all separate grant statements for all tables :
select 'grant all on '||schemaname||'.'||tablename||' to [username];' from pg_tables where schemaname in ('public') order by schemaname, tablename;
The output should look like this:
                        ?column?
--------------------------------------------------------
grant all on public.categories to [username];
grant all on public.categoriesdes to [username];
grant all on public.groups to [username];
grant all on public.groupsdes to [username];
grant all on public.isolanguages to [username];
grant all on public.isolanguagesdes to [username];
grant all on public.languages to [username];
grant all on public.metadata to [username];
grant all on public.metadatacateg to [username];
grant all on public.metadatarating to [username];
grant all on public.monitoringlog to [username];
grant all on public.monitoringrequests to [username];
grant all on public.monitoringservices to [username];
grant all on public.monitoringstatsgroups to [username];
grant all on public.operationallowed to [username];
grant all on public.operations to [username];
grant all on public.operationsdes to [username];
grant all on public.regions to [username];
grant all on public.regionsdes to [username];
grant all on public.relations to [username];
grant all on public.settings to [username];
grant all on public.sources to [username];
grant all on public.usergroups to [username];
grant all on public.users to [username];
(24 rows)
Now copy all grant statements from the output above and enter them in psql. This you can probably do all at once.

The above step granted privileges to tables. Although I did not see any results when I did it, you must also repeat the step above for the other types of schema objects. The corresponding queries to generate them grant statements are :

for views :

select 'grant all on '||n.nspname||'.'||c.relname||' to [username];' from pg_catalog.pg_class c left join pg_catalog.pg_namespace n on n.oid = c.relnamespace where n.nspname in ('public') and c.relkind = 'v' order by n.nspname, c.relname;

for functions :

select 'grant all on function '||n.nspname||'.'||p.proname||'('||oidvectortypes(p.proargtypes)||') to [username];' from pg_proc p, pg_namespace n where n.oid = p.pronamespace and n.nspname in ('public') order by n.nspname, p.proname;

for sequences :

select 'grant all on '||n.nspname||'.'||c.relname||' to [username];' from pg_class c, pg_namespace n where n.oid = c.relnamespace and c.relkind in ('S') and n.nspname in ('public');

That's it, your Postgres DB should be ready for use with GeoNetwork OpenSource -- after you've migrated the data from your previous installation that uses MySQL.

migrating data from mysql to postgres

export data from mysql

This is a non-trivial affair, that may involve getting around lots of character encoding problems in various software, and fixing the contents of database dumps to produce strings that are legal in Postgres.

create dump

First, we create a MySQL dump using the Postgres so-called «compatibility» option.

mysqldump -u [username] -p[password] --default-character-set=utf8 --compatible=postgresql --no-create-info --skip-extended-insert --complete-insert --skip-opt -r dump.sql [databasename]
If we did not use the --no-create-info option, this would also output create table statements, which arguably might make it easier to import (you woudn't need to create the DB by hand first). However doing that, the dump would contain lots of Postgres-incompatible table definitions etc. so I recommend you give up trying to fix all that and instead, separate the Postgres database creation from data insertion. You can create the Postgres database as described in the sections above.

turn dump into UTF-8 encoding

Even though we specified utf8 in the above dump command, chances are MySQL did not actually use that encoding for the dump.

So now convert the resulting dump.sql into UTF8 encoding :
open dump.sql in Notepad++
select "Format -> Encode in UTF8 without BOM"
save the file

fix dump for postgres

Unfortunately the resulting dump file is not yet ready as such to be imported into Postgres and need to be processed further.

A PERL utility to convert MySQL dumps into valid Postgres sql is mysql2pgsql. Unfortunately when I tried it on my Linux server, after a promising looking start it crashed with a segmentation fault. It may work for you, that's why I mention it.

If you didn't use mysql2pgsql, continue with the dump. The next action is to change quote escaping and replace escaped carriage returns and newlines with actual carriage returns and newlines :

sed "s/\\\'/\'\'/g" dump.sql | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" > escaped-dump.sql

Assuming you already created the GeoNetwork database in Postgres, continue :

remove everything from the resulting escaped-dump.sql file except for the INSERT statements. Likely there is not much except some MySQL Server SQL Mode declarations at the very top and bottom that you need to remove. The resulting file contains INSERT statements for the following tables :
  • Categories
  • CategoriesDes
  • Groups
  • GroupsDes
  • IsoLanguages
  • IsoLanguagesDes
  • Languages
  • Metadata
  • MetadataCateg
  • MetadataRating
  • Monitoringlog
  • Monitoringrequests
  • Monitoringservices
  • Monitoringstatsgroups
  • OperationAllowed
  • Operations
  • OperationsDes
  • Regions
  • RegionsDes
  • Settings
  • Sources
  • UserGroups
  • Users

to avoid problems with case-sensitivity in table and column names, remove all quotes around all table names and column names

change the order of the INSERT statements so that it becomes :
  • categories
  • languages
  • categoriesdes
  • groups
  • groupsdes
  • isolanguages
  • isolanguagesdes
  • users
  • usergroups
  • metadata
  • metadatacateg
  • metadatarating
  • monitoringlog
  • monitoringrequests
  • monitoringservices
  • monitoringstatsgroups
  • operations
  • operationallowed
  • operationsdes
  • regions
  • regionsdes
  • settings
  • sources

In the INSERT statement for table "groups", if there are any values "null" or "0" for the boolean field "isorganisation" change them to "false", and if there are values "1" change them to "true".

relax database constraints

Drop the parent foreign key constraint from table "settings". Open psql and view the table structure:
psql -d geonetwork -U [username]
<geonetwork>=# \d settings
The output should look like this :
Table "public.settings"
Column | Type | Modifiers
----------+------------------------+-----------
id | integer | not null
parentid | integer |
name | character varying(32) | not null
value | character varying(250) |
Indexes:
"settings_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"settings_parentid_fkey" FOREIGN KEY (parentid) REFERENCES settings(id)
Drop the constraint :
alter table settings drop constraint "settings_parentid_fkey" ;

ensure db is empty

truncate all tables (just in case there still was some data from earlier attempts) :
truncate categories cascade;
truncate languages cascade;
truncate categoriesdes cascade;
truncate groups cascade;
truncate groupsdes cascade;
truncate isolanguages cascade;
truncate isolanguagesdes cascade;
truncate users cascade;
truncate usergroups cascade;
truncate metadata cascade;
truncate metadatacateg cascade;
truncate metadatarating cascade;
truncate monitoringlog cascade;
truncate monitoringrequests cascade;
truncate monitoringservices cascade;
truncate monitoringstatsgroups cascade;
truncate operations cascade;
truncate operationallowed cascade;
truncate operationsdes cascade;
truncate regions cascade;
truncate regionsdes cascade;
truncate settings cascade;
truncate sources cascade;

inport dump into postgres

import the dump into postgres :
psql -d [databasename] -U [username] < escaped-dump.sql
It is likely you'll get some errors caused by corrupt data from MySQL (that doesn't enforce referential integrity), specifically I got some records in table operationallowed referencing non-existing metadata. You'll also get some warnings about "non-standard" string escapes in table metadata. You can ignore these errors and warnings.

restore database constraints

add the constraint on settings again :
alter table settings add constraint "settings_parentid_fkey" foreign key (parentid) references settings(id);
This may fail if the referential integrity on settings is corrupt, as it was when I migrated my database. I got the message
ERROR: insert or update on table "settings" violates foreign key constraint "settings_parentid_fkey"
DETAIL: Key (parentid)=(188) is not present in table "settings".
and solved it by removing the orphan settings :
delete from settings where parentid = 188 ;
Then try again to add the constraint, I got similar errors about orphans referring to 460, 461, 462, 465 and 473, remove all such until adding the constraint succeeds.

That's it ! Your Postgres DB should be up and ready for use with GeoNetwork.