Migrating from MySQL to Postgres
tips on configuring postgres and migrating data from mysql
postgres configuration
pg_hba.conf
Find the file
pg_hba.conf
normally located in
/var/lib/pgsql/data
Edit this file such that it contains:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
# heikki : replaced this line:
# local all all ident sameuser
# by this line:
local all all trust
host all all 127.0.0.1/32 trust
# IPv4 local connections:
host all all 127.0.0.1/32 ident sameuser
# IPv6 local connections:
# host all all ::1/128 ident sameuser
host all all 127.0.0.1 255.0.0.0 trust
#Allow connections from the server to connect providing a password
host all all 127.0.0.1/32 md5
Now, Postgres should accept connections both from JDBC (the GeoNetwork application) and from a user using psql on the command line.
After you've made changes to this file you may need to restart Postgres. One way to do this is :
/etc/init.d/postgresql stop
/etc/init.d/postgresql start
postgresql.conf
Find the file
postgresql.conf
normally located in
/var/lib/pgsql/data
Edit this file such that you uncomment (enable) the lines :
listen_addresses = 'localhost'
port = 5432
After you've made changes to this file you may need to restart Postgres. One way to do this is :
/etc/init.d/postgresql stop
/etc/init.d/postgresql start
creating the db
Create user:
createuser -SRIP [username]
Point your command line to the create scripts in your GN installation:
cd [installation-dir]/gast/setup/sql
Create the db in Postgres:
createdb geonetwork
psql [databasename] [username] < create-db-postgres.sql
grants
After setting up the DB you need to give grants to the user specified in GeoNetwork's JDBC connection. Unfortunately in
Postgres it is impossible to give all grants on a particular DB schema, you need to do it separately for each object in
the schema, and you have to do it again in future if a new object is introduced to the schema. Here is a semi-automatic
way to do this.
Grant all on the database:
grant all privileges on database [databasename] to [username];
Find all separate grant statements for all tables :
select 'grant all on '||schemaname||'.'||tablename||' to [username];' from pg_tables where schemaname in ('public') order by schemaname, tablename;
The output should look like this:
?column?
--------------------------------------------------------
grant all on public.categories to [username];
grant all on public.categoriesdes to [username];
grant all on public.groups to [username];
grant all on public.groupsdes to [username];
grant all on public.isolanguages to [username];
grant all on public.isolanguagesdes to [username];
grant all on public.languages to [username];
grant all on public.metadata to [username];
grant all on public.metadatacateg to [username];
grant all on public.metadatarating to [username];
grant all on public.monitoringlog to [username];
grant all on public.monitoringrequests to [username];
grant all on public.monitoringservices to [username];
grant all on public.monitoringstatsgroups to [username];
grant all on public.operationallowed to [username];
grant all on public.operations to [username];
grant all on public.operationsdes to [username];
grant all on public.regions to [username];
grant all on public.regionsdes to [username];
grant all on public.relations to [username];
grant all on public.settings to [username];
grant all on public.sources to [username];
grant all on public.usergroups to [username];
grant all on public.users to [username];
(24 rows)
Now copy all grant statements from the output above and enter them in psql. This you can probably do all at once.
The above step granted privileges to tables. Although I did not see any results when I did it, you must also repeat
the step above for the other types of schema objects. The corresponding queries to generate them grant statements are :
for views :
select 'grant all on '||n.nspname||'.'||c.relname||' to [username];' from pg_catalog.pg_class c left join pg_catalog.pg_namespace n on n.oid = c.relnamespace where n.nspname in ('public') and c.relkind = 'v' order by n.nspname, c.relname;
for functions :
select 'grant all on function '||n.nspname||'.'||p.proname||'('||oidvectortypes(p.proargtypes)||') to [username];' from pg_proc p, pg_namespace n where n.oid = p.pronamespace and n.nspname in ('public') order by n.nspname, p.proname;
for sequences :
select 'grant all on '||n.nspname||'.'||c.relname||' to [username];' from pg_class c, pg_namespace n where n.oid = c.relnamespace and c.relkind in ('S') and n.nspname in ('public');
That's it, your Postgres DB should be ready for use with GeoNetwork OpenSource -- after you've migrated the data from
your previous installation that uses MySQL.
migrating data from mysql to postgres
export data from mysql
This is a non-trivial affair, that may involve getting around lots of character encoding problems in various software, and
fixing the contents of database dumps to produce strings that are legal in Postgres.
create dump
First, we create a MySQL dump using the Postgres so-called «compatibility» option.
mysqldump -u [username] -p[password] --default-character-set=utf8 --compatible=postgresql --no-create-info --skip-extended-insert --complete-insert --skip-opt -r dump.sql [databasename]
If we did not use the --no-create-info option, this would also output create table statements, which arguably might make it
easier to import (you woudn't need to create the DB by hand first). However doing that, the dump would contain lots of
Postgres-incompatible table definitions etc. so I recommend you give up trying to fix all that and instead, separate the
Postgres database creation from data insertion. You can create the Postgres database as described in the sections above.
turn dump into UTF-8 encoding
Even though we specified utf8 in the above dump command, chances are MySQL did not actually use that encoding for the dump.
click here for some more information on that
It turns out the --default-character-set=utf8 setting does not mean that actually, the output is encoded in UTF8.
Some people say setting the whole MySQL database to UTF8 helps:
find my.cnf and edit it such that you add :
[client]
default-character-set=utf8
...
[mysqld]
default-character-set=utf8
default-collation = utf8_general_ci
...
restart MySQL :
/etc/init.d/mysql restart
You can verify the result in a mysql client by doing :
SHOW VARIABLES LIKE 'character_set%';
which should produce an output like
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.00 sec)
But in my experience, this does not have the effect that mysqldump produce UTF8.
Others say that
using a ">" redirection in a linux shell does not properly handle UTF8 -- that's why I'm using the -r switch to specify
the output file for mysqldump. Although that too, did not make a difference to me -- still no UTF8-encoded output.
Yet others say that using --default-character-set=latin1
(sic!) could induce mysqldump to produce UTF8 output, but at least in my case, it did not.
After all this I gave up on getting mysqldump to produce UTF8, and instead tried to convert the output
using a text editor.
Tried TextPad, saving the file as UTF8; but TextPad did not convert the non-UTF8 characters in it to UTF8, alas.
Then tried UltraEdit, which actually seemed to convert the document to UTF8, but I could not get UltraEdit to desist
adding a BOM (byte order mark) to the resulting file; and «of course» you can't import a UTF8 file with BOM into Postgres.
Finally Notepad++ did the trick.
So now convert the resulting dump.sql into UTF8 encoding :
open dump.sql in Notepad++
select "Format -> Encode in UTF8 without BOM"
save the file
fix dump for postgres
Unfortunately the resulting dump file is not yet ready as such to be imported into Postgres and need to be processed further.
A PERL utility to convert MySQL dumps into valid Postgres sql is mysql2pgsql.
Unfortunately when I tried it on my Linux server, after a promising looking start it crashed with a segmentation fault. It may
work for you, that's why I mention it.
If you didn't use mysql2pgsql, continue with the dump. The next action is to change quote escaping and replace escaped carriage
returns and newlines with actual carriage returns and newlines :
sed "s/\\\'/\'\'/g" dump.sql | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" > escaped-dump.sql
Assuming you already created the GeoNetwork database in Postgres, continue :
remove everything from the resulting escaped-dump.sql file except for the INSERT statements. Likely there is not much
except some MySQL Server SQL Mode declarations at the very top and bottom that you need to remove. The resulting file
contains INSERT statements for the following tables :
- Categories
- CategoriesDes
- Groups
- GroupsDes
- IsoLanguages
- IsoLanguagesDes
- Languages
- Metadata
- MetadataCateg
- MetadataRating
- Monitoringlog
- Monitoringrequests
- Monitoringservices
- Monitoringstatsgroups
- OperationAllowed
- Operations
- OperationsDes
- Regions
- RegionsDes
- Settings
- Sources
- UserGroups
- Users
to avoid problems with case-sensitivity in table and column names, remove all quotes around all table names and column names
change the order of the INSERT statements so that it becomes :
- categories
- languages
- categoriesdes
- groups
- groupsdes
- isolanguages
- isolanguagesdes
- users
- usergroups
- metadata
- metadatacateg
- metadatarating
- monitoringlog
- monitoringrequests
- monitoringservices
- monitoringstatsgroups
- operations
- operationallowed
- operationsdes
- regions
- regionsdes
- settings
- sources
In the INSERT statement for table "groups", if there are any values "null" or "0" for the boolean field "isorganisation"
change them to "false", and if there are values "1" change them to "true".
relax database constraints
Drop the parent foreign key constraint from table "settings". Open psql and view the table structure:
psql -d geonetwork -U [username]
<geonetwork>=# \d settings
The output should look like this :
Table "public.settings"
Column | Type | Modifiers
----------+------------------------+-----------
id | integer | not null
parentid | integer |
name | character varying(32) | not null
value | character varying(250) |
Indexes:
"settings_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"settings_parentid_fkey" FOREIGN KEY (parentid) REFERENCES settings(id)
Drop the constraint :
alter table settings drop constraint "settings_parentid_fkey" ;
ensure db is empty
truncate all tables (just in case there still was some data from earlier attempts) :
truncate categories cascade;
truncate languages cascade;
truncate categoriesdes cascade;
truncate groups cascade;
truncate groupsdes cascade;
truncate isolanguages cascade;
truncate isolanguagesdes cascade;
truncate users cascade;
truncate usergroups cascade;
truncate metadata cascade;
truncate metadatacateg cascade;
truncate metadatarating cascade;
truncate monitoringlog cascade;
truncate monitoringrequests cascade;
truncate monitoringservices cascade;
truncate monitoringstatsgroups cascade;
truncate operations cascade;
truncate operationallowed cascade;
truncate operationsdes cascade;
truncate regions cascade;
truncate regionsdes cascade;
truncate settings cascade;
truncate sources cascade;
inport dump into postgres
import the dump into postgres :
psql -d [databasename] -U [username] < escaped-dump.sql
It is likely you'll get some errors caused by corrupt data from MySQL (that doesn't enforce referential integrity),
specifically I got some records in table operationallowed referencing non-existing metadata. You'll also get some
warnings about "non-standard" string escapes in table metadata. You can ignore these errors and warnings.
restore database constraints
add the constraint on settings again :
alter table settings add constraint "settings_parentid_fkey" foreign key (parentid) references settings(id);
This may fail if the referential integrity on settings is corrupt, as it was when I migrated my database. I got the
message
ERROR: insert or update on table "settings" violates foreign key constraint "settings_parentid_fkey"
DETAIL: Key (parentid)=(188) is not present in table "settings".
and solved it by removing the orphan settings :
delete from settings where parentid = 188 ;
Then try again to add the constraint, I got similar errors about orphans referring to 460, 461, 462, 465 and 473, remove
all such until adding the constraint succeeds.
That's it ! Your Postgres DB should be up and ready for use with GeoNetwork.