Društvo LJUDMILA
Rozmanova ulica 12
1000 Ljubljana
Slovenia
Prostori: osmo/za

Xpg datalink

A project to implement file management from within a PostgreSQL database.

Some of this is based on SQL/MED (Management of external data) specification.

The purpose of datalinks is to provide a mechanism to synchronize the integrity control, recovery, and access control of the files and the SQL-data associated with them.

  • integrity control - makes sure files referenced from database cannot be renamed or deleted while referenced
  • access control - makes files accessible through database (instead of filesystem) permissions
  • recovery - enables point-in-time recovery of files (backups)

These are specified by SQL/MED. As such, datalinks ate good fundation for other operations. Specifically, we would like ability to read and write files referenced by datalinks.

Datalinks should enable you to:

  • efficiently store URLs (datalinks are stored as integers, not strings)
  • conveniently map between URLs and file system paths
  • safely read and write files from SQL (SQL/MED does not specify functions for this)
  • keep backups of changed files
  • manage files on different servers
  • manage datalink metadata (unified interface to extended filesystem attributes)

Datalink type

Datalink type is the most interesting part of the SQL/MED datalink support.

Synopsys

create table my_files (
  ...,
  link datalink,
  ....
)

insert into my_files (link) select dlvalue('file:///etc/passwd');
 
select into url dlurlcomplete(link) from my_files;

update my_files set link=dlnewcopy(url,0);

update my_files set link=dlpreviouscopy(url,0);

Datalink support functions

Constructors for values of type datalink:

  • DLVALUE(url) -> datalink (INSERT only)
  • DLNEWCOPY(url,tokenp) -> datalink (UPDATE only)
  • DLPREVIOUSCOPY(url,tokenp) -> datalink (UPDATE only)

Functions for extracting information from datalink type:

  • DLURLCOMPLETE(datalink) -> url
  • DLURLCOMPLETEONLY(datalink) -> url
  • DLURLCOMPLETEWRITE(datalink) -> url
  • DLURLPATH(datalink) -> fpath
  • DLURLPATHONLY(datalink) -> fpath
  • DLURLPATHWRITE(datalink) -> fpath
  • DLURLSCHEME(datalink) -> URL scheme (HTTP or FILE)
  • DLURLSERVER(datalink) -> server address


Datalink attributes per SQL spec

These are refered to as link control options by SQL spec

LINK CONTROL

  • NO (0/n)
    • just store the datalink
    • file is not "linked", no further control
  • FILE (f)
    • file is "linked"
    • file has to exist
    • level of control can be specified with further options

INTEGRITY

  • ALL (a) - linked files cannot be deleted or renamed
  • SELECTIVE (c/s) - linked files can only be deleted or modified using file manager operations, if no datalinker is installed
  • NONE (0/n) - referenced files can be deleted or modified using file manager operations, not compatible with FILE LINK CONTROL

READ ACCESS

  • DB (d)
    • read access is controlled by SQL server, based on access privileges to the datalink value
    • involves read access tokens
      • encoded into the URL by the SQL server
      • verified by external file manager/data linker
  • FS (f)
    • read access is determined by file manager

WRITE ACCESS

  • ADMIN REQUIRING TOKEN FOR UPDATE (e/t)
    • write access governed by SQL server (and datalinker)
    • involves write access token for modifying file content
  • ADMIN NOT REQUIRING TOKEN FOR UPDATE (a)
    • write access governed by SQL server (and datalinker)
  • BLOCKED (b)
    • linked files cannot be modified
  • FS (f)
    • write access controlled by file manager

RECOVERY

  • YES (1/y)
  • NO (0/n)

UNLINK

  • DELETE (d)
    • file is deleted when unlinked
  • RESTORE (1/r)
    • original properties (ownership, permissions) restored as well
  • NONE (0/n)
    • ownership and permissions are not restored

Internal datalink attribute representation

Attributes can be conveniently specified as 6-character ASCII strings, in the manner of unix file permissions. This makes them quite convenient to handle in Perl. This is defined as a domain "dl_attr" with a value constraint for values, permissible by standard.

matching regexes for legal combinations according to standard:

  1. 'nnffnn' - no control
  2. 'fsffnn' - file must exist but can dissapear later (dl:exists)
  3. 'faffnn' - link file to SQL environment (dl:link)
  4. 'fafbnr' - protect file from renaming and deletion (dl:lock)
  5. 'fafbyr' - point-in-time recovery for file (dl:??)
  6. 'fadb[yn]r' - read access thru db only (via datalinker) (dl:read...)
  7. 'fadb[yn]d' - delete when no more references to it exist (dl:delete)
  8. 'fad[ta][yn][rd]' - write access through db, replace-in-place (dl:write...)

Datalinker

Datalinker is a process external to the database which interacts with files on the filesystem. Database communicates with this process. This process must possess right privileges to be able to perform some actions.

<graphviz> digraph dl {

node [shape="box"];
postgres;
datalinker;
filesystem1;
filesystem2;
filesystem3;
edge [arrowhead="vee",arrowtail="vee"];
postgres->datalinker;
datalinker->filesystem1;
datalinker->filesystem2;
datalinker->filesystem3;

} </graphviz>

datalinker API

  • begin
  • commit
  • rollback
  • stat - get file info
  • lockw - protect against rename and delete
  • lockr - remove FS read permissions
  • unlink - delete file
  • read - read file