A project to implement file management from within a PostgreSQL database.
Some of this is based on SQL/MED (Management of external data) specification.
The purpose of datalinks is to provide a mechanism to synchronize the integrity control, recovery, and access control of the files and the SQL-data associated with them.
- integrity control - makes sure files referenced from database cannot be renamed or deleted while referenced
- access control - makes files accessible through database (instead of filesystem) permissions
- recovery - enables point-in-time recovery of files (backups)
These are specified by SQL/MED. As such, datalinks ate good fundation for other operations. Specifically, we would like ability to read and write files referenced by datalinks.
Datalinks should enable you to:
- efficiently store URLs (datalinks are stored as integers, not strings)
- conveniently map between URLs and file system paths
- safely read and write files from SQL (SQL/MED does not specify functions for this)
- keep backups of changed files
- manage files on different servers
- manage datalink metadata (unified interface to extended filesystem attributes)
Vsebina
Datalink type
Datalink type is the most interesting part of the SQL/MED datalink support.
Synopsys
create table my_files (
  ...,
  link datalink,
  ....
)
insert into my_files (link) select dlvalue('file:///etc/passwd');
 
select into url dlurlcomplete(link) from my_files;
update my_files set link=dlnewcopy(url,0);
update my_files set link=dlpreviouscopy(url,0);
Datalink support functions
Constructors for values of type datalink:
- DLVALUE(url) -> datalink (INSERT only)
- DLNEWCOPY(url,tokenp) -> datalink (UPDATE only)
- DLPREVIOUSCOPY(url,tokenp) -> datalink (UPDATE only)
Functions for extracting information from datalink type:
- DLURLCOMPLETE(datalink) -> url
- DLURLCOMPLETEONLY(datalink) -> url
- DLURLCOMPLETEWRITE(datalink) -> url
- DLURLPATH(datalink) -> fpath
- DLURLPATHONLY(datalink) -> fpath
- DLURLPATHWRITE(datalink) -> fpath
- DLURLSCHEME(datalink) -> URL scheme (HTTP or FILE)
- DLURLSERVER(datalink) -> server address
Datalink attributes per SQL spec
These are refered to as link control options by SQL spec
LINK CONTROL
-  NO (0/n)
- just store the datalink
- file is not "linked", no further control
 
-  FILE (f)
- file is "linked"
- file has to exist
- level of control can be specified with further options
 
INTEGRITY
- ALL (a) - linked files cannot be deleted or renamed
- SELECTIVE (c/s) - linked files can only be deleted or modified using file manager operations, if no datalinker is installed
- NONE (0/n) - referenced files can be deleted or modified using file manager operations, not compatible with FILE LINK CONTROL
READ ACCESS
-  DB (d)
- read access is controlled by SQL server, based on access privileges to the datalink value
-  involves read access tokens
- encoded into the URL by the SQL server
- verified by external file manager/data linker
 
 
-  FS (f)
- read access is determined by file manager
 
WRITE ACCESS
-  ADMIN REQUIRING TOKEN FOR UPDATE (e/t)
- write access governed by SQL server (and datalinker)
- involves write access token for modifying file content
 
-  ADMIN NOT REQUIRING TOKEN FOR UPDATE (a)
- write access governed by SQL server (and datalinker)
 
-  BLOCKED (b)
- linked files cannot be modified
 
-  FS (f)
- write access controlled by file manager
 
RECOVERY
- YES (1/y)
- NO (0/n)
UNLINK
-  DELETE (d)
- file is deleted when unlinked
 
-  RESTORE (1/r)
- original properties (ownership, permissions) restored as well
 
-  NONE (0/n)
- ownership and permissions are not restored
 
Internal datalink attribute representation
Attributes can be conveniently specified as 6-character ASCII strings, in the manner of unix file permissions. This makes them quite convenient to handle in Perl. This is defined as a domain "dl_attr" with a value constraint for values, permissible by standard.
matching regexes for legal combinations according to standard:
- 'nnffnn' - no control
- 'fsffnn' - file must exist but can dissapear later (dl:exists)
- 'faffnn' - link file to SQL environment (dl:link)
- 'fafbnr' - protect file from renaming and deletion (dl:lock)
- 'fafbyr' - point-in-time recovery for file (dl:??)
- 'fadb[yn]r' - read access thru db only (via datalinker) (dl:read...)
- 'fadb[yn]d' - delete when no more references to it exist (dl:delete)
- 'fad[ta][yn][rd]' - write access through db, replace-in-place (dl:write...)
Datalinker
Datalinker is a process external to the database which interacts with files on the filesystem. Database communicates with this process. This process must possess right privileges to be able to perform some actions.
<graphviz> digraph dl {
node [shape="box"]; postgres; datalinker; filesystem1; filesystem2; filesystem3;
edge [arrowhead="vee",arrowtail="vee"]; postgres->datalinker; datalinker->filesystem1; datalinker->filesystem2; datalinker->filesystem3;
} </graphviz>
datalinker API
- begin
- commit
- rollback
- stat - get file info
- lockw - protect against rename and delete
- lockr - remove FS read permissions
- unlink - delete file
- read - read file
