A project to implement file management from within a PostgreSQL database.
Some of this is based on SQL/MED (Management of external data) specification.
The purpose of datalinks is to provide a mechanism to synchronize the integrity control, recovery, and access control of the files and the SQL-data associated with them.
- integrity control - makes sure files referenced from database cannot be renamed or deleted while referenced
- access control - makes files accessible through database (instead of filesystem) permissions
- recovery - enables point-in-time recovery of files (backups)
These are specified by SQL/MED. As such, datalinks ate good fundation for other operations. Specifically, we would like ability to read and write files referenced by datalinks.
Datalinks should enable you to:
- efficiently store URLs (datalinks are stored as integers, not strings)
- conveniently map between URLs and file system paths
- safely read and write files from SQL (SQL/MED does not specify functions for this)
- keep backups of changed files
- manage files on different servers
- manage datalink metadata (unified interface to extended filesystem attributes)
Vsebina
Datalink type
Datalink type is the most interesting part of the SQL/MED datalink support.
Synopsys
create table my_files ( ..., link datalink, .... ) insert into my_files (link) select dlvalue('file:///etc/passwd'); select into url dlurlcomplete(link) from my_files; update my_files set link=dlnewcopy(url,0); update my_files set link=dlpreviouscopy(url,0);
Datalink support functions
Constructors for values of type datalink:
- DLVALUE(url) -> datalink (INSERT only)
- DLNEWCOPY(url,tokenp) -> datalink (UPDATE only)
- DLPREVIOUSCOPY(url,tokenp) -> datalink (UPDATE only)
Functions for extracting information from datalink type:
- DLURLCOMPLETE(datalink) -> url
- DLURLCOMPLETEONLY(datalink) -> url
- DLURLCOMPLETEWRITE(datalink) -> url
- DLURLPATH(datalink) -> fpath
- DLURLPATHONLY(datalink) -> fpath
- DLURLPATHWRITE(datalink) -> fpath
- DLURLSCHEME(datalink) -> URL scheme (HTTP or FILE)
- DLURLSERVER(datalink) -> server address
Datalink attributes per SQL spec
These are refered to as link control options by SQL spec
LINK CONTROL
- NO (0/n)
- just store the datalink
- file is not "linked", no further control
- FILE (f)
- file is "linked"
- file has to exist
- level of control can be specified with further options
INTEGRITY
- ALL (a) - linked files cannot be deleted or renamed
- SELECTIVE (c/s) - linked files can only be deleted or modified using file manager operations, if no datalinker is installed
- NONE (0/n) - referenced files can be deleted or modified using file manager operations, not compatible with FILE LINK CONTROL
READ ACCESS
- DB (d)
- read access is controlled by SQL server, based on access privileges to the datalink value
- involves read access tokens
- encoded into the URL by the SQL server
- verified by external file manager/data linker
- FS (f)
- read access is determined by file manager
WRITE ACCESS
- ADMIN REQUIRING TOKEN FOR UPDATE (e/t)
- write access governed by SQL server (and datalinker)
- involves write access token for modifying file content
- ADMIN NOT REQUIRING TOKEN FOR UPDATE (a)
- write access governed by SQL server (and datalinker)
- BLOCKED (b)
- linked files cannot be modified
- FS (f)
- write access controlled by file manager
RECOVERY
- YES (1/y)
- NO (0/n)
UNLINK
- DELETE (d)
- file is deleted when unlinked
- RESTORE (1/r)
- original properties (ownership, permissions) restored as well
- NONE (0/n)
- ownership and permissions are not restored
Internal datalink attribute representation
Attributes can be conveniently specified as 6-character ASCII strings, in the manner of unix file permissions. This makes them quite convenient to handle in Perl. This is defined as a domain "dl_attr" with a value constraint for values, permissible by standard.
matching regexes for legal combinations according to standard:
- 'nnffnn' - no control
- 'fsffnn' - file must exist but can dissapear later (dl:exists)
- 'faffnn' - link file to SQL environment (dl:link)
- 'fafbnr' - protect file from renaming and deletion (dl:lock)
- 'fafbyr' - point-in-time recovery for file (dl:??)
- 'fadb[yn]r' - read access thru db only (via datalinker) (dl:read...)
- 'fadb[yn]d' - delete when no more references to it exist (dl:delete)
- 'fad[ta][yn][rd]' - write access through db, replace-in-place (dl:write...)
Datalinker
Datalinker is a process external to the database which interacts with files on the filesystem. Database communicates with this process. This process must possess right privileges to be able to perform some actions.
<graphviz> digraph dl {
node [shape="box"]; postgres; datalinker; filesystem1; filesystem2; filesystem3;
edge [arrowhead="vee",arrowtail="vee"]; postgres->datalinker; datalinker->filesystem1; datalinker->filesystem2; datalinker->filesystem3;
} </graphviz>
datalinker API
- begin
- commit
- rollback
- stat - get file info
- lockw - protect against rename and delete
- lockr - remove FS read permissions
- unlink - delete file
- read - read file