Our Data Collaboration API is used to set up datasets, post changes to data and metadata, and retrieve data changes as well as commit details. The usage pattern is for each client to keep a version of the data and make incremental changes, which are then submitted to the server.

#### Authentication

OAuth2 authentication is used. More information about how to create authentication tokens can be found at [Authentication API](/api_authentication). The following headers need to be passed with each request to the Data Collaboration API.

1. `email` - The email of the user, such as 'john.denkins@fittrack.com'.
2. `clientid` - Identifies a unique client, in case a user logs in from multiple laptops or browsers. This can be any alphanumeric block, such as '9e0b6716-485e-4e21-b3b6-4db8591fd69c'.
3. `Authorization` - The OAuth2 header in the format "Bearer {auth_token}", where auth_token is the token generated by the API server. It will typically look like this: 'Bearer bENYbksvaFh6Tk95LzJhci9UTWE1cyszR2Mrb1NxSDdrK01rOUc0TFFHeklhRThRQktuWTFRPT0='.

#### Standard Errors

HTTP error codes can be returned if certain conditions aren't met, such as account or payment status.

1. 417 - Expectation failed - the account is not in an active status, or it has exceeded the limits of the current plan
2. 402 - Payment required - payment on the account is past due

### List Datasets

This returns the list of datasets that a user has access to.

>GET /api/v1/datasets

Request Headers:

- `If-None-Match`: the last change tag up to which the client's dataset list is current; if left blank, all datasets are returned

Response Body:

    {
      datasets: [
        {
          id: "",
          name: "",
          branches: [
            { tag: "", currentChangeTag: "", permission: "", collaborationGroupId: "" },
            ..
          ]
        },
        ..
      ]
    }

### Create dataset / Create branch

This will create a dataset or another named branch for an existing dataset.
If the collaboration group id is specified, it will be associated with the dataset/branch. If the group members are specified instead, a new group with those members will be created. When creating a branch with a collaboration group id provided, the lists of owners, editors, and viewers are ignored. If the branch already exists, its collaboration group is updated with the provided group members; for an existing branch, only changes made by an owner will be accepted.

>PUT /api/v1/datasets/{dataset id}/{b-{branch tag}}

Form Body:

    {
      group: {
        id: "",
        owners: [{}],
        editors: [{email: "user1@acme.com", name: "User 1"}, ...],
        viewers: []
      },
      defn: {
        id: "",
        name: "",
        fields: [{name: 'Field1', fieldtype: 'string', fnum: 1}, ...]
      }
    }

Response:

- Status-Code: 201 if the create was successful, 200 if the update was successful, 400 if the request is not properly set up

Response Body:

    { code: 200/201/400, msg: ""}

### Get dataset details

This will return all the other information about a dataset, including the various view settings, branches for this user, and the data profile (if requested). The branch filter is optional.

>GET /api/v1/datasets/{dataset id}/{b-{branch tag}}/details

Form Parameters:

- `profile`: y/n (default - no)
- `steps`: y/n (default - no)

Response:

    {
      id: "",
      name: "",
      branches: [
        {
          branch: "master",
          collaborationGroupId: "",
          owners: [{}],
          editors: [{email: "user1@acme.com", name: "User 1"}, ...],
          viewers: [],
          preferredView: "",
          tableLayout: "",
          cardLayout: "",
          recordLayout: "",
          showRecordView: ""
        },
        ...
      ],
      datasetProfile: {},
      steps: []
    }

### Update dataset details

This will update the specified information for a dataset, including the various view settings, branches for this user, and the data profile (if requested). The branch filter is optional.

>POST /api/v1/datasets/{dataset id}/{b-{branch tag}}/details

Form Body:

    {
      name: "",
      description: "",
      defn: {
        id: "",
        name: "",
        description: "",
        fields: [{name: 'Field1', fieldtype: 'string', fnum: 1}, ...]
      },
      tableLayout: "Field1,Field4,Field2",
      cardLayout: "Field1{font-size: 1.8rem; label-position: none; width: 300px;},Field4\nField2",
      recordLayout: "Field1{font-size: 1.8rem; label-position: none; width: 300px;},Field4\nField2",
      preferredView: "card|table",
      steps: [],
      addField: {name: "NewField", fieldtype: "string", format: "", lov: ""}
    }

Response:

- Status-Code: 200 if the update was successful, 500 on a server error, and 401 if not authorized to make the changes

Response Body:

If the definition was updated, then the definition is returned.

    { code: 200/201/400, msg: "", defn: {}}

If an addField node was sent, then the field definition, including the generated field number, is returned.

    { code: 200/201/400, msg: "", addField: {name: "", fnum: "", fieldtype: "", edittype: "", format: ""}}

Implementation Details:

1. If the addField child is found, the field is added to the defn with a newly generated field number, and the server then sends back the field defn.

### Delete branch

This will delete the branch, but will first check whether there are changes on this branch that have not been committed. If so, it will return an `overrideCode`, which needs to be sent in a subsequent request for the branch delete to be processed, discarding the changes on the branch.
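
The two-step delete described above can be sketched on the server side as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the `Branch` class, the `delete_branch` function, and the use of a random hex string as the code are all hypothetical; only the 200/412 status codes and the `overrideCode` field come from this document.

```python
import secrets


class Branch:
    """Hypothetical in-memory stand-in for a dataset branch."""

    def __init__(self, pending_changes: int):
        self.pending_changes = pending_changes  # changes not yet merged
        self.override_code = None               # last code issued, if any
        self.deleted = False


def delete_branch(branch: Branch, override_code=None):
    """Return (status_code, body) for a delete request on the branch."""
    needs_confirmation = branch.pending_changes > 0
    code_matches = (
        branch.override_code is not None
        and override_code == branch.override_code
    )
    if needs_confirmation and not code_matches:
        # Issue a fresh code; the caller must resend it to confirm.
        branch.override_code = secrets.token_hex(8)
        body = {
            "msg": f"{branch.pending_changes} unmerged change(s) on this branch",
            "overrideCode": branch.override_code,
        }
        return 412, body
    # Either nothing is pending or the caller confirmed the discard.
    branch.deleted = True
    return 200, None
```

A client would call the delete once, show the returned `msg` to the user, and repeat the call with the returned code only if the user agrees to discard the pending changes.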

>DELETE /api/v1/datasets/{dataset id}/{b-{branch tag}}

Form Parameters:

- `OverrideCode`: alphanumeric, provided by a previous call, in case not all changes have been merged into the parent branch

Response:

- Status-Code: 200 if successful, 412 (precondition failed) if there are pending changes on the branch

Response Body (only if the response code is 412):

    {
      msg: "Human readable text describing the unmerged changes",
      overrideCode: "" -- this is optional if there are pending changes on the branch
    }

### Get latest data

This returns the data in either the qvikly binary record streaming format or as XML or JSON. It includes the data and the type definition for the specified branch.

>GET /api/v1/datasets/{dataset id}/{b-{branch tag}}

Request Headers:

- `Accept`: application/json (default)
- `If-None-Match`: the last dataset change tag up to which the client is current; if left blank, all data is returned

Form Parameters:

- `collaboration` (y/n): if yes, then msgs are returned, again based on the last change tag presented in the If-None-Match header
- `tillChangeTag`: only show updates up to this specified tag

Response Headers:

- `ETag`: contains the dataset change tag

Response (JSON example):

    {
      fields: [{fnum: 1, name: "Field1", title: "Field 1", fieldtype: "string"}, ...],
      recs: [ {f1: "", f2: 123, ...}, {} ],
      msgs: [
        {
          branchTag: "",
          datasetid: "",
          history: [
            {
              by: "user1@acme.com",
              comments: "",
              date: "2001-01-13T01:10:15.001-07:00",
              tag: "",
              dataChange: { addedRecs: ["rec guid"], updatedRecs: [], deletedRecs: [] }
            }
          ]
        }
      ]
    }

### Post data updates

Any changes will be sent to the server. If the client's copy of the dataset doesn't have the latest ETag, the server will check whether there have been changes to any of the records being posted. If so, the post will fail. If none of the records have had other changes, it will proceed with updating the dataset commit log.
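
The conflict rule in the paragraph above can be sketched as a small function. This is only an illustration under assumptions: the `Commit` shape and the function names are hypothetical, and change tags are modeled as integers purely so that "later" can be compared with `>` (the document only says that higher tag values indicate further changes).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical commit log entry, for illustration only."""
    tag: int                                             # dataset change tag
    changed_ids: set[str] = field(default_factory=set)   # added/updated/deleted ids


def find_conflicts(log: list[Commit], client_tag: int, posted_ids: set[str]) -> set[str]:
    """Return posted record ids that were changed by commits after client_tag."""
    conflicts: set[str] = set()
    for commit in log:
        if commit.tag > client_tag:          # commit the client has not seen
            conflicts |= commit.changed_ids & posted_ids
    return conflicts


def post_status(log: list[Commit], client_tag: int, posted_ids: set[str]) -> int:
    """Map the outcome onto the status codes this endpoint documents."""
    if not posted_ids:
        return 204                           # no content posted
    if find_conflicts(log, client_tag, posted_ids):
        return 409                           # client must update first
    return 200                               # safe to save the change
```

Note that a client that is behind the latest tag can still post successfully, as long as none of the records it touches were changed in the commits it has not yet seen.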

>POST /api/v1/datasets/{dataset id}/{b-{branch tag}}

Request Headers:

- `Content-Type`: application/json
- `ETag`: the dataset change tag that the client's dataset has been updated to

Form Body (JSON example):

    {
      comments: "",
      recs: [ {f1: "", f2: 123, ...}, {} ]
    }

Form Parameters:

- `PartialOk`: y/n

Response:

- Status-Code: 200, or 409 (if there is a conflict that requires a client-side update), or 204 (if no content was posted)

Response Headers:

- `ETag`: the new dataset change tag generated after the data updates were committed

Form Data (JSON example):

    {
      latestChangeTag: "",
      recs: [ {f1: "", f2: 123, ...}, {} ]
    }

It sends the records that should be updated on the client side because, while not in conflict, they might have also been updated/added.

Implementation Details:

1. Check if the dataset branch has had other commits, by comparing the latest change tag for the branch to the one presented.
2. If no other changes have been received, proceed to the SAVE step to update the dataset on the server with the updates.
3. Create a list of the changed/deleted records being posted. Go through all subsequent commit records for that branch, and see if any of the records impacted are being posted. If so, fail the post request. Otherwise, proceed to the SAVE step.

SAVE:

- Save the change file and establish a new dataset change tag.
- Append a new entry to the dataset commit log.
- Initiate an asynchronous process to update the record archive.

SEND DIFFERENCES:

- A 409 error code (CONFLICT) will be sent.

SEND SUCCESS:

- As part of success, the response will contain the new "ETag" with the newly generated dataset change tag.
- Any records that need to be updated on the client for it to be caught up to the new dataset change tag will be sent back, in the format in which the data was received.

### Post comments

Comments can be posted at the dataset level or for a specific set of records (one or more).

>POST /api/v1/datasets/{dataset id}/{b-{branch tag}}

Request Headers:

- `Content-Type`: application/json
- `ETag`: the dataset change tag that the client's dataset has been updated to

Form Body (JSON example):

    {
      comments: "",
      refCommentId: "", --- optional in case of commenting on another's comment
      refRecordIds: [ ] --- optional if commenting on a specific record to allow for record level chatter or highlighting multiple records
    }

Response:

- Status-Code: 200 or 204 (if no content was posted)

Form Data (JSON example):

    { latestChangeTag: "" }

It sends the records that should be updated on the client side because, while not in conflict, they might have also been updated/added.

Implementation Details:

1. Check if the dataset branch has had other commits, by comparing the latest change tag for the branch to the one presented.
2. If no other changes have been received, proceed to the SAVE step to update the dataset on the server with the updates.
3. Create a list of the changed/deleted records being posted. Go through all subsequent commit records for that branch, and see if any of the records impacted are being posted. If so, fail the post request. Otherwise, proceed to the SAVE step.

SAVE:

- Save the change file and establish a new dataset change tag.
- Append a new entry to the dataset commit log.
- Initiate an asynchronous process to update the record archive.

SEND DIFFERENCES:

- A 409 error code (CONFLICT) will be sent.

SEND SUCCESS:

- As part of success, the response will contain the new "ETag" with the newly generated dataset change tag.
- Any records that need to be updated on the client for it to be caught up to the new dataset change tag will be sent back, in the format in which the data was received.

### Get msgs

Any changes on the server will be retrieved, including the commit history. This can be called for multiple datasets, since it is intended for a msg view. The dataset ids must be specified.
Either GET or POST is supported: this is technically a GET request and doesn't change anything on the server, but it requires data in the request body, which is typically done with POST requests. For datasets where collaboration is set up at the record level, an internal messages table containing tuples { user id, dataset id, record id, change tag } is used as a cache to quickly look up whether the user has any pending updates. For datasets where collaboration is at the dataset/branch level, the dataset commit log is used to find additional changes.

>GET/POST /api/v1/msgs

Form Body:

    {
      datasets: [
        {datasetid: "", branch: "", sinceChangeTag: "", tillChangeTag: ""},
        ...
      ]
    }

- If the sinceChangeTag for a dataset id isn't specified or is blank, then the entire history is sent back
- If the tillChangeTag for a dataset is specified, then the history is limited to that point

Response:

- Status-Code: 200 if successful, 400 if the request is not properly set up

Response Body:

    {
      msgs: [
        {
          datasetid: "",
          branch: "",
          history: [
            {
              tag: "",
              date: "",
              comments: "",
              by: "user1@acme.com",
              dataChange: { addedRecs: [], updatedRecs: [], deletedRecs: [] }
            },
            ..
          ]
        },
        ..
      ]
    }

### Get data history

Any changes on the server will be retrieved, including the commit history. This can be called for multiple datasets, since it is intended for a msg view. The dataset ids must be specified. Either GET or POST is supported: this is technically a GET request and doesn't change anything on the server, but it requires data in the request body, which is typically done with POST requests. For datasets where collaboration is set up at the record level, an internal messages table containing tuples { user id, dataset id, record id, change tag } is used as a cache to quickly look up whether the user has any pending updates. For datasets where collaboration is at the dataset/branch level, the dataset commit log is used to find additional changes.

>GET/POST /api/v1/datasets/{dataset id}/{b-{branch tag}}/history

Request Headers:

- `Accept`: application/json

Form Body:

    { recs: ["recid1", "", ..], endTag: "", startTag: "", maxEntries: 2}

- startTag and endTag are inclusive
- if startTag is not specified, then maxEntries is used to get the latest set of entries
- if endTag is not specified, it is assumed to be the latest tag on the branch
- if maxEntries is not specified, it is assumed to be all entries

Response (JSON example):

    [
      {
        recid: "",
        msgs: [
          {tag: "", date: "", comments: "", by: "user1@acme.com", rec: {f1: "", f2: "", ...}},
          ...
        ]
      },
      ...
    ]

## Implementation Details

- **Dataset Change Tag** - This is an alphanumeric tag where higher values indicate further changes. It is stored at the dataset level and allows one to quickly query whether dataset changes have occurred that can be retrieved. A tag is stored with each local copy, and the service needs to be able to provide the list of changes since the last tag.
- **Record Change Tag** - This is the dataset change tag that last changed the record.
- **Dataset Commit Log** - This is an append-only log containing sequential records, with each log entry having a change tag as the key (generated when the entry is added), along with the user and comments, and a reference to the change file. It can optionally have the list of record ids changed, in the case where records can be individually changed.

      Dataset 1 Commit Log
        Dataset Change Tag 1, Date, User Id, Comments Text, Last Dataset Change Tag
          List of Added Record Ids[{record id, change file offset(int)}]
          List of Changed Record Ids[{record id, change file offset(int)}]
          List of Deleted Record Ids[]
        Dataset Change Tag 2, Date, User Id, Comments Text, Last Dataset Change Tag
          List of Added Record Ids[{record id, change file offset(int)}]
          List of Changed Record Ids[{record id, change file offset(int)}]
          List of Deleted Record Ids[]
        ...

- **Change File** - The change file is essentially the message that was sent from the user to the server, with the list of records being changed, as well as the new values. The id for the change file/record is the dataset change tag.

      Dataset 1 Change File ({Dataset Change Tag}.msg)
        List of Records[in streamed record layout, delete record will have the delete field id flag set]

- **Record Archive** - To support viewing the full history of the changes for a record, a record archive contains all facts about a single record, as an ever-growing trunk. New field values are appended to the record. It is stored in the database so that, as the record grows, the segmentation caused by constantly adding new data is managed by the database.

      Record Archive Layout
        Base Fields (don't change): Record Id, Dataset Id, Created On, ..
        List of fields[Field Num, Data type, Data]
        Record Change Tag 1, List of changed fields[]
        Record Change Tag 2, List of changed fields[]
        ...

- **Branch Tag** - This is a tag that's set up in case a smaller group is working on data before it can be shared broadly. Branches can also be used to refer to older snapshots of data. The record archive for a branch is kept separately. The expectation is that branches are short-lived and should be purged once the data can be shared broadly.
- **Branch Ref File** - The branch reference contains the list of active branches for a dataset, along with the change tag for when the fork occurred, the tag which is the baseline for the branch, and the current latest change tag for the branch. This can be stored in the database for quick access, given that the volume of these records won't be ever-growing.

      Dataset Id / Branch Tag / PreFork Change Tag / Fork Change Tag / Last Commit Change Tag
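
To make the commit log and change tag descriptions above more concrete, here is a minimal in-memory sketch of an append-only log that can answer "what changed since tag X". It is an illustration under assumptions rather than the actual storage design: the class and field names are hypothetical, change file offsets are omitted, and tags are modeled as increasing integers so that they can be ordered.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogEntry:
    """Hypothetical commit log entry mirroring the layout listed above."""
    tag: int                       # dataset change tag (the entry's key)
    prev_tag: Optional[int]        # last dataset change tag before this one
    user: str
    comments: str
    added: List[str] = field(default_factory=list)
    changed: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


class CommitLog:
    """Append-only log; entries are never modified or removed."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []
        self._tags: List[int] = []  # kept sorted because tags only grow

    def append(self, user, comments, added=(), changed=(), deleted=()) -> int:
        prev = self._tags[-1] if self._tags else None
        tag = 1 if prev is None else prev + 1   # generated when appending
        self._entries.append(
            LogEntry(tag, prev, user, comments,
                     list(added), list(changed), list(deleted))
        )
        self._tags.append(tag)
        return tag

    def since(self, tag: Optional[int] = None,
              till: Optional[int] = None) -> List[LogEntry]:
        """Entries after `tag` (all if None), up to and including `till`."""
        start = 0 if tag is None else bisect_right(self._tags, tag)
        end = len(self._tags) if till is None else bisect_right(self._tags, till)
        return self._entries[start:end]
```

Because tags only ever increase, the log stays sorted by construction and a binary search is enough to find where a client's last known tag sits, which is the property the Dataset Change Tag description relies on.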