National Treasure Hackathon (國家寶藏松) - Backend Requirements

Volunteer Dispatch Server

Given a naId (every document has a unique naId), query its details at https://catalog.archives.gov/api/v1/?naIds={naId}

  1. Check whether `results.result[0].objects` exists. If it does, the document has already been digitized and the image files are in the `results.result[0].objects.object` array.
  2. If it has not been digitized, check whether `results.result[0].description.series.fileUnit` exists and holds a number greater than 1. If so, there are multiple files under this naId, and you need to query `https://catalog.archives.gov/api/v1/?description.fileUnit.parentSeries.naId={naId}` to get the sub-files and their naIds.
  3. Next, check `accessRestriction.status.termName` under `description` or `description.fileUnit`. If it is `restricted`, we do not want to dispatch this naId to a volunteer.
  4. If everything above checks out, check our own flag for the document, which is one of `unstarted`, `started but incomplete`, or `complete`.
  5. Dispatch the unstarted and incomplete ones to the volunteer client.
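
The five checks above can be sketched as one pure function over an already-parsed API response. This is a rough illustration, not the dispatch server's actual code: the names `dig`, `to_int`, `child_query_url`, `decide` and the returned action strings are all hypothetical. It also assumes that a digitized document needs no dispatch, that the restriction value is compared case-insensitively against exactly `restricted`, and that the progress flag is passed in by the caller.

```python
from typing import Any, Optional

NARA_API = "https://catalog.archives.gov/api/v1/"
# Our own progress flags that should still be handed to volunteers (steps 4-5).
DISPATCHABLE_FLAGS = {"unstarted", "started but incomplete"}


def dig(data: Any, *path: Any) -> Any:
    """Follow keys/indexes through nested dicts and lists; None if a step is missing."""
    current = data
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and isinstance(key, int) and 0 <= key < len(current):
            current = current[key]
        else:
            return None
    return current


def to_int(value: Any) -> Optional[int]:
    """Convert a JSON value to int, or None if it is not number-like."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def child_query_url(na_id: str) -> str:
    """URL from step 2 that lists the sub-files under a multi-file naId."""
    return f"{NARA_API}?description.fileUnit.parentSeries.naId={na_id}"


def decide(response: dict, flag: str) -> str:
    """Return a hypothetical action: not_found, digitized, expand, skip or dispatch."""
    result = dig(response, "results", "result", 0)
    if result is None:
        return "not_found"

    # Step 1: already digitized (assumption: nothing to dispatch in that case).
    if dig(result, "objects") is not None:
        return "digitized"

    # Step 2: several files under this naId; the caller should fetch
    # child_query_url(na_id) and run decide() on each sub-file.
    count = to_int(dig(result, "description", "series", "fileUnit"))
    if count is not None and count > 1:
        return "expand"

    # Step 3: restricted documents are never dispatched.
    for base in (("description",), ("description", "fileUnit")):
        term = dig(result, *base, "accessRestriction", "status", "termName")
        if isinstance(term, str) and term.strip().lower() == "restricted":
            return "skip"

    # Steps 4-5: dispatch only unstarted or incomplete documents.
    return "dispatch" if flag in DISPATCHABLE_FLAGS else "skip"
```

Keeping the decision separate from the HTTP call means it can be exercised with hand-written dictionaries, without spending requests against the catalog API.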

NoSQL Table schema for TNT-Dispatch

Endpoints of the Dispatch Server

Document Index Server

NARA API Github  (query example)

National Archives API recorder (first 100 rows only, out of 12600)

https://github.com/hsin421/tw-national-treasure

Suggested by Simon Liu 

Digital archive management system (open source): Fedora Commons

OCR Server

OCR with Python and Google Cloud Vision API reference:

https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d

repo:

https://github.com/tl578/g0v-nyc

Apply for an API key:

https://developers.google.com/api-client-library/python/guide/aaa_apikeys

Preliminary test:

Does the website need engineers?

Update, 12.10.2016

The OCR server is up and running at:

https://nationa-treasure-vision.herokuapp.com/vision

Source Code

https://github.com/national-treasures-tw/vision

Test by POSTing `{ types: 'text', imageUrl: 'YOUR_IMAGE_URL' }` to the OCR server URL above.

Please do not abuse this, as our free quota is only 1000 requests.
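
A minimal sketch of that test call in Python, using only the standard library. The notes do not say whether the server expects JSON or form-encoded data, so the JSON body and `Content-Type` header here are assumptions, and `build_ocr_request` and `run_ocr` are hypothetical helper names. Building the request is kept separate from sending it, so the payload can be inspected without using up quota.

```python
import json
import urllib.request

OCR_ENDPOINT = "https://nationa-treasure-vision.herokuapp.com/vision"


def build_ocr_request(image_url: str, endpoint: str = OCR_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the notes."""
    # Field names come from the notes; JSON encoding is an assumption.
    payload = {"types": "text", "imageUrl": image_url}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(image_url: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text.

    Each call counts against the free request quota.
    """
    with urllib.request.urlopen(build_ocr_request(image_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `run_ocr("YOUR_IMAGE_URL")` with a real image URL sends one request. The response format is not described in these notes, so the sketch returns the body unparsed.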