Workflow: Titan first published

Current questions and issues:

  1. Why are SOLR and ES off by a few records each day?
  2. CP searches with a 4-hour offset, which makes it hard to compare apples to apples.
  3. Why are we really off on yesterday's numbers (10-10-2018)?
  4. TODO: create an AWS metric based on a filter of the Lambda log for the data point created_date.
  5. NOTE: Research spreadsheet ("Ad hoc") numbers are not truth.
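
To see why the 4-hour offset in item 2 breaks day-to-day comparisons, the sketch below computes the UTC window that a "day" covers with and without the offset. It assumes the offset means CP treats a day as UTC-4 local time; the notes only say "4 hour offset", so the direction is a guess, and the names `CP_OFFSET` and `day_window_utc` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: CP's "4 hour offset" is UTC-4. The direction is not stated in the notes.
CP_OFFSET = timezone(timedelta(hours=-4))


def day_window_utc(year, month, day, tz):
    """Return (start, end) of the calendar day in `tz`, expressed in UTC."""
    start_local = datetime(year, month, day, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Window used by a query that treats the day as UTC (as the SOLR and ES queries do).
utc_start, utc_end = day_window_utc(2018, 10, 10, timezone.utc)
# Window used by a system that treats the same day as UTC-4.
cp_start, cp_end = day_window_utc(2018, 10, 10, CP_OFFSET)

print("UTC day:", utc_start.isoformat(), "to", utc_end.isoformat())
print("CP day: ", cp_start.isoformat(), "to", cp_end.isoformat())
```

Under this assumption the two windows overlap for only 20 of 24 hours, so records created in the first or last four hours land on different days in the two counts.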

Some data points:

Date Ad hoc SOLR ES SQL CP
9-19 166 168 168 160
9-26 140 135 135 135
10-3 245 244 244 247
10-9 105 115 108 108 108
10-10 163 127 101 101 104
2-11 181 181 182
2-12 142 142 142
2-13 134 134 134
3-11 130 130 130
3-12 141 141 141
3-13 158 158 158
3-14 102 102 102
3-15 1 1 1
3-18 164 149 149
3-19 118 66 66

1. Research Team

  1. Add new data
  2. Hit the publish button in RT
  3. Fill out Google Form to update stats spreadsheet

The stats spreadsheet keeps a running count, by date, of "published since" Titans.

[deprecated]

In the Metrics tab of the Publisher Data 2.0 spreadsheet, manually add up the publishing count for each person.

2. Research Tool

Data point: FirstPublishedDate

Solr query for first published on any given day:

http://34.193.177.135:8983/solr/master-graph-2/select?indent=on&q=DocumentType:Candidate%20AND%20FirstPublishedDate:[2018-09-11T00:00:00Z%20TO%202018-09-12T00:00:00Z]%20AND%20MasterGraphChecker:true&sort=FirstPublishedDate%20ASC&wt=json

Or, through the SOLR web interface (http://34.193.177.135:8983/solr/#/master-graph-2/query), enter the following in the q input area:

DocumentType:Candidate AND FirstPublishedDate:[2018-09-11T00:00:00Z TO 2018-09-12T00:00:00Z] AND MasterGraphChecker:true
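
The same query can be generated for any date rather than edited by hand. The following is a minimal sketch: the host, core, and field names are copied from the query above, while the function names `first_published_query` and `first_published_url` are hypothetical helpers, not part of any existing tool.

```python
from datetime import date, timedelta
from urllib.parse import quote, urlencode

# Host and core are taken from the Solr URL in these notes.
SOLR_SELECT = "http://34.193.177.135:8983/solr/master-graph-2/select"


def first_published_query(day: date) -> str:
    """Build the q string for Candidates first published on `day` (UTC)."""
    start = day.strftime("%Y-%m-%dT00:00:00Z")
    end = (day + timedelta(days=1)).strftime("%Y-%m-%dT00:00:00Z")
    return (
        "DocumentType:Candidate"
        f" AND FirstPublishedDate:[{start} TO {end}]"
        " AND MasterGraphChecker:true"
    )


def first_published_url(day: date) -> str:
    """Build the full select URL, percent-encoding spaces as %20."""
    params = {
        "indent": "on",
        "q": first_published_query(day),
        "sort": "FirstPublishedDate ASC",
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)


print(first_published_query(date(2018, 9, 11)))
print(first_published_url(date(2018, 9, 11)))
```

The first printed line can be pasted into the q input area of the web interface; the second is the equivalent URL.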

3. Pipeline

Data point: created_date

The Data Pipeline is in charge of transitioning and transforming data from RT to CP.

This process is atomic with respect to daily updates. The time lag between the systems is currently about one minute.

This process can also be run over the whole set, or a subset, of RT/SOLR data. This is called a "rebatch".

4. Elasticsearch

Using the Chrome extension Elasticsearch-Head, go to the "Any Request" tab and submit:

{
  "query": {
    "bool": {
      "must": {
        "range": {
          "created_date": {
            "gte": "2019-03-19T00:00:00.000Z",
            "lte": "2019-03-20T00:00:00.000Z"
          }
        }
      }
    }
  },
  "_source": [],
  "sort": [
    {
      "created_date": "asc"
    },
    {
      "_score": "desc"
    }
  ],
  "from": 0,
  "size": 20,
  "explain": true
}
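
For checking several days in a row, the request body can be generated per date. This is a sketch only: it keeps the `created_date` field and range structure from the request above, and the function name `created_on_query` and its `size` parameter are hypothetical. It uses an exclusive upper bound (`lt`) so that a record stamped exactly at midnight is counted in one day only.

```python
import json
from datetime import date, timedelta


def created_on_query(day: date, size: int = 20) -> dict:
    """Build a request body matching documents whose created_date falls on `day` (UTC)."""
    fmt = "%Y-%m-%dT00:00:00.000Z"
    start = day.strftime(fmt)
    end = (day + timedelta(days=1)).strftime(fmt)
    return {
        "query": {
            "bool": {
                "must": {
                    # Half-open range: start <= created_date < end
                    "range": {"created_date": {"gte": start, "lt": end}}
                }
            }
        },
        "sort": [{"created_date": "asc"}],
        "from": 0,
        "size": size,
    }


body = created_on_query(date(2019, 3, 19))
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the "Any Request" tab in place of the hand-edited body.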

5. PostgreSQL

SELECT * FROM "Titans" WHERE CAST("createdDate" AS DATE) = '2019-03-20';

or, for a date range:

SELECT * FROM "Titans" WHERE "createdDate" >= '2019-03-10' AND "createdDate" < '2019-03-17';
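
One thing to watch with day-boundary ranges: an upper bound of 23:59:59 skips rows whose timestamp has a fractional second past it, while a half-open range (`>=` start, `<` next day) does not. The sketch below illustrates this with made-up rows, using an in-memory SQLite database as a stand-in for PostgreSQL. SQLite compares these values as text rather than as timestamps, but the outcome for this case is the same.

```python
import sqlite3

# In-memory stand-in for the "Titans" table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Titans" (id INTEGER PRIMARY KEY, "createdDate" TEXT)')
conn.executemany(
    'INSERT INTO "Titans" ("createdDate") VALUES (?)',
    [
        ("2019-03-10 00:00:00.000",),  # first instant of the range
        ("2019-03-16 23:59:59.500",),  # inside the last second of the range
        ("2019-03-17 00:00:00.000",),  # first instant of the next day
    ],
)

# Closed range ending at 23:59:59: misses the row at 23:59:59.500.
between_count = conn.execute(
    'SELECT COUNT(*) FROM "Titans" WHERE "createdDate" BETWEEN ? AND ?',
    ("2019-03-10 00:00:00", "2019-03-16 23:59:59"),
).fetchone()[0]

# Half-open range: includes everything on 03-16, excludes 03-17.
half_open_count = conn.execute(
    'SELECT COUNT(*) FROM "Titans" WHERE "createdDate" >= ? AND "createdDate" < ?',
    ("2019-03-10", "2019-03-17"),
).fetchone()[0]

print("BETWEEN ... 23:59:59:", between_count)
print("half-open range:     ", half_open_count)
```

A mismatch of this kind is one possible source of small day-to-day differences between systems, though these notes do not confirm that it is the cause.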
