Workflow: Titan first published
Current questions and issues:
- Why are SOLR and ES off by a few records each day?
- CP searches with a 4-hour offset, so its daily counts are not directly comparable with those of the other systems.
- Why are we really off on yesterday's numbers (10-10-2018)?
- TODO: create an AWS metric based on a filter of the Lambda log for the data point created_date.
- NOTE: Research spreadsheet ("Ad hoc") numbers are not the source of truth.
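The effect of the 4-hour offset on a daily count can be sketched as follows. This is a minimal illustration, assuming CP's clock runs a fixed 4 hours behind UTC; the direction of the offset and the `day_window_utc` helper are assumptions, not taken from CP itself.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_window_utc(day, offset_hours=0):
    """Return the [start, end) window, in UTC, of a calendar day
    observed on a clock at a fixed offset from UTC.

    offset_hours=-4 models a clock 4 hours behind UTC (assumed direction).
    """
    tz = timezone(timedelta(hours=offset_hours))
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    start = start_local.astimezone(timezone.utc)
    return start, start + timedelta(days=1)


utc_start, utc_end = day_window_utc(date(2018, 10, 10))
cp_start, cp_end = day_window_utc(date(2018, 10, 10), offset_hours=-4)

# The two "days" overlap for only 20 hours; records created in the
# remaining 4 hours at either end are counted on different dates.
print(utc_start.isoformat(), "to", utc_end.isoformat())
print(cp_start.isoformat(), "to", cp_end.isoformat())
```

Under this assumption, any record created between 00:00 and 04:00 UTC falls on the previous day in CP, which would be enough to explain small day-to-day differences.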
Some data points:
Date | Ad hoc | SOLR | ES | SQL | CP |
---|---|---|---|---|---|
9-19 | 166 | 168 | 168 | 160 | |
9-26 | 140 | 135 | 135 | 135 | |
10-3 | 245 | 244 | 244 | 247 | |
10-9 | 105 | 115 | 108 | 108 | 108 |
10-10 | 163 | 127 | 101 | 101 | 104 |
2-11 | 181 | 181 | 182 | | |
2-12 | 142 | 142 | 142 | | |
2-13 | 134 | 134 | 134 | | |
3-11 | 130 | 130 | 130 | | |
3-12 | 141 | 141 | 141 | | |
3-13 | 158 | 158 | 158 | | |
3-14 | 102 | 102 | 102 | | |
3-15 | 1 | 1 | 1 | | |
3-18 | 164 | 149 | 149 | | |
3-19 | 118 | 66 | 66 | | |
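A quick way to see which days disagree is to compute the spread between systems per date. The sketch below uses three rows copied from the table; it assumes the empty cells in the later rows are the trailing SQL and CP columns, and the `spread` and `disagreeing_days` helpers are hypothetical names. "Ad hoc" is ignored by default because the spreadsheet numbers are not considered truth.

```python
# Counts copied from the table; None marks an empty cell.
ROWS = {
    "10-9": {"Ad hoc": 105, "SOLR": 115, "ES": 108, "SQL": 108, "CP": 108},
    "10-10": {"Ad hoc": 163, "SOLR": 127, "ES": 101, "SQL": 101, "CP": 104},
    "3-18": {"Ad hoc": 164, "SOLR": 149, "ES": 149, "SQL": None, "CP": None},
}


def spread(counts, ignore=("Ad hoc",)):
    """Return max - min over the systems that reported a count."""
    present = [
        value
        for system, value in counts.items()
        if value is not None and system not in ignore
    ]
    return max(present) - min(present)


def disagreeing_days(rows, tolerance=0, ignore=("Ad hoc",)):
    """Map each date whose spread exceeds the tolerance to that spread."""
    result = {}
    for day, counts in rows.items():
        gap = spread(counts, ignore)
        if gap > tolerance:
            result[day] = gap
    return result


print(disagreeing_days(ROWS))
```

Passing `ignore=()` includes the spreadsheet numbers in the comparison.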
1. Research Team
- Add new data
- Hit the publish button in RT
- Fill out Google Form to update stats spreadsheet
Stats spreadsheet keeps a running count by date of "published since" Titans.
[deprecated] Metrics tab of Publisher Data 2.0 spreadsheet: manually add up the publishing count for each person.
2. Research Tool
Data point: FirstPublishedDate
Solr query for first published on any given day:
http://34.193.177.135:8983/solr/master-graph-2/select?indent=on&q=DocumentType:Candidate%20AND%20FirstPublishedDate:[2018-09-11T00:00:00Z%20TO%202018-09-12T00:00:00Z]%20AND%20MasterGraphChecker:true&sort=FirstPublishedDate%20ASC&wt=json
Or, in the SOLR web interface (http://34.193.177.135:8983/solr/#/master-graph-2/query), enter the following in the q input area:
DocumentType:Candidate AND FirstPublishedDate:[2018-09-11T00:00:00Z TO 2018-09-12T00:00:00Z] AND MasterGraphChecker:true
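To avoid hand-editing the dates, the select URL for any given day can be generated. This is a minimal sketch that reuses the host, core, and field names from the query above; the `first_published_url` helper is a hypothetical name, and `urlencode` encodes spaces as `+` rather than `%20`, which Solr should accept equally.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SOLR_SELECT = "http://34.193.177.135:8983/solr/master-graph-2/select"


def first_published_url(day):
    """Build the Solr select URL for Candidates first published on `day` (UTC)."""
    next_day = day + timedelta(days=1)
    q = (
        "DocumentType:Candidate"
        f" AND FirstPublishedDate:[{day.isoformat()}T00:00:00Z"
        f" TO {next_day.isoformat()}T00:00:00Z]"
        " AND MasterGraphChecker:true"
    )
    params = {
        "indent": "on",
        "q": q,
        "sort": "FirstPublishedDate ASC",
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(first_published_url(date(2018, 9, 11)))
```

Note that square brackets make both ends of a Solr range inclusive, so a record stamped exactly at midnight would be counted on two consecutive days. Solr also accepts a mixed form such as `[... TO ...}` to exclude the upper bound, which may be worth trying when chasing small discrepancies.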
3. Pipeline
Data point: created_date
The Data Pipeline is in charge of transitioning and transforming data from RT to CP.
This process is atomic with respect to daily updates. The time lag between the systems is currently about one minute.
This process can also be run over all or part of the RT/SOLR data. This is called a "rebatch".
4. Elasticsearch
Using the Chrome extension Elasticsearch-Head, go to the "Any Request" tab:
{
  "query": {
    "bool": {
      "must": {
        "range": {
          "created_date": {
            "gte": "2019-03-19T00:00:00.000Z",
            "lte": "2019-03-20T00:00:00.000Z"
          }
        }
      }
    }
  },
  "_source": [],
  "sort": [
    {
      "created_date": "asc"
    },
    {
      "_score": "desc"
    }
  ],
  "from": 0,
  "size": 20,
  "explain": true
}
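The request body can also be generated for an arbitrary day. The sketch below is an illustration rather than the query the team runs: the `created_on_query` helper is a hypothetical name, and it deliberately swaps `lte` for `lt` on the upper bound so that a record stamped exactly at the following midnight is not counted on both days.

```python
import json
from datetime import date, timedelta


def created_on_query(day, size=20):
    """Build an Elasticsearch request body for records created on `day` (UTC)."""
    next_day = day + timedelta(days=1)
    return {
        "query": {
            "bool": {
                "must": {
                    "range": {
                        "created_date": {
                            "gte": f"{day.isoformat()}T00:00:00.000Z",
                            # Exclusive upper bound (assumption: the original uses lte).
                            "lt": f"{next_day.isoformat()}T00:00:00.000Z",
                        }
                    }
                }
            }
        },
        "sort": [{"created_date": "asc"}],
        "from": 0,
        "size": size,
    }


# Paste the printed JSON into the "Any Request" tab.
print(json.dumps(created_on_query(date(2019, 3, 19)), indent=2))
```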
5. PostgreSQL
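SELECT * FROM "Titans" WHERE CAST("createdDate" AS DATE) = '2019-03-20';
Or, for a range of days (the upper bound is exclusive, so fractional seconds at the end of 2019-03-16 are not missed):
SELECT * FROM "Titans" WHERE "createdDate" >= '2019-03-10' AND "createdDate" < '2019-03-17';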
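When the date range is run from a script, it can be parameterized instead of edited by hand. A minimal sketch, assuming a driver that uses `%s` placeholders (as psycopg2 does); the `created_between` helper is a hypothetical name, and it uses a half-open range so that rows stamped in the final second of the last day are included.

```python
from datetime import date, timedelta

# Half-open range: >= first day, < the day after the last day.
SQL = 'SELECT * FROM "Titans" WHERE "createdDate" >= %s AND "createdDate" < %s'


def created_between(first_day, last_day):
    """Return (sql, params) selecting Titans created from first_day through last_day."""
    return SQL, (first_day, last_day + timedelta(days=1))


sql, params = created_between(date(2019, 3, 10), date(2019, 3, 16))
# With a DB-API cursor this would be run as: cursor.execute(sql, params)
print(sql, params)
```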