Software Architecture Design
the scraper extracts data from a company's website and transforms it into structured data, which is then pushed to the index through the API
the scraper is written in JMeter for most websites, but we also have a few scrapers written in C#
the scrapers share a common structure, which may vary slightly depending on the website and the data
the scraper needs to be updated whenever the website changes
the scraper has no dependency on the website's code
the scraper is built to be triggered whenever needed by a Continuous Integration tool such as Jenkins, GitHub Actions, Azure DevOps or any other
the scraper is cross-platform and works on any operating system
the scraper is technology-agnostic: it works regardless of the technology used to build the website
this is a web scraper
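The extract-and-transform step described above can be sketched as follows. This is only an illustration in Python (the real scrapers are written in JMeter and C#); the sample HTML, the `class="job"` marker and the record fields `company`, `job_title` and `job_link` are assumptions made for the example, not the actual peviitor.ro schema.

```python
import json
from html.parser import HTMLParser


class JobLinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a class="job"> in a page."""

    def __init__(self):
        super().__init__()
        self.jobs = []      # extracted (href, title) pairs
        self._href = None   # href of the job link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "job":
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.jobs.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape(html, company, base_url):
    """Extract job links from raw HTML and transform them into structured records."""
    parser = JobLinkParser()
    parser.feed(html)
    return [
        {"company": company, "job_title": title, "job_link": base_url + href}
        for href, title in parser.jobs
    ]


# Hypothetical careers page; a real scraper would download this from the website.
SAMPLE_HTML = """
<ul>
  <li><a class="job" href="/careers/1">QA Engineer</a></li>
  <li><a class="job" href="/careers/2"> Backend Developer </a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

records = scrape(SAMPLE_HTML, "ExampleCorp", "https://example.com")
payload = json.dumps(records, indent=2)  # structured data ready to be pushed to the index
print(payload)
```

Because the parser only reads the delivered HTML, the sketch depends on the page markup and not on the website's code, which is also why it has to be adjusted when the website changes.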
the API is written in PHP
the API is a wrapper around Apache Solr
the API is built to make it easier for the front end to get the data presented to the user
the API is hosted on a specific Application Web Server
the technical documentation of the API is presented in Swagger UI
the API is also used by the scraper to clean and update the job index
pushing data to production via the API requires an API key for security reasons
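A push protected by an API key could look roughly like the sketch below. The endpoint URL, the `Authorization: Bearer` header and the payload fields are assumptions for illustration only; the Swagger UI documentation of the API is the authoritative source for the real paths and header names.

```python
import json
import urllib.request

# Hypothetical endpoint; the real path is listed in the API's Swagger UI.
API_URL = "https://api.example.invalid/v0/update/"


def build_push_request(jobs, api_key, url=API_URL):
    """Build (but do not send) a POST request that carries the jobs and the API key."""
    body = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed header name and scheme; the actual API may expect something else.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_push_request(
    [{"company": "ExampleCorp", "job_title": "QA Engineer"}],
    api_key="dummy-key",
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Keeping the key in a request header rather than in the URL keeps it out of most access logs, which is one common reason for this design.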
Apache Solr
Apache Solr is used to handle the index for all the jobs
Apache Solr is the actual search engine that provides the results to the user
Apache Solr is accessible via the API for the scraper and the UI
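To illustrate what a wrapper around Solr might send to the search engine, the sketch below builds a query URL for Solr's standard `/select` handler using the common `q`, `start`, `rows` and `wt` parameters. The host, the core name `jobs` and the field name `job_title` are assumptions; the real index layout is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr host and core name.
SOLR_BASE = "http://localhost:8983/solr/jobs"


def build_select_url(text, page=1, rows=10, base=SOLR_BASE):
    """Build a Solr /select URL that searches an assumed job_title field."""
    # Escape backslashes and double quotes so the phrase query stays well-formed.
    phrase = text.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'job_title:"{phrase}"',  # assumed field name
        "start": (page - 1) * rows,    # offset of the first result on this page
        "rows": rows,                  # page size
        "wt": "json",                  # ask Solr for a JSON response
    }
    return f"{base}/select?{urlencode(params)}"


url = build_select_url("qa engineer", page=2)
print(url)
print(parse_qs(urlsplit(url).query))  # decoded parameters, for inspection
```

Exposing this through the API instead of letting the UI call Solr directly keeps the query syntax and the index layout hidden from the front end.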
for scraper developers we built dev.peviitor.ro, where you can get an API key
you log in with a GitHub or GitLab account
you generate an API key whenever you need one
the API key gives you access to production data
peviitor.ro - UI
search engine UI
currently at version v01
written in JavaScript + HTML5 + CSS3
delivered over HTTP/1.1
secured by HTTPS
delivered through the Cloudflare CDN
also archived at v01.peviitor.ro
hosted on a web server at ClausWeb
IDE used for development: Visual Studio Code
developed as Open Source code
GitHub Actions
used to automate the scraper
used to trigger the scraper
using a free Linux machine provided by GitHub
using a custom YML file to run the scrapers written in JMeter
using a custom YML file to run the scrapers written in C#
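Whatever the YML file looks like, each workflow step eventually runs a command on the Linux machine. The sketch below shows, as an assumption, what those commands might be: JMeter in non-GUI mode (`-n`) with a test plan (`-t`) and a results log (`-l`), and `dotnet run --project` for a C# scraper. The file paths are hypothetical and the actual workflow files may do this differently.

```python
import shlex


def jmeter_command(test_plan, results_file="results.jtl"):
    """Command line for running a JMeter test plan without the GUI."""
    # -n: non-GUI mode, -t: test plan file, -l: file to log results to
    return ["jmeter", "-n", "-t", test_plan, "-l", results_file]


def dotnet_command(project_path):
    """Command line for running a C# scraper project with the .NET CLI."""
    return ["dotnet", "run", "--project", project_path]


# Hypothetical paths inside the repository.
commands = [
    jmeter_command("scrapers/example.jmx"),
    dotnet_command("scrapers/Example/Example.csproj"),
]

for cmd in commands:
    # subprocess.run(cmd, check=True) would execute it; here it is only printed.
    print(shlex.join(cmd))
```

Building the commands as argument lists avoids shell quoting problems if a path ever contains spaces.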