Software Architecture Design

  • the scraper extracts data from the company's website and transforms it into structured data, which is then pushed to the index through the API

  • most scrapers are written in JMeter, but a few are written in C#

  • the scrapers share a common structure, which may be slightly adapted depending on the website and the data

  • the scraper needs to be updated whenever the website changes

  • the scraper has no dependency on the website's code

  • the scraper is built to be triggered whenever needed by a Continuous Integration tool such as Jenkins, GitHub Actions, Azure DevOps, or any other

  • the scraper is cross-platform and runs on any operating system

  • the scraper is technology-agnostic: it works regardless of the technology used to build the website

  • this is a web scraper

  • https://scraper.peviitor.ro/

  • the API is written in PHP

  • the API is a wrapper around Apache SOLR

  • the API is built to make it easier for the front-end to fetch the data presented to the user

  • the API is hosted on a specific Application Web Server

  • the technical documentation of the API is presented in Swagger UI

  • the API is also used by the scraper to clean and update the index with jobs

  • pushing the data to production via API requires an API KEY for security reasons

  • https://api.peviitor.ro/
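
The flow described above (extract, transform into structured data, push through the API with an API KEY) could look roughly like the sketch below. It is only an illustration in Python, not the actual JMeter or C# scraper code. The endpoint path, the `X-Api-Key` header name, and the field names (`job_title`, `job_link`, `company`, `city`) are assumptions made for the example; the real routes and fields are the ones documented in the API's Swagger UI. The request is only built, never sent.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical endpoint path, used for illustration only.
API_URL = "https://api.peviitor.ro/hypothetical/jobs/"


def to_job(raw: dict, company: str) -> dict:
    """Turn one raw scraped record into a flat, structured job document."""
    return {
        "job_title": raw["title"].strip(),
        "job_link": raw["url"].strip(),
        "company": company,
        "city": raw.get("city", "").strip(),
    }


def build_push_request(jobs: list[dict], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the jobs as JSON."""
    body = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumed header name for the API KEY
        },
        method="POST",
    )


if __name__ == "__main__":
    scraped = [{"title": "  QA Engineer ", "url": "https://example.com/jobs/1"}]
    jobs = [to_job(item, "Example SRL") for item in scraped]
    request = build_push_request(jobs, api_key="your-api-key")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping the transform step (`to_job`) separate from the push step mirrors the common scraper structure: only the extraction and field mapping should need to change when a website is updated.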

Apache SOLR

  • Apache SOLR is used to handle the index for all the jobs

  • Apache SOLR is the actual Search Engine which provides the results for the user

  • Apache SOLR is accessible via the API for the scraper and the UI

  • for scraper developers, we built dev.peviitor.ro, where they can get the API KEY

  • you log in with a GitHub or GitLab account

  • you can generate an API KEY whenever you need one

  • the API KEY gives you access to production data

  • https://dev.peviitor.ro/

  • search engine UI

  • currently at version v01

  • written in JavaScript, HTML5 and CSS3

  • delivered over HTTP/1.1

  • secured by HTTPS

  • delivered through CloudFlare CDN

  • also archived at v01.peviitor.ro

  • hosted on a web server at ClausWeb

  • IDE used for development: Visual Studio Code

  • developed as Open Source code

  • https://peviitor.ro/

GitHub Actions

  • used to automate the scraper

  • used to trigger the scraper

  • runs on a free Linux machine provided by GitHub

  • uses a custom YML file to run the scrapers written in JMeter

  • uses a custom YML file to run the scrapers written in C#