8. Technology development (Tools Git, ngrok)

Technology Development - includes research and development, process automation, and other technology development used to support the value-chain activities.

8.1. The digitization

Digitization is defined as follows: “Digitization, less commonly digitalization, is the process of converting information into a digital (i.e. computer-readable) format, in which the information is organized into bits.”

In short, digitization means developing software that mimics manual processes while adding new capabilities, such as enabling communication between IT systems of different companies that could not have been integrated any other way.

8.1.1. Digitization for the Insurance industry

// inspired by https://www.celent.com/insights/910618694
// graphviz samples : https://renenyffenegger.ch/notes/tools/Graphviz/examples/index
graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    Presentation_layer_1 -- Presentation_layer_2
    Presentation_layer_2 -- Business_Logic_Tier_1
    Business_Logic_Tier_1 -- Business_Logic_Tier_2 -- Business_Logic_Tier_3 -- Business_Logic_Tier_4
    Business_Logic_Tier_4 -- Data_Tier -- Physical_Tier

    // PRESENTATION LAYER 0
    Presentation_layer_1 [shape=house, sides=5, label="Presentation\nlayer", color="blue", style=filled]

    // PRESENTATION LAYER 1
    Chatbot [label="{Chatbot}"]
    SMS [label="{SMS}"]
    Robo_advisor [label="{Robo advisor}"]
    Web_Browser [label="{Web Browser}"]
    Telephone [label="{Telephone}"]
    Mobile [label="{Mobile}"]

    subgraph cluster_Presentation_layer {
        shape=record
        Presentation_layer_1
        subgraph cluster_Presentation_layer_1 {
            label=""; rankdir=LR; rank=same; style=dotted;
            Telephone; Robo_advisor; Mobile; SMS; Web_Browser; Chatbot;
        };
    };

    // PRESENTATION LAYER 2
    Presentation_layer_2 [shape=house, sides=5, label="Presentation\nlayer", color="blue", style=filled]
    Broker_portal [label="Broker portal"]
    Customer_portal [label="Customer portal"]
    Call_centre [label="Call center"]
    Middle_office_work_station [label="Middle office\nworkstation"]
    Back_office_workstation [label="Back office\nworkstation"]

    subgraph cluster_Presentation_layer_2 {
        shape=record
        Presentation_layer_2 [label="Presentation\nlayer"]
        subgraph cluster_Presentation_layer_2 {
            label=""; rankdir=LR; rank=same; style=dotted;
            Broker_portal; Customer_portal; Call_centre; Middle_office_work_station; Back_office_workstation;
        };
    };

    // BUSINESS LOGIC 0
    Business_Logic_Tier_1 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]

    // BUSINESS LOGIC 1
    API_Gateway [label="API Gateway"]

    subgraph cluster_Business_Logic_Tier_1 {
        Business_Logic_Tier_1
        subgraph cluster_Business_Logic_Tier_1 {
            label=""; rank=same; style=dotted;
            API_Gateway;
        };
    };

    // BUSINESS LOGIC 2
    Business_Logic_Tier_2 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    BPM [label="Business\nProcess\nManagement"]
    BRM [label="Business\nRisks\nManagement"]

    subgraph cluster_Business_Logic_Tier_2 {
        Business_Logic_Tier_2
        subgraph cluster_Business_Logic_Tier_2 {
            label=""; rank=same; style=dotted;
            BPM; BRM;
        };
    };

    // BUSINESS LOGIC 3
    Business_Logic_Tier_3 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    Policy_management [label="Policy management"]
    Product_management [label="Product management"]
    Claims_management [label="Claims management"]
    Customer_management [label="Customer management"]
    Finance_management [label="Finance management"]

    subgraph cluster_Business_Logic_Tier_3 {
        Business_Logic_Tier_3
        subgraph cluster_Business_Logic_Tier_3 {
            label=""; rank=same; style=dotted;
            Policy_management; Product_management; Claims_management; Customer_management; Finance_management;
        };
    };

    // BUSINESS LOGIC 4
    Business_Logic_Tier_4 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    Quote [label="Quote"]
    Entreprise_services [label="Enterprise\nservices"];
    Policy [label="Policy"];
    Products [label="Products"];
    Claims [label="Claims"];
    Customers [label="Customers"];

    subgraph cluster_Business_Logic_Tier_4 {
        Business_Logic_Tier_4
        subgraph cluster_Business_Logic_Tier_4 {
            label=""; rank=same; style=dotted;
            Quote; Policy; Products; Claims; Customers; Entreprise_services;
        };
    };

    // Data Tier 1
    Data_Tier [label="Data\nTier", shape=house, sides=5, color="green", style=filled]
    Claims [label="Claims"]
    Policies [label="Policies"]
    Underwritings [label="Underwritings"]
    Bills [label="Bills"]
    Money_collection [label="Money collection"]
    Products_catalog [label="Products catalog"]
    CRM [label="Customer\nRelationship\nManagement"]
    CDI [label="Customer\nData\nIntegration"]

    subgraph cluster_Data_Tier {
        Data_Tier
        subgraph cluster_Data_Tier_1 {
            label=""; rank=same; style=dotted;
            Claims; Policies; Products_catalog; Underwritings; Bills; Money_collection; CRM; CDI;
        };
    };

    // Physical Tier 1
    Physical_Tier [label="Physical\nTier", shape=house, sides=5, color="brown", style=filled]
    Servers [label="Servers"]
    Data_storage [label="Data storage"]
    Data_centers [label="Data centers"]

    subgraph cluster_Physical_Tier {
        Physical_Tier
        subgraph cluster_Physical_Tier1 {
            label=""; rank=same; style=dotted;
            Servers; Data_storage; Data_centers;
        };
    };

    label="Digitization of the Insurance industry v2019-09-12";
}

8.1.1.1. Presentation layer I

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // PRESENTATION LAYER 0
    Presentation_layer [shape=house, sides=5, label="Presentation\nlayer", color="blue", style=filled]

    // PRESENTATION LAYER 1
    Chatbot [label="{Chatbot}"]
    SMS [label="{SMS}"]
    Robo_advisor [label="{Robo advisor}"]
    Web_Browser [label="{Web Browser}"]
    Telephone [label="{Telephone}"]
    Mobile [label="{Mobile}"]

    subgraph cluster_Presentation_layer {
        shape=record
        Presentation_layer
        subgraph cluster_Presentation_layer_1 {
            label=""; rankdir=LR; rank=same; style=dotted;
            Telephone; Robo_advisor; Mobile; SMS; Web_Browser; Chatbot;
        };
    };
}

The presentation layer connects the insurer with its customers through several channels and devices:

8.1.1.1.1. Telephone

A service desk that handles customer requests by phone.

8.1.1.1.2. Robo advisor

A Robo advisor is a class of financial adviser that provides financial advice or investment management online with moderate to minimal human intervention.

  • The Robo advisor scans and dematerializes your documents

  • The Robo advisor compares the available offers and optimizes the portfolio of your customers

  • The customer interacts with the trusted Robo advisor via the web, mobile chat or email

8.1.1.1.3. Mobile or Smartphone

Mobile phones access the insurer’s services through SMS, email, voice or a mobile app.

8.1.1.1.4. SMS

The customer interacts with the insurer using SMS. The SMS messages are answered by a Robo advisor, a Chatbot or a human. The help desk agent uses an IT system and does not necessarily answer from a mobile phone.

8.1.1.1.5. Web Browser

Software used to access the website of the insurer or the underwriter (e.g. Firefox, Opera, Ecosia, Microsoft Internet Explorer, Google Chrome, Microsoft Edge, Safari).

8.1.1.1.6. Chatbot

The Chatbot is a piece of software that conducts a conversation via auditory or textual methods.

The Chatbot tries to answer the questions a customer would otherwise ask a human in conversation. Some chatbots use sophisticated natural language processing systems, but most of the time they are backed by a service desk run by humans that takes over when the Chatbot cannot understand the customer’s request.

Chatbot - book a train

8.1.1.2. Presentation layer II

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // PRESENTATION LAYER 0
    Presentation_layer_2 [shape=house, sides=5, label="Presentation\nlayer", color="blue", style=filled]

    // PRESENTATION LAYER 2
    Broker_portal [label="Broker portal"]
    Customer_portal [label="Customer portal"]
    Call_centre [label="Call center"]
    Middle_office_work_station [label="Middle office\nworkstation"]
    Back_office_workstation [label="Back office\nworkstation"]

    subgraph cluster_Presentation_layer_2 {
        shape=record
        Presentation_layer_2 [label="Presentation\nlayer"]
        subgraph cluster_Presentation_layer_2 {
            label=""; rankdir=LR; rank=same; style=dotted;
            Broker_portal; Customer_portal; Call_centre; Middle_office_work_station; Back_office_workstation;
        };
    };
}

The presentation layer covers the front office as well as the back office. The back office is all the resources of the company that are devoted to actually producing a product or service and all the other labor that isn’t seen by customers, such as administration or logistics.

8.1.1.2.1. Back office workstation

These are the collaborators who manage the operations and ensure the correct execution of the processes:

  • quote an offer

  • validate personal data

  • validate the filling of a form

  • perform the dunning services duties

The back office requires documentation, software and reports to perform its duties.

8.1.1.2.2. Middle office workstation

The Middle office is made up of the risk managers and the information technology managers who manage risk and maintain the information resources.

  • Track claim settlement times

  • Customer satisfaction ratings

  • Long-term trends in customer activity

Data collected during operations are stored in IT systems operated by a multitude of managers (risks, operations, HR, marketing).

These data are analysed and support the value chain (logistics, operations, marketing, sales, support) by giving a broad and exact view of the financial situation of the company.

After a careful analysis, the data are shared with the back office who can act and interact with prospects, customers and suppliers depending on the situation (dunning service, quotation, billing, closing off the contract)

Tip

Try the Insurance claims analysis dashboard

Middle office: Insurance Claims Analysis (credits: syncfusion.com)

8.1.1.2.3. Call centre

Three types of call centres might be operated by a financial service company:

  • An Inbound call center is operated by a company to administer incoming product or service support or information enquiries from consumers.

  • An Outbound call center is operated for telemarketing, solicitation of charitable or political donations, debt collection, market research, emergency notifications, and urgent or critical needs such as blood banks.

  • A Contact center, a further extension of the call center, provides centralized handling of individual communications, including letters, faxes, live support software, social media, instant messaging, and e-mail.

8.1.1.2.4. Customer portal

A website accessible through a Web browser or a mobile phone enabling customers to access all aspects of their duties and rights towards the insurer.

The portal gives access to diverse functionalities:

  • Information platform: share details about the products and services, how to contact the insurer

  • Transaction platform: create, update or delete information, stop a current insurance, pay the outstanding bills electronically

  • Sales platform: generate up-sell and cross-sell opportunities, promote the Robo advisor capabilities

  • Rewards platform: Insurers retain their customers through the Perceived Value of the customer, the Affinity that the customer has with his insurer, and the Barriers to Exit

    • Perceived value: does the customer feel that (s)he has coverage at a competitive and fair price?

    • Affinity: does the customer have an emotional connection with the insurer? Insurance products may tend to have a limited value due to the commoditized nature of the product

    • Barriers to Exit: does the customer have strong and effective reasons not to leave the insurer? Examples are the lack of competition, an increase in costs, the loss of a unique protection, or a decrease in the quality of service

8.1.1.2.5. Broker portal

A broker portal is a website enabling the broker to perform her/his duties:

  • Information platform: share details about the products and services, how to contact the insurer, the customers

  • Sales platform: support the sales process (from a quote to a signed contract), generate up-sell and cross-sell opportunities

  • Marketing platform: identify new sales opportunities by advertising the products and identify the most profitable or potential prospects

  • CRM platform: maintain data related to the prospects and customers (contact details, online and offline interactions)

  • Dunning service platform: inform the broker and provide the tools to pursue unpaid invoices until the termination of the contract

8.1.1.3. Business logic tier I

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // BUSINESS LOGIC 0
    Business_Logic_Tier_1 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    API_Gateway [label="API Gateway"]

    subgraph cluster_Business_Logic_Tier_1 {
        Business_Logic_Tier_1
        subgraph cluster_Business_Logic_Tier_1 {
            label=""; rank=same; style=dotted;
            API_Gateway;
        };
    };
}

8.1.1.3.1. API Gateway

An API (Application Programming Interface) describes the functions or the interfaces available between a client and a server.

APIs are enablers of the platform economy and allow users to enhance existing products and add services on top of them.

For example, an API enables an application ‘A’ to query a system ‘B’ and collect the schedule of the public transportation (see https://opendata.stib-mivb.be/store/data).

Tip

Look at the description of the API from a dunning Service https://dunningcashflow-api.azurewebsites.net/swagger/index.html

API Documentation: example of a Dunning service

8.1.1.4. Business logic tier II

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // BUSINESS LOGIC 2
    Business_Logic_Tier_2 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    BPM [label="Business\nProcess\nManagement"]
    BRM [label="Business\nRisks\nManagement"]

    subgraph cluster_Business_Logic_Tier_2 {
        Business_Logic_Tier_2
        subgraph cluster_Business_Logic_Tier_2 {
            label=""; rank=same; style=dotted;
            BPM; BRM;
        };
    };
}

8.1.1.4.1. Business Risk Management

Financial services companies are exposed to a multitude of risks.

Here is a list of pure risks (loss or no loss only) that an insurer or an underwriter may be confronted with:

  • Regulatory Compliance: Invoice compliance, MiFID II (MiFID 2), Solvency II (Solvency 2)

  • Tax Compliance: Tax determination, Fiscal reporting, VAT reporting

  • Liability risk exposure: product liability risks, or contractual liability risks

  • Operational risk: mistakes in process and procedure

  • Intellectual property violation risk

  • Mortality and morbidity risk at the societal and global level

Warning

  • Speculative risks are not described in this document, e.g. market risk, reputational risk, brand risk, product success risk…

8.1.1.4.2. Business Process Management

Business Process Management is a discipline aimed at managing all aspects of the business processes, from process design to modeling and analysis to execution and improvement.

Note

Here is the description of a process: Data entry of a claim

digraph insurance_claim_data_entry {
    label="Insurance: Claim data entry"
    ratio="fill";
    // size="8.3,11.7!";
    node [rankdir=LR];

    start [label="Start", shape=box, style=rounded]
    a_claim_is_received [label="A claim is received"]
    is_it_new_claim [label="Is it a new claim?", shape=diamond]
    data_entry [label="Data Entry", shape=parallelogram]
    is_it_correctly_filled [label="Is it correctly filled?", shape=diamond]
    end [label="End", shape=box, style=rounded]
    administrative_worker [label="Back-end worker"]
    exception [label="Back-end worker\ndeals with the\nexception", shape=parallelogram]
    amend_data [label="Amend data", shape=parallelogram]

    start -> a_claim_is_received
    a_claim_is_received -> is_it_new_claim
    is_it_new_claim -> data_entry [label="Yes"]
    data_entry -> administrative_worker [label="The claim is\ndistributed"]
    administrative_worker -> is_it_correctly_filled [label="Submits the\n new claim"]
    is_it_new_claim -> amend_data [label="No"]
    amend_data -> is_it_correctly_filled [label="Submits the claim"]
    is_it_correctly_filled -> exception [label="No"]
    exception -> end
    is_it_correctly_filled -> end [label="Yes"]

    {rank=same; data_entry amend_data}
}
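
The claim data entry process above can also be read as a small routing function. The following Python sketch is only an illustration: the `Claim` record and its field names are hypothetical, and the step names are copied from the flowchart labels.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical claim record; the field names are assumptions."""
    reference: str
    is_new: bool
    correctly_filled: bool


def route_claim(claim: Claim) -> list:
    """Return the ordered steps a received claim goes through."""
    steps = ["A claim is received"]
    if claim.is_new:
        # A new claim is keyed in, then distributed to a back-end worker.
        steps += ["Data Entry", "Back-end worker"]
    else:
        # An existing claim is amended instead of being entered again.
        steps.append("Amend data")
    if not claim.correctly_filled:
        # An incorrectly filled claim is handled as an exception.
        steps.append("Back-end worker deals with the exception")
    steps.append("End")
    return steps
```

For instance, `route_claim(Claim("C-001", is_new=False, correctly_filled=True))` walks through the "Amend data" branch and ends without an exception.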

8.1.1.5. Business logic tier III

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // BUSINESS LOGIC 3
    Business_Logic_Tier_3 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    Policy_management [label="Policy management"]
    Product_management [label="Product management"]
    Claims_management [label="Claims management"]
    Customer_management [label="Customer management"]
    Finance_management [label="Finance management"]

    subgraph cluster_Business_Logic_Tier_3 {
        Business_Logic_Tier_3
        subgraph cluster_Business_Logic_Tier_3 {
            label=""; rank=same; style=dotted;
            Policy_management; Product_management; Claims_management; Customer_management; Finance_management;
        };
    };
}

8.1.1.5.1. Finance management
8.1.1.5.2. Customer management
8.1.1.5.3. Claims management
8.1.1.5.4. Product management
8.1.1.5.5. Policy management

8.1.1.6. Business logic tier IV

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // BUSINESS LOGIC 4
    Business_Logic_Tier_4 [label="Business\nLogic Tier", shape=house, sides=5, color="red", style=filled]
    Quote [label="Quote"]
    Entreprise_services [label="Enterprise\nservices"];
    Policy [label="Policy"];
    Products [label="Products"];
    Claims [label="Claims"];
    Customers [label="Customers"];

    subgraph cluster_Business_Logic_Tier_4 {
        Business_Logic_Tier_4
        subgraph cluster_Business_Logic_Tier_4 {
            label=""; rank=same; style=dotted;
            Quote; Policy; Products; Claims; Customers; Entreprise_services;
        };
    };
}

8.1.1.6.1. Customers
8.1.1.6.2. Claims
8.1.1.6.3. Products
8.1.1.6.4. Policies
8.1.1.6.5. Enterprise services
8.1.1.6.6. Quotes

8.1.1.7. Data tier I

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // Data Tier 1
    Data_Tier [label="Data\nTier", shape=house, sides=5, color="green", style=filled]
    Claims [label="Claims"]
    Policies [label="Policies"]
    Underwritings [label="Underwritings"]
    Bills [label="Bills"]
    Money_collection [label="Money collection"]
    Products_catalog [label="Products catalog"]
    CRM [label="Customer\nRelationship\nManagement"]
    CDI [label="Customer\nData\nIntegration"]

    subgraph cluster_Data_Tier {
        Data_Tier
        subgraph cluster_Data_Tier_1 {
            label=""; rank=same; style=dotted;
            Claims; Policies; Products_catalog; Underwritings; Bills; Money_collection; CRM; CDI;
        };
    };
}

8.1.1.7.1. CDI (Customer Data Integration)

CDI

8.1.1.7.2. CRM (Customer Relationship Management)

CRM

8.1.1.7.3. Products catalog
8.1.1.7.4. Money collection
8.1.1.7.5. Bills
8.1.1.7.6. Underwritings
8.1.1.7.7. Policies
8.1.1.7.8. Claims

8.1.1.8. Physical tier

graph digitization_insurance {
    ratio="fill";
    // size="8.3,11.7!";
    node [shape=record, fontsize=16, rankdir=LR];

    // Physical Tier 1
    Physical_Tier [label="Physical\nTier", shape=house, sides=5, color="brown", style=filled]

    subgraph cluster_Physical_Tier {
        Physical_Tier
        subgraph cluster_Physical_Tier1 {
            label=""; rank=same; style=dotted;
            Servers; Data_storage; Data_centers;
        };
    };
}

8.1.1.8.1. Data Centers
8.1.1.8.2. Data storage
8.1.1.8.3. Servers

8.2. GIT lifecycle

Description of how to manage the versions and branches in a Git repository as well as the operations of the software.

How to write relevant commits?

8.2.1. A successful git branching model

A successful Git branching model : https://nvie.com/posts/a-successful-git-branching-model/

By Vincent Driessen on Tuesday, January 05, 2010

8.2.2. GIT : commit conventions

source : conventional commits : https://github.com/conventional-commits

Semantic messages: http://seesparkbox.com/foundry/semantic_commit_messages

build
chore (maintain i.e. updating grunt tasks etc; no production code change)
ci (continuous integration)
docs (documentation)
feat (feature)
fix (bug fix)
perf (performance improvements)
refactor (refactoring production code)
revert
style (formatting, missing semi colons, …)
test (adding missing tests, refactoring tests; no production code change)
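
The types listed above can be checked mechanically before a commit is accepted. The Python sketch below is an assumption about how such a check might look: it matches the `type(scope): subject` header shape used by Conventional Commits, and the names `COMMIT_TYPES` and `parse_commit_header` are hypothetical.

```python
import re

# Commit types taken from the list above.
COMMIT_TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
                "perf", "refactor", "revert", "style", "test")

# Header shape: type, optional (scope), optional "!" for a breaking change,
# then ": " and a non-empty subject.
HEADER = re.compile(
    r"^(?P<type>%s)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
    % "|".join(COMMIT_TYPES)
)


def parse_commit_header(header):
    """Return (type, scope, subject), or None if the header does not match."""
    match = HEADER.match(header)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("subject")
```

A header such as `feat(api): add dunning endpoint` is accepted, whereas `update stuff` is rejected because it has no recognised type.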

8.2.3. GIT : how to manage the versions, branches … ?

  • GIT : Create a branch : [BRANCH-DEV] from [BRANCH-PARENT]

  • DEV : Local development on the software engineer's machine

  • MERGE GIT : Merge the [BRANCH-PARENT] into the [BRANCH-DEV]

    • The code is merged into the [BRANCH-DEV]

  • STAGING : The software is deployed on the staging environment

  • MERGE GIT : Merge the [BRANCH-DEV] into the [BRANCH-PARENT]

    • The code is merged into the [BRANCH-PARENT]

  • PROD : Test the PROD version of the software

  • LIVE : deploy the PROD version of the software on the PROD server

8.2.4. Git LFS (Large File Storage)

  • git commands

  • Install git lfs https://git-lfs.github.com

  • Locks
    • git lfs lock images/foo.jpg

    • git lfs locks

    • git lfs unlock images/foo.jpg

  • git lfs push origin master --all

8.2.4.1. Create a .gitattributes file

.. include:: .gitattributes

8.2.4.2. Commands to add files into the repository, and push the code

git lfs install
git lfs track "*.jpg" --lockable
git lfs track "*.JPG" --lockable
git lfs track "*.png" --lockable
git lfs track "*.zip" --lockable
git lfs track "*.mp4" --lockable
git lfs track "*.MP4" --lockable
git lfs track "*.docx" --lockable
git lfs track "*.svg" --lockable
git lfs track "*.gif" --lockable
git lfs track "*.psd" --lockable
git lfs track "*.sketch" --lockable
git lfs track "*.ai" --lockable

git add "*.jpg" "*.JPG" "*.png" "*.zip" "*.mp4" "*.MP4" "*.docx" "*.svg" "*.gif"

git lfs ls-files
git lfs env

git config lfs.https://inlsprl.visualstudio.com/[ProjectName]/_git/[ProjectName].git/info/lfs.locksverify true
git push origin master
git lfs push origin master --all
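
Each `git lfs track "<pattern>" --lockable` call records one line in `.gitattributes`. As a rough illustration, the Python sketch below builds the lines one would expect for the patterns tracked above; the helper name `gitattributes_lines` is hypothetical, and the result should be compared with the file that `git lfs track` actually writes in your repository.

```python
# Patterns tracked by the commands above.
PATTERNS = ["*.jpg", "*.JPG", "*.png", "*.zip", "*.mp4", "*.MP4",
            "*.docx", "*.svg", "*.gif", "*.psd", "*.sketch", "*.ai"]


def gitattributes_lines(patterns, lockable=True):
    """Build one .gitattributes line per pattern, mirroring `git lfs track`."""
    suffix = " lockable" if lockable else ""
    return ["%s filter=lfs diff=lfs merge=lfs -text%s" % (pattern, suffix)
            for pattern in patterns]


if __name__ == "__main__":
    # Print the expected content so it can be compared with .gitattributes.
    print("\n".join(gitattributes_lines(PATTERNS)))
```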

8.3. ngrok - Public URLs for exposing your local web server

Use ngrok to expose a web server running on your localhost through a public URL.

  1. Install https://ngrok.com/download

  2. Open the port on which the web server listens by running the following command:

    1. [path to ngrok]\ngrok.exe http [port to open on your localhost] -host-header=rewrite

  3. Share the URL with the person who needs to access your local machine, e.g. https://a1cc816e.ngrok.io

ngrok by @inconshreveable

Session Status                online
Account                       [the account name] (Plan: Free)
Update                        update available (version 2.2.8, Ctrl-U to update)
Version                       2.2.3
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://a1cc816e.ngrok.io -> localhost:4624
Forwarding                    https://a1cc816e.ngrok.io -> localhost:4624
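
The forwarding URLs can also be read programmatically instead of being copied from the console. The Python sketch below assumes that the ngrok agent serves a JSON list of tunnels at `/api/tunnels` on the Web Interface address shown above, with a `public_url` field per tunnel; the function names are hypothetical, and the payload shape should be checked against the ngrok version you run.

```python
import json
from urllib.request import urlopen

# Assumption: local agent API on the "Web Interface" address shown above.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def public_urls(payload):
    """Extract the public URLs from an /api/tunnels JSON payload."""
    return [tunnel["public_url"] for tunnel in payload.get("tunnels", [])]


def fetch_public_urls(api_url=NGROK_API):
    """Query the running ngrok agent; ngrok must already be started."""
    with urlopen(api_url, timeout=5) as response:
        return public_urls(json.load(response))
```

Keeping the parsing in `public_urls` separate from the network call makes it possible to test the extraction without a running ngrok session.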

8.4. How to write a bug report?

  1. Copy and paste the template below

  2. Create a new issue: https://bitbucket.org/altf1be/software-architecture/issues/new

## WHAT STEPS WILL REPRODUCE THE PROBLEM?

1. Open the page
2.
3.

## WHAT IS THE EXPECTED OUTPUT?

*  StratEx is loaded


## WHAT DO YOU SEE INSTEAD?

* The screenshot attached to this email
* StratEx cannot be opened because of a problem

## WHAT VERSION OF THE PRODUCT ARE YOU USING?

* Version: 3.5.6245.20028
* on [https://www.stratexapp.com](https://www.stratexapp.com)
* on [https://staging.stratexapp.com](https://staging.stratexapp.com)
* on [https://develop.stratexapp.com](https://develop.stratexapp.com)


## ON WHAT OPERATING SYSTEM, BROWSER, ETC.?

* Windows 7.1
   * Chrome Version 54
   * Internet Explorer 11
   * Opera Version 41
* Windows 10
   * Internet Explorer
   * Edge
* Mac OS X 10.9 (13A603)
   * Safari Version 7.0 (9537.71)
   * Chrome Version 31.0.1650.57

## PLEASE PROVIDE ANY ADDITIONAL INFORMATION BELOW.

* None
* Extra files are available on [StratExApp files on Google Drive]
* Find the private [Videos generated on GDrive]
* Find the public [Videos on StratEx YouTube channel]
* Find the public [Documentation on Read The Docs]

## Bug report (if any)

* None

[StratExApp files on Google Drive]: https://drive.google.com/a/alt-f1.be/folderview?id=0B9L2cx0TUjLGUFZBSkF6WlFCYms&usp=sharing#list
[Videos generated on GDrive]: https://drive.google.com/a/alt-f1.be/folderview?id=0B9L2cx0TUjLGa190N1ZURHBpUFE&usp=sharing
[Videos on StratEx YouTube channel]: https://www.youtube.com/channel/UCuwGfoVoozq0ZTmHJ3WCvTQ
[Documentation on Read The Docs]: http://stratexapp-docs.readthedocs.org/en/latest/

8.5. Research & Development topics

  1. Prerequisite:

  2. Single Page Application (SPA)

    • Build a SPA such as Microsoft Azure for our customers

    • Test SPA App

  3. IT Automation best practices

  • Security

8.6. Open Authentication (OAuth)

OAuth is an open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites but without giving them the passwords.

Source: Wikipedia contributors. (2019, March 19). OAuth. In Wikipedia, The Free Encyclopedia. Retrieved 12:20, March 23, 2019, from https://en.wikipedia.org/w/index.php?title=OAuth&oldid=888559139

8.6.1. Use case for Open Authentication

A user requires access to a resource on a web application (e.g. StratEx) using her credentials from another website (e.g. Microsoft Office365).

  1. She logs in using the form from Office365

  2. Office365 generates a token

  3. StratEx uses the token to confirm that the user is logged in with her Office365 credentials

  4. StratEx can use the resources made available by Office365 such as username, firstname, lastname, email address, read access to OneDrive, write and send new emails…

8.6.2. Open authentication using Office365

The Microsoft Graph documentation describes how to access the Office365 resources of each registered user.

Request an authorization code from Office 365 (the application then exchanges this code for an access token):

Description of each parameter (see use-the-authorization-code-to-request-an-access-token):

  • https://login.microsoftonline.com/common/oauth2/v2.0/authorize

  • ?client_id=f5d835b0-4bc1-98e7-f98cb4aaef31

  • &scope=https%3A%2F%2Fgraph.microsoft.com%2Fuser.read

  • &response_type=code

  • &redirect_uri=https%3A%2F%2Ftimesheet-stg-inlsprl.azurewebsites.net%2Fsignin-microsoft

  • &state=Ao8m01yi1E76wQIXPJW-F92Fq1v
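
The parameters listed above can be assembled into a single URL with the standard library, which also takes care of the percent-encoding. This is a minimal Python sketch; the function name `build_authorize_url` is hypothetical, and the values are the sample values from the list above.

```python
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"


def build_authorize_url(client_id, redirect_uri, state,
                        scope="https://graph.microsoft.com/user.read"):
    """Build the URL the user's browser is redirected to for the login form."""
    params = {
        "client_id": client_id,
        "scope": scope,
        "response_type": "code",  # ask for an authorization code
        "redirect_uri": redirect_uri,
        "state": state,  # opaque value echoed back to the redirect URI
    }
    # urlencode percent-encodes ":" and "/" in the scope and redirect URI.
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)
```

In a real application the `state` value would normally be generated per request (for example with `secrets.token_urlsafe()`) and verified when the user is redirected back.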

8.7. Web Scraping

8.7.1. Documentation

8.7.2. Scraping: Software architecture

digraph Software_architecture {
    graph [ rankdir="TB"; fontname="Avenir" ];
    edge [ fontname="Avenir" ];
    label="Software architecture"

    "Redis" [
        label = "<f0> Redis | {Urls to scrape}"
        shape = "Mrecord"
    ];
    "Crawler" [
        label = "<f0> Crawler | {Crawl Web pages | Crawl Images}"
        shape = "Mrecord"
    ]
    "Logs" [ shape="cylinder" ]
    "Monitoring" [ shape="record" ]
    "Database" [ shape="cylinder" ]

    "Crawler" -> "Database" [ label="store structured data" ]
    "Redis" -> "Crawler" [ label="Get the URL of\n the next resource" ];
    "Crawler" -> "Logs" [ label="Store\n missing data,\n network errors,\n exceptional conditions" ];
    "Monitoring" -> "Crawler" [ label="Monitor\n CPU,\n Memory,\n Disk I/O,\n Network I/O" ];
}
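
The data flow of this architecture can be sketched as a single loop. In the Python sketch below every component is a stand-in chosen for illustration: a `deque` replaces the Redis list of URLs, `fetch` is any callable returning a page title, and an `sqlite3` connection plays the role of the database. Monitoring is left out.

```python
import logging
import sqlite3
from collections import deque

logger = logging.getLogger("crawler")


def crawl(queue, fetch, db):
    """Pop URLs from the queue, fetch each page and store the result.

    Returns the number of pages stored in the database.
    """
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    stored = 0
    while queue:
        url = queue.popleft()  # Redis -> Crawler: URL of the next resource
        try:
            title = fetch(url)
        except OSError as error:  # Crawler -> Logs: network errors
            logger.warning("network error on %s: %s", url, error)
            continue
        if not title:  # Crawler -> Logs: missing data
            logger.info("missing data on %s", url)
            continue
        # Crawler -> Database: store structured data
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, title))
        stored += 1
    db.commit()
    return stored
```

Because the queue and the database are passed in, the loop can be paused and resumed by keeping the remaining URLs in the queue, which is what the Redis list provides in the diagram.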

8.7.3. Scraping: best practices

digraph SimilarityFeatures {
    label="Scraping : best practices : Data model v2020-12-26";
    graph [ rankdir="LR"; fontname="Avenir" ];
    edge [ fontname="Avenir" ];

    "Crawler_best_practices" -> "Logs_best_practices" [style=invis];
    "Crawler_best_practices" [
        label = "<f0> Crawler best practices | Monitor closely the I/Os and the Network | Resilient | can be paused | can continue crawling | multithreaded (~200 threads) | don't keep much in runtime memory | use database to store data"
        shape = "Mrecord"
    ];
    "Logs_best_practices" [
        label = "<f0> Logs | Store in files | use `tail -f` to follow the Logs | identify missing data | identify network errors | identify exceptional conditions | log the current url | Handle non-ASCII characters"
        shape = "Mrecord"
    ];

    "Invisiblity_best_practices" -> "Crawl_images_best_practices" [style=invis];
    "Invisiblity_best_practices" [
        label = "<f0> Hide yourself | Spoof the Header | Rotate IP's | Use proxies | Strip tracking query parameters"
        shape = "Mrecord"
    ];
    "Crawl_images_best_practices" [
        label = "<f0> Crawl pages | Download images directly | AVOID download through proxies | store placeholder \<no image detected\> | use placeholder to retry a download | query data with placeholder"
        shape = "Mrecord"
    ];
    "Source_code_best_practices" [
        label = "<f0> Source code | add exception handling around any code that interacts\n with the network or HTML responses | AVOID loading details' pages | Try to grab data from subcategory listings | store placeholder if data is not present"
        shape = "Mrecord"
    ];
}
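
One of the practices above, stripping tracking query parameters, is easy to illustrate. The Python sketch below is an assumption about what such a helper could look like: the list of tracking parameters (`utm_*`, `fbclid`, `gclid`) is illustrative rather than exhaustive, and the name `strip_tracking` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: a small, illustrative set of tracking parameters.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return the URL without its tracking query parameters."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Normalising URLs this way before they are queued helps avoid crawling the same page several times under different tracking parameters.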

8.7.4. Extraordinary examples

How To Scrape Amazon Product Data and Prices using Python 3

Source : https://www.scrapehero.com/tutorial-how-to-scrape-amazon-product-details-using-python-and-selectorlib/

8.7.4.1. selectors.yml

name:
    css: '#productTitle'
    type: Text
price:
    css: '#price_inside_buybox'
    type: Text
short_description:
    css: '#featurebullets_feature_div'
    type: Text
images:
    css: '.imgTagWrapper img'
    type: Attribute
    attribute: data-a-dynamic-image
rating:
    css: span.arp-rating-out-of-text
    type: Text
number_of_reviews:
    css: 'a.a-link-normal h2'
    type: Text
variants:
    css: 'form.a-section li'
    multiple: true
    type: Text
    children:
        name:
            css: ""
            type: Attribute
            attribute: title
        asin:
            css: ""
            type: Attribute
            attribute: data-defaultasin
product_description:
    css: '#productDescription'
    type: Text
sales_rank:
    css: 'li#SalesRank'
    type: Text
link_to_all_reviews:
    css: 'div.card-padding a.a-link-emphasis'
    type: Link

8.7.4.2. Amazon.py

from selectorlib import Extractor
import requests
import json
from time import sleep


# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

def scrape(url):
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }

    # Download the page using requests
    print("Downloading %s" % url)
    r = requests.get(url, headers=headers)
    # Simple check to see whether the page was blocked (usually 503)
    if r.status_code >= 500:
        if "To discuss automated access to Amazon data please contact" in r.text:
            print("Page %s was blocked by Amazon. Please try using better proxies\n" % url)
        else:
            print("Page %s must have been blocked by Amazon as the status code was %d" % (url, r.status_code))
        return None
    # Pass the HTML of the page to the extractor and return the structured data
    return e.extract(r.text)

# Read one URL per line from urls.txt and write one JSON object per line
with open("urls.txt", 'r') as urllist, open('output.jsonl', 'w') as outfile:
    for line in urllist:
        url = line.strip()  # drop the trailing newline
        if not url:
            continue  # skip blank lines
        data = scrape(url)
        if data:
            json.dump(data, outfile)
            outfile.write("\n")
            # sleep(5)

8.7.5. Data model of a scraper

digraph SimilarityFeatures {
    label="Les eshops belges : Data model v2020-12-26";
    graph [ rankdir = "LR" ];
    edge [ ];

    "leseshopsbelges" [
        label = "<f0> leseshopsbelges| id | name | <f_url> url | description | http_status | modified | created <f1>"
        shape = "record"
    ];
    "html_heads" [
        label = "<f0> html_heads| <f1> id | base | generator | http_status | link | meta | script | style | title | <f_url> url | modified | created "
        shape = "record"
    ];
    "metadatas_generators" [
        label = "<f0> metadatas_generators | <f_url> url | content "
        shape = "record"
    ];

    "leseshopsbelges":f_url -> "html_heads":f_url [ id = 0 ];
    "html_heads":f_url -> "metadatas_generators":f_url [ id = 0 ];
}


8.7.6. Scraping: Scrapy spiders or Crawlers

digraph Scrapy_Spiders {
    label="Scrapy spiders: Items, Data model v2020-12-27";
    graph [ rankdir = "LR" ];
    edge [ ];

    "WebsitesMetadataItem" [
        label = "<f0> WebsitesMetadataItem| <f1> id | base | generator | http_status | link | meta | script | style | title | <f_url> url | modified | created "
        shape = "record"
    ];
    "WebsitesHtmlMetaPropertyItem" [
        label = "<f0> WebsitesHtmlMetaPropertyItem | <f_url> url | property | content "
        shape = "record"
    ];
    "WebsitesHtmlMetaNameItem" [
        label = "<f0> WebsitesHtmlMetaNameItem | <f_url> url | name | content "
        shape = "record"
    ];

    "WebsitesMetadataItem":f_url -> "WebsitesHtmlMetaPropertyItem":f_url [ id = 0 ];
    "WebsitesMetadataItem":f_url -> "WebsitesHtmlMetaNameItem":f_url [ id = 0 ];
}
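
The three items of this data model could be declared as plain Python dataclasses, which recent Scrapy versions accept as items. The sketch below is an illustration only: the class and field names come from the diagram, while the field types, the defaults and the helper `meta_name_items` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebsitesMetadataItem:
    url: str  # joins this item to the two meta items below
    id: Optional[int] = None
    base: Optional[str] = None
    generator: Optional[str] = None
    http_status: Optional[int] = None
    link: Optional[str] = None
    meta: Optional[str] = None
    script: Optional[str] = None
    style: Optional[str] = None
    title: Optional[str] = None
    modified: Optional[str] = None
    created: Optional[str] = None


@dataclass
class WebsitesHtmlMetaPropertyItem:
    url: str
    property: str
    content: str


@dataclass
class WebsitesHtmlMetaNameItem:
    url: str
    name: str
    content: str


def meta_name_items(page, named_meta):
    """Build one WebsitesHtmlMetaNameItem per <meta name=...> of a page."""
    return [WebsitesHtmlMetaNameItem(page.url, name, content)
            for name, content in named_meta.items()]
```

Every derived item carries the `url` of its page, which mirrors the `url` to `url` edges in the diagram.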

8.7.7. Similarity features

Jaccard: https://en.wikipedia.org/wiki/Jaccard_index

TF-IDF, term frequency–inverse document frequency: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
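
Both measures are short enough to sketch in plain Python. The Jaccard index is the size of the intersection divided by the size of the union of two sets. For TF-IDF, several weighting variants exist; the sketch below assumes the basic one (relative term frequency multiplied by the natural logarithm of N / document frequency). The function names are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def tf_idf(documents):
    """Map each tokenised document to a {term: weight} dictionary."""
    n = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({term: (count / len(doc)) * math.log(n / df[term])
                        for term, count in counts.items()})
    return weights
```

With this variant, a term that occurs in every document gets a weight of zero, since log(N / N) = 0.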

Mann–Whitney U test: a nonparametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X.

Kolmogorov–Smirnov test: in statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous), one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
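
The statistics behind both tests can be computed directly from two samples. The Python sketch below computes only the test statistics, not the p-values, and uses straightforward quadratic-time definitions for readability; the function names are hypothetical.

```python
def mann_whitney_u(xs, ys):
    """U statistic of xs: pairs (x, y) with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)


def ks_statistic(xs, ys):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    def ecdf(sample, value):
        # Fraction of the sample that is less than or equal to value.
        return sum(1 for s in sample if s <= value) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in set(xs) | set(ys))
```

Under the null hypothesis of the Mann–Whitney test, U is expected to be close to half the number of pairs, `len(xs) * len(ys) / 2`; a K–S statistic of 0 means that the two empirical distributions coincide.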

digraph SimilarityFeatures {
    label="Similarity features v2020-12-23. Source: 2018-02-03-AAAI-KG-Tutorial-CK.pptx";

    Similarity_Features [label="Similarity_Features"];
    Attribute_names_similarity [label="Attribute_names_similarity"];
    Jaccard [label="Jaccard"];
    Value_Similarity [label="Value_Similarity"];
    Distribution_Similarity [label="Distribution_Similarity"];
    Mann_Whitney_test [label="Mann_Whitney_test"];
    Kolmogorov_Smirnov_test [label="Kolmogorov_Smirnov_test"];
    Histogram_Similarity [label="Histogram_Similarity"];
    TF_IDF [label="TF-IDF"];

    Similarity_Features -> Attribute_names_similarity [label=""]
    Attribute_names_similarity -> Jaccard [label=""]
    Similarity_Features -> Value_Similarity [label=""]
    Value_Similarity -> TF_IDF [label=""]
    Value_Similarity -> Jaccard [label=""]
    Similarity_Features -> Distribution_Similarity [label=""]
    Distribution_Similarity -> Mann_Whitney_test [label=""]
    Distribution_Similarity -> Kolmogorov_Smirnov_test [label=""]
    Similarity_Features -> Histogram_Similarity [label=""]
    Histogram_Similarity -> Mann_Whitney_test [label=""]
}

Source : AAAI 2018 Tutorial Building Knowledge Graphs

8.8. Authentic sources

8.8.1. Geography - Countries