Construction d'APIs Géographiques à base d'Open Data, PostgreSQL & PostGIS
Sébastien Prunier - @sebprunier
Guillaume Soldera - @gsoldera
#DevoxxFR
Construction d'APIs Géographiques à base d'Open Data, PostgreSQL & PostGIS
#DevoxxFR
Guillaume
Sébastien
Développeur @SerliFr
@gsoldera
Développeur @SerliFr
@sebprunier
Le contexte
« Êtes-vous au courant des risques naturels et technologiques autour de chez vous ? »
Cibles
Le grand public
Les professionnels
Beaucoup de sources de données
Inconvénients
Notre mission
Les Open Data
Open Data en France
Open Data en France
4ème place
score : 72/100
Open Data en France
Open Data en France
2ème place
Open Data en France
Les formats de jeux de données géographiques
4 formats principaux
CSV
JSON
GeoJSON
GeoJSON - Notion de base
GeoJSON - Geometry
{
"type": "Point",
"coordinates": [30, 10]
}
Point
GeoJSON - Geometry
{
"type": "MultiPoint",
"coordinates": [
[10, 40], [40, 30], [20, 20], [30, 10]
]
}
Ensemble de points
GeoJSON - Geometry
{
"type": "LineString",
"coordinates": [
[30, 10], [10, 30], [40, 40]
]
}
Segment
GeoJSON - Geometry
{
"type": "MultiLineString",
"coordinates": [
[
[10, 10], [20, 20], [10, 40]
],
[
[40, 40], [30, 30], [40, 20], [30, 10]
]
]
}
Ensemble de segments
GeoJSON - Geometry
{
"type": "Polygon",
"coordinates": [
[
[30, 10], [40, 40], [20, 40], [10, 20], [30, 10]
]
]
}
Polygone
GeoJSON - Geometry
{
"type": "Polygon",
"coordinates": [
[
[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]
],
[
[20, 30], [35, 35], [30, 20], [20, 30]
]
]
}
Polygone (avec trous)
GeoJSON - Geometry
{
"type": "MultiPolygon",
"coordinates": [
[
[
[30, 20], [45, 40], [10, 40], [30, 20]
]
],
[
[
[15, 5], [40, 10], [10, 20], [5, 10], [15, 5]
]
]
]
}
Ensemble de polygones
GeoJSON - Feature
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [102.0, 0.5]
},
"properties": {
"prop": "value"
}
}
GeoJSON - FeatureCollection
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [102.0, 0.5]
},
"properties": {
"prop": "value"
}
},
{
"type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
]
},
"properties": {
"prop1": 0.0
}
}
]
}
GeoJSON - Visualisation
Shapefile
Shapefile
Shapefile - Visualisation
Shapefile vs. GeoJSON
| Shapefile | GeoJSON |
Type de fichier | Binaire | Texte |
Taille de fichier (Ex: contour des communes) | ✅ Optimal (420 Mo) | ⚠️ > Shapefile (1.12 Go) |
Visualisation des données | ⚠️ Nécessite un outil | ✅ Simple |
Traitement des données | ⚠️ Nécessite un outil | ✅ Simple |
La représentation des
données géospatiales
Systèmes de coordonnées
Système de coordonnées géographiques
EPSG.IO
Système de coordonnées projetées
⇒ Plus facilement exploitables que des valeurs angulaires
⇒ ⚠️ les calculs de distance/surface ne prennent pas en compte la courbure de la Terre
Système de coordonnées projetées
Système de coordonnées projetées
(1,1)
Système de coordonnées projetées
(5,3)
Système de coordonnées projetées
Système de coordonnées projetées
Projection
azimutale
Projection
conique
Projection
cylindrique
Projection Mercator (cylindrique)
EPSG.IO
Projection de Peters (cylindrique)
Projection Lambert 93 (conique)
EPSG.IO
Système de coordonnées
Système et projection | Code | Système de projection | Longitude (degré) X (mètres) | Latitude (degré) Y (mètres) |
WGS84 | EPSG:4326 | Ellipsoïdal | 2.351000 | 48.856700 |
Pseudo-Mercator | EPSG:3857 | Cartésien | 261712.12 | 6250581.27 |
RGF93 Lambert93 | EPSG:2154 | Cartésien | 652381.07 | 6862047.10 |
Coordonnées de Paris
⚠️ S’assurer que les données sont dans le même système de projection avant de les traiter
Le stockage des
données géographiques dans PostgreSQL avec PostGIS
PostgreSQL
PostGIS
CREATE EXTENSION IF NOT EXISTS postgis;
SELECT postgis_full_version();
Tables « Metadata »
PostGIS - SPATIAL_REF_SYS
SELECT *
FROM spatial_ref_sys;
SRID : Spatial Reference IDentifier
SRID
SELECT
ST_SRID('POINT(2.5559 49.0083)'::geography) as default_srid_geography,
ST_SRID('POINT(2.5559 49.0083)'::geometry) as default_srid_geometry
;
-- default_srid_geography : 4326
-- default_srid_geometry : 0
SRID
SELECT ST_Equals(
ST_GeomFromText('POINT(0 0)', 4326),
ST_GeomFromText('POINT(0 0)', 26918)
);
ERROR: ST_Equals: Operation on mixed SRID geometries (Point, 4326) != (Point, 26918)
SRID
⚠️ Attention : les index sont construits en utilisant les SRID des « geometries » stockées, donc si les transformations de SRID sont faites au moment de la requête, l’index risque de ne pas être utilisé
Geometry vs Geography
Geometry
SELECT ST_Distance(
'SRID=4326;POINT(-118.4079 33.9434)'::geometry, -- Los Angeles (LAX)
'SRID=4326;POINT(2.5559 49.0083)'::geometry -- Paris (CDG)
) as distance_geom;
-- Résultat : 121.898285970107
Geometry
Crédits : Boundless - Paul Ramsey
Coordonnées « lat / lon »
Crédits : Boundless - Paul Ramsey
Geography
SELECT ST_Distance(
'SRID=4326;POINT(-118.4079 33.9434)'::geography, -- Los Angeles (LAX)
'SRID=4326;POINT(2.5559 49.0083)'::geography -- Paris (CDG)
) as distance_geog;
-- Résultat : 9124665.27317673
Geography
Crédits : Boundless - Paul Ramsey
Geometry
SELECT ST_Distance(
ST_GeometryFromText('SRID=4326;Point(-118.4079 33.9434)'), -- LAX (Los Angeles)
ST_GeometryFromText('SRID=4326;Point(139.733 35.567)') -- NRT (Tokyo/Narita)
) as distance_geom;
-- Résultat : 258.146005837336
⇒ distance 2 fois plus grande que Paris / Los Angeles
Geometry
Crédits : Boundless - Paul Ramsey
Geography
SELECT ST_Distance(
ST_GeographyFromText('Point(-118.4079 33.9434)'), -- LAX (Los Angeles)
ST_GeographyFromText('Point(139.733 35.567)') -- NRT (Tokyo/Narita)
) as distance_geog;
-- Résultat : 8833954.77277118
⇒ distance moins grande que Paris / Los Angeles
Geography
Crédits : Boundless - Paul Ramsey
Geometry - Distances correctes
SELECT
ST_Distance(
ST_GeographyFromText('Point(-118.4079 33.9434)'), -- LAX
ST_GeographyFromText('Point(139.733 35.567)')) -- NRT (Tokyo/Narita)
AS geography_distance,
ST_DistanceSphere(
ST_GeometryFromText('SRID=4326;Point(-118.4079 33.9434)'), -- LAX
ST_GeometryFromText('SRID=4326;Point(139.733 35.567)')) -- NRT (Tokyo/Narita)
AS geometry_distance_sphere;
-- Résultat geography_distance : 8833954.77277118
-- Résultat geometry_distance_sphere : 8814948.74627491
Geometry - Distances correctes
SELECT
ST_Distance(
ST_GeographyFromText('Point(-118.4079 33.9434)'), -- LAX
ST_GeographyFromText('Point(139.733 35.567)')) -- NRT (Tokyo/Narita)
AS geography_distance,
ST_DistanceSpheroid(
ST_GeometryFromText('SRID=4326;Point(-118.4079 33.9434)'), -- LAX
ST_GeometryFromText('SRID=4326;Point(139.733 35.567)'), -- NRT (Tokyo/Narita)
'SPHEROID["WGS 84",6378137,298.257223563]')
AS geometry_distance_spheroid
;
-- Résultat geography_distance : 8833954.77277118
-- Résultat geometry_distance_spheroid : 8833954.772771182
Geometry - Distances correctes
SELECT
ST_Distance(
ST_GeographyFromText('SRID=4326;POINT(2.3522219 48.856614)'), -- Paris
ST_GeographyFromText('SRID=4326;POINT(0.340375 46.580224)') -- Poitiers
) as distance_geog,
ST_Distance(
ST_Transform(ST_GeometryFromText('SRID=4326;POINT(2.3522219 48.856614)'), 2154),
ST_Transform(ST_GeometryFromText('SRID=4326;POINT(0.340375 46.580224)'), 2154)
) as distance_geom;
-- Résultat distance_geog : 294669.87507824
-- Résultat distance_geom : 294475.09209107
Geometry vs Geography
Démo !
#DevoxxFR
Fonctions de base
Colonne geography
CREATE TABLE airports (
code VARCHAR(3),
geog GEOGRAPHY
);
CREATE TABLE airports (
code VARCHAR(3),
geog GEOGRAPHY(Point)
);
INSERT INTO airports VALUES ('LAX', 'POINT(-118.4079 33.9434)');
Colonne geometry
CREATE TABLE airports (
code VARCHAR(3),
geom GEOMETRY
);
CREATE TABLE airports (
code VARCHAR(3),
geom GEOMETRY(Point, 4326)
);
INSERT INTO airports VALUES ('LAX', 'SRID=4326;POINT(-118.4079 33.9434)');
Vue geometry_columns
SELECT *
FROM geometry_columns;
PostGIS - Insertion
INSERT INTO airports
VALUES (
'LAX',
'SRID=4326;POINT(-118.4079 33.9434)'
);
INSERT INTO airports
VALUES (
'CDG',
ST_SetSRID(
ST_GeomFromGeoJSON(
'{"type": "Point", "coordinates": [2.5559, 49.0083]}'
),
4326
)
);
Les indexes
Index
Spatial Index - Fonctionnement
Spatial Index - Fonctionnement
Quelle(s) ligne(s) intersecte(nt) l’étoile verte ?
Spatial Index - Fonctionnement
Quelle(s) lignes intersectent l’étoile verte ?
Spatial Index - Fonctionnement
Quelle(s) boîte(s) intersectent la boîte verte ?
Spatial Index - Fonctionnement
Quelle(s) lignes intersectent l’étoile verte ?
Performance sans index
B
C
D
E
F
H
G
I
K
L
M
A
B
C
D
E
F
G
H
I
J
K
L
M
13 “boîtes” à vérifier
Trouver quel objet contient
A
J
Performance avec index
A
B
C
D
E
F
H
G
I
J
K
L
M
A
B
C
D
E
F
G
H
I
J
K
L
M
Trouver quel objet contient
N
N
P
P
Q
Q
R
R
S
S
T
T
U
U
Performance avec index
T
P
N
C
D
E
Q
N
A
B
F
G
U
S
R
H
I
J
K
L
M
A
B
C
D
E
F
G
H
I
J
K
L
M
8 “boîtes” à vérifier
Trouver quel objet contient
P
Q
R
S
T
U
Les formats de représentations
Représentations principales
POINT(2.3522219 48.856614)
SRID=4326;POINT(2.3522219 48.856614)
{"type":"Point","coordinates":[2.5559,49.0083]}
Autres représentations
<gml:Point>
<gml:coordinates>2.3522219,48.856614</gml:coordinates>
</gml:Point>
cx="2.5559" cy="-49.0083"
PostGIS - Formats
Format | Entrée | Sortie |
WKT | ✅ | ✅ |
WKB | ✅ | ✅ |
GeoJSON | ✅ | ✅ |
SVG | ❌ | ✅ |
GML | ✅ | ✅ |
MVT (Mapbox Vector Tiles) | ❌ | ✅ |
Shapefile | ❌ | ❌ |
La problématique
PostgreSQL
PostGIS
Shapefile
(.shp)
WKT ?
WKB ?
GeoJSON ?
GML ?
❌
✅
?
Les outils de transformation
ogr2ogr
ogr2ogr \
-t_srs EPSG:4326
-f GeoJSON cities.json \
cities.shp
Format et fichier cibles
Fichier source
Projection cible
Simplification
Précision des coordonnées
Source : https://xkcd.com/2170/
Démo !
#DevoxxFR
Simplification
⚠️ Transformation & Simplification
SELECT ST_AsGeoJSON(ST_Intersection(cc.contour, i.area))
FROM inondation i
INNER JOIN commune_contour cc ON ST_Intersects(cc.contour, i.area)
WHERE cc.typecom = 'COM' AND cc.com = '65275'
lwgeom_intersection_prec: GEOS Error: TopologyException: Input geom 0 is invalid: Self-intersection at 0.20284285714285696 43.048653571428574
Self-intersection
SELECT ST_IsValidReason(contour), ST_AsGeoJSON(contour)
FROM commune_contour
WHERE ST_IsValid(contour) = false
Self-intersection
ST_MakeValid
UPDATE commune_contour
SET contour = ST_MakeValid(contour)
WHERE ST_IsValid(contour) = false
L’insertion des données
Choix
Cas d’usage
PPRN - Données
Approche
PostgreSQL
data.gouv.fr
curl
shp
csv
unzip
sql
sql
node
node
geo
json
ogr2ogr
psql
sql
schémas, indexes,
contraintes
psql
Approche
Création des schémas
Documentation des données
Documentation des données
Documentation des données
INSERT INTO Data_Provider(id, name, url, description)
VALUES ('ministere-de-la-transition-ecologique',
'Ministère de la Transition écologique',
'https://www.ecologie.gouv.fr/',
NULL);
INSERT INTO Data_Source(id, name, url, description, publication_date, data_provider_id)
VALUES ('georisques',
'Géorisques',
'https://www.georisques.gouv.fr/donnees/bases-de-donnees',
'Géorisques est le site de référence sur les risques majeurs naturels et technologiques',
'2020-09-01',
'ministere-de-la-transition-ecologique');
INSERT INTO Table_Data_Source(table_name, data_source_id) VALUES ('zone_pprn', 'georisques');
Documentation des données
COMMENT ON TABLE zone_pprn IS 'Liste des zones des PPRN';
COMMENT ON COLUMN zone_pprn.id_zone IS 'Identifiant de la zone PPRN';
COMMENT ON COLUMN zone_pprn.id_gaspar IS 'Identifiant GASPAR du PPRN';
COMMENT ON COLUMN zone_pprn.nom IS 'Nom de la zone (intitulé exact figurant dans le règlement du PPRN)';
COMMENT ON COLUMN zone_pprn.codezone IS 'Code de la zone (intitulé exact figurant dans le règlement du PPRN)';
COMMENT ON COLUMN zone_pprn.typereg IS 'Type standardisé caractérisant la nature de la réglementation prévalant sur la zone';
COMMENT ON COLUMN zone_pprn.urlfic IS 'URL permettant d''accéder depuis le web au fichier contenant le règlement de la zone';
COMMENT ON COLUMN zone_pprn.area IS 'Zone géographique du PPRN';
Création des schémas
CREATE TABLE zone_pprn
(
id_zone TEXT NOT NULL PRIMARY KEY,
id_gaspar character varying(20) NOT NULL,
nom TEXT,
codezone TEXT,
typereg character varying(2) NOT NULL,
urlfic TEXT,
area geometry NOT NULL
);
CREATE INDEX zone_pprn_id_gaspar_idx ON zone_pprn (id_gaspar);
CREATE INDEX zone_pprn_area_idx ON zone_pprn USING gist (area);
Insertion des data
Insertion des data CSV
function buildInsertQuery(rows) {
return `INSERT INTO document_pprn(id_gaspar, nom, …) VALUES ${rows.join(',')};`
}
const fs = require('fs');
const csv = require('csv-parser');
fs.createReadStream(document_pprn_file)
.pipe(csv({separator: ';'}))
.on('data', (data) => {
const {id_gaspar,nom,...} = data;
rows.push(`('${id_gaspar}', ${handleStringValue(nom)}, …)`);
if (rows.length === bulkSize) {
console.log(buildInsertQuery(rows));
rows = [];
}
})
…
Insertion des data géographiques
const pipeline = chain([
fs.createReadStream(inputFileName),
Pick.withParser({filter: 'features'}),
streamArray()
]);
pipeline.on('data', data => {
const feature = data.value;
const geom = `ST_SetSRID(ST_GeomFromGeoJSON('${JSON.stringify(feature.geometry)}'), 4326)`
const props = featureMapper(feature.properties).map(property => buildColumnValue(property))
props.push(geom)
const toInsert = `(${props.join(",")})`;
rows.push(toInsert)
if (rows.length === bulkSize) {
console.log(buildInsertQuery(rows, sqlTableTemplate));
rows = [];
}
});
…
Avantages / Inconvénients
Comment réduire le temps d’exécution
Commande COPY
COPY <nom_table> FROM '<chemin_fichier>' DELIMITER ';' CSV HEADER;
Instruction \copy
Approche n°2
PostgreSQL
data.gouv.fr
curl
shp
csv
unzip
\copy
sql
schémas
sql
indexes, contraintes
psql
ogr2ogr
csv
csv
csv
mlr
mlr
Approche n°2
Génération des CSV à partir des SHP
ogr2ogr \
-t_srs EPSG:4326 \
-f CSV zone_pprn_17.csv \
zone_pprn_s_017.shp \
-lco SEPARATOR=SEMICOLON \
-lco GEOMETRY=AS_WKT \
-lco GEOMETRY_NAME=geom_latlon -lco CREATE_CSVT=YES
Cible
Source
Format d’export de la géométrie
Permet de spécifier le nom de la colonne qui contiendra la géométrie
Exemple de fichier CSV généré
geom_latlon;id_zone;id_gaspar;nom;...
"POLYGON (...)";RNATZR000000000039802;17DDTM19960003;"Zone réglementaire..";...
Manipulation des CSV
Démo !
#DevoxxFR
Transformation des CSV
mlr \
--icsv --ifs ';' \
--ocsv --ofs ';' \
cut -o -f id_zone,id_gaspar,nom,codezone,typereg,urlfic,geom_latlon \
zone_pprn_17.csv \
| mlr \
--icsv --ifs ';' \
--ocsv --ofs ';' put '
$geom_latlon = "SRID=4326; ". $geom_latlon .""
' > zone_pprn_transformed.csv
Transformation des CSV
mlr \
--icsv --ifs ';' --ocsv --ofs ';' put '
if ($coderisque == 9999999) {
$coderisque = "".$num_alea."00000"
} else {
$coderisque = $coderisque
}
' multirisque_pprn_017.csv \
| mlr --icsv --ifs ';' --ocsv --ofs ';' cut -o -f id_gaspar,coderisque \
> multirisque_pprn_light.csv
Insertion des données
export PGPASSWORD=******
export pg_cmd_base="psql -h XXX.XXX.XX.XX -p 5400 -U postgres -d demo_devoxx"
echo "copy multirisque_pprn"
FILE="$(ls input/n_zonages_risque_naturel/multirisque_pprn_light.csv)"
FULL_PATH="$(pwd)/$FILE"
CMD="$pg_cmd_base -c \"\COPY multirisque_pprn FROM '$FULL_PATH' DELIMITER ';' CSV HEADER;\""
eval "$CMD"
Avantages / Inconvénients
Les API
Enjeux de nos API
Data
API
Applications clients finaux
Partenaires
B2C
B2B
Bonnes pratiques
Points d’attention particuliers
Source : @FGRibreau
Nommage et Cohérence
Nommage
« There are only two hard things in Computer Science: cache invalidation and naming things. »
(Phil Karlton)
#DevoxxFR
Arborescence de nos ressources
Nommage
Cohérence des endpoints
Cohérence des endpoints
Par défaut, les zones ne sont pas remontées dans les payloads (trop volumineux). Pour récupérer les zones afin de les visualiser (sur une carte par exemple) :
Design des réponses
& Utilisabilité
Utilisabilité
/v1/argiles/zones?code_insee=17300
???
/v1/argiles/_metadata
Utilisabilité
/v2/argiles/zones?code_insee=17300
Design des réponses
Design des réponses
Versioning
Versioning
Versioning
PostgREST
API Clients
v1
PostgREST
Versioning & Code Style
Versioning & Coding Style
@Override
public JsonNode toJson(ObjectMapper objectMapper) {
ObjectNode node = objectMapper.createObjectNode();
node.put("ref", ref);
descriptionOpt.ifPresent(description -> node.put("description", description));
if (!wcomList.isEmpty()) {
node.set("wcom", JsonUtils.toJsonStringArray(wcomList, objectMapper));
}
node.put("externalUrl", "https://www.pop.culture.gouv.fr/notice/merimee/" + ref);
areaOpt.ifPresent(area -> {
try {
JsonNode areaNode = objectMapper.readTree(area);
node.set("geometry", areaNode);
} catch (JsonProcessingException e) {
node.set("geometry", NullNode.getInstance());
}
});
return node;
}
Pagination
Pagination
self
first
last
prev
next
Pagination
➜ curl http://localhost:8080/api/v1/territoires/communes --include
HTTP/1.1 200
Link: </api/v1/territoires/communes?page=2>; rel="next", � </api/v1/territoires/communes?page=1510>; rel="last", � </api/v1/territoires/communes?page=1>; rel="self"
X-Total-Count: 37742
vary: accept-encoding
Content-Type: application/json;charset=UTF-8
Content-Length: 29985
Date: Fri, 11 Mar 2022 13:31:49 GMT
Pagination
➜ curl "http://localhost:8080/api/v1/territoires/communes?page=42" --include
HTTP/1.1 200
Link: </api/v1/territoires/communes?page=43>; rel="next",� </api/v1/territoires/communes?page=1510>; rel="last", � </api/v1/territoires/communes?page=41>; rel="prev", � </api/v1/territoires/communes?page=42>; rel="self", � </api/v1/territoires/communes?page=1>; rel="first"
X-Total-Count: 37742
vary: accept-encoding
Content-Type: application/json;charset=UTF-8
Content-Length: 28307
Date: Fri, 11 Mar 2022 13:45:02 GMT
Pagination
➜ curl "http://localhost:8080/api/v1/territoires/communes?page=42&per_page=200" --include
HTTP/1.1 200
Link: </api/v1/territoires/communes?per_page=200&page=43>; rel="next", � </api/v1/territoires/communes?per_page=200&page=378>; rel="last", � </api/v1/territoires/communes?per_page=200&page=41>; rel="prev", � </api/v1/territoires/communes?per_page=200&page=42>; rel="self", � </api/v1/territoires/communes?per_page=200&page=1>; rel="first"
X-Total-Count: 37742
vary: accept-encoding
Content-Type: application/json;charset=UTF-8
Content-Length: 118156
Date: Fri, 11 Mar 2022 13:46:54 GMT
Documentation
Documentation
Documentation
Documentation
Quelques problématiques autour d’Open API (Swagger)
responses:
'200':
description: OK - Retourne le détail du fournisseur de données.
content:
application/json:
schema:
$ref: 'schemas.yml#/components/schemas/DataProvider'
Documentation
Quelques problématiques autour d’Open API (Swagger)
responses:
'200':
description: OK - Retourne l'ensemble des risques associés au point géographique donné.
content:
application/x-ndjson:
schema:
type: array
items:
$ref: '#/components/schemas/BulkRisquesOutput'
Documentation
La documentation est rendue via un template HTML Spring Boot grâce à la lib JS SwaggerUI
<body>
<div id="swagger-ui"></div>
<script src="doc/javascripts/swagger-ui-bundle.js" charset="UTF-8"></script>
<script src="doc/javascripts/swagger-ui-standalone-preset.js" charset="UTF-8"></script>
<script>
window.onload = function () {
const ui = SwaggerUIBundle({
url: "doc/swagger.yml",
dom_id: '#swagger-ui',
deepLinking: true,
presets: [
SwaggerUIBundle.presets.apis,
SwaggerUIStandalonePreset
],
plugins: [
SwaggerUIBundle.plugins.DownloadUrl
],
layout: "StandaloneLayout"
})
window.ui = ui
}
</script>
</body>
@Controller
public class DocumentationController {
@GetMapping("/doc")
public String doc() {
return "documentation";
}
}
Testing
Testing
TestContainers
TestContainers
@Testcontainers
public class PaginatedQueryTests {
@Container
private static final PostgreSQLContainer<?> PG_CONTAINER = new PostgreSQLContainer<>("postgres:12")
.withUsername("xxx_user_test")
.withPassword("xxx_user_test")
.withDatabaseName("xxx_db_test")
.withExposedPorts(5432)
.withInitScript("utils/pagination/paginated-query-tests.sql");
2022-04-11 10:23:15.390 INFO 48343 --- [ main] 🐳 [postgres:12] : Creating container for image: postgres:12
2022-04-11 10:23:15.459 INFO 48343 --- [ main] 🐳 [postgres:12] : Starting container with ID: e91faa1e5ea405fc581331d09b49d0537fba3ba7327a66127221367d623b20a2
2022-04-11 10:23:15.923 INFO 48343 --- [ main] 🐳 [postgres:12] : Container postgres:12 is starting: e91faa1e5ea405fc581331d09b49d0537fba3ba7327a66127221367d623b20a2
2022-04-11 10:23:17.277 INFO 48343 --- [ main] 🐳 [postgres:12] : Container postgres:12 started in PT1.887176S
2022-04-11 10:23:17.280 INFO 48343 --- [ main] org.testcontainers.ext.ScriptUtils : Executing database script from utils/pagination/paginated-query-tests.sql
2022-04-11 10:23:17.360 INFO 48343 --- [ main] org.testcontainers.ext.ScriptUtils : Executed database script from utils/pagination/paginated-query-tests.sql in 80 ms.
Tests End-To-End
test('GET /v1/urbanisme/monuments-historiques/_metadata', async () => {
const response = await request.get('/v1/urbanisme/monuments-historiques/_metadata');
expect(response.status).toBe(200);
const expectedData = {
data_sources: [
{
id: "monuments-historiques",
name: "Monuments Historiques",
url: "https://data.culture.gouv.fr/[...]-monuments-historiques/",
description: "Liste et localisation des Immeubles protégés au titre des Monuments Historiques",
publication_date: "2020-12-20",
provider: {
id: "ministere-de-la-culture",
name: "Ministère de la Culture",
url: "https://data.culture.gouv.fr/",
description: "La plate-forme de données ouvertes du ministère de la Culture"
}
}
]
};
expect(response.body).toEqual(expectedData);
});
Les problèmes liés à la précision
PostGIS 2.5
PostGIS 3.x
La sécurité
API Management
Otoroshi
Data
API
Applications clients/soc.
Partenaires
B2C
B2B
Otoroshi
Otoroshi
Otoroshi
Analytics & Monitoring
Analytics
Otoroshi
Data
API
Applications clients/soc.
Partenaires
B2B
Events
Dataviz
Analytics
Analytics
Metrics Clever Cloud
Metrics Clever Cloud
Monitoring
Alerting
Les performances
Performances
Kent Beck (extreme programming) :
Pourquoi fast ?
API « Résumé »
GET /api/v1/risques
?lat=48.768837219&lon=1.968329122
&fields=naturels.secheresses,naturels.tempetes
Résumé
Risques
Sécheresses
Inondations
Tempêtes
Centrales Nucléaires
❌
❌
API _bulk
POST /api/v1/risques/_bulk
&fields=naturels.secheresses,naturels.tempetes
Résumé
Risques
Sécheresses
Inondations
Tempêtes
Centrales Nucléaires
Résumé
Risques
Sécheresses
Inondations
Tempêtes
Centrales Nucléaires
Résumé
Risques
Sécheresses
Inondations
Tempêtes
Centrales Nucléaires
Bulk
Risques
{"id":"1","lat":48.768837,"lon":1.968329}
{"id":"2","lat":48.346621,"lon":0.476500}
{"id":"3","lat":47.711751,"lon":-3.469709}
❌
❌
❌
❌
❌
❌
lat=48.768837
lon=1.968329
lat=48.346621
lon=0.476500
lat=47.711751
lon=-3.469709
API _bulk
curl --location --request POST 'http://localhost:8080/api/v1/risques/_bulk?fields=naturels.secheresses.scores' \
--header 'Content-Type: application/x-ndjson' \
--data-raw '{"id":"6982a745-1284-4e07-a628-e7d567560773","lat":48.7688372194,"lon":1.9683291223,"radius":1000}
{"id":"036aee0d-74b6-40fc-910e-8f5f7687983c","lat":48.3466211545,"lon":0.4765005545,"radius":1000}
{"id":"be63380e-5fbf-4e97-991e-b7f7694e70e0","lat":47.7117511724,"lon":-3.4697094768,"radius":1000}
{"id":"41a01f64-e82c-4941-a493-593d39abd98a","lat":42.6923137272,"lon":2.8918979363,"radius":1000}
{"id":"9299b0d0-6d57-4c36-971a-a5732bee11ad","lat":48.8601300090,"lon":2.3694592772,"radius":1000}'
{"id":"036aee0d-74b6-40fc-910e-8f5f7687983c","status":200,"risques":{"naturels":{"secheresses":{"score":4,"color":"#e9352e","label":"Fort"}}}}
{"id":"6982a745-1284-4e07-a628-e7d567560773","status":200,"risques":{"naturels":{"secheresses":{"score":5,"color":"#68389b","label":"Très fort"}}}}
{"id":"be63380e-5fbf-4e97-991e-b7f7694e70e0","status":200,"risques":{"naturels":{"secheresses":{"score":1,"color":"#a0cd63","label":"A priori nul ou hors zonage"}}}}
{"id":"41a01f64-e82c-4941-a493-593d39abd98a","status":200,"risques":{"naturels":{"secheresses":{"score":4,"color":"#e9352e","label":"Fort"}}}}
{"id":"9299b0d0-6d57-4c36-971a-a5732bee11ad","status":200,"risques":{"naturels":{"secheresses":{"score":1,"color":"#a0cd63","label":"A priori nul ou hors zonage"}}}}
id de corrélation
Hey !
hey -n 500 -c 50 "http://localhost:8080/api/v1/risques/pprn/zones?lat=46.15654&lon=-1.15420"
Summary:
Total: 0.8334 secs
Slowest: 0.3070 secs
Fastest: 0.0070 secs
Average: 0.0727 secs
Requests/sec: 599.9446
Response time histogram:
0.007 [1] |
0.037 [174] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.067 [113] |■■■■■■■■■■■■■■■■■■■■■■■■■■
0.097 [56] |■■■■■■■■■■■■■
0.127 [66] |■■■■■■■■■■■■■■■
0.157 [38] |■■■■■■■■■
0.187 [34] |■■■■■■■■
0.217 [11] |■■■
0.247 [5] |■
0.277 [1] |
0.307 [1] |
Latency distribution:
10% in 0.0141 secs
25% in 0.0274 secs
50% in 0.0563 secs
75% in 0.1128 secs
90% in 0.1602 secs
95% in 0.1812 secs
99% in 0.2224 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0005 secs, 0.0070 secs, 0.3070 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0028 secs
req write: 0.0000 secs, 0.0000 secs, 0.0014 secs
resp wait: 0.0706 secs, 0.0066 secs, 0.3064 secs
resp read: 0.0016 secs, 0.0004 secs, 0.0378 secs
Status code distribution:
[200] 500 responses
@Async
Première version : code bloquant
Deuxième version : code non-bloquant (@EnableAsync sur l’Application)
@Async
@GetMapping(path = "/api/v1/data/sources/{id}", produces = "application/json; charset=UTF-8")
@ResponseBody
public CompletableFuture<String> byId(@PathVariable String id, HttpServletResponse response) {
CompletableFuture<Optional<DataSource>> dataSourceOptFuture = dataSourceRepository.findById(id);
return dataSourceOptFuture.thenApply(
dataSourceOpt -> processOptionalJsonable(dataSourceOpt, response)
);
}
@GetMapping(path = "/api/v1/data/source/{id}", produces = "application/json; charset=UTF-8")
@ResponseBody
public String byId(@PathVariable String id, HttpServletResponse response) {
Optional<DataSource> dataSourceOpt = dataSourceRepository.findById(id);
return processOptionalJsonable(dataSourceOpt, response);
}
@Async
@Bean
public Executor taskExecutor() {
int parallelism = Math.min(
Math.max(
Double.valueOf(Math.ceil(
Runtime.getRuntime().availableProcessors() * parallelismFactor)
).intValue(),
parallelismMin
),
parallelismMax
);
return new ForkJoinPool(
parallelism,
ForkJoinPool.defaultForkJoinWorkerThreadFactory,
new Thread.UncaughtExceptionHandler() {...},
true
);
}
executor.parallelism-min=${EXECUTOR_PARALLELISM_MIN:2}
executor.parallelism-max=${EXECUTOR_PARALLELISM_MAX:10}
executor.parallelism-factor=${EXECUTOR_PARALLELISM_FACTOR:2.0}
@Value("${executor.parallelism-min}")
private int parallelismMin;
@Value("${executor.parallelism-max}")
private int parallelismMax;
@Value("${executor.parallelism-factor}")
private double parallelismFactor;
Scalabilité
Scalabilité des addons
Scalabilité des addons
Scalabilité
Configuration Pool de connexions à la DB
spring.datasource.hikari.maximum-pool-size=${HIKARI_MAXIMUM_POOL_SIZE:10}
Optimisation des requêtes
Optimisation - Cas d’usage
Optimisation - Cas d’usage
Tranche détaillée d’unité urbaine :
donne une classification des communes
selon leur nombre d’habitants
Optimisation - V0
création d’une vue pour « fusionner » base_permanente_equipement et ref_typeequipement
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Temps d’exécution :
2.8s
Optimisation des requêtes - V0
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), app_geo_tduu AS (
SELECT com, typecom
FROM appartenance_geographique ag
INNER JOIN tduu_city tc ON ag.tduu = tc.tduu
)
SELECT distinct(rte.lib_domaine) as label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, count(*) AS countEquipCom
FROM view_bpe bpe
INNER JOIN app_geo_tduu ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine
) as sub_a
) as sub_b
WHERE com='86281'
) as b on rte.lib_domaine = b.label
Optimisation - V1
Optimisation - V1
CREATE MATERIALIZED VIEW view_grouped_domaine_bpe_tduu
AS
SELECT bpe.com, bpe.lib_domaine, ag.tduu, count(*) as count
FROM view_bpe bpe
INNER JOIN appartenance_geographique ag ON bpe.com = ag.com AND bpe.typecom = ag.typecom
GROUP BY bpe.com, bpe.lib_domaine, ag.tduu;
CREATE INDEX view_grouped_domaine_bpe_tduu_typecom_com_idx ON view_grouped_domaine_bpe_tduu (com);
CREATE INDEX view_grouped_domaine_bpe_tduu_lib_domaine_idx ON view_grouped_domaine_bpe_tduu (lib_domaine);
CREATE INDEX view_grouped_domaine_bpe_tduu_tduu_idx ON view_grouped_domaine_bpe_tduu (tduu);
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Optimisation - V1
WITH
tduu_city AS (
SELECT tduu FROM appartenance_geographique WHERE com='86281'
), bpe_rank_for_city AS (
SELECT lib_domaine as label, (ntile / 10::decimal) AS score
FROM (
SELECT *, ntile(100) OVER (PARTITION BY lib_domaine ORDER BY countEquipCom)
FROM (
SELECT bpe.com, bpe.lib_domaine, bpe.count as countEquipCom
FROM view_grouped_domaine_bpe_tduu bpe
INNER JOIN tduu_city tc ON bpe.tduu = tc.tduu
) AS sub_a
) AS sub_b
WHERE com='86281'
)
SELECT distinct(rte.lib_domaine) AS label, COALESCE(b.score, 0) as score
FROM ref_typeequipement rte
LEFT JOIN bpe_rank_for_city b ON rte.lib_domaine = b.label
Temps d’exécution :
320ms
Performances réseau
⇒ Amélioration du temps de requête (moins de données qui transitent entre la base et le serveur)
⇒ Amélioration du temps d’affichage côté front (payload moins lourd)
Démo !
#DevoxxFR
Conclusion
Merci à vous !
Questions ?
#DevoxxFR
Sources