{"id":35325,"date":"2025-11-03T09:44:27","date_gmt":"2025-11-03T08:44:27","guid":{"rendered":"https:\/\/pba.mmsh.fr\/?p=35325"},"modified":"2025-12-02T23:44:13","modified_gmt":"2025-12-02T22:44:13","slug":"seminaire-larchivage-du-web-francais-a-lere-de-lia-institutions-patrimoniales-et-collaborations-academiques","status":"publish","type":"post","link":"https:\/\/pba.mmsh.fr\/?p=35325","title":{"rendered":"[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques"},"content":{"rendered":"<p class=\"has-text-align-right\"><a href=\"#EN\">English version<\/a><\/p>\n\n\n\n<p>S\u00e9ance n\u00b0 2 du s\u00e9minaire WebLab \u2013 Humath\u00e8que Condorcet <a href=\"https:\/\/pba.mmsh.fr\/?page_id=34023\"><em>Le Web et les archives du Web pour la recherche en SHS : savoirs, m\u00e9thodes et outils pour la collecte, l\u2019analyse et la p\u00e9rennisation de corpus en ligne<\/em><\/a><\/p>\n\n\n\n<p>Date : Jeudi 27 novembre de 14h \u00e0 16h<\/p>\n\n\n\n<p>Lieu : Salle Michel Seurat &#8211; <a href=\"https:\/\/mediatec.hypotheses.org\/contacts\">M\u00e9diath\u00e8que de la MMSH<\/a>&nbsp;et en visioconf\u00e9rence<\/p>\n\n\n\n<p>Lien de connexion sur <a href=\"https:\/\/evento.renater.fr\/survey\/inscription-seance-2...-rtg28yx8\">inscription<\/a><\/p>\n\n\n\n<p>Pour s\u2019inscrire : <a href=\"https:\/\/evento.renater.fr\/survey\/inscription-seance-2...-rtg28yx8\">https:\/\/evento.renater.fr\/survey\/inscription-seance-2&#8230;-rtg28yx8<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Programme de la s\u00e9ance<\/h2>\n\n\n\n<p><strong>L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques<\/strong><\/p>\n\n\n\n<p>Instaur\u00e9 en 2006 par la loi <a href=\"https:\/\/www.legifrance.gouv.fr\/jorf\/id\/JORFTEXT000000266350\/\">DADVSI<\/a>, le d\u00e9p\u00f4t l\u00e9gal du Web fran\u00e7ais est collect\u00e9 et archiv\u00e9 par la <a href=\"https:\/\/www.bnf.fr\/fr\/depot-legal-du-web\">Biblioth\u00e8que nationale de France<\/a> (BnF) et l&rsquo;<a href=\"https:\/\/www.ina.fr\/institut-national-audiovisuel\/collections-audiovisuelles\/le-web-media\">Institut National de l&rsquo;audiovisuel<\/a> (INA). 
Ces deux institutions se r\u00e9partissent cette mission patrimoniale selon des p\u00e9rim\u00e8tres sp\u00e9cifiques et des organisations diff\u00e9rentes. Lors de cette s\u00e9ance, G\u00e9raldine Camile (BnF) et J\u00e9r\u00f4me Thi\u00e8vre (INA)  nous feront d\u00e9couvrir les rouages  de l&rsquo;archivage du Web, entre choix strat\u00e9giques, contraintes techniques et vocation patrimoniale.  Ils pr\u00e9senteront les possibilit\u00e9s de collaboration entre leurs \u00e9quipes et les communaut\u00e9s acad\u00e9miques. Cette rencontre sera \u00e9galement l\u2019occasion de questionner l\u2019impact des technologies dites d&rsquo;intelligence artificielle dans le domaine de l&rsquo;archivage du Web au sein de ces deux institutions d\u00e9positaires.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Intervenante et intervenant <\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>G\u00e9raldine Camile, Biblioth\u00e8que nationale de France, membre de l&rsquo;\u00e9quipe BnF DataLab<\/li>\n<\/ul>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/11\/presentation_bnf.pdf\" type=\"application\/pdf\" style=\"width:100%;height:600px\" aria-label=\"Contenu embarqu\u00e9 G\u00e9raldine Camile, L\u2019archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA.\"><\/object><a id=\"wp-block-file--media-fe5a89c6-e2d3-4537-bcd2-21715d3d10a8\" href=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/11\/presentation_bnf.pdf\">G\u00e9raldine Camile, L\u2019archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA<\/a><a href=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/11\/presentation_bnf.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-fe5a89c6-e2d3-4537-bcd2-21715d3d10a8\">T\u00e9l\u00e9charger<\/a><\/div>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>J\u00e9r\u00f4me\u00a0Thi\u00e8vre, Responsable Entr\u00e9e &amp; Collecte du Web, Institut national de l&rsquo;audiovisuel<\/li>\n<\/ul>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/12\/ina_presentation.pdf\" type=\"application\/pdf\" style=\"width:100%;height:600px\" aria-label=\"Contenu embarqu\u00e9 J\u00e9r\u00f4me Thievre, D\u00e9p\u00f4t L\u00e9gal du Web de l&apos;Ina.\"><\/object><a id=\"wp-block-file--media-aeb833b7-5652-467b-95c0-64ecabc26bd7\" href=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/12\/ina_presentation.pdf\">J\u00e9r\u00f4me Thi\u00e8vre, D\u00e9p\u00f4t L\u00e9gal du Web de l&rsquo;Ina<\/a><a href=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2025\/12\/ina_presentation.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-aeb833b7-5652-467b-95c0-64ecabc26bd7\">T\u00e9l\u00e9charger<\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Compte Rendu de la S\u00e9ance<\/h2>\n\n\n\n<p>Le s\u00e9minaire anim\u00e9 par le WebLab a accueilli ce jeudi 27 novembre J\u00e9r\u00f4me Thi\u00e8vre, responsable de l\u2019entr\u00e9e de la collecte du webmedia \u00e0 l\u2019Institut National de l\u2019Audiovisuel et G\u00e9raldine Camile, membre de l\u2019\u00e9quipe DataLab \u00e0 la Biblioth\u00e8que Nationale de France.<\/p>\n\n\n\n<p>J\u00e9r\u00f4me Thi\u00e8vre est revenu sur l\u2019histoire et le fonctionnement du d\u00e9p\u00f4t l\u00e9gal du web. L\u2019INA, dont la mission de conservation remonte \u00e0 1974, a vu son r\u00f4le \u00e9voluer avec la loi de 1992, qui a \u00e9tendu son activit\u00e9 vers la valorisation des d\u00e9p\u00f4ts radio et t\u00e9l\u00e9 au b\u00e9n\u00e9fice des chercheurs et des \u00e9tudiants. 
En 1995, la cr\u00e9ation d\u2019INAth\u00e8que a permis la consultation de ces fonds, et en 2009 l\u2019archivage du web a d\u00e9marr\u00e9, marqu\u00e9 plus tard par la cr\u00e9ation, en 2022, d\u2019un laboratoire destin\u00e9 \u00e0 accompagner les chercheurs.<\/p>\n\n\n\n<p>Le cadre juridique actuel repose sur la loi DADVSI de 2006 et sur le d\u00e9cret de 2011 portant sur le d\u00e9p\u00f4t l\u00e9gal de l\u2019internet. Il pr\u00e9cise que les contenus archiv\u00e9s restent la propri\u00e9t\u00e9 de leurs auteurs et que leur consultation est fortement encadr\u00e9e, r\u00e9partie dans une cinquantaine de biblioth\u00e8ques et m\u00e9diath\u00e8ques partenaires. Le p\u00e9rim\u00e8tre couvre l\u2019audiovisuel au sens large, de la t\u00e9l\u00e9vision et de la radio aux m\u00e9dias natifs du web, comme YouTube, les plateformes de podcast, le replay, mais aussi certains sites d\u2019actualit\u00e9 et m\u00eame des cha\u00eenes sur les r\u00e9seaux sociaux.<\/p>\n\n\n\n<p>L\u2019INA archive ainsi des sites web, des cha\u00eenes de plateformes et des actualit\u00e9s publi\u00e9es sur Twitter. D\u00e8s 2009, la collecte s\u2019est \u00e9tendue \u00e0 YouTube, d\u00e9j\u00e0 incontournable, ce qui permet aujourd\u2019hui de retracer des contenus parfois disparus. Le travail de collecte d\u00e9pend \u00e9troitement d\u2019un patient travail d\u2019identification effectu\u00e9 par les documentalistes. Au total, environ 30 000 sites ont \u00e9t\u00e9 archiv\u00e9s, en particulier ceux li\u00e9s \u00e0 la t\u00e9l\u00e9vision, \u00e0 la radio ou \u00e0 leurs communaut\u00e9s de spectateurs, amateurs comme professionnels. Gr\u00e2ce \u00e0 des r\u00e9cup\u00e9rations depuis Internet Archive, certaines pages peuvent remonter au d\u00e9but des ann\u00e9es 2000. 
L\u2019ensemble repr\u00e9sente environ vingt milliards de pages archiv\u00e9es.<\/p>\n\n\n\n<p>Les volumes audiovisuels sont tout aussi impressionnants : quarante-deux millions de vid\u00e9os collect\u00e9es gr\u00e2ce \u00e0 des robots sp\u00e9cialis\u00e9s par plateforme, et une attention particuli\u00e8re port\u00e9e aux catalogues des diffuseurs comme Arte. L\u2019objectif reste d\u2019obtenir une repr\u00e9sentation fid\u00e8le et pertinente du paysage audiovisuel en ligne, malgr\u00e9 les limites impos\u00e9es depuis septembre 2024 \u00e0 la collecte sur YouTube. L\u2019INA conserve aussi environ cinq millions de fichiers audios issus de trente mille cha\u00eenes ou \u00e9missions. Depuis 2014, un archivage sp\u00e9cifique des r\u00e9seaux sociaux s\u2019est d\u00e9velopp\u00e9, concentr\u00e9 d\u2019abord sur Twitter, avec plus de trois milliards de tweets s\u00e9lectionn\u00e9s selon des crit\u00e8res li\u00e9s \u00e0 l\u2019actualit\u00e9 m\u00e9diatique, aux \u00e9v\u00e9nements majeurs ou aux comptes institutionnels. Plus r\u00e9cemment, l\u2019INA a commenc\u00e9 \u00e0 collecter sur Bluesky, r\u00e9seau dont l\u2019ouverture facilite la r\u00e9cup\u00e9ration de donn\u00e9es.<\/p>\n\n\n\n<p>Le d\u00e9p\u00f4t l\u00e9gal ne s\u2019applique toutefois pas aux plateformes b\u00e9n\u00e9ficiant du statut d\u2019h\u00e9bergeur : YouTube n\u2019a par exemple aucune obligation particuli\u00e8re envers le d\u00e9p\u00f4t l\u00e9gal, ce qui complique l\u2019acc\u00e8s aux donn\u00e9es et peut emp\u00eacher certaines collectes. Parmi les projets en cours, l\u2019INA souhaite d\u00e9velopper la collecte des contenus de SVOD (Netflix, Canal+) de plus en plus pr\u00e9sents dans les foyers, ainsi que des streams en direct sur Twitch, YouTube Live ou les cha\u00eenes FAST. 
Les moteurs de recherche et applications internes continuent d\u2019\u00eatre mis \u00e0 jour pour suivre l\u2019\u00e9volution des formats.<\/p>\n\n\n\n<p>Les archives WebMedia conserv\u00e9es \u00e0 Paris permettent de consulter une URL telle qu\u2019elle apparaissait \u00e0 une date pr\u00e9cise, avec ses images, sons, vid\u00e9os et scripts. Les \u00e9l\u00e9ments interactifs restent parfois imparfaitement captur\u00e9s, mais peuvent souvent \u00eatre retrouv\u00e9s dans des archives s\u00e9par\u00e9es.<\/p>\n\n\n\n<p>Le Lab de l\u2019INA a pour mission d\u2019accompagner les universitaires en leur fournissant expertise documentaire, outils et donn\u00e9es. Il propose un suivi m\u00e9thodologique, organise des r\u00e9sidences pour des projets s\u00e9lectionn\u00e9s par un comit\u00e9 scientifique, met \u00e0 disposition des corpus et anime la vie scientifique \u00e0 travers ateliers et s\u00e9minaires. Le Lab a d\u00e9j\u00e0 accueilli plus d\u2019une centaine de chercheurs et soutenu plus de quatre-vingts projets, m\u00eame si le web ne repr\u00e9sente qu\u2019une faible part des demandes. Toute demande d\u2019export de donn\u00e9es est examin\u00e9e \u00e0 la fois scientifiquement et juridiquement, puis encadr\u00e9e par une licence d\u2019utilisation qui d\u00e9finit pr\u00e9cis\u00e9ment ce qu\u2019il est possible d\u2019en faire.<\/p>\n\n\n\n<p>Le d\u00e9veloppement de l\u2019IA \u00e0 l\u2019INA, pilot\u00e9 par l\u2019\u00e9quipe 2IA, se montre \u00eatre un outil particuli\u00e8rement int\u00e9ressant pour l\u2019analyse de contenu, comme la transcription, l\u2019extraction d\u2019entit\u00e9s nomm\u00e9es, la reconnaissance des visages, des voix ou des objets.<\/p>\n\n\n\n<p>L\u2019acc\u00e8s aux donn\u00e9es devient toutefois de plus en plus difficile, les plateformes renfor\u00e7ant leurs protections pour \u00e9viter l\u2019aspiration massive de contenus par des acteurs tiers ou par les IA. 
Les donn\u00e9es archiv\u00e9es, stock\u00e9es sur deux sites en copie double, repr\u00e9sentent moins de quatre p\u00e9taoctets, un volume important mais encore g\u00e9rable, bien que co\u00fbteux en \u00e9nergie. Enfin, la collecte et la consultation reposent sur une \u00e9quipe r\u00e9duite, compos\u00e9e de sept ing\u00e9nieurs et d\u2019un responsable documentaire entour\u00e9 d\u2019une petite \u00e9quipe d\u00e9di\u00e9e.<\/p>\n\n\n\n<p>De son c\u00f4t\u00e9, G\u00e9raldine Camile explique que la BNF s\u2019est int\u00e9ress\u00e9e tr\u00e8s t\u00f4t aux archives du web et participe depuis 2003 \u00e0 un consortium international visant \u00e0 conserver l\u2019internet mondial. Dans le cadre du d\u00e9p\u00f4t l\u00e9gal, elle a la charge de collecter l\u2019ensemble des contenus relevant de son p\u00e9rim\u00e8tre, \u00e0 l\u2019exception de la radio et de la t\u00e9l\u00e9vision, confi\u00e9es \u00e0 l\u2019INA. Les donn\u00e9es recueillies sont consid\u00e9r\u00e9es comme patrimoniales, ce qui permet de les conserver sans demander l\u2019accord des auteurs, tant que les sites rel\u00e8vent du web fran\u00e7ais. La d\u00e9finition du \u201cweb fran\u00e7ais\u201d repose sur deux crit\u00e8res : l\u2019h\u00e9bergeur ou le producteur doit \u00eatre \u00e9tabli en France, ce qui d\u00e9limite clairement le p\u00e9rim\u00e8tre du d\u00e9p\u00f4t l\u00e9gal g\u00e9r\u00e9 par la BNF.<\/p>\n\n\n\n<p>La collecte couvre une grande diversit\u00e9 de ressources, qu\u2019il s\u2019agisse de journaux en PDF, de livres num\u00e9riques, de sites d\u2019art ou de litt\u00e9rature en ligne, de pages li\u00e9es \u00e0 des \u00e9v\u00e9nements marquants, ou encore de r\u00e9seaux sociaux et de sites repr\u00e9sentatifs de la vari\u00e9t\u00e9 du web, des jeux en ligne \u00e0 Leboncoin. 
Une m\u00eame adresse peut \u00eatre collect\u00e9e \u00e0 plusieurs dates : l\u2019archive produite n\u2019est jamais une copie parfaite, mais une reconstitution fid\u00e8le \u00e0 partir des \u00e9l\u00e9ments captur\u00e9s. La BNF utilise pour cela des robots d\u2019exploration et veille \u00e0 compl\u00e9ter ses collections avec tous types de sites, formats et pratiques du web.<\/p>\n\n\n\n<p>Deux logiques de collecte coexistent. La premi\u00e8re, dite large, est men\u00e9e une fois par an et vise \u00e0 aspirer un maximum de sites \u00e0 partir des donn\u00e9es fournies par les h\u00e9bergeurs fran\u00e7ais, comme l\u2019AFNIC ou OVH. La seconde est une collecte cibl\u00e9e, align\u00e9e sur les priorit\u00e9s documentaires de la BNF et de plus en plus structur\u00e9e en projets, par exemple autour du Covid, de l\u2019actualit\u00e9 ou de l\u2019environnement. Le choix des sites ne suit aucune ligne scientifique, esth\u00e9tique ou morale&nbsp;: l\u2019objectif est d\u2019\u00eatre repr\u00e9sentatif, non exhaustif, et cela implique une forme d\u2019arbitrage intellectuel que l\u2019automatisation ne peut remplacer.<\/p>\n\n\n\n<p>La BNF utilise notamment le robot Heritrix, qui part d\u2019une liste d\u2019URL et explore les liens pr\u00e9sents dans le code source des pages. Les archives peuvent \u00eatre consult\u00e9es dans vingt-deux biblioth\u00e8ques partenaires \u00e0 travers le pays. Les chercheurs peuvent y acc\u00e9der de plusieurs mani\u00e8res : en passant par la Wayback Machine lorsque l\u2019URL est connue, en explorant des parcours guid\u00e9s \u00e9labor\u00e9s par les biblioth\u00e9caires ou en utilisant l\u2019application Archives de l\u2019internet Labs, qui propose une recherche en plein texte. Les archives peuvent \u00eatre cit\u00e9es dans les travaux, bien que leur r\u00e9utilisation soit strictement encadr\u00e9e. 
La BNF, comme l\u2019INA, accueille \u00e9galement des chercheurs associ\u00e9s dont les projets s\u2019appuient sur les collections web, et collabore avec Huma-Num \u00e0 travers son datalab.<\/p>\n\n\n\n<p>Pour la recherche, ces archives permettent de retrouver des sites disparus, de v\u00e9rifier la conservation de contenus \u00e9tudi\u00e9s ou encore de co-construire des collectes en partenariat avec des \u00e9quipes scientifiques. Elles servent aussi \u00e0 indexer des ensembles massifs, explorer les m\u00e9tadonn\u00e9es et constituer des corpus adapt\u00e9s \u00e0 des projets sp\u00e9cifiques. Une grande part du travail de la BNF consiste \u00e0 accompagner les chercheurs sur le plan m\u00e9thodologique afin de faciliter l\u2019usage de ces ressources encore peu famili\u00e8res.<\/p>\n\n\n\n<p>L\u2019intelligence artificielle intervient essentiellement dans la cr\u00e9ation de parcours th\u00e9matiques guid\u00e9s, m\u00eame si la s\u00e9lection finale reste humaine. Les \u00e9volutions rapides du web, et en particulier la fermeture de certaines plateformes face aux enjeux li\u00e9s \u00e0 l\u2019IA, compliquent le travail de collecte. Les IA g\u00e9n\u00e9ratives ne sont pas consid\u00e9r\u00e9es comme des profils sp\u00e9cifiques \u00e0 archiver, mais il est possible d\u2019enregistrer des sessions d\u2019utilisation. 
L\u2019IA est davantage pr\u00e9sente dans des projets de recherche associ\u00e9s, comme AdaptMed, qui explore la g\u00e9n\u00e9ration automatique de reformulations dans le domaine m\u00e9dical, m\u00eame si elle n\u2019est pas appliqu\u00e9e directement sur l\u2019ensemble des archives.<br>La BNF travaille n\u00e9anmoins sur une feuille de route d\u00e9di\u00e9e \u00e0 l\u2019IA, structur\u00e9e autour de cinq axes, allant de la d\u00e9finition d\u2019une strat\u00e9gie \u00e0 la pr\u00e9paration des infrastructures, en passant par l\u2019acquisition de nouvelles comp\u00e9tences et la mise en place d\u2019un programme de recherche pluriannuel.<\/p>\n\n\n\n<p>Les limites juridiques restent importantes : ces archives sont prot\u00e9g\u00e9es par le droit d\u2019auteur, ce qui emp\u00eache d\u2019utiliser librement des outils externes pour analyser les donn\u00e9es.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"EN\">[Seminar] French Web Archiving in the Age of AI: Heritage Institutions and Academic Collaborations<\/h1>\n\n\n\n<p>Session 2 of the WebLab \u2013 Humath\u00e8que Condorcet seminar <em><a href=\"https:\/\/pba.mmsh.fr\/?page_id=34023\">The Web and Web Archives for Research in the Social Sciences and Humanities: Knowledge, Methods, and Tools for Collecting, Analyzing, and Preserving Online Corpora<\/a><\/em><\/p>\n\n\n\n<p><strong>Date:<\/strong> Thursday, November 27, 2:00\u20134:00 PM<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Session Program<\/h2>\n\n\n\n<p>French Web Archiving in the Age of AI: Heritage Institutions and Academic Collaborations<\/p>\n\n\n\n<p>Established in 2006 by the <a href=\"https:\/\/www.legifrance.gouv.fr\/jorf\/id\/JORFTEXT000000266350\/\">DADVSI <\/a>law, the legal deposit of the French web is collected and archived by the <a href=\"https:\/\/www.bnf.fr\/fr\/depot-legal-du-web\">Biblioth\u00e8que nationale de France (BnF)<\/a> and the <a 
href=\"https:\/\/www.ina.fr\/institut-national-audiovisuel\/collections-audiovisuelles\/le-web-media\">National Audiovisual Institute (INA)<\/a>. These two institutions share this heritage mission according to specific scopes and different organizational structures. During this session, G\u00e9raldine Camile (BnF) and J\u00e9r\u00f4me Thi\u00e8vre (INA) will provide insight into the workings of web archiving, covering strategic choices, technical constraints, and heritage responsibilities. They will also present opportunities for collaboration between their teams and academic communities. This meeting will be an opportunity to discuss the impact of so-called artificial intelligence technologies on web archiving within these two custodial institutions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Speakers<\/h2>\n\n\n\n<p>G\u00e9raldine Camile, Biblioth\u00e8que nationale de France, member of the BnF DataLab team<br>J\u00e9r\u00f4me Thi\u00e8vre, Head of Web Ingestion and Collection, National Audiovisual Institute<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Report of the Session<\/h2>\n\n\n\n<p>The seminar hosted by WebLab on Thursday, November 27 welcomed J\u00e9r\u00f4me Thi\u00e8vre, Head of Webmedia Collection Intake at the National Audiovisual Institute (Institut National de l&rsquo;Audiovisuel, INA), and G\u00e9raldine Camile, member of the DataLab team at the National Library of France (Biblioth\u00e8que Nationale de France, BNF).<\/p>\n\n\n\n<p>J\u00e9r\u00f4me Thi\u00e8vre began by outlining the history and functioning of the legal deposit of the web. INA, whose preservation mission dates back to 1974, saw its role evolve with the 1992 law that expanded its activity toward making radio and television deposits available to researchers and students. 
In 1995, the creation of INAth\u00e8que enabled consultation of its holdings, and in 2009 web archiving began, later strengthened in 2022 with the creation of a research laboratory dedicated to supporting scholars.<\/p>\n\n\n\n<p>The current legal framework is based on the 2006 DADVSI law and the 2011 decree on the legal deposit of the internet. It specifies that archived content remains the property of its authors and that consultation is strictly regulated, accessible across about fifty partner libraries and media libraries. The scope covers audiovisual media in the broad sense, from television and radio to native web media such as YouTube, podcast platforms, replay services, as well as certain news sites and even social media channels.<\/p>\n\n\n\n<p>INA therefore archives websites, platform channels, and news items published on Twitter. As early as 2009, collection extended to YouTube, already essential at the time, which today makes it possible to retrieve content that has sometimes disappeared. The collection process depends heavily on meticulous identification work carried out by documentalists. In total, around 30,000 sites have been archived, particularly those linked to television, radio, or their communities of viewers, both amateur and professional. Thanks to recoveries from Internet Archive, some pages go back to the early 2000s. Altogether, the archive represents around twenty billion pages.<\/p>\n\n\n\n<p>The audiovisual volumes are equally impressive: forty-two million videos collected using platform-specific robots, with particular attention paid to broadcaster catalogs such as Arte. The goal remains to obtain a faithful and relevant representation of the online audiovisual landscape, despite the limitations imposed since September 2024 on collecting from YouTube. INA also preserves around five million audio files from thirty thousand channels or programs. 
Since 2014, a dedicated social media archive has been developed, initially focused on Twitter, with more than three billion tweets selected according to criteria related to media coverage, major events, or institutional accounts. More recently, INA has begun collecting on Bluesky, whose open design facilitates data retrieval.<\/p>\n\n\n\n<p>However, the legal deposit does not apply to platforms with host status: YouTube, for example, has no particular obligation toward the legal deposit, which complicates data access and can prevent certain types of collection. Among its ongoing projects, INA seeks to develop collection of SVOD content (Netflix, Canal+), increasingly present in households, as well as live streams on Twitch, YouTube Live, or FAST channels. Search engines and internal applications continue to be updated to keep pace with evolving formats.<\/p>\n\n\n\n<p>The WebMedia archives preserved in Paris allow users to view a URL as it appeared on a specific date, with its images, audio, video, and scripts. Interactive elements are sometimes imperfectly captured but can often be retrieved from separate archives.<\/p>\n\n\n\n<p>INA\u2019s Lab supports scholars by providing them with documentary expertise, tools, and data. It offers methodological guidance, organizes residencies for projects selected by a scientific committee, provides corpora, and contributes to academic life through workshops and seminars. The Lab has already hosted more than one hundred researchers and supported over eighty projects, even though the web remains a small share of requests. 
Any data export request undergoes both scientific and legal review and is governed by a usage license specifying permitted uses.<\/p>\n\n\n\n<p>The development of AI at INA, led by the 2IA team, has proven particularly valuable for content analysis, including transcription, named-entity extraction, and face, voice, or object recognition.<\/p>\n\n\n\n<p>Access to data is nevertheless becoming increasingly difficult, as platforms strengthen protections to prevent large-scale scraping by third parties or by AI systems. The archived data, stored redundantly at two sites, amounts to under four petabytes, a volume that is large but still manageable, though energy-intensive. Finally, collection and consultation rely on a small team composed of seven engineers and one documentary lead supported by a small dedicated staff.<\/p>\n\n\n\n<p>For her part, G\u00e9raldine Camile explained that the BNF became interested in web archives very early and has participated in an international consortium aimed at preserving the global internet since 2003. Within the framework of legal deposit, it is responsible for collecting all content within its scope, except for radio and television, which fall under INA. The collected data are considered heritage materials, allowing the BNF to preserve them without seeking authors\u2019 consent, as long as the sites fall within the French web. The definition of the \u201cFrench web\u201d is based on two criteria: the host or producer must be established in France, which clearly delineates the BNF\u2019s legal deposit perimeter.<\/p>\n\n\n\n<p>The collection covers a wide variety of resources, including PDF newspapers, e-books, online art or literature sites, pages related to major events, social media, and sites representative of the diversity of the web, from online games to Leboncoin, a second-hand sales website. The same address may be collected on multiple dates: the resulting archive is never a perfect copy but a faithful reconstruction based on the captured elements. 
The BNF uses crawler robots and ensures its collections include all types of sites, formats, and web practices.<\/p>\n\n\n\n<p>Two collection approaches coexist. The first, known as broad crawling, occurs once a year and aims to capture as many sites as possible using data from French hosts such as AFNIC or OVH. The second is targeted collection, aligned with the BNF\u2019s documentary priorities and increasingly structured as thematic projects, for example, around Covid, current affairs, or environmental issues. The selection of sites follows no scientific, aesthetic, or moral line: the goal is to be representative, not exhaustive, which requires intellectual arbitration that automation cannot replace.<\/p>\n\n\n\n<p>The BNF uses the Heritrix crawler, which starts from a list of URLs and explores links found in the page source code. The archives can be consulted in twenty-two partner libraries across the country. Researchers can access them in several ways: via the Wayback Machine when the URL is known, through curated paths developed by librarians, or using the Internet Archives Labs application, which offers full-text search. Archived materials may be cited in academic work, although reuse remains strictly regulated. Both the BNF and INA also host affiliated researchers whose projects draw on web collections, and they collaborate with Huma-Num through its datalab.<\/p>\n\n\n\n<p>For research, these archives make it possible to recover disappeared sites, verify the preservation of studied content, or co-construct collections in collaboration with academic teams. They are also used to index large datasets, explore metadata, and build corpora tailored to specific projects. A major part of the BNF\u2019s work consists in supporting researchers methodologically to facilitate the use of these still relatively unfamiliar resources.<\/p>\n\n\n\n<p>AI is primarily used in creating thematic guided paths, although the final selection remains human. 
Rapid changes in the web, particularly the increasing closure of platforms in response to AI-related challenges, complicate collection efforts. Generative AIs are not considered specific profiles to archive, but sessions of use can be recorded. AI appears more prominently in associated research projects, such as AdaptMed, which explores automatic generation of reformulations in the medical domain, although it is not applied directly to the entire archive.<\/p>\n\n\n\n<p>The BNF is nonetheless developing an AI roadmap structured around five axes, ranging from defining a strategy to preparing infrastructures, acquiring new competencies, and establishing a multi-year research program.<\/p>\n\n\n\n<p>Legal constraints remain significant: these archives are protected by copyright, which prevents the free use of external tools to analyze the data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>English version S\u00e9ance n\u00b0 2 du s\u00e9minaire WebLab \u2013 Humath\u00e8que Condorcet Le Web et les archives du Web pour la recherche en SHS : savoirs, m\u00e9thodes et outils pour la collecte, l\u2019analyse et la p\u00e9rennisation de corpus en ligne Date : Jeudi 27 novembre de 14h \u00e0 16h Lieu : Salle Michel Seurat &#8211; M\u00e9diath\u00e8que&hellip; <a class=\"more-link\" href=\"https:\/\/pba.mmsh.fr\/?p=35325\">Poursuivre la lecture <span class=\"screen-reader-text\">[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques<\/span><\/a><\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[130,39],"tags":[131,132,128],"class_list":["post-35325","post","type-post","status-publish","format-standard","hentry","category-archives-du-web","category-seminaires","tag-archivage-du-web","tag-ia","tag-weblab","entry"],"yoast_head":"<!-- 
This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>[S\u00e9minaire] L&#039;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/pba.mmsh.fr\/?p=35325\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[S\u00e9minaire] L&#039;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\" \/>\n<meta property=\"og:description\" content=\"English version S\u00e9ance n\u00b0 2 du s\u00e9minaire WebLab \u2013 Humath\u00e8que Condorcet Le Web et les archives du Web pour la recherche en SHS : savoirs, m\u00e9thodes et outils pour la collecte, l\u2019analyse et la p\u00e9rennisation de corpus en ligne Date : Jeudi 27 novembre de 14h \u00e0 16h Lieu : Salle Michel Seurat &#8211; M\u00e9diath\u00e8que&hellip; Poursuivre la lecture [S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques\" \/>\n<meta property=\"og:url\" content=\"https:\/\/pba.mmsh.fr\/?p=35325\" \/>\n<meta property=\"og:site_name\" content=\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-03T08:44:27+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-02T22:44:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2022\/04\/LOGO-MMSH-UAR-2022-2.png\" \/>\n\t<meta property=\"og:image:width\" 
content=\"1654\" \/>\n\t<meta property=\"og:image:height\" content=\"552\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"PBA\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"PBA\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"18 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325\"},\"author\":{\"name\":\"PBA\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/person\\\/3d9360bfca1e55d492a60db4c2243434\"},\"headline\":\"[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques\",\"datePublished\":\"2025-11-03T08:44:27+00:00\",\"dateModified\":\"2025-12-02T22:44:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325\"},\"wordCount\":3597,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\"},\"keywords\":[\"archivage du web\",\"IA\",\"WebLab\"],\"articleSection\":[\"Archives du Web\",\"S\u00e9minaires\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325\",\"name\":\"[S\u00e9minaire] L'archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la 
MMSH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#website\"},\"datePublished\":\"2025-11-03T08:44:27+00:00\",\"dateModified\":\"2025-12-02T22:44:13+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35325#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/pba.mmsh.fr\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#website\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/\",\"name\":\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\",\"description\":\"Biblioth\u00e8ques et Archives \u00e0 la Maison m\u00e9diterran\u00e9enne des sciences de l\u2019homme\",\"publisher\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/pba.mmsh.fr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\",\"name\":\"P\u00f4le Biblioth\u00e8ques et Archives de la 
MMSH\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/cropped-cropped-LOGO-UAR-MMSHS-coul.png\",\"contentUrl\":\"https:\\\/\\\/pba.mmsh.fr\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/cropped-cropped-LOGO-UAR-MMSHS-coul.png\",\"width\":1161,\"height\":303,\"caption\":\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\"},\"image\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/person\\\/3d9360bfca1e55d492a60db4c2243434\",\"name\":\"PBA\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g\",\"caption\":\"PBA\"},\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/?author=8\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"[S\u00e9minaire] L'archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/pba.mmsh.fr\/?p=35325","og_locale":"fr_FR","og_type":"article","og_title":"[S\u00e9minaire] L'archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","og_description":"English version S\u00e9ance n\u00b0 2 du s\u00e9minaire WebLab \u2013 Humath\u00e8que Condorcet Le Web et les archives du Web pour la recherche en SHS : savoirs, m\u00e9thodes et outils pour la collecte, l\u2019analyse et la p\u00e9rennisation de corpus en ligne Date : Jeudi 27 novembre de 14h \u00e0 16h Lieu : Salle Michel Seurat &#8211; M\u00e9diath\u00e8que&hellip; Poursuivre la lecture [S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques","og_url":"https:\/\/pba.mmsh.fr\/?p=35325","og_site_name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","article_published_time":"2025-11-03T08:44:27+00:00","article_modified_time":"2025-12-02T22:44:13+00:00","og_image":[{"width":1654,"height":552,"url":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2022\/04\/LOGO-MMSH-UAR-2022-2.png","type":"image\/png"}],"author":"PBA","twitter_card":"summary_large_image","twitter_misc":{"\u00c9crit par":"PBA","Dur\u00e9e de lecture estim\u00e9e":"18 
minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/pba.mmsh.fr\/?p=35325#article","isPartOf":{"@id":"https:\/\/pba.mmsh.fr\/?p=35325"},"author":{"name":"PBA","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/person\/3d9360bfca1e55d492a60db4c2243434"},"headline":"[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques","datePublished":"2025-11-03T08:44:27+00:00","dateModified":"2025-12-02T22:44:13+00:00","mainEntityOfPage":{"@id":"https:\/\/pba.mmsh.fr\/?p=35325"},"wordCount":3597,"commentCount":3,"publisher":{"@id":"https:\/\/pba.mmsh.fr\/#organization"},"keywords":["archivage du web","IA","WebLab"],"articleSection":["Archives du Web","S\u00e9minaires"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/pba.mmsh.fr\/?p=35325#respond"]}]},{"@type":"WebPage","@id":"https:\/\/pba.mmsh.fr\/?p=35325","url":"https:\/\/pba.mmsh.fr\/?p=35325","name":"[S\u00e9minaire] L'archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations acad\u00e9miques - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","isPartOf":{"@id":"https:\/\/pba.mmsh.fr\/#website"},"datePublished":"2025-11-03T08:44:27+00:00","dateModified":"2025-12-02T22:44:13+00:00","breadcrumb":{"@id":"https:\/\/pba.mmsh.fr\/?p=35325#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/pba.mmsh.fr\/?p=35325"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/pba.mmsh.fr\/?p=35325#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/pba.mmsh.fr\/"},{"@type":"ListItem","position":2,"name":"[S\u00e9minaire] L&rsquo;archivage du web fran\u00e7ais \u00e0 l\u2019\u00e8re de l\u2019IA : institutions patrimoniales et collaborations 
acad\u00e9miques"}]},{"@type":"WebSite","@id":"https:\/\/pba.mmsh.fr\/#website","url":"https:\/\/pba.mmsh.fr\/","name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","description":"Biblioth\u00e8ques et Archives \u00e0 la Maison m\u00e9diterran\u00e9enne des sciences de l\u2019homme","publisher":{"@id":"https:\/\/pba.mmsh.fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pba.mmsh.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/pba.mmsh.fr\/#organization","name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","url":"https:\/\/pba.mmsh.fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/logo\/image\/","url":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2026\/01\/cropped-cropped-LOGO-UAR-MMSHS-coul.png","contentUrl":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2026\/01\/cropped-cropped-LOGO-UAR-MMSHS-coul.png","width":1161,"height":303,"caption":"P\u00f4le Biblioth\u00e8ques et Archives de la 
MMSH"},"image":{"@id":"https:\/\/pba.mmsh.fr\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/person\/3d9360bfca1e55d492a60db4c2243434","name":"PBA","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ed598663e5724b22628bfb625b94f461150f3876e2428b46c711076584c3fad5?s=96&d=mm&r=g","caption":"PBA"},"url":"https:\/\/pba.mmsh.fr\/?author=8"}]}},"_links":{"self":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=35325"}],"version-history":[{"count":21,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35325\/revisions"}],"predecessor-version":[{"id":35451,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35325\/revisions\/35451"}],"wp:attachment":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=35325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=35325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=35325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}