{"id":35483,"date":"2026-02-14T18:33:35","date_gmt":"2026-02-14T17:33:35","guid":{"rendered":"https:\/\/pba.mmsh.fr\/?p=35483"},"modified":"2026-03-09T11:29:11","modified_gmt":"2026-03-09T10:29:11","slug":"ia-et-analyse-des-donnees-du-web-archive-entre-sciences-de-linformatique-et-histoire-numerique","status":"publish","type":"post","link":"https:\/\/pba.mmsh.fr\/?p=35483","title":{"rendered":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique"},"content":{"rendered":"<div class='__iawmlf-post-loop-links' style='display:none;' data-iawmlf-post-links='[{&quot;id&quot;:11,&quot;href&quot;:&quot;https:\\\/\\\/doi.org\\\/10.36253\\\/979-12-215-0413-2.22&quot;,&quot;archived_href&quot;:&quot;&quot;,&quot;redirect_href&quot;:&quot;&quot;,&quot;checks&quot;:[],&quot;broken&quot;:false,&quot;last_checked&quot;:null,&quot;process&quot;:&quot;done&quot;}]'><\/div>\n<p>Entre sciences de l&rsquo;informatique et histoire, ce projet de recherche interdisciplinaire explore les potentialit\u00e9s offertes par les sciences de l&rsquo;informatique pour l&rsquo;analyse des donn\u00e9es du web archiv\u00e9 et l&rsquo;histoire des m\u00e9dias num\u00e9riques.  
Port\u00e9 par Sophie Gebeil et Line Jamet-Jakubiec, l&rsquo;objectif est de d\u00e9velopper et d&rsquo;utiliser des programmes et des outils, dont l&rsquo;IA, pour automatiser des t\u00e2ches dans le traitement des donn\u00e9es et ainsi en extraire les principales tendances, tout en r\u00e9fl\u00e9chissant aux implications \u00e9pist\u00e9mologiques d&rsquo;une telle d\u00e9marche.&nbsp;<\/p>\n\n\n\n<p><strong>Automatiser l&rsquo;analyse des donn\u00e9es du web archiv\u00e9 en histoire<\/strong><\/p>\n\n\n\n<p>Ce projet port\u00e9 par Sophie Gebeil et Line Jamet-Jakubiec s&rsquo;int\u00e9resse \u00e0 l&rsquo;exploitation des donn\u00e9es du Web archiv\u00e9, notamment celles qui sont conserv\u00e9es par la BnF et l&rsquo;INA dans le cadre du d\u00e9p\u00f4t l\u00e9gal du Web fran\u00e7ais cr\u00e9\u00e9 en 2006. Les sources num\u00e9riques sont nombreuses et vari\u00e9es, tout comme le format des donn\u00e9es \u00e0 exploiter.&nbsp; L\u2019objectif du projet est de fournir aux chercheurs en sciences humaines des programmes et des outils leur permettant d&rsquo;exploiter et d&rsquo;analyser de vastes corpus de donn\u00e9es issues du Web dans le cadre du projet IUF \u00ab\u00a0l&rsquo;archivage du web, un d\u00e9fi historiographique : entre fragmentation et m\u00e9diation\u00a0\u00bb.<\/p>\n\n\n\n<p>Parmi les d\u00e9veloppements informatiques qu&rsquo;il est envisag\u00e9 de mettre en place dans le cadre de ce projet figurent l&rsquo;utilisation et la programmation d&rsquo;une IA pour traiter les donn\u00e9es et en extraire des tendances (chatbot, outils pr\u00e9visionnels, textes g\u00e9n\u00e9r\u00e9s automatiquement \u00e0 partir d&rsquo;autres textes, graphiques de tendances, outils de navigation), le d\u00e9veloppement de scripts pour traiter les diff\u00e9rents formats des donn\u00e9es mis \u00e0 disposition, et l&rsquo;utilisation de frameworks d\u00e9di\u00e9s (Django ou MongoDB par exemple) pour cr\u00e9er des bases de donn\u00e9es appropri\u00e9es, selon les besoins qui seront 
exprim\u00e9s. Le choix des langages de programmation sera discut\u00e9 avec les diff\u00e9rents intervenants du projet : Python (pour sa polyvalence et ses biblioth\u00e8ques), Java ou C++ (pour leurs performances et leurs biblioth\u00e8ques), JavaScript (pour sa facilit\u00e9 d&rsquo;int\u00e9gration des outils d&rsquo;IA), Rust (pour sa fiabilit\u00e9), R (pour ses mod\u00e8les statistiques avanc\u00e9s)&#8230;<\/p>\n\n\n\n<p><strong>Trois ans de collaboration interdisciplinaire autour du Web archiv\u00e9<\/strong><\/p>\n\n\n\n<p>Au sein de l&rsquo;atelier visual studies et humanit\u00e9s num\u00e9riques en M\u00e9diterran\u00e9e, une premi\u00e8re exploration de traitement automatis\u00e9 des donn\u00e9es avait \u00e9t\u00e9 r\u00e9alis\u00e9e dans le cadre du projet \u00ab\u00a0Ecrans en lutte, m\u00e9moires des mouvements sociaux sur les WebTV fran\u00e7aises depuis la fin des ann\u00e9es 1990\u00a0\u00bb, laur\u00e9at de l&rsquo;appel \u00e0 chercheur associ\u00e9 de l&rsquo;Ina (2018-2019). \u00c0 partir d&rsquo;un corpus de 58 sites web de t\u00e9l\u00e9vision militante fran\u00e7aise archiv\u00e9s par l&rsquo;Ina et constitu\u00e9 sur une p\u00e9riode de cinq ans (de mai 2010 \u00e0 mai 2015), une m\u00e9thodologie et un processus d&rsquo;extraction d&rsquo;informations ont \u00e9t\u00e9 mis au point par l&rsquo;INA. \u00c0 partir des fichiers HTML du corpus, gr\u00e2ce \u00e0 la collaboration avec l&rsquo;Ina et \u00e0 l&rsquo;implication de l&rsquo;entreprise Gamuza, une cha\u00eene de traitement a \u00e9t\u00e9 d\u00e9velopp\u00e9e afin de permettre l&rsquo;extraction puis l&rsquo;identification des principales caract\u00e9ristiques et \u00e9volutions observ\u00e9es durant ces cinq ann\u00e9es. En novembre 2021, l&rsquo;inauguration du CEDRE AMU fut l&rsquo;occasion d&rsquo;une premi\u00e8re discussion dans le cadre du projet PICCH, avec Mathieu G\u00e9nois, physicien sp\u00e9cialis\u00e9 dans l&rsquo;analyse de r\u00e9seaux. 
Cela se concr\u00e9tise par l&rsquo;exploration du corpus traitant du trenti\u00e8me anniversaire de la Marche pour l&rsquo;\u00e9galit\u00e9 et contre le racisme (2013) archiv\u00e9 par l&rsquo;Ina, \u00e0 travers le stage et la co-direction interdisciplinaire du m\u00e9moire de master de Davide Rendina (Patrice Bellot LIS, Sophie Gebeil, Mathieu G\u00e9nois) centr\u00e9 sur l&rsquo;analyse s\u00e9mantique des donn\u00e9es, entre sciences de l&rsquo;informatique et histoire, soutenu en ao\u00fbt 2023. Lors de la rentr\u00e9e, des offres de stage sont propos\u00e9es par le laboratoire TELEMMe pour prolonger l&rsquo;analyse des donn\u00e9es du web archiv\u00e9es concernant la Marche de 1983. Deux \u00e9tudiants en licence Sciences de l&rsquo;informatique d&rsquo;AMU sont alors recrut\u00e9s durant l&rsquo;ann\u00e9e universitaire 2023\/2024, impulsant une premi\u00e8re collaboration avec Line Jamet-Jakubiec, ma\u00eetresse de conf\u00e9rences au LIS et responsable de la Licence Informatique. \u00c0 partir de cette exp\u00e9rience concluante, L. Jamet-Jakubiec et S. Gebeil travaillent \u00e0 la r\u00e9daction du pr\u00e9sent projet de recherche interdisciplinaire, qui co\u00efncide avec la cr\u00e9ation du WebLab en 2024. Le s\u00e9minaire de 2025\/2026 a \u00e9t\u00e9 l&rsquo;occasion d&rsquo;\u00e9changes fructueux avec les \u00e9quipes de la BnF et de l&rsquo;Ina, permettant de mettre en place un cadre s\u00e9curis\u00e9 sur le plan juridique pour prolonger l&rsquo;exploration des donn\u00e9es. 
Deux corpus sont alors identifi\u00e9s : le site du journal ind\u00e9pendant Marsactu archiv\u00e9 par la BnF, et celui du site France Info TV archiv\u00e9 par l&rsquo;Ina.<\/p>\n\n\n\n<p><strong>Former par la recherche : les stages 2026<\/strong><\/p>\n\n\n\n<p>En ce premier semestre 2026, les laboratoires TELEMMe et LIS accueilleront des stagiaires en sciences de l&rsquo;informatique qui participeront au d\u00e9veloppement de programmes permettant de traiter et d\u2019analyser des corpus issus des archives du Web conserv\u00e9es par la BnF et l\u2019INA. Sous la direction de Line Jamet-Jakubiec et de Sophie Gebeil, elles et ils contribueront \u00e0 la mise en place d\u2019outils adapt\u00e9s aux besoins des chercheurs, ainsi qu\u2019\u00e0 l\u2019int\u00e9gration d\u2019outils dits d\u2019intelligence artificielle pour le nettoyage, l\u2019extraction de tendances ou encore la navigation dans les donn\u00e9es. Elles et ils pourront r\u00e9aliser des scripts de traitement automatique pour diff\u00e9rents formats de donn\u00e9es et participeront \u00e0 l\u2019analyse et \u00e0 la visualisation des r\u00e9sultats. 
Elles et ils auront l&rsquo;occasion de pr\u00e9senter ce travail dans le cadre du WebLab et contribueront ainsi \u00e0 la r\u00e9daction d&rsquo;une documentation technique.<\/p>\n\n\n\n<p>Les comp\u00e9tences vis\u00e9es concernent aussi bien le domaine de l&rsquo;archivage du Web que les sciences de l&rsquo;informatique :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>programmation avanc\u00e9e&nbsp;&nbsp;appliqu\u00e9e \u00e0 des corpus issus du Web archiv\u00e9,<\/li>\n\n\n\n<li>ma\u00eetrise de techniques d\u2019extraction de donn\u00e9es, de construction de bases de donn\u00e9es et d\u2019int\u00e9gration d\u2019outils d\u2019IA,<\/li>\n\n\n\n<li>approfondissement des m\u00e9thodes d\u2019analyse computationnelle pour les sciences humaines et des pratiques de documentation scientifique,<\/li>\n\n\n\n<li>d\u00e9couverte de l&rsquo;\u00e9tat de l&rsquo;art des programmes informatiques d\u00e9di\u00e9s \u00e0 l&rsquo;analyse des donn\u00e9es du web archiv\u00e9.<\/li>\n<\/ul>\n\n\n\n<p>R\u00e9f\u00e9rences cit\u00e9es : <\/p>\n\n\n\n<p>Gebeil, Sophie, and J\u00e9r\u00f4me Thi\u00e8vre. \u201cFrom Archived Web Corpus to Readable Data for History Research.\u201d In&nbsp;<em>The Routledge Companion to Transnational Web Archive Studies<\/em>, 361. Taylor &amp; Francis, 2024. 
https:\/\/www.routledge.com\/The-Routledge-Companion-to-Transnational-Web-Archive-Studies\/Aasman-Ben-David-Brugger\/p\/book\/9781032497785 <\/p>\n\n\n\n<p>Rendina Davide, Gebeil Sophie, G\u00e9nois Mathieu et Bellot Patrice, 2024, \u00ab&nbsp;Semantic analysis of web archive historical data&nbsp;\u00bb dans Exploring the Archived Web during a Highly Transformative Age Proceedings of the 5th international RESAW conference, Marseille, June 2023, s.l.<a href=\"https:\/\/doi.org\/10.36253\/979-12-215-0413-2.22\" target=\"_blank\" rel=\"noreferrer noopener\">10.36253\/979-12-215-0413-2.22<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">AI and the Analysis of Archived Web Data: At the Intersection of Computer Science and Digital History<\/h1>\n\n\n\n<p>At the crossroads of computer science and history, this interdisciplinary research project explores the potential of computational methods for analyzing archived web data and studying the history of digital media. Led by Sophie Gebeil and Line Jamet-Jakubiec, the project aims to develop and use programs and tools, including AI, to automate data processing tasks and extract major trends, while also reflecting on the epistemological implications of such an approach.<\/p>\n\n\n\n<p><strong>Automating the Analysis of Archived Web Data in History<\/strong><\/p>\n\n\n\n<p>This project, led by Sophie Gebeil and Line Jamet-Jakubiec, focuses on the use of archived web data, particularly materials preserved by the Biblioth\u00e8que nationale de France and the Institut national de l&rsquo;audiovisuel as part of the legal deposit of the French web established in 2006. 
Digital sources are abundant and diverse, as are the formats in which the data are made available.<\/p>\n\n\n\n<p>The goal is to provide humanities scholars with programs and tools that enable them to explore and analyze large-scale web corpora within the framework of the IUF project \u201cWeb Archiving as a Historiographical Challenge: Between Fragmentation and Mediation.\u201d<\/p>\n\n\n\n<p>Among the planned technical developments are the use and programming of AI systems to process data and extract trends, including chatbots, predictive tools, automatically generated texts based on other texts, trend visualizations, and navigation tools. The project also involves developing scripts to handle the various data formats provided, and using dedicated frameworks such as Django or MongoDB to build databases tailored to research needs. The choice of programming languages will be discussed with project partners and may include Python for its versatility and extensive libraries, Java or C++ for performance, JavaScript for ease of AI tool integration, Rust for reliability, and R for advanced statistical modeling.<\/p>\n\n\n\n<p><strong>Three Years of Interdisciplinary Collaboration Around Archived Web Data<\/strong><\/p>\n\n\n\n<p>Within the Visual Studies and Digital Humanities in the Mediterranean workshop, an initial exploration of automated data processing was conducted as part of the project \u201cScreens in Struggle: Memories of Social Movements on French Web TV Since the Late 1990s,\u201d which received the Associated Researcher Award from the INA in 2018\u20132019. Based on a corpus of 58 French activist web TV sites archived by the Institut national de l&rsquo;audiovisuel over a five-year period from May 2010 to May 2015, a methodology and information extraction process was developed by the INA. 
Using the corpus HTML files, and through collaboration with the INA and the company Gamuza, a processing pipeline was designed to enable the extraction and identification of the main characteristics and developments observed during those five years.<\/p>\n\n\n\n<p>In November 2021, the inauguration of CEDRE AMU provided an opportunity for an initial discussion within the PICCH project, in collaboration with Mathieu G\u00e9nois, a physicist specializing in network analysis. This collaboration led to the exploration of a corpus related to the thirtieth anniversary of the 1983 March for Equality and Against Racism, archived by the Institut National de l&rsquo;Audiovisuel. The work took shape through an internship and the interdisciplinary co-supervision of Davide Rendina\u2019s master\u2019s thesis, centered on the semantic analysis of data at the intersection of computer science and history, which was defended in August 2023.<\/p>\n\n\n\n<p>At the start of the following academic year, internship opportunities were offered by the TELEMMe laboratory to extend the analysis of archived web data related to the 1983 March. Two undergraduate computer science students from AMU were recruited during the 2023\u20132024 academic year, initiating a first collaboration with Line Jamet-Jakubiec, Associate Professor at LIS and head of the Computer Science undergraduate program. Building on this successful experience, L. Jamet-Jakubiec and S. Gebeil began drafting the present interdisciplinary research project, which coincided with the creation of the WebLab in 2024.<\/p>\n\n\n\n<p>The 2025\u20132026 seminar series enabled productive exchanges with teams from the Biblioth\u00e8que nationale de France and the Institut national de l&rsquo;audiovisuel, leading to the establishment of a legally secure framework for continuing data exploration. 
Two corpora were identified: the independent news site Marsactu, archived by the BnF, and the France Info TV website, archived by the INA.<\/p>\n\n\n\n<p><strong>Training Through Research: 2026 Internships<\/strong><\/p>\n\n\n\n<p>In the first semester of 2026, the TELEMMe and LIS laboratories will host computer science interns who will contribute to the development of programs designed to process and analyze web archive corpora preserved by the Biblioth\u00e8que nationale de France and the Institut national de l&rsquo;audiovisuel. Under the supervision of Line Jamet-Jakubiec and Sophie Gebeil, they will help design tools tailored to researchers\u2019 needs and integrate AI-based tools for data cleaning, trend extraction, and navigation.<\/p>\n\n\n\n<p>They will develop automated processing scripts for different data formats and participate in analyzing and visualizing results. They will also have the opportunity to present their work as part of the WebLab and contribute to the drafting of technical documentation.<\/p>\n\n\n\n<p>The targeted skills span both web archiving and computer science:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Advanced programming applied to corpora drawn from archived web data<\/li>\n\n\n\n<li>Mastery of data extraction techniques, database construction, and AI tool integration<\/li>\n\n\n\n<li>Deeper knowledge of computational methods for the humanities and of scientific documentation practices<\/li>\n\n\n\n<li>Exploration of the current state of the art in software dedicated to analyzing archived web data<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Entre sciences de l&rsquo;informatique et histoire, ce projet de recherche interdisciplinaire explore les potentialit\u00e9s offertes par les sciences de l&rsquo;informatique pour l&rsquo;analyse des donn\u00e9es du web archiv\u00e9 et l&rsquo;histoire des m\u00e9dias num\u00e9riques. 
Port\u00e9 par Sophie Gebeil et Line Jamet-Jakubiec, l&rsquo;objectif est de d\u00e9velopper et \u00e0 utiliser des programmes, des outils, dont IA, pour automatiser des&hellip; <a class=\"more-link\" href=\"https:\/\/pba.mmsh.fr\/?p=35483\">Poursuivre la lecture <span class=\"screen-reader-text\">IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique<\/span><\/a><\/p>\n","protected":false},"author":24,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-35483","post","type-post","status-publish","format-standard","hentry","category-non-classe","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/pba.mmsh.fr\/?p=35483\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\" \/>\n<meta property=\"og:description\" content=\"Entre sciences de l&rsquo;informatique et histoire, ce projet de recherche interdisciplinaire explore les potentialit\u00e9s offertes par les sciences de l&rsquo;informatique pour l&rsquo;analyse des donn\u00e9es du web archiv\u00e9 et l&rsquo;histoire des m\u00e9dias num\u00e9riques. 
Port\u00e9 par Sophie Gebeil et Line Jamet-Jakubiec, l&rsquo;objectif est de d\u00e9velopper et \u00e0 utiliser des programmes, des outils, dont IA, pour automatiser des&hellip; Poursuivre la lecture IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique\" \/>\n<meta property=\"og:url\" content=\"https:\/\/pba.mmsh.fr\/?p=35483\" \/>\n<meta property=\"og:site_name\" content=\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-14T17:33:35+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-09T10:29:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2022\/04\/LOGO-MMSH-UAR-2022-2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1654\" \/>\n\t<meta property=\"og:image:height\" content=\"552\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Meriem Bataoui\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"Meriem Bataoui\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483\"},\"author\":{\"name\":\"Meriem Bataoui\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/person\\\/843c6247543306a187c9f51589fe5484\"},\"headline\":\"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire 
num\u00e9rique\",\"datePublished\":\"2026-02-14T17:33:35+00:00\",\"dateModified\":\"2026-03-09T10:29:11+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483\"},\"wordCount\":2046,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483\",\"name\":\"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#website\"},\"datePublished\":\"2026-02-14T17:33:35+00:00\",\"dateModified\":\"2026-03-09T10:29:11+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/?p=35483#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/pba.mmsh.fr\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#website\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/\",\"name\":\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\",\"description\":\"Biblioth\u00e8ques et Archives \u00e0 la Maison m\u00e9diterran\u00e9enne des sciences de 
l\u2019homme\",\"publisher\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/pba.mmsh.fr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#organization\",\"name\":\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/cropped-cropped-LOGO-UAR-MMSHS-coul.png\",\"contentUrl\":\"https:\\\/\\\/pba.mmsh.fr\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/cropped-cropped-LOGO-UAR-MMSHS-coul.png\",\"width\":1161,\"height\":303,\"caption\":\"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH\"},\"image\":{\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/pba.mmsh.fr\\\/#\\\/schema\\\/person\\\/843c6247543306a187c9f51589fe5484\",\"name\":\"Meriem Bataoui\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g\",\"caption\":\"Meriem Bataoui\"},\"url\":\"https:\\\/\\\/pba.mmsh.fr\\\/?author=24\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/pba.mmsh.fr\/?p=35483","og_locale":"fr_FR","og_type":"article","og_title":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","og_description":"Entre sciences de l&rsquo;informatique et histoire, ce projet de recherche interdisciplinaire explore les potentialit\u00e9s offertes par les sciences de l&rsquo;informatique pour l&rsquo;analyse des donn\u00e9es du web archiv\u00e9 et l&rsquo;histoire des m\u00e9dias num\u00e9riques. Port\u00e9 par Sophie Gebeil et Line Jamet-Jakubiec, l&rsquo;objectif est de d\u00e9velopper et \u00e0 utiliser des programmes, des outils, dont IA, pour automatiser des&hellip; Poursuivre la lecture IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique","og_url":"https:\/\/pba.mmsh.fr\/?p=35483","og_site_name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","article_published_time":"2026-02-14T17:33:35+00:00","article_modified_time":"2026-03-09T10:29:11+00:00","og_image":[{"width":1654,"height":552,"url":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2022\/04\/LOGO-MMSH-UAR-2022-2.png","type":"image\/png"}],"author":"Meriem Bataoui","twitter_card":"summary_large_image","twitter_misc":{"\u00c9crit par":"Meriem Bataoui","Dur\u00e9e de lecture estim\u00e9e":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/pba.mmsh.fr\/?p=35483#article","isPartOf":{"@id":"https:\/\/pba.mmsh.fr\/?p=35483"},"author":{"name":"Meriem 
Bataoui","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/person\/843c6247543306a187c9f51589fe5484"},"headline":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique","datePublished":"2026-02-14T17:33:35+00:00","dateModified":"2026-03-09T10:29:11+00:00","mainEntityOfPage":{"@id":"https:\/\/pba.mmsh.fr\/?p=35483"},"wordCount":2046,"commentCount":0,"publisher":{"@id":"https:\/\/pba.mmsh.fr\/#organization"},"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/pba.mmsh.fr\/?p=35483#respond"]}]},{"@type":"WebPage","@id":"https:\/\/pba.mmsh.fr\/?p=35483","url":"https:\/\/pba.mmsh.fr\/?p=35483","name":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique - P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","isPartOf":{"@id":"https:\/\/pba.mmsh.fr\/#website"},"datePublished":"2026-02-14T17:33:35+00:00","dateModified":"2026-03-09T10:29:11+00:00","breadcrumb":{"@id":"https:\/\/pba.mmsh.fr\/?p=35483#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/pba.mmsh.fr\/?p=35483"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/pba.mmsh.fr\/?p=35483#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/pba.mmsh.fr\/"},{"@type":"ListItem","position":2,"name":"IA et analyse des donn\u00e9es du Web archiv\u00e9 : entre sciences de l\u2019informatique et histoire num\u00e9rique"}]},{"@type":"WebSite","@id":"https:\/\/pba.mmsh.fr\/#website","url":"https:\/\/pba.mmsh.fr\/","name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","description":"Biblioth\u00e8ques et Archives \u00e0 la Maison m\u00e9diterran\u00e9enne des sciences de 
l\u2019homme","publisher":{"@id":"https:\/\/pba.mmsh.fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pba.mmsh.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/pba.mmsh.fr\/#organization","name":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH","url":"https:\/\/pba.mmsh.fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/logo\/image\/","url":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2026\/01\/cropped-cropped-LOGO-UAR-MMSHS-coul.png","contentUrl":"https:\/\/pba.mmsh.fr\/wp-content\/uploads\/2026\/01\/cropped-cropped-LOGO-UAR-MMSHS-coul.png","width":1161,"height":303,"caption":"P\u00f4le Biblioth\u00e8ques et Archives de la MMSH"},"image":{"@id":"https:\/\/pba.mmsh.fr\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/pba.mmsh.fr\/#\/schema\/person\/843c6247543306a187c9f51589fe5484","name":"Meriem Bataoui","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e74de76f50ae360216a9f1cd26a16d30d83538bfe584a9610bfce08f261bbf1a?s=96&d=mm&r=g","caption":"Meriem 
Bataoui"},"url":"https:\/\/pba.mmsh.fr\/?author=24"}]}},"_links":{"self":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35483","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=35483"}],"version-history":[{"count":11,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35483\/revisions"}],"predecessor-version":[{"id":35533,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=\/wp\/v2\/posts\/35483\/revisions\/35533"}],"wp:attachment":[{"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=35483"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=35483"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pba.mmsh.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=35483"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}