Python Programming/Programming in two minutes/the Wikipedia interface for programming


We will write a Python script and run it in a console. The script uses two sets of commands (modules) defined in the standard library that is installed with the language.


Programs talk to Wikipedia by sending requests to:

https://fr.wikipedia.org/w/api.php?

For example:

https://fr.wikipedia.org/w/api.php?action=query&prop=info%7Crevisions&titles=Accueil
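The example URL above can also be assembled from its parameters rather than typed by hand. The sketch below is one possible way to do it with the standard library; the helper name build_api_url is hypothetical and only serves the illustration.

```python
from urllib.parse import urlencode

# Entry point of the API, as given in the text above.
API = "https://fr.wikipedia.org/w/api.php"

def build_api_url(**params):
    """Return the API URL followed by the percent-encoded query parameters."""
    return API + "?" + urlencode(params)

# urlencode turns the "|" separator into "%7C", as in the example URL.
url = build_api_url(action="query", prop="info|revisions", titles="Accueil")
print(url)
```

Letting urlencode do the escaping also covers page titles that contain spaces or accented characters.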

Structure of the Wikipedia API

Get the information about the last revision of the page Accueil:

api.php ? action=query & prop=info|revisions & titles=Accueil & format=xml

<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="15169" ns="0" title="Accueil" touched="2009-05-10T14:43:08Z" lastrevid="229318" counter="0" length="1878">
        <revisions>
          <rev revid="229318" minor="" user="Savant-fou" timestamp="2009-04-25T16:07:37Z" comment="Ajout rapide de la catégorie [[:Catégorie:Accueil|Accueil]] (avec [[MediaWiki:Gadget-HotCats.js|HotCats]])" />
        </revisions>
      </page>
    </pages>
  </query>
</api>

At the time of writing, on 17 May 2009, the last reviser of the page Accueil was Savant-fou, as recorded in the string « user="Savant-fou" ». On 25 April 2009 at 16:07, this user summarized the edit with the comment « Ajout rapide de la catégorie Accueil (avec HotCats) ». To find out who last revised the page right now, send the request above, which returns this XML document.
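The reviser and the timestamp can be read out of such a response with an XML parser. The following is a minimal sketch using the standard xml.etree.ElementTree module; the function name last_revision is hypothetical, and SAMPLE is a shortened copy of the response shown above (some attributes omitted).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML response shown above.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="15169" ns="0" title="Accueil" lastrevid="229318">
        <revisions>
          <rev revid="229318" user="Savant-fou" timestamp="2009-04-25T16:07:37Z" />
        </revisions>
      </page>
    </pages>
  </query>
</api>"""

def last_revision(xml_text):
    """Return (user, timestamp) of the first <rev> element in the response."""
    root = ET.fromstring(xml_text)
    rev = root.find(".//rev")  # first <rev> anywhere below <api>
    return rev.get("user"), rev.get("timestamp")

user, timestamp = last_revision(SAMPLE)
print(user, timestamp)
```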

Results of queries in different formats

XML

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=xmlfm

<?xml version="1.0" encoding="utf-8"?>
<api>
  <query>
    <pages>
      <page pageid="736" ns="0" title="Albert Einstein" touched="2007-07-06T04:37:30Z" lastrevid="142335140" counter="4698" length="86906" />
    </pages>
  </query>
</api>

JSON

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=jsonfm

{
    "query": {
        "pages": {
            "736": {
                "pageid": 736,
                "ns": 0,
                "title": "Albert Einstein",
                "touched": "2007-07-06T04:37:30Z",
                "lastrevid": 142335140,
                "counter": 4698,
                "length": 86906
            }
        }
    }
}
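A JSON response of this shape maps directly onto Python dictionaries. The sketch below parses a copy of the sample above with the standard json module; the function name page_infos is hypothetical. Note that format=jsonfm returns a pretty-printed HTML page meant for reading, so a script would normally request format=json instead.

```python
import json

# Copy of the JSON sample shown above.
SAMPLE = """
{
    "query": {
        "pages": {
            "736": {
                "pageid": 736,
                "ns": 0,
                "title": "Albert Einstein",
                "touched": "2007-07-06T04:37:30Z",
                "lastrevid": 142335140,
                "counter": 4698,
                "length": 86906
            }
        }
    }
}
"""

def page_infos(json_text):
    """Return the list of page dictionaries found under query -> pages."""
    data = json.loads(json_text)
    # "pages" is keyed by page id, so only the values are of interest here.
    return list(data["query"]["pages"].values())

for page in page_infos(SAMPLE):
    print(page["title"], page["lastrevid"], page["length"])
```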

YAML

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=yamlfm

---
query: 
  pages: 
    - 
      pageid: 736
      ns: 0
      title: Albert Einstein
      touched: |
        2008-03-16T04:59:39Z
      lastrevid: 198568286
      counter: 4698
      length: 81076

WDDX

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=wddxfm

<?xml version="1.0" encoding="utf-8"?>
<wddxPacket version="1.0">
  <header/>
  <data>
    <struct>
      <var name="query">
        <struct>
          <var name="pages">
            <struct>
              <var name="736">
                <struct>
                  <var name="pageid">
                    <number>736</number>
                  </var>
                  <var name="ns">
                    <number>0</number>
                  </var>
                  <var name="title">
                    <string>Albert Einstein</string>
                  </var>
                  <var name="touched">
                    <string>2007-07-06T04:37:30Z</string>
                  </var>
                  <var name="lastrevid">
                    <number>142335140</number>
                  </var>
                  <var name="counter">
                    <number>4698</number>
                  </var>
                  <var name="length">
                    <number>86906</number>
                  </var>
                </struct>
              </var>
            </struct>
          </var>
        </struct>
      </var>
    </struct>
  </data>
</wddxPacket>

PHP (serialized format, with line breaks added for readability. Use PHP's unserialize() function to recover data.)

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=php

a:1:{s:5:"query";a:1:{s:5:"pages";a:1:{i:736;a:7:{s:6:"pageid";i:736;s:2:"ns";i:0;s:5:"title";s:15:"Albert Einstein";
s:7:"touched";s:20:"2007-07-06T04:37:30Z";s:9:"lastrevid";i:142335140;s:7:"counter";i:4698;s:6:"length";i:86906;}}}}

PHP (var_export format. Use PHP's eval() function to recover data.)

api.php ? action=query & titles=Albert%20Einstein & prop=info & format=dbg

array (
  'query' => 
  array (
    'pages' => 
    array (
      736 => 
      array (
        'pageid' => 736,
        'ns' => 0,
        'title' => 'Albert Einstein',
        'touched' => '2008-10-11T20:27:04Z',
        'lastrevid' => 244636163,
        'counter' => 4698,
        'length' => 89641,
      ),
    ),
  ),
)

Last editor of the page

1. Open a text editor and paste the following script (leave out special characters such as "é" if the file is saved as ASCII rather than Unicode)...

modifieur_de_la_page.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Import two modules from the standard library:
# 'urllib' for URL library and 're' for regular expression.
import re
import urllib.request

nom_de_page = "Accueil"

url = "https://fr.wikipedia.org/w/api.php?action=query&prop=info|revisions&titles=%s&format=xml" % nom_de_page
# Wikimedia asks scripts to send a descriptive User-Agent header;
# the value below is only a placeholder to adapt.
requete = urllib.request.Request(url, headers={"User-Agent": "modifieur_de_la_page.py (example script)"})

# display
page = urllib.request.urlopen(requete)
infos = str(page.read(), 'utf_8')  # read the result of the request to the URL above
page.close()
print("The requested information about " + nom_de_page + " is the following, in XML:\n\n" + infos)

# extraction
print("\n...searching for the regular expression...")
reviseur = re.findall(' user="(.*?)"', infos)  # search for the regular expression
print("\nLast reviser: " + str(reviseur))

...save this script (for example as modifieur_de_la_page.py) and run it. The script uses the request above to display the last editor of the home page.

Note: this method relies only on Python's built-in libraries. An alternative is the Pywikibot framework, which handles the MediaWiki API on the developer's behalf and makes more elaborate scripts much simpler to write.

Loop over the editors of the Bistro

2. Get the list of the last editors of the Bistro pages of the previous month (April 2009). Open the text editor and write this script, which sends the same kind of request once per day... Like the previous script, it is written for Python 3: print is called with parentheses, urlopen comes from urllib.request (instead of urllib alone), and the result of read is converted into a character string (infos = str(page.read(), 'utf_8')).

boucle_reviseur_bistro.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re              # modules from the library installed with the language
import urllib.request

for a in range(1, 31):  # April has 30 days; a is an integer
    b = str(a)          # and b is a character string
    nom = "Wikipedia:Le_Bistro/" + b + "_avril_2009"
    url = ("https://fr.wikipedia.org/w/api.php?action=query" +
           "&prop=info|revisions&titles=%s&format=xml" % nom)
    # Wikimedia asks scripts to send a descriptive User-Agent header;
    # the value below is only a placeholder to adapt.
    requete = urllib.request.Request(url, headers={"User-Agent": "boucle_reviseur_bistro.py (example script)"})
    page = urllib.request.urlopen(requete)
    infos = str(page.read(), 'utf_8')
    page.close()
    reviseur = re.findall(' user="(.*?)"', infos)  # search for the regular expression
    for chacun in reviseur:
        print("\nLast reviser of the Bistro of " + b + " April 2009: " + chacun)

...save this script (for example as boucle_reviseur_bistro.py) and run it.
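The page titles visited by the loop can be generated for any month instead of hard-coding the number of days. The sketch below is one way to do it; the helper name bistro_titles and the MOIS list are hypothetical, and the title pattern is simply the one used in the script above, not a verified naming rule of the wiki.

```python
import calendar

# French month names, needed because the page titles are in French.
MOIS = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
        "août", "septembre", "octobre", "novembre", "décembre"]

def bistro_titles(year, month):
    """Return one Bistro page title per day of the given month."""
    # monthrange returns (weekday of the first day, number of days in the month).
    days = calendar.monthrange(year, month)[1]
    return ["Wikipedia:Le_Bistro/%d_%s_%d" % (day, MOIS[month - 1], year)
            for day in range(1, days + 1)]

titles = bistro_titles(2009, 4)
print(len(titles), titles[0], titles[-1])
```

Asking calendar for the length of the month avoids requesting a page for a day that does not exist, such as 31 April.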

All the editors of the page

3. List the editors of the home page between two dates, together with their revision comments: open the text editor and write this script... It sends a revisions request bounded by a start and an end timestamp.

liste_des_reviseurs.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re              # modules from the standard library
import urllib.request

debut = "20090311000000"  # date from which to start listing, here 11 March 2009
fin = "20090511000000"    # date at which to stop listing, here 11 May 2009

nom = "Accueil"

# rvdir=newer lists revisions from oldest to newest, so rvstart may precede rvend;
# rvlimit=max asks for as many revisions as the API allows in one response.
url = ("https://fr.wikipedia.org/w/api.php?action=query" +
       "&prop=revisions&rvdir=newer&rvlimit=max&rvstart=%s&rvend=%s&titles=%s&format=xml" % (debut, fin, nom))
# Wikimedia asks scripts to send a descriptive User-Agent header;
# the value below is only a placeholder to adapt.
requete = urllib.request.Request(url, headers={"User-Agent": "liste_des_reviseurs.py (example script)"})
page = urllib.request.urlopen(requete)
infos = str(page.read(), 'utf_8')  # read the result of the request to the URL above
page.close()

# search for and display the revisers
reviseur = re.findall(' user="(.*?)"', infos)  # search for the regular expression
for chacun in reviseur:
    print("Reviser: " + chacun)

# search for and display the comments
commentaire = re.findall(' comment="(.*?)"', infos)  # search for the regular expression
for comment in commentaire:
    print("\nRevision comment: " + comment)
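The start and end dates in the script are written as 14-digit strings (year, month, day, hour, minute, second), while the responses shown earlier carry timestamps such as 2009-04-25T16:07:37Z. The sketch below converts between Python datetime objects and both notations; the function names are hypothetical.

```python
from datetime import datetime

def to_request_timestamp(moment):
    """Format a datetime as the 14-digit string used for rvstart and rvend."""
    return moment.strftime("%Y%m%d%H%M%S")

def parse_response_timestamp(text):
    """Parse a timestamp as it appears in the API responses shown earlier."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")

debut = to_request_timestamp(datetime(2009, 3, 11))
fin = to_request_timestamp(datetime(2009, 5, 11))
print(debut, fin)
print(parse_response_timestamp("2009-04-25T16:07:37Z"))
```

Working with datetime objects makes it easy to compute ranges such as "the last 60 days" before formatting them for the request.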

Congratulations, you are now using Wikipedia through its API!

You can continue this exercise by writing Python from scratch, or use the Pywikipedia library (the predecessor of Pywikibot) and the Pywikipedia scripts hosted by Wikimedia. For example, you should be able to read a script such as the one below, which is written for Python 2 and the legacy Pywikipedia modules:

statistics_in_wikitable.py

#!/usr/bin/python
# -*- coding: utf-8  -*-
"""

\03{lightyellow}This bot renders statistics provided by [[Special:Statistics]] in a table on a wiki page.\03{default}
Thus it creates and updates a Statistics wikitable.

The following parameters are supported:

\03{lightred}-screen\03{default}     If True, doesn't do any changes, but only shows the statistics.

\03{lightgreen}-page\03{default}    On what page statistics are rendered.
        If not existing yet, it is created.
        If existing, it is updated.
"""
__version__ = '$Id$'
import wikipedia
import pagegenerators
import query
import time

# This is the title of the wikipage where to render stats.
your_page = "Logstats"

summary_update = {
    'en':u'Updating some statistics.',
    }
summary_creation = {
    'en':u'Creating statistics log page.',
    }

class StatisticsBot:
    def __init__ (self, screen, your_page):
        """
        Constructor. Parameter:
            * screen    - If True, doesn't do any real changes,
                          but only shows some stats.
        """
        self.screen = screen
        self.your_page = your_page
        self.site = wikipedia.getSite() # Must be set first: getdata() uses it.
        self.dict = self.getdata() # Try to get data.

    def run(self):
        if self.screen:
            wikipedia.output("Bot is running to output stats.")
            self.idle(1) # Run a function to idle
            self.outputall()
        if not self.screen:
            self.outputall() # Output all datas on screen.
            wikipedia.output("\nBot is running. " +
                "Going to treat \03{lightpurple}%s\03{default}..." %
                self.your_page )
            self.idle(2)
            self.treat()

    # getdata() returns a dictionary of the query to
    #   api.php?action=query&meta=siteinfo&siprop=statistics
    def getdata(self):
        # This method returns data in a dictionary format.
        # View data with: api.php?action=query&meta=siteinfo&siprop=statistics&format=jsonfm
        params = {
        'action'    :'query',
        'meta'      :'siteinfo',
        'siprop'    :'statistics',
        }
        wikipedia.output("\nQuerying api for json-formatted data...")
        data = None # Stays None if the query fails.
        try:
            data = query.GetData(params,self.site, encodeTitle = False)
        except:
            url = self.site.protocol() + '://' + self.site.hostname() + self.site.api_address()
            wikipedia.output("The query has failed. Have you checked the API? Are cookies working?")
            wikipedia.output(u"\n>> \03{lightpurple}%s\03{default} <<" % url)
        if data != None:
            wikipedia.output("Extracting statistics...")
            data = data['query']      # "query" entry of data.
            dict = data['statistics'] # "statistics" entry of "query" dict.
            return dict

    def treat(self):
        page = wikipedia.Page(self.site, self.your_page)
        if page.exists():
            wikipedia.output(u'\nWikitable on ' +
                u'\03{lightpurple}%s\03{default} will be completed with:\n' % self.your_page )
            text = page.get()
            newtext = self.newraw()
            wikipedia.output(newtext)
            choice = wikipedia.inputChoice(
                u'Do you want to add these on wikitable?', ['Yes', 'No'], ['y', 'N'], 'N')
            text = text[:-3] + newtext
            summ = wikipedia.translate(self.site, summary_update)
            if choice == 'y':
                try:
                    page.put(u''.join(text), summ)
                except:
                    wikipedia.output(u'Impossible to edit. It may be an edit conflict... Skipping...')
        else:
            wikipedia.output(
                u'\nWikitable on \03{lightpurple}%s\03{default} will be created with:\n' % self.your_page )
            newtext = self.newtable()+self.newraw()
            wikipedia.output(newtext)
            summ = wikipedia.translate(self.site, summary_creation)
            choice = wikipedia.inputChoice(
                u'Do you want to accept this page creation?', ['Yes', 'No'], ['y', 'N'], 'N')
            if choice == 'y':
                try:
                    page.put(newtext, summ)
                except wikipedia.LockedPage:
                    wikipedia.output(u"Page %s is locked; skipping." % self.your_page)
                except wikipedia.EditConflict:
                    wikipedia.output(u'Skipping %s because of edit conflict' % self.your_page)
                except wikipedia.SpamfilterError, error:
                    wikipedia.output(
                        u'Cannot change %s because of spam blacklist entry %s' % (self.your_page, error.url))

    def newraw(self):
        newtext = ('\n|----\n!\'\''+ self.date() +'\'\'')    # new raw for date and stats
        for name in self.dict:
            newtext += '\n|'+str(abs(self.dict[name]))
        newtext += '\n|----\n|}'
        return newtext

    def newtable(self):
        newtext = ('\n{| class=wikitable style=text-align:center\n!'+ "date")    # create table
        for name in self.dict:
            newtext += '\n|'+name
        return newtext

    def date(self):
        return time.strftime('%Y/%m/%d', time.localtime(time.time()))
    
    def outputall(self):
        # Output every statistic, sorted by name.
        for name in sorted(self.dict):
            wikipedia.output("There are "+str(self.dict[name])+" "+name)
    
    def idle(self, retry_idle_time):
        time.sleep(retry_idle_time)
        wikipedia.output(u"Starting in %i second..." % retry_idle_time)
        time.sleep(retry_idle_time)

def main(your_page):
    screen = False # If True it would not edit the wiki, only output statistics
    _page = None

    wikipedia.output("\nBuilding the bot...")
    for arg in wikipedia.handleArgs():    # Parse command line arguments
        if arg.startswith('-page'):
            if len(arg) == 5:
                _page = wikipedia.input(u'On what page do you want to add statistics?')
            else:
                _page = arg[6:]
        if arg.startswith("-screen"):
            screen = True
    if not _page:
        _page = your_page
        if not screen:
            wikipedia.output("The bot will add statistics on %s.\n" % _page )
    bot = StatisticsBot(screen, _page) # Launch the instance of a StatisticsBot
    bot.run() # Execute the 'run' method

if __name__ == "__main__":
    try:
        main(your_page)
    finally:
        wikipedia.stopme()

The script statistics_in_wikitable.py imports a few libraries, including Pywikipedia, defines three variables, defines the StatisticsBot class, then defines a main function that is run at the end of the script (by the statement try: main(your_page)).
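The table-building part of the script (the newtable and newraw methods) can be tried without Pywikipedia. The sketch below reproduces that logic as two standalone Python 3 functions; the function names and the figures in the stats dictionary are made up for the illustration, and the keys are sorted so that headers and values stay aligned, which is a choice of this sketch rather than something the original script does.

```python
def new_table(stats):
    """Return the opening of a wikitable: a date header, then one cell per statistic name."""
    text = '\n{| class=wikitable style=text-align:center\n!date'
    for name in sorted(stats):
        text += '\n|' + name
    return text

def new_row(stats, date):
    """Return one table row holding the date and the statistic values, then close the table."""
    text = "\n|----\n!''" + date + "''"
    for name in sorted(stats):
        text += '\n|' + str(stats[name])
    return text + '\n|----\n|}'

# Made-up figures standing in for the result of the siteinfo statistics query.
stats = {"pages": 10, "articles": 4}
print(new_table(stats) + new_row(stats, "2009/05/17"))
```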

Further reading: Catégorie:Programmation.


See also the book: Programmer en deux minutes