Plagiary Poets

Last week I launched PlagiaryPoets.io, an interactive app that visualizes text reuse within early poetry:

plagiary_poets.png

I was inspired to build the app after spending a month studying D3.js with Bob Holt, Mike Pennisi, and Yannick Assogba, three brilliant developers who work for Bocoup. The site was a lot of fun to build, and I look forward to launching related projects in the months to come. If you have any thoughts about this one, feel free to drop me a line below!

Visualizing Shakespearean Characters

Some time ago, I was intrigued to discover that Shakespeare’s Histories have a noticeable lack of female characters [link]. Since then, I’ve been curious to further explore the nuances of Shakespearean characters, paying particular respect to the gender dynamics of the Bard’s plays. This post is a quick sketch of some of the insights to which that curiosity has led.

To get a closer look at Shakespeare’s characters, I ran some analysis on the Folger Shakespeare Library’s gold-standard set of Shakespearean texts [link], all of which are encoded in fantastic XML markup that captures a number of character-level attributes, including gender. Using that markup, I extracted data for each character in Shakespeare’s plays, and then scoured through those features in search of patterns. All of the characters with an identified gender in this dataset are plotted below (mouseover for character name and source play):

Looking at this plot, we can see that the most prominent characters in Shakespearean drama are almost all well-known, titular males. There is also a noticeable inverse-relationship between a character’s prominence and the point in the play wherein that character is introduced. Looking more closely at the plot, I’ve further noticed that Shakespeare was curiously consistent in his treatment of characters who appear in multiple plays. In both 1 Henry IV and 2 Henry IV, for instance, Falstaff is given ~6,000 words and is introduced only a few hundred words into the work. Looking at the long tail, by contrast, one finds that among the outspoken characters introduced after the ~15,000 word mark—including Westmoreland and Bedford in 2 Henry IV, and Cade, Clifford, and Iden from 2 Henry VI—nearly all hail from Histories.

While the plot above gives one a birds’ eye view of Shakespeare’s characters, the plot doesn’t make it particularly easy to differentiate male and female character dynamics. As a step in this direction, the plot below visualizes character entrances by gender for each of Shakespeare's plays:

Examining at the distribution along the x-axis, we can see that male characters consistently enter the stage before female characters. An exception to this general rule may be found in the Comedies, as plays like Taming of the Shrew, All’s Well that Ends Well, and Midsummer Nights’ Dream begin with female characters on stage. Looking at the distribution along the y-axis, we can also see that for most plays, male characters continue to be introduced on stage long after the last female characters have been introduced.

Given the plots above, some might conclude that Shakespeare privileged male characters over female characters, as he introduced the former earlier and tended to give them more lines. There is evidence in the plays, however, that points in the opposite direction. Looking at Shakespeare’s minor characters, we see that the smallest and least significant roles in each play were almost universally assigned to males:

Here we see that even important males characters such as Fleance in Macbeth and Cornelius in Hamlet are given very few lines indeed, and the smallest female roles are consistently given more lines than the smallest male roles.

In sum, the plots above show that a number of heretofore undisclosed patterns emerge when we analyze Shakespeare’s characters in the aggregate. However, the plots above don’t show the connections between characters. One way to investigate these interconnections is through a co-occurrence matrix, in which each cell represents the degree to which two characters appear on stage concurrently:

In this visualization, “Frequency” represents the number of times a character appears on stage, “Gender” is indicated by the markup within the Folger Shakespeare Digital Collection XML (red = female, blue = male, green = unspecified), and “Cluster” reflects the subgroup of characters with whom a given character regularly appears, as determined by a fast greedy modularity ranking algorithm. Interacting with this plot allows one to uncover a number of insights. In the first place, we can see that the Histories consistently feature more “clusters” of characters than do Comedies or Tragedies. That is to say, while Comedies tend to be wildly interconnected affairs, Histories tend to include many small, isolated groups of characters that interact rather little with each other.  Looking at the gender dynamics of these groups, we can also see that in Comedies such as Merry Wives of Windsor and Histories such as Richard III and Henry V, female characters tend to appear on stage together, almost creating a coherent collective over the course of the play.

Finally, a number of female characters—such as Queen Margaret in 2 Henry VI and Adrianna in Comedy of Errors—appear on stage more frequently than any other character in their respective plays, despite the fact that they say fewer words than their respective plays' most outspoken characters. That is to say, their visual presence on stage is disproportionate to their verbal presence on stage. This raises a number of questions: To what extent were female characters meant to fulfill the role of a spectacle in Shakespearean drama? It’s difficult to imagine that the male players who acted as females projected authentic feminine voices. Did the limitations of imitative speech help mitigate the number of lines given to these prominent female characters? These and other questions remain to be explored in future work. 

* * *

The Python, Javascript, HTML and CSS used to generate the data and visualizations above is available here [link]. If you find it useful, feel free to drop me a line!

Clustering Semantic Vectors with Python

Google's Word2Vec and Stanford's GloVe have recently offered two fantastic open source software packages capable of transposing words into a high dimension vector space. In both cases, a vector's position within the high dimensional space gives a good indication of the word's semantic class (among other things), and in both cases these vector positions can be used in a variety of applications. In the post below, I'll discuss one approach you can take to clustering the vectors into coherent semantic groupings. 

Both Word2Vec and GloVe can create vector spaces given a large training corpus, but both maintain pretrained vectors as well. To get started with ~1GB of pretrained vectors from GloVe, one need only run the following lines:

wget http://www-nlp.stanford.edu/data/glove.6B.300d.txt.gz
gunzip glove.6B.300d.txt.gz

If you unzip and then glance at glove.6B.300d.txt, you'll see that it's organized as follows:

the 0.04656 0.21318 -0.0074364 [...] 0.053913
, -0.25539 -0.25723 0.13169 [...] 0.35499
. -0.12559 0.01363 0.10306 [...] 0.13684
of -0.076947 -0.021211 0.21271 [...] -0.046533
to -0.25756 -0.057132 -0.6719 [...] -0.070621
[...]
sandberger 0.429191 -0.296897 0.15011 [...] -0.0590532

Each new line contains a token followed by 300 signed floats, and those values appear to be organized from most to least common. Given this ready format, it's fairly straightforward to get straight to clustering!

There are a variety of methods for clustering vectors, including density-based clustering, hierarchical clustering, and centroid clustering. One of the most intuitive and most commonly used centroid-based methods is K-Means. Given a collection of points in a space, K-Means uses a Hunger Games style random lottery to pick a few lucky points (colored green below), then assigns each of the non-lucky points to the lucky point to which it's closest. Using these preliminary groupings, the next step is to find the "centroid" (or geometric center) of each group, using the same technique one would use to find the center of a square. These centroids become the new lucky points, and again each non-lucky point is again assigned to the lucky point to which it's closest. This process continues until the centroids settle down and stop moving, after which the clustering is complete. Here's a nice visual description of K-Means [source]:

left.gif

To cluster the GloVe vectors in a similar fashion, one can use the sklearn package in Python, along with a few other packages:

from __future__ import division
from sklearn.cluster import KMeans 
from numbers import Number
from pandas import DataFrame
import sys, codecs, numpy

It will also be helpful to build a class to mimic the behavior of autovivification in Perl, which is essentially the process of creating new default hash values given a new key. In Python, this behavior is available through collections.defaultdict(), but the latter isn't serializable, so the following class is handy. Given an input key it hasn't seen, the class will create an empty list as the corresponding hash value:

class autovivify_list(dict):
        '''Pickleable class to replicate the functionality of collections.defaultdict'''
        def __missing__(self, key):
                value = self[key] = []
                return value

        def __add__(self, x):
                '''Override addition for numeric types when self is empty'''
                if not self and isinstance(x, Number):
                        return x
                raise ValueError

        def __sub__(self, x):
                '''Also provide subtraction method'''
                if not self and isinstance(x, Number):
                        return -1 * x
                raise ValueError

We also want a method to read in a vector file (e.g. glove.6B.300d.txt) and store each word and the position of that word within the vector space. Because reading in and analyzing some of the larger GloVe files can take a long time, to get going quickly one can limit the number of lines to read from the input file by specifying a global value (n_words), which is defined later on:

def build_word_vector_matrix(vector_file, n_words):
        '''Read a GloVe array from sys.argv[1] and return its vectors and labels as arrays'''
        numpy_arrays = []
        labels_array = []
        with codecs.open(vector_file, 'r', 'utf-8') as f:
                for c, r in enumerate(f):
                        sr = r.split()
                        labels_array.append(sr[0])
                        numpy_arrays.append( numpy.array([float(i) for i in sr[1:]]) )

                        if c == n_words:
                                return numpy.array( numpy_arrays ), labels_array

        return numpy.array( numpy_arrays ), labels_array

Scikit-Learn's implementation of K-Means returns an object (cluster_labels in these snippets) that indicates the cluster to which each input vector belongs. That object doesn't tell one which word belongs in each cluster, however, so the following method takes care of this. Because all of the words being analyzed are stored in labels_array and the cluster to which each word belongs is stored in cluster_labels, the following method can easily map those two sequences together:

def find_word_clusters(labels_array, cluster_labels):
        '''Read the labels array and clusters label and return the set of words in each cluster'''
        cluster_to_words = autovivify_list()
        for c, i in enumerate(cluster_labels):
                cluster_to_words[ i ].append( labels_array[c] )
        return cluster_to_words

Finally, we can call the methods above, perform K-Means clustering, and print the contents of each cluster with the following block:

if __name__ == "__main__":
        input_vector_file = sys.argv[1] # The Glove file to analyze (e.g. glove.6B.300d.txt)
        n_words           = int(sys.argv[2]) # The number of lines to read from the input file
        reduction_factor  = float(sys.argv[3]) # The desired amount of dimension reduction 
        clusters_to_make  = int( n_words * reduction_factor ) # The number of clusters to make
        df, labels_array  = build_word_vector_matrix(input_vector_file, n_words)
        kmeans_model      = KMeans(init='k-means++', n_clusters=clusters_to_make, n_init=10)
        kmeans_model.fit(df)

        cluster_labels    = kmeans_model.labels_
        cluster_inertia   = kmeans_model.inertia_
        cluster_to_words  = find_word_clusters(labels_array, cluster_labels)

        for c in cluster_to_words:
                print cluster_to_words[c]
                print "\n"

The full script is available here. To run it, one needs to specify the vector file to be read in, the number of words one wishes to sample from that file (one can of course read them all, but doing so can take some time), and the "reduction factor", which determines the number of clusters to be made. If one specifies a reduction factor of .1, for instance, the routine will produce n*.1 clusters, where n is the number of words sampled from the file. The following command reads in the first 10,000 words, and produces 1,000 clusters:

python cluster_vectors.py glove.6B.300d.txt 10000 .1

The output of this command is the series of clusters produced by the K-Means clustering:

[u'Chicago', u'Boston', u'Houston', u'Atlanta', u'Dallas', u'Denver', u'Philadelphia', u'Baltimore', u'Cleveland', u'Pittsburgh', u'Buffalo', u'Cincinnati', u'Louisville', u'Milwaukee', u'Memphis', u'Indianapolis', u'Auburn', u'Dame']

[u'Product', u'Products', u'Shipping', u'Brand', u'Customer', u'Items', u'Retail', u'Manufacturer', u'Supply', u'Cart', u'SKU', u'Hardware', u'OEM', u'Warranty', u'Brands']

[u'home', u'house', u'homes', u'houses', u'housing', u'offices', u'household', u'acres', u'residence']

[...]

[u'Night', u'Disney', u'Magic', u'Dream', u'Ultimate', u'Fantasy', u'Theme', u'Adventure', u'Cruise', u'Potter', u'Angels', u'Adventures', u'Dreams', u'Wonder', u'Romance', u'Mystery', u'Quest', u'Sonic', u'Nights']

I'm currently using these word clusters for fuzzy plagiarism detection, but they can serve a wide variety of purposes. If you find them helpful for a project you're working on, feel free to drop me a note below!

Cross-Lingual Plagiarism Detection with Scikit-Learn

Oliver Goldsmith, one of the great poets, playwrights, and historians of science from the Enlightenment, was many things. He was "an idle, orchard-robbing schoolboy; a tuneful but intractable sizar of Trinity; a lounging, loitering, fair-haunting, flute-playing Irish ‘buckeen.’" He was also a brilliant plagiarist. Goldsmith frequently borrowed whole sentences and paragraphs from French philosophes such as Voltaire and Diderot, closely translating their works into his own voluminous books without offering so much as a word that the passages were taken from elsewhere. Over the last several months, I have worked with several others to study the ways Goldsmith adapted and freely translated these source texts into his own writing in order to develop methods that can be used to discover crosslingual text reuse. By outlining below some of the methods that I have found useful within this field of research, the following post attempts to show how automated methods can be used to further advance our understanding of the history of authorship.

Sample Training Data

In order to identify the passages within Goldsmith's corpus that were taken from other writers, I decided to train a machine learning algorithm to differentiate between plagiarisms and non-plagiarisms. To distinguish between these classes of writing, John Dillon and I collected a large number of plagiarized and non-plagiarized passages within Goldsmith's writing, and provided annotations to identify whether the target passage had been plagiarized or not. Here are a few sample rows from the training data:

French SourceGoldsmith TextPlagiarism
Bothwell eut toute l'insolence qui suit les grands crimes. Il assembla les principaux seigneurs, et leur fit signer un écrit, par lequel il était dit expressément que la reine ne se pouvait dispenser de l'éspouser, puisqu'il l'avait enlevée, et qu'il avait couché avec elle.Bothwell was possessed of all the insolence which attends great crimes: he assembled the principal Lords of the state, and compelled them to sign an instrument, purporting, that they judged it the Queen's interest to marry Bothwell, as he had lain with her against her will.1
Histoire c'est le récit des faits donnés pour vrais; au contraire de la fable, qui est le récit des faits donnés pour faux.In the early part of history a want of real facts hath induced many to spin out the little that was known with conjecture.0
La meilleure maniere de connoître l'usage qu'on doit faire de l' esprit, est de lire le petit nombre de bons ouvravrages de génie qu'on a dans les langues savantes & dans la nôtre.The best method of knowing the true use to be made of wit is, by reading the small number of good works, both in the learned languages, and in our own.1
Comme il y a en Peinture différentes écoles, il y en a aussi en Sculpture, en Architecture, en Musique, & en général dans tous les beaux Arts.A school in the polite arts, properly signifies, that succession of artists which has learned the principles of the art from some eminent master, either by hearing his lessons, or studying his works.0
Des étoiles qui tombent, des montagnes qui se fendent, des fleuves qui reculent, le Soleil & la Lune qui se dissolvent, des comparaisons fausses & gigantesques, la nature toûjours outrée, sont le caractere de ces écrivains, parce que dans ces pays où l'on n'a jamais parlé en public.Falling stars, splitting mountains, rivers flowing to their sources, the sun and moon dissolving, false and unnatural comparisons, and nature everywhere exaggerated, form the character of these writers; and this arises from their never, in these countries, being permitted to speak in public.1

Given this training data, the goal was to identify some features that commonly appear in Goldsmith’s plagiarized passages but don’t commonly appear in his non-plagiarized passages. If we could derive a set of features that differentiate between these two classes, we would be ready to search through Goldsmith’s corpus and tease out only those passages that had been borrowed from elsewhere.

Feature Selection: Alzahrani Similarity

Because a plagiarized passage can be expected to have language that is similar but not necessarily identical to the language used within the plagiarized source text, I decided to test some fuzzy string similarity measures. One of the more promising leads on this front was adapted from the work of Salha M. Alzahrani et al. [2012], who has produced a number of great papers on plagiarism detection. The specific similarity measure adapted from Alzahrani calculates the similarity between two passages (call them Passage A and Passage B) in the following way:

def alzahrani_similarity( a_passage, b_passage ):

    # Create a similarity counter and set its value to zero
    similarity = 0

    # For each word in Passage A
    for a_word in a_passage:

        # If that word is in Passage B
        if a_word in b_passage:

            # Add one to the similarity counter
            similarity += 1

        # Otherwise,
        else:

            # For each word in Passage B
            for b_word in b_passage:    

                # If the current word from Passage A is a synonym of the current word from Passage B,
                if a_word in find_synonyms( b_word ):

                    # Add one half to the similarity counter
                    similarity += .5
                    break

    # Finally, divide the similarity score by the number of words in the longer passage
    return similarity / max( len(a_passage), len(b_passage) )

To prepare the data for this algorithm, I used the Google Translate API to translate French texts into English, the Big Huge Labs Thesaurus API to collect synonyms for each word in Passage B, and the NLTK to clean the resulting texts (dropping stop words, removing punctuation, etc.). Once these resources were prepared, I used an implementation of the algorithm described above to calculate the "similarity" between the paired passages in the training data. As one can see, the similarity value returned by this algorithm discriminates reasonably well between plagiarized and non-plagiarized passages:

alzahrani_aggregate.png

The y-axis here is discrete--each data point represents either a plagiarized pair of passages (such as those in the training data discussed above), or a non-plagiarized pair of passages. The x-axis is really the important axis. The further to the right a point falls on this axis, the greater the length-normalized similarity score for the passage pair. As one would expect, plagiarized passages have much higher similarity scores than non-plagiarized passages.

In order to investigate how sensitive this similarity method is to passage length, I iterated over all sub-windows of n words within the training data, and used the same similarity method to calculate the similarity of the sub-window within the text. When n is five, for instance, one would compare the first five words of Passage A to the first five from Passage B. After storing that value, one would compare words two through six from Passage A to words one through five of Passage B, then words three through seven from Passage A to words one through five of Passage B, proceeding in this way until all five-word windows had been compared. Once all of these five-word scores are calculated, only the maximum score is retained, and the rest are discarded. The following plot shows that as the number of words in the sub-window increases, the separation between plagiarized and non-plagiarized passages also increases:

Each facet within this plot represents a subwindow length, and each point within that facet represents the maximum observed similarity value for a single pair of passages. As the length of the subwindows increases, the distance between plagiarized and non-plagiarized passages also increases.

Each facet within this plot represents a subwindow length, and each point within that facet represents the maximum observed similarity value for a single pair of passages. As the length of the subwindows increases, the distance between plagiarized and non-plagiarized passages also increases.

Feature Selection: Word2Vec Similarity

Although the method discussed above provides helpful separation between plagiarized and non-plagiarized passages, it reduces word pairs to one of three states: equivalent, synonymous, and irrelevant. Intuitively, this model feels limited, because one senses that words can have degrees of similarity. Consider the words small, tiny, and humble. The thesaurus discussed above identifies these terms as synonyms, and the algorithm described above essentially treats the words as interchangeable synonyms. This is slightly unsatisfying because the word small seems more similar to the word tiny than the word humble.

To capture some of these finer gradations in meaning, I called on Word2Vec, a method that uses backpropagation to represent words in high-dimensional vector spaces. Once a word has been transposed into this vector space, one can compare a word's vector to another word's vector and obtain a measure of the similarity of those words. The following snippet, for instance, uses a cosine distance metric to measure the degree to which tiny and humble are similar to the word small:

from gensim.models.word2vec import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity        

# Load the Google pretrained word vectors
model = Word2Vec.load_word2vec_format('../google_pretrained_word_vectors/GoogleNews-vectors-negative300.bin.gz', binary=True)

# Obtain the vector representations of three words
v1 = model[ "small" ]
v2 = model[ "tiny" ]
v3 = model[ "humble" ]

# Measure the similarity of "tiny" and "humble" to the word "small"
for v in [v2,v3]:
    print cosine_similarity(v1, v)

Running this script returns [[ 0.71879274]] and [[ 0.29307675]] respectively, which is to say Word2Vec can recognize that the word small is more similar to tiny than it is to humble. Because Word2Vec allows one to calculate these fine gradations of word similarity, it does a great job calculating the similarity of passages from the Goldsmith training data. The following plot shows the separation achieved by running a modified version of the "Alzahrani algorithm" described above, using this time Word2Vec to measure word similarity:

word2vec_aggregate.png

As one can see, the Word2Vec similarity measure achieves very promising separation between plagiarized and non-plagiarized passage pairs. By repeating the subwindow method described above, one can identify the critical value wherein separation between plagiarized and non-plagiarized passages is best achieved with a Word2Vec similarity metric:

word2vec_subwindows.png

Feature Selection: Syntactic Similarity

Much like the semantic features discussed above, syntactic similarity can also serve as a clue of plagiarism. While a thoroughgoing pursuit of syntactic features might lead one deep into sophisticated analysis of dependency trees, it turns out one can get reasonable results by simply examining the distribution of part of speech tags within Goldsmith's plagiarisms and their source texts. Using the Stanford Part of Speech (POS) Tagger's French and English models, and a custom mapping I put together to link the French POS tags to the universal tagset, I transformed each of the paired passages in the training data into a POS sequence such as the following: 

[(u'Newton', u'NNP'), (u'appeared', u'VBD'),...,(u'amazing', u'JJ'), (u'.', u'.')]
[(u'Newton', u'NPP'), (u'parut', u'V'),...,(u'nouvelle:', u'CL'),(u'.', u'.')]

Using these sequences, two similarity metrics were used to measure the similarity between each of the paired passages in the training data. The first measure (on the x-axis below) simply measured the cosine distance between the two POS sequences; the second measure (on the y-axis below) calculated the longest common POS substring between the two passages. As one would expect, plagiarized passages tend to have higher values in both categories:

The x-axis in this plot indicates the value obtained when one measures the cosine similarity between the part of speech vectors of the relevant paired passages; the y-axis indicates the value obtained when one measures the longest common part of speech substring for the relevant paired passages. Points are jittered to allow one to see the full data set, and stat_ellipse is plotted at the 90% level.

Classifier Results

From the similarity metrics discussed above, I selected a bare-bones set of six features that could be fed to a plagiarism classifier: (1) the aggregate "Alzahrani similarity" score, (2) the maximum six-gram Alzahrani similarity score, (3) the aggregate Word2Vec similarity score, (4) the cosine distance between the part of speech tag sets, (5) the longest common part of speech string, and (6) the longest contiguous common part of speech string. Those values were all represented in a matrix format with one pair of passages per row and one feature per column. Once this matrix was prepared, a small selection of classifiers hosted within Python's Scikit Learn library were chosen for comparison. Cross-classifier comparison is valuable, because different classifiers use very different logic to classify observations. The following plot from the Scikit Learn documentation shows that using a common set of input data (the first column below), the various classifiers in the given row classify that data rather differently:

plot_classifier_comparison_001.png

In order to avoid prejudging the best classifier for the current task, half a dozen classifiers were selected and evaluated with hold one out tests. That is to say, for each observation in the training data, all other rows were used to train the given classifier, and the trained classifier was asked to predict whether the left-out observation was a plagiarism or not. Because this is a two class prediction task (each observation either is or is not an instance of plagiarism), the baseline success rate is 50%. Any performance below this baseline would be worse than random guessing. Happily, all of the classifiers achieved success rates that greatly exceeded this baseline value:

Generally speaking, precision values were higher than recall, perhaps because some of the plagiarisms in the training data were fuzzier than others. Nevertheless, these accuracy values were high enough to warrant further exploration of Goldsmith's writing. Using the array of features discussed above and others to be discussed in a subsequent post, I tracked down a significant number of plagiarisms that were not part of the training data, including the following outright translations from the Encyclopédie:

French SourceGoldsmith Text
Il n'est point douteux que l' Empire , composé d'un grand nombre de membres très-puissans, ne dût être regardé comme un état très-respectable à toute l'Europe, si tous ceux qui le composent concouroient au bien général de leur pays. Mais cet état est sujet à de très-grands inconvéniens: l'autorité du chef n'est point assez grande pour se faire écouter: la crainte, la défiance, la jalousie, regnent continuellement entre les membres: personne ne veut céder en rien à son voisin: les affaires les plus sérieuses les plus importantes pour tout le corps sont quelquefois négligées pour des disputes particulieres, de préséance, d'étiquette, de droits imaginaires d'autres minuties.It is not to be doubted but that the empire, composed as it is of several very powerful states, must be considered as a combination that deserves great respect from the other powers of Europe, provided that all the members which compose it would concur in the common good of their country. But the state is subject to very great inconveniences; the authority of the head is not great enough to command obedience; fear, distrust, and jealousy reign continually among the members; none are willing to yield in the least to their neighbours; the most serious and the most important affairs with respect to the community, are often neglected for private disputes, for precedencies, and all the imaginary privileges of misplaced ambition.
L' Eloquence , dit M. de Voltaire, est née avant les regles de la Rhétorique, comme les langues se sont formées avant la Grammaire.Thus we see, eloquence is born with us before the rules of rhetoric, as languages have been formed before the rules of grammar.
L' empire Germanique, dans l'état où il est aujourd'hui, n'est qu'une portion des états qui étoient soûmis à Charlemagne. Ce prince possédoit la France par droit de succession; il avoit conquis par la force des armes tous les pays situés depuis le Danube jusqu'à la mer Baltique; il y réunit le royaume de Lombardie, la ville de Rome son territoire, ainsi que l'exarchat de Ravennes, qui étoient presque les seuls domaines qui restassent en Occident aux empereurs de Constantinople.The empire of Germany, in its present state is only a part of those states that were once under the dominion of Charlemagne. This prince was possessed of France by right of succession: he had conquered by force of arms all the countries situated between the Baltic Sea and the Danube. He added to his empire the kingdom of Lombardy, the city of Rome and its territory, together with the exarchate of Ravenna, which were almost the only possessions that remained in the West to the emperors of Constantinople.
Il n'est point de genre de poésie qui n'ait son caractere particulier; cette diversité, que les anciens observerent si religieusement, est fondée sur la nature même des sujets imités par les poëtes. Plus leurs imitations sont vraies, mieux ils ont rendu les caracteres qu'ils avoient à exprimer....Ainsi l'églogue ne quitte pas ses chalumeaux pour entonner la trompette, l' élégie n'emprunte point les sublimes accords de la ly