Archival Follies, History and Historians, Now in Actual Work

Ex Readex: Redux

Or, the world in a grain of ads

You’ll recall that in my last I wondered “What am I getting wrong?” — a big question, for sure, with many and varied answers, as friends, acquaintances and passers-by would be happy to tell you. But in this case I was specifically concerned with what I was misunderstanding about the search results I was receiving from a Readex database, America’s Historical Newspapers.

Well, you’ll be pleased to know that Readex, in the person of their marketing director, David Loiterstein, was kind enough to get in touch by e-mail and tell me exactly that. And the answer? Granularity.

Basically, the AHN database does not break down advertising sections at a consistent level of granularity; the level has changed over time. As David explained:

Initially, particularly for the 18th century in which the first series of newspapers was so heavily concentrated, we identified individually every advertisement on every page; however, in later series multiple contiguous advertisements were identified in groups.

So: sometimes individual ads count as individual “articles,” sometimes a multiple-ad block counts as one, and sometimes entire columns of ads count as one unit; and the granularity of the ads goes down, generally speaking, over time. Which means that my results — which included all article types, including ads — were skewed by the ways ads are counted.

David provided a graph of his own, illustrating this effect, and suggesting a way to get clear of it (reproduced here with permission):
AdBlocker
I’ll let him explain:

This approach seen above—in which advertisements are isolated and an aggregate number of the other article types is counted separately—provides a more representative measure of available “texts.” While the data does in fact indicate fewer “articles” available between 1820 and 1850 in what is otherwise a steady increase in articles available between 1690 and 1819 and between 1850 and 1922. The declining number of ads as a percentage of “articles” or “text” is a result not of fewer ads but the changing approach by which we identify them.

Thus, practically speaking, if you want to get some kind of a baseline for how representative a given search’s results are, you’re going to have to sacrifice including ads in those search results. Not ideal, of course, but much better than not knowing what your results mean. In addition to responding directly to this specific question, David also mentioned that Readex was working to update the Readex Help section, and fix the discrepancy between the two portals I had noticed.

So where does this leave us?

Well, with a much better understanding of how one of the most important databases in Early American historical research functions. I am grateful to David and his colleagues for their quick response and kind explanation.

I would note, though, that even using the new numbers, the curve still shows an unexpected dip in the 1820s and 1830s — the heart of the Jacksonian era, where most historians would tell you that print, and especially newspapers, exploded. As I said before, this is not something I think unique to Readex, but rather an artifact of the way many digitization projects have done triage (or, alternately, it might be proof that print output indeed declined, in which case steam-powered presses were not actually all that important in the development of American democracy! But let’s hope not, as then we’d have to revise a lot of historiography…).

In any case, all good factors to keep in mind when trying to use large collections to buttress claims about relative representativeness, ubiquity, or uniqueness. And now on to new and exciting problems…

History and Historians, Now in Actual Work

Ex Readex: Not Much?

Or, Caveat NewsBank

UPDATE: See the subsequent post for the thrilling reveal!

In harmony with one of the recent memes floating around the world of digital history (the happy attention to some of what historians don’t know about database design, to how particular databases are missing parts of texts within particular series, and to proposals for how we might directly address this issue as a collective), I thought it might be worthwhile to add my own experience to the pile.

Briefly stated: one of the standard databases, Readex’s America’s Historical Newspapers, seems to have a shockingly low number of texts available for the Jacksonian-to-Antebellum period — lower, even, than their own product descriptions (which emphasize the coverage of particular years) would lead you to believe. Here’s a picture:

(see below for a table with the raw numbers)

But before I get too far in here, let me emphasize the caveat: I say “seems” for a couple of reasons.

First and foremost: I may be completely misunderstanding something about how searches work in this database. The y-axis on the above graph is the number of “hits” that a blank or wildcard (* or ?) search in the fulltext field of the database returned for each five-year interval (blanks and wildcards returned equal numbers). This may or may not be the same as the number of “documents” (articles) or “images” in the database, though I should think it would be.

Second, the results I’m getting seem to run counter to some of the statistics Readex itself provides about the component databases searched by “America’s Historical Newspapers.” See, for example, what they say about the number of images in each of the seven (7!) component series of “Early American Newspapers” in the product description. The numbers of images available seem out of whack with what you’d expect…but since these are such big date ranges, it could be that what I’ve found is still true for this period; I don’t know. There also seems to be a discrepancy between these figures and those produced on different search pages available on the Readex site.

On another level, though, all this jibes with something I’ve long suspected — the digitization of print materials from the U.S. follows an uneven U-shaped curve, where the trough is roughly 1800 to 1850. Broadly speaking, it seems like every possible scrap of material from the colonial and revolutionary era has been digitized, extending, in the case of the Founders’ Papers projects (e.g. Rotunda), far into manuscript materials. Then, just as the print explosion begins in the U.S., digitized materials drop off, picking up again with the Civil War, and increasing as we approach the 20th century. That seems to be borne out here (or, at least as far as I was willing to go with the data entry).

This curve is in many ways totally understandable; there are fewer colonial and revolutionary periodicals, so why not be complete about it? And obviously there is more interest in the more recent past (perhaps the post-war stuff is digitized because it’s close enough that it might be good for local histories and genealogical work?). But on another level, it’s troubling, especially given how historians are beginning to use this and like databases to talk about the appearance of particular terms. Comprehensiveness, especially relative comprehensiveness, really matters there.

That’s how I happened on this case. I came across this oddity while trying to control for changes in the size of the database while tracking changes in the occurrence of a particular set of terms.(1.)

What really shocks me about the numbers I’ve pulled out of AHN — which is, to my knowledge, far and away the most comprehensive database for this period there is — is how much the absolute number of articles scanned drops over time. I figured, at worst, that the coverage was reduced only in terms of geographic range, or narrowed by a focus on particular publishers; only New York, Philadelphia and Boston well-covered, for example, and not the vast West and South. But apparently (and again, I want to emphasize the tentative nature of my conclusion here), that was dead wrong.

The upshot: for given values of the “Early Republic,” digitization is still a ways away, and we should not trust any database’s comprehensiveness — even if a first glance (or, in my case, continuous usage over, aargh, years) seems to suggest that it contains a lot of material.

Okay, so now some blegs: Any thoughts on this? What am I getting wrong? As I said, I can’t help but think this puts a major crimp in what we can use these databases for, in terms of reliability — but I’d be glad to have any mistakes I’m making here pointed out, the sooner the better.

1.) If you’re interested, the string I was searching was this horrible stew of syntax:

(“East Indies” OR “East India” OR “East Indian” OR China OR Chinese OR Orient OR Orient*) NEAR25(specie OR silver OR dollar? OR currency OR circulati*) AND (trade OR commerce) NEAR25(specie OR silver OR dollar? OR currency OR circulati*) AND (drai* OR expor*) NEAR25(specie OR silver OR dollar? OR currency OR circulati*)

Suggestions on how to improve that monster would be very welcome.

2.) There is also a discrepancy between two portals to search the Readex newspaper database. When I search only newspapers from the Archive of Americana portal, I consistently get higher returns than when I search America’s Historical Newspapers directly. The difference is potentially significant — in the period 1835-1839, AHN returns 1,702,150 hits compared to AA’s 1,933,685, a difference of 231,535, or 13.6% of the AHN figure.
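For anyone who wants to check my arithmetic, here is a minimal sketch of the comparison in Python. The function name is my own invention, not anything from Readex, and I am assuming the percentage is taken against the smaller (AHN) figure; taking it against the AA figure gives a somewhat lower number.

```python
def portal_gap(ahn_hits, aa_hits):
    """Return the absolute gap between two hit counts and that gap
    as a percentage of each count (hypothetical helper)."""
    gap = aa_hits - ahn_hits
    pct_of_ahn = gap * 100 / ahn_hits  # gap relative to the AHN total
    pct_of_aa = gap * 100 / aa_hits    # gap relative to the AA total
    return gap, pct_of_ahn, pct_of_aa


# Hit counts for 1835-1839, as reported in the footnote above.
AHN_1835_1839 = 1_702_150
AA_1835_1839 = 1_933_685

gap, pct_of_ahn, pct_of_aa = portal_gap(AHN_1835_1839, AA_1835_1839)
print(f"gap: {gap:,} hits")
print(f"as a share of AHN: {pct_of_ahn:.1f}%")
print(f"as a share of AA:  {pct_of_aa:.1f}%")
```

Which baseline you pick matters less than saying which one you picked.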

I’m not sure why this is so; the two searches say they are tapping into the same databases, to wit:

AA’s search says it includes:

Early American Newspapers, Series 1 (1690 – 1876), Early American Newspapers, Series 2 (1758 – 1900), Early American Newspapers, Series 3 (1829 – 1922), Early American Newspapers, Series 4 (1756 – 1922), Early American Newspapers, Series 5 (1777 – 1922), Early American Newspapers, Series 6 (1741 – 1922), Early American Newspapers, Series 7 (1773 – 1922), Hispanic American Newspapers (1808 – 1980), African American Newspapers, 1827-1998 (1827 – 1998) and Ethnic American Newspapers from the Balch Collection (1808 – 1980).

While AHN’s claims:

Early American Newspapers Series 1 – 7, 1690-1922; African American Newspapers, 1827-1998; Ethnic American Newspapers from the Balch Collection, 1799-1971; Hispanic American Newspapers, 1808-1980 and Selected Historical Newspapers.

That seems comparable to me. If anything, the AHN search should include more, what with the inclusion of “Selected Historical Newspapers.”

I’m planning to e-mail the Readex people to find out what’s going on — and what I might be missing — but any suggestions in the meantime are welcome.


Raw Numbers

(Note: these figures come from searches performed using the AHN portal, not the AA portal)

Years        Total “Hits” (articles?)
1795-1799     3,626,530
1800-1804     4,422,965
1805-1809     5,041,412
1810-1814     4,838,756
1815-1819     6,449,231
1820-1824     3,856,979
1825-1829     2,338,139
1830-1834     1,991,623
1835-1839     1,702,150
1840-1844     1,907,799
1845-1849     2,398,359
1850-1854     2,682,211
1855-1859     2,762,811
1860-1864     2,757,069
1865-1869     3,725,627
1870-1874     4,531,278
1875-1879     4,566,376
1880-1884     5,015,152
1885-1889     6,958,484
1890-1894     9,701,775
1895-1899    11,397,028
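
Since the whole point of collecting these totals was to control for the changing size of the database, here is a rough sketch in Python of how they might be used. The totals are copied from the table above; the function names and the figure of 500 term hits are hypothetical, there only to illustrate the normalization, and this is not how Readex itself reports anything.

```python
# Total "hits" per five-year interval, copied from the table above (AHN portal).
TOTAL_HITS = {
    "1795-1799": 3_626_530,
    "1800-1804": 4_422_965,
    "1805-1809": 5_041_412,
    "1810-1814": 4_838_756,
    "1815-1819": 6_449_231,
    "1820-1824": 3_856_979,
    "1825-1829": 2_338_139,
    "1830-1834": 1_991_623,
    "1835-1839": 1_702_150,
    "1840-1844": 1_907_799,
    "1845-1849": 2_398_359,
    "1850-1854": 2_682_211,
    "1855-1859": 2_762_811,
    "1860-1864": 2_757_069,
    "1865-1869": 3_725_627,
    "1870-1874": 4_531_278,
    "1875-1879": 4_566_376,
    "1880-1884": 5_015_152,
    "1885-1889": 6_958_484,
    "1890-1894": 9_701_775,
    "1895-1899": 11_397_028,
}


def trough(totals):
    """Return the (interval, hits) pair with the fewest total hits."""
    return min(totals.items(), key=lambda item: item[1])


def per_million(term_hits, total_hits):
    """Express a raw term count as hits per million database hits."""
    return term_hits * 1_000_000 / total_hits


interval, hits = trough(TOTAL_HITS)
share_of_peak = hits / TOTAL_HITS["1815-1819"]
print(f"Trough: {interval}, with {hits:,} hits")
print(f"That is {share_of_peak:.0%} of the 1815-1819 total")

# Hypothetical: the same raw count of 500 term hits in two different periods.
for period in ("1815-1819", "1835-1839"):
    rate = per_million(500, TOTAL_HITS[period])
    print(f"{period}: {rate:.1f} hits per million")
```

The same raw count of hits means something quite different in a period with 1.7 million total hits than in one with 6.4 million, which is why the denominator (and how ads are counted in it) matters so much.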

Now in Actual Work

As Threatened, er, Promised

Or, Not Pervasive, but maybe Persuasive or Practical?

So here’s what I’ve come up with as an op-ed proposal. It lacks a strong policy argument, but hopefully uses that perspective trick to good effect.

For the forgetful, here’s the prompt again:

a proposal for a New York Times opinion piece which applies a major finding from your research to a current public policy problem. … it must describe a full op-ed that you might write, and explain its relevance to current events.

Any and all thoughts heartily welcomed.

~~~

“Not so Fast, We’ve Been Here Before”: An Op-Ed Proposal

In 1841, an ex-President and former Secretary of State declared his support for British forces in the “Opium War,” Britain’s war with China over Chinese trade restrictions and closed markets. Though many commentators, then and now, cited the opium trade as the casus belli, John Quincy Adams told a Boston audience that the motive went deeper: “The cause of the war is the Ko-tow! – the arrogant and insupportable pretensions of China, that she will hold commercial intercourse with the rest of mankind, not upon terms of equal reciprocity, but upon the insulting and degrading forms of the relation between lord and vassal.” In Adams’s view, the political despotism of China’s government found its worst expression in illiberal trade policies, and these restrictions on foreign merchants, Americans prominently among them, justified war.

More recently, another Secretary of State gave a speech calling for all nations to recognize a basic “freedom to connect” to the internet. Made in light of Google’s decision to stop censoring search results in China, Secretary Hillary Clinton’s remarks were a pointed rebuke of Chinese policy. Condemning government censorship of the internet, Secretary Clinton argued that “from an economic standpoint, there is no distinction between censoring political speech and commercial speech.” By linking political and economic liberty together, and critiquing China on both fronts, Clinton’s remarks strongly echo Adams’s speech of almost 170 years before.

This op-ed will argue that U.S. officials would do well to understand the deep historical resonance of American calls for economic and political liberty in China. Though Chinese censorship is indefensible, an awareness of how American calls for reform in China themselves spring from complicated roots in national economic interest and Western imperialism can only improve Sino-American relations.


Image cite: The Suss-Man (gone for the weekend), “Project 366 – 78/366 Diplomacy,” Flickr, CC License

Now in Actual Work

Pervasive, Persuasive, and Practical

Or, What’s a Paradigm Worth These Days?

I.

Recently I’ve found myself completely blocked on a writing assignment.(1.) It’s for a fellowship application; the host institution brings together historians and social scientists under the rubric of understanding and influencing government policy, so it’s a bit of a chimera in terms of disciplinary focus.

The assignment in question calls for:

a proposal for a New York Times opinion piece which applies a major finding from your research to a current public policy problem. … it must describe a full op-ed that you might write, and explain its relevance to current events.

Some words pop out there, no? “Relevance,” “current events,” “a major finding from your research”… you can see how those might bring a historian to a standstill.

It’s not that I don’t want my research to be relevant or au courant. Quite the opposite. Here’s the problem, though: drawing big lessons, lessons big enough to cross time and space, is pretty much the antithesis of dissertation work, and, I think, historical thinking more generally.

Dissertations are about the super-specific. Historians are too, in a way: we’re in the business of explaining the unique, the contingent, the transformative event (or series of events). When context is king, the work is, by definition, not portable.

When I’ve heard historians explain the practical aspects of their research, it usually hinges on perspective. The past is a foreign country, they say, they do things differently there — and we can learn from that. History teaches us about the oddly contingent and jury-rigged origins of things in our own world — what I think of as “naming the monster,” the fantasy/horror/folklore trope that knowing the name of a devil gives you the power to exorcise it, a technique being used to very good effect in the history of sexuality and gender at the moment (think of the difference between “marriage the eternal traditional bulwark of human society,” and “marriage the socially constructed category that is always changing” in a courtroom, and I think you’ll see what I mean). Likewise, the foreignness of the past, especially the past of one’s own culture, is an object lesson in how diverse human institutions, motives, and actions are (or rather, were).

In a practical sense, then, historians usually explain their work as the building blocks for something new — by reminding us of what possibilities once existed (a form of naming the monster) — or, more commonly, as a caution against hubris and self-satisfaction. Both are exercises in perspective; knowing where you came from, and what other choices there are out there.

These are good lessons, I think. But they don’t get you very far toward figuring out what early American ideas about the China trade can say about public policy today.

II.

The always-interesting Tim Burke has been ruminating on a related topic lately.(2.) Thinking on the practical bases for popular anti-intellectualism, he’s frustrated with the answers his fellow humanists have come up with to explain the value of their knowledge. What’s important about knowing about Hawthorne, or the Constitutional Convention, anyway?

That this is a question at all is, in part, due to the success of the humanist project over the last half-century or so, and the collapse of what Burke terms “ramrod” forms of cultural authority — not a bad thing, on balance (“good riddance,” Burke says). But the problem of how to explain the value of this kind of knowledge remains: “educators haven’t arrived at a substitute rationale that’s both persuasive and pervasive.”

Burke argues that this value can be demonstrated in a couple of different ways. One is through sheer enthusiasm for the subject — but passion is hard to instill through training, and even more difficult to generalize. Another answer comes out of the literacy (aka “critical thinking skills”) that humanist work teaches. Burke describes this explanation as a focus on “practicality.”

This is a new iteration of the very old idea that humanist knowledge enriches the storehouse of the mind; Burke’s spin is novel in that it is focused on the problems of an information-rich age, where the ability to “read” in different media and environments, and to make judgements about that content, is now far more important than accumulating content itself (that’s easy).

Any way you put it, though, the ends are the same: a richer, more well-lived life:

Cultural and historical literacy enriches your rhetorical and interpersonal skills. It helps you imagine other people, which is the key to so very much in life: to love well, to raise children well, to live in community well, to self-develop, to choose when and how to fight for yourself and your beliefs.

III.

Burke’s solution to the problem of finding ways to make humanist knowledge relevant is, I think, just a more broadly stated version of the historian’s go-to answer for the value of historical work. But instead of using specific content to demonstrate perspective, it’s the literacy and rhetorical skills developed through repeated efforts of that sort that provide the value.

Perhaps not precisely relevant to my problem of figuring out how my research is relevant to the theoretical readers of my op-ed piece. But thinking in terms of pervasive, persuasive, and practical is a good start. You can decide for yourselves how well my actual proposal meets that standard tomorrow.

To be continued…


1.) A shocking revelation from a blogger who has quarter-long gaps between posts, I know.

2.) Tim Burke, “Hester Prynne, Schmester Prynne, or Sarah Palin’s Ressentiment Clubhouse,” Easily Distracted, 19 January 2010.

Image cite: Gabriela Camerotti, “Practical Magic,” Flickr, CC License

Archival Follies, Now in Actual Work

Wheatonesque

Or, Nineteenth-Century Natural History and Geopolitics Go Together Like…*

Frame

Henry Wheaton was a busy man in 1843. Aside from his official duties as U.S. Minister to Prussia – which included everything from issuing passports and entertaining visiting Americans to more serious affairs like preparing for treaty negotiations with the Zollverein, the German Customs Union – he was also intensely engaged in writing reports, as a hobby.

And not just a few. In 1843, Wheaton wrote at least ten reports for the National Institution for the Promotion of Science – aka the “National Institute” – a Washington-based organization that sought “to promote Science and the Useful Arts, and to establish a National Museum of Natural History, &c. &c.”

Wheaton’s contributions to the Institute fell firmly in the “&c. &c.” category. Though best known for his legal work – he was the first professional reporter for the Supreme Court, and wrote the standard treatises on international and maritime law – his reports for the National Institute trace a wider circle, and depart significantly from the then-standard definitions of “scientific and useful arts.” He wrote absolutely no treatises on New England ferns or Great Lakes mollusks (all popular topics with the Washington professionals cum amateur scientists who made up the bulk of the Institute’s membership), which probably accounts for his failure to get the Institute to help publish his work.

Instead, he wrote on a bewildering array of subjects, including:

The geography of Central Asia; the revival of Greek tragedy in Prussia; German canals; the state of the fine arts in Denmark; the character of Frederick the Great; the last days of the Emperor Charles V; the genius and labors of Leibniz; the life and writings of Diderot; the Panama canal; the history of the reformation in Germany; Egyptian Antiquities, and the Ptolemaic canal across the Isthmus of Suez.

If we ignore the Teutonic flavor of some of the reports (likely the result of his location and occupation; he had been a diplomat in Prussia since 1835), a striking pattern emerges. Almost uniquely among the corresponding members of the National Institute, Wheaton was concerned with history, culture, art – and, above all, commerce and geopolitics.