Archival Follies, Our Glorious National Heritage

A House Divided Will Not Subscribe

Or, The Damn Thing Is All Ads Anyway

As those of you who are so unlucky as to follow me on Twitter know already (Twitter being what I’ve been distracting myself with in between bouts of what I’ll call, for the sake of argument, “writing”), of late I’ve been mucking through Congressional records.

Yes, yes, I see you nodding off, but listen: this time it’s different. This time I’m bushwhacking through the annals of the First Congress. The beginning one!

The timing lends even the most boring speeches and bills a brassy burnished halo. The Era of Washington! The birth of our empire, and all our liberties! Days when spirits were brave, the stakes were high, men were real [republican] men, women were real [republican] mothers, and small furry creatures from [the Indies] were real small furry creatures from [the Indies], to paraphrase my favorite Adams.

Good times.

Right, where was I? Ah yes, mucking through annals. Well, today I ran across something that makes me think that — age of heroes or no — there never was a newspaper printer with sound marketing sense.

Consider, if you will, the following passage from the journal of William Maclay, a delightfully cantankerous one-term U.S. senator from Pennsylvania:

nothing clever to see here

Maybe I’m misunderstanding Maclay here, but were the local printers really trying to drum up business by scamming members of Congress? Hoping a politician will pay you for services unordered…well that seems a bit daft. Moreover, there’s the question of subscriber base. The combined houses of Congress, at this point, consisted of about ninety members* — hardly a sustainable audience. And once the House voted down subscribing to anything…this seems like it got perverse right quick, no? And if cash wasn’t the goal, that’s even worse; this was decidedly not the group most likely to be swayed by hacky political commentary — or interested in advertisements, either.

Seems to me like the printers of the Early Republic operated on the same principle as all the (failing) local newspaper publishers who insist on stacking eternally unread issues like cord-wood on my stoop every morning. I doubt it worked any better then…

*It was early days. Not every state got their act together to send representatives on time…


Source:

William Maclay, Journal of William Maclay, United States Senator from Pennsylvania, 1789-1791, ed. Edgar S. Maclay (New York: D. Appleton and Company, 1890), 64. The passage appears in the entry for June 3, 1790.

Archival Follies, History and Historians, Now in Actual Work

Ex Readex: Redux

Or, the world in a grain of ads

You’ll recall that in my last I wondered “What am I getting wrong?” (a big question, for sure, with many and varied answers, as friends, acquaintances and passers-by would be happy to tell you). But in this case I was specifically concerned with what I was misunderstanding about the search results I was receiving from a Readex database, America’s Historical Newspapers.

Well, you’ll be pleased to know that Readex, in the person of their marketing director, David Loiterstein, was kind enough to get in touch by e-mail and tell me exactly that. And the answer? Granularity.

Basically, the AHN database does not consistently break down advertising sections at the same level of granularity; it has changed over time. As David explained:

Initially, particularly for the 18th century in which the first series of newspapers was so heavily concentrated, we identified individually every advertisement on every page; however, in later series multiple contiguous advertisements were identified in groups.

So: sometimes individual ads count as individual “articles,” sometimes a multiple-ad block counts as one, and sometimes entire columns of ads count as one unit; and the granularity of the ads goes down, generally speaking, over time. Which means that my results, which included all article types, including ads, were skewed by the ways ads are counted.

David provided a graph of his own, illustrating this effect, and suggesting a way to get clear of it (reproduced here with permission):
[Graph: “AdBlocker”]
I’ll let him explain:

This approach seen above—in which advertisements are isolated and an aggregate number of the other article types is counted separately—provides a more representative measure of available “texts.” While the data does in fact indicate fewer “articles” available between 1820 and 1850 in what is otherwise a steady increase in articles available between 1690 and 1819 and between 1850 and 1922. The declining number of ads as a percentage of “articles” or “text” is a result not of fewer ads but the changing approach by which we identify them.

Thus, practically speaking, if you want to get some kind of a baseline for how representative a given search’s results are, you’re going to have to sacrifice including ads in those search results. Not ideal, of course, but much better than not knowing what your results mean. In addition to responding directly to this specific question, David also mentioned that Readex was working to update the Readex Help section, and fix the discrepancy between the two portals I had noticed.

So where does this leave us?

Well, with a much better understanding of how one of the most important databases in Early American historical research functions. I am grateful to David and his colleagues for their quick response and kind explanation.

I would note, though, that even using the new numbers, the curve still shows an unexpected dip in the 1820s and 1830s — the heart of the Jacksonian era, where most historians would tell you that print, and especially newspapers, exploded. As I said before, this is not something I think unique to Readex, but rather an artifact of the way many digitization projects have done triage (or, alternately, it might be proof that print output indeed declined, in which case steam-powered presses were not actually all that important in the development of American democracy! But let’s hope not, as then we’d have to revise a lot of historiography…).

In any case, all good factors to keep in mind when trying to use large collections to buttress claims about relative representativeness, ubiquity, or uniqueness. And now on to new and exciting problems…

History and Historians, Now in Actual Work

Ex Readex: Not Much?

Or, Caveat NewsBank

UPDATE: See the subsequent post for the thrilling reveal!

In harmony with one of the recent memes floating around the world of digital history (the happy attention to some of what historians don’t know about database design, to how particular databases are missing parts of texts within particular series, and to proposals for how we might directly address this issue as a collective), I thought it might be worthwhile to add my own experience to the pile.

Briefly stated: one of the standard databases, Readex’s America’s Historical Newspapers, seems to have a shockingly low number of texts available for the Jacksonian-to-Antebellum period, lower even than their own product descriptions (which emphasize the coverage of particular years) would lead you to believe. Here’s a picture:

(see below for a table with the raw numbers)

But before I get too far in here, let me emphasize the caveat: I say “seems” for a couple of reasons.

First and foremost: I may be completely misunderstanding something about how searches work in this database. The y-axis on the above graph is the number of “hits” that a blank or wildcard (* or ?) search in the fulltext field of the database returned for each five-year interval (blanks and wildcards returned equal numbers). This may or may not be the same as the number of “documents” (articles) or “images” in the database, though I should think it would be.

Second, the results I’m getting seem to run counter to some of the statistics Readex itself provides about the component databases searched by “America’s Historical Newspapers.” See, for example, what they say about the number of images in each of the seven (7!) component series of “Early American Newspapers” in the product description. The numbers of images available seem out of whack with what you’d expect…but since these are such big date ranges, it could be that what I’ve found is still true for this period; I don’t know. There also seems to be a discrepancy between these figures and those produced on different search pages available on the Readex site.

On another level, though, all this jibes with something I’ve long suspected: the digitization of print materials from the U.S. follows an uneven U-shaped curve, where the trough is roughly 1800 to 1850. Broadly speaking, it seems like every possible scrap of material from the colonial and revolutionary era has been digitized, extending, in the case of the Founders’ Papers projects (e.g. Rotunda), far into manuscript materials. Then, just as the print explosion begins in the U.S., digitized materials drop off, picking up again with the Civil War, and increasing as we approach the 20th century. That seems to be borne out here (or, at least as far as I was willing to go with the data entry).

This curve is in many ways totally understandable; there are fewer colonial and revolutionary periodicals, so why not be complete about it? And obviously there is more interest in the more recent past (perhaps the post-war stuff is digitized because it’s close enough that it might be good for local histories and genealogical work?). But on another level, it’s troubling, especially given how historians are beginning to use this and like databases to talk about the appearance of particular terms. Comprehensiveness, esp. relative comprehensiveness, really matters there.

That’s how I happened on this case. I came across this oddity while trying to control for changes in the size of the database while tracking changes in the occurrence of a particular set of terms.1

What really shocks me about the numbers I’ve pulled out of AHN (which is, to my knowledge, far and away the most comprehensive database for this period there is) is how much the absolute number of articles scanned drops over time. I figured, at best, that the coverage was reduced only in terms of geographic range, or narrowed by a focus on particular publishers; only New York, Philadelphia and Boston well-covered, for example, and not the vast West and South. But apparently (and again, I want to emphasize the tentative nature of my conclusion here), that was dead wrong.

The upshot: for given values of the “Early Republic,” digitization is still a ways away, and we should not trust any database’s comprehensiveness, even if a first glance (or, in my case, continuous usage over, aargh, years) seems to suggest that it contains a lot of material.

Okay, so now some blegs: Any thoughts on this? What am I getting wrong? As I said, I can’t help but think this puts a major crimp in what we can use these databases for, in terms of reliability — but I’d be glad to have any mistakes I’m making here pointed out, the sooner the better.

1.) If you’re interested, the string I was searching was this horrible stew of syntax:

(“East Indies” OR “East India” OR “East Indian” OR China OR Chinese OR Orient OR Orient*) NEAR25(specie OR silver OR dollar? OR currency OR circulati*) AND (trade OR commerce) NEAR25(specie OR silver OR dollar? OR currency OR circulati*) AND (drai* OR expor*) NEAR25(specie OR silver OR dollar? OR currency OR circulati*)

Suggestions on how to improve that monster would be very welcome.

2.) There is also a discrepancy between two portals to search the Readex newspaper database. When I’ve searched only newspapers from the Archive of Americana portal, I consistently get higher returns than if I had searched America’s Historical Newspapers directly. The difference is potentially significant: in the period 1835-1839, AHN returns 1,702,150 hits compared to AA’s 1,933,685, a difference of 231,535, or 13.6% of the AHN figure.

I’m not sure why this is so; the two searches say they are tapping into the same databases, to wit:

AA’s search says it includes:

Early American Newspapers, Series 1 (1690 – 1876), Early American Newspapers, Series 2 (1758 – 1900), Early American Newspapers, Series 3 (1829 – 1922), Early American Newspapers, Series 4 (1756 – 1922), Early American Newspapers, Series 5 (1777 – 1922), Early American Newspapers, Series 6 (1741 – 1922), Early American Newspapers, Series 7 (1773 – 1922), Hispanic American Newspapers (1808 – 1980), African American Newspapers, 1827-1998 (1827 – 1998) and Ethnic American Newspapers from the Balch Collection (1808 – 1980).

While AHN’s claims:

Early American Newspapers Series 1 – 7, 1690-1922; African American Newspapers, 1827-1998; Ethnic American Newspapers from the Balch Collection, 1799-1971; Hispanic American Newspapers, 1808-1980 and Selected Historical Newspapers.

That seems comparable to me. If anything, the AHN search should include more, what with the inclusion of “Selected Historical Newspapers.”

I’m planning to e-mail the Readex people to find out what’s going on — and what I might be missing — but any suggestions in the meantime are welcome.
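For anyone who wants to check the arithmetic in note 2, here is a minimal Python sketch. The function name `relative_gap` is my own invention for illustration; the two hit counts are the 1835-1839 figures quoted above, and the percentage is taken relative to the AHN count (an assumption about the baseline, though it is the one that reproduces 13.6%).

```python
def relative_gap(ahn_hits, aa_hits):
    """Return the absolute gap and the gap as a percentage of the AHN count."""
    gap = aa_hits - ahn_hits
    return gap, 100 * gap / ahn_hits


# Hit counts for 1835-1839 as reported in note 2.
gap, pct = relative_gap(ahn_hits=1_702_150, aa_hits=1_933_685)
print(f"{gap:,} more hits via AA ({pct:.1f}% of the AHN figure)")
```

Measured against the larger AA count instead, the same gap would come out a little smaller, so it is worth stating the baseline whenever the two portals are compared.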


Raw Numbers

(Note: these figures come from searches performed using the AHN portal, not the AA portal)

Years        Total “Hits” (articles?)
1795-1799    3,626,530
1800-1804    4,422,965
1805-1809    5,041,412
1810-1814    4,838,756
1815-1819    6,449,231
1820-1824    3,856,979
1825-1829    2,338,139
1830-1834    1,991,623
1835-1839    1,702,150
1840-1844    1,907,799
1845-1849    2,398,359
1850-1854    2,682,211
1855-1859    2,762,811
1860-1864    2,757,069
1865-1869    3,725,627
1870-1874    4,531,278
1875-1879    4,566,376
1880-1884    5,015,152
1885-1889    6,958,484
1890-1894    9,701,775
1895-1899    11,397,028
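To make the shape of the curve easier to poke at, here is a rough Python sketch that loads the raw numbers and picks out the early peak, the trough, and the point where totals recover. The figures are copied from the table; the names `HITS`, `label`, and `per_million` are hypothetical and mine alone. The `per_million` helper is only one plausible way to control for database size when tracking a term, not a description of anything Readex provides, and these totals count every article type, ads included.

```python
# Total wildcard "hits" per five-year interval, keyed by the interval's first
# year. Figures are copied from the Raw Numbers table (AHN portal).
HITS = {
    1795: 3_626_530, 1800: 4_422_965, 1805: 5_041_412, 1810: 4_838_756,
    1815: 6_449_231, 1820: 3_856_979, 1825: 2_338_139, 1830: 1_991_623,
    1835: 1_702_150, 1840: 1_907_799, 1845: 2_398_359, 1850: 2_682_211,
    1855: 2_762_811, 1860: 2_757_069, 1865: 3_725_627, 1870: 4_531_278,
    1875: 4_566_376, 1880: 5_015_152, 1885: 6_958_484, 1890: 9_701_775,
    1895: 11_397_028,
}


def label(start):
    """Format an interval's first year the way the table does."""
    return f"{start}-{start + 4}"


def per_million(term_hits, start):
    """Hypothetical helper: a term's hits per million total hits in an interval."""
    return 1_000_000 * term_hits / HITS[start]


# Interval with the fewest hits overall.
trough = min(HITS, key=HITS.get)
# Largest interval before the trough.
early_peak = max((y for y in HITS if y < trough), key=HITS.get)
# First interval after the trough whose total matches or exceeds the early peak.
recovery = next(
    y for y in sorted(HITS) if y > trough and HITS[y] >= HITS[early_peak]
)

print(f"early peak: {label(early_peak)} ({HITS[early_peak]:,} hits)")
print(f"trough:     {label(trough)} ({HITS[trough]:,} hits)")
print(f"trough is {HITS[trough] / HITS[early_peak]:.0%} of the early peak")
print(f"totals pass the early peak again in {label(recovery)}")
```

On these numbers the trough falls in 1835-1839, at roughly a quarter of the 1815-1819 peak, and the totals do not pass that peak again until 1885-1889. Dividing a term's hits by the interval total is only as trustworthy as the totals themselves, which is the whole worry of this post.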

History and Historians, Power At Play, The Past is a Foreign...Something

Triumphant Return! Et L’Affaire Cronon

 

It’s Been A While

As Spring threatens to return, I find my thoughts turning once more (as do those of so many rapidly middle-aging historians) to blogging. I know, gentle readers, that I’ve left you without terrible puns and alliterative link dumps for far too long; the Goose Commerce thread in your RSS reader is, no doubt, covered in dust, mites, and then more dust. And, may I say: that’s disgusting.

But awake! Or at least, don’t delete. I’m back! And plan to post at least weekly here until I lose interest again.1

So, to business…

As I’m sure you’re aware, the newest shiny debate in PastLand is L’affaire Cronon, aka the Wisconsin Republican Party’s bizarre attack on one of my favorite authors, William Cronon (Mr. Nature’s Metropolis). The AHA has a full roundup on everything you need to catch yourself up.

There’s been a lot of commentary, obviously, but for my own purposes the most interesting include those smart things said about the wider legal context of this attack at the egregiously inappropriately-named AmericanScience blog: Part 1, Part 2.

As for what the heck Cronon himself is up to, the best read I’ve seen so far is what Ben Schmidt, professional history’s own Nate Silver, has said over at the wonderful and informative Sapping Attention. I agree with all that Schmidt says2: it seems like the deliberative democracy shoe, consciously consensual and wholly impractical, is what fits.

While I admire Cronon’s position – especially given that he is ascending to the highest honorific position within the guild, usually not a place that one achieves by making political waves – I can’t say that I agree with his theory of politics. I side with Martin Van Buren: we need parties, and partisanship, to make the system go; playing the center (ideologically, philosophically) is a fool’s game. Conflict is a feature, not a bug: because people just disagree, that’s why.

Which still leaves us with the problem of establishing and policing standards of discourse: so maybe Prof. Cronon has the right idea after all.

In any case, I look forward to making more of these uninformed comments in the future! Now back to actual work for a change.


Image: law_keven, “Do you think he’s alive???…..” Flickr, CC License

1.) Hey, I’m nothing if not realistic.

2.) Save the bit about Changes in the Land being the better book: it’s good, but clearly, Nature’s Metropolis is in every way more interesting.

History and Historians

What If Historians Wrote the News?


Shit happened.1


Cf. Greg Marx, “Embrace the Wonk,” Columbia Journalism Review (May/June 2010); Chris Beam, “The Only Politics Article You’ll Ever Have to Read: What if political scientists covered the news?” Slate, 4 June 2010; Conor Friedersdorf, “It Depends Who Writes the News,” True/Slant, 7 June 2010; Jonathan Chait, “A Sociologist Covers the News,” The New Republic, 7 June 2010; and Georg Wilhelm Friedrich Hegel, trans. John Sibree, Philosophy of History (New York: American Home Library Company, 1902), §4:66.

Image cite: Frank Wuestefeld, “Shit Happens,” Flickr, CC License