An excerpt from the Social Science Statistics blog:

I was reminded again the other day that the word “data” is plural, since it means more than one “datum”, and thus “data” requires a plural verb. The Economist style guide says so, as does the European Union translation manual. The Oxford English Dictionary doesn’t even have an entry for “data,” subsuming it under “datum,” and it identifies sentences with singular constructions as “irregular or confused usage.”

End of story, right? Maybe, maybe not. There are a couple of problems with the “data is the plural of datum” story. (These have been discussed widely on the web, and I’m drawing freely on those discussions.) First, it is not quite right even in Latin to say that “data” is the plural of the singular count noun “datum”; both are inflected forms of the past participle of the verb dare, to give. Second, in English, we hardly ever refer to one piece of data as a datum; at least in political science it is an observation, a case, or perhaps a data point. When the word datum is used, it usually has a specialized meaning and takes the plural form “datums.”

The bigger problem, from my perspective, is that fully adhering to “data” as a plural count noun forces you into constructions like

How many data are enough?

instead of

How much data is enough?

The first of these, “How many data are…”, is correct for a plural count noun, while the second, “How much data is…”, is appropriate for a mass noun such as “gold” or “water.” The second sentence sounds much better to me. It also wins on a Google Scholar search by a margin of more than 10 to 1 (2120 to 198). There are also about 400 hits for “How much data are…”, no doubt from those who want to treat “data” as a mass noun but have been reminded that “data is plural.” It seems to me that “data” has come to be like the mass nouns described in this post from Language Log:

A great many M nouns denote collectivities of things, but small things, especially small things whose individual identities are not usually important to us: CORN, RICE, BARLEY, CHAFF, CONFETTI, etc. Some of these contrast minimally with C nouns of similar denotations, like BEAN, PEA, LENTIL. In any case, it would be easy to think of barley in “The barley was almost cooked” as “meaning more than one” in much the same way as lentils in “The lentils were almost cooked” does — and in fact, every so often someone misidentifies little-thing M nouns as “plural”.

I kind of like the idea of data as a collection of small things that aren’t that important to us as individual objects but that are meaningful when taken together.