31 August 2007

Secure Shell (ssh)

The Secure Shell is nearly always referred to by its Unix command name, ssh. Although ssh gives you a remote command line much as other Unix shells give you a local one, it is really part of a more ambitious system for managing encrypted communications between a client and a server. In fact, ssh is better understood as a secure communications system, of which the shell interface is a minor part.

Except for a Microsoft product with the same name, ssh is freeware developed in Finland; it facilitates web authoring by allowing the transmission of data in encrypted form. Authentication involves two "keys," which are large numbers generated as a matched pair; one is public while the other is private. When a user logs in to a host, the ssh client tells the server which key pair it would like to use for authentication. The server checks whether this key is permitted and, if so, sends the client a challenge: a random number encrypted with the user's public key. The challenge can only be decrypted with the matching private key. The user's client decrypts the challenge, proving that it holds the private key without disclosing it to the server.
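The challenge-and-response idea described above can be sketched in a few lines of Python. This is a toy illustration only: it uses textbook RSA with tiny primes, and the function names (`server_make_challenge`, `client_answer`) and the key values are hypothetical. It is not how a real ssh implementation handles keys or messages.

```python
import secrets

# Hypothetical toy key pair built from two small primes (illustration only).
P, Q = 61, 53
N = P * Q                    # modulus shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (modular inverse; Python 3.8+)

def server_make_challenge(public_key):
    """Pick a random number and encrypt it with the user's public key."""
    e, n = public_key
    challenge = secrets.randbelow(n - 2) + 2   # random value in [2, n - 1]
    return challenge, pow(challenge, e, n)

def client_answer(encrypted, private_key):
    """Decrypt the challenge; only the holder of the private key can do this."""
    d, n = private_key
    return pow(encrypted, d, n)

expected, encrypted = server_make_challenge((E, N))
answer = client_answer(encrypted, (D, N))
authenticated = (answer == expected)
print(authenticated)   # True
```

The private exponent never leaves the client; the server only learns that the client could undo the encryption.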

In order to actually use ssh for communication, the host needs to run the ssh daemon (sshd), which listens for connections from clients. It is normally started at boot from /etc/rc. It forks a new daemon for each incoming connection. The forked daemons handle key exchange, encryption, authentication, command execution, and data exchange. Today, when you need to configure a web host to support a content management system (CMS), such as a wiki engine, you will most likely need to use an ssh client.

RESOURCES: Kimmo Suominen, "Getting Started with SSH"; "S 5.64 Secure Shell," Bundesamt für Sicherheit in der Informationstechnik (2004); Ka Chun Leung, "Using SSH," Linux.ie (2002); man page for ssh;


26 August 2007

Songs from the Portuguese

My poet, thou canst touch on all the notes
God set between His After and Before,
And strike up and strike off the general roar
Of the rushing world a melody that floats
In a serene air purely. Antidotes
Of medicated music, answering for
Mankind's forlornest uses, thou canst pour
From thence into their ears. God's will devotes
Thine to such ends, and mine to wait on thine.
How, Dearest, wilt thou have me for most use?
A hope, to sing by gladly? or a fine
Sad memory, with thy songs to interfuse?
A shade, in which to sing---of palm or pine?
A grave, on which to rest from singing? Choose.

Elizabeth Barrett Browning (1845)

A bit of personal context: this was written by Elizabeth Barrett in 1844-45, during her very secret courtship of Robert Browning. Those of you who are fond of early Victorian literature will no doubt remember the eccentric, tyrannical father. Well, Elizabeth had one of these; Edward Moulton Barrett, a former sugar planter from Jamaica, was dead set against the marriage of any of his children, and his eldest child Elizabeth had fragile health. She was already a famous poet and Greek translator when she met Robert in '45. They married and moved to Italy, where they lived until her death 16 years later.

Elizabeth composed the poems for Robert's eyes only, but he was convinced they constituted the finest series of sonnets since Shakespeare, and persuaded her to publish them. She agreed, but referred to them as translations from the Portuguese, rather than her own works.


23 August 2007


An Introduction to KLEMS
(Part 1)

An apparent alternative to the customary two-factor production function, at least for purposes of research, is the KLEMS methodology. KLEMS stands for capital, labor, energy, materials, and [business] services. It is used by government agencies to measure multifactor productivity growth; simply put:

The parts of this equation are:
  • is the annual increase in total factor productivity;
  • is the annual increment in output;
  • is the annual increment in capital inputs;
  • is the annual increment in labor inputs;
  • is the annual increment in energy, materials, and business services.
In each case, the input's growth is weighted (w) by its presumed contribution to actual output. The calculation of $w_k$, $w_l$, and $w_{ip}$ is critical but quite simple: it's the average of the factor's share of income in t and (t - 1). In other words, supposing labor's share was 32.3% in '03 and 31.9% in '02, then $w_l$ would be 32.1% for calculating the growth of MFP in the period mentioned.
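The equation itself is not reproduced in this entry. A standard growth-accounting form that matches the five listed terms and the three weights would look like the following; the symbols $A$, $Y$, $K$, $L$, $IP$ and the share notation $s_{x,t}$ are assumptions, not necessarily the notation originally shown.

```latex
\[
\Delta \ln A_t \;=\; \Delta \ln Y_t
  \;-\; w_k\,\Delta \ln K_t
  \;-\; w_l\,\Delta \ln L_t
  \;-\; w_{ip}\,\Delta \ln IP_t ,
\qquad
w_x \;=\; \tfrac{1}{2}\left(s_{x,t} + s_{x,t-1}\right)
\]
\[
\text{Labor example from the text:}\quad
w_l \;=\; \tfrac{1}{2}\,(0.323 + 0.319) \;=\; 0.321
\]
```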

Using the Bureau of Economic Analysis's '05 Annual Industry Accounts (PDF), table 2, p.11, one can see that gross output is divided into actual value added (most recently, 55.8%), of which 31.9% was compensation of employees, 3.8% was net taxes, and 20.1% was "gross operating surplus" (or $w_k$ + profit; the two are not differentiated in KLEMS accounting). The remainder, 44.2% of gross output, was intermediate inputs of all kinds, including energy (1.9%), materials (17.2%), and purchased service inputs (25.1%).
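As a rough sketch of the arithmetic, the weighting rule and the growth calculation can be written out in Python. The function names and the growth rates below are hypothetical placeholders, not BEA figures. Only the shares come from the text above, and using gross operating surplus as the capital weight while leaving net taxes out is a simplifying assumption.

```python
def average_share(share_now, share_prev):
    """Weight for one input: the mean of its income share in t and t - 1."""
    return (share_now + share_prev) / 2.0

def mfp_growth(output_growth, input_growth, weights):
    """Output growth minus the share-weighted growth of each input."""
    weighted_inputs = sum(weights[name] * input_growth[name] for name in weights)
    return output_growth - weighted_inputs

# Labor weight from the example in the text: shares of 32.3% and 31.9%.
w_l = average_share(0.323, 0.319)

# Shares of gross output quoted above (net taxes omitted: an assumption).
weights = {"capital": 0.201, "labor": 0.319, "intermediate": 0.442}

# Hypothetical annual growth rates, for illustration only.
input_growth = {"capital": 0.02, "labor": 0.01, "intermediate": 0.03}
mfp = mfp_growth(0.03, input_growth, weights)

print(round(w_l, 3))
print(mfp)
```

With these made-up growth rates most of the 3% output growth is absorbed by intermediate inputs, which is why their weight matters so much in practice.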

I was frankly astonished at the low value of $w_e$, although it must be noted that much energy consumed would not appear in the ledger as an input; it's a consumer good (crude oil and PNG are material inputs when acquired by, say, an oil refinery; see p.2b).

KLEMS data has been collected on the US economy since 1947, and attracted some fascinating research in the EU (see EU KLEMS project linked below). As expected, this includes studies on the validity of standard production functions normally used in DGE models of the economy. Incidentally, production functions do not use the same method of calculating weights as does KLEMS. KLEMS simply assumes inputs contribute what they are paid. But formally, inputs to an economy will be reimbursed on the basis of their marginal revenue product; it is often the case that some industrial sectors will experience both a high degree of market concentration for output (oligopoly) and a similarly high degree of market concentration for input (oligopsony). When this happens, factor remuneration may be much lower than their contribution to output. While that's not likely to be a huge influence on factor pay for the entire economy, and not for very mobile factors, it does play a role in studies of capital-labor switching within economic sectors. Economists therefore use other methods for researching the contribution of factors to output, mainly through regression analysis of economic growth in different settings.1

According to Houseman (PDF; linked in part 1), KLEMS is fundamentally flawed because of its assumption that factors are paid their actual contribution: Houseman cites the methods of data collection, which rely on employer surveys to measure expenditures on business services (the largest part of IP, above) and then force a match with census data on industry outputs for those same services. The inevitable deficit in expenditures is then distributed among all industrial sectors of the US economy based on the total output of each industrial sector. Moreover, KLEMS data breaks business services into six categories:
  1. temporary help services
  2. employee leasing services
  3. security guards and patrol services
  4. office administrative services
  5. facility support services
  6. nonresidential building cleaning services
To generate I-O estimates at a more disaggregated commodity level, it was assumed that industrial sectors utilized all contract labor services in the same proportion. For instance, if an industry was estimated to use 10 percent of all contract labor services, it was assumed to use 10 percent of each of the component contract services. The six categories are thus assigned in uniform proportions on the basis of industry output, despite the well-known fact that manufacturing is a heavy employer of temporary help (35-40% of all temps worked in manufacturing).

The other objection Houseman has is that the equation at the top of the entry reflects a stable equilibrium model, not the dynamic general equilibrium (DGE) model. As she explains in pp.13ff, a shift to outsourced labor (either Ford's use of temps or Cisco's use of Chinese R&D) results in a prolonged but transitional effect of reduced labor productivity, but since the now-outsourced labor is measured as an intermediate service, the loss of labor productivity is suppressed. Put another way, outsourcing is a method of substituting low-cost labor (especially that with a low value of $e^{\psi u}$) for capital, but instead of appearing on the ledger as lower labor productivity, less labor is reported being used. Productivity of labor, as reported, will depend on the arbitrary matter of the institutional relationship.

I was also very disappointed in the limited role of energy utilization in measuring efficiency. My entire interest was to examine US adaptation to soaring prices of non-renewables, but when energy inputs are handled as <2%,

NOTES:
1 The estimation of factor shares is a hot-button issue, partly because of the semantics of human capital. My source on growth accounting with human capital is Charles I. Jones, Introduction to Economic Growth, W.W. Norton & Co. (1998), chapter 3.1: "The Solow Model with Human Capital," which is mainly based on Mankiw, Romer, and Weil's "A Contribution to the Empirics of Economic Growth" (1992).

Usually the baseline of analysis is the Swan-Solow Classical Growth model; immediately after introducing it, professors teaching this model nearly always divide Y (GDP) by labor L to get the intensive form y of the equation. Then all attention is focused on estimating how much of y is caused by technology (A) and how much by capital (K/L = k). In Mankiw, et al., we are introduced to the term H (skilled labor), which is
$H = e^{\psi u} L$
where u is the amount of time spent learning a skill and ψ is an empirically determined natural log of return to time spent assimilating that skill. According to Mankiw, et al., including this in a regression of comparative international data leads to a very good fit.
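To get a feel for the skilled-labor term, here is a minimal Python sketch. The function name `skilled_labor` and the sample values of ψ and u are illustrative assumptions, not estimates taken from the sources cited above.

```python
import math

def skilled_labor(labor, u, psi):
    """Effective skilled labor: raw labor scaled by exp(psi * u)."""
    return math.exp(psi * u) * labor

# Hypothetical values: 100 workers, a return of 0.10 per year of schooling.
unskilled = skilled_labor(100, u=0, psi=0.10)    # no schooling: H equals L
schooled = skilled_labor(100, u=10, psi=0.10)    # ten years: H equals e * L

print(unskilled)
print(schooled)
```

Because the schooling term sits in an exponent, each extra year of u raises H by roughly the same percentage rather than the same amount.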


14 August 2007


Productive Factors in DGE Economics

In economics classes and the vast majority of monographs I've read on recent economic theory, it's common to refer to two factors of production: capital (K) and labor (L). In the late 19th century, this was very important since there were a lot of polemics over the correct theory of value, with the conservatives of the day insisting on the utility theory of value. Much more recently, we have seen the Solow-Swan Classical Growth Theory, in which the Keynesian theories of the business cycle were supplemented with, then replaced by, a comprehensive model of capital and technology (A) accumulation.

This has typically been disappointing to me precisely because it seemed to me that a model of the economy in which there was a single universal production function Y = f(A, K, L) would yield only certain results. Bear in mind that we're always interested in output per worker (Y/L, or y), which is always assumed to be a function of capital per worker (K/L, or k). Some modern theories of economic growth are described as exogenous, such as Solow's; they are "exogenous" in the sense that they believe the main determinant of economic growth, A, is something outside of the economic model. "Endogenous growth theory," in contrast, focuses on the tendency of capital accumulation to cause A directly. Both theories ran into serious problems with respect to international comparisons. Exogenous growth implied that the difference between countries was the result of capital accumulation, but capital accumulation in the richest nations of the OECD is actually not much larger than that of low-income countries such as the Philippines or India. Nor was this an anomaly of the present day; today, of course, saving and investment in the least-developed countries (LDC's) surpasses that of, say, the USA (where net saving is negative).

Conversely, endogenous growth theory merely consisted of assuring us that there were increasing returns to scale of capital investment; by allowing any exponent on capital that would fit the data, the endogenous growth theorists came up with the most Panglossian view imaginable of economic development. They suffered from the logical dilemma that small, island nations like Singapore often responded better to capital accumulation than large, integrated regions (like Western Europe). While Western Europe, taken as a unit, is very affluent, and K is huge, its network spillovers ought to be larger than Singapore's. As everyone knows, the opposite is true; not only that, but investors in Europe and Japan are keen on exporting capital as if they—i.e., "the market" for investment opportunities—knew better.

Also, I was well aware that some dissident economists had objected to the obsession with human inputs to the industrial economy. Labor is superabundant; economists are usually expected to make sure the supply of that is utilized. Capital is always treated as a component of prior output, or,
$K = \sum_{i=1}^{\infty} (Y_{t-i} - C_{t-i})(1 - \delta)^{i}$
In the equation above, δ is the rate of capital depreciation; in each year of the past t - i, output was $Y_{t-i}$ and consumption was $C_{t-i}$; so $\Delta K_{t-i}$ is always the difference between the two, and it will hereafter depreciate at $(1 - \delta)^{i}$ where i is how many years ago $\Delta K_{t-i}$ occurred. While i may be as large as ∞, $(1 - \delta)^{i}$ would make any capital accumulated before 1968 worth about 1% of its real value in '68.
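The capital sum above is easy to try out numerically. The sketch below uses hypothetical function names and made-up output and consumption figures. It also backs out the depreciation rate implied by the 1% remark, on the assumption that the remark covers the 39 years from 1968 to the date of this entry (2007).

```python
def capital_stock(output, consumption, delta):
    """Sum past net investment, each vintage discounted by (1 - delta) ** i.

    output[0] and consumption[0] are the values from one year ago (i = 1),
    output[1] and consumption[1] from two years ago (i = 2), and so on.
    """
    return sum(
        (y - c) * (1 - delta) ** i
        for i, (y, c) in enumerate(zip(output, consumption), start=1)
    )

def implied_depreciation(surviving_fraction, years):
    """Rate delta such that (1 - delta) ** years == surviving_fraction."""
    return 1 - surviving_fraction ** (1.0 / years)

# Two years of hypothetical data: output 100, consumption 80, delta 10%.
k = capital_stock([100, 100], [80, 80], 0.10)   # 20 * 0.9 + 20 * 0.81

# Depreciation rate at which a 1968 vintage is worth 1% after 39 years.
delta_1968 = implied_depreciation(0.01, 39)

print(k)
print(delta_1968)
```

Under that reading, the implied depreciation rate comes out at roughly 11% a year.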

A problem, though, is that this implies that nothing bad can happen to the economy, provided the immense stock of capital survives (and the world population doesn't shrink). What about peak oil? What about significant changes in climate that reduce farm output? And, for workers, what about corner solutions in which the full employment of all non-labor resources (renewables, energy, and capital) leaves much labor unemployed?

(Part 2)

ADDITIONAL READING & SOURCES: Susan Houseman "Outsourcing, Offshoring, and Productivity Measurement in Manufacturing" (PDF), Upjohn Institute Staff Working Paper No. 06-130 (June 2006); Harold Cole & Lee Ohanian, "The Great Depression in the United States from a Neoclassical Perspective" (PDF), Federal Reserve Bank of Minneapolis (1999);

On the labor theory of value: Albert C. Whitaker, "History and Criticism of the Labor Theory of Value in English Political Economy" (PDF) Stanford University (1904); for an introduction to the utility theory of value, I have to recommend William Stanley Jevons, The Theory of Political Economy (5th edition), IV (complete text). As you can see, Whitaker's work of 1904 was a piece of historical research; Jevons introduced the alternative "utility theory" in 1870. The whole issue of value theory is hence pretty antiquarian. Or so you'd think; but Joseph A. Schumpeter, in his History of Economic Analysis, III.6.2, seems to think it's the basis of any system of economic analysis. Polemically, it means a lot to him, although he argues that one should not draw any polemical conclusions from any theory of value!

Schumpeter is correct when he says one ought not to draw any polemical conclusions from one's theory of value; but the logical corollary is that one therefore ought to use theories of values like domain-specific programming languages, a proposition that would no doubt cause Schumpeter to turn several different shades of magenta.


12 August 2007

Appendix to Enumeration Problem: Sexual Orientation Statistics

(Main article)

Folk traditions on sexual orientation are difficult to generalize, but according to official Canadian statistics on the matter, 1.3% of men and 0.7% of women self-identified as homosexual; 0.9% of women and 0.6% of men self-identified as bisexual. The US Centers for Disease Control and Prevention (CDC) has conducted studies of sexual orientation (the Bureau of the Census has not); according to this 2005 report (PDF),
Approximately 1 percent of men and 3 percent of women 15–44 years of age have had both male and female sexual partners in the last 12 months (table B). Among females, 5.8 percent of teens and 4.8 percent of females 20–24 years of age had had both male and female partners in the last 12 months; percentages were lower at ages 25–44.

The percentages of men and women who reported that they think of themselves as homosexual or bisexual are roughly equal at 4.1 percent. This represents about 2.27 million men and 2.29 million women 18–44 years of age
[4c; stats for sexual attraction suggested nearly identical ratios of sexual orientation, etc. between men and women]
It's interesting to note the CDC statistics tend to rely less on survey techniques ("self-reporting") and more on construction from medical records. It's also interesting to note that CDC research assumes much greater ambiguity in sexual orientation, but also differs from nearly all systems of self-reporting in (a) higher incidence of homosexual orientation than other official surveys, and (b) a higher incidence of homosexuality among women than men.

The well-known Kinsey Report is usually taken to mean that "10% of the population are gay," which is based on the Kinsey rating system (0 = exclusively hetero, 6 = exclusively gay, 1-5 are gradients in between); this refers to findings that 11.6% of white males (ages 20-35) were given a rating of 3 (about equal heterosexual and homosexual experience/response) throughout their adult lives. Even accepting the Kinsey Report findings as accurate, that would still be using a "one-drop" rule for homosexuality. In fact, the Kinsey Reports have serious problems as statistical references on sexuality, including very high proportions of prison inmates, male prostitutes, and (naturally) people willing to discuss the topics.

Mathematics of Sexual Partners
In assessing sexual partner estimates, one possible explanation for the immense disparity in reports by men and women is a greater number of same-sex relationships among men. Using a recent study by the National Center for Health Statistics (NCHS), men reported an average of seven partners, while women reported an average of four. Now, an obvious point to acknowledge is that the men reported an average of seven women, while the women reported an average of four men. However, let's pretend that objection doesn't exist and plow ahead, since I'm merely illustrating a mathematical point.

The total number of relationships between men and women must be equal, since each female relationship with a man corresponds to precisely one male relationship with a woman. So the difference is $R_M - R_F = 3$. This is the average number of same-sex relationships that men have in excess of the number experienced by women over the course of their lifetime.
$2R_{mm} + R_{mf} = 7$
$2R_{ff} + R_{mf} = 3$
The pair of linear equations above features 3 unknowns; setting $R_{ff} = 0$ gives $R_{mf} = 3$ and $2R_{mm} = 4$, so $R_{mm} = 2$, or 28% of all male relationships. That is, to put it mildly, a bit high, especially since the disparity for the USA is rather low. Other international comparisons report ratios of >3:1, mostly in Latin countries. For the UK, it's 2. But based on the information above, it's reasonable to suspect at least some of the women's relationships are lesbian ($R_{ff}$). How many? According to my calculations, the mean probable interval between lesbian partners is 4.14 years (with some possible overlap). According to the same reporting, the MPI between gay partners is 0.84 years (again, with some possible overlap). Thus, we can gather that the somewhat-smaller "pure lesbian" community reports about one-fifth the number of lesbian relationships, but I have no way of estimating the number of lesbian relationships among bisexual women, or gay relationships among bisexual men. So, while I'm already taking this exercise much too seriously, let's just assume $5R_{ff} = R_{mm}$:
$10R_{ff} + R_{mf} = 7$
$2R_{ff} + R_{mf} = 3$
Therefore $R_{ff} = 0.5$, $R_{mf} = 2$, and $R_{mm} = 2.5$, which means 35% of male relationships are gay.
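For anyone who wants to check the algebra, the two cases above can be solved with exact fractions in Python. The function names are hypothetical. The right-hand sides (7 and 3) are taken from the equations exactly as written, even though the survey figure quoted for women is four, and the 5:1 ratio is the assumption stated above.

```python
from fractions import Fraction

def solve_without_ff(men_total, women_total):
    """Case R_ff = 0: 2*R_mm + R_mf = men_total and R_mf = women_total."""
    r_mf = Fraction(women_total)
    r_mm = (Fraction(men_total) - r_mf) / 2
    return r_mm, Fraction(0), r_mf

def solve_with_ratio(men_total, women_total, ratio):
    """Case R_mm = ratio * R_ff, with the same two equations.

    Subtracting the equations gives 2 * (ratio - 1) * R_ff = men - women.
    """
    r_ff = Fraction(men_total - women_total, 2 * (ratio - 1))
    r_mm = ratio * r_ff
    r_mf = women_total - 2 * r_ff
    return r_mm, r_ff, r_mf

case_a = solve_without_ff(7, 3)        # (2, 0, 3)
case_b = solve_with_ratio(7, 3, 5)     # (5/2, 1/2, 2)

share_a = float(case_a[0] / 7)         # R_mm as a share of 7 male relationships
share_b = float(case_b[0] / 7)

print(case_a, share_a)
print(case_b, share_b)
```

The two shares come out at about 0.286 and 0.357, which matches the 28% and 35% quoted above once truncated.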

All this is totally silly, of course, since I'm already ignoring the fact that we have much more detailed self-reporting in the much-cited CDC study. For example, based on prior experience with sex surveys, and the well-known "pairwise paradox," the 2005 CDC study mainly focused on sexual activity in the last 12 months. While it's worth pointing out that the 12-month study period has a much closer mutual correspondence (e.g., 14.8% of men had no sexual partner vs. 13.9% of women; 62.2% of men had exactly one, vs. 66.8% of women; 17.6% of men had >1 partner, vs. 12.7% of women), there are still a few curious disparities: of those reporting two or more partners—which, incidentally, includes those who transitioned from one relationship to another in the last 12 months—3.1% of women reported having partners of both sexes, vs. 1% of men.
ADDITIONAL SOURCES & READING: William D. Mosher, Anjani Chandra, & Jo Jones, "Sexual Behavior and Selected Health Measures: Men and Women 15–44 Years of Age, United States, 2002" (PDF), Division of Vital Statistics, CDC (2005); Statistics Canada, "Canadian Community Health Survey" (2004);

A search of Kinsey Institute publications suggests that the word "homosexual" was last used in 1990.


Enumeration Strategies

P6 alludes to a NYT article about studies on self-reporting of sexual partners. It's extremely common for studies of this type to report that "men have x partners, women have y partners," where x >> y. This is mathematically impossible, which I would think should be obvious. I've heard this sort of statistic repeated ad nauseam, and I've long accepted its unchallenged acceptance as more proof that people either hear evidence of their irrational prejudices, or they tune out and get indignant.
The Myth, the Math, the Sex (Gina Kolata): Everyone knows men are promiscuous by nature. It’s part of the genetic strategy that evolved to help men spread their genes far and wide. The strategy is different for a woman, who has to go through so much just to have a baby and then nurture it. She is genetically programmed to want just one man who will stick with her and help raise their children.

Surveys bear this out. In study after study and in country after country, men report more, often many more, sexual partners than women.

One survey, recently reported by the federal government, concluded that men had a median of seven female sex partners. Women had a median of four male sex partners. Another study, by British researchers, stated that men had 12.7 heterosexual partners in their lifetimes and women had 6.5.
Well, this is impossible. Moreover, forget about prostitution: it holds even in countries where laws against prostitution are not only strictly enforced, but also where prostitution is actually not tenable. But try telling that to commentator "Solar Soul":
It's not mathematically impossible for women to be less promiscuous than men. You just need a small pool of promiscuous women sleeping with a larger pool of promiscuous men. If five promiscuous girls at your high school slept with twenty promiscuous guys, out of a population of 100 guys and 100 girls, then 5% of the girls would be promiscuous, and 20% of the guys would be promiscuous. Sometimes, I really wonder what a PhD is worth.
First, the response is clearly guided by irritation: "I may not know anything about math, but I know women are chaste and men are sex-glutted scumbags." Second, in the example provided (the high school), Solar Soul is comparing the incidence of arbitrarily-defined "promiscuous ones" in the population of 100 men and 100 women. The example would work slightly better, incidentally, if it used 15 "promiscuous girls" and 60 "promiscuous guys" since at least that way, a majority of the men are "promiscuous" and the majority of women still aren't. It's still far off the mark, though, because the average number of partners is the same for both sexes... which is what the NYT article is all about. It's erroneous to compare modes to modes unless the mode is what you care about, which isn't the case here.
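The high-school example can be counted out directly. The sketch below assumes, purely for illustration, that each of the 5 girls is partnered with each of the 20 guys (the comment doesn't say who pairs with whom); the variable names are mine.

```python
from collections import Counter
from itertools import product

POPULATION = 100   # 100 girls and 100 guys, as in the quoted comment

# Assumption: every one of the 5 girls is paired with every one of the 20 guys.
pairs = list(product(range(5), range(20)))     # (girl, guy) pairs

girl_counts = Counter(girl for girl, _ in pairs)
guy_counts = Counter(guy for _, guy in pairs)

# Share of each sex with at least one partner: this is what the comment compares.
share_girls_active = len(girl_counts) / POPULATION
share_guys_active = len(guy_counts) / POPULATION

# Mean partners per person over the whole population: what the surveys report.
mean_girls = sum(girl_counts.values()) / POPULATION
mean_guys = sum(guy_counts.values()) / POPULATION

print(share_girls_active, share_guys_active)   # 0.05 0.2
print(mean_girls, mean_guys)                   # 1.0 1.0
```

The incidence of "promiscuous" people differs by sex, but every pairing adds one partner to each side, so the two means cannot differ when the groups are the same size.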

Let's turn now to a (perhaps excessively) serious discussion of the matter:
Norman Brown: If surveys elicit accurate reports from their respondents, heterosexual men and women should, on average, report having had the same number of partners. This is because each new SP for a man is also a new SP for a woman. Thus, for a closed population, men and women must have the same number of opposite-sex SPs, and therefore should generate similar reports. This, however, is rarely the case. Instead, men typically report two to four times as many opposite-sex partners as women.

Wiederman: Rather than a small but statistically significant gender difference, the typical discrepancy in men's and women's lifetime number of sex partners is large by any definition. For example, in national samples, the mean number of sex partners for men and women, respectively, was 12.3 versus 3.3 in the United States (Smith, 1991), 9.9 versus 3.4 in Britain (Wellings et al., 1994), 11.0 versus 3.3 in France (ACSF, 1992), 10.2 versus 4.2 in New Zealand (Davis et al., 1993), and 12.5 versus 5.2 in Norway (Sundet et al., 1989). In populations that are more or less closed systems with an approximately equal ratio of men and women, such as the United States (U.S. Bureau of the Census, 1993), this apparent gender discrepancy does not make logical sense (Einon, 1994; Gurman, 1989).
The operative term here is probably "closed." In some cases, such studies do specify that the respondents are talking about heterosexual contacts; even if they weren't, we'd have to wonder about the conjecture that the massive disparity came from uniquely male homosexuality. More significantly, especially in dense urban areas, it's reasonable to conjecture that numbers of partners are statistically concentrated (like net assets). Put another way, the great majority of people anywhere have 0-2 partners in any given 12-month period, and 1-3 partners in any given five-year period. But it's possible to have a group of, say, 1% of women (sex workers) who are seldom or never surveyed, who account for the greater part of all female sexual encounters; and another group of, say, 10% of men, who have far fewer encounters than the female sex workers, but vastly more than the remaining 90% of men. These 10% would certainly be sufficiently numerous to be represented, even realistically, by a survey like the CDC's; but their partners would be statistically invisible.
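A small constructed population shows how such an unsurveyed group could produce the gap. All of the numbers below are hypothetical and chosen only to mirror the 1% and 10% figures in the paragraph above: 1,000 men and 1,000 women, with 10 women who each have the same 100 men as partners.

```python
from collections import Counter

N = 1000                        # men and women each, indexed 0..N-1
WORKERS = range(10)             # 1% of women, never surveyed (assumption)
ACTIVE_MEN = range(100)         # 10% of men, partners of every worker

pairs = set()                   # (man, woman) partnerships
for woman in WORKERS:
    for man in ACTIVE_MEN:
        pairs.add((man, woman))
for woman in range(10, N):      # every other woman has exactly one partner
    pairs.add((woman - 10, woman))

men = Counter(man for man, _ in pairs)
women = Counter(woman for _, woman in pairs)

true_mean_men = sum(men.values()) / N
true_mean_women = sum(women.values()) / N

# The survey reaches all men but none of the 10 workers.
surveyed_women = [women[w] for w in range(10, N)]
surveyed_mean_women = sum(surveyed_women) / len(surveyed_women)

print(true_mean_men, true_mean_women)   # 1.99 1.99
print(surveyed_mean_women)              # 1.0
```

The true means are identical, as they must be, yet a survey that misses the ten women would report men with about twice as many partners as women.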

This might be true. Wiederman is doubtful, based on Einon's research:
Hypersexual women and prostitutes. Several authors [] have proposed that perhaps the apparent gender discrepancy in number of sex partners is explained by existence of a small subgroup of women who have had sex with an enormous number of men. To address this possibility of a subgroup of highly experienced women who were not prostitutes, Einon (1994) analyzed data from the national samples collected in Britain [] and France [] She found no evidence for the notion that there are more atypically "hypersexual" women compared to such men (and actually found evidence for a relatively greater incidence of "hypersexual" men who reported extremely large numbers of sex partners).

What about professional prostitutes? These women presumably have large numbers of male sex partners, yet may be less likely to be included in studies using typical sampling methodology. Einon (1994) also calculated the number of different male clients that prostitutes in Britain would need to service to resolve the gender discrepancy in self-reported lifetime number of sex partners in that country.
But Brewer, et al. (2000) contradict this finding:
Brewer, et al.: Einon (18) addressed and dismissed the prostitution explanation for the discrepancy in the British household survey (5). However, her analysis of the lifetime number of reported partners is undermined by the use of point and annual, rather than lifetime, prevalences of prostitutes, and thus does not adjust for the cumulative number of partners that all prostitutes from multiple cohorts had over respondents' lifetimes.


After adjusting for these prostitution-related factors, the ratios for the sex discrepancy in the reported number of sexual partners hover slightly above and below 1 [] indicating that prostitution can account for essentially all of the disparity.
Wiederman examines some other possible explanations in his '97 article.
Several authors have noted that, compared to women, men tend to select sex partners who are relatively younger and such a gender difference in partner choice may affect self-reported lifetime number of sex partners... In other words, as most surveys involve adult respondents (age 18 years and older), some men included in the sample have had sex with female partners who are not old enough to be included in the survey. Although this fact may explain some small degree of the gender discrepancy, it cannot explain adequately the relatively large difference between men's and women's self-reports. That is, in national surveys, men typically report approximately three times as many lifetime sex partners as do women... Preference for pre-adult sex partners explains the apparent gender discrepancy in lifetime partners only if two thirds of adult men's partners are currently younger than age 18, which is a highly unlikely scenario...

Similarly, it would seem that if men begin their sexual careers earlier than do women, men would have a longer period of time in which to accumulate sex partners. However, any such difference in onset of sexual intercourse does not explain the gender discrepancy in lifetime number of sex partners because men still have to have a female partner, regardless of the age of the male. Additionally, at least among the most recent generation of young adults, there does not appear to be a gender discrepancy in age at first experience of sexual intercourse.
In fact, according to the CDC report (PDF), distribution of sexual partners in the 15-19 age cohort is almost identical for each partner count; this would contradict the principle that young boys would be more likely to exaggerate their sexual experiences than older men. We'll see how this works out later, because it turns out they still do (just not on purpose).

Brewer, et al. (2000) actually suggest that, far from exaggerating sexual conquests, men do not report contact with prostitutes when responding to surveys.
In two different parts of the Colorado Springs interview, heterosexual men were asked about contact with prostitutes in the last 5 years, with the second question referring to prostitutes in Colorado Springs only. Eleven of the 110 clients acknowledged prostitute partners only in response to the second question, and 2 additional men who did not report contact with prostitutes were known to be clients from prostitutes' naming them specifically as clients in another part of the interview.
However, it could be argued that Brewer, et al., in their enthusiasm at cracking the case, simply widened the gap: if the female sex-workers, representing 0.023% of the human population (of Colorado Springs, but they say that's representative) account for the entire difference, and men are reluctant to report contacts with prostitutes on surveys, then it's possible that they merely uncovered a huge cesspool of underreported sexual activity involving men and prostitutes.

(Actually, they imply that they can explain pretty much any discrepancy that could have been found with prostitutes. It's like "What Stumped the Bluejays." Since it could explain nearly any number you threw at them, I tend to suspect their study for that very reason. The other problem is, the study is mainly interested in establishing that (a) prostitutes account for a staggering volume of sex, and (b) men are reluctant to tell researchers that their impressive sexual CV's are padded with alleyway tricks. If that were true, however, it seems unlikely that this would have escaped the attention of so many different researchers with different methodologies.)

In any event, much of the discrepancy does indeed apply to a small number of men with large numbers of partners. Just as with income distribution at the high end, distribution of sexual partners doesn't follow a normal distribution; if it did, long lifetimes of celibacy or people like Bertrand Morane would appear once in a billion; in reality, they are quite common. Moreover, for women, large numbers of male partners tends to blur the gender division; according to the CDC study (p.12b), 32% of women with a lifetime count of ≥15 men had had same-sex encounters as well. For men, this tended to follow the plausible pattern of older men reporting more partners; only the >40 set had a >33% likelihood of reporting >15 partners (table 10). For women, only one ninth reached that level, but they did so earlier (25-29; see table 11); and after that age, the number slightly declined (!), suggesting that older cohorts of women offset their longer careers with a lower rate of new partners. The impression one gets examining table 11 is that a steady proportion of the respondents (20%) were adamant about having only one partner their entire lives—consistent with defining female religious narratives.

At last, we return to my preferred explanation: the unintentional classification scheme.
Norman Brown: It is well established that people use multiple strategies to generate numerical estimates, that different strategies are associated with explicable characteristic biases, and that strategy use is influenced by the availability of task-relevant information and the actual magnitude of the to-be-estimated quantity [] Of particular relevance, Brown (1995, 1997) demonstrated that people asked to estimate event frequencies sometimes retrieve and count event instances (i.e., enumerate) and sometimes produce rapid intuitive estimates (i.e., rough approximations). Participants who enumerate often underestimate event frequencies because relevant instances may be permanently forgotten, because output interference causes some instances to become temporally inaccessible, and because people sometimes terminate their retrieval efforts before all relevant instances have been recalled. In contrast, participants who produce rough approximations often overestimate event frequencies. It is believed that people generate these estimates by mapping vague quantifiers (e.g., terms like "quite a few," "lots") onto a numerical response scale and that this process produces overestimation because the lower bound of the response scale is anchored but the upper bound is not (Brown, 1995).

It is conceivable that some people enumerate when reporting their number of lifetime SPs and others respond with rough approximations. If so, all else being equal, people who enumerate should produce smaller estimates than people who use rough approximations. Thus, if we assume that the mean number of SPs is the same for men and women and that men and women respond in good faith, then we should find that men rely more on strategies associated with larger estimates (e.g., rough approximation) and women rely more on those associated with smaller estimates (e.g., enumeration). If this is the case, then differential strategy use can explain the sex difference in reports of lifetime SPs.
P6 was skeptical and hooted a little at this explanation:
How is this
...Some strategies...are associated with relatively large reports, others...are associated with relatively small reports, and that men are more likely to use the former whereas women are more likely to use the latter.
Any different than this?
P6: I think men lie about how many and women lie about how few.
The difference is that men use different methods ("strategies") of answering the question than women do, since women actually enumerate and men estimate. Bear in mind that I have no idea, since my lifetime total is pretty unambiguously fixed in my mind. Brown estimated that, when asked about totals during the preceding 12 months, gender discrepancies would disappear (which they certainly did).
An examination of the written protocols revealed that participants used several different strategies to generate their SP reports.(6) The most common of these was enumeration (e.g., "Counted all the names I remembered."); collapsing across sex, 39% of the sexually-active participants stated that they arrived at their estimates by recalling each of their partners. 29% used a tally-retrieval strategy. These people indicated that they maintain a tally in memory and that they responded to the lifetime question by recalling and stating the current value of this tally (e.g., "I kept track in my diary, and I know that my boyfriend is #27."). Another 17% indicated that their estimates were rough approximations. Protocols were assigned to this category when participants indicated that they generated their responses without carefully examining the available evidence. Such estimates were often accompanied by an expression of uncertainty (e.g., "Rough guess, give or take 1 or 2 partners"). In addition to these common strategies, 11% of the participants produced protocols that were too vague to be coded (e.g., "Memory") or that included only irrelevant information, 2% used a rate-based strategy (e.g., "Avg of 5/year from 16-21, then remained monogamous."), and 1% failed to respond.


In contrast to the lifetime estimates, the past-year SP estimates provided by the sexually active men (M = 3.45) were not significantly larger than those provided by the sexually active women (M = 2.58), t(173) = 1.26, ns. This replicates a common finding in the survey literature (ACSF Investigators, 1992; Johnson et al., 1992; Laumann et al., 1994; Morris, 1993; Smith, 1992) and has two important implications. First, the past-year data argue against the possibility that the sex difference reported above arose because our participants were responding in bad faith; if they had been, there should have been a reliable sex difference for both lifetime and past-year estimates. The past-year data also address an alternative explanation for the partner discrepancy reported above. One could argue that the men in our sample were actually more experienced than the women, and that the reported difference in estimated life-time SPs merely reflected this fact. However, given that the male and female participants were about the same age, and assuming that the men in this sample had more partners than the women, a sex difference should have been apparent in the past-year estimates as well as the lifetime estimates. Because the data do not support this prediction, we conclude that it is unlikely that men and women were drawn from qualitatively different samples.
And that's the difference.

mode: in statistics, the mode is the value that appears most commonly in the set. So, for example, if you have 100 people, and 10 of those people have had >10 sexual partners each, while the remaining 90 have had anywhere from 1 to 10 evenly distributed, then you would have 9 with one, 9 with 2,... 9 with 10, and the mode would be >10 since there are 10 with more than ten. You might even be incited to remark, "The group generally has more than 10 partners each," which would convey a very erroneous impression even if >10 is the most common number of partners.
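The arithmetic in this example can be checked with a short Python sketch. The bin labels and variable names are my own, and the 100 people are the hypothetical group from the paragraph above, not data from any of the cited studies.

```python
from collections import Counter

# Hypothetical group from the example: 100 people.
# 90 of them report 1..10 partners, spread evenly (9 people per count);
# the other 10 are lumped together in a single ">10" bin.
reports = [str(n) for n in range(1, 11) for _ in range(9)] + [">10"] * 10

counts = Counter(reports)
mode_bin, mode_size = counts.most_common(1)[0]
print(mode_bin, mode_size)

# Share of the group with 10 partners or fewer.
share_at_most_10 = sum(v for k, v in counts.items() if k != ">10") / len(reports)
print(share_at_most_10)
```

The mode lands on the ">10" bin only because that bin lumps many different values together, while nine tenths of the group sit at or below 10; that is why quoting the mode alone is misleading here.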

In the example cited, the mode is >4; if there were 60 "promiscuous" men & 15 "promiscuous" women, then the mode for the men would be >4, while that for the women would be 0. In Solar Soul's original version, the mode for both is 0.

Female religious narratives: coming from an evangelical protestant background, I have a fairly large amount of experience with testimonies by women and men about their religious epiphanies. It's relatively common for men to regale me with speeches about their past, raunchy life; sometimes they exaggerate, as I was sometimes encouraged to do. I can't lie convincingly, so I just opted out. The man typically describes a life ridden with sex and drugs, with something goofy thrown in (video games come to mind), then talks about being saved by God and united with his doting wife. Perhaps it was just me, but I would always read a certain humiliation in the wife's blissful smile; she was the healthy salad with rice crackers, not the slab of steak and baked potato skins with the English pint of ale.

In contrast, the women had a narrative that was very long on descriptions of mental states, and short on a backstory of actual, you know, sin. I use my mother as a canonical example: she would refer to a period of utter moral degeneration. When I was younger, I tried to get some clues about what cosmic depths of Dennis-Hopperesque depravity she'd sunk to. After one especially purple session, she finally spilled the beans: sometimes she didn't tithe. I would like to have seen the look on my face when she said that. Since that time, I've noticed that such defining religious narratives, for women, are rigidly and scrupulously confined to what is a matter of public record.

READING & SOURCES: William D. Mosher, Anjani Chandra, & Jo Jones, "Sexual Behavior and Selected Health Measures: Men and Women 15–44 Years of Age, United States" (PDF, 2005); Brewer et al., "Prostitution and the sex discrepancy in reported number of sexual partners," The National Academy of Sciences (2000); Norman Brown, "Estimating Number of Lifetime Sexual Partners," Journal of Sex Research (1999); Michael W. Wiederman, "The truth must be in here somewhere," Journal of Sex Research (1997)

Jeff Grabmeier RE: research of Terri Fisher, "Women's sexual behaviors may be closer to men's than previously thought," Ohio State Research (2003): This article was designed to test the variance in women's and men's reporting of sexual experience under different interview regimes; generally, women were more susceptible to social pressures and context, whereas men tended to report the same thing regardless. One implication is that women tended to report larger numbers of sexual partners if they were likely to believe they were being tested for truthfulness on a polygraph. The implication is that the discrepancy reflects embarrassment about large numbers of partners, which shrinks as the embarrassment of being thought a slut is replaced by the embarrassment of being caught lying.


10 August 2007

Scam: Ecards = Storm Worm

The Storm Worm is being propagated by spam emails claiming that one has received an ecard from a friend. I receive a lot of ecards, but fortunately my email filter correctly diagnosed this particular one as spam. It's not always reliable, though. What the spam does is mimic precisely the format of ecard notifications, and invite the mark to click on a link to view the card. It's just that the malicious email sends one to a link with a numerical domain instead of the usual corporate domains.
InformationWeek: The Storm worm blasted computers around the globe in January. It then reappeared in February when it was used in a spam attack that lured blog, bulletin board, and Webmail users to connect to a malicious Web site. Then in April, it hit again, with the Internet Storm Center reportedly detecting at least 20,000 infections in one day.

"With administrators filtering executable attachments at the mail gateway and most e-mail clients preventing a user from opening an executable attachment, virus authors are constantly improvising to stay ahead in the game," wrote Thomas. "Social engineering -- the oldest trick in the book -- along with the fatal combination of human stupidity plus curiosity provides ample fodder for virus authors to lure new victims; the innumerable newbie users of the Internet being the low hanging fruit."

In this attack, which started in June, hackers are spamming out e-mail messages that lure people to click on links that take them to malicious Web pages. This time the e-mails purport to notify the user that someone has sent them an electronic greeting card, or e-card. It might have a subject line saying something like, "You've Received a Postcard from a Family Member." The body of the message says the user needs to click on the link to view the virtual greeting.
The Storm Worm is a downloader trojan that causes the browser to download additional malware, which, in turn, downloads spambots that attack http://www.microsoft.com. When an attachment is opened, the malware installs the wincom32 service, and injects a payload, passing on packets to destinations encoded within the malware itself. According to Symantec, it may also download and run the Trojan.Abwiz.F trojan, and the W32.Mixor.Q@mm worm.

Initially, the Storm Worm was propagated through emailed news stories with startling headlines, like "230 dead as storm batters Europe," hence the name.
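The tell described earlier, a link whose host is a bare number rather than a named domain, can be checked mechanically. Here is a minimal Python sketch, assuming "numerical domain" means a literal IP address; the function name and both URLs are hypothetical placeholders (the address is from a range reserved for documentation), not samples from the actual spam.

```python
import ipaddress
from urllib.parse import urlsplit

def has_numeric_host(url):
    """Return True if the URL's host is a literal IP address, not a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IPv4 or IPv6 address, so it is a named host.
        return False
    return True

print(has_numeric_host("http://192.0.2.15/ecard.html"))       # True
print(has_numeric_host("http://www.example.com/ecard.html"))  # False
```

This is only one signal, not a spam filter: a named domain can be just as malicious, and a numeric host is sometimes legitimate.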

SOURCES: Sharon Gaudin, "Storm Worm, Hidden In Phony E-Card Spam, Strikes Again" (July 2, 2007), and Gregg Keizer, "'Storm' Spam Surges, Infections Climb" (Jan. 22, 2007) InformationWeek


04 August 2007

XML-4: Declarations

Disclaimer: I am not an expert on this topic, but a student. I am hoping that these notes will be of help or interest to others trying to understand what XML is and how it works. My notes are not as well-composed as I would like, and I've been more interested in correcting errors than poor composition. Moreover, there's a lot of repetition that I've been trying to prune back in subsequent revisions.

From my previous post on XML:
Another important component to XML documents is the document type definition (DTD), which includes a list of the elements required in a valid document; or, if there are multiple kinds of documents defined, specifies what must be included in each. DTDs are not the same thing as style sheets that define the tags; DTDs define the elements that exist, what attributes are allowed, and what entities are accepted. A valid XML document has this declaration, and conforms to it. The DTD is sometimes included within the page it describes, but more frequently it makes sense for the DTD to be a file (*.dtd) that serves the whole domain. Examples of DTD code can be found here.

(Some of this passage may be missing in the revised version)
All elements need to be declared, or formally listed in the document type definition (DTD). A DTD may be included with the source document, or it may be a separate file. A DTD includes a long list of statements like <!ELEMENT PROGRAM ANY>, which in this case declares the element "PROGRAM." While that's necessary for a source document to be valid, and [therefore] readable to an XSL compiler, declarations require somewhat more.

The declaration above does not specify what type of data is allowed to belong to the element PROGRAM. Or rather, it does; "ANY" means anything. Ideally, however, XML documents are declared with the required elements; for example, (PERFORMANCE, REVIEW) might be the two child elements, with
<!ELEMENT PERFORMANCE (DATE, WORK, PERFORMER)>
declaring that PERFORMANCE includes the child elements of DATE, WORK, and PERFORMER. An element with no children declares its data type as (#PCDATA), i.e., it contains only parsed character data (no tags).

The structure above includes listings where each listing has one of the specified elements, e.g.,
<!ELEMENT PERFORMANCE (DATE, WORK, PERFORMER)>
means each PERFORMANCE has one DATE, one WORK, and one PERFORMER.
<!ELEMENT PERFORMANCE (DATE+, WORK+, PERFORMER+)>
allows for one or more of each, and
<!ELEMENT PERFORMANCE (DATE, WORK?, PERFORMER+)>
allows for events in which there may be no WORK (as, for example, an appearance by a famous comedian) but one or more performers. Conversely, you might want to specify that there could be anywhere from zero to many of a child element; then one would declare that child with an asterisk (*) after its name.
Otherwise, if the number of each child element is fixed—say, at four—then you include the named element four times in the declaration:
Again, one can specify if, each time an element occurs, it ought to be paired with another:
Here's another way of representing the same association of elements:
This version allows one or more of the listed items, but they must have at least one.
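The occurrence rules discussed in this section (exactly one, optional with ?, one or more with +, zero or more with *, and a choice with |) behave much like regular-expression operators. The Python sketch below makes that analogy concrete. It is an illustration and not a DTD validator: the function names are my own, the content models passed to it are assumed examples reusing the element names from the text, and it ignores #PCDATA, ANY, and EMPTY.

```python
import re

def model_to_regex(model):
    """Translate a DTD child-element content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that consumes "NAME " (name plus a
    # separator), so that adjacent names cannot run together.
    wrapped = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    # A comma means "followed by", which is plain concatenation in a regex;
    # ?, +, * and | already mean the same thing in both notations.
    return re.compile(wrapped.replace(",", ""))

def children_match(model, children):
    """True if the sequence of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

# One of each, in order.
print(children_match("(DATE, WORK, PERFORMER)", ["DATE", "WORK", "PERFORMER"]))
# WORK is optional, but at least one PERFORMER is required.
print(children_match("(DATE, WORK?, PERFORMER+)", ["DATE", "PERFORMER", "PERFORMER"]))
print(children_match("(DATE, WORK?, PERFORMER+)", ["DATE", "WORK"]))
# A paired group that may repeat.
print(children_match("((WORK, PERFORMER)+)", ["WORK", "PERFORMER", "WORK", "PERFORMER"]))
# A choice: any mix of the listed items, but at least one.
print(children_match("(WORK | PERFORMER)+", []))
```

The five calls print True, True, False, True, and False respectively. A real validating parser does considerably more than this, but the operator semantics are the same.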

At this point it might be useful to illustrate once more the concept of the tree. I've edited the image a bit.

I've sort of vandalized the original so that you can see the idea of how XSL's component parsing system, XPath, negotiates the elements and element children in order to render XML source documents.

Turning away from the hierarchical nature of elements, there are three other concepts that need declarations spelled out. One is empty elements, such as images and line breaks.
The second is element attributes, or adjectives for elements. Element attributes are pretty simple. Here's an example I actually used in this very blog post:
<IMG SRC="http://farm2~.jpg" HEIGHT="200px" WIDTH="550px" ALT="XML Element Path" /> 
I truncated the image source attribute, just because it's so long. Images have to have a source, alternative text, and height; they may require alignment and padding. In XML, attributes need to be declared:
The ID and CDATA indicate the type of attribute data allowed; here's a list of acceptable attribute types and their meaning. The extension #REQUIRED means that all instances of IMG have to state ID, HEIGHT, and so forth. The declaration may replace the extension #REQUIRED with a quotation, "center" (for alignment) which specifies the default value for that attribute. In other cases, you may wish for the author of source documents to specify such-and-such an attribute, but you don't want to mandate it. The #IMPLIED extension allows an attribute to be declared, without requiring it. The #FIXED "*" extension pegs the attribute at *, regardless of what the author puts.
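To see what the default and #FIXED extensions do in practice, here is a minimal Python sketch using the standard library's expat bindings. The ATTLIST below is an assumed example of my own (the attribute names ALIGN and BORDER and the file name are hypothetical), not the declaration used for this blog. Note that expat is a non-validating parser: it fills in declared defaults, but it will not complain about a missing #REQUIRED attribute.

```python
from xml.parsers import expat

# Assumed declarations: SRC is required, ALT is optional (#IMPLIED),
# ALIGN defaults to "center", and BORDER is pegged at "0" (#FIXED).
DOC = """<!DOCTYPE PAGE [
  <!ELEMENT PAGE (IMG)>
  <!ELEMENT IMG EMPTY>
  <!ATTLIST IMG
      SRC    CDATA #REQUIRED
      ALT    CDATA #IMPLIED
      ALIGN  CDATA "center"
      BORDER CDATA #FIXED "0">
]>
<PAGE><IMG SRC="tree.jpg"/></PAGE>"""

seen = {}

def on_start(name, attrs):
    # attrs holds the attributes written in the document plus any
    # defaults supplied by the ATTLIST declaration.
    seen[name] = dict(attrs)

parser = expat.ParserCreate()
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

print(seen["IMG"])
```

The author of the source document wrote only SRC, yet the parser also reports ALIGN="center" and BORDER="0"; ALT, being #IMPLIED with no default, is simply absent.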

A third is the declaration of entities, which are essentially surrogates for data, analogous to variables or "insert x here" memos in a manuscript. Entities are [usually] invoked with an ampersand (&) and closed with a semicolon (;), e.g., &HEADER; In some cases, as with predefined general entities (like &lt; for "<"), it's possible to avoid declaring them entirely under most conditions. The less trivial case of the internal general entity is fairly simple to declare:
<!ENTITY AUTHOR "James R MacLean"> 
A single change of the declaration allows one to replace my name with that of a new author. Or one can include a complex pattern of chapter headings complete with lines, images, and references to elements.
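A small Python sketch of that "change it in one place" behavior, using the standard library's ElementTree parser, which expands internal general entities declared in the document's own DTD subset. The element names PAGE, BYLINE, and FOOTER and the second author name are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The entity is declared once; the body refers to it twice.
TEMPLATE = """<!DOCTYPE PAGE [
  <!ENTITY AUTHOR "{author}">
]>
<PAGE><BYLINE>by &AUTHOR;</BYLINE><FOOTER>Copyright &AUTHOR;</FOOTER></PAGE>"""

def render(author):
    """Parse the page with the given AUTHOR entity and return each child's text."""
    root = ET.fromstring(TEMPLATE.format(author=author))
    return [child.text for child in root]

print(render("James R MacLean"))
print(render("A. N. Other"))
```

Editing the single ENTITY declaration changes every place where &AUTHOR; appears, which is the whole point of the mechanism.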

External general entities are a little more complicated to declare since they must include a uniform resource identifier (URI), which is just a URL that ends in the actual name of the resource (e.g., a named section in a file, or an image).
One much-cited advantage to this is that it allows a web designer to generate a web page with source files located in multiple locations. Another is that it allows feeds, like Atom & RSS.

Internal parameter entities are entities that are used in the declaration itself; they are handy for standardizing elements by allowing all of them to have certain parts of their declaration changed at once. Just as an internal parameter entity is invoked with a % rather than an &, so the declaration differs from that of a general entity by using a %.
The above allows one to alter the required child elements for each element that declares (%PERFORMANCES_CONTENT;) as its child elements.

OBSERVATION: While XML doesn't define elements, and it's possible for you to create elements like "SNOOKUM" and "UTTER_HUMBUG," defined with the most eccentric characteristics imaginable, the format and terms for declarations are established. There are some limitations to XML's tabula rasa approach. You can't declare "GROOVINESS" as an attribute of the EMPTY element WANKRIDER, although I suppose you could submit it to the W3C for serious consideration.

REFERENCES: Airi Salminen, Frank Wm. Tompa, "Requirements for XML Document Database Systems" (PDF- Nov 2001), esp. p.5; W3C, "Extensible Markup Language (XML) 1.0" (1998)

Elliotte Rusty Harold, XML 1.1 Bible, 3rd Edition, Wiley Publishing (2004); see §8: "Element Declarations" and §9: "Attribute Declarations"


02 August 2007

XML-3: Entities & Elements

Disclaimer: I am not an expert on this topic, but a student. I am hoping that these notes will be of help or interest to others trying to understand what XML is and how it works. My notes are not as well-composed as I would like, and I've been more interested in correcting errors than poor composition. Moreover, there's a lot of repetition that I've been trying to prune back.

From my previous post on XML:
Markup languages are about more than merely tags; there are also elements, which include the basic building blocks of a document. [...] In XML [...] everything in the page must belong to an element, and there is a hierarchy of elements. An element may be a listing of some kind, or the body of text, or footnote text, or a title, or salutation. An element is opened by a tag, and must always be closed by one, unless it's an empty element

Entities are named units of storage in XML. Conceivably, the entire document may be an entity, including associated files defining the XML elements. However, this is a trivial example of an entity. Internal entities are defined in the document and may include something as simple as a symbol, or perhaps a footer. External entities include a uniform resource identifier (URI), which identifies precisely where the content of the entities is found. In some cases, the advantage of using an entity is that it may be used as a variable; changing the value in one place changes it everywhere it appears in the resulting document. Also, internal parameter entities can be used in the associated files to change what is a legal element.
Documents created in HTML (usually) and in XML (always) contain a header, or prologue, which defines the elements in the body. An element is not something that XML has in addition to tags; tags are used to create elements, designate what they are or do, and what their attributes may be (e.g., color, size, image location, conditionality). While HTML documents may, or may not, be organized into elements, XML source documents always have everything organized into elements. Moreover, these elements are nested; so, for example, the entire visible part of the source document is the root element (or body). Everything is an element that is child to the root; so, for example, the title of the document is a <TITLE> element nested, or contained wholly inside of, the root element.

All elements need to be declared, or formally listed in the document type definition (DTD). I mention this now because the declaration of entities shines some light into how they work and why. The DTD indicates such things as which elements are children of other elements and specifies what attributes, or descriptive qualities (like height, width, color, typeface) each particular element has.

Entities are a somewhat elusive concept. Internal entities are defined entirely within the DTD; external entities have some or all of their content outside the document. In the external entity, a hyperlink points to the externally located content. Entities are treated as variables, and that's how they work.
  • XML applications have five predefined general entity references (list); they're typographic symbols that are most frequently used to type out illustrative XML code on websites. They appear only in the source code of the file and are invoked with an ampersand ("&").
  • Internal general entities include things like headers or copyright data that must appear many places in the source document. The author may wish to change the effect of the entity everywhere it appears by editing it in the declaration; so, for example by changing the year on the copyright, or including her middle initial in each appearance of her name. They appear only in the source code of the file and are invoked with an ampersand ("&").
  • External general entities are entities that refer to something outside of the document. A common example is parts of a document stored in other files, which may include #PCDATA source code. They appear only in the source code of the file and are invoked with an ampersand ("&").
  • Internal parameter entities are entities that are in the DTD; one can actually incorporate an entity into a declaration. They are invoked with a percentage sign ("%"). This allows one to have declarations that invoke some variable.
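As a concrete illustration of the first item in the list, here is a short Python sketch showing the five predefined general entity references being resolved by a parser. The NOTE element and its text are made up for the example; the parser is the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# All five predefined references: &lt; &gt; &amp; &quot; &apos;
source = "<NOTE>&lt;TITLE&gt; &amp; &quot;quoted&quot; &apos;text&apos;</NOTE>"

note = ET.fromstring(source)
# The parser hands back the plain characters the references stand for.
print(note.text)

# Writing the element out again re-escapes the characters that would
# otherwise be mistaken for markup.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)
```

The references exist only in the source code; once parsed, the text contains ordinary characters, which is why these entities are mostly seen when people type out illustrative XML on web pages.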

The purpose of entities is to allow an author or programmer to construct a document from pieces, including pieces of other documents that may well exist in another domain.

source document: the XML code that constitutes the website. Excludes the declarations and stylesheet.

REFERENCE: XML Tutorial - Entities and Other Components

BOOK: Elliotte Rusty Harold, XML 1.1 Bible, 3rd Edition, Wiley Publishing (2004)


01 August 2007


The extensible stylesheet language (XSL) is used to create stylesheets for XML documents. XML tags are not defined; when a programmer creates a page in XML, that programmer is in effect creating a new language peculiar to the domain; the term for this is "domain-specific language," and XML has spawned many examples. Each implementation of XML has to have a stylesheet, and XSL is the official language for programming them.

This is a rather arcane topic, and so this post is going to remain a stub until/unless I really think it's necessary to write about what XSL is. However: it actually is a set of three languages, each of which is responsible for a specific part of transforming XML-compliant applications into the desired graphical user interface (GUI).
  • XSL Transformations (XSLT): language for converting XML source documents into (a) other XML-compliant formats, (b) HTML or plain text. The source document is the original code; XSLT creates a new document, called a result document, which is supposed to be useful. Usually the result document is an actual, literal, document (like a readable page on the screen); but it can also be a stream of data that is computer-readable.
  • XSL Formatting Objects (XSL-FO): unified language (i.e., no divide between "markup" and "definition") for converting XSLT result documents into something printable. Mostly used to generate PDF documents.
  • XML Path Language (XPath): used by XML language parsers to navigate an XML document. Recall that XML documents consist of elements organized in a nested hierarchy (like a tree). XPath is a language that helps the parser travel "up the tree" to the root element (i.e., the body of the document), and back "down another branch" as part of the process of generating an XML transformation.
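To give a feel for what XPath navigation looks like, here is a minimal Python sketch. It uses the limited XPath subset built into the standard library's ElementTree rather than a full XSLT processor (Python's standard library does not include one), and the source document, with its dates and names, is a made-up example reusing the PROGRAM and PERFORMANCE elements from my earlier posts.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: a PROGRAM with two PERFORMANCE children.
SOURCE = """<PROGRAM>
  <PERFORMANCE>
    <DATE>2007-08-01</DATE>
    <WORK>Hypothetical Work A</WORK>
    <PERFORMER>Performer One</PERFORMER>
  </PERFORMANCE>
  <PERFORMANCE>
    <DATE>2007-08-02</DATE>
    <PERFORMER>Performer Two</PERFORMER>
    <PERFORMER>Performer Three</PERFORMER>
  </PERFORMANCE>
</PROGRAM>"""

root = ET.fromstring(SOURCE)

# Walk down two levels of the tree: every PERFORMER under every PERFORMANCE.
performers = [p.text for p in root.findall("PERFORMANCE/PERFORMER")]
print(performers)

# A predicate: only those PERFORMANCE elements that have a WORK child.
dates_with_work = [p.findtext("DATE") for p in root.findall("PERFORMANCE[WORK]")]
print(dates_with_work)
```

Each path expression is a route through the tree from the root element; an XSLT stylesheet uses expressions of the same kind to pick out the parts of the source document it wants to transform.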

domain: the term "domain" is somewhat ambiguous; a domain could be a generic term that covers a section of a database, perhaps stored in multiple locations; a website, with a root directory on a host; or an IP address range. In fact, the term here refers explicitly to a peculiar task, function, purpose, or specialty. A domain specific language is therefore a programming language that is used to address a very specific function, sometimes so specific that it has a unique actual application. Please see "Domain-Specific Languages: An Annotated Bibliography," by Arie van Deursen, Paul Klint, & Joost Visser (1999/2000)

SOURCES & ADDITIONAL READING: Wikipedia, Extensible Stylesheet Language (XSL); W3C Tutorials, "The Extensible Stylesheet Language Family (XSL)";

BOOK: Elliotte Rusty Harold, XML 1.1 Bible, Wiley Publishing (2004); see §15: "XSL Transformations" and §16: "XSL Formatting Objects"
