Below you can find most of my papers (yellow boxes), handouts and slides (gray boxes), organized according to the following topics: Austronesian historical linguistics, Lexical categories and clausal constituency, Information structure, Prosody of Austronesian languages, Phonology-Morphology interface, Clitic syntax, Other Austronesian morphosyntax and typology, Wakhi, Garifuna, Computational tools for language documentation, Urban language documentation, Acquisition, the pandemic, Reviews.
Austronesian historical linguistics
One of my primary interests has been Austronesian historical morphology and its consequences for the proper analysis of synchronic properties of Austronesian grammars. Reflexes of the transitivity-related prefixes *pa-, *paN-, *paR- and *ka- are found throughout the Austronesian family and play a critical role in determining valency and what has been termed “mode” (distributive, abilitative, reflexive, reciprocal, etc.). In the following, I present evidence that *paN- and *paR- are complex prefixes containing causative *pa-. As reflexes of *paN- and *paR- have seemingly contradictory functions in modern languages, an inherent challenge here is to reconstruct a function for the proto-prefix that can give rise to such diversity. I continue exploring the anti-passive functions of *paN- in Kaufman 2017 (below).
A look at why so much inherited core morphology (e.g. PMP *pa-, *ka-, *paR-, *-an) is wildly multifunctional, with a focus on the reanalysis of PMP *ka-.
In the following, I examine the development and morphosyntax of *pa- in two of its guises, termed by Travis “inner” and “outer” causative, and cross-linguistic variation between Tagalog and Malay.
An important counterpart of causative *pa- is *ka-, which I attempt to reconstruct as the primitive predicate have′, in the following presentation. Similar to *pa-, the reflexes of *ka- appear to vary widely in function and can be found in abilitative, accidental, resultative, passive, and adjectivalizing capacities. By way of analyzing modern functions of *ka- reflexes, I observe difficulties with an analysis of Tagalog ka- as marking an unaccusative/unergative distinction.
Here, I look at a particular connection between *ka- and exclamatives in Austronesian, arguing that *ka- was generalized as an exclamative marker from its earlier functions as a non-finite allomorph of the property-denoting *ma- (PAn *k<um>a-). I speculate more generally on how exclamatives derive their illocutionary force in Austronesian and beyond.
Austronesian languages are famous for their rich voice morphology, which has been analyzed alternatively as nominalizers, agreement markers, transitivity markers and applicatives. In the following talk, I examine the significance of an overlooked but crucial piece of evidence in understanding the patient voice and locative voice, namely, that these voice markers function as case suffixes on accusative and dative arguments in several Formosan languages.
Almost all Philippine and Formosan languages mark aspect with a combination of bona-fide verbal morphology and clitics. I taxonomize these markers as inner aspect and outer aspect, showing that their interpretation matches their positioning. Verbal morphology indicates perfective, imperfective, progressive and prospective. But Austronesian languages also make extensive use of outer aspect markers which are typically translated as ‘still’ and ‘already’. I argue these can be reconstructed as PAn *=daNa and *=pa, respectively. Moving southwards from the Philippines, verbal aspect marking drops off suddenly in central Sulawesi and Borneo. From this point on, inner aspect distinctions are no longer obligatory but outer aspect continues to be marked by a range of clitics, many of which can be traced to *=daNa and *=pa and others which are innovative. The resulting aspectual typology of languages like Malay strongly resembles that of mainland Southeast Asian languages.
In the following effort with co-author Juliette Blevins, we look at the potential role of morphology in a puzzling sound change in Palauan.
Saisiyat presents a number of intriguing problems that are seemingly unique among Formosan languages. I explore Elizabeth Zeitoun, Tai-hwa Chu, and Lalo a tahesh kaybaybaw’s (2015) monograph on Saisiyat morphology and attempt alternative analyses to some of these problems.
I’ve done my best to avoid questions of long-distance relationships, but here I give in and take a comparative look at proposals connecting Austronesian to different language families of the Asian mainland. I am mostly concerned with the likelihood of a connection between Austronesian and Austroasiatic, Sino-Tibetan, and Tai-Kadai, as proposed by various authors. My conclusion is that the evidence for all of these hypotheses is very weak and, to me at least, unconvincing, but if one is forced to choose, the functional morphology matches up best with Austroasiatic.
Despite my skepticism about ancient phylogenetic links between Austronesian and the Asian mainland, I have found extensive lexical evidence for some form of contact between speakers of Austronesian languages and speakers of Mon-Khmer languages in Borneo. This had been posited very tenuously by previous authors on the basis of four words. The handout below contains 89 proposed comparisons, almost all new. The slides provide further background.
Lexical categories and clausal constituency
Starosta, Pawley & Reid (1982) argued convincingly that Austronesian voice developed historically from thematic nominalization as part of a reanalysis of nominal predicates to verbal ones. I argue that the residual nominal properties of voice marked forms in Austronesian go a long way in explaining this family’s most discussed syntactic characteristics. The seemingly exotic extraction restriction can be assimilated to a crosslinguistically commonplace restriction on extracting possessors. In the following paper I lay out my arguments on the basis of Tagalog.
The following paper looks at the typology of nominalism in Austronesian languages and, most importantly, how it disintegrates south of the Philippines.
Here are handouts and slides from presentations that led to the two papers above:
The following presentation examines a paradox of Philippine non-actor voice agents. They fulfill all the basic diagnostics for c-commanding clausemate undergoers and yet constituency diagnostics clearly show that they form a unique constituent with the predicate head in the unmarked predicate-initial order, i.e., [[Pred Agt] Pat]. Closer examination reveals that the c-command diagnostics are not as clear as previously thought, potentially leaving a larger role for non-structural accounts of the asymmetries that do exist.
Conservative Austronesian languages differ from languages across the ocean on the Southeast Asian mainland in being predicate initial, while the latter are overwhelmingly SVO. I show here that they also differ in having an unorthodox predicate phrase with a [[Pred Agt] Pat] structure as opposed to mainland languages, which display a traditional verb phrase.
Ross (2009) proposes that functional morphology and morphosyntax lead to the reconstruction of an Austronesian family tree in which Puyuma, Rukai and Tsou are primary branches of Proto-Austronesian while a fourth branch, which he titles Nuclear Austronesian, is the antecedent for all other documented Austronesian languages. Nuclear Austronesian is posited on the basis of the Starosta, Pawley and Reid reanalysis of nominalizations as main clause predicates, replacing an earlier, robustly verbal paradigm. Puyuma retains the putative Proto-Austronesian state of affairs on this account: it employs nominalizations for relative clauses but still uses the original verbal paradigm in main clause predications. Following up on the above work, we explore the implications for clausal constituency in Puyuma and Tsou based on the assumption that the unusual [[Pred Agt] Pat] structure is a result of its nominal history. If this is correct, main clauses in Puyuma and Tsou should show the standard VP configuration we find in languages of the mainland and elsewhere.
The debate between an ergative analysis of Austronesian voice and a more accusatively oriented “case agreement” analysis has been remarkably long-lived. In the following chapter, I compare these two approaches and note problems for both, which, I argue, can be overcome by a nominalist analysis. I also explore here how Philippine-type languages develop into classically ergative languages as found in South Sulawesi.
A handout from a presentation that culminated in the above chapter:
Finally, from a talk (in Tagalog) on lexical categories in Tagalog and Austronesian languages:
Information structure
Predication is a terribly misunderstood and abused concept, and I don’t suggest looking to me for the perfect solution, but I do attempt to shed light on the meaning of predication in Philippine-type languages in this chapter, focusing on how a clause is divided into predicate and subject. What I really try to get at, though, is a vastly unexplored question in Austronesian syntax: Why do so many Austronesian languages use cleft-like constructions in questions and how are these clefts constructed? I argue that apparent clefts in Philippine languages are really monoclausal but that true biclausal clefts develop south of the Philippines once a more robust noun-verb distinction emerges.
Here, I look at the interplay of syntax and prosody in Tagalog pragmatic relations and further explore syntactic and pragmatic diagnostics for topic and focus. I argue that prosody, while playing a relatively minor role in Tagalog focus marking overall, does come in to save the day when syntax cannot do the job alone.
A recent paper by Collins (2019) argues that the definiteness of clausal arguments in Tagalog can be derived entirely from their position and, as a corollary, that Tagalog case markers are semantically vacuous. I show that this cannot be the full story; the nominative case marker must also be a definite determiner regardless of the semantic contribution of clause structure.
Joint work on the expression of (in)definiteness in Indonesian, a language that has no clear definite article but rather a number of competing strategies to optionally signal familiarity and uniqueness.
A paper in the same volume briefly surveying methods of definiteness marking across (mostly non-Oceanic) Austronesian languages:
Prosody of Austronesian languages
I have recently co-authored two handbook chapters with Nikolaus Himmelmann that provide an overview of Austronesian suprasegmental phonology and prosody.
As discussed in the above papers, a growing body of research takes a skeptical approach to claims of word stress in Indonesian languages, arguing that varieties of Indonesian/Malay are indeed stressless. I am interested in using novel types of evidence to approach the question of stresslessness. For instance, how do rappers align syllables to strong beats in stressless languages? How are manual gestures coordinated to syllables in such languages? In the following presentation, I look at beat alignment in Tagalog and Javanese rap, which I argue reveals a major distinction between the two systems.
In the following papers, we look at gesture alignment in distinct varieties of Indonesian, including a western dialect and an eastern dialect. We find that the western dialect conforms well with descriptions of Indonesian as a stressless phrase-based prominence language but that the eastern variety shows clear signs both in prosody and gesture alignment of a penultimate stress pattern, in accordance with Kaufman & Himmelmann’s 2024 areal typology.
Phonology-Morphology interface
Tagalog infixation provided early Optimality Theory one of its key victories. But the plot thickens when we look at cases where infixation is not predicted by the shape of the affix. In this paper, I consider mitigating paradigmatic factors which appear to prevent infixation of V-initial affixes in Austronesian languages.
Another highly analyzed piece of Austronesian morphophonology is Nasal Substitution, shorthand for the set of phonological interactions triggered by reflexes of the Proto-Malayo-Polynesian prefix *maŋ-, which include coalescence (nasal substitution), among a number of other outcomes detailed by Blust (2004). In this work, I look broadly at the cross-linguistic variation in the phonology of *maŋ- from the perspective of contrast preservation, positing that each language sets a limit on the number of mergers this prefix can induce.
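As a rough illustration of the coalescence pattern and of what a "merger" means here, the following Python sketch applies a simplified, Indonesian-like version of the prefix (written me- plus a nasal) to stems and then lists which stem-initial segments collapse into the same surface form. The segment classes, the function names `attach_nasal_prefix` and `find_mergers`, and the test frame `ata` are assumptions made for illustration only; they are not the analysis in the paper, and individual languages differ in exactly which outcomes they allow.

```python
from collections import defaultdict

# Simplified, Indonesian-like outcomes for a nasal-final prefix (illustrative assumption).
COALESCE = {"p": "m", "t": "n", "k": "ŋ", "s": "ɲ"}  # voiceless obstruent replaced by a nasal
ASSIMILATE = {"b": "m", "d": "n", "g": "ŋ"}          # nasal assimilates, obstruent is kept
SONORANTS = set("lrmnwy")                            # prefix nasal is dropped before a sonorant
VOWELS = set("aeiou")                                # velar nasal surfaces before a vowel


def attach_nasal_prefix(stem: str, prefix: str = "me") -> str:
    """Return the prefixed form of `stem` under the simplified rules above."""
    first = stem[0]
    if first in COALESCE:
        return prefix + COALESCE[first] + stem[1:]
    if first in ASSIMILATE:
        return prefix + ASSIMILATE[first] + stem
    if first in VOWELS:
        return prefix + "ŋ" + stem
    if first in SONORANTS:
        return prefix + stem
    raise ValueError(f"stem-initial segment not covered by this sketch: {first!r}")


def find_mergers(initials: list[str], frame: str = "ata") -> dict[str, list[str]]:
    """Group stem-initial segments whose prefixed forms come out identical.

    Each initial is attached to the same hypothetical frame, so any two
    initials sharing a surface form represent a contrast lost under prefixation.
    An empty string stands for a vowel-initial stem.
    """
    groups = defaultdict(list)
    for seg in initials:
        groups[attach_nasal_prefix(seg + frame)].append(seg or "∅")
    return {surface: segs for surface, segs in groups.items() if len(segs) > 1}


if __name__ == "__main__":
    for stem in ["pukul", "tulis", "kirim", "baca", "ambil", "lihat"]:
        print(stem, "->", attach_nasal_prefix(stem))
    print(find_mergers(["p", "m", "t", "n", "k", "", "b"]))
```

In this toy system, coalescence makes p-initial stems indistinguishable from m-initial ones, t-initial from n-initial, and k-initial from vowel-initial, while stems in b keep their contrast because the obstruent is retained. Counting such collapsed pairs is one simple way to picture a language-specific ceiling on mergers.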
Clitic syntax
Austronesian clitics pose major problems for (what I would call) “syntactic imperialism”, the idea that all ordering of function morphology should be handled by the same basic principles that compose phrases and clauses. The following handout is a whirlwind tour through some of the interesting issues that Austronesian clitics present us with.
My dissertation offers a more in-depth look at some of these issues. I propose a way to derive the variation in clitic typology without overgenerating unattested clitic types. Clitics are of two types: bona fide syntactic terminals and realizations of features adjoined to phrase edges. The latter can be displaced by phonology but in such cases must be parsed phonologically with preceding material. The former type are in a head-complement relation with their host and either parsed phonologically with their host or with preceding material. Looking specifically at Tagalog, I examine the interplay between syntax and phonology going beyond the cases of impenetrable phrases examined by Anderson (2005).
A chapter that never made it into my dissertation looks at how different types of clitics are integrated into the prosodic hierarchy in Tagalog:
The following paper looks broadly at the factors involved in clitic ordering and positioning across Austronesian languages:
The following are two talks that emerged from my dissertation work:
Maranao, a language of Mindanao in the southern Philippines, orders its pronominal clitics according to a strict person-based hierarchy. In the following paper, I look at constraints on their ordering and combinatorics and at how the clitic cluster is positioned within the clause and determiner phrase.
The South Sulawesi languages all show verb-adjacent proclitics indexing the ergative argument and second-position clitics doubling the absolutive argument. In the following, I explore some fine grained variation within the subgroup and the details which differentiate second position in South Sulawesi from its counterpart in Philippine languages.
Historically, Malay was caught between South Sulawesi style incorporation of genitive clitics as ergative agreement (found more widely throughout languages of Indonesia) and the mainland Southeast/East Asian phenomenon of pronoun avoidance. As a result, entire titles and descriptive NPs could be incorporated into the verbal slot that normally hosted pronominal proclitics. This is especially useful for exploring the syntax of full noun phrases with a first or second person interpretation (termed “imposters” by Collins and Postal 2014), so that’s what I do in the following paper. But along the way, I propose a development from Classical Malay to modern Indonesian varieties that can account for person-based variation in proclisis across both Sulawesi and Sumatra.
Other Austronesian morphosyntax
The following talk examined variation in voice across languages of the Philippines and Sulawesi from the perspective of relation marking and transitivity, each of which is taken to underlie a different theoretical approach, namely, the case-agreement and ergative analyses.
The following chapters give a relatively detailed overview of the phonology, morphology and syntax of the languages of the central and southern Philippines, and the Sama-Bajaw languages.
What can sluicing (e.g. “They ate but I don’t know what”) tell us about transitivity? In this presentation, Ileana Paul and I show that actor voice clauses in Tagalog, as opposed to non-actor voice clauses, do not license object “sprouting”. This suggests reduced transitivity of the actor voice, as posited by ergative analyses (De Guzman 1988, Aldridge 2005 inter alia).
I discuss locatives and directionals in Mamuju, a South Sulawesi language, and posit two distinct structures for verb-derived deictics and noun-derived deictics. I show that the former type gives rise to a Pied-Piping with Inversion pattern that has not yet been noted in any language of the region besides Sasak (Austin 2006).
Adverbs are a key testing ground for general principles of structure building in syntax. In particular, they force the question of whether all syntactic elements must be generated in dedicated projections, as in Cinque (1999), or whether certain elements can be freely adjoined with interpretive principles acting as a filter, as in Ernst (2002). I show here that the Tagalog evidence, when seen in its entirety, falls strongly on Ernst’s side. Concentric scope phenomena, in particular, find no natural explanation on the Cinquean view.
Wakhi
I have worked on Wakhi on and off for some 10 years now with a small number of speakers in New York City. Wakhi is a Pamiri language spoken in the area where Pakistan, Afghanistan, Tajikistan and China all come together. For a language with roughly 60,000 speakers, it is very transnational. Wakhi has a highly unusual case marking pattern: plain old nominative-accusative in the present tense but “double oblique” in the past, in which both arguments of a transitive clause are marked with the same case used on present tense objects. The “double oblique” pattern can be seen as an alignment mermaid: ergative from the waist up and accusative from the waist down. In the following talk, I explore whether there are any discernible syntactic differences between the two arguments of past tense clauses versus present tense clauses. Not only are there no differences but it is hard to find many reliable structural diagnostics that differentiate the two arguments in the first place.
In the following talk, on the same topic, I attempt an analysis of Wakhi’s double oblique pattern using ranked case assignment strategies. This allows for a simpler analysis that can handle typological variation across Wakhi dialects and other Iranic languages.
Here, we examine Wakhi from a sociolinguistic perspective as a “diaspora language”. We describe the transnational nature of the Wakhi community and its effects on their language and language attitudes.
Garifuna
Garifuna, an Arawakan language spoken by a largely African descendant population on the Caribbean coast of Central America, displays an extremely rare word order: Verb Aux Subj Obj. Specifically, a postverbal position for auxiliaries in a verb initial language is otherwise unattested. I show that, while the auxiliary must be an independent morphological word based both on morphological and phonological criteria, it is embedded in a larger verbal complex. If we derive the verb complex through head movement, we predict that Garifuna auxiliaries are more bound to the verb than pre-verbal auxiliaries in Aux-Verb languages. I show a number of syntactic diagnostics that bear this prediction out. Thus, while Garifuna breaks Greenberg’s 16th Universal on the surface, the universal still holds true at an underlying level.
With the help of the Endangered Language Documentation Programme at SOAS, I have been involved in the documentation of two vocal genres of Garifuna song, arumahani and abeimahani. I undertook fieldwork in Belize together with James Lovell to record some of the best practitioners of this song and interviewed singers and others about the significance of the songs. We then interviewed members of New York’s Garifuna community about the transmission of Garifuna songs, language and spirituality in the diaspora. The project is described in the following slides.
The archival deposit can be found here:
Computational tools (in collaboration with Raphael Finkel)
Supported by an NSF grant for the DEL program, I have been working with Raphael Finkel (Dept. of Computer Science, University of Kentucky) to create Kratylos, an online platform for sharing, browsing and querying interlinear glossed texts and time-aligned annotations. The two papers below discuss the program’s abilities and goals in the current landscape of corpus tools for language documentation.
The Phonomaton allows users to create serial derivations using a full set of phonological features (those of Hayes 2009) and can also calculate distinctive features across a selected inventory. We have not yet written about it but you can try it out here: