For a language family mostly made up of minority languages, Uralic is really quite well documented by any standard. Most of the smaller languages received decent descriptions already in the 19th century, and many also got theoretically updated treatments later on in the 20th and 21st. The big exception had long been the Samoyedic languages, with the literature being mainly dictionaries and comparative studies, and only Tundra Nenets being described in good detail by multiple scholars. By now, however, even the Samoyedic situation has improved. Within the last few decades, Nganasan has received reference grammars from both Wagner-Nagy (2002) and Katschmann (2008), Forest Nenets from Pusztay (1984), Forest Enets from both Künnap (1999a) and Siegl (2013), Kamassian from Künnap (1999b), and Mator from Helimski (1997) (and that’s all before I have looked much into the literature in Russian). For Selkup I’m not sure of a source worth singling out, but there’s a fair amount of scattered literature. Tundra Enets is the only well-delineated variety I do not know to have been specifically covered by anyone, though it’s also very close to Forest Enets anyway (treating them as two entirely different languages has not struck me as warranted).
At this point it then seems that not only has descriptive work on Samoyedic caught up with comparative work; on a few fronts the former has passed the latter altogether. Janhunen’s classic Samojedischer Wortschatz from 1977 closely follows his 1976 reconstruction of the vowel system of Proto-Northern Samoyedic, and southern Samoyedic data in particular does not seem to be quite systematically integrated into the reconstruction. This task has later been accomplished only for Mator, in Helimski’s monograph. For Selkup, there have been numerous specialized studies, but AFAIK no substantial synthesis so far.
The situation is the most haphazard for Kamassian. There are overview notes in comparative Samoyedic works like Paasonen (1913–1917) and Mikola (2004), and also in Collinder’s even more generalist Comparative Grammar; as well as two monographs by Künnap on historical morphology. But that’s about it. To my knowledge, no-one has ever taken a good detailed look at the basic phonological and etymological issues of the language.
With today’s resources, this will not be an especially hard task. I already have my WIP database of proposed Proto-Samoyedic etymologies in decent shape. Extracting a list of etymologies extending to Kamass and/or Koibal takes 30 seconds; gathering the corresponding reflexes from SW a bit longer, maybe two days’ worth of work. Double-checking other sources would take longer though. This is especially due to inconsistent treatment of the most thorough records available, those of Kai Donner in the 1910s. Janhunen gives what I believe is Donner’s full transcription, but others seem to habitually drop “unnecessary” diacritics.
However, if we don’t have a solid grasp of even the basic phonological inventory of Kamassian, can we really be sure which diacritics are “unnecessary”? Overviews like Künnap’s grammar or the 1998 handbook on Uralic languages seem to present basically a bare minimum system with just the “base letters” extracted. Which is a good place to start from, but already a look at the other Samoyedic languages (e.g. Selkup with its giant vowel system, or Nenets with its extensive palatalization) suggests that more complexity could have plausibly existed.
I offer here, for starters, two suggestions for amending the synchronic phonology. A full survey of the Donner materials would be required really, but already the inherited Samoyedic component seems to allow tentative conclusions.
The uvular stop /q/ should probably be recognized for Kamassian (as also for Selkup). Donner transcribes [q] versus [k] (actually [qʰ], [kʰ] in most cases, but I will ignore aspiration) somewhat but not totally consistently depending on the following vowel:
- before a, å: 2×k, 20×q
- before e, ɛ: 5×k
- before ə: 3×k, 2×q
- before i: 1×q
- before o: 2×k, 9×q
- before u: 9×k, 6×q
- before ʉ: 5×k
So there is potentially decent evidence for a contrast /k/ : /q/ at least before u. Importantly, this seems to be etymological. ku- is mostly from *ku-, *ko- (3× *kå-), while qu- is mostly from *kå-, *kə- (1× *ko-).
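To make it easier to see where the two transcriptions actually compete, here is a minimal sketch that tabulates the counts listed above. The helper names (`q_share`, `minority_share`) are my own labels for illustration, and the "minority share" is just a crude heuristic: the closer it is to 50%, the less the choice of k versus q looks predictable from the following vowel alone.

```python
# Donner's word-initial k/q counts by following vowel, copied from the list above.
counts = {
    "a, å": {"k": 2, "q": 20},
    "e, ɛ": {"k": 5, "q": 0},
    "ə": {"k": 3, "q": 2},
    "i": {"k": 0, "q": 1},
    "o": {"k": 2, "q": 9},
    "u": {"k": 9, "q": 6},
    "ʉ": {"k": 5, "q": 0},
}


def q_share(env):
    """Proportion of q among all k/q tokens before the given vowel(s)."""
    c = counts[env]
    return c["q"] / (c["k"] + c["q"])


def minority_share(env):
    """Proportion of the rarer variant; 0 means fully predictable."""
    share = q_share(env)
    return min(share, 1 - share)


# Environments ordered from most to least mixed.
ranked = sorted(counts, key=minority_share, reverse=True)

for env in ranked:
    c = counts[env]
    print(f"before {env}: k={c['k']}, q={c['q']}, "
          f"q share {q_share(env):.0%}, minority share {minority_share(env):.0%}")
```

On these figures the most mixed environments are u and ə, which is why those two get a closer look here. With token counts this small the ranking is only suggestive.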
Before ə the situation also looks promising at first, but may not stand up to scrutiny. The distribution is again etymological: the three cases of kə- are from *kë-, *kï-, *ku-, while the two cases of qə- are from *kə-. However, two of the three former cases are actually transcribed with ə̣ (= turned ė; there is no ‹ə› as a base symbol in UPA), which seems to be better considered an allophone of /e/ or perhaps /i/. This vowel mostly continues earlier PSmy front vowels *i, *ɛ, *ə̈; it takes front vowel harmony (ə̣d-ľɛm ‘I am visible’, mə̣-ľɛm ‘I sell’, sə̣βəʔ-ľɛm ‘I take out’), varies with e in Donner’s transcription (bej-ľɛm ~ bə̣j-ľɛm ‘I go over’), often corresponds to Castrén’s i (C mija ~ D mə̣j`ɛ ‘earth’) and can be sometimes seemingly triggered by a present or lost palatal consonant (pʰə̣ŕåŋ ‘drill’ < *pərəjəŋ, nə̣mi < *jumpə ‘moss’). In other cases there’s instead back ə̑, which mainly continues *ə, and might be even analyzable as an allophone of /a/ (with the result that there would be no ˣ/ə/ in Kamassian after all). If so though, then the apparent kə | qə contrast could be mostly rendered as a contrast between /Ke/ and /Ka/, instead of /k/ versus /q/. (There still remains a near-minimal pair, though: kə̑m ~ kɛm ‘blood’ < *këm | qə̑ʔ ‘pus’ < *kət.)
Word-medial cases of Proto-Samoyedic *k are not common, but where not palatalized, they also seem to show a somewhat Turkic-style split between velar g (nāgur ‘3’ < *nakur) and uvular ʁ (ťaʁa ~ ďåʁå ‘river’ < *jəkå), which perhaps remains analyzable as allophonic.
I also believe vowel length was phonemic in Kamassian. Donner and Castrén are not completely consistent with each other on this, but there are several indications that length is nevertheless neither random nor context-dependent.
As far as basic statistics go, the different Proto-Samoyedic vowels are split in two clear groups: the open vowels *ä *å and especially *a, and mid *o often yield long vowels, while the close vowels *i *ü *ï *u, reduced *ə and mid *ë only rarely do. Ignoring quality changes and counting half-long vowels (à ù etc.) as short for now, the quantity reflexes are as follows:
- *a > short ×24, long ×26 (52%)
- *ä > short ×47, long ×23 (33%)
- *å > short ×47, long ×16 (25%)
- *o > short ×19, long ×7 (27%)
- *ë > short ×25, long ×1 (4%)
- *u > short ×26, long ×5 (16%)
- *ï > short ×10, long ×2 (17%)
- *ü > short ×18, long ×0 (0%)
- *i > short ×51, long ×2 (4%)
- *ə > short ×68, long ×6 (8%)
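The long-vowel shares can be recomputed directly from the raw counts, and pooling the vowels by height makes the two-way split easier to see. This is only a sketch: the group labels (`open`, `mid`, `close`, `reduced`) are my own shorthand for the classes named above, and the pooled figures are not weighted or tested for significance.

```python
# (short, long) reflex counts per Proto-Samoyedic vowel, copied from the list above.
reflexes = {
    "a": (24, 26), "ä": (47, 23), "å": (47, 16), "o": (19, 7),
    "ë": (25, 1), "u": (26, 5), "ï": (10, 2), "ü": (18, 0),
    "i": (51, 2), "ə": (68, 6),
}

# Height classes as described in the text; the labels are illustrative.
groups = {
    "open": ["a", "ä", "å"],
    "mid": ["o", "ë"],
    "close": ["i", "ü", "ï", "u"],
    "reduced": ["ə"],
}


def long_share(vowel):
    """Share of long reflexes for a single proto-vowel."""
    short, long_ = reflexes[vowel]
    return long_ / (short + long_)


def pooled_share(vowels):
    """Share of long reflexes over all tokens of several proto-vowels."""
    total_short = sum(reflexes[v][0] for v in vowels)
    total_long = sum(reflexes[v][1] for v in vowels)
    return total_long / (total_short + total_long)


for vowel in reflexes:
    print(f"*{vowel}: {long_share(vowel):.0%} long")

for name, vowels in groups.items():
    print(f"{name}: {pooled_share(vowels):.0%} long")
```

Pooling hides the fact that *o and *ë behave very differently within the mid class, so the per-vowel figures remain the ones to trust.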
Of course open vowels normally tend to be longer than close ones, so this kind of a chart is to be expected. However, the qualitative changes in the vowel system mean that at minimum ā, å̄ < *a are effectively in contrast with a, å < *ə. At least ō and ē seem to be also phonemic. Minimal or near-minimal pairs can be found (å replaced by a for clarity):
- tar̀ ‘hair’ < *tə̈r | tār ‘gills’ < *čar
- qăn̆-ńȧm ‘I freeze’ < *kəntɜ- | māⁿ-ńim ‘I measure’ < *mančə-
- qăzɪl̀ ‘wart’ < *kəsər | qāzə̑ra ‘nutcracker’ < *kasɜra
- kora ‘reindeer bull’ < *korå | kōla ‘fish’ < *kålä
- le ‘bone’ < *lë | ďē ‘heel’ < *jä, nē ‘woman’ < *nä
- pel-ľɛm ‘I put’ < *pën- | sēlə-ľɛm ‘I sharpen’ < *sälä-
The analysis can be improved by noting also some conditional developments leading to the “wrong” vowel length. For one, before consonant clusters or stem-final obstruents, almost no long vowels occur. For two, Donner’s records have also words with stressed or long vowels in the 2nd syllable; these, too, never seem to have long vowels in the 1st syllable. A near-minimal pair of this kind is toli· ‘thief’ < (? *tōlī <) *tåläjə, vs. tōlu ‘darkness’ < *tålwə. For three, long vowels from *close vowels, *ə or *o often seem to be the result of vowel contractions, and after taking this into account, *o patterns together with the other mid vowels after all:
- *i: pīdi ‘thumb’ < *pij-
- *u: ďēdər-ľɛm ~ ťʉ̄dərə-ľɛm ‘I dream’ < *jujtə-, ńī ‘child’ < *ńuə(j), šʉ̄ ‘fire’ < *tuj
- *ə: bʉ̄źɛ ‘husband’ < *wəjs (surely rather *wəjsä?), ťīma ‘tail’ < *təjwå; also, from Castrén: khâŋ ‘thunder’ < *kəjŋ, kê ‘moose’ < *kəå
- *o: mō ~ mū ‘branch’ < *moə, qōriʔ ‘container’ < *koər, šō-ľȧm ‘I come’ < *toj-, šōmi ‘larch’ < *tojmå
An additional indirect line of evidence for phonemic long vowels is the transcription of consonant length. Donner transcribes medial and final single consonants as half-long in some cases (d̀ j̀ ǹ r̀ etc.). At least in CVC words this seems to depend on vowel length: word-final consonants after short vowels are fairly consistently transcribed as half-long, word-final consonants after long vowels consistently as short (though only r occurs more than once: bōr ‘ridge’, ńēr ‘point’, tār ‘gills’, ťēr ‘center’).
I also notice a tendency that long vowels in words of the shape CVCV seem to occur most often before close or reduced vowels in the second syllable, not so often before mid and open vowels (a pattern closely resembling Selkup; cf. Helimski (2007), as cited in footnote 2). But this issue should wait for a detailed survey. Vowel lengthening has probably taken place more than once and probably with variable exact conditioning depending on vowel quality.
One further issue to investigate is one I already touched on above: the front unrounded vowels. Donner’s transcription distinguishes no fewer than five heights i ė e ɛ ȧ and several reduced counterparts ɪ ə̣ ə ə for the first three, which obviously should be phonologized as something simpler (among back rounded vowels he only has u o å). But offhand I am not sure how the different heights should be delineated. Etymologically Donner’s e ɛ are mostly from PSmy *ä, while ė ə̣ are mostly from *i *ə̈ *ə. This may suggest that ė ə̣ should be counted as allophones of /i/ and not /e/. Some apparent front vowels could also be fronted allophones of i̮ e̮, or perhaps the situation is the opposite, and these back illabial vowels are, despite always continuing PSmy *ï *ë, actually synchronically just backed allophones of the front vowels (similar to the situation in Nenets). Just the native Samoyedic part of the vocabulary is probably insufficient to work out a solution for this though, since this seems to be mostly an issue of free variation and not conditional allophony. Probably the best line of evidence would instead be the degree of variation within Kamassian, e.g. as noted by Donner from his different informants, or between root words and their derivatives and compounds, or between Donner’s and Castrén’s records.
I would have observations on historical phonology as well, but those shall be left for another time.
- Helimski 1976, “О соответствиях уральских a- и e-основ в тазовском диалекте селькупского языка”, in Советское финноугроведение (SFU) 12: 113–132.
- Katz 1979, “Beitrag zur Lösung des Problems der Entwicklung von ursam. *j im Selkupischen und der hiemit zusammenhängenden Fragen der historischen Morphologie dieser Sprache und des Uralischen”, in SFU 13: 168–176.
- Mikola 1981, Adalékok a szelkup vokalizmus történetéhez. Nyelvészeti dolgozatok 193.
- Terentjev 1982, “К вопросу о реконструкции прасамодийского языка”, in SFU 18: 189–193.
- Helimski 2007, “Продление гласных перед шва в селькупском языке как фонетический закон”, in Linguistica Uralica 43/2: 124–133.
- Gusev 2012, “О возможных источниках селькупского сочетания -lć-: ПС *jw, *jk, *jm”, in SUST 264 (Festschrift Janhunen): 77–81.