What is a language family?
Most languages belong to language families. A language family is a group of related languages that developed from a common historical ancestor, referred to as a protolanguage (proto- comes from the Greek for ‘first’). The ancestral language is usually not known directly, but many of its features can be reconstructed by applying the comparative method, which can also demonstrate that a group of languages forms a family. Sometimes a protolanguage can be identified with a historically known language. Thus, provincial dialects of Vulgar Latin are known to have given rise to the modern Romance languages, so Proto-Romance is more or less identical to Latin. Similarly, Old Norse was the ancestor of Norwegian, Swedish, Danish, and Icelandic. Sanskrit was the protolanguage of many of the languages of the Indian subcontinent, such as Bengali, Hindi, Marathi, and Urdu. Further back in time, all these ancestral languages descended, in turn, from one common ancestor. We call this ancestor Proto-Indo-European. Language families can be subdivided into smaller units called branches. For instance, the Indo-European family has several branches, among them Germanic, Romance, and Slavic.
Structure of a family
Language families can be divided into smaller phylogenetic units, conventionally referred to as branches because the history of a language family is often represented as a tree diagram. For example, Celtic, Germanic, Slavic, Italic, and Indo-Iranian are branches of the larger Indo-European language family. A family is a monophyletic unit: all its members derive from a common ancestor, and all attested descendants of that ancestor are included in the family. Some taxonomists restrict the term family to a certain level, but there is little consensus on how to do so. Those who apply such labels also subdivide branches into groups, and groups into complexes. A top-level (i.e., the largest) family is often called a phylum or stock. The closer two branches are to each other in the tree, the more closely related their languages are. For instance, if a branch lies four branchings below a proto-language and has a sister language at that same level, then the two sister languages are more closely related to each other than either is to that distant ancestral proto-language. The term macrofamily or superfamily is sometimes applied to proposed groupings of language families whose status as phylogenetic units is generally considered to be unsubstantiated by accepted historical linguistic methods.
The linguistic tree and the genetic tree of human ancestry show a remarkably similar pattern, and this similarity has been verified statistically. Languages interpreted in terms of the putative phylogenetic tree of human languages are transmitted to a great extent vertically (by ancestry) as opposed to horizontally (by spatial diffusion).
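The tree picture can be made concrete with a small sketch. The toy tree below is a deliberately simplified assumption (it lists only a few languages and attaches Germanic, Romance, and Slavic directly to Indo-European), and the names `PARENT`, `lineage`, and `closest_common_ancestor` are invented for this illustration. It shows how relatedness can be read off the tree as the lowest node that two languages share.

```python
# Hypothetical, heavily simplified family tree: each node maps to its parent.
PARENT = {
    "Indo-European": None,
    "Germanic": "Indo-European",
    "Romance": "Indo-European",
    "Slavic": "Indo-European",
    "English": "Germanic",
    "Swedish": "Germanic",
    "Spanish": "Romance",
    "Italian": "Romance",
    "Russian": "Slavic",
}


def lineage(node):
    """Return the path from a node up to the root, starting with the node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def closest_common_ancestor(a, b):
    """Return the lowest node in the tree from which both a and b descend."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("the two nodes share no ancestor in this tree")


print(closest_common_ancestor("Spanish", "Italian"))  # Romance
print(closest_common_ancestor("Spanish", "English"))  # Indo-European
```

In this sketch Spanish and Italian meet at the Romance node, while Spanish and English meet only at the root, which mirrors the statement that languages on nearby branches are more closely related.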
How do linguists establish relationships among languages?
Sometimes it is relatively easy to establish relationships among languages. Let us look at the Romance languages. We know that Italian is a descendant of Latin, a language that was spoken in Italy two thousand years ago, and one which left a great number of written documents. The Roman conquest helped spread Latin throughout Europe where it eventually developed into regional dialects. When the Roman Empire broke up, these regional dialects evolved into the modern Romance languages that we know today: French, Italian, Portuguese, Spanish, and others. These languages form the Romance branch of the Indo-European language family.
What if the ancestral language left no records?
The case of the Romance languages is unusually easy because their common ancestor, Latin, left many written documents. In most cases, however, the ancestral language was not written. As a result, linguists look for systematic similarities among its modern descendants to establish common origins. Consider Latvian, Albanian, and Basque. Where do these languages belong? As it turns out, Latvian belongs to the Baltic branch of the Indo-European language family. Albanian has no close relatives: it does not belong to any of the other branches of the Indo-European family but forms a branch of its own. Basque does not belong to any language family at all. It is a language isolate, i.e., a language that cannot be reliably assigned to any established language family.
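As a rough illustration of what looking for similarities can mean, the sketch below compares the word for ‘three’ in several languages with the Latin form, using plain edit distance. This is a crude surface heuristic swapped in for illustration, not the comparative method, which depends on regular sound correspondences across many words. The word list and the function name `edit_distance` are assumptions made for this example, and a single word can never establish a relationship on its own.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Illustrative word list for 'three' (an assumption of this sketch).
LATIN = "tres"
WORDS = {
    "Spanish": "tres",
    "Italian": "tre",
    "Latvian": "trīs",
    "Albanian": "tre",
    "Basque": "hiru",
}

# Print the languages ordered from most to least similar to the Latin form.
for language, word in sorted(WORDS.items(),
                             key=lambda item: edit_distance(LATIN, item[1])):
    print(language, word, edit_distance(LATIN, word))
```

On this one word, the Basque form comes out as the most distant from Latin, which is consistent with Basque standing outside the Indo-European family, though only systematic comparison over a large vocabulary would count as evidence.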
What if there are no records, and we know little about the languages?
In many parts of the world, there are no written records, and we don’t know enough about the languages themselves. Consequently, we have to resort to grouping languages on the basis of geography. This is the case with many of the aboriginal languages of Australia, the native Indian languages of the Americas, the tribal languages of Africa, and countless other languages all over the world.
How many language families are there?
According to Ethnologue (16th edition), there are 147 language families in the world. This figure may not be precise because of our limited knowledge about many of the languages spoken in the most linguistically diverse areas of the world such as Africa. The actual number of families, once these languages are studied and relationships among them are established, will undoubtedly keep changing.
World’s largest language families
The largest language families (those with over 25 languages) are: Niger-Congo, Austronesian, Trans New Guinea, Sino-Tibetan, Indo-European, Afro-Asiatic, Australian, Nilo-Saharan, Oto-Manguean, Austro-Asiatic, Tai-Kadai, Dravidian, Creole, Tupian, Language Isolates, Mayan, Altaic, Uto-Aztecan, Arawakan, Torricelli, Sepik, Quechuan, Na-Dene, Algic, Hmong-Mien, Uralic, North Caucasian, Penutian, Macro-Ge, Ramu-Lower Sepik, Carib, Panoan, Khoisan, Salishan, Tucanoan. There are 6,523 languages in this group, and together they account for close to 95 percent of the world’s languages (assuming that there are some 6,900 languages in the world). The remaining families account for only about 5 percent of the world’s languages. In addition, there are 53 languages considered unclassified.
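The percentages quoted above can be checked with a few lines of arithmetic. The sketch uses only the two figures given in the text (6,523 languages in the largest families and an assumed world total of about 6,900). The variable names are invented for the example.

```python
languages_in_largest_families = 6523  # figure quoted in the text
estimated_world_total = 6900          # "some 6,900 languages" (an estimate)

# Share of the world's languages that belong to the largest families.
share = languages_in_largest_families / estimated_world_total

# Languages left over once the largest families are subtracted.
remainder = estimated_world_total - languages_in_largest_families

print(f"{share:.1%}")                              # 94.5%
print(remainder)                                   # 377
print(f"{remainder / estimated_world_total:.1%}")  # 5.5%
```

Under these figures the largest families cover about 94.5 percent of the total, which matches the "close to 95 percent" in the text, and the remaining 377 or so languages make up roughly 5.5 percent.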