(FYI: 3.9K words, a long and technical article)
Tying The World In 1 Thread.
Manmohan Dash,
A language-research-based article, originally written 08-October-2012 and updated 17-01-2014.
A technical article envisioning one modern language system for the world. The ideas are based on intense language research performed mainly over the last 2-3 years, and relate to a bagful of other such articles on this website; the research is an ongoing activity, moving through the various by-lanes of its own logic.
For a pleasurable wonderment, in seeing what awaits such a playful sentence on Google Translate, I constructed the following sentence in the Japanese alphabet of hiragana+katakana. This alphabet comes with a very succinctly defined set of rules, called Hepburn, for reading native or non-natively-inherited Japanese phonetics, in predefined syllable units (aka the alphabet), into Roman. So (1) gives the hiragana and katakana and (2) gives the Hepburn. I underlined the parts in (1) and (2) so that a voice rendering preserves the sanctity of the output in the targeted language of Odia (lit. Oriya).
(1) “むん ごて 「ダエト コク」 ぴいばく ちゃふんち”
(2) mu’n go te “da e to coku” pi i ba ku cha hun chi
Now, notice the length of the above two sentences in hiragana (1) and Roman (2).
Put the sentence in (1) into the Google Translate page and hear the voice.
The Japanese lady will speak Odia so close to accurate that it will blow your mind (that is, assuming you know the target language: here, Odia).
In other words, if you write things correctly, even in Japanese hiragana or katakana, someone in Odisha will understand perfectly. If such a facility were designed, it could be installed at airports, and bam, she will be understood. This also goes for the other targeted languages of the Indian system: Hindi, Bengali, or say Kannada, etc.
Note in the above that the hiragana-Roman mapping as shown (eg ちゃ ふん ち <=> cha + hun + chi) is a very well-defined attribute of modern language. This attribute is described in a few other articles on this website. If you are interested right away, look for the articles with descriptions of consonants and vowels.
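As a minimal sketch, this kind of kana-to-Hepburn mapping can be modeled as a lookup table that is scanned longest-unit-first. The table below is a small illustrative subset covering only the sentence above, not a complete Hepburn chart:

```python
# Toy subset of a hiragana-to-Hepburn lookup table (illustrative only;
# "hu" follows this article's transcription rather than the usual "fu").
KANA_TO_HEPBURN = {
    "む": "mu", "ん": "n", "ご": "go", "て": "te",
    "ぴ": "pi", "い": "i", "ば": "ba", "く": "ku",
    "ちゃ": "cha", "ふ": "hu", "ち": "chi",
}

def to_hepburn(kana: str) -> str:
    """Convert a hiragana string to Hepburn, longest unit first."""
    out, i = [], 0
    while i < len(kana):
        # try the two-character (conjugated) unit first, e.g. ちゃ -> cha
        if kana[i:i + 2] in KANA_TO_HEPBURN:
            out.append(KANA_TO_HEPBURN[kana[i:i + 2]])
            i += 2
        elif kana[i] in KANA_TO_HEPBURN:
            out.append(KANA_TO_HEPBURN[kana[i]])
            i += 1
        else:
            raise ValueError(f"no mapping for {kana[i]!r}")
    return "".join(out)

print(to_hepburn("ちゃふんち"))  # -> chahunchi
```

Because every kana unit has exactly one Roman form, the conversion is unambiguous, which is the whole point of a well-defined mapping.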
eg vowels can occur at 3 levels: we would say levels [0, 1, 2] or [1, 2, 3].
Let's prefer the naming [0, 1, 2]. See that if a computer omits the brackets and commas and reads the numbers together, life is easier if we stick with the 012 label, because 12 is a smaller number than 123; hence the unmixing performed on the sample will be far less complicated than the unmixing of 123. Just a musing.
That means a consonant k is combined with a vowel v in 3 different ways: k, k.v or k.v.v, so to say. And be cautious that v is a variable notation here for the actual vowels: a, e, i, o, u.
In the first instance, k is multiplied by 1 (level 0 of v, where v = any of {a, e, i, o, u}).
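The three levels can be enumerated mechanically. A minimal sketch, assuming the simplified scheme of this article (level 0 = bare consonant, level 1 = consonant + vowel, level 2 = consonant + doubled vowel):

```python
# The three vowel levels described above, enumerated for one consonant.
VOWELS = ["a", "e", "i", "o", "u"]

def vowel_forms(consonant: str) -> list[str]:
    forms = [consonant]                               # level 0: rapid accent / halant
    forms += [consonant + v for v in VOWELS]          # level 1: short vowel
    forms += [consonant + v + v for v in VOWELS]      # level 2: long (doubled) vowel
    return forms

print(vowel_forms("k"))
# ['k', 'ka', 'ke', 'ki', 'ko', 'ku', 'kaa', 'kee', 'kii', 'koo', 'kuu']
```

So each consonant yields 1 + 5 + 5 = 11 candidate forms before any language-specific restrictions are applied.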
So, in hiragana there are only 5 vowels: a, e, i, o, u, as should be the case for any language. (This would be prescribed by anyone who understands what modern language theory is.)
In hiragana, e, i, o and u can go to the 2nd level (ee, ii, oo, uu), apart from being definable at level 1.
Note that vowel level 0 is never defined in the entirety of the Japanese language. This makes it very different from the Chinese language. This is also where Indian languages resemble Chinese rather than Japanese, although such mixing is to be avoided, or categorized. Also note that vowel level 0 is nothing but the halant: a rapid accentuation, eg k, t, p, where the vowel is omitted and the consonant is said rapidly; hence Indian words like nak = nose, dant = scolding, khap = a judiciary system. Such definitions of rapid accent or halant are totally scrapped from the Japanese language, except for the consonant n, which is the only one in the entire language that can be said rapidly.
My Chinese physics professor told me in 2001 that the Chinese language sounds different from other languages because it is a mono-syllable language. But by 2011 or so, I discovered that this is technically not true. The Japanese language in frequent cases, and Indian languages in their development, have monosyllables. The difference is this: the rapid accent or halant is frequently used in Chinese, totally absent from Japanese, and kept to the level of a few occurrences in Indian languages. The only other language nugget, one that my English linguistics professor told me, also turned out, another time, to be invalidated by my own analysis.
But a is always a, which is vowel level 2 in our preferred naming [0, 1, 2], since in hiragana levels 0 and 1 are not used for this vowel a. This should always be the case: choose either 0, 1 or 2, just any one of them. In other words, a automatically means aa. It is always aa, which in short-hand notation is a. So among k, ka and kaa, k and ka are not valid definitions in hiragana; what is valid is only kaa, which is always to be written in short-hand as ka. k = null (also the old ka = null, that notation now taken by kaa). So, ka = kaa.
Hiragana then chooses level 1 for all the other vowels: e, i, o and u. That means that while e, i etc are valid by themselves, so are ee, ii. (s+e = se and t+e = te are defined in Japanese, as are k+i = ki and n+i = ni. But chi and shi are not defined as ch+i and sh+i; they are simply base elements, that is, given conjugated phonetics. This is where Indian languages would diverge from Japanese: Japanese and Chinese have preserved their phonetics with utmost care for thousands of years, but Indian phonetics are mixed from worldly languages, so the element rendering of Indian languages will have some specialized features, although such has already been accomplished on this website.)
The doubling of vowels, or a specified sequence of vowels, is called a don (eg i-don, read idon, with i as the e in English; e-don, read edon, with e as in end; etc. don is ton, as on means voice or sound in Japanese). So itou = i+to+u is a u-don; note how vowel u is always made to follow vowel o. Whenever two vowels occur in the Japanese mapping to Roman (Hepburn), one will therefore see only this much rule: a = aa (therefore ka = kaa); e (therefore se, ke); ee (the long form of vowel e, as in seeiguchi, where the see part is to be said like "say", not "sea"); i (as in ki); ii (as in kii); o and oo; u and uu. Then a few more rules as to which heterogeneous double vowels are allowed, eg the u-don ou in itou. All other combinations are forbidden.
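These restrictions can be stated as a whitelist. A minimal sketch, assuming the simplified rule set of this article: a is never doubled (a already means aa), e/i/o/u may appear single or doubled, and ou is the one heterogeneous pair named here (the u-don of itou):

```python
# Whitelist of vowel runs allowed under this article's simplified rules.
ALLOWED_VOWEL_RUNS = {"a", "e", "i", "o", "u", "ee", "ii", "oo", "uu", "ou"}

def valid_vowel_run(run: str) -> bool:
    """True if the vowel sequence is one of the permitted forms."""
    return run in ALLOWED_VOWEL_RUNS

print(valid_vowel_run("ou"))  # True: the u-don, as in itou
print(valid_vowel_run("aa"))  # False: 'a' already means the long vowel
```

Any run outside the whitelist is simply forbidden, which is what keeps spurious formations out of the system.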
This softens the language, because it does not allow spurious formations. So there are 5 vowels and strict restrictions as to how they are to be combined with consonants and among themselves. It's like one-way traffic. In particle physics, soft means a low-energy attribute. So, by allowing another wing in a restrictive way, rather than random possibilities, one conserves the momentum in a particular way. If you have only 1 or 2 possibilities available in speaking a language, rather than 10, it sits totally convenient: you and the whole community speak comfortably and communicate efficiently. Modernity is all about comfort, efficiency and communication.
In Indian languages, on the other hand, any one of them has (at least) 15 vowels; in totality there are about 20 languages, so that's 300 vowels. Then we are blessed with an unimaginable variety of writing styles, eg one writes bindiya, another writes beendeeyaa.
btw, ee here would be an incorrect way; it should be i or ii, depending on how long one would like to say it: short = i or long = ii.
Ever saw someone writing beendeeya bhig gayi? Why not bheeg then? Why, in one sentence, two ways of writing: bindi and bheeg, or beendi and bhig? It should have been clear that it is inconsistent that way. Modern languages do not define two ways, and when they do, for any reason (eg India being a different reason), consider it a degeneracy and put it all into the simplest form.
eg k, ka, kaa are all the same, as has been explained above in the instance of the Japanese language. (In Indian languages this is not yet so, which is why the language expands and newer forms are created; not only birth control, we need language control.)
They are different only if you realize that k is not to be allowed; such is allowed only for the chosen consonant n in hiragana, and the same should be done for only n in Indian languages, since n is a special sound (rather, phonetic) which allows both nasal and non-nasal phonetics.
But Indian languages now define such zero-ness vowelation (read the explanation of halant or rapid accent above) for all consonants; that means 50 or however many consonants are all special. See that there are 21 singleton consonants: t, m, k, j etc. Then there are conjugated consonants, eg sh, nt etc. This is where languages like Tamil define far too many conjugated consonants and claim to be a kind of divine language. You can see that only a higher level of consonant conjugation has been defined, nothing too special; phonetics must be modernized. As far as I recall there are 300 or so conjugated phonetics in Tamil, far larger than in most other Indian languages. In the end we need 50 base phonetics. (Called syllables, or an alphabet; an alphabet is syllables, some of which are singleton and some conjugated.)
If all consonants are given the privilege of vowel-ness 0 (rapid accent), that would be like all members of parliament being prime ministers, because, see, MP is PM under parity, and (we) Indians don't care about physics.
So (morality without much fanfare) it should always be kaa = ka. Let's choose any one of them and represent it through a, because then we have, in addition, the doubling of vowels. I am talking here, in the Indian language context, of how we have created another vowel, a, which is the short vowel. Then the long vowel a, now written as aa, is also available; life just got complicated. Although that's the way it has been, let's recognize this language fact and remember there are 20 languages which do this. This is, as I have often pronounced, a sheer drainage of our own resources, eg in how we are going to modernize everything we create in India: education, software etc.
Also note that writing byeeeee doesn’t add any charm to what we are saying.
In English it's bye, and in any Asiatic language one has a strict definition of what the Roman (mapping, definition, transliteration) should be, which is why beendiya is wrong, and inconsistently wrong. To explain: we wrote bendiya here with an added stress of e to read it longer, as per the rule. We did not write bindiya or biindiya or bindiiya, which are the correct ones.
Also, bindiyaa is not correct, unless aa means that a is doubled. In that case, let's first make sure which a it is: a-short अ or a-long आ? Having both is a blunder. We have all 3: a-zero, a-short and a-long; e-zero, e-short and e-long; and so on. So as a result we have 50×15 instead of ~50×5. That's 500 more phonetics per language. Then we have 20 languages which are 1-1 correspondences of each other. So we have 10,000 unnecessary phonetics.
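The rough count above, written out as arithmetic (the figures 50, 5, 3 and 20 are the article's own estimates):

```python
# Order-of-magnitude count of surplus phonetics, per the estimates above.
consonants = 50          # base consonant units
vowels_per_level = 5     # a, e, i, o, u
levels = 3               # zero, short, long
languages = 20           # roughly 1-1 corresponding Indian languages

with_levels = consonants * vowels_per_level * levels   # 50 * 15 = 750
without_levels = consonants * vowels_per_level         # 50 * 5  = 250
extra_per_language = with_levels - without_levels      # 500

print(extra_per_language * languages)  # 10000 unnecessary phonetics
```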
Note that these are order-of-magnitude estimates; eg I said 500, but there are more than 50 consonants once conjugated phonetics are counted, and so on. A strict number can be calculated, though, and the research on this website in the last 2-3 years has made that totally possible, one of the achievements of this language research.
Think of this as buying 10,000 computers instead of 1. Because ultimately computing-power needs will be augmented in the exponents, and having made blunders upon blunders has only created massive sociological-economical-computational disasters. How about a Nehru Institute for Computational Disaster Mitigation? And we have been doing that for 100 years. So why is India poor? We purchase more computers than we need, because we don't use the ones that we have efficiently.
In addition, we map foreign languages into this scheme. There is degeneracy between the spoken and written forms of our languages, eg Hindi and Rajasthani? Bengali and Odia? If "they" speak a word differently, that does not mean they should be called different languages. It's such a wastage. What one can do is have one formal word but different ways of vowelation, as is prevalent, so the computer reads it as only one word, by some algorithm that is implemented each time a formulaic unity is found. eg Pronobo Dada and Pranaba Dada are a formulaic unity. One is used, eg, in Bengali and the other in another language, say Hindi or Odia, but since they refer to the same adjective, noun etc, the computer can be trained to recognize them as one. Gandhi (father of the nation) Giri and Gandi (dirty) Giri are not a formulaic unity.
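One crude way a machine could detect such a formulaic unity: two spellings that differ only in vowelation share the same consonant skeleton. This is an illustrative heuristic of my own construction, not the author's exact algorithm:

```python
# Heuristic sketch: words differing only in vowelation share a
# consonant skeleton, so treat them as one formal word.
def skeleton(word: str) -> str:
    """Strip the vowels, keeping only the consonant skeleton."""
    return "".join(c for c in word.lower() if c not in "aeiou")

def formulaic_unity(w1: str, w2: str) -> bool:
    return skeleton(w1) == skeleton(w2)

print(formulaic_unity("Pronobo", "Pranaba"))  # True: both reduce to 'prnb'
print(formulaic_unity("Gandhi", "Gandi"))     # False: 'gndh' vs 'gnd'
```

The two examples reproduce the article's own test cases: Pronobo/Pranaba unify, Gandhi/Gandi do not, because the extra h changes the consonant content, not just the vowelation.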
Writing a word like bekar is wrong; all consonants must be followed by a vowel, except n. The Japanese language would write it as bekaru, bekara etc. As I have already mentioned, except for n, no other consonant or conjugated consonant has the right to sound too special. That's because only God can appear through such schemes, but it has no factual or real value. (It's merely semantic, in the sense that I am saying God; whether we write bekara or bekar makes no special difference, and modern language is the space-bar.) And if you believe in God, you really don't need to speak any language. Cover your mouth with cotton and chant space-bar.
This scrapping of consonants' divine privileges has made Japanese one of the most advanced languages today. All Indian languages could have been put into one language if we had worked in that spirit over the last 100 years. We have made huge blunders, because this way one day there will be 300 languages and 7000 dialects, and all we will hear is how great our civilization has been. It has nothing to do with civilization, but with how machines have produced more and more language forms due to mathematical mixing.
Now, let's get back to where we started: how it will be useful to have a Unified Roman Mapping of all languages in the world. Because, as you see above, Roman says chahunchi (ちゃふんち). Odia or Hindi (Indian languages, in the list of prime languages of India), as per our transLIT rules, would split it as ch+a+hu+n+chi and what not, due to machine mixing.
Once that is so, an infinite number of other combinations are possible in principle, carried by machine or logic. This is the claim I have made: we have created newer language forms by simply allowing such mis-thinking, although that could have been prevented by succinctly defining all consonants and forbidding the computer to accept any other input. There do exist many rules of Indianic, although they are not quite there yet and not quite foolproof.
One can argue that here the consonant ch is followed by the vowel a. But first of all, ch is not a consonant. It's c.
Although one could argue it's a conjugated consonant, c and h, one must define conjugations with the vowel rules defined on the consonants; that is, cehu would be a conjugated consonant from the ch base, as would ciha. How to restrict ourselves to the minimal number of phonetics in a language, from a moderner's perspective, should be the goal.
c has a degeneracy: s, k and c. c is s-like in ceramic, c is k-like in Canada, and c is c-like as in Chinese; this last one has been said as ch, hence the origin of the ch consonant. There could be other routes to this definition; figure that out first, before allowing vagueness to rule you all the way to heaven. But in defining a mapping one can't simply take these degeneracies and put them into one place. I tried that at home, and guess what, it dawned on me that all consonants and vowels alternate into each other, hence it amounts to saying "nothing". Language has originated from nothing, just like the Universe did. This is the basis of my pronouncement, the Xa.Ma theory of language: X and M alternate, M is specified and X is the unknown variable, and so on.
The risk of ch: a machine might split it into h and c. Then you have ca and ha, and c can go into s or k at times.
So the hiragana chahunchi is cha+hu+n+chi, where cha is not ch+a but chi+ya, always, by rule; the i+y is understood to be present, because cha will always mean this for the computer. Now they have chi. But it's not c+hi or c+h+i. It's a consonant, chi, to be said as the chi in chimney. (It would read like ci in some other target-language mapping.) chi has zero chance of getting mixed with other consonants, because chi does not split into c or h or i. chi is always chi. The important thing is that such "by themselves" phonetic units (aka syllables) are few in number, and so because they are totally to be found, somewhere, in deep ancient usage. So to have 100 such phonetics and claim we have a far deeper usage isn't acceptable; we need written proof that's 1000 years old, a situation where science, technology and modernity face the risk of "Einstein sure of limited prospects of Universe but not about …"
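The splitting itself can be sketched as a greedy, longest-match tokenizer over a fixed inventory of indivisible units, so that cha is consumed whole and never re-split into c+h+a. The unit list below is an illustrative subset for this one example:

```python
# Indivisible syllable units (illustrative subset for this example).
UNITS = ["cha", "chi", "hu", "n", "mu", "go", "te", "pi", "i", "ba", "ku"]

def split_syllables(text: str) -> list[str]:
    """Split a Hepburn string into syllable units, longest match first."""
    out, i = [], 0
    while i < len(text):
        for unit in sorted(UNITS, key=len, reverse=True):
            if text.startswith(unit, i):
                out.append(unit)
                i += len(unit)
                break
        else:
            # no unit matches: the input violates the transLIT rules
            raise ValueError(f"cannot split at {text[i:]!r}")
    return out

print(split_syllables("chahunchi"))  # ['cha', 'hu', 'n', 'chi']
```

Because the machine only ever matches whole units, the spurious splits like ch+a+hu+n+chi or h+c simply cannot arise; malformed input is rejected instead of mis-read.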
If you want to define such consonants for Indian languages, I have already done that, through a mapping of hiragana to Hindi, Odia and therefore all 20 Indian languages. This way we will have ~50 total phonetic units for all Indian languages instead of the 12,000 or so that we now have. Why 12,000? Because I showed you 10,000, and we have some surplus pseudo-consonants in the base alphabet; all these have been studied by me and posted on this website in the last several months. The actual number of total phonetics used in the Indian language system can be even 15,000. I have probably done a better calculation somewhere, but don't recall.
Once, therefore, there is only one Roman system for the entire world, the computer can't be misled. To do a meaningful translation one must first satisfy all the rules, which would be coded into the system and transparent to users. If there is some issue it will be 6, or even 50, but not 15,000. It's a scientific system, not a sabji mandi (a vegetable market).
See the degeneracy between y and i: mandi and mandy are the same thing. We need to choose only one, for now for one country, and in the future for all the world. That way some of the English language's superfluous nature can be addressed? Not sure. This is because it's not clear whether English is a language or not; it uses Roman. One can test this by making a fictitious English language written in one of the computer mappings of another language. But presently there is much pseudo-science going on about languages, eg they define delineations, genders, gender-and-delineation-based formalisms and so on; all this has been done (or can be done) to hide the actual limitations of a language. Hindi is a pioneer of such practices, where a table is masculine and a chair is feminine. Why is that? Kamasutra?
Native speakers of any language can be hired to help machines translate accurately and then produce the voice, or to develop software which helps speakers communicate via their native language, without understanding the target language. So organizations that work toward such goals take all languages and produce internal software which does only Unified Roman Mapping. This is possible only when Google develops and completes all its present projects and integrates other serious scientific builders. All the language research I have done is on the Google system, except my personal references. So what is to be done is to have, first, a basic usage of language which needs to be translated between all possible languages in the world, a prototype study; then machines can be installed at airports, shopping malls etc.
Let's say I have a friend who can give me the few things I want to say in Tagalog, and that he gives it via a Roman transLIT. Then I feed that Roman script into a machine and it will produce the spoken Tagalog. In advance, a good Roman transLIT must be developed for each language, as just explained above. (transLIT is shorthand for transliteration, as transLAT is shorthand for translation. A Roman transLIT is a mapping of how elements of phonetics are to be written in Roman, without ambiguity or degeneracy, following modern rules as explained, eg, in this article.)
Also, if such a system is developed, one does not even need help from a friend. Everyone feeds the necessary words and translations into a system, much like they do on Wikipedia, and the end user, in real time, uses this from his or her computer: he plugs in a word in his native language and the corresponding phrase or word shows up, due to the Roman Mapping, (better?) than a simplistic translation software.
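The shared database idea can be sketched as a simple two-level lookup: one canonical Roman transLIT per concept per language, crowd-filled and queried by the end user. The entries and language keys below are hypothetical placeholders:

```python
# Hypothetical shared database: one canonical Roman transLIT per
# concept per language (entries are placeholders, not real data).
DATABASE = {
    "water":  {"hindi": "paani", "odia": "paani", "japanese": "mizu"},
    "friend": {"hindi": "dost", "odia": "bandhu", "japanese": "tomodachi"},
}

def lookup(concept: str, language: str) -> str:
    """Return the canonical Roman form of a concept in a target language."""
    return DATABASE[concept][language]

print(lookup("friend", "japanese"))  # tomodachi
```

Because each slot holds exactly one canonical form, contributors cannot introduce the bindiya/beendeeyaa kind of degeneracy; a second spelling is simply rejected or normalized on input.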
And the rules are strict, so in the sharing database there is only one input, which has to conform to the base rules of the transLIT in Roman, which will be the same for all languages in the world. That is, the Roman Mapping or transLIT is the same for all languages. For this to work, one needs first to figure out all complexities beforehand, but that's a local PIA, not a global PIA. (PIA: pain in the asterisks.)
When the Roman rules are defined in the system, all other rules have to be restricted, not scrapped off, so any issue can be discussed offhand. Expect politics and religion in place of technology, because there are a lot of people who don't believe in maths, because they were told maths is not something to be believed in.
Also, that means you scan all possible words in a country, eg from a scripture. You also tackle all the archaic ways of writing a language and put them in their proper place, degeneracy notwithstanding. This looks like a huge task for research organizations, but you can really do this for your language in one evening (the rules, not the scanning). Follow some of the language articles on this website. I developed a hiragana-Indian mapping, called Indougana (search indougana on this site), in 20-30 minutes. And I learned hiragana in 3 hours, and now I can read Japanese. Now I am onto kanji; I recognize, I guess, 50-100 kanji by now, and am even giving you (some) meanings not known before, by anyone. Check my last couple of days' articles. Also I have been writing haikus over the last year or more; some are good, I guess:
kaeru iu ame wo futteru yūjin ne
蛙 いう
雨を 降ってる
友人 ね
Frog says … it's raining … friend's coming.
The idea is that it's not very difficult to learn another language, more so for a scientific purpose. You can learn all the rules of Hindi in a day or two, because 80% of them are the rules of any other language, due to the great deal of Roman unification that has been achieved so far by unknown geniuses, whose work I am only trying to enhance.
Since the Roman Mapping will be defined in one specific way, the computer will automatically pick up the right word, or a few right words, because a speaker of the native language will have translated the meaning. That means a vast organization works to bring absolutely everything into the database …
One world united by one language.
Mathematics and technology make this possible.