Encyclopaedia Britannica, 11th Edition, "Diameter" to "Dinarchus" Volume 8, Slice 4
Act 1894, is of interest. See Table IV.
TABLE IV.--_Weekly, per Statute Adult._

+---------------------------+-------------------+-------------------+
|                           |     Scale A.      |     Scale B.      |
|                           | For voyages not   | For voyages       |
|                           | exceeding 84 days | exceeding 84 days |
|                           | for sailing ships | for sailing ships |
|                           | or 50 days        | or 50 days        |
|                           | for steamships.   | for steamships.   |
+---------------------------+-------------------+-------------------+
|                           |      lb.  oz.     |      lb.  oz.     |
| Bread or biscuit, not     |                   |                   |
|  inferior to navy biscuit |       3    8      |       3    8      |
| Wheaten flour             |       1    0      |       2    0      |
| Oatmeal                   |       1    8      |       1    0      |
| Rice                      |       1    8      |       0    8      |
| Peas                      |       1    8      |       1    8      |
| Beef                      |       1    4      |       1    4      |
| Pork                      |       1    0      |       1    0      |
| Butter                    |       ..          |       0    4      |
| Potatoes                  |       2    0      |       2    0      |
| Sugar                     |       1    0      |       1    0      |
| Tea                       |       0    2      |       0    2      |
| Salt                      |       0    2      |       0    2      |
| Pepper (white or          |                   |                   |
|  black), ground           |       0    0½     |       0    0½     |
| Vinegar                   |      1 gill       |      1 gill       |
| Preserved meat            |       ..          |       1    0      |
| Suet                      |                   |       0    6      |
| Raisins                   |                   |       0    8      |
| Lime juice                |                   |       0    6      |
+---------------------------+-------------------+-------------------+

Certain substitutions may be made in this scale at the option of the master of any emigrant ship, provided that the substituted articles are set forth in the contract tickets of the steerage passengers.

In the British army the soldier is fed partly by a system of co-operation. He gets a free ration from government of 1 lb. of bread and ¾ lb. of meat; in addition there is a messing allowance of 3½d. per man per day. He is able to supplement his food by purchases from the canteen. Much depends on the individual management in each regiment as to the satisfactory expenditure of the messing allowance. In some regiments an allowance is made from the canteen funds towards messing in addition to that granted by the government. The ordinary _field_ ration of the British soldier is 1½ lb. of bread or 1 lb. of biscuit; 1 lb. of fresh, salt or preserved meat; ½ oz. of coffee; 1/6 oz. of tea; 2 oz. of sugar; ½ oz. of salt, 1/36 oz. of pepper, the whole weighing something over 2 lb. 3 oz. This cannot be looked on as a fixed ration, as it varies in different campaigns, according to the country into which the troops may be sent. The Prussian soldier during peace gets weekly from his canteen 11 lb. 1 oz. of rye bread and not quite 2½ lb. of meat. This is obviously insufficient, but under the conscription system it is reckoned that he will be able to make up the deficiency out of his own private means, or obtain charitable contributions from his friends. In the French infantry of the line each man during peace gets weekly 15 lb. of bread, 3-3/10 lb. of meat, 2½ lb. of haricot beans or other vegetables, with salt and pepper, and 1¾ oz. of brandy.
An Austrian under the same circumstances receives 13.9 lb. of bread, ½ lb. of flour and 3.3 lb. of meat.
The Russian conscript is allowed weekly:--
    Black bread      7 lb.
    Meat             7 lb.
    Kvass (beer)     7.7 quarts.
    Sour cabbage     24½ gills = 122½ oz.
    Barley           24½ gills = 122½ oz.
    Salts            10½ oz.
    Horse-radish     28 grains.
    Pepper           28 grains.
    Vinegar          5½ gills = 26½ oz.
DIETETICS, the science of diet, i.e. the food and nutrition of man in health and disease (see NUTRITION). This article deals mainly with that part of the subject which has to do with the composition and nutritive values of foods and their adaptation to the use of people in health. The principal topics considered are: (1) Food and its functions; (2) Metabolism of matter and energy; (3) Composition of food materials; (4) Digestibility of food; (5) Fuel value of food; (6) Food consumption; (7) Quantities of nutrients needed; (8) Hygienic economy of food; (9) Pecuniary economy of food.
1. _Food and its Functions._--For practical purposes, food may be defined as that which, when taken into the body, may be utilized for the formation and repair of body tissue, and the production of energy. More specifically, food meets the requirements of the body in several ways. It is used for the formation of the tissues and fluids of the body, and for the restoration of losses of substance due to bodily activity. The potential energy of the food is converted into heat or muscular work or other forms of energy. In being thus utilized, food protects body substance or previously acquired nutritive material from consumption. When the amount of food taken into the body is in excess of immediate needs, the surplus may be stored for future consumption.
Ordinary food materials, such as meat, fish, eggs, vegetables, &c., consist of inedible materials, or _refuse_, e.g. bone of meat and fish, shell of eggs, rind and seed of vegetables; and _edible material_, as flesh of meat and fish, white and yolk of eggs, wheat flour, &c. The edible material is by no means a simple substance, but consists of _water_, and some or all of the compounds variously designated as food stuffs, proximate principles, nutritive ingredients or nutrients, which are classified as _protein_, _fats_, _carbohydrates_ and _mineral matters_. These have various functions in the nourishment of the body.
The _refuse_ commonly contains compounds similar to those in the food from which it is derived, but since it cannot be eaten, it is usually considered as a non-nutrient. It is of importance chiefly in a consideration of the pecuniary economy of food. _Water_ is also considered as a non-nutrient, because although it is a constituent of all the tissues and fluids of the body, the body can obtain the water it needs from what is drunk; hence the water contained in food materials is of no special significance as a nutrient.
_Mineral matters_, such as sulphates, chlorides, phosphates and carbonates of sodium, potassium, calcium, &c., are found in different combinations and quantities in most food materials. These are used by the body in the formation of the various tissues, especially the skeletal and protective tissues, in digestion, and in metabolic processes within the body. They yield little or no energy, apart perhaps from the very small amount involved in their chemical transformation.
Protein[1] is a term used to designate the whole group of nitrogenous compounds of food except the nitrogenous fats. It includes the albuminoids, as albumin of egg-white, and of blood serum, myosin of meat (muscle), casein of milk, globulin of blood and of egg yolk, fibrin of blood, gluten of flour; the gelatinoids, as gelatin and allied substances of connective tissue, collagen of tendon, ossein of bone and the so-called extractives (e.g. creatin) of meats; and the amids (e.g. asparagin) and allied compounds of vegetables and fruits.
The albuminoids and gelatinoids, classed together as proteids, are the most important constituents of food, because they alone can supply the nitrogenous material necessary for the formation of the body tissues. For this purpose, the albuminoids are most valuable. Both groups of compounds, however, supply the body with energy, and the gelatinoids in being thus utilized protect the albuminoids from consumption for this purpose. When their supply in the food is in excess of the needs of the body, the surplus proteids may be converted into body fat and stored.
The so-called extractives, which are the principal constituents of meat extract, beef tea and the like, act principally as stimulants and appetizers. It has been believed that they serve neither to build tissue nor to yield energy, but recent investigations[2] indicate that creatin may be metabolized in the body.
The _fats_ of food include both the animal fats and the vegetable oils. The _carbohydrates_ include such compounds as starches, sugars and the fibre of plants or cellulose, though the latter has but little value as food for man. The more important function of both these classes of nutrients is to supply energy to the body to meet its requirements above that which it may obtain from the proteids. It is not improbable that the atoms of their molecules as well as those from the proteids are built up into the protoplasmic substance of the tissues. In this sense, these nutrients may be considered as being utilized also for the formation of tissue; but they are rather the accessory ingredients, whereas the proteids are the essential ingredients for this purpose. The fats in the food in excess of the body requirements may be stored as body fat, and the surplus carbohydrates may also be converted into fat and stored.
To a certain extent, then, the nutrients of the food may substitute for each other. All may be incorporated into the protoplasmic structure of body tissue, though only the proteids can supply the essential nitrogenous ingredients; and apart from the portion of the proteid material that is indispensable for this purpose, all the nutrients are used as a source of energy. If the supply of energy in the food is not sufficient, the body will use its own proteid and fat for this purpose. The gelatinoids, fats and carbohydrates in being utilized for energy protect the body proteids from consumption. The fat stored in the body from the excess of food is a reserve of energy material, on which the body may draw when the quantity of energy in the food is insufficient for its immediate needs.
What compounds are especially concerned in intellectual activity is not known. The belief that fish is especially rich in phosphorus and valuable as a brain food has no foundation in observed fact.
2. _Metabolism of Matter and Energy._--The processes of nutrition thus consist largely of the transformation of food into body material and the conversion of the potential energy of both food and body material into the kinetic energy of heat and muscular work and other forms of energy. These various processes are generally designated by the term metabolism. The metabolism of matter in the body is governed largely by the needs of the body for energy. The science of nutrition, of which the present subject forms a part, is based on the principle that the transformations of matter and energy in the body occur in accordance with the laws of the conservation of matter and of energy. That the body can neither create nor destroy matter has long been universally accepted. It would seem that the transformation of energy must likewise be governed by the law of the conservation of energy; indeed there is every reason a priori to believe that it must; but the experimental difficulties in the way of absolute demonstration of the principle are considerable. For such demonstration it is necessary to prove that the income and expenditure of energy are equal. Apparatus and methods of inquiry devised in recent years, however, afford means for a comparison of the amounts of both matter and energy received and expended by the body, and from the results obtained in a large amount of such research, it seems probable that the law obtains in the living organism in general.
The first attempt at such demonstration was made by M. Rubner[3] in 1894, experimenting with dogs doing no external muscular work. The income of energy (as heat) was computed, but the heat eliminated was measured. In the average of eight experiments continuing forty-five days, the two quantities agreed within 0.47%, thus demonstrating what it was desired to prove--that the heat given off by the body came solely from the oxidation of food within it. Results in accordance with these were reported by Studenski[4] in 1897, and by Laulanie[5] in 1898.
The most extensive and complete data yet available on the subject have been obtained by W. O. Atwater, F. G. Benedict and associates[6] in experiments with men in the respiration calorimeter, in which a subject may remain for several consecutive days and nights. These experiments involve actual weighing and analyses of the food and drink, and of the gaseous, liquid and solid excretory products; determinations of potential energy (heat of oxidation) of the oxidizable material received and given off by the body (including estimation of the energy of the material gained or lost by the body); and measurements of the amounts of energy expended as heat and as external muscular work. By October 1906 eighty-eight experiments with fifteen different subjects had been completed. The separate experiments continued from two to thirteen days, making a total of over 270 days.
TABLE I.--_Percentage Composition of some Common Food Materials._
+--------------------------+--------+--------+---------+------+---------+--------+-----------+
| Food Material.           | Refuse.| Water. | Protein.| Fat. | Carbo-  | Mineral| Fuel Value|
|                          |        |        |         |      |hydrates.| Matter.|  per lb.  |
+--------------------------+--------+--------+---------+------+---------+--------+-----------+
|                          |   %    |   %    |    %    |  %   |    %    |   %    | Calories. |
|Beef, fresh (medium fat)--|        |        |         |      |         |        |           |
|  Chuck                   |  16.3  |  52.6  |  15.5   | 15.0 |   . .   |   0.8  |    910    |
|  Loin                    |  13.3  |  52.5  |  16.1   | 17.5 |   . .   |   0.9  |   1025    |
|  Ribs                    |  20.8  |  43.8  |  13.9   | 21.2 |   . .   |   0.7  |   1135    |
|  Round                   |   7.2  |  60.7  |  19.0   | 12.8 |   . .   |   1.0  |    890    |
|  Shoulder                |  16.4  |  56.8  |  16.4   |  9.8 |   . .   |   0.9  |    715    |
|Beef, dried and smoked    |   4.7  |  53.7  |  26.4   |  6.9 |   . .   |   8.9  |    790    |
|Veal--                    |        |        |         |      |         |        |           |
|  Leg                     |  14.2  |  60.1  |  15.5   |  7.9 |   . .   |   0.9  |    625    |
|  Loin                    |  16.5  |  57.6  |  16.6   |  9.0 |   . .   |   0.9  |    685    |
|  Breast                  |  21.3  |  52.0  |  15.4   | 11.0 |   . .   |   0.8  |    745    |
|Mutton--                  |        |        |         |      |         |        |           |
|  Leg                     |  18.4  |  51.2  |  15.1   | 14.7 |   . .   |   0.8  |    890    |
|  Loin                    |  16.0  |  42.0  |  13.5   | 28.3 |   . .   |   0.7  |   1415    |
|  Flank                   |   9.9  |  39.0  |  13.8   | 36.9 |   . .   |   0.6  |   1770    |
|Pork--                    |        |        |         |      |         |        |           |
|  Loin                    |  19.7  |  41.8  |  13.4   | 24.2 |   . .   |   0.8  |   1245    |
|  Ham, fresh              |  10.7  |  48.0  |  13.5   | 25.9 |   . .   |   0.8  |   1320    |
|  Ham, smoked and salted  |  13.6  |  34.8  |  14.2   | 33.4 |   . .   |   4.2  |   1635    |
|  Fat, salt               |   . .  |   7.9  |   1.9   | 86.2 |   . .   |   3.9  |   3555    |
|  Bacon                   |   7.7  |  17.4  |   9.1   | 62.2 |   . .   |   4.1  |   2715    |
|  Lard, refined           |   . .  |   . .  |   . .   |100.0 |   . .   |   . .  |   4100    |
|Chicken                   |  25.9  |  47.1  |  13.7   | 12.3 |   . .   |   0.7  |    765    |
|Turkey                    |  22.7  |  42.4  |  16.1   | 18.4 |   . .   |   0.8  |   1060    |
|Goose                     |  17.6  |  38.5  |  13.4   | 29.8 |   . .   |   0.7  |   1475    |
|Eggs                      |  11.2  |  65.5  |  13.1   |  9.3 |   . .   |   0.9  |    635    |
|Cod, fresh                |  29.9  |  58.5  |  11.1   |  0.2 |   . .   |   0.8  |    220    |
|Cod, salted               |  24.9  |  40.2  |  16.0   |  0.4 |   . .   |  18.5  |    325    |
|Mackerel, fresh           |  44.7  |  40.4  |  10.2   |  4.2 |   . .   |   0.7  |    370    |
|Herring, smoked           |  44.4  |  19.2  |  20.5   |  8.8 |   . .   |   7.4  |    755    |
|Salmon, tinned            |   . .  |  63.5  |  21.8   | 12.1 |   . .   |   2.6  |    915    |
|Oysters, shelled          |   . .  |  88.3  |   6.0   |  1.3 |   3.3   |   1.1  |    225    |
|Butter                    |   . .  |  11.0  |   1.0   | 85.0 |   . .   |   3.0  |   3410    |
|Cheese                    |   . .  |  34.2  |  25.9   | 33.7 |   2.4   |   3.8  |   1885    |
|Milk, whole               |   . .  |  87.0  |   3.3   |  4.0 |   5.0   |   0.7  |    310    |
|Milk, skimmed             |   . .  |  90.5  |   3.4   |  0.3 |   5.1   |   0.7  |    165    |
|Oatmeal                   |   . .  |   7.7  |  16.7   |  7.3 |  66.2   |   2.1  |   1800    |
|Corn (maize) meal         |   . .  |  12.5  |   9.2   |  1.9 |  75.4   |   1.0  |   1635    |
|Rye flour                 |   . .  |  12.9  |   6.8   |  0.9 |  78.7   |   0.7  |   1620    |
|Buckwheat flour           |   . .  |  13.6  |   6.4   |  1.2 |  77.9   |   0.9  |   1605    |
|Rice                      |   . .  |  12.3  |   8.0   |  0.3 |  79.0   |   0.4  |   1620    |
|Wheat flour, white        |   . .  |  12.0  |  11.4   |  1.0 |  75.1   |   0.5  |   1635    |
|Wheat flour, graham       |   . .  |  11.3  |  13.3   |  2.2 |  71.4   |   1.8  |   1645    |
|Wheat, breakfast food     |   . .  |   9.6  |  12.1   |  1.8 |  75.2   |   1.3  |   1680    |
|Wheat bread, white        |   . .  |  35.3  |   9.2   |  1.3 |  53.1   |   1.1  |   1200    |
|Wheat bread, graham       |   . .  |  35.7  |   8.9   |  1.8 |  52.1   |   1.5  |   1195    |
|Rye bread                 |   . .  |  35.7  |   9.0   |  0.6 |  53.2   |   1.5  |   1170    |
|Biscuit (crackers)        |   . .  |   6.8  |   9.7   | 12.1 |  69.7   |   1.7  |   1925    |
|Macaroni                  |   . .  |  10.3  |  13.4   |  0.9 |  74.1   |   1.3  |   1645    |
|Sugar                     |   . .  |   . .  |   . .   |  . . | 100.0   |   . .  |   1750    |
|Starch (corn starch)      |   . .  |   . .  |   . .   |  . . |  90.0   |   . .  |   1680    |
|Beans, dried              |   . .  |  12.6  |  22.5   |  1.8 |  59.6   |   3.5  |   1520    |
|Peas, dried               |   . .  |   9.5  |  24.6   |  1.0 |  62.0   |   2.9  |   1565    |
|Beets                     |  20.0  |  70.0  |   1.3   |  0.1 |   7.7   |   0.9  |    160    |
|Cabbage                   |  50.0  |  44.2  |   0.7   |  0.2 |   4.5   |   0.4  |    100    |
|Potatoes                  |  20.0  |  62.6  |   1.8   |  0.1 |  14.7   |   0.8  |    295    |
|Sweet potatoes            |  20.0  |  55.2  |   1.4   |  0.6 |  21.9   |   0.9  |    440    |
|Tomatoes                  |   . .  |  94.3  |   0.9   |  0.4 |   3.9   |   0.5  |    100    |
|Apples                    |  25.0  |  63.3  |   0.3   |  0.3 |  10.8   |   0.3  |    190    |
|Bananas                   |  35.0  |  48.9  |   0.8   |  0.4 |  14.3   |   0.6  |    260    |
|Grapes                    |  25.0  |  58.0  |   1.0   |  1.2 |  14.4   |   0.4  |    295    |
|Strawberries              |   5.0  |  85.9  |   0.9   |  0.6 |   7.0   |   0.6  |    150    |
|Almonds                   |  45.0  |   2.7  |  11.5   | 30.2 |   9.5   |   1.1  |   1515    |
|Brazil nuts               |  49.6  |   2.6  |   8.6   | 33.7 |   3.5   |   2.0  |   1485    |
|Chestnuts                 |  16.0  |  37.8  |   5.2   |  4.5 |  35.4   |   1.1  |    915    |
|Walnuts                   |  58.1  |   1.0  |   6.9   | 26.6 |   6.8   |   0.6  |   1250    |
+--------------------------+--------+--------+---------+------+---------+--------+-----------+
In some cases the subjects were at rest; in others they performed varying amounts of external muscular work on an apparatus by means of which the amount of work done was measured. In some cases they fasted, and in others they received diets generally not far from sufficient to maintain nitrogen, and usually carbon, equilibrium in the body. In these experiments the amount of energy expended by the body as heat and as external muscular work, measured in terms of heat, agreed on the average very closely with the amount of heat that would be produced by the oxidation of all the matter metabolized in the body. The variations for individual days, and in the average for individual experiments as well, were in some cases appreciable, amounting to as much as 6%, which is not strange in view of the uncertainties in physiological experimenting; but in the average of all the experiments the energy of the expenditure was above 99.9% of the energy of the income, an agreement within one part in 1000. While these results do not absolutely prove the application of the law of the conservation of energy in the human body, they certainly approximate very closely to such demonstration. It is of course possible that energy may have been given off from the body in other forms than heat and external muscular work. It is conceivable, for example, that intellectual activity may involve the transformation of physical energy, and that the energy involved may be eliminated in some form now unknown. But if the body did give off energy which was not measured in these experiments, the quantity must have been extremely small. It seems fair to infer from the results obtained that the metabolism of energy in the body occurred in conformity with the law of the conservation of energy.
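The bookkeeping behind these comparisons is simple: the energy expended is set against the energy received, day by day and over the whole experiment. The Python sketch below uses three invented daily figures (they are not data from the Atwater and Benedict experiments) to show how daily discrepancies of a few per cent can coexist with an agreement within one part in 1000 over the whole run. The function and variable names are illustrative only.

```python
# Hypothetical (income, expenditure) pairs in Calories for one subject on three days.
# These figures are invented for illustration and are NOT taken from the experiments.
DAYS = [(2500.0, 2450.0), (2400.0, 2450.0), (2600.0, 2602.0)]


def discrepancy(income: float, expenditure: float) -> float:
    """Relative excess (+) or shortfall (-) of energy expended over energy received."""
    return (expenditure - income) / income


# Individual days may disagree by a few per cent in either direction...
daily = [discrepancy(income, expenditure) for income, expenditure in DAYS]

# ...but over the whole experiment errors of opposite sign largely cancel.
total_income = sum(income for income, _ in DAYS)
total_expenditure = sum(expenditure for _, expenditure in DAYS)
pooled = discrepancy(total_income, total_expenditure)

for day, d in enumerate(daily, start=1):
    print(f"day {day}: {d:+.2%}")
print(f"whole experiment: {pooled:+.3%} (within one part in 1000: {abs(pooled) < 0.001})")
```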
3. _Composition of Food Materials._--The composition of food is determined by chemical analyses, the results of which are conventionally expressed in terms of the nutritive ingredients previously described. As a result of an enormous amount of such investigation in recent years, the kinds and proportions of nutrients in our common sorts of food are well known. Average values for percentage composition of some ordinary food materials are shown in Table I. (Table I. also includes figures for fuel value.)
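Percentages of the kind given in Table I convert directly into weights. The Python sketch below does this for the round of beef. It assumes, as the refuse column suggests, that the figures refer to the material as purchased, and it uses the modern value of 453.59237 g to the avoirdupois pound; the function names are hypothetical. Because each figure is rounded, the columns of a row need not add to exactly 100 (for the round they add to 100.7).

```python
GRAMS_PER_POUND = 453.59237  # avoirdupois pound (modern definition)

# Percentages for "Beef, fresh (medium fat), Round" copied from Table I,
# assumed here to refer to the meat as purchased (refuse included).
ROUND_OF_BEEF = {
    "refuse": 7.2,
    "water": 60.7,
    "protein": 19.0,
    "fat": 12.8,
    "carbohydrates": 0.0,
    "mineral matter": 1.0,
}


def grams_in(composition: dict[str, float], pounds: float) -> dict[str, float]:
    """Convert a percentage composition into grams for a given weight of food."""
    total_grams = pounds * GRAMS_PER_POUND
    return {name: total_grams * pct / 100.0 for name, pct in composition.items()}


def edible_grams(composition: dict[str, float], pounds: float) -> float:
    """Weight remaining after the refuse (bone, etc.) is removed."""
    return pounds * GRAMS_PER_POUND * (100.0 - composition["refuse"]) / 100.0


two_pounds = grams_in(ROUND_OF_BEEF, 2.0)
print(f"protein: {two_pounds['protein']:.0f} g, fat: {two_pounds['fat']:.0f} g")
print(f"edible portion: {edible_grams(ROUND_OF_BEEF, 2.0):.0f} g")
```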
It will be observed that different kinds of food materials vary widely in their proportions of nutrients. In general the animal foods contain the most protein and fats, and vegetable foods are rich in carbohydrates. The chief nutrient of lean meat and fish is protein; but in medium fat meats the proportion of fat is as large as that of protein, and in the fatter meats it is larger. Cheese is rich in both protein and fat. Among the vegetable foods, dried beans and peas are especially rich in protein. The proportion in oatmeal is also fairly large, in wheat it is moderate, and in maize meal and rice it is rather small. Oats contain more oil than any of the common cereals, but in none of them is the proportion especially large. The most abundant nutrient in all the cereals is starch, which comprises from two-thirds to three-fourths or more of their total nutritive substance. Cotton-seed is rich in edible oil, and so are olives. Some of the nuts contain fairly large proportions of both protein and fat. The chief nutrient of potatoes is starch, present in fair proportion. Fruits contain considerable carbohydrates, chiefly sugar. Green vegetables are not of much account as sources of any of the nutrients or energy.
Similar food materials from different sources may also differ considerably in composition. This is especially true of meats. Thus, the leaner portions from a fat animal may contain nearly as much fat as the fatter portions from a lean animal. The data here presented are largely those for American food products, but the available analyses of English food materials indicate that the latter differ but little from the former in composition. The analyses of meats produced in Europe imply that they commonly contain somewhat less fat and more water, and often more protein, than American meats. The meats of English production resemble the American more closely than they do the Continental European meats. Similar vegetable foods from the different countries do not differ so much in composition.
4. _Digestibility or Availability of Food Materials._--The value of any food material for nutriment depends not merely upon the kinds and amounts of nutrients it contains, but also upon the ease and convenience with which the nutrients may be digested, and especially upon the proportion of the nutrients that will be actually digested and absorbed. Thus, two foods may contain equal amounts of the same nutrient, but the one more easily digested will really be of more value to the body, because less effort is necessary to utilize it. Considerable study of this factor is being made, and much valuable information is accumulating, but ease of digestion is of special importance chiefly in cases of disordered digestion.
TABLE II.--_Coefficients of Digestibility (or Availability) of Nutrients in Different Classes of Food Materials._
+--------------------------+----------+----------+----------------+
| Kind of Food.            | Protein. |   Fat.   | Carbohydrates. |
+--------------------------+----------+----------+----------------+
|                          |    %     |    %     |       %        |
| Meats                    |    98    |    98    |       ..       |
| Fish                     |    96    |    97    |       ..       |
| Poultry                  |    96    |    97    |       ..       |
| Eggs                     |    97    |    98    |       ..       |
| Dairy products           |    97    |    96    |       98       |
| Total animal food of     |          |          |                |
|   mixed diet             |    97    |    97    |       98       |
| Potatoes                 |    73    |    ..    |       98       |
| Beets, carrots, &c.      |    72    |    ..    |       97       |
| Cabbage, lettuce, &c.    |    ..    |    ..    |       83       |
| Legumes                  |    78    |    90    |       95       |
| Oatmeal                  |    78    |    90    |       97       |
| Corn meal                |    80    |    ..    |       99       |
| Wheat meals without bran |    83    |    ..    |       93       |
| Wheat meals with bran    |    75    |    ..    |       92       |
| White bread              |    88    |    ..    |       98       |
| Entire wheat bread       |    82    |    ..    |       94       |
| Graham bread             |    76    |    ..    |       90       |
| Rice                     |    76    |    ..    |       91       |
| Fruits and nuts          |    80    |    86    |       96       |
| Sugars and starches      |    ..    |    ..    |       98       |
| Total vegetable food of  |          |          |                |
|   mixed diet             |    85    |    90    |       97       |
| Total food of mixed diet |    92    |    95    |       97       |
+--------------------------+----------+----------+----------------+
The digestibility of food in the sense of thoroughness of digestion, however, is of particular importance in the present discussion. Only that portion of the food that is digested and absorbed is available to the body for the building of tissue and the production of energy. Not all the food eaten is thus actually digested; undigested material is excreted in the faeces. The thoroughness of digestion is determined experimentally by weighing and analysing the food eaten and the faeces pertaining to it. The difference between the corresponding ingredients of the two is commonly considered to represent the amounts of the ingredients digested. Expressed in percentages, these are called coefficients of digestibility. See Table II.
Such a method is not strictly accurate, because the faeces do not consist entirely of undigested food but contain in addition to this the so-called metabolic products, which include the residuum of digestive juices not resorbed, fragments of intestinal epithelium, &c. Since there is as yet no satisfactory method of separating these constituents of the excreta, the actual digestibility of the food is not determined. It has been suggested that since these materials must originally come from food, they represent, when expressed in terms of food ingredients, the cost of digestion; hence that the values determined as above explained represent the portion of food available to the body for the building of tissue and the yielding of energy, and what is commonly designated as digestibility should be called availability. Other writers retain the term "digestibility," but express the results as "apparent digestibility," until more knowledge regarding the metabolic products of the excreta is available and the actual digestibility may be ascertained.
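The computation described above reduces to one ratio per nutrient. In the Python sketch below, the amounts eaten and recovered in the faeces are hypothetical, chosen so that the results reproduce the mixed-diet coefficients of Table II (92, 95 and 97%). As just explained, what the ratio yields is an apparent digestibility, or availability, since the faecal figure includes metabolic products; the function name is illustrative.

```python
def coefficient_of_digestibility(eaten_g: float, in_faeces_g: float) -> float:
    """Apparent digestibility (availability) of one nutrient, in per cent.

    The faecal figure also contains metabolic products (residues of digestive
    juices, intestinal epithelium, &c.), so this is not the true digestibility.
    """
    if eaten_g <= 0:
        raise ValueError("amount eaten must be positive")
    return 100.0 * (eaten_g - in_faeces_g) / eaten_g


# Hypothetical amounts in grams: (eaten, recovered in the faeces).
trial = {"protein": (100.0, 8.0), "fat": (80.0, 4.0), "carbohydrates": (400.0, 12.0)}

for nutrient, (eaten, excreted) in trial.items():
    print(f"{nutrient}: {coefficient_of_digestibility(eaten, excreted):.0f}%")
```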
Experimental inquiry of this nature has been very active in recent years, especially in Europe, the United States and Japan; and the results of considerably over 1000 digestion experiments with single foods or combinations of food materials are available. These were mostly with men, but some were with women and with children. Most of them have been taken into account in the estimates of the digestibility of the nutrients in different classes of food materials given in Table II. These figures are subject to revision as experimental data accumulate. They are not to be taken as exact measures of the digestibility (or availability) of every kind of food in each given class, but they probably represent fairly well the average digestibility of the classes of food materials as ordinarily utilized in the mixed diet.
5. _Fuel Value of Food._--The potential energy of food is commonly measured as the amount of heat evolved when the food is completely oxidized. In the laboratory this is determined by burning the food in oxygen in a calorimeter. The results, which are known as the heat of combustion of the food, are expressed in calories, one calory being the amount of heat necessary to raise the temperature of one kilogram of water one degree centigrade. But it is to be observed that this unit is employed simply from convenience, and without implication as to what extent the energy of food is converted into heat in the body. The unit employed in the measurement of some other form of energy might be used instead, as, for example, the foot-ton, which represents the amount of energy necessary to raise one ton through one foot.
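The two units mentioned here can be related by a short calculation. The Python sketch below rests on modern reference values that post-date the text and are assumptions of the sketch: 4184 joules to the large calorie, the British long ton of 2240 lb, and standard gravity. On those assumptions one calorie of the kind defined above corresponds to roughly 1.38 foot-tons.

```python
# Modern reference values (assumptions for this sketch, not from the text).
JOULES_PER_CALORIE = 4184.0      # large (kilogram) calorie, thermochemical definition
METRES_PER_FOOT = 0.3048
KG_PER_LONG_TON = 1016.0469088   # British ton of 2240 lb
STANDARD_GRAVITY = 9.80665       # metres per second squared

# Work done in raising one long ton through one foot, in joules.
JOULES_PER_FOOT_TON = KG_PER_LONG_TON * STANDARD_GRAVITY * METRES_PER_FOOT


def heat_in_calories(kg_of_water: float, rise_in_degrees_c: float) -> float:
    """Calories needed to warm a mass of water (kg) by a rise in degrees centigrade."""
    return kg_of_water * rise_in_degrees_c


def calories_to_foot_tons(calories: float) -> float:
    """Express an amount of energy given in calories as foot-tons."""
    return calories * JOULES_PER_CALORIE / JOULES_PER_FOOT_TON


print(heat_in_calories(2.0, 1.5))
print(f"1 calorie is about {calories_to_foot_tons(1.0):.2f} foot-tons")
```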
TABLE III.--_Estimates of Heats of Combustion and of Fuel Value of Nutrients in Ordinary Mixed Diet._
+---------------------------+-------------+-------------+
| Nutrients.                |   Heat of   | Fuel Value. |
|                           | Combustion. |             |
+---------------------------+-------------+-------------+
|                           |  Calories.  |  Calories.  |
| One gram of protein       |    5.65     |    4.05     |
| One gram of fats          |    9.40     |    8.93     |
| One gram of carbohydrates |    4.15     |    4.03     |
+---------------------------+-------------+-------------+
The amount of energy which a given quantity of food will produce on complete oxidation outside the body, however, is greater than that which the body will actually derive from it. In the first place, as previously shown, part of the food will not be digested and absorbed. In the second place, the nitrogenous compounds absorbed are not completely oxidized in the body, the residuum being excreted in the urine as urea and other bodies that are capable of further oxidation in the calorimeter. The total heat of combustion of the food eaten must therefore be diminished by the heat of combustion of the oxidizable material rejected by the body, to find what amount of energy is actually available to the organism for the production of work and heat. The amount thus determined is commonly known as the fuel value of food.
Rubner's[7] commonly quoted estimates for the fuel value of the nutrients of mixed diet are 4.1 calories per gram for protein and for carbohydrates, and 9.3 for fats. Because of the way in which they were deduced, however, these factors are more applicable to digested than to total nutrients. Atwater[8] and associates have deduced, from data much more extensive than those available to Rubner, factors for total nutrients somewhat lower than these, as shown in Table III. These estimates seem to represent the best average factors at present available, but are subject to revision as knowledge is extended.
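Applying a set of factors to a diet is a weighted sum of the grams of each nutrient. The Python sketch below takes a hypothetical day's food of 100 g protein, 100 g fat and 400 g carbohydrates, counted as total nutrients, and shows how much higher Rubner's factors put its fuel value (2980 calories) than the factors of Table III (2910 calories). The `Factors` type and the function name are illustrative.

```python
from typing import NamedTuple


class Factors(NamedTuple):
    """Fuel value in calories per gram of each class of nutrient."""
    protein: float
    fat: float
    carbohydrates: float


RUBNER = Factors(protein=4.1, fat=9.3, carbohydrates=4.1)      # better suited to digested nutrients
ATWATER = Factors(protein=4.05, fat=8.93, carbohydrates=4.03)  # for total nutrients (Table III)


def fuel_value(protein_g: float, fat_g: float, carbohydrates_g: float, f: Factors) -> float:
    """Fuel value in calories of a quantity of food, given grams of each nutrient."""
    return protein_g * f.protein + fat_g * f.fat + carbohydrates_g * f.carbohydrates


# Hypothetical day's food, as total (not digested) nutrients, in grams.
diet = (100.0, 100.0, 400.0)
print(f"Table III factors: {fuel_value(*diet, ATWATER):.0f} calories")
print(f"Rubner's factors:  {fuel_value(*diet, RUBNER):.0f} calories")
```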
TABLE IV.--_Quantities of Available Nutrients and Energy in Daily Food Consumption of Persons in Different Circumstances._
+------------------------------------------+--------+--------------------------------+
|                                          |        |      Nutrients and Energy      |
|                                          | Number |        per Man per Day.        |
|                                          |   of   +------+------+--------+---------+
|                                          |Studies.| Pro- | Fat. |Carbohy-|  Fuel   |
|                                          |        | tein.|      | drates.| Value.  |
+------------------------------------------+--------+------+------+--------+---------+
|                                          |        |Grams.|Grams.| Grams. |Calories.|
| _Persons with Active Work._              |        |      |      |        |         |
| English royal engineers                  |     1  |  132 |   79 |    612 |    3835 |
| Prussian machinists                      |     1  |  129 |  107 |    657 |    4265 |
| Swedish mechanics                        |     5  |  174 |  105 |    693 |    4590 |
| Bavarian lumbermen                       |     3  |  120 |  277 |    702 |    6015 |
| American lumbermen                       |     5  |  155 |  327 |    804 |    6745 |
| Japanese rice cleaner                    |     1  |  103 |   11 |    917 |    4415 |
| Japanese jinrikshaw runner               |     1  |  137 |   22 |   1010 |    5050 |
| Chinese farm labourers in California     |     1  |  132 |   90 |    621 |    3980 |
| American athletes                        |    19  |  178 |  192 |    525 |    4740 |
| American working-men's families          |    13  |  156 |  226 |    694 |    5650 |
|                                          |        |      |      |        |         |
| _Persons with Ordinary Work._            |        |      |      |        |         |
| Bavarian mechanics                       |    11  |  112 |   32 |    553 |    3060 |
| Bavarian farm labourers                  |     5  |  126 |   52 |    526 |    3200 |
| Russian peasants                         |    ..  |  119 |   31 |    571 |    3155 |
| Prussian prisoners                       |     1  |  117 |   28 |    620 |    3320 |
| Swedish mechanics                        |     6  |  123 |   75 |    507 |    3325 |
| American working-men's families          |    69  |  105 |  135 |    426 |    3480 |
|                                          |        |      |      |        |         |
| _Persons with Light Work._               |        |      |      |        |         |
| American artisans' families              |    21  |   93 |  107 |    358 |    2880 |
| English tailors (prisoners)              |     1  |  121 |   37 |    509 |    2970 |
| German shoemakers                        |     1  |   99 |   73 |    367 |    2629 |
| Japanese prisoners                       |     1  |   43 |    6 |    444 |    2110 |
|                                          |        |      |      |        |         |
| _Professional and Business Men._         |        |      |      |        |         |
| Japanese professional men                |    13  |   75 |   15 |    408 |    2190 |
| Japanese students                        |     8  |   85 |   18 |    537 |    2800 |
| Japanese military cadets                 |    11  |   98 |   20 |    611 |    3185 |
| German physicians                        |     2  |  121 |   90 |    317 |    2685 |
| Swedish medical students                 |     5  |  117 |  108 |    291 |    2725 |
| Danish physicians                        |     1  |  124 |  133 |    242 |    2790 |
| American professional and business       |        |      |      |        |         |
|   men and students                       |    51  |   98 |  125 |    411 |    3285 |
|                                          |        |      |      |        |         |
| _Persons with Little or no Exercise._    |        |      |      |        |         |
| Prussian prisoners                       |     2  |   90 |   27 |    427 |    2400 |
| Japanese prisoners                       |     1  |   36 |    6 |    360 |    1725 |
| Inmates of home for aged--Germany        |     1  |   85 |   43 |    322 |    2097 |
| Inmates of hospitals for insane--America |    49  |   80 |   86 |    353 |    2590 |
|                                          |        |      |      |        |         |
| _Persons in Destitute Circumstances._    |        |      |      |        |         |
| Prussian working people                  |    13  |   63 |   43 |    372 |    2215 |
| Italian mechanics                        |     5  |   70 |   36 |    384 |    2225 |
| American working-men's families          |    11  |   69 |   75 |    263 |    2085 |
+------------------------------------------+--------+------+------+--------+---------+
The heats of combustion of all the fats in an ordinary mixed diet would average about 9.40 calories per gram, but as only 95% of the fat would be available to the body, the fuel value per gram would be (9.40 × 0.95 =) 8.93 calories. Similarly, the average heat of combustion of carbohydrates of the diet would be about 4.15 calories per gram, and as 97% of the total quantity is available to the body, the fuel value per gram would be 4.03. (It is commonly assumed that the resorbed fats and carbohydrates are completely oxidized in the body.) The heats of combustion of all the kinds of protein in the diet would average about 5.65 calories per gram. Since about 92% of the total protein would be available to the body, the potential energy of the available protein would be equivalent to (5.65 × 0.92 =) 5.20 calories; but as the available protein is not completely oxidized allowance must be made for the potential energy of the incompletely oxidized residue. This is estimated as equivalent to 1.15 calories for the 0.92 gram of available protein; hence, the fuel value of the total protein is (5.20 - 1.15 =) 4.05 calories per gram. Nutrients of the same class, but from different food materials, vary both in digestibility and in heat of combustion, and hence in fuel value. These factors are therefore not so applicable to the nutrients of the separate articles in a diet as to those of the diet as a whole.
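The derivation just given amounts to one formula: fuel value = heat of combustion × fraction available − energy left in the incompletely oxidized residue, the last term applying only to protein. The Python sketch below reproduces the three factors of Table III from the figures in the text. It carries 5.65 × 0.92 = 5.198 through unrounded, so protein comes out at 4.048 rather than the text's 4.05, which rounds 5.198 to 5.20 before subtracting; the function name is illustrative.

```python
def fuel_value_per_gram(heat_of_combustion: float, available_fraction: float,
                        residue_energy: float = 0.0) -> float:
    """Fuel value in calories per gram of total nutrient eaten.

    heat_of_combustion: calories per gram when completely burned in a calorimeter.
    available_fraction: share of the nutrient digested and absorbed.
    residue_energy: calories still held by incompletely oxidized products
        (urea, &c.) per gram of total nutrient; zero for fats and carbohydrates.
    """
    return heat_of_combustion * available_fraction - residue_energy


fat = fuel_value_per_gram(9.40, 0.95)            # 8.93
carbohydrates = fuel_value_per_gram(4.15, 0.97)  # 4.0255, quoted as 4.03
# Protein: 92% available; the residue from the 0.92 g still holds 1.15 calories.
protein = fuel_value_per_gram(5.65, 0.92, residue_energy=1.15)  # 4.048, quoted as 4.05

for name, value in [("protein", protein), ("fat", fat), ("carbohydrates", carbohydrates)]:
    print(f"{name}: {value:.4f} calories per gram")
```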
6. _Food Consumption._--Much information regarding the food consumption of people in various circumstances in different parts of the world has accumulated during the past twenty years, as a result of studies of actual dietaries in England, Germany, Italy, Russia, Sweden and elsewhere in Europe, in Japan and other oriental countries, and especially in the United States. These studies commonly consist in ascertaining the kinds, amounts and composition of the different food materials consumed by a group of persons during a given period and the number of meals taken by each member of the group, and computing the quantities of the different nutrients in the food on the basis of one man for one day. When the members of the group are of different age, sex, occupation, &c., account must be taken of the effect of these factors on consumption in estimating the value "per man." Men as a rule eat more than women under similar conditions, women more than children, and persons at active work more than those in sedentary occupations. The navvy, for example, who is constantly using up more nutritive material or body tissue to supply the energy required for his muscular work, needs more protein and energy in his food than a bookkeeper who sits at his desk all day.
In making allowance for these differences, the various individuals are commonly compared with a man at moderately active muscular work, who is taken as unity. A man at hard muscular work is reckoned at 1.2 times such an individual; a man with light muscular work or a boy 15-16 years old, .9; a man at sedentary occupation, woman at moderately active muscular work, boy 13-14 or girl 15-16 years old, .8; woman at light work, boy 12 or girl 13-14 years old, .7; boy 10-11 or girl 10-12 years old, .6; child 6-9 years old, .5; child 2-5 years old, .4; child under 2 years, .3. These factors are by no means absolute or final, but are based in part upon experimental data and in part upon arbitrary assumption.
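Reducing a group's consumption to "one man for one day" amounts to weighting each member by these factors. A minimal Python sketch, assuming (our simplification, not the article's) that three meals count as one day's food; the dictionary keys and the sample family are hypothetical.

```python
# Factors quoted in the text, relative to a man at moderately active
# muscular work (= 1.0). The short keys are our own labels.
MAN_VALUE = {
    "man_hard_work": 1.2,
    "man_moderate_work": 1.0,
    "man_light_work": 0.9,      # also a boy of 15-16
    "man_sedentary": 0.8,       # also a woman at moderately active work,
                                # a boy of 13-14 or a girl of 15-16
    "woman_light_work": 0.7,    # also a boy of 12 or a girl of 13-14
    "child_10_12": 0.6,         # boy of 10-11 or girl of 10-12
    "child_6_9": 0.5,
    "child_2_5": 0.4,
    "child_under_2": 0.3,
}

MEALS_PER_DAY = 3  # assumption: three meals make one day's food


def man_days(members: list[tuple[str, int]]) -> float:
    """Convert (category, meals taken) pairs into equivalent man-days."""
    return sum(MAN_VALUE[cat] * meals / MEALS_PER_DAY for cat, meals in members)


def per_man_per_day(total_grams: float, members: list[tuple[str, int]]) -> float:
    """Amount of a nutrient consumed per man per day, from the group's total."""
    return total_grams / man_days(members)


# Hypothetical family observed for a week, every member taking all 21 meals,
# with a hypothetical total of 1820 g of protein in the food eaten.
family = [("man_moderate_work", 21), ("woman_light_work", 21),
          ("child_6_9", 21), ("child_2_5", 21)]
print(round(man_days(family), 1))               # 18.2 man-days
print(round(per_man_per_day(1820, family), 1))  # 100.0 g per man per day
```

Counting meals rather than days lets members who were absent for part of the period be weighted correctly, in keeping with the text's note that the number of meals taken by each member is recorded.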
The total number of dietary studies on record is very large, but not all of them are complete enough to furnish reliable data. Upwards of 1000 are sufficiently accurate to be included in statistical averages of food consumed by people in different circumstances, nearly half of which have been made in the United States in the past decade. The number of persons in the individual studies has ranged from one to several hundred. Some typical results are shown in Table IV.
7. _Quantities of Nutrients needed._--For the proper nourishment of the body, the important problem is how much protein, fat and carbohydrate, or more simply, what amounts of protein and potential energy, are needed under varying circumstances to build and repair muscular and other tissues and to supply energy for muscular work, heat and other forms of energy. The answer is sought in the data obtained in dietary studies with considerable numbers of people, and in metabolism experiments with individuals in which the income and expenditure of the body are measured. From the information thus derived, different investigators have proposed so-called dietary standards, such as are shown in the table below; unfortunately the experimental data are still insufficient for entirely trustworthy figures of this sort, so the term "standard" as here used is misleading. The figures given are not as exact and final as that term would suggest; they are merely tentative estimates of the average daily amounts of nutrients and energy required. (It is especially to be noted that these are available nutrients and fuel value rather than total nutrients and energy.) Some of the values proposed by other investigators are slightly larger than these, and others are decidedly smaller, but these are the ones that have hitherto been most commonly accepted in Europe and America.
TABLE V.--_Standards for Dietaries. Available Nutrients and Energy per Man per Day._
+---------------------------+---------+--------+---------+---------+
|                           | Protein.|  Fat.  | Carbo-  |  Fuel   |
|                           |         |        |hydrates.| Value.  |
+---------------------------+---------+--------+---------+---------+
|                           |         |        |         |         |
|  _Voit's Standards._      |Grams.[9]| Grams. | Grams.  |Calories.|
| Man at hard work          |   133   |   95   |   437   |  3270   |
| Man at moderate work      |   109   |   53   |   485   |  2965   |
|  _Atwater's Standards._   |         |        |         |         |
| Man at very hard          |         |        |         |         |
|   muscular work           |   161   | ..[10] | ..[10]  |  5500   |
| Man at hard muscular work |   138   |   ..   |   ..    |  4150   |
| Man at moderately         |         |        |         |         |
|   active muscular work    |   115   |   ..   |   ..    |  3400   |
| Man at light to moderate  |         |        |         |         |
|   muscular work           |   103   |   ..   |   ..    |  3050   |
| Man at "sedentary"        |         |        |         |         |
|   or woman at moderately  |         |        |         |         |
|   active work             |    92   |   ..   |   ..    |  2700   |
| Woman at light muscular   |         |        |         |         |
|   work, or man without    |         |        |         |         |
|   muscular exercise       |    83   |   ..   |   ..    |  2450   |
|                           |         |        |         |         |
+---------------------------+---------+--------+---------+---------+
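Atwater's standards fix only the protein and the fuel value; as footnote [10] explains, fat and carbohydrate may be exchanged freely so long as the energy total is met. The sketch below illustrates that trade-off under an assumption of our own: fuel values per gram of *available* nutrient derived from the mixed-diet averages given earlier (fat 9.40 and carbohydrate 4.15, taken as fully oxidized; protein 5.65 less the residue of 1.15 per 0.92 g, i.e. 4.40). These derived factors are not figures stated by Atwater, and the names are hypothetical.

```python
# Fuel value per gram of *available* nutrient (our derivation, see lead-in).
CAL_PER_GRAM = {
    "protein": 5.65 - 1.15 / 0.92,  # = 4.40
    "fat": 9.40,
    "carbohydrate": 4.15,
}

# Atwater's standards from Table V: (available protein in grams, calories).
ATWATER = {
    "very hard muscular work": (161, 5500),
    "hard muscular work": (138, 4150),
    "moderately active muscular work": (115, 3400),
    "light to moderate muscular work": (103, 3050),
    "sedentary man / woman at moderately active work": (92, 2700),
    "woman at light work / man without exercise": (83, 2450),
}


def carbohydrate_needed(standard: str, fat_grams: float) -> float:
    """Grams of available carbohydrate that, with the standard's protein
    and the chosen amount of fat, make up the standard's fuel value."""
    protein, calories = ATWATER[standard]
    remaining = (calories
                 - protein * CAL_PER_GRAM["protein"]
                 - fat_grams * CAL_PER_GRAM["fat"])
    if remaining < 0:
        raise ValueError("protein and fat alone exceed the fuel value")
    return remaining / CAL_PER_GRAM["carbohydrate"]


for fat in (50, 100, 150):
    carbs = carbohydrate_needed("moderately active muscular work", fat)
    print(f"fat {fat:3d} g -> carbohydrate {carbs:.0f} g")
```

Under these factors each gram of fat displaces about 2¼ g of carbohydrate (9.40 / 4.15), which is the sense in which their "exact proportion in the diet is of small account."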
8. _Hygienic Economy of Food._--For people in good health, there are two important rules to be observed in the regulation of the diet. One is to choose the foods that "agree" with them, and to avoid those which they cannot digest and assimilate without harm; and the other is to use such sorts and quantities of foods as will supply the kinds and amounts of nutrients needed by the body and yet to avoid burdening it with superfluous material to be disposed of at the cost of health and strength.
As for the first-mentioned rule, it is practically impossible to give information that may be of more than general application. There are people who, because of some individual peculiarity, cannot use foods which for people in general are wholesome and nutritious. Some persons cannot endure milk, others suffer if they eat eggs, others have to eschew certain kinds of meat, or are made uncomfortable by fruit; but such cases are exceptions. Very little is known regarding the cause of these conditions. It is possible that in the metabolic processes to which the ingredients of the food are subjected in the body, or even during digestion before the substances are actually taken into the body, compounds may be formed that are in one way or another injurious. Whatever the cause may be, it is literally true in this sense that "what is one man's meat is another man's poison," and each must learn for himself which foods "agree" with him and which do not. But for the great majority of people in health, suitable combinations of the ordinary sorts of wholesome food materials make a healthful diet. On the other hand, some foods are of particular value at times, aside from their use for nourishment. Fruits and green vegetables often benefit people greatly, not as nutriment merely, for they may have very little actual nutritive material, but because of fruit or vegetable acids or other substances which they contain, and which sometimes serve a most useful purpose.
TABLE VI.--_Amounts of Nutrients and Energy Furnished for One Shilling in Food Materials at Ordinary Prices._
+----------------------+-------+----------------------------------------------+
|                      |       |            One Shilling will buy             |
|                      |       +----------+-------------------------+---------+
|    Food Materials    |Prices |          |  Available Nutrients.   |         |
|    as Purchased.     |  per  |Total Food+-------------------------+  Fuel   |
|                      |  lb.  |materials.|        |      | Carbo-  | Value.  |
|                      |       |          |Protein.| Fat. |hydrates.|         |
+----------------------+-------+----------+--------+------+---------+---------+
|                      | s. d. |   lb.    |  lb.   | lb.  |   lb.   |Calories.|
| Beef, round          | 0 10  |   1.20   |  .22   | .14  |    ..   |  1,155  |
|                      | 0 8½  |   1.41   |  .26   | .17  |    ..   |  1,235  |
|                      | 0 5   |   2.40   |  .44   | .29  |    ..   |  2,105  |
|                      |       |          |        |      |         |         |
| Beef, sirloin        | 0 10  |   1.20   |  .19   | .20  |    ..   |  1,225  |
|                      | 0 9   |   1.33   |  .21   | .22  |    ..   |  1,360  |
|                      | 0 8   |   1.50   |   ..   |  ..  |    ..   |     ..  |
|                      | 0 5   |   2.40   |   ..   |  ..  |    ..   |     ..  |
|                      |       |          |        |      |         |         |
| Beef, rib            | 0 9   |   1.33   |  .19   | .19  |    ..   |  1,200  |
|                      | 0 7½  |   1.60   |   ..   |  ..  |    ..   |     ..  |
|                      | 0 4½  |   2.67   |   ..   |  ..  |    ..   |     ..  |
|                      |       |          |        |      |         |         |
| Mutton, leg          | 0 9   |   1.33   |  .20   | .20  |    ..   |  1,245  |
|                      | 0 5   |   2.40   |  .37   | .35  |    ..   |  2,245  |
|                      |       |          |        |      |         |         |
| Pork, spare-rib      | 0 9   |   1.33   |  .17   | .31  |    ..   |  1,645  |
|                      | 0 7   |   1.71   |  .22   | .39  |    ..   |  2,110  |
|                      |       |          |        |      |         |         |
| Pork, salt, fat      | 0 7   |   1.71   |  .03   |1.40  |    ..   |  6,025  |
|                      | 0 5   |   2.40   |  .04   |1.97  |    ..   |  8,460  |
|                      |       |          |        |      |         |         |
| Pork, smoked ham     | 0 8   |   1.50   |  .20   | .48  |    ..   |  2,435  |
|                      | 0 4½  |   2.67   |  .36   | .85  |    ..   |  4,330  |
|                      |       |          |        |      |         |         |
| Fresh cod            | 0 4   |   3.00   |  .34   | .01  |    ..   |    710  |
|                      | 0 3   |   4.00   |  .45   | .01  |    ..   |    945  |
|                      |       |          |        |      |         |         |
| Salt cod             | 0 3½  |   3.43   |  .54   | .07  |    ..   |  1,370  |
|                      | 0 10  |   1.20   |  .07   | .01  |   .04   |    275  |
|                      |       |          |        |      |         |         |
|Milk, whole, 4d. a qt.| 0 2   |   6.00   |  .19   | .23  |   .30   |  1,915  |
|        "    3d. a qt.| 0 1½  |   8.00   |  .26   | .30  |   .40   |  2,550  |
|        "    2d. a qt.| 0 1   |  12.00   |  .38   | .46  |   .60   |  3,825  |
|                      |       |          |        |      |         |         |
| Milk, skimmed, 2d. a | 0 1   |  12.00   |  .40   | .03  |   .61   |  2,085  |
|   qt.                |       |          |        |      |         |         |
| Butter               | 1 6   |    .67   |  .01   | .54  |    ..   |  2,320  |
|                      | 1 3   |    .80   |  .01   | .64  |    ..   |  2,770  |
|                      | 1 0   |   1.00   |  .01   | .81  |    ..   |  3,460  |
|                      |       |          |        |      |         |         |
| Margarine            | 0 4   |   3.00   |   ..   |2.37  |    ..   | 10,080  |
|                      |       |          |        |      |         |         |
| Eggs, 2s. a dozen    | 1 4   |    .75   |  .10   | .07  |    ..   |    475  |
|   "   1½s. a dozen   | 1 0   |   1.00   |  .13   | .09  |    ..   |    635  |
|   "   1s. a dozen    | 0 8   |   1.50   |  .19   | .13  |    ..   |    950  |
|                      |       |          |        |      |         |         |
| Cheese               | 0 8   |   1.50   |  .38   | .48  |   .04   |  2,865  |
|                      | 0 7   |   1.71   |  .43   | .55  |   .04   |  3,265  |
|                      | 0 5   |   2.40   |  .60   | .77  |   .06   |  4,585  |
|                      |       |          |        |      |         |         |
| Wheat bread          |0 1-1/8|  10.67   |  .76   | .13  |  5.57   | 12,421  |
|                      |       |          |        |      |         |         |
| Wheat flour          |0 1-3/5|   7.64   |  .67   | .07  |  5.63   | 12,110  |
|                      | 0 1½  |   8.16   |  .72   | .07  |  6.01   | 12,935  |
|                      |       |          |        |      |         |         |
| Oatmeal              |0 1-2/5|   8.39   | 1.11   | .54  |  5.54   | 14,835  |
|                      | 0 1½  |   8.16   | 1.08   | .53  |  5.39   | 14,430  |
|                      |       |          |        |      |         |         |
| Rice                 | 0 1¾  |   6.86   |  .45   | .02  |  5.27   | 10,795  |
|                      |       |          |        |      |         |         |
| Potatoes             |0 0-2/3|  18.00   |  .25   | .02  |  2.70   |  5,605  |
|                      | 0 0½  |  24.00   |  .34   | .02  |  3.60   |  7,470  |
|                      |       |          |        |      |         |         |
| Beans                | 0 2   |   6.00   | 1.05   | .10  |  3.47   |  8,960  |
|                      |       |          |        |      |         |         |
| Sugar                | 0 1¾  |   6.86   |   ..   |  ..  |  6.86   | 12,760  |
+----------------------+-------+----------+--------+------+---------+---------+
The proper observance of the second rule requires information regarding the demands of the body for food under different circumstances. To supply this information is one purpose of the effort to determine the so-called dietary standards mentioned above. It should be observed, however, that these are generally more applicable to the feeding of a group or class of people as a whole than to particular individuals in that class. The needs of individuals will vary widely from the average according to their activity and individual peculiarities. Moreover, it is neither necessary nor desirable for the individual to follow any standard exactly from day to day; it is requisite only that the average supply be sufficient to meet the demands of the body over a given period.
The cooking of food and other modes of preparing it for consumption have much to do with its nutritive value. Many materials which, owing to their mechanical condition or to some other cause, are not particularly desirable food materials in their natural state, are quite nutritious when cooked or otherwise prepared for consumption. It is also a matter of common experience that well-cooked food is wholesome and appetizing, whereas the same material poorly prepared is unpalatable. There are three chief purposes of cooking. The first is to change the mechanical condition of the food. Heating changes the structure of many food materials considerably, so that they may be more easily chewed and brought into a condition in which the digestive juices can act upon them more freely; in this way it probably influences the ease and thoroughness of digestion. The second is to make the food more appetizing by improving its appearance or flavour or both. Food which is attractive to the eye and pleasing to the palate quickens the flow of saliva and other digestive juices and thus aids digestion. The third is to kill, by heat, disease germs, parasites or other dangerous organisms that may be contained in food. This is often a very important matter and applies to both animal and vegetable foods. Scrupulous neatness should always be observed in storing, handling and serving food. If cleanliness is desirable anywhere, it is in the things we eat, and every care should be taken to ensure it for the sake of health as well as of decency. Cleanliness in this connexion means not only absence of visible dirt, but freedom from undesirable bacteria and other minute organisms and from worms and other parasites. If food, raw or cooked, is kept in dirty places, peddled from dirty carts, prepared in dirty rooms and in dirty dishes, or exposed to foul air, disease germs and other offensive and dangerous substances may easily enter it.
9. _Pecuniary Economy of Food._--Statistics of economy and of cost of living in Great Britain, Germany and the United States show that at least half, and commonly more, of the income of wage-earners and other people in moderate circumstances is expended for subsistence. The relatively large cost of food, and the important influence of diet upon health and strength, make a more widespread understanding of the subject of dietetics very desirable. The maxim that "the best is the cheapest" does not apply to food. The "best" food, in the sense of that which is the finest in appearance and flavour and which is sold at the highest price, is not generally the most economical.
The price of food is not determined chiefly by its value for nutriment. Its agreeableness to the palate or to the buyer's fancy is a large factor in determining the current demand and market price. There is no more nutriment in an ounce of protein or fat from the tender-loin of beef than from the round or shoulder. The protein of animal food has, however, some advantage over that of vegetable foods in that it is more thoroughly, and perhaps more easily, digested, for which reason it would be economical to pay somewhat more for the same quantity of nutritive material in the animal food. Furthermore, animal foods such as meats, fish and the like gratify the palate as most vegetable foods do not. For persons in good health, foods in which the nutrients are the most expensive are like costly articles of adornment. People who can well afford them may be justified in buying them, but they are not economical. The most economical food is that which is at the same time most healthful and cheapest.
The variations in the cost of the actual nutriment in different food materials may be illustrated by comparison of the amounts of nutrients obtained for a given sum in the materials as bought at ordinary market prices. This is done in Table VI., which shows the amounts of available nutrients contained in the quantities of different food materials that may be purchased for one shilling at prices common in England.
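The "Total food materials" column of Table VI is simply twelvepence divided by the price per pound, and dividing the fuel-value column into 12,000 gives the cost in pence of 1,000 calories. A short Python sketch using a handful of rows copied from Table VI; the selection of rows and the function names are ours.

```python
PENCE_PER_SHILLING = 12

# (food material, price per lb. in pence, calories bought for one shilling)
# A few rows copied from Table VI.
ROWS = [
    ("Beef, round", 10, 1155),
    ("Pork, salt, fat", 7, 6025),
    ("Milk, whole, 3d. a qt.", 1.5, 2550),
    ("Cheese", 8, 2865),
    ("Wheat bread", 1.125, 12421),
    ("Rice", 1.75, 10795),
    ("Potatoes", 0.5, 7470),
]


def pounds_per_shilling(pence_per_lb: float) -> float:
    """Weight of food material (lb.) that one shilling buys."""
    return PENCE_PER_SHILLING / pence_per_lb


def pence_per_1000_calories(calories_per_shilling: float) -> float:
    """Cost, in pence, of 1,000 calories of fuel value."""
    return PENCE_PER_SHILLING * 1000 / calories_per_shilling


# List the materials with the cheapest fuel first.
for name, price, calories in sorted(ROWS, key=lambda row: pence_per_1000_calories(row[2])):
    print(f"{name:24s} {pounds_per_shilling(price):6.2f} lb.  "
          f"{pence_per_1000_calories(calories):5.2f}d. per 1,000 calories")
```

Among these rows, bread furnishes fuel most cheaply (just under 1d. per 1,000 calories) and round of beef at 10d. a pound most dearly (over 10d.), which is the kind of contrast the table is meant to bring out.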
When proper attention is given to the needs of the body for food and the relation between cost and nutritive value of food materials, it will be found that with care in the purchase and skill in the preparation of food, considerable control may be had over the expensiveness of a palatable, nutritious and healthful diet.
AUTHORITIES.--COMPOSITION OF FOODS:--König, _Chemie der menschlichen Nahrungs- und Genussmittel_; Atwater and Bryant, "Composition of American Food Materials," Bul. 28, Office of Experiment Stations, U.S. Department of Agriculture. NUTRITION AND DIETETICS:--Armsby, _Principles of Animal Nutrition_; Lusk, _The Science of Nutrition_; Burney Yeo, _Food in Health and Disease_; Munk and Uffelmann, _Die Ernährung des gesunden und kranken Menschen_; Von Leyden, _Ernährungstherapie und Diätetik_; Dujardin-Beaumetz, _Hygiène alimentaire_; Hutchison, _Food and Dietetics_; R. H. Chittenden, _Physiological Economy in Nutrition_ (1904), _Nutrition of Man_ (1907); Atwater, "Chemistry and Economy of Food," Bul. 21, Office of Experiment Stations, U.S. Department of Agriculture. See also other Bulletins of the same office on composition of food, results of dietary studies, metabolism experiments, &c., in the United States. GENERAL METABOLISM:--Voit, _Physiologie des allgemeinen Stoffwechsels und der Ernährung_; Hermann, _Handbuch der Physiologie_, Bd. vi.; Von Noorden, _Pathologie des Stoffwechsels_; Schäfer, _Text-Book of Physiology_, vol. i.; Atwater and Langworthy, "Digest of Metabolism Experiments," Bul. 45, Office of Experiment Stations, U.S. Department of Agriculture. (W. O. A.; R. D. M.)
FOOTNOTES:
[1] The terms applied by different writers to these nitrogenous compounds are conflicting. For instance, the term "proteid" is sometimes used as protein is here used, and sometimes to designate the group here called albuminoids. The classification and terminology here followed are those tentatively recommended by the Association of American Agricultural Colleges and Experiment Stations.
[2] Folin, _Festschrift für Olaf Hammarsten_, iii. (Upsala, 1906).
[3] _Ztschr. Biol._ 30, 73.
[4] In Russian. Cited in United States Department of Agriculture, Office of Experiment Stations, Bul. No. 45, _A Digest of Metabolism Experiments_, by W. O. Atwater and C. F. Langworthy.
[5] _Arch. physiol. norm. et path._ (1894) 4.
[6] U.S. Department of Agriculture, Office of Experiment Stations, Bulletins Nos. 63, 69, 109, 136, 175. For a description of the respiration calorimeter here mentioned see also publication No. 42 of the Carnegie Institution of Washington.
[7] _Ztschr. Biol._ 21 (1885), p. 377.
[8] _Connecticut_ (Storrs) _Agricultural Experiment Station Report_ (1899), 73.
[9] One ounce equals 28.35 grams.
[10] As the chief function of both fats and carbohydrates is to furnish energy, their exact proportion in the diet is of small account. The amount of either may vary widely according to taste, available supply or other conditions, as long as the total amount of both, together with the protein, is sufficient to furnish the required energy.
DIETRICH, CHRISTIAN WILHELM ERNST (1712-1774), German painter, was born at Weimar, where he was brought up early to the profession of art by his father Johann George, then painter of miniatures to the court of the duke. Having been sent to Dresden to perfect himself under the care of Alexander Thiele, he had the good fortune to finish in two hours, at the age of eighteen, a picture which attracted the attention of the king of Saxony. Augustus II. was so pleased with Dietrich's readiness of hand that he gave him means to study abroad, and visit in succession the chief cities of Italy and the Netherlands. There he learnt to copy and to imitate masters of the previous century with a versatility truly surprising. Winckelmann, to whom he had been recommended, did not hesitate to call him the Raphael of landscape. Yet in this branch of his practice he merely imitated Salvator Rosa and Everdingen. He was more successful in aping the style of Rembrandt, and numerous examples of this habit may be found in the galleries of St Petersburg, Vienna and Dresden. At Dresden, indeed, there are pictures acknowledged to be his, bearing the fictitious dates of 1636 and 1638, and the name of Rembrandt. Among Dietrich's cleverest reproductions we may account that of Ostade's manner in the "Itinerant Singers" at the National Gallery. His skill in catching the character of the later masters of Holland is shown in candlelight scenes, such as the "Squirrel and the Peep-Show" at St Petersburg, where we are easily reminded of Godfried Schalcken. Dietrich tried every branch of art except portraits, painting Italian and Dutch views alternately with Scripture scenes and still life. In 1741 he was appointed court painter to Augustus III. at Dresden, with an annual salary of 400 thalers (£60), conditional on the production of four cabinet pictures a year. This condition, no doubt, accounts for the presence of fifty-two of the master's panels and canvases in one of the rooms at the Dresden museum. 
Dietrich, though popular and probably the busiest artist of his time, never produced anything of his own; and his imitations are necessarily inferior to the originals which he affected to copy. His best work is certainly that which he gave to engravings. A collection of these at the British Museum, produced on the general lines of earlier men such as Ostade and Rembrandt, reveals both spirit and skill. Dietrich, after his return from the Peninsula, generally signed himself "Dietericij," and with this signature most of his extant pictures are inscribed. He died at Dresden, after he had successively filled the important appointments of director of the school of painting at the Meissen porcelain factory and professor of the Dresden academy of arts.
DIETRICH OF BERN, the name given in German popular poetry to Theodoric the Great. The legendary history of Dietrich differs so widely from the life of Theodoric that it has been suggested that the two were originally unconnected. Medieval chroniclers, however, repeatedly asserted the identity of Dietrich and Theodoric, although the more critical noted the anachronisms involved in making Ermanaric (d. 376) and Attila (d. 453) contemporary with Theodoric (b. 455). That the legend is based on vague historical reminiscences is proved by the retention of the names of Theodoric (Thiuda-reiks, Dietrich) and his father Theudemir (Dietmar), and by Dietrich's connexion with Bern (Verona) and Raben (Ravenna). Something of the Gothic king's character descended to Dietrich, familiarly called the Berner, the favourite of German medieval saga heroes, although his story did not leave the same mark on later German literature as did that of the Nibelungs. The cycle of songs connected with his name in South Germany is partially preserved in the Heldenbuch (q.v.) in _Dietrich's Flucht_, the _Rabenschlacht_ and _Alpharts Tod_; but it was reserved for an Icelandic author, writing in Norway in the 13th century, to compile, with many romantic additions, a consecutive account of Dietrich. In this Norse prose redaction, known as the _Vilkina Saga_, or more correctly the _Thidrekssaga_, is incorporated much extraneous matter from the Nibelungen and Wayland legends, in fact practically the whole of south German heroic tradition.
There are traces of a form of the Dietrich legend in which he was represented as starting out from Byzantium, in accordance with historical tradition, for his conquest of Italy. But this early disappeared, and was superseded by the existing legend, in which, perhaps by an "epic fusion" with his father Theudemir, he was associated with Attila, and then by an easy transition with Ermanaric. Dietrich was driven from his kingdom of Bern by his uncle Ermanaric. After years of exile at the court of Attila he returned with a Hunnish army to Italy, and defeated Ermanaric in the Rabenschlacht, or battle of Ravenna. Attila's two sons, with Dietrich's brother, fell in the fight, and Dietrich returned to Attila's court to answer for the death of the young princes. This very improbable renunciation of the advantages of his victory suggests that in the original version of the story the Rabenschlacht was a defeat. In the poem of _Ermenrichs Tod_ he is represented as slaying Ermanaric, as in fact Theodoric slew Odoacer. "Otacher" replaces Ermanaric as his adversary in the _Hildebrandslied_, which relates how thirty years after the earlier attempt he reconquered his Lombard kingdom. Dietrich's long residence at Attila's court represents the youth and early manhood of Theodoric spent at the imperial court and fighting in the Balkan peninsula, and, in accordance with epic custom, the period of exile was adorned with war-like exploits, with fights with dragons and giants, most of which had no essential connexion with the cycle. The romantic poems of _König Laurin_, _Sigenot_, _Eckenlied_ and _Virginal_ are based largely on local traditions originally independent of Dietrich. The court of Attila (Etzel) was a ready bridge to the Nibelungen legend. In the final catastrophe he was at length compelled, after steadily holding aloof from the combat, to avenge the slaughter of his Amelungs by the Burgundians, and delivered Hagen bound into the hands of Kriemhild. 
The flaming breath which anger drew from him shows the influence of pure myth, but the tales of his demonic origin and of his being carried off by the devil in the shape of a black horse may safely be put down to clerical hostility to Theodoric's Arianism.
Generally speaking, Dietrich of Bern was the wise and just monarch as opposed to Ermanaric, the typical tyrant of Germanic legend. He was invariably represented as slow of provocation and a friend of peace, but once roused to battle not even Siegfried could withstand his onslaught. But probably Dietrich's fight with Siegfried in Kriemhild's rose garden at Worms is a late addition to the Rosengarten myth. The chief heroes of the Dietrich cycle are his tutor and companion in arms, Hildebrand (see HILDEBRAND, lay of), with his nephews the Wolfings Alphart and Wolfhart; Wittich, who renounced his allegiance to Dietrich and slew the sons of Attila; Heime and Biterolf.
The contents of the poems dealing with the Dietrich cycle are summarized by Uhland in _Schriften zur Geschichte der Dichtung und Sage_ (Stuttgart, 1873). The _Thidrekssaga_ (ed. C. Unger, Christiania, 1853) is translated into German by F. H. v. der Hagen in _Altdeutsche und altnordische Heldensagen_ (vols. i. and ii. 3rd ed., Breslau, 1872). A summary of it forms the concluding chapter of T. Hodgkin's _Theodoric the Goth_ (1891). The variations in the Dietrich legend in the Latin historians, in Old and Middle High German literature, and in the northern saga, can be studied in W. Grimm's _Deutsche Heldensage_ (2nd ed., Berlin, 1867). There is a good account in English in F. E. Sandbach's _Heroic Saga-cycle of Dietrich of Bern_ (1906), forming No. 15 of Alfred Nutt's _Popular Studies in Mythology_, and another in M. Bentinck Smith's translation of Dr O. L. Jiriczek's _Deutsche Heldensage_ (_Northern Legends_, London, 1902). For modern German authorities and commentators see B. Symons, "Deutsche Heldensage" in H. Paul's _Grd. d. german. Phil._ (Strassburg, new ed., 1905); also Goedeke, _Geschichte der deutschen Dichtung_ (i. 241-246).
DIEZ, FRIEDRICH CHRISTIAN (1794-1876), German philologist, was born at Giessen, in Hesse-Darmstadt, on the 15th of March 1794. He was educated first at the gymnasium and then at the university of his native town. There he studied classics under Friedrich Gottlieb Welcker (1784-1868) who had just returned from a two years' residence in Italy to fill the chair of archaeology and Greek literature. It was Welcker who kindled in him a love of Italian poetry, and thus gave the first bent to his genius. In 1813 he joined the Hesse corps as a volunteer and served in the French campaign. Next year he returned to his books, and this short taste of military service was the only break in a long and uneventful life of literary labours. By his parents' desire he applied himself for a short time to law, but a visit to Goethe in 1818 gave a new direction to his studies, and determined his future career. Goethe had been reading Raynouard's _Selections from the Romance Poets_, and advised the young scholar to explore the rich mine of Provençal literature which the French savant had opened up. This advice was eagerly followed, and henceforth Diez devoted himself to Romance literature. He thus became the founder of Romance philology. After supporting himself for some years by private teaching, he removed in 1822 to Bonn, where he held the position of privatdocent. In 1823 he published his first work, _An Introduction to Romance Poetry_; in the following year appeared _The Poetry of the Troubadours_, and in 1829 _The Lives and Works of the Troubadours_. In 1830 he was called to the chair of modern literature. The rest of his life was mainly occupied with the composition of the two great works on which his fame rests, the _Grammar of the Romance Languages_ (1836-1844), and the _Lexicon of the Romance Languages--Italian, Spanish and French_ (1853); in these two works Diez did for the Romance group of languages what Jacob Grimm did for the Teutonic family. 
He died at Bonn on the 29th of May 1876.
The earliest French philologists, such as Perion and Henri Estienne, had sought to discover the origin of French in Greek and even in Hebrew. For more than a century Ménage's _Etymological Dictionary_ held the field without a rival. Considering the time at which it was written (1650), it was a meritorious work, but philology was then in the empirical stage, and many of Ménage's derivations (such as that of "rat" from the Latin "mus," or of "haricot" from "faba") have since become bywords among philologists. A great advance was made by Raynouard, who by his critical editions of the works of the Troubadours, published in the first years of the 19th century, laid the foundations on which Diez afterwards built. The difference between Diez's method and that of his predecessors is well stated by him in the preface to his dictionary. In sum it is the difference between science and guess-work. The scientific method is to follow implicitly the discovered principles and rules of phonology, and not to swerve a foot's breadth from them unless plain, actual exceptions shall justify it; to follow the genius of the language, and by cross-questioning to elicit its secrets; to gauge each letter and estimate the value which attaches to it in each position; and lastly to possess the true philosophic spirit which is prepared to welcome any new fact, though it may modify or upset the most cherished theory. Such is the historical method which Diez pursues in his grammar and dictionary. To collect and arrange facts is, as he tells us, the sole secret of his success, and he adds in other words the famous apophthegm of Newton, "hypotheses non fingo." The introduction to the grammar consists of two parts:--the first discusses the Latin, Greek and Teutonic elements common to the Romance languages; the second treats of the six dialects separately, their origin and the elements peculiar to each. 
The grammar itself is divided into four books, on phonology, on flexion, on the formation of words by composition and derivation, and on syntax.
His dictionary is divided into two parts. The first contains words common to two at least of the three principal groups of Romance:--Italian, Spanish and Portuguese, and Provençal and French. The Italian, as nearest the original, is placed at the head of each article. The second part treats of words peculiar to one group. There is no separate glossary of Wallachian.
Of the introduction to the grammar there is an English translation by C. B. Cayley. The dictionary has been published in a remodelled form for English readers by T. C. Donkin.
DIEZ, a town of Germany, in the Prussian province of Hesse-Nassau, romantically situated in the deep valley of the Lahn, here crossed by an old bridge, 30 m. E. from Coblenz on the railway to Wetzlar. Pop. 4500. It is overlooked by a former castle of the counts of Nassau-Dillenburg, now a prison. Close by, on an eminence above the river, lies the castle of Oranienstein, formerly a Benedictine nunnery and now a cadet school, with beautiful gardens. There are a Roman Catholic and two Evangelical churches. The new part of the town is well built and contains numerous pretty villa residences. In addition to extensive iron-works there are sawmills and tanneries. In the vicinity are Fachingen, celebrated for its mineral waters, and the majestic castle of Schaumburg belonging to the prince of Waldeck-Pyrmont.
DIFFERENCES, CALCULUS OF (_Theory of Finite Differences_), that branch of mathematics which deals with the successive differences of the terms of a series.
1. The most important of the cases to which mathematical methods can be applied are those in which the terms of the series are the values, taken at stated intervals (regular or irregular), of a continuously varying quantity. In these cases the formulae of finite differences enable certain quantities, whose exact value depends on the law of variation (i.e. the law which governs the relative magnitude of these terms) to be calculated, often with great accuracy, from the given terms of the series, without explicit reference to the law of variation itself. The methods used may be extended to cases where the series is a double series (series of double entry), i.e. where the value of each term depends on the values of a pair of other quantities.
2. The _first differences_ of a series are obtained by subtracting from each term the term immediately preceding it. If these are treated as terms of a new series, the first differences of this series are the _second differences_ of the original series; and so on. The successive differences are also called _differences of the first, second, ... order_. The differences of successive orders are most conveniently arranged in successive columns of a table thus:--
+-------+-----------+--------------+-------------------+------------------------+
| Term. | 1st Diff. | 2nd Diff.    | 3rd Diff.         | 4th Diff.              |
+-------+-----------+--------------+-------------------+------------------------+
| a     |           |              |                   |                        |
|       | b - a     |              |                   |                        |
| b     |           | c - 2b + a   |                   |                        |
|       | c - b     |              | d - 3c + 3b - a   |                        |
| c     |           | d - 2c + b   |                   | e - 4d + 6c - 4b + a   |
|       | d - c     |              | e - 3d + 3c - b   |                        |
| d     |           | e - 2d + c   |                   |                        |
|       | e - d     |              |                   |                        |
| e     |           |              |                   |                        |
+-------+-----------+--------------+-------------------+------------------------+
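The construction of such a table can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `difference_table` is our own, and the columns are returned as plain lists rather than in the staggered layout shown above.

```python
def difference_table(terms):
    """Return [terms, 1st diffs, 2nd diffs, ...] for the series `terms`.

    Each first difference is a term minus the term immediately preceding it;
    each later column is formed in the same way from the column before it.
    """
    columns = [list(terms)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        columns.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return columns


# The cubes 1, 8, 27, 64, 125 (the series used in § 6 below).
for order, column in enumerate(difference_table([1, 8, 27, 64, 125])):
    print(order, column)
```

With five terms a, b, c, d, e the last column holds the single entry e - 4d + 6c - 4b + a, which for the cubes is 0.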
_Algebra of Differences and Sums._
3. The formal relations between the terms of the series and the differences may be seen by comparing the arrangements (A) and (B) in fig. 1. In (A) the various terms and differences are the same as in § 2, but placed differently. In (B) we take a new series of terms α, β, γ, δ, commencing with the same first term as (A), so that α = a, and take the successive sums of pairs of terms, instead of the successive differences, but place them to the left instead of to the right. It will be seen, in the first place, that the successive terms in (A), reading downwards to the right, and the successive terms in (B), reading downwards to the left, each consist of a series of terms whose coefficients follow the binomial law; i.e. the coefficients in b - a, c - 2b + a, d - 3c + 3b - a, ... and in α + β, α + 2β + γ, α + 3β + 3γ + δ, ... are respectively the same as in y - x, (y - x)², (y - x)³, ... and in x + y, (x + y)², (x + y)³, .... In the second place, it will be seen that the relations between the various terms in (A) are identical with the relations between the similarly placed terms in (B); e.g. β + γ is the difference of α + 2β + γ and α + β, just as c - b is the difference of c and b; and d - c is the sum of c - b and d - 2c + b, just as β + 2γ + δ is the sum of β + γ and γ + δ. Hence if we take β, γ, δ, ... of (B) as being the same as b - a, c - 2b + a, d - 3c + 3b - a, ... of (A), all corresponding terms in the two diagrams will be the same.
Thus we obtain the two principal formulae connecting terms and differences. If we provisionally describe b - a, c - 2b + a, ... as the first, second, ... differences of the particular term a (§ 7), then (i.) the nth difference of a is
$$l - nk + \dots + (-1)^{n-2}\,\frac{n(n-1)}{1\cdot 2}\,c + (-1)^{n-1}\,nb + (-1)^{n}\,a,$$
where l, k, ... are the (n + 1)th, nth, ... terms of the series a, b, c, ...; the coefficients being those of the terms in the expansion of (y - x)^n: and (ii.) the (n + 1)th term of the series, i.e. the nth term after a, is
$$a + n\beta + \frac{n(n-1)}{1\cdot 2}\,\gamma + \dots$$
where β, γ, ... are the first, second, ... differences of a; the coefficients being those of the terms in the expansion of (x + y)^n.
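Both formulae can be checked numerically. The following is a minimal Python sketch (the function names are our own, chosen for illustration); `math.comb` supplies the binomial coefficients n, n(n - 1)/(1·2), and so on.

```python
from math import comb


def nth_difference_of_first(terms, n):
    """Formula (i.): the nth difference of the first term a.

    terms[k] is the term k places after a; the coefficients are those of
    (y - x)^n, so the sign is + for the nth term after a and alternates back to a.
    """
    return sum((-1) ** (n - k) * comb(n, k) * terms[k] for k in range(n + 1))


def term_from_differences(diffs, n):
    """Formula (ii.): the (n + 1)th term from diffs = [a, beta, gamma, ...].

    Differences beyond the end of `diffs` are taken to be zero.
    """
    return sum(comb(n, k) * d for k, d in enumerate(diffs[: n + 1]))


squares = [1, 4, 9, 16, 25]
print(nth_difference_of_first(squares, 2))    # prints 2
print(term_from_differences([1, 3, 2], 9))    # prints 100, the 10th square
```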
4. Now suppose we treat the terms a, b, c, ... as being themselves the first differences of another series. Then, if the first term of this series is N, the subsequent terms are N + a, N + a + b, N + a + b + c, ...; i.e. the difference between the (n + 1)th term and the first term is the sum of the first n terms of the original series. The term N, in the diagram (A), will come above and to the left of a; and we see, by (ii.) of § 3, that the sum of the first n terms of the original series is
$$\Big(N + na + \frac{n(n-1)}{1\cdot 2}\,\beta + \dots\Big) - N = na + \frac{n(n-1)}{1\cdot 2}\,\beta + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,\gamma + \dots$$
5. As an example, take the arithmetical series
a, a + p, a + 2p, ...
The first differences are p, p, p, ... and the differences of any higher order are zero. Hence, by (ii.) of § 3, the (n + 1)th term is a + np, and, by § 4, the sum of the first n terms is na + ½n(n - 1)p = ½n{2a + (n - 1)p}.
6. As another example, take the series 1, 8, 27, ..., the terms of which are the cubes of 1, 2, 3, .... The first, second and third differences of the first term are 7, 12 and 6, and it may be shown (§ 14 (i.)) that all differences of a higher order are zero. Hence the sum of the first n terms is
$$n + 7\,\frac{n(n-1)}{1\cdot 2} + 12\,\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3} + 6\,\frac{n(n-1)(n-2)(n-3)}{1\cdot 2\cdot 3\cdot 4} = \tfrac14 n^4 + \tfrac12 n^3 + \tfrac14 n^2 = \big\{\tfrac12 n(n+1)\big\}^2.$$
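The summation rule of § 4 reduces to binomial coefficients multiplied by the differences of the first term. A short Python sketch (our own helper, not a standard routine) that reproduces the arithmetical series of § 5 and the sum of cubes above:

```python
from math import comb


def sum_from_differences(diffs, n):
    """Sum of the first n terms, given diffs = [a, beta, gamma, ...]:
    n*a + C(n, 2)*beta + C(n, 3)*gamma + ...  (math.comb gives 0 when k > n)."""
    return sum(comb(n, k + 1) * d for k, d in enumerate(diffs))


# § 5: the arithmetical series a, a + p, ... has differences [a, p].
a, p, n = 3, 2, 5
print(sum_from_differences([a, p], n), n * (2 * a + (n - 1) * p) // 2)

# § 6: the cubes, with a = 1 and differences 7, 12, 6 of the first term.
print([sum_from_differences([1, 7, 12, 6], n) for n in range(1, 6)])
```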
7. In § 3 we have described b - a, c - 2b + a, ... as the first, second, ... differences of a. This ascription of the differences to particular terms of the series is quite arbitrary. If we read the differences in the table of § 2 upwards to the right instead of downwards to the right, we might describe e - d, e - 2d + c, ... as the first, second, ... differences of e. On the other hand, the term of greatest weight in c - 2b + a, i.e. the term which has the numerically greatest coefficient, is b, and therefore c - 2b + a might properly be regarded as the second difference of b, and similarly e - 4d + 6c - 4b + a might be regarded as the fourth difference of c. These three methods of regarding the differences lead to three different systems of notation, which are described in §§ 9, 10 and 11.
_Notation of Differences and Sums._
8. It is convenient to denote the terms a, b, c, ... of the series by u_0, u_1, u_2, u_3, .... If we merely have the terms of the series, u_n may be regarded as meaning the (n + 1)th term. Usually, however, the terms are the values of a quantity u, which is a function of another quantity x, and the values of x to which a, b, c, ... correspond proceed by a constant difference h. If x_0 and u_0 are a pair of corresponding values of x and u, and if any other value x_0 + mh of x and the corresponding value of u are denoted by x_m and u_m, then the terms of the series will be ... u_(n-2), u_(n-1), u_n, u_(n+1), u_(n+2), ..., corresponding to values of x denoted by ... x_(n-2), x_(n-1), x_n, x_(n+1), x_(n+2), ....
9. In the _advancing-difference notation_ u_(n+1) - u_n is denoted by Δu_n. The differences Δu_0, Δu_1, Δu_2, ... may then be regarded as values of a function Δu corresponding to values of x proceeding by the constant difference h; and therefore Δu_(n+1) - Δu_n is denoted by ΔΔu_n, or, more briefly, Δ²u_n; and so on. Hence the table of differences in § 2, with the corresponding values of x and of u placed opposite each other in the ordinary manner of mathematical tables, becomes
+---------+---------+------------+-------------+-------------+-----------------+
| x       | u       | 1st Diff.  | 2nd Diff.   | 3rd Diff.   | 4th Diff.       |
+---------+---------+------------+-------------+-------------+-----------------+
| .       | .       |            |             |             |                 |
| .       | .       |            |             |             |                 |
| x_(n-2) | u_(n-2) |            | Δ²u_(n-3)   |             | Δ⁴u_(n-4) ...   |
|         |         | Δu_(n-2)   |             | Δ³u_(n-3)   |                 |
| x_(n-1) | u_(n-1) |            | Δ²u_(n-2)   |             | Δ⁴u_(n-3) ...   |
|         |         | Δu_(n-1)   |             | Δ³u_(n-2)   |                 |
| x_n     | u_n     |            | Δ²u_(n-1)   |             | Δ⁴u_(n-2) ...   |
|         |         | Δu_n       |             | Δ³u_(n-1)   |                 |
| x_(n+1) | u_(n+1) |            | Δ²u_n       |             | Δ⁴u_(n-1) ...   |
|         |         | Δu_(n+1)   |             | Δ³u_n       |                 |
| x_(n+2) | u_(n+2) |            | Δ²u_(n+1)   |             | Δ⁴u_n ...       |
| .       | .       |            |             |             |                 |
| .       | .       |            |             |             |                 |
+---------+---------+------------+-------------+-------------+-----------------+
The terms of the series of which ... u_(n-1), u_n, u_(n+1), ... are the first differences are denoted by Σu, with proper suffixes, so that this series is ... Σu_(n-1), Σu_n, Σu_(n+1), .... The suffixes are chosen so that we may have ΔΣu_n = u_n, whatever n may be; and therefore (§ 4) Σu_n may be regarded as being the sum of the terms of the series up to and including u_(n-1). Thus if we write Σu_(n-1) = C + u_(n-2), where C is any constant, we shall have
$$\Sigma u_n = \Sigma u_{n-1} + \Delta\Sigma u_{n-1} = C + u_{n-2} + u_{n-1}, \qquad \Sigma u_{n+1} = C + u_{n-2} + u_{n-1} + u_n,$$
and so on. This is true whatever C may be, so that the knowledge of ... u_(n-1), u_n, ... gives us no knowledge of the exact value of Σu_n; in other words, C is an arbitrary constant, the value of which must be supposed to be the same throughout any operations in which we are concerned with values of Σu corresponding to different suffixes.
There is another symbol E, used in conjunction with u to denote the next term in the series. Thus Eu_n means u_(n+1), so that Eu_n = u_n + Δu_n.
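The symbols Δ, Σ and E can be modelled as operations on a finite list of values u_0, u_1, .... In this illustrative Python sketch (all names are our own), the arbitrary constant C of Σ is passed in explicitly and, by our choice of indexing, equals Σu_0; whatever C is chosen, ΔΣu_n = u_n.

```python
def delta(u):
    """Advancing differences: (Delta u)_n = u_(n+1) - u_n."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def sigma(u, C=0):
    """Sigma u_n = C + u_0 + ... + u_(n-1): the sum of the terms up to and
    including u_(n-1), plus the arbitrary constant C (so Sigma u_0 = C)."""
    out = [C]
    for term in u:
        out.append(out[-1] + term)
    return out


def shift(u):
    """E u_n = u_(n+1): drop the first term."""
    return u[1:]


u = [1, 4, 9, 16, 25]
print(delta(sigma(u, C=7)))                   # recovers u, for any C
print(shift(u))                               # E u_n
print([a + d for a, d in zip(u, delta(u))])   # u_n + Delta u_n, the same list
```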
10. Corresponding to the advancing-difference notation there is a _receding-difference_ notation, in which u_(n+1) - u_n is regarded as a difference of u_(n+1), and may be denoted by Δ'u_(n+1); similarly u_(n+1) - 2u_n + u_(n-1) may be denoted by Δ'²u_(n+1). This notation is only required for certain special purposes, and the usage is not settled (§ 19 (ii.)).
11. The _central-difference_ notation depends on treating u_(n+1) - 2u_n + u_(n-1) as the second difference of u_n, and therefore as corresponding to the value x_n; but there is no settled system of notation. The following seems to be the most convenient. Since u_n is a function of x_n, and the second difference u_(n+2) - 2u_(n+1) + u_n is a function of x_(n+1), the first difference u_(n+1) - u_n must be regarded as a function of x_(n+½), i.e. of ½{x_n + x_(n+1)}. We therefore write u_(n+1) - u_n = δu_(n+½), and each difference in the table in § 9 will have the same suffix as the value of x in the same horizontal line; or, if the difference is of an odd order, its suffix will be the mean of those of the two nearest values of x. This is shown in the table below.
In this notation, instead of using the symbol E, we use a symbol μ to denote the mean of two consecutive values of u, or of two consecutive differences of the same order, the suffixes being assigned on the same principle as in the case of the differences. Thus
$$\mu u_{n+\frac12} = \tfrac12\{u_n + u_{n+1}\}, \qquad \mu\delta u_n = \tfrac12\{\delta u_{n-\frac12} + \delta u_{n+\frac12}\}, \quad \&c.$$
If we take the means of the differences of odd order immediately above and below the horizontal line through any value of x, these means, with the differences of even order in that line, constitute the _central differences_ of the corresponding value of u. Thus the table of central differences is as follows, the values obtained as means being placed in brackets to distinguish them from the actual differences:--
+---------+---------+---------------+-------------+----------------+-----------------+
| x       | u       | 1st Diff.     | 2nd Diff.   | 3rd Diff.      | 4th Diff.       |
+---------+---------+---------------+-------------+----------------+-----------------+
| .       | .       |               |             |                |                 |
| .       | .       |               |             |                |                 |
| x_(n-2) | u_(n-2) | (μδu_(n-2))   | δ²u_(n-2)   | (μδ³u_(n-2))   | δ⁴u_(n-2) ...   |
|         |         | δu_(n-3/2)    |             | δ³u_(n-3/2)    |                 |
| x_(n-1) | u_(n-1) | (μδu_(n-1))   | δ²u_(n-1)   | (μδ³u_(n-1))   | δ⁴u_(n-1) ...   |
|         |         | δu_(n-½)      |             | δ³u_(n-½)      |                 |
| x_n     | u_n     | (μδu_n)       | δ²u_n       | (μδ³u_n)       | δ⁴u_n ...       |
|         |         | δu_(n+½)      |             | δ³u_(n+½)      |                 |
| x_(n+1) | u_(n+1) | (μδu_(n+1))   | δ²u_(n+1)   | (μδ³u_(n+1))   | δ⁴u_(n+1) ...   |
|         |         | δu_(n+3/2)    |             | δ³u_(n+3/2)    |                 |
| x_(n+2) | u_(n+2) | (μδu_(n+2))   | δ²u_(n+2)   | (μδ³u_(n+2))   | δ⁴u_(n+2) ...   |
| .       | .       |               |             |                |                 |
| .       | .       |               |             |                |                 |
+---------+---------+---------------+-------------+----------------+-----------------+
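The entries of this table can be computed directly from equally spaced values. A Python sketch, assuming a plain list u with u[k] = u_k (the function name and dictionary keys are our own): odd-order differences are averaged by μ, and even-order ones are taken on the line of u_n.

```python
def central_differences(u, n):
    """Central differences of u_n from a list u, needing u[n-2] .. u[n+2].

    delta u_(k+1/2) = u_(k+1) - u_k; the odd-order entries on the line of x_n
    are the means mu of the two nearest differences (the bracketed entries).
    """
    def d1(k):          # delta u_(k+1/2)
        return u[k + 1] - u[k]

    def d2(k):          # delta^2 u_k
        return u[k + 1] - 2 * u[k] + u[k - 1]

    def d3(k):          # delta^3 u_(k+1/2)
        return d2(k + 1) - d2(k)

    return {
        "mu delta": (d1(n - 1) + d1(n)) / 2,
        "delta^2": d2(n),
        "mu delta^3": (d3(n - 1) + d3(n)) / 2,
        "delta^4": d2(n + 1) - 2 * d2(n) + d2(n - 1),
    }


# u = x^3 tabulated at x = 0, 1, 2, 3, 4, taking n = 2 (x = 2).
print(central_differences([0, 1, 8, 27, 64], 2))
```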
Similarly, by taking the means of consecutive values of u and also of consecutive differences of even order, we should get a series of terms and differences central to the intervals x_(n-2) to x_(n-1), x_(n-1) to x_n, ....
The terms of the series of which the values of u are the first differences are denoted by σu, with suffixes on the same principle; the suffixes being chosen so that δσu_n shall be equal to u_n. Thus, if
$$\sigma u_{n-\frac32} = C + u_{n-2},$$
then
$$\sigma u_{n-\frac12} = C + u_{n-2} + u_{n-1}, \qquad \sigma u_{n+\frac12} = C + u_{n-2} + u_{n-1} + u_n, \quad \&c.,$$
and also
$$\mu\sigma u_{n-1} = C + u_{n-2} + \tfrac12 u_{n-1}, \qquad \mu\sigma u_n = C + u_{n-2} + u_{n-1} + \tfrac12 u_n, \quad \&c.,$$
C being an arbitrary constant which must remain the same throughout any series of operations.
_Operators and Symbolic Methods._
12. There are two further stages in the use of the symbols Δ, Σ, δ, σ, &c., which are not essential for elementary treatment but lead to powerful methods of deduction.
(i.) Instead of treating Δu as a function of x, so that Δu_n means (Δu)_n, we may regard Δ as denoting an _operation_ performed on u, and take Δu_n as meaning Δ·u_n. This applies to the other symbols E, δ, &c., whether taken simply or in combination. Thus ΔEu_n means that we first replace u_n by u_(n+1), and then replace this by u_(n+2) - u_(n+1).
(ii.) The operations Δ, E, δ and μ, whether performed separately or in combination, or in combination also with numerical multipliers and with the operation of differentiation denoted by D (= d/dx), follow the ordinary rules of algebra: e.g. Δ(u_n + v_n) = Δu_n + Δv_n, ΔDu_n = DΔu_n, &c. Hence the symbols can be separated from the functions on which the operations are performed, and treated as if they were algebraical quantities. For instance, we have
$$E\cdot u_n = u_{n+1} = u_n + \Delta u_n = 1\cdot u_n + \Delta\cdot u_n,$$
so that we may write E = 1 + Δ, or Δ = E - 1. The first of these is nothing more than a statement, in concise form, that if we take two quantities, subtract the first from the second, and add the result to the first, we get the second. This seems almost a truism. But, if we deduce E^n = (1 + Δ)^n, Δ^n = (E - 1)^n, and expand by the binomial theorem and then operate on u_0, we get the general formulae
$$u_n = u_0 + n\,\Delta u_0 + \frac{n(n-1)}{1\cdot 2}\,\Delta^2 u_0 + \dots + \Delta^n u_0,$$
$$\Delta^n u_0 = u_n - n\,u_{n-1} + \frac{n(n-1)}{1\cdot 2}\,u_{n-2} - \dots + (-1)^n u_0,$$
which are identical with the formulae in (ii.) and (i.) of § 3.
(iii.) What has been said under (ii.) applies, with certain reservations, to the operations Σ and σ, and to the operation which represents integration. The latter is sometimes denoted by D^(-1); and, since ΔΣu_n = u_n and δσu_n = u_n, we might similarly replace Σ and σ by Δ^(-1) and δ^(-1). These symbols can be combined with Δ, E, &c. according to the ordinary laws of algebra, provided that proper account is taken of the arbitrary constants introduced by the operations D^(-1), Δ^(-1), δ^(-1).
_Applications to Algebraical Series._
13. _Summation of Series._--If u_r denotes the (r + 1)th term of a series, and if v_r is a function of r such that Δv_r = u_r for all integral values of r, then the sum of the terms u_m, u_(m+1), ..., u_n is v_(n+1) - v_m. Thus the sum of a number of terms of a series may often be found by inspection, in the same kind of way that an integral is found.
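Summation by inspection amounts to finding any v with Δv_r = u_r and taking v_(n+1) - v_m. A minimal Python sketch with exact fractions; the example u_r = 1/{r(r + 1)}, for which v_r = -1/r serves, is our own choice.

```python
from fractions import Fraction


def sum_by_inspection(v, m, n):
    """u_m + u_(m+1) + ... + u_n, given any v with v(r + 1) - v(r) = u(r)."""
    return v(n + 1) - v(m)


def u(r):
    return Fraction(1, r * (r + 1))


def v(r):
    # Delta v_r = -1/(r + 1) + 1/r = 1/(r(r + 1)) = u_r
    return Fraction(-1, r)


print(sum_by_inspection(v, 1, 9))        # 9/10
print(sum(u(r) for r in range(1, 10)))   # direct addition gives the same value
```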
14. _Rational Integral Functions._--(i.) If u_r is a rational integral function of r of degree p, then Δu_r is a rational integral function of r of degree p - 1.
(ii.) A particular case is that of a _factorial_, i.e. a product of the form (r+a+1) (r+a+2) ... (r+b), each factor exceeding the preceding factor by 1. We have
$$\Delta\cdot(r+a+1)(r+a+2)\cdots(r+b) = (b-a)\cdot(r+a+2)\cdots(r+b),$$
whence, changing a into a-1,
$$\Sigma\,(r+a+1)(r+a+2)\cdots(r+b) = \text{const.} + \frac{(r+a)(r+a+1)\cdots(r+b)}{b-a+1}.$$
A similar method can be applied to the series whose (r + 1)th term is of the form 1/{(r + a + 1)(r + a + 2) ... (r + b)}.
(iii.) Any rational integral function can be converted into the sum of a number of factorials; and thus the sum of a series of which such a function is the general term can be found. For example, it may be shown in this way that the sum of the pth powers of the first n natural numbers is a rational integral function of n of degree p + 1, the coefficient of n^(p+1) being 1/(p + 1).
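Parts (ii.) and (iii.) can be combined into a small exact calculation. In this Python sketch (helper names are our own), `factorial(r, a, b)` is the product (r + a + 1) ... (r + b), `factorial_sum` is the value of Σ given in (ii.) with the constant taken as 0, and `power_sum` writes r^p as a combination of the descending factorials r(r - 1) ... (r - k + 1) by the formula of § 3 (ii.), then sums each of them by (ii.) and § 13.

```python
import math
from fractions import Fraction


def factorial(r, a, b):
    """(r+a+1)(r+a+2)...(r+b): b - a factors, each exceeding the last by 1."""
    return math.prod(r + k for k in range(a + 1, b + 1))


def factorial_sum(r, a, b):
    """Sigma (r+a+1)...(r+b) = (r+a)(r+a+1)...(r+b)/(b-a+1), constant = 0."""
    return Fraction(factorial(r, a - 1, b), b - a + 1)


def power_sum(p, n):
    """1^p + 2^p + ... + n^p, exactly."""
    col = [r ** p for r in range(p + 1)]             # u_r = r^p for r = 0..p
    total = Fraction(0)
    for k in range(p + 1):
        coeff = Fraction(col[0], math.factorial(k))  # Delta^k u_0 / k!
        # r(r-1)...(r-k+1) is the factorial with a = -k, b = 0
        total += coeff * (factorial_sum(n + 1, -k, 0) - factorial_sum(1, -k, 0))
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
    return total


print(power_sum(2, 10), power_sum(3, 10))            # 385 3025
```

The statement about the leading coefficient can be checked with the same machinery: the (p + 1)th differences of power_sum(p, n) with respect to n all equal p!, which for a function of degree p + 1 means a leading coefficient of p!/(p + 1)! = 1/(p + 1).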
15. _Difference-equations._--The summation of the series ... + u_(n-2) + u_(n-1) + u_n is a solution of the _difference-equation_ Δv_n = u_(n+1), which may also be written (E - 1)v_n = u_(n+1). This is a simple form of difference-equation. Several forms have been investigated; a simple form, more general than the above, is the _linear equation_ with _constant coefficients_--
$$v_{n+m} + a_1 v_{n+m-1} + a_2 v_{n+m-2} + \dots + a_m v_n = N,$$
where a_1, a_2, ..., a_m are constants, and N is a given function of n. This may be written
$$(E^m + a_1E^{m-1} + \dots + a_m)\,v_n = N$$
or
$$(E - p_1)(E - p_2)\cdots(E - p_m)\,v_n = N.$$
The solution, if p_1, p_2, ..., p_m are all different, is v_n = C_1p_1^n + C_2p_2^n + ... + C_mp_m^n + V_n, where C_1, C_2, ... are constants, and v_n = V_n is any one solution of the equation. The method of finding a value for V_n depends on the form of N. Certain modifications are required when two or more of the p's are equal.
It should be observed, in all cases of this kind, that, in describing C_1, C_2, ... as "constants," it is meant that the value of any one, as C_1, is the same for all values of n occurring in the series. A "constant" may, however, be a periodic function of n.
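For a concrete check of this form of solution, take the hypothetical equation v_(n+2) - 5v_(n+1) + 6v_n = 4, i.e. (E - 2)(E - 3)v_n = 4, so that p_1 = 2 and p_2 = 3; since 1 - 5 + 6 = 2, the constant V_n = 2 is one solution. The Python sketch below (our own example, not a general solver) fits C_1, C_2 to two starting values and compares C_1·2^n + C_2·3^n + 2 with the values obtained by running the equation forward.

```python
from fractions import Fraction

P1, P2 = 2, 3                   # roots of E^2 - 5E + 6
A1, A2, N = -5, 6, 4            # v_(n+2) + A1 v_(n+1) + A2 v_n = N
V = Fraction(N, 1 + A1 + A2)    # constant particular solution, here 2


def by_recurrence(v0, v1, count):
    """Generate v_0, v_1, ... directly from the difference-equation."""
    v = [v0, v1]
    while len(v) < count:
        v.append(N - A1 * v[-1] - A2 * v[-2])
    return v


def closed_form(v0, v1, count):
    """C1 * P1^n + C2 * P2^n + V, with C1, C2 fitted to v_0 and v_1."""
    b0, b1 = v0 - V, v1 - V
    C1 = (b0 * P2 - b1) / (P2 - P1)
    C2 = (b1 - b0 * P1) / (P2 - P1)
    return [C1 * P1 ** n + C2 * P2 ** n + V for n in range(count)]


print(by_recurrence(0, 1, 8))
print(closed_form(0, 1, 8) == by_recurrence(0, 1, 8))
```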
_Applications to Continuous Functions._
16. The cases of greatest practical importance are those in which u is a continuous function of x. The terms u_1, u_2, ... of the series then represent the successive values of u corresponding to x = x_1, x_2, .... The important applications of the theory in these cases are to (i.) relations between differences and differential coefficients, (ii.) interpolation, or the determination of intermediate values of u, and (iii.) relations between sums and integrals.
17. Starting from any pair of values x_0 and u_0, we may suppose the interval h from x_0 to x_1 to be divided into q equal portions. If we suppose the corresponding values of u to be obtained, and their differences taken, the successive advancing differences of u_0 (at the interval h/q) being denoted by Δ_q u_0, Δ_q²u_0, ..., we have (§ 3 (ii.))
$$u_1 = u_0 + q\,\Delta_q u_0 + \frac{q(q-1)}{1\cdot 2}\,\Delta_q^2 u_0 + \dots.$$
When q is made indefinitely great, this (writing f(x) for u) becomes Taylor's Theorem (see INFINITESIMAL CALCULUS)
$$f(x+h) = f(x) + hf'(x) + \frac{h^2}{1\cdot 2}f''(x) + \dots,$$
which, expressed in terms of operators, is
$$E = 1 + hD + \frac{h^2}{1\cdot 2}D^2 + \frac{h^3}{1\cdot 2\cdot 3}D^3 + \dots = e^{hD}.$$
This gives the relation between Δ and D, since Δ = E - 1. Also we have
$$u_2 = u_0 + 2q\,\Delta_q u_0 + \frac{2q(2q-1)}{1\cdot 2}\,\Delta_q^2 u_0 + \dots,$$
$$u_3 = u_0 + 3q\,\Delta_q u_0 + \frac{3q(3q-1)}{1\cdot 2}\,\Delta_q^2 u_0 + \dots, \qquad \dots$$
and, if p is any integer,
$$u_{p/q} = u_0 + p\,\Delta_q u_0 + \frac{p(p-1)}{1\cdot 2}\,\Delta_q^2 u_0 + \dots.$$
From these equations u_(p/q) could be expressed in terms of u_0, u_1, u_2, ...; this is a particular case of interpolation (q.v.).
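Letting n take a fractional value t in the advancing-difference formula of § 3 (ii.) gives an interpolated value directly. The following is a Python sketch with exact fractions (the function names are our own); for a cubic the series terminates, so the interpolation is exact.

```python
from fractions import Fraction


def gen_binomial(t, k):
    """t(t - 1)...(t - k + 1)/(1·2·...·k) for any t, fractional or not."""
    out = Fraction(1)
    for i in range(k):
        out = out * (t - i) / (i + 1)
    return out


def interpolate(values, t):
    """Estimate u at x_0 + t*h from u_0, u_1, ... tabulated at interval h,
    using u_t = u_0 + t*Delta u_0 + t(t-1)/(1·2)*Delta^2 u_0 + ..."""
    total, col, k = Fraction(0), list(values), 0
    while col:
        total += gen_binomial(t, k) * col[0]
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
        k += 1
    return total


cubes = [0, 1, 8, 27]                     # x^3 at x = 0, 1, 2, 3 (h = 1)
print(interpolate(cubes, Fraction(1, 2)), interpolate(cubes, Fraction(5, 2)))   # 1/8 125/8
```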
18. _Differences and Differential Coefficients._--The various formulae are most quickly obtained by symbolical methods, i.e. by dealing with the operators Δ, E, D, ... as if they were algebraical quantities. Thus the relation E = e^(hD) (§ 17) gives
$$hD = \log_e(1 + \Delta) = \Delta - \tfrac12\Delta^2 + \tfrac13\Delta^3 - \dots,$$
or
$$h\Big(\frac{du}{dx}\Big)_0 = \Delta u_0 - \tfrac12\Delta^2 u_0 + \tfrac13\Delta^3 u_0 - \dots.$$
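The series for hD can be applied to a short table of values to estimate a derivative at its first entry. The following is a Python sketch (our own function), which simply truncates the series at the highest difference the table provides.

```python
import math


def derivative_from_advancing(values, h):
    """(du/dx)_0 from u_0, u_1, ... at interval h:
    (1/h)(Delta u_0 - Delta^2 u_0/2 + Delta^3 u_0/3 - ...)."""
    total, col, k = 0.0, list(values), 0
    while len(col) > 1:
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
        k += 1
        total += (-1) ** (k + 1) * col[0] / k
    return total / h


h = 0.1
table = [math.exp(k * h) for k in range(7)]   # u = e^x, whose derivative at 0 is 1
print(derivative_from_advancing(table, h))
```

For a rational integral function the differences eventually vanish, so the truncated series is exact; for e^x the error here is of the order of the first omitted term.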
The formulae connecting central differences with differential coefficients are based on the relations
$$\mu = \cosh\tfrac12 hD = \tfrac12\big(e^{\frac12 hD} + e^{-\frac12 hD}\big), \qquad \delta = 2\sinh\tfrac12 hD = e^{\frac12 hD} - e^{-\frac12 hD},$$
and may be grouped as follows:--
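$$\begin{aligned}
u_0 &= u_0\\
\mu\delta u_0 &= \big(hD + \tfrac16 h^3D^3 + \tfrac1{120} h^5D^5 + \dots\big)u_0\\
\delta^2 u_0 &= \big(h^2D^2 + \tfrac1{12} h^4D^4 + \tfrac1{360} h^6D^6 + \dots\big)u_0\\
\mu\delta^3 u_0 &= \big(h^3D^3 + \tfrac14 h^5D^5 + \dots\big)u_0\\
\delta^4 u_0 &= \big(h^4D^4 + \tfrac16 h^6D^6 + \dots\big)u_0\\
&\ \,\vdots
\end{aligned}$$
$$\begin{aligned}
\mu u_{\frac12} &= \big(1 + \tfrac18 h^2D^2 + \tfrac1{384} h^4D^4 + \tfrac1{46080} h^6D^6 + \dots\big)u_{\frac12}\\
\delta u_{\frac12} &= \big(hD + \tfrac1{24} h^3D^3 + \tfrac1{1920} h^5D^5 + \dots\big)u_{\frac12}\\
\mu\delta^2 u_{\frac12} &= \big(h^2D^2 + \tfrac5{24} h^4D^4 + \tfrac{91}{5760} h^6D^6 + \dots\big)u_{\frac12}\\
\delta^3 u_{\frac12} &= \big(h^3D^3 + \tfrac18 h^5D^5 + \dots\big)u_{\frac12}\\
\mu\delta^4 u_{\frac12} &= \big(h^4D^4 + \tfrac7{24} h^6D^6 + \dots\big)u_{\frac12}\\
&\ \,\vdots
\end{aligned}$$
$$\begin{aligned}
u_0 &= u_0\\
hDu_0 &= \big(\mu\delta - \tfrac16\mu\delta^3 + \tfrac1{30}\mu\delta^5 - \dots\big)u_0\\
h^2D^2u_0 &= \big(\delta^2 - \tfrac1{12}\delta^4 + \tfrac1{90}\delta^6 - \dots\big)u_0\\
h^3D^3u_0 &= \big(\mu\delta^3 - \tfrac14\mu\delta^5 + \dots\big)u_0\\
h^4D^4u_0 &= \big(\delta^4 - \tfrac16\delta^6 + \dots\big)u_0\\
&\ \,\vdots
\end{aligned}$$
$$\begin{aligned}
u_{\frac12} &= \big(\mu - \tfrac18\mu\delta^2 + \tfrac3{128}\mu\delta^4 - \tfrac5{1024}\mu\delta^6 + \dots\big)u_{\frac12}\\
hDu_{\frac12} &= \big(\delta - \tfrac1{24}\delta^3 + \tfrac3{640}\delta^5 - \dots\big)u_{\frac12}\\
h^2D^2u_{\frac12} &= \big(\mu\delta^2 - \tfrac5{24}\mu\delta^4 + \tfrac{259}{5760}\mu\delta^6 - \dots\big)u_{\frac12}\\
h^3D^3u_{\frac12} &= \big(\delta^3 - \tfrac18\delta^5 + \dots\big)u_{\frac12}\\
h^4D^4u_{\frac12} &= \big(\mu\delta^4 - \tfrac7{24}\mu\delta^6 + \dots\big)u_{\frac12}\\
&\ \,\vdots
\end{aligned}$$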
When u is a rational integral function of x, each of the above series is a terminating series. In other cases the series will be an infinite one, and may be divergent; but it may be used for purposes of approximation up to a certain point, and there will be a "remainder," the limits of whose magnitude will be determinate.
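As an illustration of the formulae for hDu_0 and h²D²u_0 in terms of central differences, the following Python sketch (our own function; it assumes f can be evaluated at x_0 + kh for k = -3, ..., 3) keeps the terms up to μδ⁵ and δ⁶.

```python
import math


def central_derivatives(f, x0, h):
    """First and second derivatives at x0 from
    hD = mu.delta - mu.delta^3/6 + mu.delta^5/30 - ...,
    h^2 D^2 = delta^2 - delta^4/12 + delta^6/90 - ..., truncated after these terms."""
    u = {k: f(x0 + k * h) for k in range(-3, 4)}

    def d2(k):                       # delta^2 u_k
        return u[k + 1] - 2 * u[k] + u[k - 1]

    def d4(k):                       # delta^4 u_k
        return d2(k + 1) - 2 * d2(k) + d2(k - 1)

    mu_d1 = (u[1] - u[-1]) / 2       # mu.delta u_0
    mu_d3 = (d2(1) - d2(-1)) / 2     # mu.delta^3 u_0
    mu_d5 = (d4(1) - d4(-1)) / 2     # mu.delta^5 u_0
    d6 = d4(1) - 2 * d4(0) + d4(-1)  # delta^6 u_0
    first = (mu_d1 - mu_d3 / 6 + mu_d5 / 30) / h
    second = (d2(0) - d4(0) / 12 + d6 / 90) / h ** 2
    return first, second


print(central_derivatives(math.sin, 0.5, 0.1), (math.cos(0.5), -math.sin(0.5)))
```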
19. _Sums and Integrals._--The relation between a sum and an integral is usually expressed by the _Euler-Maclaurin formula_. The principle of this formula is that, if u_m and u_(m+1) are ordinates of a curve, distant h from one another, then for a first approximation to the area of the curve between u_m and u_(m+1) we have ½h(u_m + u_(m+1)), and the difference between this and the true value of the area can be expressed as the difference of two expressions, one of which is a function of x_m, and the other is the same function of x_(m+1). Denoting these by φ(x_m) and φ(x_(m+1)), we have
$$\int_{x_m}^{x_{m+1}} u\,dx = \tfrac12 h\,(u_m + u_{m+1}) + \phi(x_{m+1}) - \phi(x_m).$$
Adding a series of similar expressions, we find
$$\int_{x_m}^{x_n} u\,dx = h\{\tfrac12 u_m + u_{m+1} + u_{m+2} + \dots + u_{n-1} + \tfrac12 u_n\} + \phi(x_n) - \phi(x_m).$$
The function [phi](x) can be expressed in terms either of differential coefficients of u or of advancing or central differences; thus there are three formulae.
(i.) The Euler-Maclaurin formula properly so called (due independently to Euler and Maclaurin) is
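$$\int_{x_m}^{x_n} u\,dx = h\,\mu\sigma(u_n - u_m) - \frac{1}{12}h^2\Big(\frac{du_n}{dx} - \frac{du_m}{dx}\Big) + \frac{1}{720}h^4\Big(\frac{d^3u_n}{dx^3} - \frac{d^3u_m}{dx^3}\Big) - \frac{1}{30240}h^6\Big(\frac{d^5u_n}{dx^5} - \frac{d^5u_m}{dx^5}\Big) + \dots$$
$$= h\,\mu\sigma(u_n - u_m) - \frac{B_1}{2!}h^2\Big(\frac{du_n}{dx} - \frac{du_m}{dx}\Big) + \frac{B_2}{4!}h^4\Big(\frac{d^3u_n}{dx^3} - \frac{d^3u_m}{dx^3}\Big) - \frac{B_3}{6!}h^6\Big(\frac{d^5u_n}{dx^5} - \frac{d^5u_m}{dx^5}\Big) + \dots,$$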
where B_1, B_2, B_3, ... are _Bernoulli's numbers_ (here B_1 = 1/6, B_2 = 1/30, B_3 = 1/42).
(ii.) If we express differential coefficients in terms of advancing differences, we get a theorem which is due to Laplace:--
$$\frac1h\int_{x_0}^{x_n} u\,dx = \mu\sigma(u_n - u_0) - \tfrac1{12}(\Delta u_n - \Delta u_0) + \tfrac1{24}(\Delta^2 u_n - \Delta^2 u_0) - \tfrac{19}{720}(\Delta^3 u_n - \Delta^3 u_0) + \tfrac{3}{160}(\Delta^4 u_n - \Delta^4 u_0) - \dots$$
For practical calculations this may more conveniently be written
$$\frac1h\int_{x_0}^{x_n} u\,dx = \mu\sigma(u_n - u_0) + \tfrac1{12}\big(\Delta u_0 - \tfrac12\Delta^2 u_0 + \tfrac{19}{60}\Delta^3 u_0 - \dots\big) + \tfrac1{12}\big(\Delta' u_n - \tfrac12\Delta'^2 u_n + \tfrac{19}{60}\Delta'^3 u_n - \dots\big),$$
where accented differences denote that the values of u are read backwards from u_n; i.e. Δ'u_n denotes u_(n-1) - u_n, not (as in § 10) u_n - u_(n-1).
(iii.) Expressed in terms of central differences this becomes
$$\frac1h\int_{x_0}^{x_n} u\,dx = \mu\sigma(u_n - u_0) - \tfrac1{12}\mu\delta u_n + \tfrac{11}{720}\mu\delta^3 u_n - \dots + \tfrac1{12}\mu\delta u_0 - \tfrac{11}{720}\mu\delta^3 u_0 + \dots$$
$$= \mu\Big(\sigma - \frac{1}{12}\delta + \frac{11}{720}\delta^3 - \frac{191}{60480}\delta^5 + \frac{2497}{3628800}\delta^7 - \dots\Big)(u_n - u_0).$$
(iv.) There are variants of these formulae, due to taking hu_(m+½) as the first approximation to the area of the curve between u_m and u_(m+1); the formulae involve the sum u_½ + u_(3/2) + ... + u_(n-½) = σ(u_n - u_0) (see MENSURATION).
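The formulae of (i.) and (ii.) can be compared numerically. In the Python sketch below (function names are our own), `euler_maclaurin` adds the derivative corrections of (i.) to the trapezoidal sum hμσ(u_n - u_m), and `laplace_differences` uses the second form of (ii.) with differences up to the fourth order; its fourth coefficient 9/40 is 12 × 3/160, taken from the first form of (ii.).

```python
import math


def euler_maclaurin(f, odd_derivs, a, b, n):
    """(i.): trapezoidal sum over n intervals plus the corrections
    -(1/12)h^2 [u'] + (1/720)h^4 [u'''] - (1/30240)h^6 [u^(5)],
    where [g] = g(b) - g(a) and odd_derivs = (u', u''', u^(5))."""
    h = (b - a) / n
    trap = h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))
    coeffs = (-1 / 12, 1 / 720, -1 / 30240)      # -B1/2!, +B2/4!, -B3/6!
    corr = sum(c * h ** (2 * j + 2) * (d(b) - d(a))
               for j, (c, d) in enumerate(zip(coeffs, odd_derivs)))
    return trap + corr


def forward_diffs(seq, m):
    """[Delta seq_0, Delta^2 seq_0, ..., Delta^m seq_0]."""
    out, col = [], list(seq)
    for _ in range(m):
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
        out.append(col[0])
    return out


def laplace_differences(values, h):
    """(ii.), second form; needs at least five tabulated values."""
    trap = 0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1]   # mu.sigma(u_n - u_0)
    coeffs = (1, -1 / 2, 19 / 60, -9 / 40)
    start = forward_diffs(values, 4)          # Delta u_0, Delta^2 u_0, ...
    end = forward_diffs(values[::-1], 4)      # accented Delta' u_n, Delta'^2 u_n, ...
    corr = sum(c * (s + e) for c, s, e in zip(coeffs, start, end)) / 12
    return h * (trap + corr)


exact = math.e - 1                            # integral of e^x from 0 to 1
print(euler_maclaurin(math.exp, (math.exp,) * 3, 0.0, 1.0, 4) - exact)
print(laplace_differences([math.exp(k / 10) for k in range(11)], 0.1) - exact)
```

Both errors are far smaller than that of the bare trapezoidal sum, which for these step sizes is of the order of h²/12 times (e - 1).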
20. The formulae in the last section can be obtained by symbolical methods from the relation
$$\frac1h\int u\,dx = \frac1h D^{-1}u = \frac{1}{hD}\cdot u.$$
Thus for central differences, if we write θ = ½hD, we have μ = cosh θ, δ = 2 sinh θ, σ = δ^(-1), and the result in (iii.) corresponds to the formula
$$\sinh\theta = \theta\cosh\theta \Big/ \Big(1 + \frac13\sinh^2\theta - \frac{2}{3\cdot 5}\sinh^4\theta + \frac{2\cdot 4}{3\cdot 5\cdot 7}\sinh^6\theta - \dots\Big).$$
REFERENCES.--There is no recent English work on the theory of finite differences as a whole. G. Boole's _Finite Differences_ (1st ed., 1860, 2nd ed., edited by J. F. Moulton, 1872) is a comprehensive treatise, in which symbolical methods are employed very early. A. A. Markoff's _Differenzenrechnung_ (German trans., 1896) contains general formulae. (Both these works ignore central differences.) _Encycl. der math. Wiss._ vol. i. pt. 2, pp. 919-935, may also be consulted. An elementary treatment of the subject will be found in many text-books, e.g. G. Chrystal's _Algebra_ (pt. 2, ch. xxxi.). A. W. Sunderland, _Notes on Finite Differences_ (1885), is intended for actuarial students. Various central-difference formulae with references are given in _Proc. Lond. Math. Soc._ xxxi. pp. 449-488. For other references see INTERPOLATION. (W. F. SH.)
DIFFERENTIAL EQUATION, in mathematics, a relation between one or more functions and their differential coefficients. The subject is treated here in two parts: (1) an elementary introduction dealing with the more commonly recognized types of differential equations which can be solved by rule; and (2) the general theory.
_Part I.--Elementary Introduction._
Of equations involving only one independent variable, x (known as _ordinary_ differential equations), and one dependent variable, y, and containing only the first differential coefficient dy/dx (and therefore said to be of the first _order_), the simplest form is that reducible to the type
dy/dx = f(x)/F(y),
leading to the result ∫F(y)dy - ∫f(x)dx = A, where A is an arbitrary constant; this result is said to solve the differential equation, the problem of evaluating the integrals belonging to the integral calculus.
Another simple form is
dy/dx + yP = Q,
where P, Q are functions of x only; this is known as the linear equation, since it contains y and dy/dx only to the first degree. If
∫P dx = u, we clearly have
$$\frac{d}{dx}\big(ye^u\big) = e^u\Big(\frac{dy}{dx} + Py\Big) = e^uQ,$$
so that y = e^(-u)(∫e^uQ dx + A) solves the equation, and is the only possible solution, A being an arbitrary constant. The rule for the solution of the linear equation is thus to multiply the equation by e^u, where u = ∫P dx.
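The rule can also be carried out numerically when the integrals cannot be found in closed form. The Python sketch below is our own illustration: it evaluates u = ∫P dx and ∫e^uQ dx by Simpson's rule (a standard quadrature rule, not part of the text), taking both integrals from x = 0, so that A is the value of y there. For P = 1, Q = x the rule gives by hand y = x - 1 + (A + 1)e^(-x), which the sketch reproduces.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson's rule for the integral of g from a to b (n even)."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return s * h / 3


def solve_linear(P, Q, A, x):
    """y(x) for dy/dx + P y = Q by the rule y = e^(-u)(int e^u Q dx + A),
    u = int P dx, with both integrals taken from 0 (so y(0) = A)."""
    def u(t):
        return simpson(P, 0.0, t)

    def integrand(t):
        return math.exp(u(t)) * Q(t)

    return math.exp(-u(x)) * (simpson(integrand, 0.0, x) + A)


# dy/dx + y = x with y(0) = 2; by hand, y = x - 1 + 3e^(-x).
print(solve_linear(lambda t: 1.0, lambda t: t, 2.0, 1.5), 1.5 - 1 + 3 * math.exp(-1.5))
```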
A third simple and important form is that denoted by
y = px + f(p),
where p is an abbreviation for dy/dx; this is known as Clairaut's form. By differentiation in regard to x it gives
$$p = p + x\frac{dp}{dx} + f'(p)\frac{dp}{dx},$$
where
$$f'(p) = \frac{d}{dp}f(p);$$
thus {x + f'(p)} dp/dx = 0, and either (i.) dp/dx = 0, that is, p is constant on the curve satisfying the differential equation, which curve is thus any one of the straight lines y = cx + f(c), where c is an arbitrary constant, or else (ii.) x + f'(p) = 0; if this latter hypothesis be taken, and p be eliminated between x + f'(p) = 0 and y = px + f(p), a relation connecting x and y, not containing an arbitrary constant, will be found, which obviously represents the envelope of the straight lines y = cx + f(c).
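A numerical check of both kinds of solution, for the hypothetical choice f(p) = p² (so that f'(p) = 2p): the lines y = cx + c² satisfy y = px + f(p), and eliminating p between x + 2p = 0 and y = px + p² gives the envelope y = -x²/4, which satisfies the equation too although it contains no arbitrary constant. A Python sketch (the names are our own):

```python
def f(p):
    return p * p                    # hypothetical f(p) = p^2, f'(p) = 2p


def residual(y, dydx, x):
    """y - (p x + f(p)) with p = dy/dx; zero when Clairaut's equation holds."""
    p = dydx(x)
    return y(x) - (p * x + f(p))


def line(c):
    """(i.) the straight line y = c x + f(c), and its constant slope."""
    return (lambda x: c * x + f(c)), (lambda x: c)


def envelope(x):
    """(ii.) p = -x/2 from x + f'(p) = 0, substituted into y = p x + f(p)."""
    return -x * x / 4


for x in (-2.0, 0.5, 3.0):
    y_line, slope = line(1.5)
    print(residual(y_line, slope, x), residual(envelope, lambda t: -t / 2, x))
```

The line with parameter c touches the envelope at x = -2c, and lies above it elsewhere, since cx + c² + x²/4 = (x/2 + c)².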
In general if a differential equation φ(x, y, dy/dx) = 0 be satisfied by any one of the curves F(x, y, c) = 0, where c is an arbitrary constant, it is clear that the envelope of these curves, when existent, must also satisfy the differential equation; for this equation prescribes a relation connecting only the co-ordinates x, y and the differential coefficient dy/dx, and these three quantities are the same at any point of the envelope for the envelope and for the particular curve of the family which there touches the envelope. The relation expressing the equation of the envelope is called a _singular_ solution of the differential equation, meaning an _isolated_ solution, as not being one of a family of curves depending upon an arbitrary parameter.
An extended form of Clairaut's equation expressed by
y = xF(p) + f(p)
may be similarly solved by first differentiating in regard to p, when it reduces to a linear equation of which x is the dependent and p the independent variable; from the integral of this linear equation, and the original differential equation, the quantity p is then to be eliminated.
Other types of solvable differential equations of the first order are (1)
M dy/dx = N,
where M, N are homogeneous polynomials in x and y, of the same degree; by putting v = y/x and eliminating y, the equation becomes of the first type considered above, in v and x. An equation (aB ≠ bA)
(ax + by + c)dy/dx = Ax + By + C
may be reduced to this form by first putting x + h, y + k for x and y, and determining h, k so that ah + bk + c = 0, Ah + Bk + C = 0.
(2) An equation in which y does not explicitly occur,
f(x, dy/dx) = 0,
may, theoretically, be reduced to the type dy/dx = F(x); similarly an equation F(y, dy/dx) = 0.
(3) An equation
f(dy/dx, x, y) = 0,
which is an integral polynomial in dy/dx, may, theoretically, be solved for dy/dx, as an algebraic equation; to any root dy/dx = F_1(x, y) corresponds, suppose, a solution φ_1(x, y, c) = 0, where c is an arbitrary constant; the product equation φ_1(x, y, c)φ_2(x, y, c) ... = 0, consisting of as many factors as there were values of dy/dx, is effectively as general as if we wrote φ_1(x, y, c_1)φ_2(x, y, c_2) ... = 0; for, to evaluate the first form, we must necessarily consider the factors separately, and nothing is then gained by the multiple notation for the various arbitrary constants. The equation φ_1(x, y, c)φ_2(x, y, c) ... = 0 is thus the solution of the given differential equation.
In all these cases there is, except for cases of singular solutions, one and only one arbitrary constant in the most general solution of the differential equation; that this must necessarily be so we may take as obvious, the differential equation being supposed to arise by elimination of this constant from the equation expressing its solution and the equation obtainable from this by differentiation in regard to x.
A further type of differential equation of the first order, of the form
dy/dx = A + By + Cy²
in which A, B, C are functions of x, will be briefly considered below under differential equations of the second order.
When we pass to ordinary differential equations of the second order, that is, those expressing a relation between x, y, dy/dx and d²y/dx², the number of types for which the solution can be found by a known procedure is very considerably reduced. Consider the general linear equation
$$\frac{d^2y}{dx^2} + P\frac{dy}{dx} + Qy = R,$$
where P, Q, R are functions of x only. There is no method always effective; the main general result for such a linear equation is that if any particular function of x, say y1, can be discovered, for which
$$\frac{d^2y_1}{dx^2} + P\frac{dy_1}{dx} + Qy_1 = 0,$$
then the substitution y = y_1η in the original equation, with R on the right side, reduces this to a linear equation of the first order with the dependent variable dη/dx. In fact, if y = y_1η we have
dy d[eta] dy1 d²y d²[eta] dy1 d[eta] d²y1 -- = y1------ + [eta]--- and --- = y1------- + 2--- ------ + [eta]----, dx dx dx dx² dx² dx dx dx²
and thus
$$\frac{d^2y}{dx^2} + P\frac{dy}{dx} + Qy = y_1\frac{d^2\eta}{dx^2} + \left(2\frac{dy_1}{dx} + Py_1\right)\frac{d\eta}{dx} + \left(\frac{d^2y_1}{dx^2} + P\frac{dy_1}{dx} + Qy_1\right)\eta;$$
if then
$$\frac{d^2y_1}{dx^2} + P\frac{dy_1}{dx} + Qy_1 = 0,$$
and $z$ denote $d\eta/dx$, the original differential equation becomes
$$y_1\frac{dz}{dx} + \left(2\frac{dy_1}{dx} + Py_1\right)z = R.$$
From this equation $z$ can be found by the rule given above for the linear equation of the first order, and will involve one arbitrary constant; thence $y = y_1\eta = y_1\int z\,dx + Ay_1$, where $A$ is another arbitrary constant, will be the general solution of the original equation, and, as was to be expected, involves two arbitrary constants.
The case of most frequent occurrence is that in which the coefficients $P$, $Q$ are constants; we consider this case in some detail. If $\theta$ be a root of the quadratic equation $\theta^2 + \theta P + Q = 0$, it can be at once seen that a particular integral of the differential equation with zero on the right side is $y_1 = e^{\theta x}$. Supposing first the roots of the quadratic equation to be different, and $\phi$ to be the other root, so that $\phi + \theta = -P$, the auxiliary differential equation for $z$, referred to above, becomes $dz/dx + (\theta - \phi)z = Re^{-\theta x}$, which leads to $ze^{(\theta - \phi)x} = B + \int Re^{-\phi x}\,dx$, where $B$ is an arbitrary constant, and hence to
$$y = Ae^{\theta x} + e^{\theta x}\int Be^{(\phi - \theta)x}\,dx + e^{\theta x}\int e^{(\phi - \theta)x}\left(\int Re^{-\phi x}\,dx\right)dx,$$
or, say, to $y = Ae^{\theta x} + Be^{\phi x} + U$, where $A$, $B$ are arbitrary constants (the new $B$ absorbing the factor $1/(\phi - \theta)$) and $U$ is a function of $x$, not present at all when $R = 0$. If the quadratic equation $\theta^2 + P\theta + Q = 0$ has equal roots, so that $2\theta = -P$, the auxiliary equation in $z$ becomes $dz/dx = Re^{-\theta x}$, giving $z = B + \int Re^{-\theta x}\,dx$, where $B$ is an arbitrary constant, and hence
$$y = (A + Bx)e^{\theta x} + e^{\theta x}\int\left(\int Re^{-\theta x}\,dx\right)dx,$$
or, say, $y = (A + Bx)e^{\theta x} + U$, where $A$, $B$ are arbitrary constants, and $U$ is a function of $x$ not present at all when $R = 0$. The portion $Ae^{\theta x} + Be^{\phi x}$ or $(A + Bx)e^{\theta x}$ of the solution, which is known as the _complementary function_, can clearly be written down at once by inspection of the given differential equation. The remaining portion $U$ may, by taking the constants in the complementary function properly, be replaced by any particular solution whatever of the differential equation
$$\frac{d^2y}{dx^2} + P\frac{dy}{dx} + Qy = R;$$
for if u be any particular solution, this has a form
$$u = A_0e^{\theta x} + B_0e^{\phi x} + U,$$
or a form
$$u = (A_0 + B_0x)e^{\theta x} + U;$$
thus the general solution can be written
$$(A - A_0)e^{\theta x} + (B - B_0)e^{\phi x} + u,$$
or
$$\{A - A_0 + (B - B_0)x\}e^{\theta x} + u,$$
where $A - A_0$, $B - B_0$, like $A$, $B$, are arbitrary constants.
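The rule for the complementary function can be illustrated with a short Python sketch (an illustration under stated assumptions, not part of the original method): it forms the roots $\theta$, $\phi$ of $\theta^2 + P\theta + Q = 0$, returns $e^{\theta x}$ and $e^{\phi x}$, or $e^{\theta x}$ and $xe^{\theta x}$ for a double root, and checks by central differences that each satisfies the equation with zero on the right side. The names `complementary_basis` and `residual`, the tolerance used to detect a double root and the difference step `h` are illustrative choices; complex roots are simply left as complex exponentials.

```python
import cmath

def complementary_basis(P, Q, tol=1e-12):
    """Two independent solutions of y'' + P y' + Q y = 0 for constant P, Q.

    theta and phi are the roots of theta**2 + P*theta + Q = 0; a double root
    (detected with the tolerance tol) gives e^(theta x) and x e^(theta x).
    """
    disc = cmath.sqrt(P * P - 4 * Q)
    theta, phi = (-P + disc) / 2, (-P - disc) / 2
    if abs(theta - phi) < tol:
        return [lambda x: cmath.exp(theta * x),
                lambda x: x * cmath.exp(theta * x)]
    return [lambda x: cmath.exp(theta * x),
            lambda x: cmath.exp(phi * x)]

def residual(f, P, Q, x, h=1e-4):
    """Central-difference estimate of f'' + P f' + Q f at the point x."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return d2 + P * d1 + Q * f(x)

# Distinct real roots, a repeated root, and complex roots respectively.
for P, Q in [(3, 2), (2, 1), (0, 1)]:
    print(P, Q, [abs(residual(f, P, Q, 0.7)) for f in complementary_basis(P, Q)])
```

For $P = 0$, $Q = 1$ the roots are $\pm i$ and the basis is $e^{ix}$, $e^{-ix}$, whose real combinations are $\cos x$ and $\sin x$.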
A similar result holds for a linear differential equation of any order, say
$$\frac{d^ny}{dx^n} + P_1\frac{d^{n-1}y}{dx^{n-1}} + \cdots + P_ny = R,$$
where $P_1, P_2, \ldots, P_n$ are constants, and $R$ is a function of $x$. If we form the algebraic equation $\theta^n + P_1\theta^{n-1} + \cdots + P_n = 0$, and all the roots of this equation be different, say they are $\theta_1, \theta_2, \ldots, \theta_n$, the general solution of the differential equation is
$$y = A_1e^{\theta_1x} + A_2e^{\theta_2x} + \cdots + A_ne^{\theta_nx} + u,$$
where $A_1, A_2, \ldots, A_n$ are arbitrary constants, and $u$ is any particular solution whatever; but if there be one root $\theta_1$ repeated $r$ times, the terms $A_1e^{\theta_1x} + \cdots + A_re^{\theta_rx}$ must be replaced by $(A_1 + A_2x + \cdots + A_rx^{r-1})e^{\theta_1x}$, where $A_1, \ldots, A_r$ are arbitrary constants; the remaining terms in the complementary function will similarly need alteration of form if there be other repeated roots.
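The replacement of repeated-root terms by $(A_1 + A_2x + \cdots + A_rx^{r-1})e^{\theta_1x}$ can be checked exactly with the shift rule $\psi(D)(e^{ax}u) = e^{ax}\psi(D + a)u$ established in the next paragraph: $x^ke^{\theta_1x}$ is a solution exactly when $\psi(D + \theta_1)x^k = 0$, and this holds for $k < r$ because $\psi(t + \theta_1)$ then contains the factor $t^r$. The sketch below uses hypothetical helper names, exact rational arithmetic and rational roots only; it computes the coefficients of $\psi(t + \theta_1)$ and applies the shifted operator to powers of $x$.

```python
from fractions import Fraction
from math import comb

def shift(coeffs, a):
    """Ascending coefficients of psi(t + a), given those of psi(t)."""
    a = Fraction(a)
    out = [Fraction(0)] * len(coeffs)
    for k, c in enumerate(coeffs):
        for j in range(k + 1):          # (t + a)**k = sum_j C(k, j) a**(k-j) t**j
            out[j] += c * comb(k, j) * a ** (k - j)
    return out

def multiplicity(coeffs, root):
    """How many times root is repeated: the power of t dividing psi(t + root)."""
    shifted = shift(coeffs, root)
    r = 0
    while r < len(shifted) and shifted[r] == 0:
        r += 1
    return r

def apply_operator(coeffs, poly):
    """Apply sum_k coeffs[k] D**k to the polynomial sum_i poly[i] x**i."""
    current = [Fraction(p) for p in poly]
    result = [Fraction(0)] * len(poly)
    for c in coeffs:
        result = [acc + c * v for acc, v in zip(result, current)]
        current = [i * current[i] for i in range(1, len(current))] + [Fraction(0)]
    return result

# psi(theta) = (theta - 1)**3 (theta + 2) = theta**4 - theta**3 - 3 theta**2 + 5 theta - 2
psi = [-2, 5, -3, -1, 1]
shifted = shift(psi, 1)                 # psi(t + 1) = t**4 + 3 t**3
for k in range(4):
    x_k = [0] * k + [1]                 # the polynomial x**k
    print(k, all(c == 0 for c in apply_operator(shifted, x_k)))
```

Here $\theta = 1$ is a triple root, so $e^x$, $xe^x$, $x^2e^x$ are solutions while $x^3e^x$ is not, since $\psi(D + 1)x^3 = 3D^3x^3 = 18$.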
To complete the solution of the differential equation we need some method of determining a particular integral $u$; we explain a procedure which is effective for this purpose in the cases in which $R$ is a sum of terms of the form $e^{ax}\phi(x)$, where $\phi(x)$ is an integral polynomial in $x$; this includes cases in which $R$ contains terms of the form $\cos bx\cdot\phi(x)$ or $\sin bx\cdot\phi(x)$. Denote $d/dx$ by $D$; it is clear that if $u$ be any function of $x$, $D(e^{ax}u) = e^{ax}Du + ae^{ax}u$, or say, $D(e^{ax}u) = e^{ax}(D + a)u$; hence $D^2(e^{ax}u)$, i.e. $\frac{d^2}{dx^2}(e^{ax}u)$, being equal to $D(e^{ax}v)$, where $v = (D + a)u$, is equal to $e^{ax}(D + a)v$, that is, to $e^{ax}(D + a)^2u$. In this way we find $D^n(e^{ax}u) = e^{ax}(D + a)^nu$, where $n$ is any positive integer. Hence if $\psi(D)$ be any polynomial in $D$ with constant coefficients, $\psi(D)(e^{ax}u) = e^{ax}\psi(D + a)u$. Next, denoting $\int u\,dx$ by $D^{-1}u$, and any solution of the differential equation $dz/dx + az = u$ by $z = (D + a)^{-1}u$, we have $D[e^{ax}(D + a)^{-1}u] = D(e^{ax}z) = e^{ax}(D + a)z = e^{ax}u$, so that we may write $D^{-1}(e^{ax}u) = e^{ax}(D + a)^{-1}u$, where the meaning is that one value of the left side is equal to one value of the right side; from this, the expression $D^{-2}(e^{ax}u)$, which means $D^{-1}[D^{-1}(e^{ax}u)]$, is equal to $D^{-1}(e^{ax}z)$ and hence to $e^{ax}(D + a)^{-1}z$, which we write $e^{ax}(D + a)^{-2}u$; proceeding thus we obtain
$$D^{-n}(e^{ax}u) = e^{ax}(D + a)^{-n}u,$$
where $n$ is any positive integer, and the meaning, as before, is that one value of the first expression is equal to one value of the second. More generally, if $\psi(D)$ be any polynomial in $D$ with constant coefficients, and we agree to denote by $\frac{1}{\psi(D)}u$ any solution $z$ of the differential equation $\psi(D)z = u$, we have, if $v = \frac{1}{\psi(D + a)}u$, the identity $\psi(D)(e^{ax}v) = e^{ax}\psi(D + a)v = e^{ax}u$, which we write in the form
$$\frac{1}{\psi(D)}(e^{ax}u) = e^{ax}\frac{1}{\psi(D + a)}u.$$
This gives us the first step in the method we are explaining, namely that a solution of the differential equation $\psi(D)y = e^{ax}u + e^{bx}v + \cdots$, where $u, v, \ldots$ are any functions of $x$, is any function denoted by the expression
$$e^{ax}\frac{1}{\psi(D + a)}u + e^{bx}\frac{1}{\psi(D + b)}v + \cdots.$$
It is now to be shown how to obtain one value of $\frac{1}{\psi(D + a)}u$ when $u$ is a polynomial in $x$, namely one solution of the differential equation $\psi(D + a)z = u$. Let the highest power of $x$ entering in $u$ be $x^m$; if $t$ were a variable quantity, the rational fraction in $t$, $1/\psi(t + a)$, by first writing it as a sum of partial fractions, or otherwise, could be identically written in the form
$$K_rt^{-r} + K_{r-1}t^{-r+1} + \cdots + K_1t^{-1} + H + H_1t + \cdots + H_mt^m + \frac{t^{m+1}\phi(t)}{\psi(t + a)},$$
where $\phi(t)$ is a polynomial in $t$; this shows that there exists an identity of the form
$$1 = \psi(t + a)\left(K_rt^{-r} + \cdots + K_1t^{-1} + H + H_1t + \cdots + H_mt^m\right) + \phi(t)\,t^{m+1},$$
and hence an identity
$$u = \psi(D + a)\left[K_rD^{-r} + \cdots + K_1D^{-1} + H + H_1D + \cdots + H_mD^m\right]u + \phi(D)D^{m+1}u;$$
in this, since $u$ contains no power of $x$ higher than $x^m$, the second term on the right may be omitted. We thus reach the conclusion that a solution of the differential equation $\psi(D + a)z = u$ is given by
$$z = \left(K_rD^{-r} + \cdots + K_1D^{-1} + H + H_1D + \cdots + H_mD^m\right)u,$$
of which the operator on the right is obtained simply by expanding $1/\psi(D + a)$ in ascending powers of $D$, as if $D$ were a numerical quantity, the expansion being carried as far as the highest power of $D$ which, operating upon $u$, does not give zero. In this form every term in $z$ is capable of immediate calculation.
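The recipe of expanding $1/\psi(D + a)$ in ascending powers of $D$ can be sketched in floating-point complex arithmetic. This is a hypothetical helper, not part of the original text: it writes $\psi(t + a) = t^r\chi(t)$ with $\chi(0) \ne 0$, expands $1/\chi(t)$ as a power series up to $t^{m+r}$, and lets each term $K\,D^{j-r}$ act on $u$, taking $D^{-1}$ as integration with zero constant. Polynomials are lists of coefficients in ascending powers of $x$; the tolerance used to recognise $a$ as a root of $\psi$ is an assumption.

```python
from math import comb

def shift(coeffs, a):
    """Ascending coefficients of psi(t + a), given those of psi(t)."""
    out = [0j] * len(coeffs)
    for k, c in enumerate(coeffs):
        for j in range(k + 1):
            out[j] += c * comb(k, j) * a ** (k - j)
    return out

def series_inverse(chi, n_terms):
    """First n_terms power-series coefficients of 1/chi(t); needs chi[0] != 0."""
    inv = [0j] * n_terms
    inv[0] = 1 / chi[0]
    for k in range(1, n_terms):
        s = sum(chi[j] * inv[k - j] for j in range(1, min(k, len(chi) - 1) + 1))
        inv[k] = -s / chi[0]
    return inv

def derivative(poly):
    return [i * poly[i] for i in range(1, len(poly))]

def antiderivative(poly):
    return [0j] + [c / (i + 1) for i, c in enumerate(poly)]   # constant taken as 0

def particular_factor(psi, a, u, tol=1e-12):
    """A polynomial z with psi(D + a) z = u, so e^(a x) z solves psi(D) y = e^(a x) u."""
    shifted = shift(psi, a)
    r = 0
    while abs(shifted[r]) < tol:      # a is an r-fold root of psi
        r += 1
    chi = shifted[r:]                 # psi(t + a) = t**r * chi(t)
    m = len(u) - 1                    # highest power of x in u
    z = [0j] * (m + r + 1)
    for j, k_coeff in enumerate(series_inverse(chi, m + r + 1)):
        term = [complex(c) for c in u]
        for _ in range(r - j):        # negative powers of D: repeated integration
            term = antiderivative(term)
        for _ in range(j - r):        # positive powers of D: repeated differentiation
            term = derivative(term)
        for i, c in enumerate(term):
            z[i] += k_coeff * c
    return z

print(particular_factor([1, 0, 2, 0, 1], 1j, [0, 0, 0, 1]))
```

With $\psi(t) = (t^2 + 1)^2$, $a = i$ and $u = x^3$, the call above should reproduce, up to rounding, the coefficients of $-\tfrac14\left(\tfrac{1}{20}x^5 + \tfrac14ix^4 - \tfrac34x^3 - \tfrac32ix^2 + \tfrac{15}{8}x + \tfrac98i\right)$ obtained in the example that follows.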
_Example._--For the equation
$$\frac{d^4y}{dx^4} + 2\frac{d^2y}{dx^2} + y = x^3\cos x \quad\text{or}\quad (D^2 + 1)^2y = x^3\cos x,$$
the roots of the associated algebraic equation $(\theta^2 + 1)^2 = 0$ are $\theta = \pm i$, each repeated; the complementary function is thus
$$(A + Bx)e^{ix} + (C + Dx)e^{-ix},$$
where A, B, C, D are arbitrary constants; this is the same as
$$(H + Kx)\cos x + (M + Nx)\sin x,$$
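where $H$, $K$, $M$, $N$ are arbitrary constants. To obtain a particular integral we must find a value of $(1 + D^2)^{-2}x^3\cos x$; this is the real part of $(1 + D^2)^{-2}e^{ix}x^3$ and hence of
$$e^{ix}\left[1 + (D + i)^2\right]^{-2}x^3,$$
or, since $1 + (D + i)^2 = D^2 + 2iD = 2iD\left(1 - \tfrac12iD\right)$, of
$$e^{ix}\left[2iD\left(1 - \tfrac12iD\right)\right]^{-2}x^3,$$
or
$$-\tfrac14e^{ix}D^{-2}\left(1 + iD - \tfrac34D^2 - \tfrac12iD^3 + \tfrac{5}{16}D^4 + \tfrac{3}{16}iD^5 + \cdots\right)x^3,$$
or
$$-\tfrac14e^{ix}\left(\tfrac{1}{20}x^5 + \tfrac14ix^4 - \tfrac34x^3 - \tfrac32ix^2 + \tfrac{15}{8}x + \tfrac98i\right);$$
the real part of this is
$$-\tfrac14\left(\tfrac{1}{20}x^5 - \tfrac34x^3 + \tfrac{15}{8}x\right)\cos x + \tfrac14\left(\tfrac14x^4 - \tfrac32x^2 + \tfrac98\right)\sin x.$$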
This expression added to the complementary function found above gives the complete integral; and no generality is lost by omitting from the particular integral the terms $-\tfrac{15}{32}x\cos x + \tfrac{9}{32}\sin x$, which are of the types of terms already occurring in the complementary function.
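Since the particular integral has the form $p(x)\cos x + q(x)\sin x$ with polynomial $p$, $q$, it can be checked exactly. The sketch below (helper names hypothetical) stores such a term as the pair `(p, q)` of coefficient lists in ascending powers of $x$, uses $D(p\cos x + q\sin x) = (p' + q)\cos x + (q' - p)\sin x$, and applies $(D^2 + 1)^2 = D^4 + 2D^2 + 1$ with exact rational arithmetic; the result should be $x^3\cos x$, with or without the two complementary terms.

```python
from fractions import Fraction as Fr

def padd(a, b):
    """Sum of two polynomials given as ascending coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def pderiv(a):
    return [i * a[i] for i in range(1, len(a))] or [Fr(0)]

def D(term):
    """d/dx of p(x) cos x + q(x) sin x, with the term stored as (p, q)."""
    p, q = term
    return padd(pderiv(p), q), padd(pderiv(q), [-c for c in p])

def trim(a):
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def apply_D2_plus_1_squared(term):
    """(D**2 + 1)**2 = D**4 + 2 D**2 + 1 applied to a (p, q) term."""
    d2 = D(D(term))
    d4 = D(D(d2))
    p = padd(padd(d4[0], [2 * c for c in d2[0]]), term[0])
    q = padd(padd(d4[1], [2 * c for c in d2[1]]), term[1])
    return trim(p), trim(q)

# Particular integral from the text, coefficients in ascending powers of x:
# -1/4 (x^5/20 - 3x^3/4 + 15x/8) cos x + 1/4 (x^4/4 - 3x^2/2 + 9/8) sin x
p = [Fr(0), Fr(-15, 32), Fr(0), Fr(3, 16), Fr(0), Fr(-1, 80)]
q = [Fr(9, 32), Fr(0), Fr(-3, 8), Fr(0), Fr(1, 16)]
print(apply_D2_plus_1_squared((p, q)))   # cos part should be x**3, sin part 0
```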
The symbolical method which has been explained has wider applications than that to which we have, for simplicity of explanation, restricted it. For example, if $\psi(x)$ be any function of $x$, and $a_1, a_2, \ldots, a_n$ be different constants, and $[(t + a_1)(t + a_2)\cdots(t + a_n)]^{-1}$ when expressed in partial fractions be written $\sum c_m(t + a_m)^{-1}$, a particular integral of the differential equation $(D + a_1)(D + a_2)\cdots(D + a_n)y = \psi(x)$ is given by
$$y = \sum c_m(D + a_m)^{-1}\psi(x) = \sum c_m(D + a_m)^{-1}e^{-a_mx}e^{a_mx}\psi(x) = \sum c_me^{-a_mx}D^{-1}\left(e^{a_mx}\psi(x)\right) = \sum c_me^{-a_mx}\int e^{a_mx}\psi(x)\,dx.$$
The particular integral is thus expressed as a sum of n integrals.
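A numerical sketch of this sum-of-integrals formula, assuming distinct real constants $a_m$ and a lower limit of integration $x_0 = 0$ (any fixed lower limit only changes the result by terms of the complementary function). The composite Simpson rule used for the integrals is a swapped-in quadrature choice, and the function names are hypothetical.

```python
import math

def partial_fraction_coeffs(a):
    """c_m with 1/((t + a_1)...(t + a_n)) = sum_m c_m/(t + a_m), for distinct a_m."""
    coeffs = []
    for m, am in enumerate(a):
        denom = 1.0
        for k, ak in enumerate(a):
            if k != m:
                denom *= ak - am        # the factor t + a_k evaluated at t = -a_m
        coeffs.append(1.0 / denom)
    return coeffs

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, n))
    return s * h / 3

def particular_integral(a, psi, x, x0=0.0):
    """sum_m c_m e^(-a_m x) * integral from x0 to x of e^(a_m s) psi(s) ds."""
    total = 0.0
    for cm, am in zip(partial_fraction_coeffs(a), a):
        integral = simpson(lambda s: math.exp(am * s) * psi(s), x0, x)
        total += cm * math.exp(-am * x) * integral
    return total

# (D + 1)(D + 2) y = sin x, i.e. y'' + 3y' + 2y = sin x
print(particular_integral([1, 2], math.sin, 1.0))
```

For $(D + 1)(D + 2)y = \sin x$ the sketch uses $c_1 = 1$, $c_2 = -1$, since $1/((t + 1)(t + 2)) = 1/(t + 1) - 1/(t + 2)$.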
A linear differential equation of which the left side has the form
$$x^n\frac{d^ny}{dx^n} + P_1x^{n-1}\frac{d^{n-1}y}{dx^{n-1}} + \cdots + P_{n-1}x\frac{dy}{dx} + P_ny,$$
where P1, ... Pn are constants, can be reduced to the case considered above. Writing x = e^t we have the identity
$$x^m\frac{d^mu}{dx^m} = \theta(\theta - 1)(\theta - 2)\cdots(\theta - m + 1)u, \qquad\text{where } \theta = \frac{d}{dt}.$$
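Because $x^m\,d^mu/dx^m$ becomes $\theta(\theta - 1)\cdots(\theta - m + 1)u$, the substitution $x = e^t$ turns the equation into one with constant coefficients, and the trial solution $y = x^k = e^{kt}$ turns each term into a falling factorial in $k$. The small sketch below (names hypothetical) evaluates the resulting polynomial in $k$, whose roots give solutions $x^k$; the coefficient list `P` is assumed to start with the leading coefficient $1$.

```python
from math import prod

def falling(k, m):
    """k (k-1) ... (k-m+1): the action of theta(theta-1)...(theta-m+1) on e^(k t)."""
    return prod(k - j for j in range(m))

def indicial(P, k):
    """sum_j P[j] * falling(k, n - j), with P[0] = 1 the coefficient of x^n y^(n).

    Putting y = x**k = e^(k t) into x^n y^(n) + P1 x^(n-1) y^(n-1) + ... + Pn y
    gives indicial(P, k) * x**k, so the roots k give solutions x**k.
    """
    n = len(P) - 1
    return sum(P[j] * falling(k, n - j) for j in range(n + 1))

# x^2 y'' - 2 y = 0: indicial polynomial k(k - 1) - 2, with roots k = 2 and k = -1
print([indicial([1, 0, -2], k) for k in (2, -1, 1)])   # -> [0, 0, -2]
```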
When the linear differential equation, which we take to be of the second order, has variable coefficients, though there is no general rule for obtaining a solution in finite terms, there are some results which it is of advantage to have in mind. We have seen that if one solution of the equation obtained by putting the right side zero, say y1, be known, the equation can be solved. If y2 be another solution of
d²y dy --- + P-- + Qy = 0, dx² dx
there being no relation of the form my1 + ny2 = k, where m, n, k are constants, it is easy to see that
$$\frac{d}{dx}\left(y_1'y_2 - y_1y_2'\right) = y_1''y_2 - y_1y_2'' = -P\left(y_1'y_2 - y_1y_2'\right),$$
so that we have
$$y_1'y_2 - y_1y_2' = A\exp\left(-\int P\,dx\right),$$
where $A$ is a suitably chosen constant, and $\exp z$ denotes $e^z$. In terms of the two solutions $y_1$, $y_2$ of the differential equation having zero on the right side, the general solution of the equation with $R = \phi(x)$ on the right side can at once be verified to be $Ay_1 + By_2 + y_1u - y_2v$, where $u$, $v$ respectively denote the integrals
$$u = \int y_2\,\phi(x)\left(y_1'y_2 - y_2'y_1\right)^{-1}dx, \qquad v = \int y_1\,\phi(x)\left(y_1'y_2 - y_2'y_1\right)^{-1}dx.$$
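As an illustration chosen here (it is not in the text), take $y'' + y'/x - y/x^2 = \phi(x)$, so $P = 1/x$, $Q = -1/x^2$, with the known solutions $y_1 = x$, $y_2 = 1/x$; then $y_1'y_2 - y_1y_2' = 2/x$. The sketch forms $y_1u - y_2v$ with the integrals evaluated by the composite Simpson rule from an arbitrary lower limit $x_0 = 1$; both the quadrature and the lower limit are assumptions of the sketch, the latter only altering the complementary terms.

```python
import math

def y1(x): return x
def y2(x): return 1.0 / x
def dy1(x): return 1.0
def dy2(x): return -1.0 / x ** 2

def V(x):
    """y1' y2 - y1 y2' for the two known solutions (equal to 2/x here)."""
    return dy1(x) * y2(x) - y1(x) * dy2(x)

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, n))
    return s * h / 3

def particular(phi, x, x0=1.0):
    """y1 u - y2 v with u, v integrated from the (arbitrary) lower limit x0."""
    u = simpson(lambda s: y2(s) * phi(s) / V(s), x0, x)
    v = simpson(lambda s: y1(s) * phi(s) / V(s), x0, x)
    return y1(x) * u - y2(x) * v

print(particular(lambda s: 1.0, 2.0), 5 / 12)   # phi = 1
print(particular(math.cos, 2.0))                # phi = cos x
```

With $\phi(x) = 1$ the integrands are polynomials, Simpson's rule is exact, and the result is $x^2/3 - x/2 + 1/(6x)$: the particular solution $x^2/3$ plus complementary terms.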
The equation
$$\frac{d^2y}{dx^2} + P\frac{dy}{dx} + Qy = 0,$$
by writing $y = v\exp\left(-\tfrac12\int P\,dx\right)$, is at once seen to be reduced to $d^2v/dx^2 + Iv = 0$, where $I = Q - \tfrac12\,dP/dx - \tfrac14P^2$. If $\eta = -\dfrac{1}{v}\dfrac{dv}{dx}$, the equation $d^2v/dx^2 + Iv = 0$ becomes $d\eta/dx = I + \eta^2$, a non-linear equation of the first order.
More generally the equation
$$\frac{d\eta}{dx} = A + B\eta + C\eta^2,$$
where A, B, C are functions of x, is, by the substitution
$$\eta = -\frac{1}{Cy}\frac{dy}{dx},$$
reduced to the linear equation
$$\frac{d^2y}{dx^2} - \left(B + \frac{1}{C}\frac{dC}{dx}\right)\frac{dy}{dx} + ACy = 0.$$
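The linearizing substitution $\eta = -\frac{1}{Cy}\frac{dy}{dx}$ can be checked numerically: integrate the Riccati equation directly, integrate the linear second-order equation from matching initial values ($y(0) = 1$, $y'(0) = -C(0)\eta(0)$), and compare. The classical fourth-order Runge-Kutta method used here is a swapped-in numerical technique, and the coefficient functions $A = x$, $B = \tfrac12$, $C = 1 + x$ are arbitrary choices for illustration.

```python
def rk4(f, x0, state, x1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of state' = f(x, state)."""
    h = (x1 - x0) / steps
    s = list(state)
    for i in range(steps):
        x = x0 + i * h
        k1 = f(x, s)
        k2 = f(x + h / 2, [a + h / 2 * k for a, k in zip(s, k1)])
        k3 = f(x + h / 2, [a + h / 2 * k for a, k in zip(s, k2)])
        k4 = f(x + h, [a + h * k for a, k in zip(s, k3)])
        s = [a + h / 6 * (p + 2 * q + 2 * r + w)
             for a, p, q, r, w in zip(s, k1, k2, k3, k4)]
    return s

def A(x): return x
def B(x): return 0.5
def C(x): return 1.0 + x
def dC(x): return 1.0        # derivative of C

def riccati(x, s):
    (eta,) = s
    return [A(x) + B(x) * eta + C(x) * eta ** 2]

def linear(x, s):
    # y'' = (B + C'/C) y' - A C y, the linear equation given by the substitution
    y, yp = s
    return [yp, (B(x) + dC(x) / C(x)) * yp - A(x) * C(x) * y]

eta0, x1 = 0.2, 1.0
eta_direct = rk4(riccati, 0.0, [eta0], x1)[0]
y, yp = rk4(linear, 0.0, [1.0, -C(0.0) * eta0], x1)
eta_from_y = -yp / (C(x1) * y)
print(eta_direct, eta_from_y)
```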
The equation
$$\frac{d\eta}{dx} = A + B\eta + C\eta^2,$$
known as Riccati's equation, is transformed into an equation of the same form by a substitution of the form $\eta = (aY + b)/(cY + d)$, where $a$, $b$, $c$, $d$ are any functions of $x$, and this fact may be utilized to obtain a solution when $A$, $B$, $C$ have special forms; in particular if any particular solution of the equation be known, say $\eta_0$, the substitution $\eta = \eta_0 - 1/Y$, which leads to the linear equation $dY/dx = C - (B + 2C\eta_0)Y$, enables us at once to obtain the general solution; for instance, when
$$2B = \frac{d}{dx}\log\left(-\frac{A}{C}\right),$$
a particular solution is $\eta_0 = \sqrt{-A/C}$. This is a case of the remark, often useful in practice, that the linear equation
$$\phi(x)\frac{d^2y}{dx^2} + \frac12\frac{d\phi}{dx}\frac{dy}{dx} + \mu y = 0,$$
where $\mu$ is a constant, is reducible to the standard form $d^2y/dz^2 + \mu y = 0$ by taking a new independent variable $z = \int dx\,[\phi(x)]^{-1/2}$.
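As an illustration not taken from the text: with $\phi(x) = 1 - x^2$ the equation becomes Chebyshev's equation $(1 - x^2)y'' - xy' + \mu y = 0$, and $z = \int(1 - x^2)^{-1/2}dx = \arcsin x$, so with $\mu = n^2$ the reduced equation $d^2y/dz^2 + n^2y = 0$ gives $y = \cos(n\arcsin x)$ and $y = \sin(n\arcsin x)$. The sketch checks this by central differences; the step size and sample points are assumptions.

```python
import math

def chebyshev_residual(y, n, x, h=1e-4):
    """(1 - x^2) y'' - x y' + n^2 y at x, by central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h ** 2
    return (1 - x * x) * d2 - x * d1 + n * n * y(x)

n = 3
for y in (lambda x: math.cos(n * math.asin(x)), lambda x: math.sin(n * math.asin(x))):
    print([chebyshev_residual(y, n, x) for x in (-0.5, 0.1, 0.6)])
```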
We pass to other types of equations of which the solution can be obtained by rule. We may have cases in which there are two dependent variables, x and y, and one independent variable t, the differential coefficients dx/dt, dy/dt being given as functions of x, y and t. Of such equations a simple case is expressed by the pair
$$\frac{dx}{dt} = ax + by + c, \qquad \frac{dy}{dt} = a'x + b'y + c',$$
wherein the coefficients $a, b, c, a', b', c'$ are constants. To integrate these, form with the constant $\lambda$ the differential coefficient of $z = x + \lambda y$, that is, $dz/dt = (a + \lambda a')x + (b + \lambda b')y + c + \lambda c'$, the quantity $\lambda$ being so chosen that $b + \lambda b' = \lambda(a + \lambda a')$, so that we have $dz/dt = (a + \lambda a')z + c + \lambda c'$; this last equation is at once integrable in the form $z(a + \lambda a') + c + \lambda c' = Ae^{(a + \lambda a')t}$, where $A$ is an arbitrary constant. In general, the condition $b + \lambda b' = \lambda(a + \lambda a')$, that is, the quadratic $a'\lambda^2 + (a - b')\lambda - b = 0$, is satisfied by two different values of $\lambda$, say $\lambda_1$, $\lambda_2$; the solutions corresponding to these give the values of $x + \lambda_1y$ and $x + \lambda_2y$, from which $x$ and $y$ can be found as functions of $t$, involving two arbitrary constants. If, however, the two roots of the quadratic equation for $\lambda$ are equal, that is, if $(a - b')^2 + 4a'b = 0$, the method described gives only one equation, expressing $x + \lambda y$ in terms of $t$; by means of this equation $y$ can be eliminated from $dx/dt = ax + by + c$, leading to an equation of the form $dx/dt = Px + Q + Re^{(a + \lambda a')t}$, where $P$, $Q$, $R$ are constants. The integration of this gives $x$, and thence $y$ can be found.
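The $\lambda$ method translates directly into a sketch. It assumes $a' \ne 0$, two distinct real roots of $a'\lambda^2 + (a - b')\lambda - b = 0$, and $a + \lambda a' \ne 0$ for each root; the two arbitrary constants are fixed by initial values $x_0$, $y_0$ at $t = 0$. The parameter names are illustrative, with `a2`, `b2`, `c2` standing for $a'$, $b'$, $c'$.

```python
import math

def solve_linear_pair(a, b, c, a2, b2, c2, x0, y0, t):
    """x(t), y(t) for dx/dt = a x + b y + c, dy/dt = a2 x + b2 y + c2, x(0) = x0, y(0) = y0.

    a2, b2, c2 stand for a', b', c'. Assumes a2 != 0, that
    a2*lam**2 + (a - b2)*lam - b = 0 has two distinct real roots,
    and that a + lam*a2 != 0 for each root.
    """
    disc = (a - b2) ** 2 + 4 * a2 * b
    lams = [(-(a - b2) + sign * math.sqrt(disc)) / (2 * a2) for sign in (1, -1)]
    zs = []
    for lam in lams:
        k = a + lam * a2            # z = x + lam*y satisfies dz/dt = k z + m
        m = c + lam * c2
        z0 = x0 + lam * y0
        # k z + m = (k z0 + m) e^(k t): the text's z(a + lam a') + c + lam c' = A e^((a + lam a') t)
        zs.append(((k * z0 + m) * math.exp(k * t) - m) / k)
    (lam1, lam2), (z1, z2) = lams, zs
    y = (z1 - z2) / (lam1 - lam2)
    x = z1 - lam1 * y
    return x, y

# dx/dt = 2x + y + 1, dy/dt = x + 2y, starting from x = 1, y = 0
print(solve_linear_pair(2.0, 1.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.5))
```

In this example the roots are $\lambda = \pm1$, giving $dz/dt = 3z + 1$ for $z = x + y$ and $dz/dt = z + 1$ for $z = x - y$.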
A similar process is applicable when we have three or more dependent variables whose differential coefficients in regard to the single independent variable are given as linear functions of the dependent variables with constant coefficients.
Another method of solution of the equations
$$\frac{dx}{dt} = ax + by + c, \qquad \frac{dy}{dt} = a'x + b'y + c',$$
consists in differentiating the first equation, thereby obtaining
$$\frac{d^2x}{dt^2} = a\frac{dx}{dt} + b\frac{dy}{dt};$$
from the two given equations, by elimination of y, we can express dy/dt as a linear function of x and dx/dt; we can thus form an equation of the shape d²x/dt² = P + Qx + Rdx/dt, where P, Q, R are constants; this can be integrated by methods previously explained, and the integral, involving two arbitrary constants, gives, by the equation dx/dt = ax + by + c, the corresponding value of y. Conversely it should be noticed that any single linear differential equation
$$\frac{d^2x}{dt^2} = u + vx + w\frac{dx}{dt},$$
where u, v, w are functions of t, by writing y for dx/dt, is equivalent with the two equations dx/dt = y, dy/dt = u + vx + wy. In fact a similar reduction is possible for any system of differential equations with one independent variable.
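The reduction to a first-order system is the form that standard numerical integrators expect. The sketch below feeds the system $dx/dt = y$, $dy/dt = u + vx + wy$ to the classical fourth-order Runge-Kutta method, a numerical technique swapped in for illustration; the helper names and the step count are assumptions.

```python
import math

def second_order_as_system(u, v, w):
    """Right side of dx/dt = y, dy/dt = u + v x + w y for x'' = u + v x + w x'."""
    def f(t, s):
        x, y = s
        return [y, u(t) + v(t) * x + w(t) * y]
    return f

def rk4(f, t0, state, t1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of state' = f(t, state)."""
    h = (t1 - t0) / steps
    s = list(state)
    for i in range(steps):
        t = t0 + i * h
        k1 = f(t, s)
        k2 = f(t + h / 2, [a + h / 2 * k for a, k in zip(s, k1)])
        k3 = f(t + h / 2, [a + h / 2 * k for a, k in zip(s, k2)])
        k4 = f(t + h, [a + h * k for a, k in zip(s, k3)])
        s = [a + h / 6 * (p + 2 * q + 2 * r + w)
             for a, p, q, r, w in zip(s, k1, k2, k3, k4)]
    return s

# x'' = -x with x(0) = 1, x'(0) = 0 has the solution x = cos t, y = dx/dt = -sin t
f = second_order_as_system(lambda t: 0.0, lambda t: -1.0, lambda t: 0.0)
x, y = rk4(f, 0.0, [1.0, 0.0], 2.0)
print(x - math.cos(2.0), y + math.sin(2.0))
```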
Equations to be integrated also occur in the form
$$X\,dx + Y\,dy + Z\,dz = 0,$$
where $X$, $Y$, $Z$ are functions of $x$, $y$, $z$. We consider only the case in which there exists an equation $\phi(x, y, z) = C$ whose differential
$$\frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz = 0$$
is equivalent with the given differential equation; that is, $\mu$ being a proper function of $x$, $y$, $z$, we assume that there exist equations
$$\frac{\partial\phi}{\partial x} = \mu X, \qquad \frac{\partial\phi}{\partial y} = \mu Y, \qquad \frac{\partial\phi}{\partial z} = \mu Z;$$
these equations require
$$\frac{\partial}{\partial z}(\mu Y) = \frac{\partial}{\partial y}(\mu Z), \quad\text{etc.},$$
and hence
$$X\left(\frac{\partial Z}{\partial y} - \frac{\partial Y}{\partial z}\right) + Y\left(\frac{\partial X}{\partial z} - \frac{\partial Z}{\partial x}\right) + Z\left(\frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y}\right) = 0;$$
conversely it can be proved that this is sufficient in order that $\mu$ may exist to render $\mu(X\,dx + Y\,dy + Z\,dz)$ a perfect differential; in particular it may be satisfied in virtue of the three equations such as
$$\frac{\partial Z}{\partial y} - \frac{\partial Y}{\partial z} = 0;$$
in which case we may take $\mu = 1$. Assuming the condition in its general form, take in the given differential equation a plane section of the surface $\phi = C$ parallel to the plane $z = 0$, viz. put $z$ constant, and consider the resulting differential equation in the two variables $x$, $y$, namely $X\,dx + Y\,dy = 0$; let $\psi(x, y, z) = \text{constant}$ be its integral, the constant $z$ entering, as a rule, in $\psi$ because it enters in $X$ and $Y$. Now differentiate the relation $\psi(x, y, z) = f(z)$, where $f$ is a function to be determined, so obtaining
$$\frac{\partial\psi}{\partial x}dx + \frac{\partial\psi}{\partial y}dy + \left(\frac{\partial\psi}{\partial z} - \frac{df}{dz}\right)dz = 0;$$
there exists a function $\sigma$ of $x$, $y$, $z$ such that
$$\frac{\partial\psi}{\partial x} = \sigma X, \qquad \frac{\partial\psi}{\partial y} = \sigma Y,$$
because $\psi = \text{constant}$ is the integral of $X\,dx + Y\,dy = 0$; we desire to prove that $f$ can be chosen so that also, in virtue of $\psi(x, y, z) = f(z)$, we have
$$\frac{\partial\psi}{\partial z} - \frac{df}{dz} = \sigma Z, \quad\text{namely}\quad \frac{df}{dz} = \frac{\partial\psi}{\partial z} - \sigma Z;$$
if this can be proved, the relation $\psi(x, y, z) - f(z) = \text{constant}$ will be the integral of the given differential equation. To prove this it is enough to show that, in virtue of $\psi(x, y, z) = f(z)$, the function $\partial\psi/\partial z - \sigma Z$ can be expressed in terms of $z$ only. Now in consequence of the originally assumed relations,
$$\frac{\partial\phi}{\partial x} = \mu X, \qquad \frac{\partial\phi}{\partial y} = \mu Y, \qquad \frac{\partial\phi}{\partial z} = \mu Z,$$
we have
$$\frac{\partial\psi}{\partial x}\bigg/\frac{\partial\phi}{\partial x} = \frac{\sigma}{\mu} = \frac{\partial\psi}{\partial y}\bigg/\frac{\partial\phi}{\partial y},$$
and hence
$$\frac{\partial\psi}{\partial x}\frac{\partial\phi}{\partial y} - \frac{\partial\psi}{\partial y}\frac{\partial\phi}{\partial x} = 0;$$
this shows that, as functions of $x$ and $y$, $\psi$ is a function of $\phi$ (see the note at the end of part i. of this article, on Jacobian determinants), so that we may write $\psi = F(z, \phi)$, from which
$$\frac{\sigma}{\mu} = \frac{\partial F}{\partial\phi}; \quad\text{then}\quad \frac{\partial\psi}{\partial z} = \frac{\partial F}{\partial z} + \frac{\partial F}{\partial\phi}\frac{\partial\phi}{\partial z} = \frac{\partial F}{\partial z} + \frac{\sigma}{\mu}\cdot\mu Z = \frac{\partial F}{\partial z} + \sigma Z,$$
$$\text{or}\quad \frac{\partial\psi}{\partial z} - \sigma Z = \frac{\partial F}{\partial z};$$
in virtue of $\psi(x, y, z) = f(z)$ and $\psi = F(z, \phi)$, the function $\phi$ can be written in terms of $z$ only, so that $\partial F/\partial z$ can be written in terms of $z$ only, and what we required to prove is proved.
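The integrability condition $X(\partial Z/\partial y - \partial Y/\partial z) + \cdots = 0$ is easy to test at sample points. The sketch below (hypothetical helper names, central differences with an assumed step) evaluates it for two forms chosen for illustration: $yz\,dx + xz\,dy + 2xy\,dz$, which becomes $d(xyz^2)$ on multiplication by $\mu = z$, and $dz - y\,dx$, for which the expression equals $1$ everywhere, so that no integrating factor exists.

```python
def partial(f, i, p, h=1e-5):
    """Central-difference partial derivative of f at p with respect to coordinate i."""
    hi, lo = list(p), list(p)
    hi[i] += h
    lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

def integrability(X, Y, Z, p):
    """X(Z_y - Y_z) + Y(X_z - Z_x) + Z(Y_x - X_y) evaluated at p = (x, y, z)."""
    return (X(*p) * (partial(Z, 1, p) - partial(Y, 2, p))
            + Y(*p) * (partial(X, 2, p) - partial(Z, 0, p))
            + Z(*p) * (partial(Y, 0, p) - partial(X, 1, p)))

# yz dx + xz dy + 2xy dz = 0: multiplied by mu = z it is d(x y z^2), so integrable.
X1, Y1, Z1 = (lambda x, y, z: y * z), (lambda x, y, z: x * z), (lambda x, y, z: 2 * x * y)
# dz - y dx = 0 has no integrating factor.
X2, Y2, Z2 = (lambda x, y, z: -y), (lambda x, y, z: 0.0), (lambda x, y, z: 1.0)

p = (0.7, -1.2, 2.0)
print(integrability(X1, Y1, Z1, p), integrability(X2, Y2, Z2, p))
```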
Consider lastly a simple type of differential equation containing _two_ independent variables, say x and y, and one dependent variable z, namely the equation
$$P\frac{\partial z}{\partial x} + Q\frac{\partial z}{\partial y} = R,$$
where P, Q, R are functions of x, y, z. This is known as Lagrange's linear partial differential equation of the first order. To integrate this, consider first the ordinary differential equations dx/dz = P/R, dy/dz = Q/R, and suppose that two functions u, v, of x, y, z can be determined, independent of one another, such that the equations u = a, v = b, where a, b are arbitrary constants, lead to these ordinary differential equations, namely such that
$$P\frac{\partial u}{\partial x} + Q\frac{\partial u}{\partial y} + R\frac{\partial u}{\partial z} = 0 \quad\text{and}\quad P\frac{\partial v}{\partial x} + Q\frac{\partial v}{\partial y} + R\frac{\partial v}{\partial z} = 0.$$
Then if $F(x, y, z) = 0$ be a relation satisfying the original differential equation, this relation giving rise to
$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial x} = 0 \quad\text{and}\quad \frac{\partial F}{\partial y} + \frac{\partial F}{\partial z}\frac{\partial z}{\partial y} = 0, \quad\text{we have}\quad P\frac{\partial F}{\partial x} + Q\frac{\partial F}{\partial y} + R\frac{\partial F}{\partial z} = 0.$$
It follows that the determinant of three rows and columns vanishes whose first row consists of the three quantities $\partial F/\partial x$, $\partial F/\partial y$, $\partial F/\partial z$, whose second row consists of the three quantities $\partial u/\partial x$, $\partial u/\partial y$, $\partial u/\partial z$, whose third row consists similarly of the partial derivatives of $v$. The vanishing of this so-called Jacobian determinant is known to imply that $F$ is expressible as a function of $u$ and $v$, unless these are themselves functionally related, which is contrary to hypothesis (see the note below on Jacobian determinants). Conversely, any relation $\phi(u, v) = 0$ can easily be proved, in virtue of the equations satisfied by $u$ and $v$, to lead to
$$P\frac{\partial z}{\partial x} + Q\frac{\partial z}{\partial y} = R.$$
The solution of this partial equation is thus reduced to the solution of the two ordinary differential equations expressed by $dx/P = dy/Q = dz/R$. In regard to this problem one remark may be made which is often of use in practice: when one equation $u = a$ has been found to satisfy the differential equations, we may utilize this to obtain the second equation $v = b$; for instance, we may, by means of $u = a$, eliminate $z$; when a relation $v = b$ containing $x$, $y$ and $a$ has then been found from the resulting equations in $x$ and $y$, the substitution $a = u$ will give a relation involving $x$, $y$, $z$.
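For a concrete case (chosen here, not taken from the text) consider $x\,\partial z/\partial x + y\,\partial z/\partial y = z$, so $P = x$, $Q = y$, $R = z$. The equations $dx/x = dy/y = dz/z$ have the integrals $u = y/x$ and $v = z/x$, and the relation $v = u^2 + 1$ gives the surface $z = y^2/x + x$. The sketch checks by central differences that $u$, $v$ satisfy $P\,\partial g/\partial x + Q\,\partial g/\partial y + R\,\partial g/\partial z = 0$ and that the surface satisfies the partial equation; the helper names and step size are assumptions.

```python
def partial(f, i, p, h=1e-5):
    """Central-difference partial derivative of f(*p) with respect to coordinate i."""
    lo, hi = list(p), list(p)
    lo[i] -= h
    hi[i] += h
    return (f(*hi) - f(*lo)) / (2 * h)

# x z_x + y z_y = z, so P = x, Q = y, R = z.
P = lambda x, y, z: x
Q = lambda x, y, z: y
R = lambda x, y, z: z

# Two independent integrals of dx/P = dy/Q = dz/R.
u = lambda x, y, z: y / x
v = lambda x, y, z: z / x

def first_integral_defect(g, p):
    """P g_x + Q g_y + R g_z at p; zero when g is constant along the characteristics."""
    return (P(*p) * partial(g, 0, p) + Q(*p) * partial(g, 1, p)
            + R(*p) * partial(g, 2, p))

# The relation v = u**2 + 1 gives the surface z = y**2/x + x.
def z_surface(x, y):
    return y * y / x + x

print(first_integral_defect(u, (1.5, 0.4, 2.0)), first_integral_defect(v, (1.5, 0.4, 2.0)))
```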
_Note on Jacobian Determinants._--The fact assumed above that the vanishing of the Jacobian determinant whose elements are the partial derivatives of three functions F, u, v, of three variables x, y, z, involves that there exists a functional relation connecting the three functions F, u, v, may be proved somewhat roughly as follows:--
The corresponding theorem is true for any number of variables. Consider first the case of two functions p, q, of two variables x, y. The function p, not being constant, must contain one of the variables, say x; we can then suppose x expressed in terms of y and the function p; thus the function q can be expressed in terms of y and the function p, say q = Q(p, y). This is clear enough in the simplest cases which arise, when the functions are rational. Hence we have
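$$\frac{\partial q}{\partial x} = \frac{\partial Q}{\partial p}\frac{\partial p}{\partial x} \quad\text{and}\quad \frac{\partial q}{\partial y} = \frac{\partial Q}{\partial p}\frac{\partial p}{\partial y} + \frac{\partial Q}{\partial y};$$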
these give
$$\frac{\partial p}{\partial x}\frac{\partial q}{\partial y} - \frac{\partial p}{\partial y}\frac{\partial q}{\partial x} = \frac{\partial p}{\partial x}\frac{\partial Q}{\partial y};$$
by hypothesis $\partial p/\partial x$ is not identically zero; therefore if the Jacobian determinant of $p$ and $q$ in regard to $x$ and $y$ is zero identically, so is $\partial Q/\partial y$, or $Q$ does not contain $y$, so that $q$ is expressible as a function of $p$ only. Conversely, such an expression can be seen at once to make the Jacobian of $p$ and $q$ vanish identically.
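The two-variable case can be checked numerically: with $p = x + y^2$, the function $q = \sin p$ is a function of $p$ alone and the Jacobian vanishes, whereas $q = x - y$ is not, and gives $p_xq_y - p_yq_x = -1 - 2y$. The functions and the finite-difference step are illustrative assumptions.

```python
import math

def jacobian2(p, q, x, y, h=1e-5):
    """p_x q_y - p_y q_x at (x, y), using central differences."""
    px = (p(x + h, y) - p(x - h, y)) / (2 * h)
    py = (p(x, y + h) - p(x, y - h)) / (2 * h)
    qx = (q(x + h, y) - q(x - h, y)) / (2 * h)
    qy = (q(x, y + h) - q(x, y - h)) / (2 * h)
    return px * qy - py * qx

p = lambda x, y: x + y * y
q_dependent = lambda x, y: math.sin(p(x, y))      # a function of p only
q_independent = lambda x, y: x - y

print(jacobian2(p, q_dependent, 0.3, 0.8), jacobian2(p, q_independent, 0.3, 0.8))
```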
Passing now to the case of three variables, suppose that the Jacobian determinant of the three functions $F$, $u$, $v$ in regard to $x$, $y$, $z$ is identically zero. We prove that if $u$, $v$ are not themselves functionally connected, $F$ is expressible as a function of $u$ and $v$. Suppose first that the minors of the elements $\partial F/\partial x$, $\partial F/\partial y$, $\partial F/\partial z$ in the determinant are all identically zero, namely the three determinants such as
$$\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y};$$
then by the case of two variables considered above there exist three functional relations $\psi_1(u, v, x) = 0$, $\psi_2(u, v, y) = 0$, $\psi_3(u, v, z) = 0$, of which the first, for example, follows from the vanishing of
$$\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y}.$$
We cannot assume that $x$ is absent from $\psi_1$, or $y$ from $\psi_2$, or $z$ from $\psi_3$; but conversely we cannot simultaneously have $x$ entering in $\psi_1$, and $y$ in $\psi_2$, and $z$ in $\psi_3$, or else by elimination of $u$ and $v$ from the three equations $\psi_1 = 0$, $\psi_2 = 0$, $\psi_3 = 0$, we should find a necessary relation connecting the three independent quantities $x$, $y$, $z$; which is absurd. Thus when the three minors of $\partial F/\partial x$, $\partial F/\partial y$, $\partial F/\partial z$ in the Jacobian determinant are all zero, there exists a functional relation connecting $u$ and $v$ only. Suppose no such relation to exist; we can then suppose, for example, that
$$\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y}$$
is not zero. Then from the equations u(x, y, z) = u, v(x, y, z) = v we can express y and z in terms of u, v, and x (the attempt to do this could only fail by leading to a relation connecting u, v and x, and the existence of such a relation would involve that the determinant
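$$\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y}$$
was zero), and so write $F$ in the form $F(x, y, z) = \Phi(u, v, x)$. We then have
$$\frac{\partial F}{\partial x} = \frac{\partial\Phi}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial\Phi}{\partial v}\frac{\partial v}{\partial x} + \frac{\partial\Phi}{\partial x}, \qquad \frac{\partial F}{\partial y} = \frac{\partial\Phi}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial\Phi}{\partial v}\frac{\partial v}{\partial y}, \qquad \frac{\partial F}{\partial z} = \frac{\partial\Phi}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial\Phi}{\partial v}\frac{\partial v}{\partial z};$$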
thereby the Jacobian determinant of F, u, v is reduced to
$$\frac{\partial\Phi}{\partial x}\left(\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y}\right);$$
by hypothesis the second factor of this does not vanish identically; hence $\partial\Phi/\partial x = 0$ identically, and $\Phi$ does not contain $x$; so that $F$ is expressible in terms of $u$, $v$ only, as was to be proved.
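A three-variable spot check of the same theorem, with functions chosen for illustration: $u = x + yz$, $v = xy + z$ and $F = uv + u^2$, a function of $u$, $v$ only, give a vanishing Jacobian, while $F = x$ gives $z - xy$, which here coincides with the minor $\frac{\partial u}{\partial y}\frac{\partial v}{\partial z} - \frac{\partial u}{\partial z}\frac{\partial v}{\partial y}$ that the proof supposes non-zero. Derivatives are approximated by central differences, an assumption of the sketch.

```python
def grad(f, p, h=1e-5):
    """Central-difference gradient of f at the point p = (x, y, z)."""
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        g.append((f(*hi) - f(*lo)) / (2 * h))
    return g

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

u = lambda x, y, z: x + y * z
v = lambda x, y, z: x * y + z
F_dependent = lambda x, y, z: u(x, y, z) * v(x, y, z) + u(x, y, z) ** 2
F_independent = lambda x, y, z: x

def jacobian3(F, p):
    """Jacobian determinant of F, u, v with respect to x, y, z at p."""
    return det3([grad(F, p), grad(u, p), grad(v, p)])

pt = (0.5, -0.7, 1.3)
print(jacobian3(F_dependent, pt), jacobian3(F_independent, pt))
```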
_Part II.--General Theory._
Differential equations arise in the expression of the relations between quantities by the elimination of details, either unknown or regarded as unessential to the formulation of the relations in question. They give rise, therefore, to the two closely connected problems of determining what arrangement of details is consistent with them, and of developing, apart from these details, the general properties expressed by them. Very roughly, two methods of study can be distinguished, with the names Transformation-theories, Function-theories; the former is concerned with the reduction of the algebraical relations to the fewest and simplest forms, eventually with the hope of obtaining explicit expressions of the dependent variables in terms of the independent variables; the latter is concerned with the determination of the general descriptive relations among the quantities which are involved by the differential equations, with as little use of algebraical calculations as may be possible. Under the former heading we may, with the assumption of a few theorems belonging to the latter, arrange the theory of partial differential equations and Pfaff's problem, with their geometrical interpretations, as at present developed, and the applications of Lie's theory of transformation-groups to partial and to ordinary equations; under the latter, the study of linear differential equations in the manner initiated by Riemann, the applications of discontinuous groups, the theory of the singularities of integrals, and the study of potential equations with existence-theorems arising therefrom. In order to be clear we shall enter into some detail in regard to partial differential equations of the first order, both those which are linear in any number of variables and those not linear in two independent variables, and also in regard to the function-theory of linear differential equations of the second order. 
Space renders impossible anything further than the briefest account of many other matters; in particular, the theories of partial equations of higher than the first order, the function-theory of the singularities of ordinary equations not linear and the applications to differential geometry, are taken account of only in the bibliography. It is believed that on the whole the article will be more useful to the reader than if explanations of method had been further curtailed to include more facts.
When we speak of a function without qualification, it is to be understood that in the immediate neighbourhood of a particular set $x_0, y_0, \ldots$ of values of the independent variables $x, y, \ldots$ of the function, at whatever point of the range of values for $x, y, \ldots$ under consideration $x_0, y_0, \ldots$ may be chosen, the function can be expressed as a series of positive integral powers of the differences $x - x_0, y - y_0, \ldots$, convergent when these are sufficiently small (see FUNCTION: Functions of Complex Variables). Without this condition, which we express by saying that the function is developable about $x_0, y_0, \ldots$, many results provisionally stated in the transformation theories would be unmeaning or incorrect. If, then, we have a set of $k$ functions $f_1, \ldots, f_k$ of $n$ independent variables $x_1, \ldots, x_n$, we say that they are independent when $n \ge k$ and not every determinant of $k$ rows and columns vanishes in the matrix of $k$ rows and $n$ columns whose $r$-th row has the constituents $\partial f_r/\partial x_1, \ldots, \partial f_r/\partial x_n$; the justification being in the theorem, which we assume, that if the determinant involving, for instance, the first $k$ columns be not zero for $x_1 = x_1^0, \ldots, x_n = x_n^0$, and the functions be developable about this point, then from the equations $f_1 = c_1, \ldots, f_k = c_k$ we can express $x_1, \ldots, x_k$ by convergent power series in the differences $x_{k+1} - x_{k+1}^0, \ldots, x_n - x_n^0$, and so regard $x_1, \ldots, x_k$ as functions of the remaining variables. This we often express by saying that the equations $f_1 = c_1, \ldots, f_k = c_k$ can be solved for $x_1, \ldots, x_k$. This explanation is typical of the explanations tacitly understood in what follows.
Ordinary equations of the first order.
Single homogeneous partial equation of the first order.
Proof of the existence of integrals.
We may conveniently begin by stating the theorem: If each of the $n$ functions $\phi_1, \ldots, \phi_n$ of the $(n + 1)$ variables $x_1, \ldots, x_n, t$ be developable about the values $x_1^0, \ldots, x_n^0, t^0$, the $n$ differential equations of the form $dx_1/dt = \phi_1(t, x_1, \ldots, x_n)$ are satisfied by convergent power series
$$x_r = x_r^0 + (t - t^0)A_{r1} + (t - t^0)^2A_{r2} + \cdots$$
reducing respectively to $x_1^0, \ldots, x_n^0$ when $t = t^0$; and the only functions satisfying the equations and reducing respectively to $x_1^0, \ldots, x_n^0$ when $t = t^0$ are those determined by continuation of these series. If the result of solving these $n$ equations for $x_1^0, \ldots, x_n^0$ be written in the form $\omega_1(x_1, \ldots, x_n, t) = x_1^0, \ldots, \omega_n(x_1, \ldots, x_n, t) = x_n^0$, it is at once evident that the differential equation
$$\frac{\partial f}{\partial t} + \phi_1\frac{\partial f}{\partial x_1} + \cdots + \phi_n\frac{\partial f}{\partial x_n} = 0$$
possesses $n$ integrals, namely, the functions $\omega_1, \ldots, \omega_n$, which are developable about the values $(x_1^0, \ldots, x_n^0, t^0)$ and reduce respectively to $x_1, \ldots, x_n$ when $t = t^0$. And in fact it has no other integrals so reducing. Thus this equation also possesses a unique integral reducing when $t = t^0$ to an arbitrary function $\psi(x_1, \ldots, x_n)$, this integral being $\psi(\omega_1, \ldots, \omega_n)$. Conversely the existence of these _principal_ integrals $\omega_1, \ldots, \omega_n$ of the partial equation establishes the existence of the specified solutions of the ordinary equations $dx_i/dt = \phi_i$. The following sketch of the proof of the existence of these principal integrals for the case $n = 2$ will show the character of more general investigations. Put $x$ for $x - x^0$, &c., and consider the equation $a(x, y, t)\,\partial f/\partial x + b(x, y, t)\,\partial f/\partial y = \partial f/\partial t$, wherein the functions $a$, $b$ are developable about $x = 0$, $y = 0$, $t = 0$; say
$$a(x, y, t) = a_0 + ta_1 + \frac{t^2}{2!}a_2 + \cdots, \qquad b(x, y, t) = b_0 + tb_1 + \frac{t^2}{2!}b_2 + \cdots,$$
so that
$$a\frac{\partial}{\partial x} + b\frac{\partial}{\partial y} = \delta_0 + t\delta_1 + \tfrac12t^2\delta_2 + \cdots,$$
where $\delta_r = a_r\,\partial/\partial x + b_r\,\partial/\partial y$. In order that
$$f = p_0 + tp_1 + \frac{t^2}{2!}p_2 + \cdots,$$
wherein $p_0, p_1, \ldots$ are power series in $x$, $y$, should satisfy the equation, it is necessary, as we find by equating like terms, that
$$p_1 = \delta_0p_0, \qquad p_2 = \delta_0p_1 + \delta_1p_0, \quad\text{etc.},$$
and in general
$$p_{s+1} = \delta_0p_s + s_1\delta_1p_{s-1} + \cdots + \delta_sp_0,$$
where $s_r = \dfrac{s!}{r!\,(s - r)!}$.
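The existence theorem stated at the head of this discussion asserts that an equation $dx/dt = \phi(t, x)$ has a convergent power-series solution whose coefficients are fixed, one after another, by equating like powers, as in the recurrence just obtained. As a small illustration not taken from the text, take $dx/dt = 1 + x^2$ with $x(0) = 0$, whose solution is $\tan t$: writing $x = \sum a_kt^k$ and equating coefficients of $t^s$ gives $(s + 1)a_{s+1} = [s = 0] + \sum_{k=0}^{s}a_ka_{s-k}$. The sketch (function name hypothetical) computes these coefficients exactly and compares a partial sum with $\tan\tfrac12$.

```python
import math
from fractions import Fraction

def tan_series(n_terms):
    """Taylor coefficients a_0 .. a_{n_terms-1} of the solution of x' = 1 + x^2, x(0) = 0."""
    a = [Fraction(0)] * n_terms
    for s in range(n_terms - 1):
        square = sum(a[k] * a[s - k] for k in range(s + 1))   # coefficient of t^s in x^2
        a[s + 1] = ((1 if s == 0 else 0) + square) / (s + 1)
    return a

coeffs = tan_series(40)
print([str(c) for c in coeffs[:8]])   # -> ['0', '1', '0', '1/3', '0', '2/15', '0', '17/315']
t = Fraction(1, 2)
print(float(sum(c * t ** k for k, c in enumerate(coeffs))), math.tan(0.5))
```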
Now compare with the given equation another equation
$$A(x, y, t)\frac{\partial F}{\partial x} + B(x, y, t)\frac{\partial F}{\partial y} = \frac{\partial F}{\partial t},$$
wherein each coefficient in the expansion of either A or B is real and positive, and not less than the absolute value of the corresponding coefficient in the expansion of a or b. In the second equation let us substitute a series
$$F = P_0 + tP_1 + \frac{t^2}{2!}P_2 + \cdots,$$
wherein the coefficients in $P_0$ are real and positive, and each not less than the absolute value of the corresponding coefficient in $p_0$; then putting $\Delta_r = A_r\,\partial/\partial x + B_r\,\partial/\partial y$ we obtain necessary equations of the same form as before, namely,
$$P_1 = \Delta_0P_0, \qquad P_2 = \Delta_0P_1 + \Delta_1P_0, \quad \ldots$$
and in general P_s+1 = [Delta]0 P_s + s_1 [Delta]1 P_s-1 + ... + [Delta]_s P0. These give for every coefficient in P_s+1 an integral aggregate, with real positive coefficients, of the coefficients in P_s, P_s-1, ..., P0 and the coefficients in A and B. They are the same aggregates as would be given by the previously obtained equations for the corresponding coefficients in p_s+1 in terms of the coefficients in p_s, p_s-1, ..., p0 and the coefficients in a and b. Hence, as the coefficients in P0 and also in A, B are real and positive, the values obtained in succession for the coefficients in P1, P2, ... are real and positive. Further, since the absolute value of a sum of terms is not greater than the sum of the absolute values of the terms, it follows for each value of s that every coefficient in p_s+1 is, in absolute value, not greater than the corresponding coefficient in P_s+1. Thus if the series for F be convergent, the series for f will also be; and we are thus reduced to (1) specifying functions A, B with real positive coefficients, each not less than the absolute value of the corresponding coefficient in a, b; (2) proving that the equation
A dF/dx + B dF/dy = dF/dt
possesses an integral P0 + tP1 + t²P2/2! + ... in which the coefficients in P0 are real and positive, and each not less than the absolute value of the corresponding coefficient in p0. If a, b be developable for x, y both in absolute value less than r and for t less in absolute value than R, and for such values a, b be both less in absolute value than the real positive constant M, it is not difficult to verify that we may take
$$A = B = M\left(1 - \frac{x + y}{r}\right)^{-1}\left(1 - \frac{t}{R}\right)^{-1},$$
and obtain
$$F = r - (r - x - y)\left[1 - \frac{4MR}{r}\left(1 - \frac{x + y}{r}\right)^{-2}\log\left[\left(1 - \frac{t}{R}\right)^{-1}\right]\right]^{1/2},$$
and that this solves the problem when x, y, t are sufficiently small for the two cases p0 = x, p0 = y. One obvious application of the general theorem is to the proof of the existence of an integral of an ordinary linear differential equation given by the n equations dy/dx = y1, dy1/dx = y2, ...,
dy_n-1/dx = p - p1 y_n-1 - ... - p_n y;
but in fact any simultaneous system of ordinary equations is reducible to a system of the form
dx_i/dt = [phi]_i(t, x1, ... x_n) (i = 1, ... n).
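This is a numerical sanity check of the majorant, not a proof. The sketch below evaluates the closed form for F given above. It then confirms with central finite differences that A dF/dx + B dF/dy - dF/dt is close to zero at sample points well inside the region of convergence. It also confirms that F reduces to x + y at t = 0. The coefficients of x + y are not less than those of either x or y, which presumably is why the text speaks of the two cases p0 = x, p0 = y. The sample values of M, r, R and the step size h are arbitrary choices.

```python
import math

def A(x, y, t, M, r, R):
    """The majorant coefficient A = B = M (1 - (x + y)/r)^-1 (1 - t/R)^-1."""
    return M / ((1 - (x + y) / r) * (1 - t / R))

def F(x, y, t, M, r, R):
    """F = r - (r - x - y) [1 - (4MR/r)(1 - (x + y)/r)^-2 log((1 - t/R)^-1)]^(1/2)."""
    inner = 1 - (4 * M * R / r) * (1 - (x + y) / r) ** -2 * math.log(1 / (1 - t / R))
    return r - (r - x - y) * math.sqrt(inner)

def residual(x, y, t, M=1.0, r=1.0, R=1.0, h=1e-6):
    """A dF/dx + B dF/dy - dF/dt (with B = A), by central finite differences."""
    Fx = (F(x + h, y, t, M, r, R) - F(x - h, y, t, M, r, R)) / (2 * h)
    Fy = (F(x, y + h, t, M, r, R) - F(x, y - h, t, M, r, R)) / (2 * h)
    Ft = (F(x, y, t + h, M, r, R) - F(x, y, t - h, M, r, R)) / (2 * h)
    return A(x, y, t, M, r, R) * (Fx + Fy) - Ft

print(abs(residual(0.1, 0.1, 0.01)) < 1e-6)                # True
print(math.isclose(F(0.1, 0.2, 0.0, 1.0, 1.0, 1.0), 0.3))  # True: F = x + y at t = 0
```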
Simultaneous linear partial equations.
Complete systems of linear partial equations.
Jacobian systems.
Suppose we have k homogeneous linear partial equations of the first order in n independent variables, the general equation being a_[sigma]1 df/dx1 + ... + a_[sigma]n df/dx_n = 0, where [sigma] = 1, ... k. Suppose we desire to know whether the equations have common solutions, and if so, how many. It is to be understood that the equations are linearly independent. This implies that k <= n and that not every determinant of k rows and columns is identically zero in the matrix whose [sigma]-th row has i-th element a_[sigma]i (i = 1, ... n, [sigma] = 1, ... k). Denoting the left side of the [sigma]-th equation by P_[sigma]f, it is clear that every common solution of the two equations P_[sigma]f = 0, P_[rho]f = 0 is also a solution of the equation P_[rho](P_[sigma]f) - P_[sigma](P_[rho]f) = 0. We immediately find, however, that the terms containing second differential coefficients of f cancel, so that this is also a linear equation, namely, [Sigma]H_i df/dx_i = 0, where H_i = P_[rho]a_[sigma]i - P_[sigma]a_[rho]i. If it be not already contained among the given equations, or be linearly deducible from them, it may be added to them, as not introducing any additional limitation of the possibility of their having common solutions. We proceed thus with every pair of the original equations, then with every pair of the possibly augmented system so obtained, and so on continually. We shall thereby arrive at a system of equations, linearly independent of each other and therefore not more than n in number, such that the combination, in the way described, of every pair of them leads to an equation which is linearly deducible from them. If the number of equations in this so-called _complete system_ is n, the equations give df/dx1 = 0, ... df/dxn = 0, leading to the nugatory result f = a constant. Suppose, then, the number of equations in this system to be r < n. Suppose, further, that from the matrix of the coefficients a determinant of r rows and columns not vanishing identically is that formed by the coefficients of the differential coefficients of f in regard to x1 ...
x_r; also that the coefficients are all developable about the values x1 = x1^0, ... xn= xn^0, and that for these values the determinant just spoken of is not zero. Then the main theorem is that the complete system of r equations, and therefore the originally given set of k equations, have in common n - r solutions, say [omega]r+1, ... [omega]n, which reduce respectively to x_r+1, ... x_n when in them for x1, ... x_r are respectively put x1^0, ... x_r^0; so that also the equations have in common a solution reducing when x1 = x1^0, ... x_r = x_r^0 to an arbitrary function [psi](x_r+1, ... x_n) which is developable about x_r+1^0, ... x_n^0, namely, this common solution is [psi]([omega]_r+1, ... [omega]_n). It is seen at once that this result is a generalization of the theorem for r = 1, and its proof is conveniently given by induction from that case. It can be verified without difficulty (1) that if from the r equations of the complete system we form r independent linear aggregates, with coefficients not necessarily constants, the new system is also a complete system; (2) that if in place of the independent variables x1, ... xn we introduce any other variables which are independent functions of the former, the new equations also form a complete system. It is convenient, then, from the complete system of r equations to form r new equations by solving separately for df/dx1, ..., df/dx_r; suppose the general equation of the new system to be
Q_[sigma]f = df/dx_[sigma] + c_[sigma],r+1 df/dx_r+1 + ... + c_[sigma]n df/dx_n = 0 ([sigma] = 1, ... r).
Then it is easily seen that the equation Q_[rho]Q_[sigma]f - Q_[sigma]Q_[rho]f = 0 contains only the differential coefficients of f in regard to x_r+1, ... xn. As it is at most a linear function of Q1f, ... Qrf, and each Q_[sigma]f contains df/dx_[sigma], which does not occur in it, it must be identically zero. So reduced, the system is called a Jacobian system. Of this system Q1f = 0 has n - 1 principal solutions reducing respectively to x2, ... xn when
x1 = x1^0,
and its form shows that of these the first r - 1 are exactly x2 ... xr. Let these n - 1 functions together with x1 be introduced as n new independent variables in all the r equations. Since the first equation is satisfied by n - 1 of the new independent variables, it will contain no differential coefficients in regard to them, and will reduce therefore simply to df/dx1 = 0, expressing that any common solution of the r equations is a function only of the n - 1 remaining variables. Thereby the investigation of the common solutions is reduced to the same problem for r - 1 equations in n - 1 variables. Proceeding thus, we reach at length one equation in n - r + 1 variables, from which, by retracing the analysis, the proposition stated is seen to follow.
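The bracket construction and the Jacobian-system test can be checked exactly on polynomial coefficients. The minimal sketch below assumes n = 3 and stores polynomials as {exponent tuple: coefficient} dictionaries. The helpers p_add, p_mul, p_diff, act and bracket are hypothetical. It uses two illustrative systems that are not taken from the text:

- Q1f = df/dx1 + x2 df/dx3, Q2f = df/dx2 + x1 df/dx3. Their bracket vanishes and they share the n - r = 1 solution x3 - x1x2.
- R1f = df/dx1 + x2 df/dx3, R2f = df/dx2. Their bracket is -df/dx3, so completing the system gives n = 3 equations and only constant solutions.

```python
def p_add(p, q, sign=1):
    """Return p + sign*q for polynomials stored as {exponent tuple: coefficient}."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + sign * c
        if out[e] == 0:
            del out[e]
    return out

def p_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def p_diff(p, k):
    """Partial derivative with respect to x_(k+1), k counted from 0."""
    return {e[:k] + (e[k] - 1,) + e[k + 1:]: c * e[k] for e, c in p.items() if e[k] > 0}

def act(op, f):
    """op = [a_1, ..., a_n] stands for Pf = a_1 df/dx1 + ... + a_n df/dxn."""
    out = {}
    for k, a in enumerate(op):
        out = p_add(out, p_mul(a, p_diff(f, k)))
    return out

def bracket(op_rho, op_sigma):
    """H_i = P_rho a_sigma,i - P_sigma a_rho,i, the coefficients of
    P_rho(P_sigma f) - P_sigma(P_rho f); the second derivatives of f cancel."""
    return [p_add(act(op_rho, a_sig), act(op_sigma, a_rho), -1)
            for a_rho, a_sig in zip(op_rho, op_sigma)]

def var(k, n=3):
    return {tuple(int(j == k) for j in range(n)): 1}

ONE, ZERO = {(0, 0, 0): 1}, {}
x1, x2, x3 = var(0), var(1), var(2)

# Jacobian system: the bracket vanishes identically
Q1, Q2 = [ONE, ZERO, x2], [ZERO, ONE, x1]
f = p_add(x3, p_mul(x1, x2), -1)  # x3 - x1 x2
print(bracket(Q1, Q2))            # [{}, {}, {}]
print(act(Q1, f), act(Q2, f))     # {} {}: a common solution

# Not complete: the bracket is -df/dx3
R1, R2 = [ONE, ZERO, x2], [ZERO, ONE, ZERO]
print(bracket(R1, R2))            # [{}, {}, {(0, 0, 0): -1}]
```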
System of total differential equations.
The analogy with the case of one equation is, however, still closer. With the coefficients c_[sigma]j of the equations Q_[sigma]f = 0 in transposed array ([sigma] = 1, ... r, j = r + 1, ... n), we can put down the (n - r) equations dx_j = c_1j dx1 + ... + c_rj dx_r. These are equivalent to the r(n - r) equations dx_j/dx_[sigma] = c_[sigma]j. Suppose that, consistently with them, we are to regard x_r+1, ... x_n as functions of x1, ... x_r, these being taken as independent variables. It is then clearly necessary that differentiating c_[sigma]j in regard to x_[rho] on this hypothesis should give the same result as differentiating c_[rho]j in regard to x_[sigma] on this hypothesis. The differential coefficient of a function f of x1, ... xn on this hypothesis, in regard to x_[rho], is, however,
df/dx_[rho] + c_[rho],r+1 df/dx_r+1 + ... + c_[rho]n df/dx_n,
namely, is Q_[rho]f. Thus the consistence of the n - r total equations requires the conditions Q_[rho]c_[sigma]j - Q_[sigma]c_[rho]j = 0. These are, however, verified, since Q_[rho](Q_[sigma]f) - Q_[sigma](Q_[rho]f), which is identically zero, is equal to [Sigma]_j (Q_[rho]c_[sigma]j - Q_[sigma]c_[rho]j) df/dx_j. Let [omega]_r+1, ... [omega]_n be the principal solutions of the Jacobian system Q_[sigma]f = 0, reducing respectively to x_r+1, ... xn when x1 = x1^0, ... x_r = x_r^0. Let the equations [omega]_r+1 = x_r+1^0, ... [omega]_n = x_n^0 be solved for x_r+1, ... x_n to give x_j = [psi]_j(x1, ... x_r, x_r+1^0, ... x_n^0). It can in fact be easily verified that these values solve the total equations and reduce respectively to x_r+1^0, ... x_n^0 when x1 = x1^0, ... x_r = x_r^0. And the total equations have no other solutions with these initial values. Conversely, the existence of these solutions of the total equations can be deduced a priori and the theory of the Jacobian system based upon them. The theory of such total equations, in general, finds its natural place under the heading _Pfaffian Expressions_, below.
Geometrical interpretation and solution.
Mayer's method of integration.
A practical method of reducing the solution of the r equations of a Jacobian system to that of a single equation in n - r + 1 variables may be explained in connexion with a geometrical interpretation, which will perhaps be clearer in a particular case, say n = 3, r = 2. There is then only one total equation, say dz = adx + bdy. If we do not take account of the condition of integrability, which is in this case da/dy + bda/dz = db/dx + adb/dz, this equation may be regarded as defining a plane through an arbitrary point (x0, y0, z0) of three-dimensional space (about which a, b are developable), namely, z - z0 = a0(x - x0) + b0(y - y0). It therefore defines through this arbitrary point [oo]² directions, namely, all those in the plane. If now there be a surface z = [psi](x, y), satisfying dz = adx + bdy and passing through (x0, y0, z0), this plane will touch the surface, and the operations of passing along the surface from (x0, y0, z0) to
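(x0 + dx0, y0, z0 + dz0)
and then to (x0 + dx0, y0 + dy0, z0 + d¹z0) ought to lead to the same value of d¹z0 as do the operations of passing along the surface from (x0, y0, z0) to (x0, y0 + dy0, z0 + [delta]z0), and then to
(x0 + dx0, y0 + dy0, z0 + [delta]¹z0);
namely, [delta]¹z0 ought to be equal to d¹z0. But we find
$$d^1z_0 = a_0\,dx_0 + b(x_0 + dx_0,\ y_0,\ z_0 + a_0\,dx_0)\,dy_0 = a_0\,dx_0 + b_0\,dy_0 + dx_0\,dy_0\left(\frac{\partial b}{\partial x_0} + a_0\frac{\partial b}{\partial z_0}\right),$$
and similarly
$$\delta^1z_0 = b_0\,dy_0 + a(x_0,\ y_0 + dy_0,\ z_0 + b_0\,dy_0)\,dx_0 = a_0\,dx_0 + b_0\,dy_0 + dx_0\,dy_0\left(\frac{\partial a}{\partial y_0} + b_0\frac{\partial a}{\partial z_0}\right),$$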
and so at once reach the condition of integrability. If now we put x = x0 + t, y = y0 + mt, and regard m as constant, we shall in fact be considering the section of the surface by a fixed plane y - y0 = m(x - x0). Along this section dz = dt(a + bm). We then integrate the equation dz/dt = a + bm, where a, b are expressed as functions of m, t and z, with m kept constant, finding the solution which reduces to z0 for t = 0. If in the result we again replace t by x - x0 and m by (y - y0)/(x - x0), we shall have the surface in question. In the general case the equations
dx_j = c_1j dx1 + ... + c_rj dx_r
similarly determine through an arbitrary point x1^0, ... xn^0 a planar manifold of r dimensions in space of n dimensions. When the conditions of integrability are satisfied, every direction in this manifold through this point is tangent to the manifold of r dimensions, expressed by [omega]_r+1 = x_r+1^0, ... [omega]_n = x_n^0, which satisfies the equations and passes through this point. If we put x1 - x1^0 = t, x2 - x2^0 = m2t, ... xr - xr^0 = mrt, and regard m2, ... mr as fixed, the (n - r) total equations take the form dx_j/dt = c_1j + m2c_2j + ... + m_rc_rj, and their integration is equivalent to that of the single partial equation
$$\frac{\partial f}{\partial t} + \sum_{j=r+1}^{n}\left(c_{1j} + m_2c_{2j} + \cdots + m_rc_{rj}\right)\frac{\partial f}{\partial x_j} = 0$$
in the n - r + 1 variables t, x_r+1, ... x_n. Determining the solutions [Omega]_r+1, ... [Omega]_n which reduce respectively to x_r+1, ... x_n when t = 0, and substituting t = x1 - x1^0, m2 = (x2 - x2^0)/(x1 - x1^0), ... mr = (xr - xr^0)/(x1 - x1^0), we obtain the solutions of the original system of partial equations previously denoted by [omega]_r+1, ... [omega]_n. It is to be remarked, however, that the presence of the fixed parameters m2, ... mr in the single integration may frequently render it more difficult than if they were assigned numerical quantities.
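Mayer's reduction can be sketched numerically for the case n = 3, r = 2. The Python below is a minimal illustration. It assumes a classical fourth-order Runge-Kutta integrator with a fixed step count. It integrates dz/dt = a + bm along the section y - y0 = m(x - x0), and for comparison along the two "staircase" paths of the geometrical argument. The example equations are chosen for illustration:

- dz = z dx + z dy is integrable, with surface z = z0 e^(x - x0 + y - y0).
- dz = y dx is not integrable.

```python
import math

def rk4(f, z, t0, t1, steps=400):
    """Classical fourth-order Runge-Kutta for the scalar equation dz/dt = f(t, z)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, z)
        k2 = f(t + h / 2, z + h * k1 / 2)
        k3 = f(t + h / 2, z + h * k2 / 2)
        k4 = f(t + h, z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return z

def mayer(a, b, x0, y0, z0, x, y):
    """Value at (x, y) of the integral of dz = a dx + b dy through (x0, y0, z0).

    On the section y - y0 = m (x - x0) put x = x0 + t, y = y0 + m t, so that
    dz/dt = a + b m with m fixed; requires x != x0.
    """
    m = (y - y0) / (x - x0)

    def rhs(t, z):
        return a(x0 + t, y0 + m * t, z) + m * b(x0 + t, y0 + m * t, z)

    return rk4(rhs, z0, 0.0, x - x0)

def staircase(a, b, x0, y0, z0, x, y, x_first=True):
    """Integrate dz = a dx + b dy first along x and then along y, or the reverse."""
    if x_first:
        z1 = rk4(lambda s, z: a(s, y0, z), z0, x0, x)
        return rk4(lambda s, z: b(x, s, z), z1, y0, y)
    z1 = rk4(lambda s, z: b(x0, s, z), z0, y0, y)
    return rk4(lambda s, z: a(s, y, z), z1, x0, x)

# Integrable: dz = z dx + z dy, whose surface is z = z0 * exp(x - x0 + y - y0)
a = b = lambda x, y, z: z
print(math.isclose(mayer(a, b, 0.0, 0.0, 1.0, 0.5, 0.3), math.exp(0.8)))  # True

# Not integrable: dz = y dx; the two staircase paths to (1, 1) end at different heights
c = lambda x, y, z: y
d = lambda x, y, z: 0.0
print(staircase(c, d, 0.0, 0.0, 0.0, 1.0, 1.0),
      staircase(c, d, 0.0, 0.0, 0.0, 1.0, 1.0, x_first=False))  # about 0 and about 1
```

When the condition of integrability holds, the ray and both staircases agree. When it fails, the result depends on the path, which is the discrepancy between d¹z0 and [delta]¹z0 described above.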
Pfaffian Expressions.
We have above considered the integration of an equation
dz = adx + bdy
on the hypothesis that the condition
da/dy + bda/dz = db/dx + adb/dz is satisfied.
It is natural to inquire what relations among x, y, z, if any, are implied by, or are consistent with, a differential relation adx + bdy + cdz = 0, when a, b, c are unrestricted functions of x, y, z. This problem leads to the consideration of the so-called _Pfaffian Expression_ adx + bdy + cdz. Denote the quantities db/dz - dc/dy, dc/dx - da/dz, da/dy - db/dx respectively by u23, u31, u12. It can be shown:

(1) that if each of u23, u31, u12 be identically zero, the expression is the differential of a function of x, y, z, equal to dt say;
(2) that if the quantity au23 + bu31 + cu12 is identically zero, the expression is of the form udt, i.e. it can be made a perfect differential by multiplication by the factor 1/u;
(3) that in general the expression is of the form dt + u1dt1.

Consider the matrix of four rows and three columns in which the elements of the first row are a, b, c, and the elements of the (r+1)-th row, for r = 1, 2, 3, are the quantities u_r1, u_r2, u_r3, where u11 = u22 = u33 = 0 and u_sr = -u_rs. Then it is easily seen that the cases (1), (2), (3) above correspond respectively to the cases when (1) every determinant of this matrix of two rows and columns is zero, (2) every determinant of three rows and columns is zero, (3) no condition is assumed. This result can be generalized as follows: if a1, ... an be any functions of x1, ... xn, the so-called Pfaffian expression a1dx1 + ... + a_ndx_n can be reduced to one or other of the two forms
u1dt1 + ... + u_kdt_k, dt + u1dt1 + ... + u_k-1 dt_k-1,
wherein t, u1, ..., t1, ... are independent functions of x1, ... xn. Here k is such that, in these two cases respectively, 2k or 2k - 1 is the rank of a certain matrix of n + 1 rows and n columns, that is, the greatest number of rows and columns in a non-vanishing determinant of the matrix. The first row of this matrix is constituted by the quantities a1, ... an, and the s-th element in its (r+1)-th row is the quantity da_r/dx_s - da_s/dx_r. The proof of such a reduced form can be obtained from two results:

(1) If t be any given function of the 2m independent variables u1, ... um, t1, ... tm, the expression dt + u1 dt1 + ... + u_m dt_m can be put into the form u'1 dt'1 + ... + u'_m dt'_m.
(2) If the quantities u1, ... um, t1, ... tm be connected by a relation, the expression u1dt1 + ... + umdtm can be put into the form dt' + u'1 dt'1 + ... + u'_m-1 dt'_m-1; and if the relation connecting u1, ... um, t1, ... tm be homogeneous in u1, ... um, then t' can be taken to be zero.

These two results are deductions from the theory of _contact transformations_ (see below). Their demonstration requires, beside elementary algebraical considerations, only the theory of complete systems of linear homogeneous partial differential equations of the first order. Once the existence of the reduced form of the Pfaffian expression containing only independent quantities is thus assured, the identification of the number k with that defined by the specified matrix may, with some difficulty, be made _a posteriori_.
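The rank criterion can be evaluated at a sample point. This sketch assumes the coefficients a_1, ... a_n are given as Python callables of a point, and it estimates da_r/dx_s by central differences. It takes the rank at one sample point, assumed generic, as the rank of the matrix; this is an assumption, since the text concerns ranks holding identically. The examples y dx + x dy, y dx - x dy and dz - y dx illustrate cases (1), (2), (3). The expression dz - p dx - q dy in the five variables x, y, z, p, q gives rank 5, i.e. the form dt + u1 dt1 + u2 dt2. The names pfaff_rank and reduced_form are hypothetical.

```python
def matrix_rank(rows, tol=1e-7):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in rows]
    ncols, rank = len(m[0]), 0
    for col in range(ncols):
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            for j in range(col, ncols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank

def pfaff_rank(a, point, h=1e-5):
    """Rank of the (n + 1) x n matrix whose first row is a_1, ..., a_n and whose
    (r + 1)-th row has s-th element da_r/dx_s - da_s/dx_r, at a single point."""
    n = len(point)

    def d(r, s):  # central-difference estimate of da_r/dx_s
        plus, minus = list(point), list(point)
        plus[s] += h
        minus[s] -= h
        return (a[r](plus) - a[r](minus)) / (2 * h)

    rows = [[a[s](list(point)) for s in range(n)]]
    rows += [[d(r, s) - d(s, r) for s in range(n)] for r in range(n)]
    return matrix_rank(rows)

def reduced_form(rank):
    """k and the reduced form: rank 2k gives u1 dt1 + ... + uk dtk,
    rank 2k - 1 gives dt + u1 dt1 + ... + u(k-1) dt(k-1)."""
    if rank % 2 == 0:
        return rank // 2, "u1 dt1 + ... + uk dtk"
    return (rank + 1) // 2, "dt + u1 dt1 + ... + u(k-1) dt(k-1)"

exact = [lambda v: v[1], lambda v: v[0], lambda v: 0.0]         # y dx + x dy = d(xy)
integrating = [lambda v: v[1], lambda v: -v[0], lambda v: 0.0]  # y dx - x dy = y^2 d(x/y)
general = [lambda v: -v[1], lambda v: 0.0, lambda v: 1.0]       # dz - y dx
contact = [lambda v: -v[3], lambda v: -v[4], lambda v: 1.0,     # dz - p dx - q dy in
           lambda v: 0.0, lambda v: 0.0]                        # variables (x, y, z, p, q)

P = (0.3, 0.7, 1.1)  # a sample point assumed to be generic
print([pfaff_rank(a, P) for a in (exact, integrating, general)])        # [1, 2, 3]
print(reduced_form(pfaff_rank(contact, (0.1, 0.2, 0.3, 0.4, 0.5)))[0])  # 3
```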
Single linear Pfaffian equation.
In all cases of a single Pfaffian equation we are thus led to consider what is implied by a relation dt - u1dt1 - ... - umdtm = 0, in which t, u1, ... um, t1 ..., tm are, except for this equation, independent variables. This is to be satisfied in virtue of one or several relations connecting the variables; these must involve relations connecting t, t1, ... tm only, and in one of these at least t must actually enter. We can then suppose that in one actual system of relations in virtue of which the Pfaffian equation is satisfied, all the relations connecting t, t1 ... tm only are given by
t = [psi](t_s+1 ... t_m), t1 = [psi]1(t_s+1 ... t_m), ... t_s = [psi]_s(t_s+1 ... t_m);
so that the equation
d[psi] - u1d[psi]1 - ... - u_s d[psi]_s - u_s+1 dt_s+1 - ... - u_m dt_m = 0
is identically true in regard to u1, ... um, t_s+1 ..., t_m; equating to zero the coefficients of the differentials of these variables, we thus obtain m - s relations of the form
d[psi]/dt_j - u1 d[psi]1/dt_j - ... - u_s d[psi]_s/dt_j - u_j = 0;
these m - s relations, with the previous s + 1 relations, constitute a set of m + 1 relations connecting the 2m + 1 variables in virtue of which the Pfaffian equation is satisfied independently of the form of the functions [psi],[psi]1, ... [psi]s. There is clearly such a set for each of the values s = 0, s = 1, ..., s = m - 1, s = m. And for any value of s there may exist relations additional to the specified m + 1 relations, provided they do not involve any relation connecting t, t1, ... tm only, and are consistent with the m - s relations connecting u1, ... um. It is now evident that, essentially, the integration of a Pfaffian equation
a1dx1 + ... + a_n dx_n = 0,
wherein a1, ... an are functions of x1, ... xn, is effected by the processes necessary to bring it to its reduced form, involving only independent variables. And it is easy to see that if we suppose this reduction to be carried out in all possible ways, there is no need to distinguish the classes of integrals corresponding to the various values of s; for it can be verified without difficulty that by putting t' = t - u1t1 - ... - u_s t_s, t'1 = u1, ... t'_s = u_s, u'1 = -t1, ..., u'_s = -t_s, t'_s+1 = t_s+1, ... t'_m = t_m, u'_s+1 = u_s+1, ... u'_m = u_m, the reduced equation becomes changed to dt' - u'1 dt'1 - ... - u'_m dt'_m = 0, and the general relations changed to
t' = [psi](t'_s+1, ... t'_m) - t'1[psi]1(t'_s+1, ... t'_m) - ... - t'_s[psi]_s(t'_s+1, ... t'_m) = [phi],
say, together with u'1 = d[phi]/dt'1, ..., u'm = d[phi]/dt'm, which contain only one relation connecting the variables t', t'1, ... t'm only.
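The verification of the changed equation can be written out in one line. It uses only the substitutions just stated: for i <= s we have u'_i dt'_i = -t_i du_i, and for i > s the terms are unchanged, so that

```latex
\begin{aligned}
dt' - \sum_{i=1}^{m} u'_i\,dt'_i
  &= d\Big(t - \sum_{i=1}^{s} u_i t_i\Big) + \sum_{i=1}^{s} t_i\,du_i - \sum_{i=s+1}^{m} u_i\,dt_i \\
  &= dt - \sum_{i=1}^{m} u_i\,dt_i .
\end{aligned}
```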
Simultaneous Pfaffian equations.
This method for a single Pfaffian equation can, strictly speaking, be generalized to a simultaneous system of (n - r) Pfaffian equations dxj = c_1j dx1 + ... + c_rj dxr only in the case already treated, when this system is satisfied by regarding x_r+1, ... x_n as suitable functions of the independent variables x1, ... xr; in that case the integral manifolds are of r dimensions. When these are non-existent, there may be integral manifolds of higher dimensions; for if
d[phi] = [phi]1 dx1 + ... + [phi]_r dx_r + [phi]_r+1(c_1,r+1 dx1 + ... + c_r,r+1 dx_r) + [phi]_r+2(c_1,r+2 dx1 + ... + c_r,r+2 dx_r) + ...
be identically zero, then [phi]_[sigma] + c_[sigma],r+1 [phi]_r+1 + ... + c_[sigma]n [phi]_n = 0, or [phi] satisfies the r partial differential equations previously associated with the total equations. Suppose these are not a complete system, but are included in a complete system of r + [mu] equations, having therefore n - r - [mu] independent integrals. Then the total equations are satisfied over a manifold of r + [mu] dimensions (see E. v. Weber, _Math. Annal._ lv. (1901), p. 386).
Contact transformations.
It seems desirable to add here certain results, largely of algebraic character, which naturally arise in connexion with the theory of contact transformations. For any two functions of the 2n independent variables x1, ... xn, p1, ... pn we denote by ([phi][psi]) the sum of the n terms such as (d[phi]/dp_i)(d[psi]/dx_i) - (d[psi]/dp_i)(d[phi]/dx_i). For two functions of the (2n + 1) independent variables z, x1, ... xn, p1, ... pn we denote by [[phi][psi]] the sum of the n terms such as
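$$\frac{\partial\phi}{\partial p_i}\left(\frac{\partial\psi}{\partial x_i} + p_i\frac{\partial\psi}{\partial z}\right) - \frac{\partial\psi}{\partial p_i}\left(\frac{\partial\phi}{\partial x_i} + p_i\frac{\partial\phi}{\partial z}\right).$$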
It can at once be verified that for any three functions [f[[phi][psi]]] + [[phi][[psi]f]] + [[psi][f[phi]]] = (df/dz)[[phi][psi]] + (d[phi]/dz)[[psi]f] + (d[psi]/dz)[f[phi]]. When f, [phi], [psi] do not contain z, this becomes the identity (f([phi][psi])) + ([phi]([psi]f)) + ([psi](f[phi])) = 0. Then let X1, ... Xn, P1, ... Pn be such functions of x1, ... xn, p1, ... pn that P1 dX1 + ... + Pn dXn is identically equal to p1 dx1 + ... + pn dxn. It can be shown by elementary algebra, after equating coefficients of independent differentials:

(1) that the functions X1, ... Pn are independent functions of the 2n variables x1, ... pn, so that the equations x'i = Xi, p'i = Pi can be solved for x1, ... xn, p1, ... pn, and represent therefore a transformation, which we call a homogeneous contact transformation;
(2) that the X1, ... Xn are homogeneous functions of p1, ... pn of zero dimensions, the P1, ... Pn are homogeneous functions of p1, ... pn of dimension one, and the ½n(n - 1) relations (Xi Xj) = 0 are verified.

So also are the n² relations (Pi Xi) = 1, (Pi Xj) = 0, (Pi Pj) = 0. Conversely, if X1, ... Xn be independent functions, each homogeneous of zero dimension in p1, ... pn, satisfying the ½n(n - 1) relations (Xi Xj) = 0, then P1, ... Pn can be uniquely determined, by solving linear algebraic equations, such that P1 dX1 + ... + Pn dXn = p1 dx1 + ... + pn dxn. Now make the following substitutions:

- put n + 1 for n, z for x_n+1, and Z for X_n+1;
- put Qi for -Pi/P_n+1 and qi for -p_i/p_n+1, for i = 1, ... n;
- put [sigma] for p_n+1/P_n+1;
- finally, write P1, ... Pn, p1, ... pn for Q1, ... Qn, q1, ... qn.

We obtain the following results. Let Z, X1, ... Xn, P1, ... Pn be functions of z, x1, ... xn, p1, ... pn such that the expression dZ - P1 dX1 - ... - Pn dXn is identically equal to [sigma](dz - p1 dx1 - ... - pn dxn), with [sigma] not zero. Then (1) the functions Z, X1, ... Xn, P1, ... Pn are independent functions of z, x1, ... xn, p1, ...
pn, so that the equations z' = Z, x'i = Xi, p'i = Pi can be solved for z, x1, ... xn, p1, ... pn and determine a transformation which we call a (non-homogeneous) contact transformation; (2) the Z, X1, ... Xn verify the ½n(n + 1) identities [Z Xi] = 0, [Xi Xj] = 0. And the further identities
[Pi Xi] = [sigma], [Pi Xj] = 0, [Pi Z] = [sigma]Pi, [Pi Pj] = 0,
$$[Z\sigma] = \sigma\frac{\partial Z}{\partial z} - \sigma^2, \qquad [X_i\sigma] = \sigma\frac{\partial X_i}{\partial z}, \qquad [P_i\sigma] = \sigma\frac{\partial P_i}{\partial z}$$
are also verified. Conversely, if Z, X1, ... Xn be independent functions satisfying the identities [Z Xi] = 0, [Xi Xj] = 0, then [sigma], other than zero, and P1, ... Pn can be uniquely determined, by solution of algebraic equations, such that
dZ - P1 dX1 - ... - Pn dXn = [sigma](dz - p1 dx1 - ... - p_n dx_n).
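For the homogeneous case, the bracket (φψ) and the relations (Xi Xj) = 0, (Pi Xi) = 1, (Pi Xj) = 0, (Pi Pj) = 0 can be checked exactly on polynomials. The sketch below makes these assumptions:

- Polynomial functions are stored as {exponent tuple: coefficient} dictionaries, with n = 2 and the variables ordered (x1, x2, p1, p2).
- The transformation is the point transformation X1 = x1, X2 = x2 + x1², P1 = p1 - 2x1p2, P2 = p2. It is an illustrative choice, not taken from the text.

The sketch also checks that P1 dX1 + P2 dX2 = p1 dx1 + p2 dx2, and checks the identity (f(φψ)) + (φ(ψf)) + (ψ(fφ)) = 0 on sample polynomials. The helper names are hypothetical.

```python
N = 2  # variables are ordered (x1, ..., xN, p1, ..., pN)

def p_add(p, q, sign=1):
    """Return p + sign*q for polynomials stored as {exponent tuple: coefficient}."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + sign * c
        if out[e] == 0:
            del out[e]
    return out

def p_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def p_diff(p, k):
    """Partial derivative with respect to variable number k (counted from 0)."""
    return {e[:k] + (e[k] - 1,) + e[k + 1:]: c * e[k] for e, c in p.items() if e[k] > 0}

def var(k):
    return {tuple(int(j == k) for j in range(2 * N)): 1}

def poisson(phi, psi):
    """(phi psi) = sum over i of dphi/dp_i dpsi/dx_i - dpsi/dp_i dphi/dx_i."""
    out = {}
    for i in range(N):
        out = p_add(out, p_mul(p_diff(phi, N + i), p_diff(psi, i)))
        out = p_add(out, p_mul(p_diff(psi, N + i), p_diff(phi, i)), -1)
    return out

def one_form(P, X):
    """Coefficients of dx1, ..., dxN, dp1, ..., dpN in P1 dX1 + ... + PN dXN."""
    coeffs = []
    for k in range(2 * N):
        c = {}
        for Pi, Xi in zip(P, X):
            c = p_add(c, p_mul(Pi, p_diff(Xi, k)))
        coeffs.append(c)
    return coeffs

def jacobi_sum(f, phi, psi):
    """(f(phi psi)) + (phi(psi f)) + (psi(f phi)), which should vanish identically."""
    return p_add(p_add(poisson(f, poisson(phi, psi)), poisson(phi, poisson(psi, f))),
                 poisson(psi, poisson(f, phi)))

x1, x2, p1, p2 = (var(k) for k in range(4))
ONE = {(0, 0, 0, 0): 1}

# Illustrative point transformation: X1 = x1, X2 = x2 + x1^2, P1 = p1 - 2 x1 p2, P2 = p2
X = [x1, p_add(x2, p_mul(x1, x1))]
P = [p_add(p1, p_mul(p_mul({(0, 0, 0, 0): 2}, x1), p2), -1), p2]

print(one_form(P, X) == [p1, p2, {}, {}])          # True: P1 dX1 + P2 dX2 = p1 dx1 + p2 dx2
print(poisson(X[0], X[1]), poisson(P[0], P[1]))    # {} {}
print([[poisson(Pi, Xj) for Xj in X] for Pi in P])  # [[{(0, 0, 0, 0): 1}, {}], [{}, {(0, 0, 0, 0): 1}]]

f = p_mul(x1, p_mul(p1, p1))
phi = p_add(p_mul(x2, p1), p_mul(x1, x1))
psi = p_mul(p2, p_mul(x1, x2))
print(jacobi_sum(f, phi, psi))                     # {}
```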
Finally, there is a particular case of great importance arising when [sigma] = 1, which gives the results: (1) If U, X1, ... Xn, P1, ... Pn be 2n + 1 functions of the 2n independent variables x1, ... xn, p1, ... pn, satisfying the identity
dU + P1 dX1 + ... + Pn dXn = p1 dx1 + ... + p_n dx_n,
then the 2n functions P1, ... Pn, X1, ... Xn are independent, and we have
(Xi Xj) = 0, (Xi U) = [delta]Xi, (Pi Xi) = 1, (Pi Xj) = 0, (Pi Pj ) = 0, (Pi U) + Pi = [delta]Pi,
where [delta] denotes the operator p1d/dp1 + ... + pnd/dpn; (2) If X1, ... Xn be independent functions of x1, ... xn, p1, ... pn, such that (Xi Xj) = 0, then U can be found by a quadrature, such that
(Xi U) = [delta]Xi;
and when X1, ... Xn, U satisfy these ½n(n + 1) conditions, then P1, ... Pn can be found, by solution of linear algebraic equations, to render true the identity dU + P1 dX1 + ... + Pn dXn = p1 dx1 + ... + pn dxn; (3) Functions X1, ... Xn, P1, ... Pn can be found to satisfy this differential identity when U is an arbitrary given function of x1, ... xn, p1, ... pn; but this requires integrations. To see what integrations, it is only necessary to verify the following statement. Let U be an arbitrary given function of x1, ... xn, p1, ... pn, and, for r < n, let X1, ... Xr be independent functions of these variables such that (X_[sigma] U) = [delta]X_[sigma], (X_[rho] X_[sigma]) = 0, for [rho], [sigma] = 1 ... r. Then the r + 1 homogeneous linear partial differential equations of the first order (Uf) + [delta]f = 0, (X_[rho]f) = 0 form a complete system. It will be seen that the assumptions above made for the reduction of Pfaffian expressions follow from the results here enunciated for contact transformations.
Partial differential equation of the first order.
Meaning of a solution of the equation.
We pass on now to consider the solution of any partial differential equation of the first order. We attempt to explain certain ideas relative to a single equation with any number of independent variables (in particular, an ordinary equation of the first order with one independent variable) by speaking of a single equation with two independent variables x, y, and one dependent variable z. It will be seen that we are naturally led to consider systems of such simultaneous equations, which we consider below. The central discovery of the transformation theory of the solution of an equation F(x, y, z, dz/dx, dz/dy) = 0 is that its solution can always be reduced to the solution of partial equations which are _linear_. For this, however, we must regard dz/dx, dz/dy, during the process of integration, not as the differential coefficients of a function z in regard to x and y, but as variables independent of x, y, z. The too great indefiniteness that might thus appear to be introduced is provided for in another way. We notice that if z = [psi](x, y) be a solution of the differential equation, then dz = dx d[psi]/dx + dy d[psi]/dy. Thus suppose we denote the equation by F(x, y, z, p, q) = 0, and prescribe the condition dz = pdx + qdy for every solution. Then any solution such as z = [psi](x, y) will necessarily be associated with the equations p = dz/dx, q = dz/dy, and z will satisfy the equation in its original form. We have previously seen (under _Pfaffian Expressions_) that if five variables x, y, z, p, q, otherwise independent, be subject to dz - pdx - qdy = 0, they must in fact be subject to at least three mutual relations. If we associate with a point (x, y, z) the plane
Z - z = p(X - x) + q(Y - y)
passing through it, where X, Y, Z are current co-ordinates, we call this association a surface-element. Two consecutive elements are said to be _connected_ when the point (x + dx, y + dy, z + dz) of one lies on the plane of the other, that is, when the condition dz = pdx + qdy is satisfied. An infinity of connected elements following one another continuously is called a _connectivity_. Our statement is then that a connectivity consists of not more than [oo]² elements, the whole number of elements (x, y, z, p, q) that are possible being called [oo]^5. The solution of an equation F(x, y, z, dz/dx, dz/dy) = 0 is then to be understood to mean finding in all possible ways, from the [oo]^4 elements (x, y, z, p, q) which satisfy F(x, y, z, p, q) = 0, a set of [oo]² elements forming a connectivity. More analytically, it means finding in all possible ways two relations G = 0, H = 0 connecting x, y, z, p, q and independent of F = 0, so that the three relations together may involve
dz = pdx + qdy.
Such a set of three relations may, for example, be of the form z = [psi](x, y), p = d[psi]/dx, q = d[psi]/dy; but it may also, as another case, involve two relations z = [psi](y), x = [psi]1(y) connecting x, y, z, the third relation being
[psi]'(y) = p[psi]'1(y) + q,
the connectivity consisting in that case, geometrically, of a curve in space taken with [oo]¹ of its tangent planes; or, finally, a connectivity is constituted by a fixed point and all the planes passing through that point. This generalized view of the meaning of a solution of F = 0 is of advantage, moreover, in view of anomalies otherwise arising from special forms of the equation itself. For instance, we may include the case, sometimes arising when the equation to be solved is obtained by transformation from another equation, in which F does not contain either p or q. Then the equation has [oo]² solutions, each consisting of an arbitrary point of the surface F = 0 and all the [oo]² planes passing through this point; it also has [oo]² solutions, each consisting of a curve drawn on the surface F = 0 and all the tangent planes of this curve, the whole consisting of [oo]² elements; finally, it has also an isolated (or singular) solution consisting of the points of the surface, each associated with the tangent plane of the surface thereat, also [oo]² elements in all. Or again, a linear equation F = Pp + Qq - R = 0, wherein P, Q, R are functions of x, y, z only, has [oo]² solutions, each consisting of one of the curves defined by
dx/P = dy/Q = dz/R
taken with all the tangent planes of this curve; and the same equation has [oo]² solutions, each consisting of the points of a surface containing [oo]¹ of these curves and the tangent planes of this surface. And for the case of n variables there is similarly the possibility of n + 1 kinds of solution of an equation F(x1, ... xn, z, p1, ... pn) = 0; these can, however, by a simple contact transformation be reduced to one kind, in which there is only one relation z' = [psi](x'1, ... x'n) connecting the new variables x'1, ... x'n, z' (see under PFAFFIAN EXPRESSIONS); just as in the case of the solution
z = [psi](y), x = [psi]1(y), [psi]'(y) = p[psi]'1(y) + q
of the equation Pp + Qq = R the transformation z' = z - px, x' = p, p' = -x, y' = y, q' = q gives the solution
z' = [psi](y') - x'[psi]1(y'), p' = dz'/dx', q' = dz'/dy'
of the transformed equation. These explanations take no account of the possibility of p and q being infinite; this can be dealt with by writing p = -u/w, q = -v/w, and considering homogeneous equations in u, v, w, with udx + vdy + wdz = 0 as the differential relation necessary for a connectivity; in practice we use the ideas associated with such a procedure more often without the appropriate notation.
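The contact property of the transformation z' = z - px, x' = p, p' = -x, y' = y, q' = q can be checked in one line from these formulas alone:

```latex
dz' - p'\,dx' - q'\,dy' = (dz - p\,dx - x\,dp) + x\,dp - q\,dy = dz - p\,dx - q\,dy .
```

Connected elements are therefore carried into connected elements, so every connectivity of the one equation becomes a connectivity of the other.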
Order of the ideas.
In utilizing these general notions we shall first consider the theory of characteristic chains, initiated by Cauchy, which shows well the nature of the relations implied by the given differential equation; the alternative ways of carrying out the necessary integrations are suggested by considering the method of Jacobi and Mayer, while a good summary is obtained by the formulation in terms of a Pfaffian expression.
Characteristic chains.
Consider a solution of F = 0 expressed by the three independent equations F = 0, G = 0, H = 0. If it be a solution in which there is more than one relation connecting x, y, z, let new variables x', y', z', p', q' be introduced, as before explained under PFAFFIAN EXPRESSIONS, in which z' is of the form
z' = z - p1x1 - ... - p_s x_s (s = 1 or 2),
so that the solution takes the form z' = [psi](x', y'), p' = d[psi]/dx', q' = d[psi]/dy', which then will identically satisfy the transformed equations F' = 0, G' = 0, H' = 0. The equation F' = 0, if x', y', z' be regarded as fixed, states that the plane Z - z' = p'(X - x') + q'(Y - y') is tangent to a certain cone whose vertex is (x', y', z'), the consecutive point (x' + dx', y' + dy', z' + dz') of the generator of contact being such that
$$\frac{dx'}{\partial F'/\partial p'} = \frac{dy'}{\partial F'/\partial q'} = \frac{dz'}{p'\,\partial F'/\partial p' + q'\,\partial F'/\partial q'}.$$
Passing in this direction on the surface z' = [psi](x', y') the tangent plane of the surface at this consecutive point is (p' + dp', q' + dq'), where, since F'(x', y', [psi], d[psi]/dx', d[psi]/dy') = 0 is identical, we have dx' (dF'/dx' + p'dF'/dz') + dp'dF'/dp' = 0. Thus the equations, which we shall call the characteristic equations,
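$$\frac{dx'}{\partial F'/\partial p'} = \frac{dy'}{\partial F'/\partial q'} = \frac{dz'}{p'\,\dfrac{\partial F'}{\partial p'} + q'\,\dfrac{\partial F'}{\partial q'}} = \frac{dp'}{-\dfrac{\partial F'}{\partial x'} - p'\,\dfrac{\partial F'}{\partial z'}} = \frac{dq'}{-\dfrac{\partial F'}{\partial y'} - q'\,\dfrac{\partial F'}{\partial z'}}$$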
are satisfied along a connectivity of [oo]¹ elements consisting of a curve on z' = [psi](x', y') and the tangent planes of the surface along this curve. The equation F' = 0, when p', q' are fixed, represents a curve in the plane Z - z' = p'(X - x') + q'(Y - y') passing through (x', y', z'); if (x' + [delta]x', y' + [delta]y', z' + [delta]z') be a consecutive point of this curve, we find at once
$$\delta x'\left(\frac{\partial F'}{\partial x'} + p'\frac{\partial F'}{\partial z'}\right) + \delta y'\left(\frac{\partial F'}{\partial y'} + q'\frac{\partial F'}{\partial z'}\right) = 0;$$
thus the equations above give [delta]x'dp' + [delta]y'dq' = 0. That is, the tangent line of the plane curve is, on the surface z' = [psi](x', y'), in a direction conjugate to that of the generator of the cone. Put each of the fractions in the characteristic equations equal to dt, and start from an arbitrary element x'0, y'0, z'0, p'0, q'0 about which all the quantities F', dF'/dp', &c., occurring in the denominators, are developable. The equations then enable us to define, from the differential equation F' = 0 alone, a connectivity of [oo]¹ elements, which we call a _characteristic chain_. It is remarkable that when we transform again to the original variables (x, y, z, p, q), the form of the differential equations for the chain is unaltered, so that they can be written down at once from the equation F = 0. Thus we have proved that the characteristic chain starting from any ordinary element of any integral of this equation F = 0 consists only of elements belonging to this integral. For instance, if the equation do not contain p, q, the characteristic chain, starting from an arbitrary plane through an arbitrary point of the surface F = 0, consists of a pencil of planes whose axis is a tangent line of the surface F = 0. Or if F = 0 be of the form Pp + Qq = R, the chain consists of a curve satisfying dx/P = dy/Q = dz/R and a single infinity of tangent planes of this curve, determined by the tangent plane chosen at the initial point. In all cases there are [oo]³ characteristic chains, whose aggregate may therefore be expected to exhaust the [oo]^4 elements satisfying F = 0.
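The characteristic equations, with each fraction put equal to dt, form an ordinary system that can be integrated numerically. The minimal sketch below assumes a classical fourth-order Runge-Kutta integrator. It uses the illustrative equation F = pq - z = 0, which is not from the text. It follows one characteristic chain from an element satisfying F = 0 and checks two things: F stays zero along the chain, and the result agrees with the exact chain. For this F the exact chain is p = p0 e^t, q = q0 e^t, z = z0 + p0q0(e^{2t} - 1).

```python
import math

def characteristic_rhs(Fx, Fy, Fz, Fp, Fq):
    """Characteristic equations with each fraction put equal to dt:
    dx/dt = Fp, dy/dt = Fq, dz/dt = p Fp + q Fq,
    dp/dt = -(Fx + p Fz), dq/dt = -(Fy + q Fz)."""
    def rhs(s):
        x, y, z, p, q = s
        fp, fq, fz = Fp(*s), Fq(*s), Fz(*s)
        return [fp, fq, p * fp + q * fq, -(Fx(*s) + p * fz), -(Fy(*s) + q * fz)]
    return rhs

def rk4_system(rhs, state, t_end, steps=1000):
    """Classical RK4 for ds/dt = rhs(s) from t = 0 to t = t_end."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([si + h / 2 * ki for si, ki in zip(s, k1)])
        k3 = rhs([si + h / 2 * ki for si, ki in zip(s, k2)])
        k4 = rhs([si + h * ki for si, ki in zip(s, k3)])
        s = [si + h / 6 * (k1i + 2 * k2i + 2 * k3i + k4i)
             for si, k1i, k2i, k3i, k4i in zip(s, k1, k2, k3, k4)]
    return s

def F(x, y, z, p, q):
    """Illustrative equation F = p q - z = 0."""
    return p * q - z

rhs = characteristic_rhs(Fx=lambda *s: 0.0, Fy=lambda *s: 0.0, Fz=lambda *s: -1.0,
                         Fp=lambda x, y, z, p, q: q, Fq=lambda x, y, z, p, q: p)

start = [0.0, 0.0, 2.0, 1.0, 2.0]     # (x0, y0, z0, p0, q0), chosen so that F = 0
x, y, z, p, q = rk4_system(rhs, start, 0.5)
print(abs(F(x, y, z, p, q)) < 1e-9)   # True: F stays zero along the chain
print(math.isclose(z, 2 * math.e))    # True: here z = 2 e^(2t) at t = 0.5
```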
Complete integral constructed with characteristic chains.
Consider, in fact, a single infinity of connected elements each satisfying F = 0, say a chain connectivity T, consisting of elements specified by x0, y0, z0, p0, q0, which we suppose expressed as functions of a parameter u, so that
U0 = dz0/du - p0dx0/du - q0dy0/du
is everywhere zero on this chain; further, suppose that each of F, dF/dp, ..., dF/dx + pdF/dz is developable about each element of this chain T, and that T is _not_ a characteristic chain. Then consider the aggregate of the characteristic chains issuing from all the elements of T. The [oo]² elements, consisting of the aggregate of these characteristic chains, satisfy F = 0, provided the chain connectivity T consists of elements satisfying F = 0; for each characteristic chain satisfies dF = 0. It can be shown that these chains are connected; in other words, that if x, y, z, p, q be any element of one of these characteristic chains, not only is
dz/dt - pdx/dt - qdy/dt = 0,
as we know, but also U = dz/du - pdx/du - qdy/du is zero. For we have
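$$\frac{\partial U}{\partial t} = \frac{\partial}{\partial t}\left(\frac{\partial z}{\partial u} - p\frac{\partial x}{\partial u} - q\frac{\partial y}{\partial u}\right) - \frac{\partial}{\partial u}\left(\frac{\partial z}{\partial t} - p\frac{\partial x}{\partial t} - q\frac{\partial y}{\partial t}\right)$$
$$= \frac{\partial p}{\partial u}\frac{\partial x}{\partial t} - \frac{\partial p}{\partial t}\frac{\partial x}{\partial u} + \frac{\partial q}{\partial u}\frac{\partial y}{\partial t} - \frac{\partial q}{\partial t}\frac{\partial y}{\partial u},$$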
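which, on substituting from the equations of the characteristic chain (dx/dt = dF/dp, dy/dt = dF/dq, dp/dt = -(dF/dx + pdF/dz), dq/dt = -(dF/dy + qdF/dz)), and then using dF/du = 0 (every element considered satisfies F = 0), is equal to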
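$$\frac{\partial p}{\partial u}\frac{\partial F}{\partial p} + \frac{\partial x}{\partial u}\left(\frac{\partial F}{\partial x} + p\frac{\partial F}{\partial z}\right) + \frac{\partial q}{\partial u}\frac{\partial F}{\partial q} + \frac{\partial y}{\partial u}\left(\frac{\partial F}{\partial y} + q\frac{\partial F}{\partial z}\right) = -\frac{\partial F}{\partial z}\,U.$$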
As dF/dz is a developable function of t, this, giving
$$U = U_0\exp\left(-\int_{t_0}^{t}\frac{\partial F}{\partial z}\,dt\right),$$
shows that U is everywhere zero, since U0 = 0. Thus integrals of F = 0 are obtainable by considering the aggregate of characteristic chains issuing from arbitrary chain connectivities T satisfying F = 0; and such connectivities T are, it is seen at once, determinable without integration. Conversely, as such a chain connectivity T can be taken out from the elements of any given integral, all possible integrals are obtainable in this way. For instance, an arbitrary curve in space, given by x0 = [theta](u), y0 = [phi](u), z0 = [psi](u), determines by the two equations F(x0, y0, z0, p0, q0) = 0, [psi]'(u) = p0[theta]'(u) + q0[phi]'(u), such a chain connectivity T, through which there passes a perfectly definite integral of the equation F = 0. By taking [oo]² initial chain connectivities T, as for instance by taking the curves x0 = [theta], y0 = [phi], z0 = [psi] to be the [oo]² curves upon an arbitrary surface, we thus obtain [oo]² integrals, and so [oo]^4 elements satisfying F = 0. In general, if functions G, H, independent of F, be obtained, such that the equations F = 0, G = b, H = c represent an integral for all values of the constants b, c, these equations are said to constitute a _complete integral_. Then [oo]^4 elements satisfying F = 0 are known, and in fact every other form of integral can be obtained without further integrations.
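The construction can be checked on F = pq - z, an example of our choosing, whose characteristic chain from (x0, y0, z0, p0, q0) is, in closed form, p = p0e^t, q = q0e^t, x = x0 + q0(e^t - 1), y = y0 + p0(e^t - 1), z = z0 + p0q0(e^2t - 1). The sketch issues chains from the chain connectivity T given by the curve x0 = u, y0 = 1, z0 = u with p0 = 1, q0 = u, which satisfies both F = 0 and dz0 = p0dx0 + q0dy0, and checks that every element lies on the integral z = xy (with p = y, q = x) and that U stays zero. As a contrast, a family of elements satisfying F = 0 but having U0 = -2 gives U = -2e^t, in agreement with U = U0 exp(-[int] dF/dz dt), since dF/dz = -1 here. The function names are hypothetical.

```python
import math

def chain_element(x0, y0, z0, p0, q0, t):
    """Closed-form characteristic chain of F = pq - z = 0 issuing from (x0, y0, z0, p0, q0)."""
    e = math.exp(t)
    return (x0 + q0 * (e - 1), y0 + p0 * (e - 1), z0 + p0 * q0 * (e * e - 1),
            p0 * e, q0 * e)

def strip_T(u):
    """Chain connectivity T: the curve (u, 1, u) with p0 = 1, q0 = u (F = 0 and U0 = 0)."""
    return (u, 1.0, u, 1.0, u)

def bad_strip(u):
    """Elements with F = 0 but dz0/du - p0 dx0/du - q0 dy0/du = -2."""
    return (u, u, 1.0, 1.0, 1.0)

def U(strip, u, t, du=1e-6):
    """U = dz/du - p dx/du - q dy/du at parameter t, by central differences in u."""
    plus = chain_element(*strip(u + du), t)
    minus = chain_element(*strip(u - du), t)
    x, y, z, p, q = chain_element(*strip(u), t)
    dx, dy, dz = [(a - b) / (2 * du) for a, b in zip(plus[:3], minus[:3])]
    return dz - p * dx - q * dy

for u in (0.5, 1.0, 2.0):
    for t in (0.0, 0.5, 1.0):
        x, y, z, p, q = chain_element(*strip_T(u), t)
        # z - xy should vanish; the last column should be close to -2.
        print(u, t, z - x * y, U(strip_T, u, t), U(bad_strip, u, t) / math.exp(t))
```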
Operations necessary for integration of F = a.
In the foregoing discussion of the differential equations of a characteristic chain, the denominators dF/dp, ... may be supposed to be modified in form by means of F = 0 in any way conducive to a simple integration. In the immediately following explanation of ideas, however, we consider indifferently all equations F = constant; when a function of x, y, z, p, q is said to be zero, it is meant that this is so identically, not in virtue of F = 0; in other words, we consider the integration of F = a, where a is an arbitrary constant. In the theory of linear partial equations we have seen that the integration of the equations of the characteristic chains, from which, as has just been seen, that of the equation F = a follows at once, would be involved in completely integrating the single linear homogeneous partial differential equation of the first order [Ff] = 0, where the notation is that explained above under CONTACT TRANSFORMATIONS. One obvious integral is f = F. Putting F = a, where a is arbitrary, and eliminating one of the independent variables, we can reduce this equation [Ff] = 0 to one in four variables; and so on. Calling, then, the determination of a single integral of a single homogeneous partial differential equation of the first order in n independent variables _an operation of order_ n - 1, the characteristic chains, and therefore the most general integral of F = a, can be obtained by successive operations of orders 3, 2, 1. If, however, an integral of F = a be represented by F = a, G = b, H = c, where b and c are arbitrary constants, the expression of the fact that a characteristic chain of F = a satisfies dG = 0 gives [FG] = 0; similarly, [FH] = 0 and [GH] = 0, these three relations being identically true. Conversely, suppose that an integral G, independent of F, of the equation [Ff] = 0 has been obtained, which is an operation of order three.
Then it follows from the identity [f[[phi][psi]]] + [[phi][[psi]f]] + [[psi][f[phi]]] = df/dz [[psi][phi]] + d[phi]/dz [[psi]f] + d[psi]/dz [f[phi]] before remarked, by putting [phi] = F, [psi] = G, and then [Ff] = A(f), [Gf] = B(f), that AB(f) - BA(f) = dF/dz B(f) - dG/dz A(f), so that the two linear equations [Ff] = 0, [Gf] = 0 form a complete system; as two integrals F, G are known, they have a common integral H, independent of F, G, determinable by an operation of order one only. The three functions F, G, H thus identically satisfy the relations [FG] = [GH] = [FH] = 0. The [oo]² elements satisfying F = a, G = b, H = c, wherein a, b, c are assigned constants, can then be seen to constitute an integral of F = a. For the conditions that a characteristic chain of G = b issuing from an element satisfying F = a, G = b, H = c should consist only of elements satisfying these three equations are simply [FG] = 0, [GH] = 0. Thus, starting from an arbitrary element of (F = a, G = b, H = c), we can single out a connectivity of elements of (F = a, G = b, H = c) forming a characteristic chain of G = b; then the aggregate of the characteristic chains of F = a issuing from the elements of this characteristic chain of G = b will be a connectivity consisting only of elements of
(F = a, G = b, H = c),
and will therefore constitute an integral of F = a; further, it will include all elements of (F = a, G = b, H = c). This result follows also from a theorem given under CONTACT TRANSFORMATIONS, which shows, moreover, that though the characteristic chains of F = a are not determined by the three equations F = a, G = b, H = c, no further integration is now necessary to find them. By this theorem, since identically [FG] = [GH] = [FH] = 0, we can find, by the solution of linear algebraic equations only, a non-vanishing function [sigma] and two functions A, C, such that
dG - AdF - CdH = [sigma](dz - pdx - qdy);
thus all the elements satisfying F = a, G = b, H = c satisfy dz = pdx + qdy and constitute a connectivity, which is therefore an integral of F = a. Further, from the associated theorems, F, G, H, A, C are independent functions, and [FC] = 0. Thus C may be taken to be the remaining integral, independent of G, H, of the equation [Ff] = 0, whereby the characteristic chains are entirely determined.
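A concrete check, on an example of our own: take F = pq - z with G = p - y and H = q - x, so that F = a, G = b, H = c give the integral z = (x + c)(y + b) - a. Solving the linear equations for this case by hand gives A = 1/q, C = -p/q, [sigma] = 1/q. The sketch assumes the bracket has the form [fg] = df/dp(dg/dx + pdg/dz) + df/dq(dg/dy + qdg/dz) - dg/dp(df/dx + pdf/dz) - dg/dq(df/dy + qdf/dz); the sign convention is an assumption, and only the vanishing of brackets is tested. It verifies by central differences that [FG] = [GH] = [FH] = 0, that [FC] = 0, and that dG - AdF - CdH and [sigma](dz - pdx - qdy) have the same coefficients.

```python
import random

def partials(f, e, h=1e-5):
    """Central-difference partials (f_x, f_y, f_z, f_p, f_q) at e = (x, y, z, p, q)."""
    out = []
    for i in range(5):
        up, down = list(e), list(e)
        up[i] += h
        down[i] -= h
        out.append((f(*up) - f(*down)) / (2 * h))
    return out

def bracket(f, g, e):
    """[fg] in the assumed form
    f_p(g_x + p g_z) + f_q(g_y + q g_z) - g_p(f_x + p f_z) - g_q(f_y + q f_z)."""
    x, y, z, p, q = e
    fx, fy, fz, fp, fq = partials(f, e)
    gx, gy, gz, gp, gq = partials(g, e)
    return (fp * (gx + p * gz) + fq * (gy + q * gz)
            - gp * (fx + p * fz) - gq * (fy + q * fz))

# Example of our own: F = pq - z, with G = p - y and H = q - x.
F = lambda x, y, z, p, q: p * q - z
G = lambda x, y, z, p, q: p - y
H = lambda x, y, z, p, q: q - x
# Multipliers found by hand for this example (q != 0 assumed).
A = lambda x, y, z, p, q: 1 / q
C = lambda x, y, z, p, q: -p / q
sigma = lambda x, y, z, p, q: 1 / q

def pfaff_residual(e):
    """Largest difference between the coefficients of dx, dy, dz, dp, dq in
    dG - A dF - C dH and in sigma (dz - p dx - q dy)."""
    x, y, z, p, q = e
    dF, dG, dH = partials(F, e), partials(G, e), partials(H, e)
    lhs = [g - A(*e) * f - C(*e) * k for f, g, k in zip(dF, dG, dH)]
    rhs = [sigma(*e) * c for c in (-p, -q, 1.0, 0.0, 0.0)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

random.seed(0)
for _ in range(3):
    e = [random.uniform(-2, 2) for _ in range(4)] + [random.uniform(0.5, 2)]
    print(bracket(F, G, e), bracket(G, H, e), bracket(F, H, e),
          bracket(F, C, e), pfaff_residual(e))
```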
The single equation F = 0 and Pfaffian formulations.
When we consider the particular equation F = 0, neglecting the case when neither p nor q enters, and supposing p to enter, we may express p from F = 0 in terms of x, y, z, q, and then eliminate it from all other equations. Then instead of the equation [Ff] = 0, we have, if F = 0 give p = [psi](x, y, z, q), the equation
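$$\Sigma f = -\left(\frac{\partial f}{\partial x} + \psi\frac{\partial f}{\partial z}\right) + \frac{\partial \psi}{\partial q}\left(\frac{\partial f}{\partial y} + q\frac{\partial f}{\partial z}\right) - \left(\frac{\partial \psi}{\partial y} + q\frac{\partial \psi}{\partial z}\right)\frac{\partial f}{\partial q} = 0,$$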
moreover obtainable by omitting the term in df/dp in [p-[psi], f] = 0. Let x0, y0, z0, q0, be values about which the coefficients in this equation are developable, and let [zeta], [eta], [omega] be the principal solutions reducing respectively to z, y and q when x = x0. Then the equations p = [psi], [zeta] = z0, [eta] = y0, [omega] = q0 represent a characteristic chain issuing from the element x0, y0, z0, [psi]0, q0; we have seen that the aggregate of such chains issuing from the elements of an arbitrary chain satisfying
dz0 - p0dx0 - q0dy0 = 0
constitutes an integral of the equation p = [psi]. Let this arbitrary chain be taken so that x0 is constant; then the condition for initial values is only
dz0 - q0dy0 = 0,
and the elements of the integral constituted by the characteristic chains issuing therefrom satisfy
d[zeta] - [omega]d[eta] = 0.
Hence this equation involves dz - [psi]dx - qdy = 0, or we have
dz - [psi]dx - qdy = [sigma](d[zeta] - [omega]d[eta]),
where [sigma] is not zero. Conversely, the integration of p = [psi] is, essentially, the problem of writing the expression dz - [psi]dx - qdy in the form [sigma](d[zeta] - [omega]d[eta]), as must be possible (from what was said under _Pfaffian Expressions_).
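A worked case of our own: take p = [psi] = z/q (the equation pq = z solved for p) and x0 = 0. Integrating the characteristic equations dq/dx = 1, dz/dx = 2z/q, dy/dx = z/q² by hand gives the principal solutions [omega] = q - (x - x0), [eta] = y - z(x - x0)/q², [zeta] = z[omega]²/q², and a short computation gives dz - [psi]dx - qdy = (q/[omega])(d[zeta] - [omega]d[eta]), so that [sigma] = q/[omega]. The sketch checks numerically, with central differences, that [zeta], [eta], [omega] annihilate [Sigma]f and that the two Pfaffian expressions agree coefficient by coefficient.

```python
import random

X0 = 0.0   # assumed base value x0 at which the principal solutions reduce to z, y, q

def psi(x, y, z, q):
    return z / q                       # p = psi, from the assumed example pq = z

def omega(x, y, z, q):
    return q - (x - X0)

def eta(x, y, z, q):
    return y - z * (x - X0) / q ** 2

def zeta(x, y, z, q):
    return z * omega(x, y, z, q) ** 2 / q ** 2

def grad(f, v, h=1e-5):
    """Central-difference partials of f with respect to (x, y, z, q) at v."""
    out = []
    for i in range(4):
        up, down = list(v), list(v)
        up[i] += h
        down[i] -= h
        out.append((f(*up) - f(*down)) / (2 * h))
    return out

def Sigma(f, v):
    """The operator Sigma f of the text, with the partials of psi taken numerically."""
    x, y, z, q = v
    fx, fy, fz, fq = grad(f, v)
    psi_x, psi_y, psi_z, psi_q = grad(psi, v)
    return -(fx + psi(*v) * fz) + psi_q * (fy + q * fz) - (psi_y + q * psi_z) * fq

def pfaff_residual(v):
    """Compare the coefficients of dx, dy, dz, dq in dz - psi dx - q dy and in
    (q / omega)(d zeta - omega d eta)."""
    x, y, z, q = v
    lhs = [-psi(*v), -q, 1.0, 0.0]
    w = omega(*v)
    rhs = [(q / w) * (a - w * b) for a, b in zip(grad(zeta, v), grad(eta, v))]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

random.seed(0)
v = [random.uniform(-0.5, 0.5), random.uniform(-2, 2),
     random.uniform(-2, 2), random.uniform(1, 2)]
print([Sigma(f, v) for f in (zeta, eta, omega)], pfaff_residual(v))
```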
System of equations of the first order.
To integrate a system of simultaneous equations of the first order X1 = a1, ... Xr = ar in n independent variables x1, ... xn and one dependent variable z, we write p1 for dz/dx1, &c., and attempt to find n + 1 - r further functions Z, X_r+1 ... Xn, such that the equations Z = a, Xi = ai (i = 1, ... n) involve dz - p1dx1 - ... - pndxn = 0. By an argument already given, the common integral, if existent, must be satisfied by the equations of the characteristic chains of any one equation Xi = ai; thus each of the expressions [Xi Xj] must vanish in virtue of the equations expressing the integral, and we may without loss of generality assume that each of the corresponding ½r(r - 1) expressions formed from the r given differential equations vanishes in virtue of these equations. The determination of the remaining n + 1 - r functions may, as before, be made to depend on characteristic chains, which in this case, however, are manifolds of r dimensions obtained by integrating the equations [X1f] = 0, ... [Xrf] = 0; or having obtained one integral of this system other than X1, ... Xr, say Xr+1, we may consider the system [X1f] = 0, ... [X_r+1 f] = 0, for which, again, we have a choice; and at any stage we may use Mayer's method and reduce the simultaneous linear equations to one equation involving parameters; while if at any stage of the process we find some but not all of the integrals of the simultaneous system, they can be used to simplify the remaining work; this can only be clearly explained in connexion with the theory of so-called function groups for which we have no space. One result arising is that the simultaneous system p1 = [phi]1, ... pr = [phi]r, wherein p1, ... pr are not involved in [phi]1, ... [phi]r, if it satisfies the ½r(r - 1) relations [pi - [phi]i, pj - [phi]j] = 0, has a solution z = [psi](x1, ... xn), p1 = d[psi]/dx1, ... pn = d[psi]/dxn, reducing to an arbitrary function of x_r+1, ... xn only, when x1 = x1^0, ...
xr = xr^0 under certain conditions as to developability; a generalization of the theorem for linear equations. The problem of integration of this system is, as before, to put
dz - [phi]1dx1 - ... - [phi]_r dx_r - p_r+1 dx_r+1 - ... - p_n dx_n
into the form [sigma](d[zeta] - [omega]_r+1 d[xi]_r+1 - ... - [omega]_n d[xi]_n); and here [zeta], [xi]_r+1, ... [xi]_n, [omega]_r+1, ... [omega]_n may be taken, as before, to be principal integrals of a certain complete system of linear equations; those, namely, determining the characteristic chains.
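When r = n = 2 and [phi]1, [phi]2 involve only x1, x2 and z, the relation [p1 - [phi]1, p2 - [phi]2] = 0, taken on p1 = [phi]1, p2 = [phi]2, reduces to d[phi]1/dx2 + [phi]2 d[phi]1/dz = d[phi]2/dx1 + [phi]1 d[phi]2/dz, which is the equality of the two values of d²z/dx1dx2. The sketch below (both example pairs are our own) evaluates this defect by central differences and shows what it means: for a compatible pair, integrating first along x1 and then along x2 gives the same value as the reverse order, while for an incompatible pair the result depends on the path.

```python
def partial(f, args, i, h=1e-5):
    """Central-difference partial of f with respect to argument i."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

def defect(phi1, phi2, x1, x2, z):
    """d(phi1)/dx2 + phi2 d(phi1)/dz - d(phi2)/dx1 - phi1 d(phi2)/dz at (x1, x2, z)."""
    a = (x1, x2, z)
    return (partial(phi1, a, 1) + phi2(*a) * partial(phi1, a, 2)
            - partial(phi2, a, 0) - phi1(*a) * partial(phi2, a, 2))

def rk4(f, y, s0, s1, steps=200):
    """Integrate dy/ds = f(s, y) from s0 to s1 with the classical Runge-Kutta method."""
    h = (s1 - s0) / steps
    s = s0
    for _ in range(steps):
        k1 = f(s, y)
        k2 = f(s + h / 2, y + h / 2 * k1)
        k3 = f(s + h / 2, y + h / 2 * k2)
        k4 = f(s + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        s += h
    return y

def z_via_x1_first(phi1, phi2, z0, x1, x2):
    z = rk4(lambda s, w: phi1(s, 0.0, w), z0, 0.0, x1)    # along x2 = 0
    return rk4(lambda s, w: phi2(x1, s, w), z, 0.0, x2)    # then along x2

def z_via_x2_first(phi1, phi2, z0, x1, x2):
    z = rk4(lambda s, w: phi2(0.0, s, w), z0, 0.0, x2)    # along x1 = 0
    return rk4(lambda s, w: phi1(s, x2, w), z, 0.0, x1)    # then along x1

good = (lambda x1, x2, z: x2 * z, lambda x1, x2, z: x1 * z)   # z = z0 exp(x1 x2)
bad = (lambda x1, x2, z: z, lambda x1, x2, z: x1 * z)

print(defect(*good, 0.3, 0.7, 2.0), defect(*bad, 0.3, 0.7, 2.0))
print(z_via_x1_first(*good, 1.0, 1.0, 1.0), z_via_x2_first(*good, 1.0, 1.0, 1.0))
print(z_via_x1_first(*bad, 1.0, 1.0, 1.0), z_via_x2_first(*bad, 1.0, 1.0, 1.0))
```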
Equations of dynamics.
If L be a function of t and of the 2n quantities x1, ... xn, [.x]1, ... [.x]n, where [.x]i denotes dxi/dt, &c., and if in the n equations
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}_i}\right) = \frac{\partial L}{\partial x_i}$$
we put p_i = dL/d[.x]_i, and so express [.x]1, ... [.x]_n in terms of t, x1, ... x_n, p1, ... p_n, assuming that the determinant of the quantities d²L/d[.x]_i d[.x]_j is not zero; if, further, H denote the function of t, x1, ... xn, p1, ... pn, numerically equal to p1[.x]1 + ... + pn[.x]n - L, it is easy to prove that dpi/dt = -dH/dxi, dxi/dt = dH/dp_i. These so-called _canonical_ equations form part of those for the characteristic chains of the single partial equation dz/dt + H(t, x1, ... xn, dz/dx1, ..., dz/dx_n) = 0, to which then the solution of the original equations for x1 ... xn can be reduced. It may be shown (1) that if z = [psi](t, x1, ... xn, c1, ... cn) + c be a complete integral of this equation, then pi = d[psi]/dx_i, d[psi]/dc_i = e_i are 2n equations giving the solution of the canonical equations referred to, where c1 ... cn and e1, ... en are arbitrary constants; (2) that if xi = Xi(t, x1^0, ... pn^0), pi = Pi(t, x1^0, ... pn^0) be the principal solutions of the canonical equations for t = t^0, and [omega] denote the result of substituting these values in p1dH/dp1 + ... + pndH/dpn - H, and [Omega] = [int] [t^0 to t] [omega]dt, where, after integration, [Omega] is to be expressed as a function of t, x1, ... xn, x1^0, ... xn^0, then z = [Omega] + z^0 is a complete integral of the partial equation.
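The passage from L to the canonical equations can be sketched numerically for one coordinate and a Lagrangian not containing t (both simplifications are ours). The sketch solves p = dL/d[.x] for the velocity by Newton's method, forms H = p[.x] - L, takes dH/dp and dH/dx by central differences, and integrates dx/dt = dH/dp, dp/dt = -dH/dx with the Runge-Kutta method. For the assumed example L = ½[.x]² - ½x² (a harmonic oscillator) the Lagrangian equation is d²x/dt² = -x, so starting from x = 1, p = 0, the result at t = 1 should be close to (cos 1, -sin 1).

```python
import math

def L(x, v):
    """Assumed example Lagrangian (harmonic oscillator); v stands for dx/dt."""
    return 0.5 * v * v - 0.5 * x * x

def d(f, a, h=1e-5):
    """Central-difference derivative of a function of one variable."""
    return (f(a + h) - f(a - h)) / (2 * h)

def velocity(x, p):
    """Solve p = dL/dv for v by Newton's method (requires d2L/dv2 != 0)."""
    v = 0.0
    for _ in range(50):
        Lv = d(lambda w: L(x, w), v)
        k = 1e-4
        Lvv = (L(x, v + k) - 2 * L(x, v) + L(x, v - k)) / k ** 2
        step = (Lv - p) / Lvv
        v -= step
        if abs(step) < 1e-9:
            break
    return v

def H(x, p):
    """H = p v - L, with v expressed in terms of x and p."""
    v = velocity(x, p)
    return p * v - L(x, v)

def canonical_rhs(x, p):
    """dx/dt = dH/dp, dp/dt = -dH/dx."""
    return d(lambda s: H(x, s), p), -d(lambda s: H(s, p), x)

def integrate(x, p, t_end=1.0, steps=1000):
    h = t_end / steps
    for _ in range(steps):
        k1 = canonical_rhs(x, p)
        k2 = canonical_rhs(x + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = canonical_rhs(x + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = canonical_rhs(x + h * k3[0], p + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, p

x1, p1 = integrate(1.0, 0.0)
print(x1, math.cos(1.0), p1, -math.sin(1.0))
```

Computing dH/dp numerically also illustrates the relation dx/dt = dH/dp: at any (x, p) it returns, up to rounding, the velocity obtained from p = dL/d[.x].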
Application of theory of continuous groups to formal theories.
A system of differential equations is said to allow a certain continuous group of transformations (see GROUPS, THEORY OF) when the introduction for the variables in the differential equations of the new variables given by the equations of the group leads, for all values of the parameters of the group, to the same differential equations in the new variables. It would be interesting to verify in examples that this is the case in at least the majority of the differential equations which are known to be integrable in finite terms. We give a theorem of very general application for the case of a simultaneous complete system of linear partial homogeneous differential equations of the first order, to the solution of which the various differential equations discussed have been reduced. It will be enough to consider whether the given differential equations allow the infinitesimal transformations of the group.
It can be shown easily that sufficient conditions in order that a complete system [Pi]1f = 0 ... [Pi]kf = 0, in n independent variables, should allow the infinitesimal transformation Pf are expressed by the k equations [Pi]_i Pf - P[Pi]_i f = [lambda]_i1 [Pi]1f + ... + [lambda]_ik [Pi]_kf. Suppose now a complete system of n - r equations in n variables to allow a group of r infinitesimal transformations (P1f, ..., Prf) which has an invariant subgroup of r - 1 parameters (P1f, ..., Pr-1f), it being supposed that the n quantities [Pi]1f, ..., [Pi]_n-r f, P1 f, ..., P_r f are not connected by an identical linear equation (with coefficients even depending on the independent variables). Then it can be shown that one solution of the complete system is determinable by a quadrature. For each of [Pi]_i P_[sigma] f - P_[sigma] [Pi]_i f is a linear function of [Pi]1f, ..., [Pi]_n-r f, and the simultaneous system of independent equations [Pi]1f = 0, ... [Pi]_n-r f = 0, P1f = 0, ... P_r-1 f = 0 is therefore a complete system, allowing the infinitesimal transformation Prf. This complete system of n - 1 equations has therefore one common solution [omega], and P_r([omega]) is a function of [omega]. By choosing [omega] suitably, we can then make Pr([omega]) = 1. From this equation and the n - 1 equations [Pi]_i[omega] = 0, P_[sigma][omega] = 0, we can determine [omega] by a quadrature only. Hence can be deduced a much more general result, _that if the group of r parameters be integrable, the complete system can be entirely solved by quadratures_; it is only necessary to introduce the solution found by the first quadrature as an independent variable, whereby we obtain a complete system of n - r equations in n - 1 variables, subject to an integrable group of r - 1 parameters, and to continue this process. We give some examples of the application of the theorem.
(1) If an equation of the first order y' = [psi](x, y) allow the infinitesimal transformation [xi]df/dx + [eta]df/dy, the integral curves [omega](x, y) = y°, wherein [omega](x, y) is the solution of df/dx + [psi](x, y) df/dy = 0 reducing to y for x = x°, are interchanged among themselves by the infinitesimal transformation, or [omega](x, y) can be chosen to make [xi]d[omega]/dx + [eta]d[omega]/dy = 1; this, with d[omega]/dx + [psi]d[omega]/dy = 0, determines [omega] as the integral of the complete differential (dy - [psi]dx)/([eta] - [psi][xi]). This result itself shows that every ordinary differential equation of the first order is subject to an infinite number of infinitesimal transformations. But every infinitesimal transformation [xi]df/dx + [eta]df/dy can by change of variables (after integration) be brought to the form df/dy, and all differential equations of the first order allowing this group can then be reduced to the form F(x, dy/dx) = 0. (2) In an ordinary equation of the second order y" = [psi](x, y, y'), equivalent to dy/dx = y1, dy1/dx = [psi](x, y, y1), if H, H1 be the solutions for y and y1 chosen to reduce to y° and y1° when x = x°, and the equations H = y, H1 = y1 be equivalent to [omega] = y°, [omega]1 = y1°, then [omega], [omega]1 are the principal solutions of [Pi]f = df/dx + y1df/dy + [psi]df/dy1 = 0. If the original equation allow an infinitesimal transformation whose first _extended_ form (see GROUPS) is Pf = [xi]df/dx + [eta]df/dy + [eta]1df/dy1, where [eta]1[delta]t is the increment of dy/dx when [xi][delta]t, [eta][delta]t are the increments of x, y, and is to be expressed in terms of x, y, y1, then each of P[omega] and P[omega]1 must be a function of [omega] and [omega]1, or the partial differential equation [Pi]f = 0 must allow the group Pf.
Thus by our general theorem, if the differential equation allow a group of two parameters (and such a group is always integrable), it can be solved by quadratures, our explanation sufficing, however, only provided the form [Pi]f and the two infinitesimal transformations are not linearly connected. It can be shown, from the fact that [eta]1 is a quadratic polynomial in y1, that no differential equation of the second order can allow more than 8 really independent infinitesimal transformations, and that every homogeneous linear differential equation of the second order allows just 8, being in fact reducible to d²y/dx² = 0. Since every group of more than two parameters has subgroups of two parameters, a differential equation of the second order allowing a group of more than two parameters can, as a rule, be solved by quadratures. By transforming the group we see that if a differential equation of the second order allows a single infinitesimal transformation, it can be transformed to the form F(x, dy/dx, d²y/dx²) = 0; this is not the case for every differential equation of the second order. (3) For an ordinary differential equation of the third order, allowing an integrable group of three parameters whose infinitesimal transformations are not linearly connected with the partial equation to which the solution of the given ordinary equation is reducible, the similar result follows that it can be integrated by quadratures. But if the group of three parameters be simple, this result must be replaced by the statement that the integration is reducible to quadratures and that of a so-called Riccati equation of the first order, of the form dy/dx = A + By + Cy², where A, B, C are functions of x. (4) Similarly for the integration by quadratures of an ordinary equation yn = [psi](x, y, y1, ... yn-1) of any order. Moreover, the group allowed by the equation may quite well consist of extended contact transformations.
An important application is to the case where the differential equation is the resolvent equation defining the group of transformations or rationality group of another differential equation (see below); in particular, when the rationality group of an ordinary linear differential equation is integrable, the equation can be solved by quadratures.
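Example (1) can be tried on a case of our choosing: y' = y/x + 1 is unchanged by x -> ax, y -> ay, whose infinitesimal transformation is xdf/dx + ydf/dy, so that [xi] = x, [eta] = y and [eta] - [psi][xi] = -x. The sketch checks by central differences that (dy - [psi]dx)/([eta] - [psi][xi]) is an exact differential, that its quadrature [omega] = log x - y/x (found by hand) satisfies [xi]d[omega]/dx + [eta]d[omega]/dy = 1, and that [omega] is constant along a solution obtained numerically by the Runge-Kutta method.

```python
import math

def psi(x, y):
    return y / x + 1.0        # assumed example: dy/dx = y/x + 1

# The example allows x -> a x, y -> a y, i.e. xi df/dx + eta df/dy with:
def xi(x, y):
    return x

def eta(x, y):
    return y

def M(x, y):
    """Coefficient of dx in (dy - psi dx)/(eta - psi xi)."""
    return -psi(x, y) / (eta(x, y) - psi(x, y) * xi(x, y))

def N(x, y):
    """Coefficient of dy in (dy - psi dx)/(eta - psi xi)."""
    return 1.0 / (eta(x, y) - psi(x, y) * xi(x, y))

def omega(x, y):
    """A quadrature of M dx + N dy for this example, found by hand."""
    return math.log(x) - y / x

def d(f, x, y, wrt, h=1e-6):
    """Central-difference partial of f(x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def solve(x0, y0, x1, steps=1000):
    """Runge-Kutta solution of dy/dx = psi(x, y) from (x0, y0) to x = x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = psi(x, y)
        k2 = psi(x + h / 2, y + h / 2 * k1)
        k3 = psi(x + h / 2, y + h / 2 * k2)
        k4 = psi(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y

xa, ya = 1.5, 0.4
print(d(M, xa, ya, "y") - d(N, xa, ya, "x"))                          # exactness
print(xi(xa, ya) * d(omega, xa, ya, "x") + eta(xa, ya) * d(omega, xa, ya, "y"))
y3 = solve(1.0, 0.0, 3.0)
print(omega(3.0, y3), omega(1.0, 0.0))                                # constant along the curve
```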
Consideration of function theories of differential equations.
Following the practical and provisional division of theories of differential equations, to which we alluded at starting, into transformation theories and function theories, we pass now to give some account of the latter. These are both a necessary logical complement of the former, and the only remaining resource when the expedients of the former have been exhausted. While in the former investigations we have dealt only with values of the independent variables about which the functions are developable, the leading idea now becomes, as was long ago remarked by G. Green, the consideration of the neighbourhood of the values of the variables for which this developable character ceases. Beginning, as before, with existence theorems applicable for ordinary values of the variables, we are to consider the cases of failure of such theorems.
A general existence theorem.
When in a given set of differential equations the number of equations is greater than the number of dependent variables, the equations cannot be expected to have common solutions unless certain conditions of compatibility, obtainable by equating different forms of the same differential coefficients deducible from the equations, are satisfied. We have had examples in systems of linear equations, and in the case of a set of equations p1 = [phi]1, ..., pr = [phi]r. For the case when the number of equations is the same as that of dependent variables, the following is a general theorem which should be referred to: Let there be r equations in r dependent variables z1, ... zr and n independent variables x1, ... xn; let the differential coefficient of z_[sigma] of highest order which enters be of order h_[sigma], and suppose d^(h_[sigma]) z_[sigma]/dx1^(h_[sigma]) to enter, so that the equations can be written d^(h_[sigma]) z_[sigma]/dx1^(h_[sigma]) = [Phi]_[sigma], where in the general differential coefficient of z_[rho] which enters in [Phi]_[sigma], say
d^(k1 + ... + kn) z_[rho]/dx1^k1 ... dx_n^k_n,
we have k1 < h_[rho] and k1 + ... + k_n <= h_[rho]. Let a1, ... an, b1, ... br, and b[rho]_(k1 ... kn) be a set of values of
x1, ... x_n, z1, ... z_r
and of the differential coefficients entering in [Phi]_[sigma] about which all the functions [Phi]1, ... [Phi]_r are developable. Corresponding to each dependent variable z_[sigma], we take now a set of h_[sigma] functions of x2, ... xn, say [phi]_[sigma], [phi]_[sigma]^(1), ..., [phi]_[sigma]^(h_[sigma]-1), arbitrary save that they must be developable about a2, a3, ... an, and such that for these values of x2, ... xn the function [phi]_[rho] reduces to b_[rho], and the differential coefficient
d^(k2 + ... + kn) [phi]_[rho]^(k1)/dx2^k2 ... dx_n^kn
reduces to b[rho]_(k1 ... kn). Then the theorem is that there exists one, and only one, set of functions z1, ... z_r of x1, ... x_n, developable about a1, ... an, satisfying the given differential equations, and such that for x1 = a1 we have
$$z_\sigma = \phi_\sigma,\quad \frac{\partial z_\sigma}{\partial x_1} = \phi_\sigma^{(1)},\quad \ldots,\quad \frac{\partial^{h_\sigma - 1} z_\sigma}{\partial x_1^{h_\sigma - 1}} = \phi_\sigma^{(h_\sigma - 1)}.$$
And, moreover, if the arbitrary functions [phi]_[sigma], [phi]_[sigma]^(1) ... contain a certain number of arbitrary variables t1, ... tm, and be developable about the values t1°, ... tm° of these variables, the solutions z1, ... zr will contain t1, ... tm, and be developable about t1°, ... tm°.
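The simplest instance of the theorem, chosen by us for illustration, is the wave equation d²u/dt² = d²u/dx² (r = 1, n = 2, x1 = t, h = 2, the right side containing no t-derivative). With polynomial data u = [phi] and du/dt = [phi]^(1) at t = 0, the coefficients c_j(x) of u = [Sigma] c_j(x)t^j are forced, by equating powers of t, to satisfy (j + 1)(j + 2)c_(j+2) = d²c_j/dx². The sketch computes them with exact rational arithmetic; for polynomial data the series terminates. The function names are hypothetical.

```python
from fractions import Fraction

def deriv(poly):
    """Derivative of a polynomial in x stored as coefficients [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(poly)][1:]

def series_in_t(phi0, phi1, order):
    """Coefficients c_j(x), j = 0..order, of u = sum_j c_j(x) t^j solving u_tt = u_xx,
    with u = phi0 and u_t = phi1 at t = 0 (polynomial data)."""
    c = [[Fraction(a) for a in phi0], [Fraction(a) for a in phi1]]
    for j in range(order - 1):
        # equating coefficients of t^j: (j + 1)(j + 2) c_{j+2} = c_j''
        c.append([a / ((j + 1) * (j + 2)) for a in deriv(deriv(c[j]))])
    return c

def evaluate(c, t, x):
    return sum(sum(a * x ** k for k, a in enumerate(cj)) * t ** j
               for j, cj in enumerate(c))

coeffs = series_in_t([0, 0, 0, 1], [0, 1], order=6)   # phi = x^3, phi^(1) = x
print(evaluate(coeffs, Fraction(1, 2), Fraction(2)))
```

With these data the computed series is u = x³ + xt + 3xt², which indeed satisfies the wave equation, so the printed value at t = ½, x = 2 is 21/2.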
Singular points of solutions.
The proof of this theorem may be given by showing that if ordinary power series in x1 - a1, ... xn - an, t1 - t1°, ... tm - tm° be substituted in the equations, wherein in z_[sigma] the coefficients of (x1 - a1)^0, x1 - a1, ..., (x1 - a1)^(h_[sigma]-1) are the arbitrary functions [phi]_[sigma], [phi]_[sigma]^(1), ..., [phi]_[sigma]^(h_[sigma]-1), divided respectively by 1, 1!, 2!, &c., then the differential equations determine uniquely all the other coefficients, and that the resulting series are convergent. We rely, in fact, upon the theory of monogenic analytical functions (see FUNCTION), a function being determined entirely by its development in the neighbourhood of one set of values of the independent variables, from which all its other values arise by _continuation_; it being of course understood that the coefficients in the differential equations are to be continued at the same time. But it is to be remarked that there is no ground for believing, if this method of continuation be utilized, that the function is single-valued; we may quite well return to the same values of the independent variables with a different value of the function; belonging, as we say, to a different branch of the function; and there is even no reason for assuming that the number of branches is finite, or that different branches have the same singular points and regions of existence. Moreover, and this is the most difficult consideration of all, all these circumstances may be dependent upon the values supposed given to the arbitrary constants of the integral; in other words, the singular points may be either _fixed_, being determined by the differential equations themselves, or they may be _movable_ with the variation of the arbitrary constants of integration. Such difficulties arise even in establishing the reversion of an elliptic integral, in solving the equation
$$\left(\frac{dx}{ds}\right)^2 = (x - a_1)(x - a_2)(x - a_3)(x - a_4);$$
about an ordinary value the right side is developable; if we put x - a1 = t1², the right side becomes developable about t1 = 0; if we put x = 1/t, the right side of the changed equation is developable about t = 0; it is quite easy to show that the integral reducing to a definite value x0 for a value s0 is obtainable by a series in integral powers; this, however, must be supplemented by showing that for no value of s does the value of x become entirely undetermined.
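A simpler equation than the elliptic one shows a movable singular point plainly: y' = y² with y(0) = y0 has the solution y0/(1 - y0x), whose pole at x = 1/y0 moves with the arbitrary constant y0. The sketch (our example) determines the power-series coefficients from the differential equation alone, by equating powers of x as in the proof outlined above, and estimates the radius of convergence by the ratio test; the estimate tracks 1/y0.

```python
def taylor_coeffs(y0, n):
    """First n Taylor coefficients about x = 0 of the solution of y' = y^2, y(0) = y0,
    from (k + 1) a_{k+1} = sum_{i=0}^{k} a_i a_{k-i}."""
    a = [float(y0)]
    for k in range(n - 1):
        a.append(sum(a[i] * a[k - i] for i in range(k + 1)) / (k + 1))
    return a

for y0 in (0.5, 1.0, 2.0):
    a = taylor_coeffs(y0, 40)
    radius = abs(a[-2] / a[-1])          # ratio-test estimate of the radius
    print(y0, radius, 1 / y0)            # the pole sits at x = 1/y0
```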
Linear differential equations with rational coefficients.
These remarks will show the place of the theory now to be sketched of a particular class of ordinary linear homogeneous differential equations whose importance arises from the completeness and generality with which they can be discussed. We have seen that if in the equations dy/dx = y1, dy1/dx = y2, ..., dy_n-2/dx = y_n-1,
dy_n-1/dx = a_n y + a_n-1 y1 + ... + a1 y_n-1,
where a1, a2, ..., an are now to be taken to be rational functions of x, the value x = xº be one for which no one of these rational functions is infinite, and yº, yº1, ..., yº_n-1 be quite arbitrary finite values, then the equations are satisfied by
y = yºu + yº1u1 + ... + yº_n-1 u_n-1,
where u, u1, ..., u_n-1 are functions of x, independent of yº, ... yº_n-1, developable about x = xº; this value of y is such that for x = xº the functions y, y1 ... y_n-1 reduce respectively to yº, yº1, ... yº_n-1; it can be proved that the region of existence of these series extends within a circle centre xº and radius equal to the distance from xº of the nearest point at which one of a1, ... an becomes infinite. Now consider a region enclosing xº and only one of the places, say [Sigma], at which one of a1, ... an becomes infinite. When x is made to describe a closed curve in this region, including this point [Sigma] in its interior, it may well happen that the continuations of the functions u, u1, ..., u_n-1 give, when we have returned to the point xº, values v, v1, ..., v_n-1, so that the integral under consideration becomes changed to yºv + yº1v1 + ... + yº_n-1 v_n-1. At xº let this branch and the corresponding values of y1, ... y_n-1 be [eta]º, [eta]º1, ... [eta]º_n-1; then, as there is only one series satisfying the equation and reducing to ([eta]º, [eta]º1, ... [eta]º_n-1) for x = xº, and the coefficients in the differential equation are single-valued functions, we must have [eta]ºu + [eta]º1u1 + ... + [eta]º_n-1 u_n-1 = yºv + yº1v1 + ... + yº_n-1 v_n-1; as this holds for arbitrary values of yº ... yº_n-1, upon which u, ... u_n-1 and v, ... v_n-1 do not depend, it follows that each of v, ... v_n-1 is a linear function of u, ... u_n-1 with constant coefficients, say v_i = A_i1 u + ... + A_in u_n-1. Then
yºv + ... + yº_n-1 v_n-1 = ([Sigma]_i A_i1 y_iº)u + ... + ([Sigma]_i A_in yº_i)u_n-1;
this is equal to [mu](yºu + ... + yº_n-1 u_n-1) if [Sigma]_i A_ir yº_i = [mu]yº_r-1 for r = 1, ..., n; eliminating yº ... yº_n-1 from these linear equations, we have a determinantal equation of order n for [mu]; let [mu]1 be one of its roots; determining the ratios of yº, yº1, ... yº_n-1 to satisfy the linear equations, we have thus proved that there exists an integral, H, of the equation, which when continued round the point [Sigma] and back to the starting-point, becomes changed to H1 = [mu]1H. Let now [xi] be the value of x at [Sigma] and r1 one of the values of (1/2[pi]i) log [mu]1; consider the function (x - [xi])^-r1 H; when x makes a circuit round x = [xi], this becomes changed to
$$e^{-2\pi i r_1}(x - \xi)^{-r_1}\mu_1 H,$$
that is, is unchanged; thus we may put H = (x - [xi])^r1 [phi]1, [phi]1 being a function single-valued for paths in the region considered described about [Sigma], and therefore, by Laurent's Theorem (see FUNCTION), capable of expression in the annular region about this point by a series of positive and negative integral powers of x - [xi], which in general may contain an infinite number of negative powers; there is, however, no reason to suppose r1 to be an integer, or even real. Thus, if all the roots of the determinantal equation in [mu] are different, we obtain n integrals of the forms (x - [xi])^r1 [phi]1, ..., (x - [xi])^rn [phi]_n. In general we obtain as many integrals of this form as there are really different roots; and the problem arises to discover, in case a root be k times repeated, k - 1 equations of as simple a form as possible to replace the k - 1 equations of the form yºv + ... + yº_n-1 v_n-1 = [mu](yºu + ... + yº_n-1 u_n-1) which would have existed had the roots been different. The most natural method of obtaining a suggestion lies probably in remarking that if r2 = r1 + h, there is an integral [(x - [xi])^(r1 + h) [phi]2 - (x - [xi])^r1 [phi]1]/h, where the coefficients in [phi]2 are the same functions of r1 + h as are the coefficients in [phi]1 of r1; when h vanishes, this integral takes the form
$$(x - \xi)^{r_1}\left[\frac{d\phi_1}{dr_1} + \phi_1\log(x - \xi)\right],$$
or say (x - [xi])^r1 [[psi]1 + [phi]1 log (x - [xi])];
denoting this by 2[pi]i[mu]1K, and (x-[xi])^r1 [phi]1 by H, a circuit of the point [xi] changes K into
$$K' = \frac{1}{2\pi i\mu_1}\left[e^{2\pi i r_1}(x - \xi)^{r_1}\psi_1 + e^{2\pi i r_1}(x - \xi)^{r_1}\phi_1\left(2\pi i + \log(x - \xi)\right)\right] = \mu_1 K + H.$$
A similar artifice suggests itself when three of the roots of the determinantal equation are the same, and so on. We are thus led to the result, which is justified by an examination of the algebraic conditions, that whatever may be the circumstances as to the roots of the determinantal equation, n integrals exist, breaking up into batches, the values of the constituents H1, H2, ... of a batch after circuit about x = [xi] being H1' = [mu]1H1, H2' = [mu]1H2 + H1, H3' = [mu]1H3 + H2, and so on. And this is found to lead to the forms (x - [xi])^r1 [phi]1, (x - [xi])^r1 [[psi]1 + [phi]1 log (x - [xi])], (x - [xi])^r1 [[chi]1 + [chi]2 log (x - [xi]) + [phi]1(log(x - [xi]))²], and so on. Here each of [phi]1, [psi]1, [chi]1, [chi]2, ... is a series of positive and negative integral powers of x - [xi] in which the number of negative powers may be infinite.
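The behaviour of a basis of integrals under a circuit can be observed numerically. As an assumed example, take x²y'' + (1 - r1 - r2)xy' + r1r2y = 0, whose integrals about x = 0 are x^r1 and x^r2 (with 1 and log x when r1 = r2 = 0). The sketch continues the integrals u, u1 fixed by y = 1, y' = 0 and by y = 0, y' = 1 at x = 1 once round the unit circle with the Runge-Kutta method; the columns of the resulting matrix hold the continued values, so its eigenvalues are the roots [mu] of the determinantal equation. For r1 = ¼, r2 = -½ these should be e^(2[pi]i/4) = i and e^(-[pi]i) = -1. For r1 = r2 = 0 the matrix should be close to [[1, 2[pi]i], [0, 1]]: with H = u and K = u1/(2[pi]i) this is the batch H' = H, K' = K + H, with [mu]1 = 1.

```python
import cmath
import math

def monodromy(r1, r2, steps=4000):
    """Continue u (y = 1, y' = 0) and u1 (y = 0, y' = 1), given at x = 1, once round
    |x| = 1 for y'' = a1 y' + a2 y with a1 = (r1 + r2 - 1)/x, a2 = -r1 r2/x^2.
    Column k of the result holds (y, y') at x = 1 for the continued integral k."""
    def rhs(theta, Y):
        x = cmath.exp(1j * theta)
        dx = 1j * x                                    # dx/dtheta
        y, yp = Y
        return (dx * yp, dx * ((r1 + r2 - 1) / x * yp - r1 * r2 / x ** 2 * y))

    def transport(Y):
        h = 2 * math.pi / steps
        theta = 0.0
        for _ in range(steps):
            k1 = rhs(theta, Y)
            k2 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(Y, k1)])
            k3 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(Y, k2)])
            k4 = rhs(theta + h, [a + h * b for a, b in zip(Y, k3)])
            Y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(Y, k1, k2, k3, k4)]
            theta += h
        return Y

    v, v1 = transport([1, 0]), transport([0, 1])
    return [[v[0], v1[0]], [v[1], v1[1]]]

def eigenvalues(M):
    """Roots of the characteristic equation of a 2 x 2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc

M = monodromy(0.25, -0.5)
print(eigenvalues(M))       # expected near i and -1
J = monodromy(0.0, 0.0)     # integrals 1 and log x: a logarithmic batch
print(J)                    # expected near [[1, 2 pi i], [0, 1]]
```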
Regular equations.
It appears natural enough now to inquire whether, under proper conditions for the forms of the rational functions a1, ... an, it may be possible to ensure that in each of the series [phi]1, [psi]1, [chi]1, ... the number of negative powers shall be finite. Herein lies, in fact, the limitation which experience has shown to be justified by the completeness of the results obtained. Assuming n integrals in which in each of [phi]1, [psi]1, [chi]1 ... the number of negative powers is finite, there is a definite homogeneous linear differential equation having these integrals; this is found by forming it to have the form
$$y^{(n)} = (x-\xi)^{-1}b_1y^{(n-1)} + (x-\xi)^{-2}b_2y^{(n-2)} + \cdots + (x-\xi)^{-n}b_ny,$$
where b1, ... bn are finite for x = [xi]. Conversely, assume the equation to have this form. Then on substituting a series of the form (x - [xi])^r [1 + A1(x - [xi]) + A2(x - [xi])² + ... ] and equating the coefficients of like powers of x - [xi], it is found that r must be a root of an algebraic equation of order n; this equation, which we shall call the index equation, can be obtained at once by substituting for y only (x - [xi])^r and replacing each of b1, ... bn by its value at x = [xi]; arrange the roots r1, r2, ... of this equation so that the real part of ri is equal to, or greater than, the real part of r_i+1, and take r equal to r1; it is found that the coefficients A1, A2 ... are uniquely determinate, and that the series converges within a circle about x = [xi] which includes no other of the points at which the rational functions a1 ... an become infinite. We have thus a solution H1 = (x - [xi])^r1 [phi]1 of the differential equation. If we now substitute in the equation y = H1 [int] [eta] dx, it is found to reduce to an equation of order n - 1 for [eta] of the form
$$\eta^{(n-1)} = (x-\xi)^{-1}c_1\eta^{(n-2)} + \cdots + (x-\xi)^{-(n-1)}c_{n-1}\eta,$$
where c1, ... c_n-1 are not infinite at x = [xi]. To this equation precisely similar reasoning can then be applied; its index equation has in fact the roots r2 - r1 - 1, ... , rn - r1 - 1; if r2 - r1 be zero, the integral (x - [xi])^-1 [psi]1 of the [eta] equation will give an integral of the original equation containing log (x - [xi]); if r2 - r1 be an integer, and therefore a negative integer, the same will be true, unless in [psi]1 the term in (x - [xi])^(r1 - r2) be absent; if neither of these arise, the original equation will have an integral (x -[xi])^r2 [phi]2. The [eta] equation can now, by means of the one integral of it belonging to the index r2 - r1 - 1, be similarly reduced to one of order n - 2, and so on. The result will be that stated above. We shall say that an equation of the form in question is _regular_ about x = [xi].
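The rule for the index equation translates directly into code. A minimal sketch, assuming numpy is available and that the values b1([xi]), ... bn([xi]) are already known; the helper names are ours, not from the text. Substituting y = (x - [xi])^r gives r(r - 1)...(r - n + 1) = b1 r(r - 1)...(r - n + 2) + ... + bn. As an illustration of the uniquely determined coefficients A1, A2, ... it also sums the series for Bessel's equation x²y" + xy' + (x² - n²)y = 0 about x = 0, taking the root r = n.

```python
import numpy as np

def index_roots(b_at_xi):
    """Roots of the index equation for
    y^(n) = (x-xi)^-1 b1 y^(n-1) + ... + (x-xi)^-n bn y,
    given b_at_xi = [b1(xi), ..., bn(xi)]."""
    n = len(b_at_xi)

    def falling(k):
        # r(r-1)...(r-k+1) as a numpy poly1d (equal to 1 when k = 0)
        p = np.poly1d([1.0])
        for j in range(k):
            p = p * np.poly1d([1.0, -float(j)])
        return p

    poly = falling(n)
    for k, bk in enumerate(b_at_xi, start=1):
        poly = poly - bk * falling(n - k)
    # Order the roots so that real parts do not increase, as in the text.
    return sorted(np.roots(poly.coeffs), key=lambda r: -r.real)

def bessel_frobenius(n, x, terms=40):
    """x^n (1 + A1 x + A2 x^2 + ...) for the root r = n (n >= 0) of the index
    equation of Bessel's equation x^2 y'' + x y' + (x^2 - n^2) y = 0."""
    # Equating coefficients of x^(k+n): k(k + 2n) A_k + A_(k-2) = 0, with A_1 = 0.
    A = [1.0, 0.0]
    for k in range(2, terms):
        A.append(-A[k - 2] / (k * (k + 2 * n)))
    return x ** n * sum(a * x ** k for k, a in enumerate(A))

# Bessel's equation in the regular form: y'' = x^-1 (-1) y' + x^-2 (n^2 - x^2) y,
# so b1(0) = -1 and b2(0) = n^2; the indices are +n and -n.
print(index_roots([-1.0, 0.25]))          # roots 0.5 and -0.5 for n = 1/2
```

For n = 0 the series is the usual expansion of the Bessel function J0; for n = 1/2 it reduces to sin x/√x.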
Fuchsian equations.
Equation of the second order.
We may examine in this way the behaviour of the integrals at all the points at which any one of the rational functions a1 ... an becomes infinite; in general we must expect that beside these the value x = [oo] will be a singular point for the solutions of the differential equation. To test this we put x = 1/t throughout, and examine as before at t = 0. For instance, the ordinary linear equation with constant coefficients has no singular point for finite values of x; at x = [oo] it has a singular point and is not regular; or again, Bessel's equation x²y" + xy' + (x² - n²)y = 0 is regular about x = 0, but not about x = [oo]. An equation regular at all the finite singularities and also at x = [oo] is called a Fuchsian equation. We proceed to examine particularly the case of an equation of the second order
y" + ay' + by = 0.
Putting x = 1/t, it becomes
d²y/dt² + (2t^-1 - at^-2)dy/dt + bt^-4 y = 0,
which is not regular about t = 0 unless 2 - at^-1 and bt^-2 be finite there, that is, unless ax and bx² are finite at x = [oo]; which we thus assume; putting y = t^r(1 + A1t + ... ), we find for the index equation at x = [oo] the equation r(r - 1) + r(2 - ax)_0 + (bx²)_0 = 0. If there be finite singular points at [xi]1, ... [xi]m, where we assume m > 1, the cases m = 0, m = 1 being easily dealt with, and if [phi](x) = (x - [xi]1) ... (x - [xi]m), we must have a·[phi](x) and b·[[phi](x)]² finite for all finite values of x, equal say to the respective polynomials [psi](x) and [theta](x), of which by the conditions at x = [oo] the highest respective orders possible are m - 1 and 2(m - 1). The index equation at x = [xi]1 is r(r - 1) + r[psi]([xi]1)/[phi]'([xi]1) + [theta]([xi]1)/[[phi]'([xi]1)]² = 0, and if [alpha]1, [beta]1 be its roots, we have [alpha]1 + [beta]1 = 1 - [psi]([xi]1)/[phi]'([xi]1) and [alpha]1[beta]1 = [theta]([xi]1)/[[phi]'([xi]1)]². Thus by an elementary theorem of algebra (the resolution into partial fractions), the sum [Sigma](1 - [alpha]i - [beta]i)/(x - [xi]i), extended to the m finite singular points, is equal to [psi](x)/[phi](x), and the sum [Sigma](1 - [alpha]i - [beta]i) is equal to the ratio of the coefficient of x^(m-1) in [psi](x) to that of x^m in [phi](x), and therefore equal to 1 + [alpha] + [beta], where [alpha], [beta] are the indices at x = [oo]. Further, if (x, 1)_m-2 denote the integral part of the quotient [theta](x)/[phi](x), we have [Sigma][alpha]_i[beta]_i[phi]'([xi]_i)/(x - [xi]_i) equal to -(x, 1)_m-2 + [theta](x)/[phi](x), and the coefficient of x^m-2 in (x, 1)_m-2 is [alpha][beta]. Thus the differential equation has the form
y" + y'[Sigma](1 - [alpha]_i - [beta]_i)/(x - [xi]_i) + y[(x, 1)_m-2 + [Sigma][alpha]_i[beta]_i[phi]'([xi]_i)/(x - [xi]_i)]/[phi](x) = 0.
If, however, we make a change in the dependent variable, putting y = (x - [xi]1)^[alpha]1 ... (x - [xi]m)^[alpha]m [eta], it is easy to see that the equation changes into one having the same singular points, about each of which it is regular, and that the indices at x = [xi]_i become 0 and [beta]_i - [alpha]_i, which we shall denote by [lambda]i; for each factor (x - [xi]_j)^[alpha]j with j ≠ i can be developed in positive integral powers of x - [xi]_i about x = [xi]_i. By this transformation the indices at x = [oo] are changed to
[alpha] + [alpha]1 + ... + [alpha]m, [beta] + [beta]1 + ... + [beta]m
which we shall denote by [lambda], [mu]. If we suppose this change to have been introduced, and still denote the dependent variable by y, the equation has the form
y" + y'[Sigma](1 - [lambda]_i)/(x - [xi]_i) + y(x, 1)_m-2/[phi](x) = 0,
while [lambda] + [mu] + [lambda]1 + ... + [lambda]_m = m - 1. Conversely, it is easy to verify that if [lambda][mu] be the coefficient of x^m-2 in (x, 1)_m-2, this equation has the specified singular points and indices whatever be the other coefficients in (x, 1)_m-2.
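The relation between the indices (the 2(m + 1) indices at the finite singular points and at x = [oo] add up to m - 1) can be checked numerically. The sketch assumes numpy; `fuchs_indices` is a hypothetical helper. It takes the singular points and the polynomials [psi](x), [theta](x), and solves the index equations at each [xi]i and at x = [oo] exactly as written above. As a test case it uses the hypergeometric equation x(1 - x)y" + [c - (a + b + 1)x]y' - aby = 0, for which [phi](x) = x(x - 1), [psi](x) = (a + b + 1)x - c and [theta](x) = ab·x(x - 1); the values of a, b, c are arbitrary.

```python
import numpy as np

def fuchs_indices(xis, psi, theta):
    """Indices of y'' + (psi/phi) y' + (theta/phi^2) y = 0, phi(x) = prod (x - xi).

    psi and theta are coefficient lists (highest power first) of degree at most
    m - 1 and 2(m - 1), the conditions for regularity at x = infinity."""
    m = len(xis)
    phi = np.poly(xis)                      # monic polynomial with roots xis
    dphi = np.polyder(phi)
    finite = []
    for xi in xis:
        p0 = np.polyval(psi, xi) / np.polyval(dphi, xi)
        q0 = np.polyval(theta, xi) / np.polyval(dphi, xi) ** 2
        finite.append(np.roots([1.0, p0 - 1.0, q0]))   # r(r-1) + p0 r + q0 = 0
    psi_c = np.pad(np.asarray(psi, dtype=float), (m - len(psi), 0))
    theta_c = np.pad(np.asarray(theta, dtype=float), (2 * m - 1 - len(theta), 0))
    ax0, bx20 = psi_c[0], theta_c[0]        # limits of a*x and b*x^2 as x -> infinity
    at_infinity = np.roots([1.0, 1.0 - ax0, bx20])     # r(r-1) + r(2 - ax0) + bx20 = 0
    return finite, at_infinity

# Hypergeometric test case: phi = x(x - 1), psi = (a+b+1)x - c, theta = ab x(x - 1).
a, b, c = 0.3, 0.7, 0.4
finite, at_infinity = fuchs_indices([0.0, 1.0], [a + b + 1, -c], [a * b, -a * b, 0.0])
total = sum(np.sum(r) for r in finite) + np.sum(at_infinity)
print(total)   # m - 1 = 1, up to rounding
```

Here the indices come out as 0, 1 - c at x = 0; 0, c - a - b at x = 1; and a, b at x = [oo].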
Hypergeometric equation.
Thus we see that (beside the cases m = 0, m = 1) the "Fuchsian equation" of the second order with _two_ finite singular points is distinguished by the fact that it has a definite form when the singular points and the indices are assigned. In that case, putting (x - [xi]1)/(x - [xi]2) = t/(t - 1), the singular points are transformed to 0, 1, [oo], and, as is clear, without change of indices. Still denoting the independent variable by x, the equation then has the form
x(1 - x)y" + y'[1 - [lambda]1 - x(1 + [lambda] + [mu])] - [lambda][mu]y = 0,
which is the ordinary hypergeometric equation. Provided none of [lambda]1, [lambda]2, [lambda] - [mu] be zero or an integer, it has about x = 0 the solutions
F([lambda], [mu], 1 - [lambda]1, x), x^[lambda]1 F([lambda] + [lambda]1, [mu] + [lambda]1, 1 + [lambda]1, x);
about x = 1 it has the solutions
F([lambda], [mu], 1 - [lambda]2, 1 - x), (1 - x)^[lambda]2 F([lambda] + [lambda]2, [mu] + [lambda]2, 1 + [lambda]2, 1 - x),
where [lambda] + [mu] + [lambda]1 + [lambda]2 = 1; about x = [oo] it has the solutions
x^-[lambda] F([lambda], [lambda] + [lambda]1, [lambda] - [mu] + 1, x^-1), x^-[mu] F([mu], [mu] + [lambda]1, [mu] - [lambda] + 1, x^-1),
where F([alpha], [beta], [gamma], x) is the series
$$1 + \frac{\alpha\beta}{\gamma}x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot 2\cdot\gamma(\gamma+1)}x^2 + \cdots,$$
which converges when |x| < 1, whatever [alpha], [beta], [gamma] may be ([gamma] not being zero or a negative integer); converges for all values of x for which |x| = 1 provided the real part of [gamma] - [alpha] - [beta] > 0; and converges for all these values except x = 1 provided the real part of [gamma] - [alpha] - [beta] > -1.
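A direct partial-sum evaluation illustrates the first two convergence statements. This is a sketch only: the closed form F(1, 1, 2, x) = -log(1 - x)/x and Gauss's value F([alpha], [beta], [gamma], 1) = [Gamma]([gamma])[Gamma]([gamma] - [alpha] - [beta])/[[Gamma]([gamma] - [alpha])[Gamma]([gamma] - [beta])], valid when the real part of [gamma] - [alpha] - [beta] is positive, are standard results used here as checks; they are not stated in the text. The parameter values are illustrative.

```python
import math

def hyp_partial_sum(alpha, beta, gamma, x, terms):
    """Sum of the first `terms` terms of F(alpha, beta, gamma, x)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # ratio of consecutive terms of the series
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * x
    return total

# Inside the unit circle: compare with -log(1 - x)/x.
inside = hyp_partial_sum(1, 1, 2, 0.5, 200)
# On |x| = 1 with gamma - alpha - beta = 2.5 > 0: compare with Gauss's value.
on_circle = hyp_partial_sum(0.5, 0.5, 3.5, 1.0, 20000)
gauss = math.gamma(3.5) * math.gamma(2.5) / math.gamma(3.0) ** 2
print(inside, on_circle, gauss)
```

On the circle the terms only fall off like a power of k, so many more terms are needed than inside it.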
In accordance with our general theory, logarithms are to be expected in the solution when one of [lambda]1, [lambda]2, [lambda] - [mu] is zero or integral. Indeed when [lambda]1 is a negative integer, not zero, the second solution about x = 0 would contain vanishing factors in the denominators of its coefficients; in case [lambda] or [mu] be one of the positive integers 1, 2, ... (-[lambda]1), vanishing factors occur also in the numerators; and then, in fact, the second solution about x = 0 becomes x^[lambda]1 times an integral polynomial of degree (-[lambda]1) - [lambda] or of degree (-[lambda]1) - [mu]. But when [lambda]1 is zero or a negative integer, and neither [lambda] nor [mu] is one of the positive integers 1, 2 ... (-[lambda]1), the second solution about x = 0 involves a term having the factor log x. When [lambda]1 is a positive integer, not zero, the second solution about x = 0 persists as a solution, in accordance with the order of arrangement of the roots of the index equation in our theory; the first solution is then replaced by an integral polynomial of degree -[lambda] or -[mu], when [lambda] or [mu] is one of the integers 0, -1, -2, ..., 1 - [lambda]1, but otherwise contains a logarithm. Similarly for the solutions about x = 1 or x = [oo]; it will be seen below how the results are deducible from those for x = 0.
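The polynomial case can be checked in exact rational arithmetic. A minimal sketch with illustrative values [lambda] = -1, [mu] = 1/3, [lambda]1 = 3 (our choice): the first solution F([lambda], [mu], 1 - [lambda]1, x) = F(-1, 1/3, -2, x) breaks off at degree -[lambda] = 1, before the vanishing denominator is reached, and the resulting polynomial satisfies the hypergeometric equation exactly. The helper names are hypothetical; a `ZeroDivisionError` signals the case in which, according to the text, a logarithm appears instead.

```python
from fractions import Fraction

def terminating_coeffs(alpha, beta, gamma, max_terms=50):
    """Coefficients of F(alpha, beta, gamma, x) when the series breaks off.

    Stops at the first vanishing numerator factor; raises ZeroDivisionError
    if a vanishing denominator (gamma + k) is met first."""
    coeffs = [Fraction(1)]
    for k in range(max_terms):
        num = (alpha + k) * (beta + k)
        if num == 0:
            return coeffs                 # all later coefficients are zero
        den = (k + 1) * (gamma + k)
        if den == 0:
            raise ZeroDivisionError("denominator vanishes first: no polynomial solution")
        coeffs.append(coeffs[-1] * num / den)
    raise ValueError("series did not break off within max_terms")

def equation_residual(coeffs, lam, mu, lam1):
    """Coefficients of x(1-x)y'' + y'[1 - lam1 - x(1 + lam + mu)] - lam*mu*y
    for the polynomial y = sum coeffs[k] x^k."""
    res = [Fraction(0)] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        if k >= 1:                        # terms lowering the power: x y'' and (1 - lam1) y'
            res[k - 1] += (k * (k - 1) + (1 - lam1) * k) * c
        res[k] -= (k * (k - 1) + (1 + lam + mu) * k + lam * mu) * c
    return res

lam, mu, lam1 = -1, Fraction(1, 3), 3     # illustrative values only
coeffs = terminating_coeffs(lam, mu, 1 - lam1)
print(coeffs)                             # [Fraction(1, 1), Fraction(1, 6)]
```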
March of the Integral.
Denote now the solutions about x = 0 by u1, u2; those about x = 1 by v1, v2; and those about x = [oo] by w1, w2; in the region (S0S1) common to the circles S0, S1 of radius 1 whose centres are the points x = 0, x = 1, all the first four are valid, and there exist equations u1 =Av1 + Bv2, u2 = Cv1 + Dv2 where A, B, C, D are constants; in the region (S1S) lying inside the circle S1 and outside the circle S0, those that are valid are v1, v2, w1, w2, and there exist equations v1 = Pw1 + Qw2, v2 = Rw1 + Tw2, where P, Q, R, T are constants; thus considering any integral whose expression within the circle S0 is au1 + bu2, where a, b are constants, the same integral will be represented within the circle S1 by (aA + bC)v1 + (aB + bD)v2, and outside these circles will be represented by
[(aA + bC)P + (aB + bD)R]w1 + [(aA + bC)Q + (aB + bD)T]w2.
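The bookkeeping of the connecting constants is multiplication of 2 × 2 matrices acting on the row of coefficients. A minimal sketch, assuming numpy and arbitrary, hypothetical values for A, B, C, D, P, Q, R, T, a, b (in practice these constants come from the analytic continuation and are not computed here).

```python
import numpy as np

# Hypothetical values of the connecting constants (not computed here).
A, B, C, D = 1.0, 2.0, -0.5, 3.0     # u1 = A v1 + B v2,  u2 = C v1 + D v2
P, Q, R, T = 0.25, -1.0, 4.0, 1.5    # v1 = P w1 + Q w2,  v2 = R w1 + T w2
a, b = 2.0, -1.0                     # the integral is a u1 + b u2 inside S0

M_uv = np.array([[A, B], [C, D]])
M_vw = np.array([[P, Q], [R, T]])

coeffs_v = np.array([a, b]) @ M_uv    # [aA + bC, aB + bD]: coefficients of v1, v2
coeffs_w = coeffs_v @ M_vw            # coefficients of w1, w2 outside the circles
print(coeffs_w)
```

Composing such matrices round a closed path gives the linear transformation undergone by a pair of integrals after a circuit.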
A single-valued branch of such an integral can be obtained by making a barrier in the plane joining [oo] to 0 and 1 to [oo]; for instance, by excluding the consideration of real negative values of x and of real positive values greater than 1, and defining the phase of x and x - 1 for real values between 0 and 1 as respectively 0 and [pi].
Transformation of the equation into itself.
We can form the Fuchsian equation of the second order with three arbitrary singular points [xi]1, [xi]2, [xi]3, and no singular point at x = [oo], and with respective indices [alpha]1, [beta]1, [alpha]2, [beta]2, [alpha]3, [beta]3 such that [alpha]1 + [beta]1 + [alpha]2 + [beta]2 + [alpha]3 + [beta]3 = 1. This equation can then be transformed into the hypergeometric equation in 24 ways; for out of [xi]1, [xi]2, [xi]3 we can in six ways choose two, say [xi]1, [xi]2, which are to be transformed respectively into 0 and 1, the remaining point [xi]3 going to t = [oo], by (x - [xi]1)([xi]3 - [xi]2)/[(x - [xi]2)([xi]3 - [xi]1)] = t/(t - 1); and then there are four possible transformations of the dependent variable which will reduce one of the indices at t = 0 to zero and one of the indices at t = 1 also to zero, namely, we may reduce either [alpha]1 or [beta]1 at t = 0, and simultaneously either [alpha]2 or [beta]2 at t = 1. Thus the hypergeometric equation itself can be transformed into itself in 24 ways, and from the expression F([lambda], [mu], 1 - [lambda]1, x) which satisfies it follow 23 other forms of solution; they involve four series in each of the arguments x, 1 - x, 1/x, 1/(1 - x), (x - 1)/x, x/(x - 1). Five of the 23 solutions agree with the fundamental solutions already described about x = 0, x = 1, x = [oo]; and from the principles by which these were obtained it is immediately clear that the 24 forms are, in value, equal in fours.
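The statement that the 24 forms are equal in fours can be illustrated for the four forms equal to F([lambda], [mu], 1 - [lambda]1, x) itself. Writing a = [lambda], b = [mu], c = 1 - [lambda]1, these are the standard Euler and Pfaff transformations, taken here from the general literature rather than from the text: F(a, b, c, x) = (1 - x)^(c-a-b) F(c - a, c - b, c, x) = (1 - x)^-a F(a, c - b, c, x/(x - 1)) = (1 - x)^-b F(c - a, b, c, x/(x - 1)). The parameter values are arbitrary.

```python
def hyp2f1(a, b, c, x, terms=400):
    """Partial sum of F(a, b, c, x); adequate here since |x| is well below 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * x
    return total

a, b, c, x = 0.3, 0.55, 1.2, 0.35       # illustrative values
z = x / (x - 1)                         # about -0.54, so its series also converges
forms = [
    hyp2f1(a, b, c, x),
    (1 - x) ** (c - a - b) * hyp2f1(c - a, c - b, c, x),
    (1 - x) ** (-a) * hyp2f1(a, c - b, c, z),
    (1 - x) ** (-b) * hyp2f1(c - a, b, c, z),
]
print(forms)   # four equal values, up to rounding
```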
Inversion. Modular functions.
The quarter periods K, K' of Jacobi's theory of elliptic functions, of which K = [int] [0 to [pi]/2] (1 - h sin²[theta])^-½ d[theta], and K' is the same function of 1 - h, can easily be proved to be the solutions of a hypergeometric equation of which h is the independent variable. When K, K' are regarded as defined in terms of h by the differential equation, the ratio K'/K is an infinitely many-valued function of h. But it is remarkable that Jacobi's own theory of theta functions leads to an expression for h in terms of K'/K by means of single-valued functions (see FUNCTION). We may then attempt to investigate, in general, in what cases the independent variable x of a hypergeometric equation is a single-valued function of the ratio s of two independent integrals of the equation. The same inquiry is suggested by the problem of ascertaining in what cases the hypergeometric series F([alpha], [beta], [gamma], x) is the expansion of an algebraic (irrational) function of x. In order to explain the meaning of the question, suppose that the plane of x is divided along the real axis from -[oo] to 0 and from 1 to +[oo], and, supposing logarithms not to enter about x = 0, choose two quite definite integrals y1, y2 of the equation, say
y1 = F([lambda], [mu], 1-[lambda]1, x), y2 = x^[lambda]1 F([lambda] + [lambda]1, [mu] + [lambda]1, 1 + [lambda]1, x),
with the condition that the phase of x is zero when x is real and between 0 and 1. Then the value of s = y2/y1 is definite for all values of x in the divided plane, s being a single-valued monogenic branch of an analytical function existing and without singularities all over this region. If, now, the values of s that so arise be plotted on to another plane, a value p + iq of s being represented by a point (p, q) of this s-plane, and the value of x from which it arose being mentally associated with this point of the s-plane, these points will fill a connected region therein, with a continuous boundary formed of four portions corresponding to the two sides of the two barriers of the x-plane. The question is then, firstly, whether the same value of s can arise for two different values of x, that is, whether the same point (p, q) of the s-plane can arise twice, or in other words, whether the region of the s-plane overlaps itself or not. Supposing this is not so, a second part of the question presents itself. If in the x-plane the barrier joining -[oo] to 0 be momentarily removed, and x describe a small circle with centre at x = 0 starting from a point x = -h - ik, where h, k are small, real, and positive, and coming back to this point, the original value s at this point will be changed to a value [sigma], which in the original case did not arise for this value of x, and possibly not at all. If, now, after restoring the barrier the values arising by continuation from [sigma] be similarly plotted on the s-plane, we shall again obtain a region which, while not overlapping itself, may quite possibly overlap the former region. In that case two values of x would arise for the same value or values of the quotient y2/y1, arising from two different branches of this quotient.
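For the quarter period K mentioned above, expanding (1 - h sin²[theta])^-½ by the binomial theorem and integrating term by term, with [int] [0 to [pi]/2] sin^(2k)[theta] d[theta] = ([pi]/2)·(1·3···(2k - 1))/(2·4···2k), gives K = ([pi]/2)F(½, ½, 1, h), which exhibits K as a solution of a hypergeometric equation in h. The sketch below compares this series with direct quadrature of the defining integral; the function names and the value h = 0.5 are illustrative only.

```python
import math

def K_by_quadrature(h, n=200):
    """Trapezoidal rule for K = int_0^(pi/2) (1 - h sin^2 t)^(-1/2) dt, 0 <= h < 1.

    The integrand is even about both end points, so the rule converges very fast."""
    step = (math.pi / 2) / n
    f = lambda t: (1 - h * math.sin(t) ** 2) ** -0.5
    inner = sum(f(k * step) for k in range(1, n))
    return step * (0.5 * f(0.0) + 0.5 * f(math.pi / 2) + inner)

def K_by_series(h, terms=2000):
    """(pi/2) F(1/2, 1/2, 1, h), summed term by term (needs |h| < 1)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (0.5 + k) ** 2 / (k + 1) ** 2 * h
    return math.pi / 2 * total

print(K_by_quadrature(0.5), K_by_series(0.5))   # the two values agree closely
```

Since K' is the same function of 1 - h, the same two functions evaluated at 1 - h give K'.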
We shall understand then, by the condition that x is to be a single-valued function of s, that the region in the s-plane corresponding to any branch is not to overlap itself, and that no two of the regions corresponding to the different branches are to overlap. Now in describing the circle about x = 0 from x = -h - ik to -h + ik, where h is small and k evanescent,
s = x^[lambda]1 F([lambda] + [lambda]1, [mu] + [lambda]1, 1 + [lambda]1, x)/F([lambda], [mu], 1 - [lambda]1, x)
is changed to [sigma] = s e^(2[pi]i[lambda]1). Thus the two portions of boundary of the s-region corresponding to the two sides of the barrier (-[oo], 0) meet (at s = 0 if the real part of [lambda]1 be positive) at an angle 2[pi]L1, where L1 is the absolute value of the real part of [lambda]1; the same is true for the [sigma]-region representing the branch [sigma]. The condition that the s-region shall not overlap itself requires, then, L1 ≤ 1. But, further, we may form an infinite number of branches [sigma] = s e^(2[pi]i[lambda]1), [sigma]1 = [sigma] e^(2[pi]i[lambda]1), ... in the same way, and the corresponding regions in the plane upon which y2/y1 is represented will have a common point and each have an angle 2[pi]L1; if none overlaps the preceding, it will happen, if L1 is not zero, that at length one is reached overlapping the first, unless for some positive integer a we have 2[pi]aL1 = 2[pi], in other words L1 = 1/a. If this be so, the branch [sigma]_a-1 = s e^(2[pi]ia[lambda]1) will be represented by a region having the angle at the common point in common with the region for the branch s; but not altogether coinciding with this last region unless [lambda]1 be real, and therefore = ±1/a; then there is only a finite number, a, of branches obtainable in this way by crossing the barrier (-[oo], 0). In precisely the same way, if we had begun by taking the quotient
s' = (x - 1)^[lambda]2 F([lambda] + [lambda]2, [mu] + [lambda]2, 1 + [lambda]2, 1 - x)/F([lambda], [mu], 1 - [lambda]2, 1 - x)
of the two solutions about x = 1, we should have found that x is not a single-valued function of s' unless [lambda]2 is the inverse of an integer, or is zero; as s' is of the form (As + B)/(Cs + D), A, B, C, D constants, the same is true in our case; equally, by considering the integrals about x = [oo] we find, as a third condition necessary in order that x may be a single-valued function of s, that [lambda] - [mu] must be the inverse of an integer or be zero. These three differences of the indices, namely, [lambda]1, [lambda]2, [lambda] - [mu], are the quantities which enter in the differential equation satisfied by x as a function of s, which is easily found to be
$$-\frac{x_{111}}{x_1^{3}} + \frac{3x_{11}^{2}}{2x_1^{4}} = \tfrac{1}{2}(h_3 - h_1 - h_2)\,x^{-1}(x-1)^{-1} + \tfrac{1}{2}h_1x^{-2} + \tfrac{1}{2}h_2(x-1)^{-2},$$
where x1 = dx/ds, x11 = d²x/ds², x111 = d³x/ds³; and h1 = 1 - [lambda]1², h2 = 1 - [lambda]2², h3 = 1 - ([lambda] - [mu])². Into the converse question whether the three conditions are sufficient to ensure (1) that the s-region corresponding to any branch does not overlap itself, (2) that no two such regions overlap, we have no space to enter. The second question clearly requires the inquiry whether the group (that is, the monodromy group) of the differential equation is properly discontinuous. (See GROUPS, THEORY OF.)
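The equation for x as a function of s can be checked numerically. Two standard facts are assumed here, not taken from the text: for y" + py' + qy = 0 the quotient s of two solutions satisfies {s, x} = 2q - p²/2 - dp/dx, where {s, x} = s'''/s' - (3/2)(s''/s')²; and the left side of the equation above equals {s, x}, since {x, s} = -(dx/ds)²{s, x}. The sketch compares 2q - p²/2 - dp/dx for the hypergeometric equation with the right side above, for illustrative values of [lambda], [mu], [lambda]1, with [lambda]2 fixed by [lambda] + [mu] + [lambda]1 + [lambda]2 = 1.

```python
def schwarzian_rhs(x, lam, mu, lam1, lam2):
    """Right side of the equation for x as a function of s, in the text's notation."""
    h1 = 1 - lam1 ** 2
    h2 = 1 - lam2 ** 2
    h3 = 1 - (lam - mu) ** 2
    return (0.5 * (h3 - h1 - h2) / (x * (x - 1))
            + 0.5 * h1 / x ** 2 + 0.5 * h2 / (x - 1) ** 2)

def invariant(x, lam, mu, lam1, d=1e-5):
    """2q - p^2/2 - dp/dx for the hypergeometric equation written as
    y'' + p y' + q y = 0; dp/dx is taken by a central difference."""
    p = lambda t: (1 - lam1 - t * (1 + lam + mu)) / (t * (1 - t))
    q = -lam * mu / (x * (1 - x))
    dp = (p(x + d) - p(x - d)) / (2 * d)
    return 2 * q - p(x) ** 2 / 2 - dp

lam, mu, lam1 = 0.2, 0.45, 0.3          # illustrative values
lam2 = 1 - lam - mu - lam1              # from lam + mu + lam1 + lam2 = 1
for x in (0.37, 2.5):
    print(x, invariant(x, lam, mu, lam1), schwarzian_rhs(x, lam, mu, lam1, lam2))
```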
The foregoing account will give an idea of the nature of the function theories of differential equations; it appears essential not to exclude some explanation of a theory intimately related both to such theories and to transformation theories, which is a generalization of Galois's theory of algebraic equations. We deal only with the application to homogeneous linear differential equations.
Rationality group of a linear equation.
Irreducibility of a rational equation.
In general a function of variables x1, x2 ... is said to be rational when it can be formed from them and the integers 1, 2, 3, ... by a finite number of additions, subtractions, multiplications and divisions. We generalize this definition. Assume that we have assigned a fundamental series of quantities and functions of x, in which x itself is included, such that all quantities formed by a finite number of additions, subtractions, multiplications, divisions _and differentiations in regard to x_, of the terms of this series, are themselves members of this series. Then the quantities of this series, and only these, are called _rational_. By a rational function of quantities p, q, r, ... is meant a function formed from them and any of the fundamental rational quantities by a finite number of the five fundamental operations. Thus it is a function which would be called, simply, rational if the fundamental series were widened by the addition to it of the quantities p, q, r, ... and those derivable from them by the five fundamental operations. A rational ordinary differential equation, with x as independent and y as dependent variable, is then one which equates to zero a rational function of y, the order k of the differential equation being that of the highest differential coefficient y^(k) which enters; only such equations are here discussed. Such an equation P = 0 is called _irreducible_ when, firstly, being arranged as an integral polynomial in y^(k), this polynomial is not the product of other polynomials in y^(k) also of rational form; and, secondly, the equation has no solution satisfying also a rational equation of lower order. From this it follows that if an irreducible equation P = 0 have one solution satisfying another rational equation Q = 0 of the same or higher order, then all the solutions of P = 0 also satisfy Q = 0. For from the equation P = 0 we can by differentiation express y^(k+1), y^(k+2), ... in terms of x, y, y^(1), ..., y^(k), and so put the function Q rationally in terms of these quantities only. It is sufficient, then, to prove the result when the equation Q = 0 is of the same order as P = 0. Let both the equations be arranged as integral polynomials in y^(k); their algebraic eliminant in regard to y^(k) must then vanish identically, for they are known to have one common solution not satisfying an equation of lower order; thus the equation P = 0 involves Q = 0 for all solutions of P = 0.
The variant function for a linear equation.
Now let y^(n) = [alpha]1y^(n-1) + ... + [alpha]_n y be a given rational homogeneous linear differential equation; let y1, ... yn be n particular functions of x, unconnected by any equation with constant coefficients of the form c1y1 + ... + cnyn = 0, all satisfying the differential equation; let [eta]1, ... [eta]n be linear functions of y1, ... yn, say [eta]i = A_i1 y1 + ... + A_in yn, where the constant coefficients Aij have a non-vanishing determinant; write ([eta]) = A(y), these being the equations of a general linear homogeneous group whose transformations may be denoted by A, B, .... We desire to form a rational function [phi]([eta]), or say [phi](A(y)), of [eta]1, ... [eta]n, in which the n² constants Aij shall all be essential, and not reduce effectively to a fewer number, as they would, for instance, if the y1, ... yn were connected by a linear equation with constant coefficients. Such a function is in fact given, if the solutions y1, ... yn be developable in positive integral powers about x = a, by [phi]([eta]) = [eta]1 + (x - a)^n [eta]2 + ... + (x - a)^((n-1)n) [eta]n. Such a function, V, we call a _variant_.
The resolvent equation.
Then differentiating V in regard to x, and replacing each [eta]i^(n) by its value [alpha]1[eta]i^(n-1) + ... + [alpha]n[eta]i, we can arrange dV/dx, and similarly each of d²V/dx², ... d^NV/dx^N, where N = n², as a linear function of the N quantities [eta]1, ... [eta]n, ... [eta]1^(n-1), ... [eta]n^(n-1), and thence by elimination obtain a linear differential equation for V of order N with rational coefficients. This we denote by F = 0. Further, each of [eta]1 ... [eta]n is expressible as a linear function of V, dV/dx, ... d^(N-1)V/dx^(N-1), with rational coefficients not involving any of the n² coefficients A_ij, since otherwise V would satisfy a linear equation of order less than N, which is impossible, as it involves (linearly) the n² arbitrary coefficients Aij, which would not enter into the coefficients of the supposed equation. In particular, y1, ... yn are expressible rationally as linear functions of [omega], d[omega]/dx, ... d^(N-1)[omega]/dx^(N-1), where [omega] is the particular function [phi](y). Any solution W of the equation F = 0 is derivable from functions [zeta]1, ... [zeta]n, which are linear functions of y1, ... yn, just as V was derived from [eta]1, ... [eta]n; but it does not follow that these functions [zeta]1, ... [zeta]n are obtained from y1, ... yn by a transformation of the linear group A, B, ... ; for it may happen that the determinant ∂([zeta]1, ... [zeta]n)/∂(y1, ... yn) is zero. In that case [zeta]1, ... [zeta]n may be called a singular set, and W a singular solution; it satisfies an equation of lower than the N-th order. But every solution V, W, ordinary or singular, of the equation F = 0, is expressible rationally in terms of [omega], d[omega]/dx, ... d^(N-1)[omega]/dx^(N-1); we shall write, simply, V = r([omega]). Consider now the rational irreducible equation of lowest order, not necessarily a linear equation, which is satisfied by [omega]; as y1, ... yn are particular functions, it may quite well be of order less than N; we call it the _resolvent equation_, suppose it of order p, and denote it by [gamma](v) = 0. Upon it the whole theory turns. In the first place, as [gamma](v) = 0 is satisfied by the solution [omega] of F = 0, all the solutions of [gamma](v) = 0 are solutions of F = 0, and are therefore rationally expressible by [omega]; any one may then be denoted by r([omega]). If this solution of F = 0 be not singular, it corresponds to a transformation A of the linear group (A, B, ...), effected upon y1, ... yn. The coefficients Aij of this transformation follow from the expressions before mentioned for [eta]1 ... [eta]n in terms of V, dV/dx, d²V/dx², ... by substituting V = r([omega]); thus they depend on the p arbitrary parameters which enter into the general expression for the integral of the equation [gamma](v) = 0. Without going into further details, it is then clear enough that the resolvent equation, being irreducible and such that any solution is expressible rationally, with p parameters, in terms of the solution [omega], enables us to define a linear homogeneous group of transformations of y1 ... yn depending on p parameters; and every operation of this (continuous) group corresponds to a rational transformation of the solution of the resolvent equation. This is the group called the _rationality group_, or the _group of transformations_ of the original homogeneous linear differential equation.
The group must not be confounded with a subgroup of itself, the _monodromy group_ of the equation, often called simply the group of the equation, which is a set of transformations, not depending on arbitrary variable parameters, arising for one particular fundamental set of solutions of the linear equation (see GROUPS, THEORY OF).
The fundamental theorem in regard to the rationality group.
The importance of the rationality group consists in three propositions. (1) Any rational function of y1, ... yn which is unaltered in value by the transformations of the group can be written in rational form. (2) If any rational function be changed in form, becoming a rational function of y1, ... yn, a transformation of the group applied to its new form will leave its value unaltered. (3) Any homogeneous linear transformation leaving unaltered the value of every rational function of y1, ... yn which has a rational value, belongs to the group. It follows from these that any group of linear homogeneous transformations having the properties (1) (2) is identical with the group in question. It is clear that with these properties the group must be of the greatest importance in attempting to discover what functions of x must be regarded as rational in order that the values of y1 ... yn may be expressed. And this is the problem of solving the equation from another point of view.
LITERATURE.--([alpha]) _Formal or Transformation Theories for Equations of the First Order_:--E. Goursat, _Leçons sur l'intégration des équations aux dérivées partielles du premier ordre_ (Paris, 1891); E. v. Weber, _Vorlesungen über das Pfaff'sche Problem und die Theorie der partiellen Differentialgleichungen erster Ordnung_ (Leipzig, 1900); S. Lie und G. Scheffers, _Geometrie der Berührungstransformationen_, Bd. i. (Leipzig, 1896); Forsyth, _Theory of Differential Equations, Part i., Exact Equations and Pfaff's Problem_ (Cambridge, 1890); S. Lie, "Allgemeine Untersuchungen über Differentialgleichungen, die eine continuirliche endliche Gruppe gestatten" (Memoir), _Mathem. Annal._ xxv. (1885), pp. 71-151; S. Lie und G. Scheffers, _Vorlesungen über Differentialgleichungen mit bekannten infinitesimalen Transformationen_ (Leipzig, 1891). A very full bibliography is given in the book of E. v. Weber referred to; those here named are perhaps sufficiently representative of modern works. Of classical works may be named: Jacobi, _Vorlesungen über Dynamik_ (von A. Clebsch, Berlin, 1866); _Werke, Supplementband_; G. Monge, _Application de l'analyse à la géométrie_ (par M. Liouville, Paris, 1850); J. L. Lagrange, _Leçons sur le calcul des fonctions_ (Paris, 1806), and _Théorie des fonctions analytiques_ (Paris, Prairial, an V); G. Boole, _A Treatise on Differential Equations_ (London, 1859); and _Supplementary Volume_ (London, 1865); Darboux, _Leçons sur la théorie générale des surfaces_, tt. i.-iv. (Paris, 1887-1896); S. Lie, _Theorie der Transformationsgruppen_ ii. (on Contact Transformations) (Leipzig, 1890).
([beta]) _Quantitative or Function Theories for Linear Equations_:--C. Jordan, _Cours d'analyse_, t. iii. (Paris, 1896); E. Picard, _Traité d'analyse_, tt. ii. and iii. (Paris, 1893, 1896); Fuchs, _Various Memoirs, beginning with that in Crelle's Journal_, Bd. lxvi. p. 121; Riemann, _Werke_, 2^r Aufl. (1892); Schlesinger, _Handbuch der Theorie der linearen Differentialgleichungen_, Bde. i.-ii. (Leipzig, 1895-1898); Heffter, _Einleitung in die Theorie der linearen Differentialgleichungen mit einer unabhängigen Variablen_ (Leipzig, 1894); Klein, _Vorlesungen über lineare Differentialgleichungen der zweiten Ordnung_ (Autographed, Göttingen, 1894); and _Vorlesungen über die hypergeometrische Function_ (Autographed, Göttingen, 1894); Forsyth, _Theory of Differential Equations, Linear Equations_.
([gamma]) _Rationality Group (of Linear Differential Equations)_:--Picard, _Traité d'Analyse_, as above, t. iii.; Vessiot, _Annales de l'École Normale_, série III. t. ix. p. 199 (Memoir); S. Lie, _Transformationsgruppen_, as above, iii. A connected account is given in Schlesinger, as above, Bd. ii., erstes Theil.
([delta]) _Function Theories of Non-Linear Ordinary Equations_:--Painlevé, _Leçons sur la théorie analytique des équations différentielles_ (Paris, 1897, Autographed); Forsyth, _Theory of Differential Equations, Part ii., Ordinary Equations not Linear_ (two volumes, ii. and iii.) (Cambridge, 1900); Königsberger, _Lehrbuch der Theorie der Differentialgleichungen_ (Leipzig, 1889); Painlevé, _Leçons sur l'intégration des équations différentielles de la mécanique et applications_ (Paris, 1895).
([epsilon]) _Formal Theories of Partial Equations of the Second and Higher Orders_:--E. Goursat, _Leçons sur l'intégration des équations aux dérivées partielles du second ordre_, tt. i. and ii. (Paris, 1896, 1898); Forsyth, _Treatise on Differential Equations_ (London, 1889); and _Phil. Trans. Roy. Soc._ (A.), vol. cxci. (1898), pp. 1-86.
([zeta]) See also the six extensive articles in the second volume of the German _Encyclopaedia of Mathematics_. (H. F. BA.)
DIFFLUGIA (L. Leclerc), a genus of lobose Rhizopoda, characterized by a shell formed of sand granules cemented together; these are swallowed by the animal, and during the process of bud-fission they pass to the surface of the daughter-bud and are cemented there. _Centropyxis_ (Stein) and _Lecqueureuxia_ (Schlumberger) differ only in minor points.
DIFFRACTION OF LIGHT.--1. When light proceeding from a small source falls upon an opaque object, a shadow is cast upon a screen situated behind the obstacle, and this shadow is found to be bordered by alternations of brightness and darkness, known as "diffraction bands." The phenomena thus presented were described by Grimaldi and by Newton. Subsequently T. Young showed that in their formation interference plays an important part, but the complete explanation was reserved for A. J. Fresnel. Later investigations by Fraunhofer, Airy and others have greatly widened the field, and under the head of "diffraction" are now usually treated all the effects dependent upon the limitation of a beam of light, as well as those which arise from irregularities of any kind at surfaces through which it is transmitted, or at which it is reflected.
2. _Shadows._--In the infancy of the undulatory theory the objection most frequently urged against it was the difficulty of explaining the very existence of shadows. Thanks to Fresnel and his followers, this department of optics is now precisely the one in which the theory has gained its greatest triumphs. The principle employed in these investigations is due to C. Huygens, and may be thus formulated. If round the origin of waves an ideal closed surface be drawn, the whole action of the waves in the region beyond may be regarded as due to the motion continually propagated across the various elements of this surface. The wave motion due to any element of the surface is called a _secondary_ wave, and in estimating the total effect regard must be paid to the phases as well as the amplitudes of the components. It is usually convenient to choose as the surface of resolution a _wave-front_, i.e. a surface at which the primary vibrations are in one phase. Any obscurity that may hang over Huygens's principle is due mainly to the indefiniteness of thought and expression which we must be content to put up with if we wish to avoid pledging ourselves as to the character of the vibrations. In the application to sound, where we know what we are dealing with, the matter is simple enough in principle, although mathematical difficulties would often stand in the way of the calculations we might wish to make. The ideal surface of resolution may be there regarded as a flexible lamina; and we know that, if by forces locally applied every element of the lamina be made to move normally to itself exactly as the air at that place does, the external aerial motion is fully determined. By the principle of superposition the whole effect may be found by integration of the partial effects due to each element of the surface, the other elements remaining at rest.
We will now consider in detail the important case in which uniform plane waves are resolved at a surface coincident with a wave-front (OQ). We imagine a wave-front divided into elementary rings or zones--often named after Huygens, but better after Fresnel--by spheres described round P (the point at which the aggregate effect is to be estimated), the first sphere, touching the plane at O, with a radius equal to PO, and the succeeding spheres with radii increasing at each step by ½[lambda]. There are thus marked out a series of circles, whose radii x are given by x² + r² = (r + ½n[lambda])², or x² = n[lambda]r nearly; so that the rings are at first of nearly equal area. Now the effect upon P of each element of the plane is proportional to its area; but it depends also upon the distance from P, and possibly upon the inclination of the secondary ray to the direction of vibration and to the wave-front.
FIG. 1.--Plane wave-front OQ; P is the point at which the aggregate effect is estimated, PO = r is perpendicular to the wave-front, and x is the distance OQ.
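The zone construction can be checked numerically. The minimal sketch below computes the exact boundary radii from x² + r² = (r + ½n[lambda])², rewritten as x² = n[lambda](r + ¼n[lambda]) to avoid cancellation, and compares them with the approximation x² = n[lambda]r. The function names and the sample wave-length and distance are illustrative assumptions, not taken from the text.

```python
import math

def zone_radius_sq(n, wavelength, r):
    """Exact squared radius of the n-th zone boundary:
    x^2 = (r + n*lam/2)^2 - r^2 = n*lam*(r + n*lam/4)."""
    return n * wavelength * (r + n * wavelength / 4)

def zone_areas(n_zones, wavelength, r):
    """Areas of the first n_zones annular Fresnel zones."""
    return [math.pi * (zone_radius_sq(n + 1, wavelength, r) - zone_radius_sq(n, wavelength, r))
            for n in range(n_zones)]

# Assumed sample values: wave-length 5.89e-5 cm, P at r = 100 cm from the wave-front.
lam, r = 5.89e-5, 100.0
for n in (1, 2, 10):
    exact = math.sqrt(zone_radius_sq(n, lam, r))
    approx = math.sqrt(n * lam * r)
    print(f"zone {n}: exact x = {exact:.6f} cm, sqrt(n lam r) = {approx:.6f} cm")

areas = zone_areas(5, lam, r)
print("first zone area / (pi lam r):", areas[0] / (math.pi * lam * r))
print("fifth zone area / first zone area:", areas[4] / areas[0])
```

The zone areas differ from [pi][lambda]r only by terms of order [lambda]², which is why the rings are "at first of nearly equal area".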
The latter question can only be treated in connexion with the dynamical theory (see below, § 11); but under all ordinary circumstances the result is independent of the precise answer that may be given. All that it is necessary to assume is that the effects of the successive zones gradually diminish, whether from the increasing obliquity of the secondary ray or because (on account of the limitation of the region of integration) the zones become at last more and more incomplete. The component vibrations at P due to the successive zones are thus nearly equal in amplitude and opposite in phase (the phase of each corresponding to that of the infinitesimal circle midway between the boundaries), and the series which we have to sum is one in which the terms are alternately opposite in sign and, while at first nearly constant in numerical magnitude, gradually diminish to zero. In such a series each term may be regarded as very nearly indeed destroyed by the halves of its immediate neighbours, and thus the sum of the whole series is represented by half the first term, which stands over uncompensated. The question is thus reduced to that of finding the effect of the first zone, or central circle, of which the area is [pi][lambda]r.
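The claim that such a series sums to half its first term can be tried on a model series. In this sketch the zone magnitudes fall off as e^(-dn); that law is a hypothetical stand-in, since the text deliberately leaves the obliquity law open. For any slow, smooth decay the total should stay close to ½.

```python
import math

def zone_series_sum(decay, n_terms):
    """Sum of (-1)^n * exp(-decay * n): alternate zones oppose each other,
    and the magnitudes (an assumed law) diminish slowly to zero."""
    return sum((-1) ** n * math.exp(-decay * n) for n in range(n_terms))

for decay in (0.1, 0.01, 0.001):
    n_terms = int(40 / decay)            # run on until the terms are negligible
    total = zone_series_sum(decay, n_terms)
    print(f"decay {decay}: sum = {total:.5f}, half the first term = 0.5")
```

For this particular model the exact sum is 1/(1 + e^(-d)), which tends to ½ as the decay becomes slower.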
We have seen that the problem before us is independent of the law of the secondary wave as regards obliquity; but the result of the integration necessarily involves the law of the intensity and phase of a secondary wave as a function of r, the distance from the origin. And we may in fact, as was done by A. Smith (_Camb. Math. Journ._, 1843, 3, p. 46), determine the law of the secondary wave, by comparing the result of the integration with that obtained by supposing the primary wave to pass on to P without resolution.
Now as to the phase of the secondary wave, it might appear natural to suppose that it starts from any point Q with the phase of the primary wave, so that on arrival at P, it is retarded by the amount corresponding to QP. But a little consideration will prove that in that case the series of secondary waves could not reconstitute the primary wave. For the aggregate effect of the secondary waves is the half of that of the first Fresnel zone, and it is the central element only of that zone for which the distance to be travelled is equal to r. Let us conceive the zone in question to be divided into infinitesimal rings of equal area. The effects due to each of these rings are equal in amplitude and of phase ranging uniformly over half a complete period. The phase of the resultant is midway between those of the extreme elements, that is to say, a quarter of a period behind that due to the element at the centre of the circle. It is accordingly necessary to suppose that the secondary waves start with a phase one-quarter of a period in advance of that of the primary wave at the surface of resolution.
Further, it is evident that account must be taken of the variation of phase in estimating the magnitude of the effect at P of the first zone. The middle element alone contributes without deduction; the effect of every other must be found by introduction of a resolving factor, equal to cos [theta], if [theta] represent the difference of phase between this element and the resultant. Accordingly, the amplitude of the resultant will be less than if all its components had the same phase, in the ratio
$$\int_{-\frac12\pi}^{+\frac12\pi}\cos\theta\,d\theta : \pi,$$
or 2 : [pi]. Now 2 × area/[pi] = 2[lambda]r; so that, in order to reconcile the amplitude of the primary wave (taken as unity) with the half effect of the first zone, the amplitude, at distance r, of the secondary wave emitted from the element of area dS must be taken to be
$$\frac{dS}{\lambda r} \qquad (1).$$
By this expression, in conjunction with the quarter-period acceleration of phase, the law of the secondary wave is determined.
That the amplitude of the secondary wave should vary as r^-1 was to be expected from considerations respecting energy; but the occurrence of the factor [lambda]^-1, and the acceleration of phase, have sometimes been regarded as mysterious. It may be well therefore to remember that precisely these laws apply to a secondary wave of sound, which can be investigated upon the strictest mechanical principles.
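The factor 2 : [pi] and the quarter-period lag can both be confirmed by adding up the first zone ring by ring. The minimal sketch below assumes the zone is cut into equal-area rings, each given the amplitude dS/[lambda]r and retarded only by its extra path (no quarter-period advance yet); the wave-length and distance are arbitrary sample values.

```python
import cmath
import math

def first_zone_resultant(wavelength, r, n_rings=10_000):
    """Phasor sum over the first Fresnel zone, cut into equal-area rings whose
    extra path to P runs uniformly from 0 to lambda/2. Each ring carries
    amplitude dS/(lambda*r) and is retarded by its extra path only."""
    zone_area = math.pi * wavelength * r
    dS = zone_area / n_rings
    total = 0j
    for j in range(n_rings):
        extra_path = (j + 0.5) / n_rings * 0.5 * wavelength   # midpoint of ring j
        phase = -2 * math.pi * extra_path / wavelength         # retardation
        total += dS / (wavelength * r) * cmath.exp(1j * phase)
    return total

z = first_zone_resultant(wavelength=5.0e-5, r=100.0)
sum_of_moduli = math.pi          # (pi*lambda*r) / (lambda*r): all rings in phase
print("resultant / sum of moduli:", abs(z) / sum_of_moduli, " 2/pi =", 2 / math.pi)
print("phase of resultant, in periods:", cmath.phase(z) / (2 * math.pi))   # about -0.25
print("half the first zone:", abs(z) / 2)                                  # about 1
```

Half the first zone comes out at unit amplitude but a quarter-period behind, which is exactly what the assumed quarter-period advance of the secondary waves has to cancel.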
The recomposition of the secondary waves may also be treated analytically. If the primary wave at O be cos kat, the effect of the secondary wave proceeding from the element dS at Q is
$$\frac{dS}{\lambda\rho}\cos k\left(at - \rho + \tfrac14\lambda\right) = -\frac{dS}{\lambda\rho}\sin k(at - \rho).$$
If dS = 2[pi]xdx, we have for the whole effect
$$-\frac{2\pi}{\lambda}\int_0^{\infty}\frac{\sin k(at - \rho)\,x\,dx}{\rho},$$
or, since xdx = [rho]d[rho], k = 2[pi]/[lambda],
$$-k\int_r^{\infty}\sin k(at - \rho)\,d\rho = \Big[-\cos k(at - \rho)\Big]_r^{\infty}.$$
In order to obtain the effect of the primary wave, as retarded by traversing the distance r, viz. cos k(at - r), it is necessary to suppose that the integrated term vanishes at the upper limit. And it is important to notice that without some further understanding the integral is really ambiguous. According to the assumed law of the secondary wave, the result must actually depend upon the precise radius of the outer boundary of the region of integration, supposed to be exactly circular. This case is, however, at most very special and exceptional. We may usually suppose that a large number of the outer rings are incomplete, so that the integrated term at the upper limit may properly be taken to vanish. If a formal proof be desired, it may be obtained by introducing into the integral a factor such as e^-h[rho], in which h is ultimately made to diminish without limit.
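The convergence factor can be tried numerically. The sketch below applies it in the form e^(-h(ρ - r)), which differs from e^(-hρ) only by the constant e^(-hr) and so has the same limit, integrates by Simpson's rule, and compares the result with cos k(at - r) as h is reduced. For this particular factor the integral also has a closed form, (k² cos ψ - hk sin ψ)/(h² + k²) with ψ = k(at - r), which the numbers should match. The values of k and ψ are arbitrary assumptions.

```python
import math

def damped_integral(k, psi, h, steps_per_unit=40):
    """-k * integral over rho from r to infinity of exp(-h(rho - r)) sin k(at - rho),
    written with s = rho - r and psi = k(at - r), so the integrand is
    exp(-h s) sin(psi - k s). Simpson's rule, truncated where exp(-h s) ~ e^-40."""
    upper = 40.0 / h
    n = int(upper * steps_per_unit)
    n += n % 2                                   # Simpson's rule needs an even count
    ds = upper / n
    total = 0.0
    for j in range(n + 1):
        s = j * ds
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * math.exp(-h * s) * math.sin(psi - k * s)
    return -k * total * ds / 3

def closed_form(k, psi, h):
    """Exact value of the same damped integral."""
    return (k * k * math.cos(psi) - h * k * math.sin(psi)) / (h * h + k * k)

k, psi = 2 * math.pi, 0.7          # wave-length 1 in the units of s; psi is arbitrary
for h in (0.2, 0.05, 0.02):
    print(f"h = {h}: numeric {damped_integral(k, psi, h):.6f}, "
          f"closed form {closed_form(k, psi, h):.6f}, cos psi {math.cos(psi):.6f}")
```

As h diminishes, the damped integral approaches cos k(at - r), the primary wave retarded by the distance r.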
When the primary wave is plane, the area of the first Fresnel zone is [pi][lambda]r, and, since the secondary waves vary as r^-1, the intensity is independent of r, as of course it should be. If, however, the primary wave be spherical, and of radius a at the wave-front of resolution, then we know that at a distance r further on the amplitude of the primary wave will be diminished in the ratio a:(r + a). This may be regarded as a consequence of the altered area of the first Fresnel zone. For, if x be its radius, we have
$$\sqrt{(r + \tfrac12\lambda)^2 - x^2} + \sqrt{a^2 - x^2} = r + a,$$
so that
x² = [lambda]ar/(a + r) nearly.
Since the distance to be travelled by the secondary waves is still r, we see how the effect of the first zone, and therefore of the whole series, is proportional to a/(a + r). In like manner may be treated other cases, such as that of a primary wave-front of unequal principal curvatures.
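The approximation x² = [lambda]ar/(a + r) and the factor a/(a + r) can be checked by solving the exact zone condition numerically. In this sketch the distances and wave-length are assumed sample values, and the bisection bracket assumes the zone radius is far smaller than a and r, as it is in any optical case.

```python
import math

def first_zone_radius_sq(a, r, wavelength, iterations=200):
    """Solve sqrt((r + lam/2)^2 - x^2) + sqrt(a^2 - x^2) = r + a for x^2 by bisection.
    The left side decreases steadily as x grows, and exceeds r + a at x = 0."""
    def excess(x):
        return (math.sqrt((r + 0.5 * wavelength) ** 2 - x * x)
                + math.sqrt(a * a - x * x) - (r + a))
    lo, hi = 0.0, 10 * math.sqrt(wavelength * a * r / (a + r))   # assumes hi << a, r
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x * x

a, r, lam = 50.0, 100.0, 5.0e-5          # cm, assumed for illustration
x2 = first_zone_radius_sq(a, r, lam)
print("exact x^2:", x2, " approx lam*a*r/(a+r):", lam * a * r / (a + r))
# first-zone effect relative to a plane wave: (pi x^2 / r) / (pi lam r / r)
print("amplitude ratio:", x2 / (lam * r), " a/(a+r):", a / (a + r))
```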
The general explanation of the formation of shadows may also be conveniently based upon Fresnel's zones. If the point under consideration be so far away from the geometrical shadow that a large number of the earlier zones are complete, then the illumination, determined sensibly by the first zone, is the same as if there were no obstruction at all. If, on the other hand, the point be well immersed in the geometrical shadow, the earlier zones are altogether missing, and, instead of a series of terms beginning with finite numerical magnitude and gradually diminishing to zero, we have now to deal with one of which the terms diminish to zero _at both ends_. The sum of such a series is very approximately zero, each term being neutralized by the halves of its immediate neighbours, which are of the opposite sign. The question of light or darkness then depends upon whether the series begins or ends abruptly. With few exceptions, abruptness can occur only in the presence of the first term, viz. when the secondary wave of least retardation is unobstructed, or when a _ray_ passes through the point under consideration. According to the undulatory theory the light cannot be regarded strictly as travelling along a ray; but the existence of an unobstructed ray implies that the system of Fresnel's zones can be commenced, and, if a large number of these zones are fully developed and do not terminate abruptly, the illumination is unaffected by the neighbourhood of obstacles. Intermediate cases in which a few zones only are formed belong especially to the province of diffraction.
An interesting exception to the general rule that full brightness requires the existence of the first zone occurs when the obstacle assumes the form of a small circular disk parallel to the plane of the incident waves. In the earlier half of the 18th century R. Delisle found that the centre of the circular shadow was occupied by a bright point of light, but the observation passed into oblivion until S. D. Poisson brought forward as an objection to Fresnel's theory that it required at the centre of a circular shadow a point as bright as if no obstacle were intervening. If we conceive the primary wave to be broken up at the plane of the disk, a system of Fresnel's zones can be constructed which begin from the circumference; and the first zone external to the disk plays the part ordinarily taken by the centre of the entire system. The whole effect is the half of that of the first existing zone, and this is sensibly the same as if there were no obstruction.
When light passes through a small circular or annular aperture, the illumination at any point along the axis depends upon the precise relation between the aperture and the distance from it at which the point is taken. If, as in the last paragraph, we imagine a system of zones to be drawn commencing from the inner circular boundary of the aperture, the question turns upon the manner in which the series terminates at the outer boundary. If the aperture be such as to fit exactly an integral number of zones, the aggregate effect may be regarded as half the sum of those due to the first and last zones. If the number of zones be even, the actions of the first and last zones are antagonistic, and there is complete darkness at the point. If on the other hand the number of zones be odd, the effects conspire, and the illumination (proportional to the square of the amplitude) is four times as great as if there were no obstruction at all.
The process of augmenting the resultant illumination at a particular point by stopping some of the secondary rays may be carried much further (Soret, _Pogg. Ann._, 1875, 156, p. 99). By the aid of photography it is easy to prepare a plate, transparent where the zones of odd order fall, and opaque where those of even order fall. Such a plate has the power of a condensing lens, and gives an illumination out of all proportion to what could be obtained without it. An even greater effect (fourfold) can be attained by providing that the stoppage of the light from the alternate zones is replaced by a phase-reversal without loss of amplitude. R. W. Wood (_Phil. Mag._, 1898, 45, p. 513) has succeeded in constructing zone plates upon this principle.
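A rough numerical picture of the gain from a zone plate: the sketch below treats each zone as an annulus whose on-axis contribution, from the integral of § 2 taken between two finite radii, is cos k(at - ρ_in) - cos k(at - ρ_out), written here in complex form with the time factor dropped and with any obliquity factor neglected. The unobstructed wave then has amplitude 1, so the printed values are intensities relative to no obstruction. The wave-length, distance and number of zones are assumptions.

```python
import cmath
import math

def axial_intensity(zone_transmissions, r, wavelength):
    """On-axis intensity behind a plate built of Fresnel zones, relative to the
    unobstructed wave (whose amplitude is 1). zone_transmissions[n] multiplies
    zone n: 0 for opaque, 1 for clear, -1 for clear with the phase reversed."""
    k = 2 * math.pi / wavelength

    def path(n):
        # distance from P to the n-th zone boundary, with x^2 = n*lam*(r + n*lam/4)
        return math.sqrt(r * r + n * wavelength * (r + n * wavelength / 4))

    amplitude = 0j
    for n, t in enumerate(zone_transmissions):
        # annulus n contributes exp(-ik rho_in) - exp(-ik rho_out), time factor dropped
        amplitude += t * (cmath.exp(-1j * k * path(n)) - cmath.exp(-1j * k * path(n + 1)))
    return abs(amplitude) ** 2

lam, r, m = 5.0e-5, 100.0, 10          # assumed: 10 clear zones among the first 20
open_odd = [1 if n % 2 == 0 else 0 for n in range(2 * m)]         # 1st, 3rd, 5th ... clear
phase_reversed = [1 if n % 2 == 0 else -1 for n in range(2 * m)]
print("zone plate:", axial_intensity(open_odd, r, lam))                   # close to 4 m^2
print("phase-reversal plate:", axial_intensity(phase_reversed, r, lam))   # close to 16 m^2
```

Each zone alone gives amplitude 2, so m clear zones give intensity about 4m², and reversing the phase of the other m zones doubles the amplitude, giving the fourfold gain.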
In such experiments the narrowness of the zones renders necessary a pretty close approximation to the geometrical conditions. Thus in the case of the circular disk, equidistant (r) from the source of light and from the screen upon which the shadow is observed, the width of the first exterior zone is given by
$$dx = \frac{\lambda\,(2r)}{4\,(2x)},$$
2x being the diameter of the disk. If 2r = 1000 cm., 2x = 1 cm., [lambda] = 6 × 10^-5 cm., then dx = .015 cm. Hence, in order that this zone may be perfectly formed, there should be no error in the circumference of the order of .001 cm. (It is easy to see that the radius of the bright spot is of the same order of magnitude.) The experiment succeeds in a dark room of the length above mentioned, with a threepenny bit (supported by three threads) as obstacle, the origin of light being a small needle hole in a plate of tin, through which the sun's rays shine horizontally after reflection from an external mirror. In the absence of a heliostat it is more convenient to obtain a point of light with the aid of a lens of short focus.
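The arithmetic of this example can be reproduced directly. The sketch below evaluates dx = [lambda](2r)/4(2x) for the stated values and, as a check on the linear approximation, also takes the exact increase of radius when one more zone is added, using the rule that each zone adds [lambda]ar/(a + r) to x², with a = r for a disk equidistant from source and screen. Names are illustrative.

```python
import math

def first_exterior_zone_width(two_r, two_x, wavelength):
    """dx = lambda * (2r) / (4 * (2x)); all lengths in centimetres."""
    return wavelength * two_r / (4 * two_x)

two_r, two_x, lam = 1000.0, 1.0, 6e-5
dx = first_exterior_zone_width(two_r, two_x, lam)

# Check: with source and screen both at distance r, each zone adds lam*r/2 to x^2.
r, x = two_r / 2, two_x / 2
exact_dx = math.sqrt(x * x + lam * r / 2) - x
print(f"dx = {dx:.4f} cm; exact increment of radius = {exact_dx:.4f} cm")
```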
The amplitude of the light at any point in the axis, when plane waves are incident perpendicularly upon an annular aperture, is, as above,
$$\cos k(at - r_1) - \cos k(at - r_2) = 2\sin k\left(at - \tfrac12 r_1 - \tfrac12 r_2\right)\sin \tfrac12 k(r_1 - r_2),$$
r2, r1 being the distances of the outer and inner boundaries from the point in question. It is scarcely necessary to remark that in all such cases the calculation applies in the first instance to homogeneous light, and that, in accordance with Fourier's theorem, each homogeneous component of a mixture may be treated separately. When the original light is white, the presence of some components and the absence of others will usually give rise to coloured effects, variable with the precise circumstances of the case.
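Both the identity and its consequence for a circular hole can be checked numerically. In the sketch below the intensity relative to the unobstructed wave is 4 sin² ½k(r2 - r1), the square of the amplitude factor 2 sin ½k(r1 - r2); a circular hole holding exactly n zones has r2 - r1 = ½n[lambda], hence intensity 4 for n odd and 0 for n even. The wave-length, distance and the constants used in the identity check are assumed sample values.

```python
import math

def check_identity(k, a, t, r1, r2):
    """Difference between the two sides of
    cos k(at - r1) - cos k(at - r2) = 2 sin k(at - r1/2 - r2/2) sin (k(r1 - r2)/2)."""
    lhs = math.cos(k * (a * t - r1)) - math.cos(k * (a * t - r2))
    rhs = 2 * math.sin(k * (a * t - 0.5 * r1 - 0.5 * r2)) * math.sin(0.5 * k * (r1 - r2))
    return lhs - rhs

def annulus_axial_intensity(x_inner, x_outer, r0, wavelength):
    """On-axis intensity behind an annulus (unobstructed wave = 1): the square of
    the amplitude 2 sin(k(r1 - r2)/2), i.e. 4 sin^2(k(r2 - r1)/2)."""
    k = 2 * math.pi / wavelength
    r1 = math.hypot(r0, x_inner)      # distance to the inner boundary
    r2 = math.hypot(r0, x_outer)      # distance to the outer boundary
    return 4 * math.sin(0.5 * k * (r2 - r1)) ** 2

lam, r0 = 5.0e-5, 100.0               # cm, assumed sample values
for n in range(1, 7):
    x_n = math.sqrt(n * lam * (r0 + n * lam / 4))   # radius holding exactly n zones
    print(n, round(annulus_axial_intensity(0.0, x_n, r0, lam), 6))
```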
Although the matter can be fully treated only upon the basis of a dynamical theory, it is proper to point out at once that there is an element of assumption in the application of Huygens's principle to the calculation of the effects produced by opaque screens of limited extent. Properly applied, the principle could not fail; but, as may readily be proved in the case of sonorous waves, it is not in strictness sufficient to assume the expression for a secondary wave suitable when the primary wave is undisturbed, with mere limitation of the integration to the transparent parts of the screen. But, except perhaps in the case of very fine gratings, it is probable that the error thus caused is insignificant; for the incorrect estimation of the secondary waves will be limited to distances of a few wave-lengths only from the boundary of opaque and transparent parts.
3. _Fraunhofer's Diffraction Phenomena._--A very general problem in diffraction is the investigation of the distribution of light over a screen upon which impinge divergent or convergent spherical waves after passage through various diffracting apertures. When the waves are convergent and the recipient screen is placed so as to contain the centre of convergency (the image of the original radiant point), the calculation assumes a less complicated form. This class of phenomena was investigated by J. von Fraunhofer (upon principles laid down by Fresnel), and is sometimes called after his name. We may conveniently commence with them on account of their simplicity and their great importance in the theory of optical instruments.
If f be the radius of the spherical wave at the place of resolution, where the vibration is represented by cos kat, then at any point M (fig. 2) in the recipient screen the vibration due to an element dS of the wave-front is (§ 2)
$$-\frac{dS}{\lambda\rho}\sin k(at - \rho),$$
[rho] being the distance between M and the element dS.
Taking co-ordinates in the plane of the screen with the centre of the wave as origin, let us represent M by [xi], [eta], and P (where dS is situated) by x, y, z. Then
[rho]² = (x - [xi])² + (y - [eta])² + z², f² = x² + y² + z²;
so that
[rho]² = f² - 2x[xi] - 2y[eta] + [xi]² + [eta]².
In the applications with which we are concerned, [xi], [eta] are very small quantities; and we may take
$$\rho = f\left(1 - \frac{x\xi + y\eta}{f^2}\right).$$
At the same time dS may be identified with dxdy, and in the denominator [rho] may be treated as constant and equal to f. Thus the expression for the vibration at M becomes
_ _ 1 / / / x[xi] + y[eta]\ - ------------- | | sin k ( at - f + -------------- ) dxdy (1); [lambda]²[f]² _/_/ \ f /
and for the intensity, represented by the square of the amplitude,
$$I^2 = \frac{1}{\lambda^2 f^2}\left[\iint \sin k\,\frac{x\xi + y\eta}{f}\,dx\,dy\right]^2 + \frac{1}{\lambda^2 f^2}\left[\iint \cos k\,\frac{x\xi + y\eta}{f}\,dx\,dy\right]^2 \qquad (2).$$
This expression for the intensity becomes rigorously applicable when f is indefinitely great, so that ordinary optical aberration disappears. The incident waves are thus plane, and are limited to a plane aperture coincident with a wave-front. The integrals are then properly functions of the _direction_ in which the light is to be estimated.
In experiment under ordinary circumstances it makes no difference whether the collecting lens is in front of or behind the diffracting aperture. It is usually most convenient to employ a telescope focused upon the radiant point, and to place the diffracting apertures immediately in front of the object-glass. What is seen through the eye-piece in any case is the same as would be depicted upon a screen in the focal plane.
Before proceeding to special cases it may be well to call attention to some general properties of the solution expressed by (2) (see Bridge, _Phil. Mag._, 1858).
If, when the aperture is given, the wave-length (proportional to k^-1) varies, the composition of the integrals is unaltered, provided [xi] and [eta] are taken directly proportional to [lambda]. A diminution of [lambda] thus leads to a simple proportional shrinkage of the diffraction pattern, attended by an augmentation of brilliancy in proportion to [lambda]^-2.
If the wave-length remains unchanged, similar effects are produced by an increase in the scale of the aperture. The linear dimension of the diffraction pattern is inversely as that of the aperture, and the brightness at corresponding points is as the _square_ of the area of aperture.
If the aperture and wave-length increase in the same proportion, the size and shape of the diffraction pattern undergo no change.
We will now apply the integrals (2) to the case of a rectangular aperture of width a parallel to x and of width b parallel to y. The limits of integration for x may thus be taken to be -½a and +½a, and for y to be -½b, +½b. We readily find (with substitution for k of 2[pi]/[lambda])
$$I^2 = \frac{a^2 b^2}{f^2\lambda^2}\cdot\frac{\sin^2\dfrac{\pi a\xi}{f\lambda}}{\dfrac{\pi^2 a^2\xi^2}{f^2\lambda^2}}\cdot\frac{\sin^2\dfrac{\pi b\eta}{f\lambda}}{\dfrac{\pi^2 b^2\eta^2}{f^2\lambda^2}} \qquad (3),$$
as representing the distribution of light in the image of a mathematical point when the aperture is rectangular, as is often the case in spectroscopes.
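Equation (3) can be checked against a direct evaluation of the integrals (2). The sketch below uses the midpoint rule over the aperture; the aperture size, focal length, wave-length and the sample point ([xi], [eta]) are assumptions chosen for illustration.

```python
import math

def rect_intensity_closed(xi, eta, a, b, f, lam):
    """Equation (3): I^2 for an a x b rectangular aperture."""
    def sinc_sq(u):
        return 1.0 if u == 0 else (math.sin(u) / u) ** 2
    u = math.pi * a * xi / (f * lam)
    v = math.pi * b * eta / (f * lam)
    return (a * b / (f * lam)) ** 2 * sinc_sq(u) * sinc_sq(v)

def rect_intensity_numeric(xi, eta, a, b, f, lam, n=200):
    """Equation (2) by the midpoint rule on an n x n grid over the aperture."""
    k = 2 * math.pi / lam
    dx, dy = a / n, b / n
    s_sum = c_sum = 0.0
    for i in range(n):
        x = -0.5 * a + (i + 0.5) * dx
        for j in range(n):
            y = -0.5 * b + (j + 0.5) * dy
            phase = k * (x * xi + y * eta) / f
            s_sum += math.sin(phase)
            c_sum += math.cos(phase)
    S, C = s_sum * dx * dy, c_sum * dx * dy
    return (S * S + C * C) / (lam * f) ** 2

a, b, f, lam = 1.0, 0.5, 100.0, 5.0e-5       # cm; assumed sample values
xi, eta = 0.3 * f * lam / a, 0.4 * f * lam / b
print(rect_intensity_closed(xi, eta, a, b, f, lam), rect_intensity_numeric(xi, eta, a, b, f, lam))
```

Because the rectangle is symmetric about both axes, the sine integral vanishes and the cosine integral factorizes, which is what produces the product of two sin²u/u² factors.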
The second and third factors of (3) being each of the form sin²u/u², we have to examine the character of this function. It vanishes when u = m[pi], m being any whole number other than zero. When u = 0, it takes the value unity. The maxima occur when
u = tan u, (4),
and then
sin²u/u² = cos²u (5).
To calculate the roots of (4) we may assume
u = (m + ½)[pi] - y = U - y,
where y is a positive quantity which is small when u is large. Substituting this, we find cot y = U - y, whence
$$y = \frac{1}{U}\left(1 + \frac{y}{U} + \frac{y^2}{U^2} + \dots\right) - \frac{y^3}{3} - \frac{2y^5}{15} - \frac{17y^7}{315} - \dots$$
This equation is to be solved by successive approximation. It will readily be found that
$$u = U - y = U - U^{-1} - \tfrac{2}{3}U^{-3} - \tfrac{13}{15}U^{-5} - \tfrac{146}{105}U^{-7} - \dots \qquad (6).$$
In the first quadrant there is no root after zero, since tan u > u, and in the second quadrant there is none because the signs of u and tan u are opposite. The first root after zero is thus in the third quadrant, corresponding to m = 1. Even in this case the series converges sufficiently to give the value of the root with considerable accuracy, while for higher values of m it is all that could be desired. The actual values of u/[pi] (calculated in another manner by F. M. Schwerd) are 1.4303, 2.4590, 3.4709, 4.4747, 5.4818, 6.4844, &c.
Since the maxima occur when u = (m + ½)[pi] nearly, the successive values are not very different from
$$\frac{4}{9\pi^2},\quad \frac{4}{25\pi^2},\quad \frac{4}{49\pi^2},\ \&c.$$
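The series (6) and the quoted values of u/[pi] can be reproduced with a few lines of code. The sketch below refines each series value by Newton's method applied to sin u - u cos u, which vanishes where u = tan u, and compares the height of each subsidiary maximum, sin²u/u² = cos²u, with the rough estimate 4/((2m + 1)²[pi]²). Function names are illustrative.

```python
import math

def root_series(m):
    """Series (6) for the m-th positive root of u = tan u (m = 1, 2, ...)."""
    U = (m + 0.5) * math.pi
    return U - 1 / U - (2 / 3) / U**3 - (13 / 15) / U**5 - (146 / 105) / U**7

def root_newton(m, tol=1e-14):
    """Refine the series value by Newton's method on g(u) = sin u - u cos u,
    which vanishes exactly where u = tan u (and g'(u) = u sin u)."""
    u = root_series(m)
    for _ in range(50):
        step = (math.sin(u) - u * math.cos(u)) / (u * math.sin(u))
        u -= step
        if abs(step) < tol:
            break
    return u

print(" m  u/pi    series u/pi  sin^2u/u^2  4/((2m+1)^2 pi^2)")
for m in range(1, 7):
    u = root_newton(m)
    print(f"{m:2d}  {u / math.pi:.4f}  {root_series(m) / math.pi:.4f}  "
          f"{(math.sin(u) / u) ** 2:.5f}  {4 / ((2 * m + 1) * math.pi) ** 2:.5f}")
```

Even for m = 1 the series agrees with the Newton-refined root to about six figures.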
The application of these results to (3) shows that the field is brightest at the centre [xi] = 0, [eta] = 0, viz. at the geometrical image of the radiant point. It is traversed by dark lines whose equations are
[xi] = mf[lambda]/a, [eta] = mf[lambda]/b.
Within the rectangle formed by pairs of consecutive dark lines, and not far from its centre, the brightness rises to a maximum; but these subsequent maxima are in all cases much inferior to the brightness at the centre of the entire pattern ([xi] = 0, [eta] = 0).
By the principle of energy the illumination over the entire focal plane must be equal to that over the diffracting area; and thus, in accordance with the suppositions by which (3) was obtained, its value when integrated from [xi] = -[oo] to [xi] = +[oo], and from [eta] = -[oo] to [eta] = +[oo], should be equal to ab. This integration, employed originally by P. Kelland (_Edin. Trans._ 15, p. 315) to determine the absolute intensity of a secondary wave, may be at once effected by means of the known formula
$$\int_{-\infty}^{+\infty}\frac{\sin^2 u}{u^2}\,du = \int_{-\infty}^{+\infty}\frac{\sin u}{u}\,du = \pi.$$
It will be observed that, while the total intensity is proportional to ab, the intensity at the focal point is proportional to a²b². If the aperture be increased, not only is the total brightness over the focal plane increased with it, but there is also a concentration of the diffraction pattern. The form of (3) shows immediately that, if a and b be altered, the co-ordinates of any characteristic point in the pattern vary as a^-1 and b^-1.
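The integral used here, and the resulting total ab, can be confirmed numerically. The sketch below applies Simpson's rule to sin²u/u² on [0, L] and adds the tail beyond L, which is close to 1/(2L) when L is a multiple of [pi]; the aperture, focal length and wave-length are assumed values.

```python
import math

def sinc_sq_integral(periods=200, steps_per_period=50):
    """Integral of sin^2 u / u^2 over the whole real line: Simpson's rule on
    [0, L] with L = periods * pi, plus the tail beyond L (about 1/(2L)), doubled."""
    L = periods * math.pi
    n = periods * steps_per_period          # even number of Simpson intervals
    h = L / n

    def f(u):
        return 1.0 if u == 0 else (math.sin(u) / u) ** 2

    s = f(0.0) + f(L)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(j * h)
    return 2 * (s * h / 3 + 1 / (2 * L))

J = sinc_sq_integral()
a, b, f, lam = 1.0, 0.5, 100.0, 5.0e-5      # assumed aperture, focal length, wave-length
# Substituting u = pi*a*xi/(f*lam), each factor of (3) integrates to (f*lam/(pi*a)) * J.
total = (a * b / (f * lam)) ** 2 * (f * lam / (math.pi * a)) * J * (f * lam / (math.pi * b)) * J
print("integral of sin^2 u / u^2:", J, " pi:", math.pi)
print("total intensity over focal plane:", total, " ab:", a * b)
```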
The contraction of the diffraction pattern with increase of aperture is of fundamental importance in connexion with the resolving power of optical instruments. According to common optics, where images are absolute, the diffraction pattern is supposed to be infinitely small, and two radiant points, however near together, form separated images. This is tantamount to an assumption that [lambda] is infinitely small. The actual finiteness of [lambda] imposes a limit upon the separating or resolving power of an optical instrument.
This indefiniteness of images is sometimes said to be due to diffraction by the edge of the aperture, and proposals have even been made for curing it by causing the transition between the interrupted and transmitted parts of the primary wave to be less abrupt. Such a view of the matter is altogether misleading. What requires explanation is not the imperfection of actual images so much as the possibility of their being as good as we find them.
At the focal point ([xi] = 0, [eta] = 0) all the secondary waves agree in phase, and the intensity is easily expressed, whatever be the form of the aperture. From the general formula (2), if A be the _area_ of aperture,
I0² = A²/[lambda]²f² (7).
The formation of a sharp image of the radiant point requires that the illumination become insignificant when [xi], [eta] attain small values, and this insignificance can only arise as a consequence of discrepancies of phase among the secondary waves from various parts of the aperture. So long as there is no sensible discrepancy of phase there can be no sensible diminution of brightness as compared with that to be found at the focal point itself. We may go further, and lay it down that there can be no considerable loss of brightness until the difference of phase of the waves proceeding from the nearest and farthest parts of the aperture amounts to ¼[lambda].
When the difference of phase amounts to [lambda], we may expect the resultant illumination to be very much reduced. In the particular case of a rectangular aperture the course of things can be readily followed, especially if we conceive f to be infinite. In the direction (suppose horizontal) for which [eta] = 0, [xi]/f = sin [theta], the phases of the secondary waves range over a complete period when sin [theta] = [lambda]/a, and, since all parts of the horizontal aperture are equally effective, there is in this direction a complete compensation and consequent absence of illumination. When sin [theta] = 3[lambda]/2a, the phases range over one and a half periods, and there is a revival of illumination. We may compare the brightness with that in the direction [theta] = 0. The phase of the resultant amplitude is the same as that due to the central secondary wave, and the discrepancies of phase among the components reduce the amplitude in the proportion
$$\frac{1}{3\pi}\int_{-\frac32\pi}^{+\frac32\pi}\cos\phi\,d\phi : 1,$$
or -2/(3[pi]) : 1; so that the brightness in this direction is 4/(9[pi]²) of the maximum at [theta] = 0. In like manner we may find the illumination in any other direction, and it is obvious that it vanishes when sin [theta] is any multiple of [lambda]/a.
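The reduction factor amounts to averaging cos [phi] over the range of phases present. The sketch below does that average by the midpoint rule for a phase range of one and a half periods (sin [theta] = 3[lambda]/2a) and of one period (sin [theta] = [lambda]/a); the function name is illustrative.

```python
import math

def mean_resolved_amplitude(phase_span, n=100_000):
    """Average of cos(phi) over phi in [-phase_span/2, +phase_span/2] by the
    midpoint rule: the amplitude relative to that in the direction theta = 0."""
    h = phase_span / n
    return sum(math.cos(-0.5 * phase_span + (j + 0.5) * h) for j in range(n)) / n

ratio = mean_resolved_amplitude(3 * math.pi)        # one and a half periods
print("amplitude ratio:", ratio, " -2/(3 pi):", -2 / (3 * math.pi))
print("brightness ratio:", ratio ** 2, " 4/(9 pi^2):", 4 / (9 * math.pi ** 2))
print("one full period:", mean_resolved_amplitude(2 * math.pi))   # about 0
```

In general the average over a span 2u equals sin u/u, the amplitude factor whose square appears in (3).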
The reason of the augmentation of resolving power with aperture will now be evident. The larger the aperture the smaller are the angles through which it is necessary to deviate from the principal direction in order to bring in specified discrepancies of phase--the more concentrated is the image.
In many cases the subject of examination is a luminous line of uniform intensity, the various points of which are to be treated as independent sources of light. If the image of the line be [xi] = 0, the intensity at any point [xi], [eta] of the diffraction pattern may be represented by
$$\int_{-\infty}^{+\infty} I^2\,d\eta = \frac{a^2 b}{\lambda f}\cdot\frac{\sin^2\dfrac{\pi a\xi}{\lambda f}}{\dfrac{\pi^2 a^2\xi^2}{\lambda^2 f^2}} \qquad (8),$$
the same law as obtains for a luminous point when horizontal directions are alone considered. The definition of a fine vertical line, and consequently the resolving power for contiguous vertical lines, is thus _independent of the vertical aperture of the instrument_, a law of great importance in the theory of the spectroscope.
The distribution of illumination in the image of a luminous line is shown by the curve ABC (fig. 3), representing the value of the function sin²u/u² from u = 0 to u = 2[pi]. The part corresponding to negative values of u is similar, OA being a line of symmetry.
Let us now consider the distribution of brightness in the image of a double line whose components are of equal strength, and at such an angular interval that the central line in the image of one coincides with the first zero of brightness in the image of the other. In fig. 3 the curve of brightness for one component is ABC, and for the other OA'C'; and the curve representing half the combined brightnesses is E'BE. The brightness (corresponding to B) midway between the two central points AA' is .8106 of the brightness at the central points themselves. We may consider this to be about the limit of closeness at which there could be any decided appearance of resolution, though doubtless an observer accustomed to his instrument would recognize the duplicity with certainty. The obliquity, corresponding to u = [pi], is such that the phases of the secondary waves range over a complete period, i.e. such that the projection of the horizontal aperture upon this direction is one wave-length. We conclude that a _double line cannot be fairly resolved unless its components subtend an angle exceeding that subtended by the wave-length of light at a distance equal to the horizontal aperture_. This rule is convenient on account of its simplicity; and it is sufficiently accurate in view of the necessary uncertainty as to what exactly is meant by resolution.
If the angular interval between the components of a double line be half as great again as that supposed in the figure, the brightness midway between is .1802 as against 1.0450 at the central lines of each image. Such a falling off in the middle must be more than sufficient for resolution. If the angle subtended by the components of a double line be twice that subtended by the wave-length at a distance equal to the horizontal aperture, the central bands are just clear of one another, and there is a line of absolute blackness in the middle of the combined images.
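The figures .8106 and 1.0450 follow from the function sin²u/u² alone. The sketch below adds the brightness curves of two equal, independent lines, as the text does, for separations u = [pi], 1.5[pi] and 2[pi], corresponding to the three cases just discussed; names are illustrative.

```python
import math

def line_image(u):
    """Brightness sin^2 u / u^2 across the image of a single luminous line."""
    return 1.0 if u == 0 else (math.sin(u) / u) ** 2

def double_line(separation):
    """For two equal lines whose central lines are `separation` apart (in u),
    return the combined brightness midway between them and at either centre."""
    midway = 2 * line_image(0.5 * separation)
    at_centre = line_image(0.0) + line_image(separation)
    return midway, at_centre

for factor in (1.0, 1.5, 2.0):
    midway, at_centre = double_line(factor * math.pi)
    print(f"separation {factor} pi: midway {midway:.4f}, at centres {at_centre:.4f}")
```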
The resolving power of a telescope with circular or rectangular aperture is easily investigated experimentally. The best object for examination is a grating of fine wires, about fifty to the inch, backed by a sodium flame. The object-glass is provided with diaphragms pierced with round holes or slits. One of these, of width equal, say, to one-tenth of an inch, is inserted in front of the object-glass, and the telescope, carefully focused all the while, is drawn gradually back from the grating until the lines are no longer seen. From a measurement of the maximum distance the least angle between consecutive lines consistent with resolution may be deduced, and a comparison made with the rule stated above.
Merely to show the dependence of resolving power on aperture it is not necessary to use a telescope at all. It is sufficient to look at wire gauze backed by the sky or by a flame, through a piece of blackened cardboard, pierced by a needle and held close to the eye. By varying the distance the point is easily found at which resolution ceases; and the observation is as sharp as with a telescope. The function of the telescope is in fact to allow the use of a wider, and therefore more easily measurable, aperture. An interesting modification of the experiment may be made by using light of various wave-lengths.
Since the limitation of the width of the central band in the image of a luminous line depends upon discrepancies of phase among the secondary waves, and since the discrepancy is greatest for the waves which come from the edges of the aperture, the question arises how far the operation of the central parts of the aperture is advantageous. If we imagine the aperture reduced to two equal narrow slits bordering its edges, compensation will evidently be complete when the projection on an oblique direction is equal to ½[lambda], instead of [lambda] as for the complete aperture. By this procedure the width of the central band in the diffraction pattern is halved, and so far an advantage is attained. But, as will be evident, the bright bands bordering the central band are now not inferior to it in brightness; in fact, a band similar to the central band is reproduced an indefinite number of times, so long as there is no sensible discrepancy of phase in the secondary waves proceeding from the various parts of the _same_ slit. Under these circumstances the narrowing of the band is paid for at a ruinous price, and the arrangement must be condemned altogether.
A more moderate suppression of the central parts is, however, sometimes advantageous. Theory and experiment alike prove that a double line, of which the components are equally strong, is better resolved when, for example, one-sixth of the horizontal aperture is blocked off by a central screen; or the rays quite at the centre may be allowed to pass, while others a little farther removed are blocked off. Stops, each occupying one-eighth of the width, and with centres situated at the points of trisection, answer well the required purpose.
It has already been suggested that the principle of energy requires that the general expression for I² in (2) when integrated over the whole of the plane [xi], [eta] should be equal to A, where A is the area of the aperture. A general analytical verification has been given by Sir G. G. Stokes (_Edin. Trans._, 1853, 20, p. 317). Analytically expressed--
$$\iint_{-\infty}^{+\infty} I^2\,d\xi\,d\eta = \iint dx\,dy = A. \tag{9}$$
We have seen that I0² (the intensity at the focal point) was equal to A²/[lambda]²f². If A' be the area over which the intensity must be I0² in order to give the actual total intensity in accordance with
$$A'I_0^2 = \iint_{-\infty}^{+\infty} I^2\,d\xi\,d\eta,$$
we have A'I0² = A by (9); and since I0² = A²/[lambda]²f², the relation between A and A' is AA' = [lambda]²f². As A' is in some sense the area of the diffraction pattern, it may be considered to be a rough criterion of the definition, and we infer that the definition of a point depends principally upon the area of the aperture, and only in a very secondary degree upon the shape when the area is maintained constant.
4. _Theory of Circular Aperture._--We will now consider the important case where the form of the aperture is circular.
Writing for brevity
$$k\xi/f = p, \qquad k\eta/f = q, \tag{1}$$
we have for the general expression (§ 11) of the intensity
$$\lambda^2 f^2 I^2 = S^2 + C^2, \tag{2}$$
where
$$S = \iint \sin(px + qy)\,dx\,dy, \tag{3}$$
$$C = \iint \cos(px + qy)\,dx\,dy. \tag{4}$$
When, as in the application to rectangular or circular apertures, the form is symmetrical with respect to the axes both of x and y, S = 0, and C reduces to
$$C = \iint \cos px\,\cos qy\,dx\,dy. \tag{5}$$
In the case of the circular aperture the distribution of light is of course symmetrical with respect to the focal point p = 0, q = 0; and C is a function of p and q only through [sqrt](p² + q²). It is thus sufficient to determine the intensity along the axis of p. Putting q = 0, we get
$$C = \iint \cos px\,dx\,dy = 2\int_{-R}^{+R} \cos px\,\sqrt{R^2 - x^2}\,dx,$$
R being the radius of the aperture. This integral can be expressed by means of Bessel's function of order unity, defined by
$$J_1(z) = \frac{z}{\pi}\int_0^{\pi} \cos(z\cos\phi)\,\sin^2\phi\,d\phi. \tag{6}$$
Thus, if x = R cos [phi],
$$C = \pi R^2\,\frac{2J_1(pR)}{pR}; \tag{7}$$
and the illumination at distance r from the focal point is
$$I^2 = \frac{\pi^2 R^4}{\lambda^2 f^2}\cdot\frac{4J_1^2\!\left(\dfrac{2\pi Rr}{f\lambda}\right)}{\left(\dfrac{2\pi Rr}{f\lambda}\right)^2}. \tag{8}$$
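The step from the integral for C to (7) can be checked numerically. In the sketch below, J1 is evaluated from the integral (6), and C is also computed directly as 2∫cos px [sqrt](R² - x²) dx. Both integrals use Simpson's rule, which is an implementation choice and not a method taken from the text. The direct integrand has square-root behaviour at x = ±R, so Simpson's rule converges slowly there and agreement is only to a few parts in 10^5.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def j1_poisson(z):
    """J1(z) from the integral (6)."""
    integrand = lambda phi: math.cos(z * math.cos(phi)) * math.sin(phi) ** 2
    return z / math.pi * simpson(integrand, 0.0, math.pi)


def c_direct(p, R):
    """C = 2 * integral of cos(px) sqrt(R^2 - x^2) from -R to R, done directly."""
    integrand = lambda x: math.cos(p * x) * math.sqrt(max(R * R - x * x, 0.0))
    return 2 * simpson(integrand, -R, R)


def c_closed(p, R):
    """Formula (7): C = pi R^2 * 2 J1(pR) / (pR)."""
    z = p * R
    return math.pi * R ** 2 * 2 * j1_poisson(z) / z


for p in (0.5, 1.0, 2.0, 3.0):
    print(f"p = {p}: direct {c_direct(p, 1.0):.5f}, formula (7) {c_closed(p, 1.0):.5f}")
```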
The ascending series for J1(z), used by Sir G. B. Airy (_Camb. Trans._, 1834) in his original investigation of the diffraction of a circular object-glass, and readily obtained from (6), is
$$J_1(z) = \frac{z}{2} - \frac{z^3}{2^2\cdot4} + \frac{z^5}{2^2\cdot4^2\cdot6} - \frac{z^7}{2^2\cdot4^2\cdot6^2\cdot8} + \cdots \tag{9}$$
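When z is great, we may employ the semi-convergent series
$$\begin{aligned}
J_1(z) = {} & \sqrt{\frac{2}{\pi z}}\,\sin\!\left(z - \tfrac14\pi\right)\left\{1 + \frac{3\cdot5\cdot1}{8\cdot16}\left(\frac{1}{z}\right)^2 - \frac{3\cdot5\cdot7\cdot9\cdot1\cdot3\cdot5}{8\cdot16\cdot24\cdot32}\left(\frac{1}{z}\right)^4 + \cdots\right\} \\
& + \sqrt{\frac{2}{\pi z}}\,\cos\!\left(z - \tfrac14\pi\right)\left\{\frac{3}{8}\cdot\frac{1}{z} - \frac{3\cdot5\cdot7\cdot1\cdot3}{8\cdot16\cdot24}\left(\frac{1}{z}\right)^3 + \frac{3\cdot5\cdot7\cdot9\cdot11\cdot1\cdot3\cdot5\cdot7}{8\cdot16\cdot24\cdot32\cdot40}\left(\frac{1}{z}\right)^5 - \cdots\right\}
\end{aligned} \tag{10}$$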
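The two series can be compared directly. This sketch sums the ascending series (9) until it has converged, and evaluates (10) truncated after the terms written out above. The truncation point is the only choice made here. For small z the truncated semi-convergent series is poor, but by z = 10 the two agree to better than one part in a million.

```python
import math


def j1_ascending(z, terms=60):
    """Series (9): sum of (-1)^m (z/2)^(2m+1) / (m! (m+1)!)."""
    term = z / 2
    total = term
    for m in range(1, terms):
        term *= -(z / 2) ** 2 / (m * (m + 1))
        total += term
    return total


def j1_semiconvergent(z):
    """Series (10), truncated after the terms written out in the text."""
    s = math.sqrt(2 / (math.pi * z))
    phase = z - math.pi / 4
    bracket_sin = (1 + (3 * 5 * 1) / (8 * 16) / z ** 2
                   - (3 * 5 * 7 * 9 * 1 * 3 * 5) / (8 * 16 * 24 * 32) / z ** 4)
    bracket_cos = ((3 / 8) / z
                   - (3 * 5 * 7 * 1 * 3) / (8 * 16 * 24) / z ** 3
                   + (3 * 5 * 7 * 9 * 11 * 1 * 3 * 5 * 7) / (8 * 16 * 24 * 32 * 40) / z ** 5)
    return s * (math.sin(phase) * bracket_sin + math.cos(phase) * bracket_cos)


for z in (2.0, 5.0, 10.0, 20.0):
    print(f"z = {z:4}: ascending {j1_ascending(z):+.8f}, "
          f"semi-convergent {j1_semiconvergent(z):+.8f}")
```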
A table of the values of 2z^-1J1(z) has been given by E. C. J. Lommel (_Schlömilch_, 1870, 15, p. 166), to whom is due the first systematic application of Bessel's functions to the diffraction integrals.
The illumination vanishes in correspondence with the roots of the equation J1(z) = 0. If these be called z1, z2, z3, ..., the radii of the dark rings in the diffraction pattern are
$$\frac{f\lambda z_1}{2\pi R}, \quad \frac{f\lambda z_2}{2\pi R}, \quad \ldots$$
being thus _inversely_ proportional to R.
The integrations may also be effected by means of polar co-ordinates, taking first the integration with respect to [theta] so as to obtain the result for an infinitely thin annular aperture. Thus, if
$$x = \rho\cos\theta, \qquad y = \rho\sin\theta,$$
$$C = \iint \cos px\,dx\,dy = \int_0^R\!\int_0^{2\pi} \cos(p\rho\cos\theta)\,\rho\,d\rho\,d\theta.$$
Now by definition
$$J_0(z) = \frac{2}{\pi}\int_0^{\frac12\pi} \cos(z\cos\theta)\,d\theta = 1 - \frac{z^2}{2^2} + \frac{z^4}{2^2\cdot4^2} - \frac{z^6}{2^2\cdot4^2\cdot6^2} + \cdots \tag{11}$$
The value of C for an annular aperture of radius [rho] and width d[rho] is thus
$$dC = 2\pi J_0(p\rho)\,\rho\,d\rho. \tag{12}$$
For the complete circle,
$$\begin{aligned}
C &= \frac{2\pi}{p^2}\int_0^{pR} J_0(z)\,z\,dz \\
&= \frac{2\pi}{p^2}\left(\frac{p^2R^2}{2} - \frac{p^4R^4}{2^2\cdot4} + \frac{p^6R^6}{2^2\cdot4^2\cdot6} - \cdots\right) \\
&= \pi R^2\cdot\frac{2J_1(pR)}{pR},
\end{aligned}$$
as before.
In these expressions we are to replace p by k[xi]/f, or rather, since the diffraction pattern is symmetrical, by kr/f, where r is the distance of any point in the focal plane from the centre of the system.
The roots of J0(z) after the first may be found from
$$\frac{z}{\pi} = i - .25 + \frac{.050661}{4i - 1} - \frac{.053041}{(4i - 1)^3} + \frac{.262051}{(4i - 1)^5} \cdots, \tag{13}$$
and those of J1(z) from
$$\frac{z}{\pi} = i + .25 - \frac{.151982}{4i + 1} + \frac{.015399}{(4i + 1)^3} - \frac{.245835}{(4i + 1)^5} \cdots, \tag{14}$$
formulae derived by Stokes (_Camb. Trans._, 1850, vol. ix.) from the descending series.[1] The following table gives the actual values:--
+----+--------------------+--------------------+
|    |       z/[pi]       |       z/[pi]       |
|  i |   for J0(z) = 0    |   for J1(z) = 0    |
+----+--------------------+--------------------+
|  1 |       0.7655       |       1.2197       |
|  2 |       1.7571       |       2.2330       |
|  3 |       2.7546       |       3.2383       |
|  4 |       3.7534       |       4.2411       |
|  5 |       4.7527       |       5.2428       |
|  6 |       5.7522       |       6.2439       |
|  7 |       6.7519       |       7.2448       |
|  8 |       7.7516       |       8.2454       |
|  9 |       8.7514       |       9.2459       |
| 10 |       9.7513       |      10.2463       |
+----+--------------------+--------------------+
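The tabulated values can be regenerated from the two formulae. This sketch rests on two assumptions, both taken from the standard (McMahon) asymptotic expansion for the zeros of Bessel functions: the first coefficient of (13) is .050661, that is 1/(2[pi]²), and the last term of (14) is negative. As the text says, (13) is meant for the roots after the first. For i = 1 it gives .7660 against .7655. Elsewhere the formulae match the table to within about a unit in the fourth decimal place.

```python
# Stokes's formulae (13) and (14) for the roots of J0 and J1, divided by pi.
# Assumptions: first coefficient of (13) taken as .050661 = 1/(2 pi^2);
# last term of (14) taken as negative.

def j0_root_over_pi(i):
    """Formula (13): i-th root of J0(z) = 0, as z/pi (intended for i >= 2)."""
    m = 4 * i - 1
    return i - 0.25 + 0.050661 / m - 0.053041 / m ** 3 + 0.262051 / m ** 5


def j1_root_over_pi(i):
    """Formula (14): i-th root of J1(z) = 0 other than z = 0, as z/pi."""
    m = 4 * i + 1
    return i + 0.25 - 0.151982 / m + 0.015399 / m ** 3 - 0.245835 / m ** 5


TABLE_J0 = [0.7655, 1.7571, 2.7546, 3.7534, 4.7527, 5.7522, 6.7519, 7.7516, 8.7514, 9.7513]
TABLE_J1 = [1.2197, 2.2330, 3.2383, 4.2411, 5.2428, 6.2439, 7.2448, 8.2454, 9.2459, 10.2463]

print(" i   J0: formula  table    J1: formula  table")
for i in range(1, 11):
    print(f"{i:2d}   {j0_root_over_pi(i):9.4f}  {TABLE_J0[i - 1]:7.4f}"
          f"   {j1_root_over_pi(i):9.4f}  {TABLE_J1[i - 1]:7.4f}")
```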
In both cases the image of a mathematical point is thus a symmetrical ring system. The greatest brightness is at the centre, where
$$dC = 2\pi\rho\,d\rho, \qquad C = \pi R^2.$$
For a certain distance outwards this remains sensibly unimpaired and then gradually diminishes to zero, as the secondary waves become discrepant in phase. The subsequent revivals of brightness forming the bright rings are necessarily of inferior brilliancy as compared with the central disk.
The first dark ring in the diffraction pattern of the complete circular aperture occurs when
$$r/f = 1.2197\times\lambda/2R. \tag{15}$$
We may compare this with the corresponding result for a rectangular aperture of width a,
$$\xi/f = \lambda/a;$$
and it appears that in consequence of the preponderance of the central parts, the compensation in the case of the circle does not set in at so small an obliquity as when the circle is replaced by a rectangular aperture whose width is equal to the diameter of the circle: putting a = 2R, the first dark band of the rectangle falls at [xi]/f = [lambda]/2R, against 1.2197 × [lambda]/2R for the circle.
Again, if we compare the complete circle with a narrow annular aperture of the same radius, we see that in the latter case the first dark ring occurs at a much smaller obliquity, viz.
$$r/f = .7655\times\lambda/2R.$$
It has been found by Sir William Herschel and others that the definition of a telescope is often improved by stopping off a part of the central area of the object-glass; but the advantage to be obtained in this way is in no case great, and anything like a reduction of the aperture to a narrow annulus is attended by a development of the external luminous rings sufficient to outweigh any improvement due to the diminished diameter of the central area.[2]
The maximum brightnesses and the places at which they occur are easily determined with the aid of certain properties of the Bessel's functions. It is known (see SPHERICAL HARMONICS) that
$$J_0'(z) = -J_1(z), \tag{16}$$
$$J_2(z) = \frac{1}{z}J_1(z) - J_1'(z), \tag{17}$$
$$J_0(z) + J_2(z) = \frac{2}{z}J_1(z). \tag{18}$$
The maxima of C occur when
$$\frac{d}{dz}\left(\frac{J_1(z)}{z}\right) = \frac{J_1'(z)}{z} - \frac{J_1(z)}{z^2} = 0;$$
or, by (17), when J2(z) = 0. When z has one of the values thus determined, (18) gives
$$\frac{2}{z}J_1(z) = J_0(z).$$
The accompanying table is given by Lommel, in which the first column gives the roots of J2(z) = 0, and the second and third columns the corresponding values of the functions specified. It appears that the maximum brightness in the first ring is only about 1/57 of the brightness at the centre.
+-----------+-------------+--------------+
|     z     | 2z^-1 J1(z) | 4z^-2 J1²(z) |
+-----------+-------------+--------------+
|   .000000 |   +1.000000 |     1.000000 |
|  5.135630 |    -.132279 |      .017498 |
|  8.417236 |    +.064482 |      .004158 |
| 11.619857 |    -.040008 |      .001601 |
| 14.795938 |    +.027919 |      .000779 |
| 17.959820 |    -.020905 |      .000437 |
+-----------+-------------+--------------+
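Lommel's table can be reproduced in a few lines. This sketch computes J_n from the general form of the ascending series (9) and finds the roots of J2 by scanning for sign changes and bisecting. Both are implementation choices, adequate only for the moderate z used here. The roots agree with the tabulated ones to a few units in the fifth decimal place, and the first ring comes out at about 1/57 of the central brightness.

```python
import math


def bessel_j(n, z, terms=80):
    """J_n(z) from the ascending series (adequate for z below about 20)."""
    term = (z / 2) ** n / math.factorial(n)
    total = term
    for m in range(1, terms):
        term *= -(z / 2) ** 2 / (m * (m + n))
        total += term
    return total


def sign_change_roots(f, start, stop, step=0.05, tol=1e-12):
    """Scan [start, stop] for sign changes of f and refine each by bisection."""
    found = []
    a, fa = start, f(start)
    k = 1
    while start + k * step <= stop:
        b = start + k * step
        fb = f(b)
        if fa * fb < 0:
            lo, hi, flo = a, b, fa
            while hi - lo > tol:
                mid = (lo + hi) / 2
                fm = f(mid)
                if flo * fm <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            found.append((lo + hi) / 2)
        a, fa = b, fb
        k += 1
    return found


# Maxima of 2 J1(z)/z lie at z = 0 and at the roots of J2(z) = 0.
maxima = sign_change_roots(lambda t: bessel_j(2, t), 1.0, 19.0)
for z in [0.0] + maxima:
    amplitude = 1.0 if z == 0.0 else 2 * bessel_j(1, z) / z
    print(f"{z:10.6f}  {amplitude:+.6f}  {amplitude ** 2:.6f}")
```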
We will now investigate the total illumination distributed over the area of the circle of radius r. We have
$$I^2 = \frac{\pi^2 R^4}{\lambda^2 f^2}\cdot\frac{4J_1^2(z)}{z^2}, \tag{19}$$
where
$$z = 2\pi Rr/\lambda f. \tag{20}$$
Thus
$$2\pi\int I^2\,r\,dr = \frac{\lambda^2 f^2}{2\pi R^2}\int I^2\,z\,dz = \pi R^2\cdot 2\int z^{-1}J_1^2(z)\,dz.$$
Now by (17), (18)
$$z^{-1}J_1(z) = J_0(z) - J_1'(z);$$
so that
$$z^{-1}J_1^2(z) = -\frac12\frac{d}{dz}J_0^2(z) - \frac12\frac{d}{dz}J_1^2(z),$$
and
$$2\int_0^z z^{-1}J_1^2(z)\,dz = 1 - J_0^2(z) - J_1^2(z). \tag{21}$$
If r, or z, be infinite, J0(z), J1(z) vanish, and the whole illumination is expressed by [pi]R², in accordance with the general principle. In any case the proportion of the whole illumination to be found outside the circle of radius r is given by
$$J_0^2(z) + J_1^2(z).$$
For the dark rings J1(z) = 0; so that the fraction of illumination outside any dark ring is simply J0²(z). Thus for the first, second, third and fourth dark rings we get respectively .161, .090, .062, .047, showing that more than 9/10ths of the whole light is concentrated within the area of the second dark ring (_Phil. Mag._, 1881).
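Both (21) and the fractions just quoted can be checked numerically. In this sketch the dark rings are located by Newton's method, starting from the tabulated z/[pi], with J1' = J0 - J1/z taken from (17) and (18). J0²(z) is then evaluated at each ring. The integral in (21) is also checked with Simpson's rule at one value of z. The ascending series and the numerical methods are illustrative choices. The fractions agree with the quoted figures to about a unit in the third decimal place.

```python
import math


def bessel_j(n, z, terms=80):
    """J_n(z) from the ascending series (adequate for the moderate z used here)."""
    term = (z / 2) ** n / math.factorial(n)
    total = term
    for m in range(1, terms):
        term *= -(z / 2) ** 2 / (m * (m + n))
        total += term
    return total


def fraction_inside(z, panels=2000):
    """Left side of (21), 2 * int_0^z J1(t)^2 / t dt, by Simpson's rule."""
    f = lambda t: bessel_j(1, t) ** 2 / t if t > 0 else 0.0
    h = z / panels
    total = f(0.0) + f(z)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(k * h)
    return 2 * total * h / 3


def j1_zero(guess, steps=20):
    """Newton's method on J1, using J1'(z) = J0(z) - J1(z)/z."""
    z = guess
    for _ in range(steps):
        z -= bessel_j(1, z) / (bessel_j(0, z) - bessel_j(1, z) / z)
    return z


# Starting guesses: tabulated z/pi for J1(z) = 0, multiplied by pi.
dark_rings = [j1_zero(g * math.pi) for g in (1.2197, 2.2330, 3.2383, 4.2411)]
outside = [bessel_j(0, z) ** 2 for z in dark_rings]   # J0^2 + J1^2 with J1 = 0
for k, (z, frac) in enumerate(zip(dark_rings, outside), start=1):
    print(f"dark ring {k}: z = {z:.4f}, fraction of light outside = {frac:.3f}")
```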
When z is great, the descending series (10) gives
$$\frac{2J_1(z)}{z} = \frac{2}{z}\sqrt{\frac{2}{\pi z}}\,\sin\!\left(z - \tfrac14\pi\right); \tag{22}$$
so that the places of maxima and minima occur at equal intervals.
The mean brightness varies as z^-3 (or as r^-3), and the integral found by multiplying it by zdz and integrating between 0 and [oo] converges.
It may be instructive to contrast this with the case of an infinitely narrow annular aperture, where the brightness is proportional to J0²(z). When z is great,
$$J_0(z) = \sqrt{\frac{2}{\pi z}}\,\cos\!\left(z - \tfrac14\pi\right).$$
The mean brightness varies as z^-1; and the integral
$$\int_0^{\infty} J_0^2(z)\,z\,dz$$
is not convergent.
5. _Resolving Power of Telescopes._--The efficiency of a telescope is of course intimately connected with the size of the disk by which it represents a mathematical point. In estimating theoretically the resolving power on a double star we have to consider the illumination of the field due to the superposition of the two independent images. If the angular interval between the components of a double star were equal to twice that expressed in equation (15) above, the central disks of the diffraction patterns would be just in contact. Under these conditions there is no doubt that the star would appear to be fairly resolved, since the brightness of its external ring system is too small to produce any material confusion, unless indeed the components are of very unequal magnitude. The diminution of the star disks with increasing aperture was observed by Sir William Herschel, and in 1823 Fraunhofer formulated the law of inverse proportionality. In investigations extending over a long series of years, the advantage of a large aperture in separating the components of close double stars was fully examined by W. R. Dawes.
The resolving power of telescopes was investigated also by J. B. L. Foucault, who employed a scale of equal bright and dark alternate parts; it was found to be proportional to the aperture and independent of the focal length. In telescopes of the best construction and of moderate aperture the performance is not sensibly prejudiced by outstanding aberration, and the limit imposed by the finiteness of the waves of light is practically reached. M. E. Verdet has compared Foucault's results with theory, and has drawn the conclusion that the radius of the visible part of the image of a luminous point was equal to half the radius of the first dark ring.
The application, unaccountably long delayed, of this principle to the microscope by H. L. F. Helmholtz in 1874 is the foundation of the important doctrine of the _microscopic limit_. It is true that in 1823 Fraunhofer, inspired by his observations upon gratings, had very nearly hit the mark.[3] And a little before Helmholtz, E. Abbe published a somewhat more complete investigation, also founded upon the phenomena presented by gratings. But although the argument from gratings is instructive and convenient in some respects, its use has tended to obscure the essential unity of the principle of the limit of resolution whether applied to telescopes or microscopes.
In fig. 4, AB represents the axis of an optical instrument (telescope or microscope), A being a point of the object and B a point of the image. By the operation of the object-glass LL' all the rays issuing from A arrive in the same phase at B. Thus if A be self-luminous, the illumination is a maximum at B, where all the secondary waves agree in phase. B is in fact the centre of the diffraction disk which constitutes the image of A. At neighbouring points the illumination is less, in consequence of the discrepancies of phase which there enter. In like manner if we take a neighbouring point P, also self-luminous, in the plane of the object, the waves which issue from it will arrive at B with phases no longer absolutely concordant, and the discrepancy of phase will increase as the interval AP increases. When the interval is very small the discrepancy, though mathematically existent, produces no practical effect; and the illumination at B due to P is as important as that due to A, the intensities of the two luminous sources being supposed equal. Under these conditions it is clear that A and P are not separated in the image. The question is to what amount must the distance AP be increased in order that the difference of situation may make itself felt in the image. This is necessarily a question of degree; but it does not require detailed calculations in order to show that the discrepancy first becomes conspicuous when the phases corresponding to the various secondary waves which travel from P to B range over a complete period. The illumination at B due to P then becomes comparatively small, indeed for some forms of aperture evanescent. 
The extreme discrepancy is that between the waves which travel through the outermost parts of the object-glass at L and L'; so that if we adopt the above standard of resolution, the question is where must P be situated in order that the relative retardation of the rays PL and PL' may on their arrival at B amount to a wave-length ([lambda]). In virtue of the general law that the reduced optical path is stationary in value, this retardation may be calculated without allowance for the different paths pursued on the farther side of L, L', so that the value is simply PL - PL'. Now since AP is very small, AL' - PL' = AP sin [alpha], where [alpha] is the angular semi-aperture L'AB. In like manner PL - AL has the same value, so that
$$PL - PL' = 2AP\sin\alpha.$$
According to the standard adopted, the condition of resolution is therefore that AP, or [epsilon], should exceed ½[lambda]/sin [alpha]. If [epsilon] be less than this, the images overlap too much; while if [epsilon] greatly exceed the above value the images become unnecessarily separated.
In the above argument the whole space between the object and the lens is supposed to be occupied by matter of one refractive index, and [lambda] represents the wave-length _in this medium_ of the kind of light employed. If the restriction as to uniformity be violated, what we have ultimately to deal with is the wave-length in the medium immediately surrounding the object.
Calling the refractive index [mu], we have as the critical value of [epsilon],
$$\epsilon = \frac{\lambda_0}{2\mu\sin\alpha}, \tag{1}$$
[lambda]0 being the wave-length _in vacuo_. The denominator [mu] sin [alpha] is the quantity well known (after Abbe) as the "numerical aperture."
The extreme value possible for [alpha] is a right angle, so that for the microscopic limit we have
$$\epsilon = \frac{\lambda_0}{2\mu}. \tag{2}$$
The limit can be depressed only by a diminution in [lambda]0, such as photography makes possible, or by an increase in [mu], the refractive index of the medium in which the object is situated.
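Formula (1) is easily evaluated for particular cases. The sketch below assumes a wave-length in the green of 5.5 × 10^-5 cm and typical indices of 1.33 for water and 1.52 for an immersion oil. None of these figures is taken from the text. It shows how an increase in [mu] lowers the limit, as the paragraph states.

```python
import math


def least_separation(wavelength_vacuo, mu, semi_aperture_deg=90.0):
    """Critical separation (1): epsilon = lambda0 / (2 mu sin alpha)."""
    numerical_aperture = mu * math.sin(math.radians(semi_aperture_deg))
    return wavelength_vacuo / (2 * numerical_aperture)


lam0 = 5.5e-5   # cm; an assumed wave-length in the green
for label, mu in (("dry", 1.0), ("water", 1.33), ("oil", 1.52)):
    eps = least_separation(lam0, mu)
    print(f"{label:5s}: epsilon = {eps * 1e4:.3f} micron")
```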
The statement of the law of resolving power has been made in a form appropriate to the microscope, but it admits also of immediate application to the telescope. If 2R be the diameter of the object-glass and D the distance of the object, the angle subtended by AP is [epsilon]/D, and the angular resolving power is given by
$$\frac{\lambda}{2D\sin\alpha} = \frac{\lambda}{2R}. \tag{3}$$
This method of derivation (substantially due to Helmholtz) makes it obvious that there is no essential difference of principle between the two cases, although the results are conveniently stated in different forms. In the case of the telescope we have to deal with a linear measure of aperture and an angular limit of resolution, whereas in the case of the microscope the limit of resolution is linear, and it is expressed in terms of angular aperture.
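In the telescopic form (3) the limit is an angle, usually quoted in seconds of arc. This sketch evaluates [lambda]/2R for a few object-glass diameters, assuming a wave-length of 5.5 × 10^-5 cm. The figure is illustrative and not from the text. It also shows the inverse proportionality to aperture.

```python
import math

ARCSEC_PER_RADIAN = 180 / math.pi * 3600


def telescope_limit_arcsec(wavelength, diameter):
    """Angular limit (3), lambda / 2R with 2R the object-glass diameter, in arcseconds."""
    return wavelength / diameter * ARCSEC_PER_RADIAN


for inches in (1, 4, 10):
    d_cm = inches * 2.54
    print(f"{inches:2d} in. aperture: {telescope_limit_arcsec(5.5e-5, d_cm):.2f} arcsec")
```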
It must be understood that the above argument distinctly assumes that the different parts of the object are self-luminous, or at least that the light proceeding from the various points is without phase relations. As has been emphasized by G. J. Stoney, the restriction is often, perhaps usually, violated in the microscope. A different treatment is then necessary, and for some of the problems which arise under this head the method of Abbe is convenient.
The importance of the general conclusions above formulated, as imposing a limit upon our powers of direct observation, can hardly be overestimated; but there has been in some quarters a tendency to ascribe to it a more precise character than it can bear, or even to mistake its meaning altogether. A few words of further explanation may therefore be desirable. The first point to be emphasized is that nothing whatever is said as to the smallness of a single object that may be made visible. The eye, unaided or armed with a telescope, is able to see, as points of light, stars subtending no sensible angle. The visibility of a star is a question of brightness simply, and has nothing to do with resolving power. The latter element enters only when it is a question of recognizing the duplicity of a double star, or of distinguishing detail upon the surface of a planet. So in the microscope there is nothing except lack of light to hinder the visibility of an object however small. But if its dimensions be much less than the half wave-length, it can only be seen as a whole, and its parts cannot be distinctly separated, although in cases near the border line some inference may be possible, founded upon experience of what appearances are presented in various cases. Interesting observations upon particles, _ultra-microscopic_ in the above sense, have been recorded by H. F. W. Siedentopf and R. A. Zsigmondy (_Drude's Ann._, 1903, 10, p. 1).
In a somewhat similar way a dark linear interruption in a bright ground may be visible, although its actual width is much inferior to the half wave-length. In illustration of this fact a simple experiment may be mentioned. In front of the naked eye was held a piece of copper foil perforated by a fine needle hole. Observed through this the structure of some wire gauze just disappeared at a distance from the eye equal to 17 in., the gauze containing 46 meshes to the inch. On the other hand, a single wire 0.034 in. in diameter remained fairly visible up to a distance of 20 ft. The ratio between the limiting angles subtended by the periodic structure of the gauze and the diameter of the wire was (.022/.034) × (240/17) = 9.1. For further information upon this subject reference may be made to _Phil. Mag._, 1896, 42, p. 167; _Journ. R. Micr. Soc._, 1903, p. 447.
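The ratio quoted above is simple arithmetic on the recorded distances. This sketch uses the figures of the experiment, with the gauze period rounded to .022 in. as in the text. With the exact period of 1/46 in. the ratio comes out nearer 9.0. The difference is only that rounding.

```python
# Figures from the needle-hole experiment, in inches.
gauze_period = 0.022        # one mesh of a gauze with 46 meshes to the inch (1/46 = 0.0217)
gauze_distance = 17.0
wire_diameter = 0.034
wire_distance = 20 * 12.0   # 20 ft.

gauze_angle = gauze_period / gauze_distance   # limiting angle for the periodic structure
wire_angle = wire_diameter / wire_distance    # limiting angle for the single wire
ratio = gauze_angle / wire_angle
print(f"gauze {gauze_angle:.2e} rad, wire {wire_angle:.2e} rad, ratio {ratio:.1f}")
```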
6. _Coronas or Glories._--The results of the theory of the diffraction patterns due to circular apertures admit of an interesting application to _coronas_, such as are often seen encircling the sun and moon. They are due to the interposition of small spherules of water, which act the part of diffracting obstacles. In order to the formation of a well-defined corona it is essential that the particles be exclusively, or preponderatingly, of one size.
If the origin of light be treated as infinitely small, and be seen in focus, whether with the naked eye or with the aid of a telescope, the whole of the light in the absence of obstacles would be concentrated in the immediate neighbourhood of the focus. At other parts of the field the effect is the same, in accordance with the principle known as Babinet's, whether the imaginary screen in front of the object-glass is generally transparent but studded with a number of opaque circular disks, or is generally opaque but perforated with corresponding apertures. Since at these points the resultant due to the whole aperture is zero, any two portions into which the whole may be divided must give equal and opposite resultants. Consider now the light diffracted in a direction many times more oblique than any with which we should be concerned, were the whole aperture uninterrupted, and take first the effect of a single small aperture. The light in the proposed direction is that determined by the size of the small aperture in accordance with the laws already investigated, and its phase depends upon the position of the aperture. If we take a direction such that the light (of given wave-length) from a single aperture vanishes, the evanescence continues even when the whole series of apertures is brought into contemplation. Hence, whatever else may happen, there must be a system of dark rings formed, the same as from a single small aperture. In directions other than these it is a more delicate question how the partial effects should be compounded. If we make the extreme suppositions of an infinitely small source and absolutely homogeneous light, there is no escape from the conclusion that the light in a definite direction is arbitrary, that is, dependent upon the chance distribution of apertures. 
If, however, as in practice, the light be heterogeneous, the source of finite area, the obstacles in motion, and the discrimination of different directions imperfect, we are concerned merely with the mean brightness found by varying the arbitrary phase-relations, and this is obtained by simply multiplying the brightness due to a single aperture by the number of apertures (n) (see INTERFERENCE OF LIGHT, § 4). The diffraction pattern is therefore that due to a single aperture, merely brightened n times.
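The statement that the mean brightness is n times that of a single aperture can be illustrated by a small Monte Carlo experiment. This is a sketch under simplifying assumptions: each aperture contributes a vibration of unit amplitude, and its phase is uniformly random and independent of the others. For comparison, if all n vibrations agreed in phase the brightness would be n².

```python
import cmath
import math
import random


def mean_brightness(n, trials=4000, seed=1):
    """Average |sum of n unit vibrations with random phases|^2 over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        resultant = sum(cmath.exp(1j * rng.uniform(0.0, 2 * math.pi)) for _ in range(n))
        total += abs(resultant) ** 2
    return total / trials


for n in (1, 10, 50):
    print(f"n = {n:2d}: mean brightness {mean_brightness(n):.2f} (n^2 would be {n * n})")
```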
In his experiments upon this subject Fraunhofer employed plates of glass dusted over with lycopodium, or studded with small metallic disks of uniform size; and he found that the diameters of the rings were proportional to the length of the waves and inversely as the diameter of the disks.
In another respect the observations of Fraunhofer appear at first sight to be in disaccord with theory; for his measures of the diameters of the red rings, visible when white light was employed, correspond with the law applicable to dark rings, and not to the different law applicable to the luminous maxima. Verdet has, however, pointed out that the observation in this form is essentially different from that in which homogeneous red light is employed, and that the position of the red rings would correspond to the _absence_ of blue-green light rather than to the greatest abundance of red light. Verdet's own observations, conducted with great care, fully confirm this view, and exhibit a complete agreement with theory.
By measurements of coronas it is possible to infer the size of the particles to which they are due, an application of considerable interest in the case of natural coronas--the general rule being the larger the corona the smaller the water spherules. Young employed this method not only to determine the diameters of cloud particles (e.g. 1/1000 in.), but also those of fibrous material, for which the theory is analogous. His instrument was called the _eriometer_ (see "Chromatics," vol. iii. of supp. to _Ency. Brit._, 1817).
7. _Influence of Aberration. Optical Power of Instruments._--Our investigations and estimates of resolving power have thus far proceeded upon the supposition that there are no optical imperfections, whether of the nature of a regular aberration or dependent upon irregularities of material and workmanship. In practice there will always be a certain aberration or error of phase, which we may also regard as the deviation of the actual wave-surface from its intended position. In general, we may say that aberration is unimportant when it nowhere (or at any rate over a relatively small area only) exceeds a small fraction of the wave-length ([lambda]). Thus in estimating the intensity at a focal point, where, in the absence of aberration, all the secondary waves would have exactly the same phase, we see that an aberration nowhere exceeding ¼[lambda] can have but little effect.
The only case in which the influence of small aberration upon the entire image has been calculated (_Phil. Mag._, 1879) is that of a rectangular aperture, traversed by a cylindrical wave with aberration equal to cx³. The aberration is here unsymmetrical, the wave being in advance of its proper place in one half of the aperture, but behind in the other half. No terms in x or x² need be considered. The first would correspond to a general turning of the beam; and the second would imply imperfect focusing of the central parts. The effect of aberration may be considered in two ways. We may suppose the aperture (a) constant, and inquire into the operation of an increasing aberration; or we may take a given value of c (i.e. a given wave-surface) and examine the effect of a varying aperture. The results in the second case show that an increase of aperture up to that corresponding to an extreme aberration of half a period has no ill effect upon the central band (§ 3), but it increases unduly the intensity of one of the neighbouring lateral bands; and the practical conclusion is that the best results will be obtained from an aperture giving an extreme aberration of from a quarter to half a period, and that with an increased aperture aberration is not so much a direct cause of deterioration as an obstacle to the attainment of that improved definition which should accompany the increase of aperture.
If, on the other hand, we suppose the aperture given, we find that aberration begins to be distinctly mischievous when it amounts to about a quarter period, i.e. when the wave-surface deviates at each end by a quarter wave-length from the true plane.
As an application of this result, let us investigate what amount of temperature disturbance in the tube of a telescope may be expected to impair definition. According to J. B. Biot and F. J. D. Arago, the index [mu] for air at t° C. and at atmospheric pressure is given by
$$\mu - 1 = \frac{.00029}{1 + .0037\,t}.$$
If we take 0° C. as standard temperature, the change of index for a rise of 1° C. is
$$\delta\mu = -1.1\times10^{-6}.$$
Thus, on the supposition that the irregularity of temperature t extends through a length l, and produces an acceleration of a quarter of a wave-length,
$$\tfrac14\lambda = 1.1\,lt\times10^{-6};$$
or, if we take [lambda] = 5.3 × 10^-5,
lt = 12,
the unit of length being the centimetre.
We may infer that, in the case of a telescope tube 12 cm. long, a stratum of air heated 1° C. lying along the top of the tube, and occupying a moderate fraction of the whole volume, would produce a not insensible effect. If the change of temperature progressed uniformly from one side to the other, the result would be a lateral displacement of the image without loss of definition; but in general both effects would be observable. In longer tubes a similar disturbance would be caused by a proportionally less difference of temperature. S. P. Langley has proposed to obviate such ill-effects by stirring the air included within a telescope tube. It has long been known that the definition of a carbon bisulphide prism may be much improved by a vigorous shaking.
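The figure lt = 12 follows from the formula of Biot and Arago. This sketch takes the change of index per degree as the derivative of that formula at 0° C., about -1.07 × 10^-6, and uses the text's wave-length of 5.3 × 10^-5 cm. It gives lt of about 12.3 degree-centimetres. The text's rounded value of 1.1 × 10^-6 gives 12.0.

```python
def refractive_excess(t):
    """Biot and Arago: mu - 1 = 0.00029 / (1 + 0.0037 t), t in degrees C."""
    return 0.00029 / (1 + 0.0037 * t)


# Change of index per degree near 0 deg C (derivative of the formula at t = 0).
dmu_per_degree = -0.00029 * 0.0037            # about -1.07e-6
wavelength = 5.3e-5                            # cm, the value used in the text

lt = 0.25 * wavelength / abs(dmu_per_degree)   # degree-centimetres
lt_rounded = 0.25 * wavelength / 1.1e-6        # with the text's rounded 1.1e-6
print(f"d(mu)/dt = {dmu_per_degree:.3e}; lt = {lt:.1f} (or {lt_rounded:.1f} with 1.1e-6)")
print(f"finite-difference check: {refractive_excess(1.0) - refractive_excess(0.0):.3e}")
```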
We will now consider the application of the principle to the formation of images, unassisted by reflection or refraction (_Phil. Mag._, 1881). The function of a lens in forming an image is to compensate by its variable thickness the differences of phase which would otherwise exist between secondary waves arriving at the focal point from various parts of the aperture. If we suppose the diameter of the lens to be given (2R), and its focal length f gradually to increase, the original differences of phase at the image of an infinitely distant luminous point diminish without limit. When f attains a certain value, say f1, the extreme error of phase to be compensated falls to ¼[lambda]. But, as we have seen, such an error of phase causes no sensible deterioration in the definition; so that from this point onwards the lens is useless, as only improving an image already sensibly as perfect as the aperture admits of. Throughout the operation of increasing the focal length, the resolving power of the instrument, which depends only upon the aperture, remains unchanged; and we thus arrive at the rather startling conclusion that a telescope of any degree of resolving power might be constructed without an object-glass, if only there were no limit to the admissible focal length. This last proviso, however, as we shall see, takes away almost all practical importance from the proposition.
To get an idea of the magnitudes of the quantities involved, let us take the case of an aperture of 1/5 in., about that of the pupil of the eye. The distance f1, which the actual focal length must exceed, is given by
$$\sqrt{f_1^2 + R^2} - f_1 = \tfrac14\lambda;$$
so that, R being small compared with f1, approximately
$$f_1 = 2R^2/\lambda. \tag{1}$$
Thus, if [lambda] = 1/40000 and R = 1/10 (in inches), we find
f1 = 800 inches.
The image of the sun thrown upon a screen at a distance exceeding 66 ft., through a hole 1/5 in. in diameter, is therefore at least as well defined as that seen direct.
As the minimum focal length increases with the square of the aperture, a quite impracticable distance would be required to rival the resolving power of a modern telescope. Even for an aperture of 4 in., f1 would have to be 5 miles.
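The figures of 66 ft. and 5 miles can be recovered from the defining condition. This sketch solves [sqrt](f1² + R²) - f1 = ¼[lambda] exactly, which gives f1 = 2R²/[lambda] - [lambda]/8, and shows that the second term is negligible. It takes [lambda] = 1/40000 in. (about 6.4 × 10^-5 cm), the value that reproduces the 800 inches quoted.

```python
def min_focal_length(R, wavelength):
    """Exact solution of sqrt(f^2 + R^2) - f = wavelength / 4.

    Squaring gives f = 2R^2/wavelength - wavelength/8; the second term is
    negligible, which is why (1) drops it.
    """
    return 2 * R ** 2 / wavelength - wavelength / 8


lam = 1 / 40000   # inches; an assumed value, about 6.4e-5 cm
for diameter in (0.2, 4.0):   # inches
    f1 = min_focal_length(diameter / 2, lam)
    print(f"aperture {diameter} in.: f1 = {f1:,.0f} in. = {f1 / 12:,.1f} ft "
          f"= {f1 / 63360:.2f} miles")
```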
A similar argument may be applied to find at what point an achromatic lens becomes sensibly superior to a single one. The question is whether, when the adjustment of focus is correct for the central rays of the spectrum, the error of phase for the most extreme rays (which it is necessary to consider) amounts to a quarter of a wave-length. If not, the substitution of an achromatic lens will be of no advantage. Calculation shows that, if the aperture be 1/5 in., an achromatic lens has no sensible advantage if the focal length be greater than about 11 in. If we suppose the focal length to be 66 ft., a single lens is practically perfect up to an aperture of 1.7 in.
Another obvious inference from the necessary imperfection of optical images is the uselessness of attempting anything like an absolute destruction of spherical aberration. An admissible error of phase of ¼[lambda] will correspond to an error of 1/8[lambda] in a reflecting and ½[lambda] in a (glass) refracting surface, the incidence in both cases being perpendicular. If we inquire what is the greatest admissible longitudinal aberration ([delta]f) in an object-glass according to the above rule, we find
$$\delta f = \lambda\alpha^{-2}, \tag{2}$$
[alpha] being the angular semi-aperture.
In the case of a single lens of glass with the most favourable curvatures, [delta]f is about equal to [alpha]²f, so that, by (2), [alpha]^4 must not exceed [lambda]/f. For a lens of 3 ft. focus this condition is satisfied if the aperture does not exceed 2 in.
When parallel rays fall directly upon a spherical mirror the longitudinal aberration is only about one-eighth as great as for the most favourably shaped single lens of equal focal length and aperture. Hence a spherical mirror of 3 ft. focus might have an aperture of 2½ in., and the image would not suffer materially from aberration.
On the same principle we may estimate the least visible displacement of the eye-piece of a telescope focused upon a distant object, a question of interest in connexion with range-finders. It appears (_Phil. Mag._, 1885, 20, p. 354) that a displacement [delta]f from the true focus will not sensibly impair definition, provided
[delta]f < f²[lambda]/R² (3),
2R being the diameter of aperture. The linear accuracy required is thus a function of the _ratio_ of aperture to focal length. The formula agrees well with experiment.
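That the tolerance (3) depends only on the ratio of aperture to focal length can be seen numerically. The two telescopes below are hypothetical, and [lambda] = 1/40000 in. is an assumed value.

```python
import math

WAVELENGTH_IN = 1 / 40000  # assumed wave-length of light, in inches


def focus_tolerance(focal_in: float, radius_in: float,
                    wavelength_in: float = WAVELENGTH_IN) -> float:
    """Equation (3): largest displacement f^2 * lambda / R^2 leaving definition intact."""
    return focal_in ** 2 * wavelength_in / radius_in ** 2


# Hypothetical telescopes with the same ratio of aperture to focal length.
small = focus_tolerance(36.0, 1.0)  # 3 ft. focus, 2 in. aperture
large = focus_tolerance(72.0, 2.0)  # every dimension doubled
print(small, large, math.isclose(small, large))
```

Both give about 0.03 in.: doubling the instrument leaves the permitted displacement of the eye-piece unchanged.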
The principle gives an instantaneous solution of the question of the ultimate optical efficiency in the method of "mirror-reading," as commonly practised in various physical observations. A rotation by which one edge of the mirror advances ¼[lambda] (while the other edge retreats to a like amount) introduces a phase-discrepancy of a whole period where before the rotation there was complete agreement. A rotation of this amount should therefore be easily visible, but the limits of resolving power are being approached; and the conclusion is independent of the focal length of the mirror, and of the employment of a telescope, provided of course that the reflected image is seen in focus, and that the full width of the mirror is utilized.
A comparison with the method of a material pointer, attached to the parts whose rotation is under observation, and viewed through a microscope, is of interest. The limiting efficiency of the microscope is attained when the angular aperture amounts to 180°; and it is evident that a lateral displacement of the point under observation through ½[lambda] entails (at the old image) a phase-discrepancy of a whole period, one extreme ray being accelerated and the other retarded by half that amount. We may infer that the limits of efficiency in the two methods are the same when the length of the pointer is equal to the width of the mirror.
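The comparison of the two methods reduces to two small-angle rotations, sketched below. The wave-length and the 1 in. sizes are assumed for illustration.

```python
import math


def mirror_limit(width: float, wavelength: float) -> float:
    """Rotation that advances one edge by lambda/4 and retreats the other by lambda/4."""
    return (wavelength / 4) / (width / 2)


def pointer_limit(length: float, wavelength: float) -> float:
    """Rotation that moves the tip of a pointer of given length sideways by lambda/2."""
    return (wavelength / 2) / length


lam = 1 / 40000  # assumed wave-length, inches
size = 1.0       # hypothetical: 1 in. mirror width and 1 in. pointer length
print(mirror_limit(size, lam), pointer_limit(size, lam))
```

Both reduce to [lambda]/(2 × size), so the two limits coincide when the pointer is as long as the mirror is wide.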
[Illustration: FIG. 5.]
We have seen that in perpendicular reflection a surface error not exceeding 1/8[lambda] may be admissible. In the case of oblique reflection at an angle [phi], the error of retardation due to an elevation BD (fig. 5) is
QQ' - QS = BD sec [phi](1 - cos SQQ') = BD sec [phi] (1 + cos 2[phi]) = 2BD cos [phi];
from which it follows that an error of given magnitude in the figure of a surface is less important in oblique than in perpendicular reflection. It must, however, be borne in mind that errors can sometimes be compensated by altering adjustments. If a surface intended to be flat is affected with a slight general curvature, a remedy may be found in an alteration of focus, and the remedy is the less complete as the reflection is more oblique.
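The chain of equalities for the oblique retardation can be checked numerically. The sketch assumes that the angle SQQ' equals [pi] - 2[phi], which is what the step from the first to the second form implies (fig. 5 is not reproduced here).

```python
import math


def retardation_forms(bd: float, phi: float) -> tuple[float, float, float]:
    """Evaluate the three expressions for QQ' - QS, taking angle SQQ' = pi - 2*phi."""
    sec = 1 / math.cos(phi)
    first = bd * sec * (1 - math.cos(math.pi - 2 * phi))
    second = bd * sec * (1 + math.cos(2 * phi))
    third = 2 * bd * math.cos(phi)
    return first, second, third


for deg in (0, 30, 60):
    print(deg, retardation_forms(1.0, math.radians(deg)))
```

The three forms agree, and the retardation falls from 2BD at perpendicular incidence to BD at 60°.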
The formula expressing the optical power of prismatic spectroscopes may readily be investigated upon the principles of the wave theory. Let A0B0 be a plane wave-surface of the light before it falls upon the prisms, AB the corresponding wave-surface for a particular part of the spectrum after the light has passed the prisms, or after it has passed the eye-piece of the observing telescope. The path of a ray from the wave-surface A0B0 to A or B is determined by the condition that the optical distance, [int] [mu]ds, is a minimum; and, as AB is by supposition a wave-surface, this optical distance is the same for both points. Thus

$$\int \mu\,ds \ (\text{for } A) = \int \mu\,ds \ (\text{for } B) \qquad (4).$$
We have now to consider the behaviour of light belonging to a neighbouring part of the spectrum. The path of a ray from the wave-surface A0B0 to the point A is changed; but in virtue of the minimum property the change may be neglected in calculating the optical distance, as it influences the result by quantities of the second order only in the changes of refrangibility. Accordingly, the optical distance from A0B0 to A is represented by [int]([mu] + [delta][mu])ds, the integration being along the original path A0 ... A; and similarly the optical distance between A0B0 and B is represented by [int] ([mu] + [delta][mu])ds, the integration being along B0 ... B. In virtue of (4) the difference of the optical distances to A and B is

$$\int \delta\mu\,ds \ (\text{along } B_0 \ldots B) - \int \delta\mu\,ds \ (\text{along } A_0 \ldots A) \qquad (5).$$
The new wave-surface is formed in such a position that the optical distance is constant; and therefore the _dispersion_, or the angle through which the wave-surface is turned by the change of refrangibility, is found simply by dividing (5) by the distance AB. If, as in common flint-glass spectroscopes, there is only one dispersing substance, [int] [delta][mu] ds = [delta][mu]·s, where s is simply the thickness traversed by the ray. If t2 and t1 be the thicknesses traversed by the extreme rays, and a denote the width of the emergent beam, the dispersion [theta] is given by
[theta] = [delta][mu](t2 - t1)/a,
or, if t1 be negligible,
[theta] = [delta][mu]t/a (6).
The condition of resolution of a double line whose components subtend an angle [theta] is that [theta] must exceed [lambda]/a. Hence, in order that a double line may be resolved whose components have indices [mu] and [mu] + [delta][mu], it is necessary that t should exceed the value given by the following equation:--
t = [lambda]/[delta][mu] (7).
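Equations (6) and (7) fit together: at the thickness given by (7) the dispersion equals exactly the resolving limit [lambda]/a, whatever the beam width. The numbers in this sketch are hypothetical illustrations, not constants of any particular glass.

```python
import math


def dispersion(delta_mu: float, t2: float, t1: float, beam_width: float) -> float:
    """Angle through which the wave-surface turns: delta_mu * (t2 - t1) / a."""
    return delta_mu * (t2 - t1) / beam_width


def resolving_thickness(wavelength: float, delta_mu: float) -> float:
    """Equation (7): glass thickness t = lambda / delta_mu needed for resolution."""
    return wavelength / delta_mu


# Hypothetical values (cm): a wave-length of 5.9e-5 and an index difference of 6e-5.
lam, dmu, a = 5.9e-5, 6e-5, 2.0
t = resolving_thickness(lam, dmu)
theta = dispersion(dmu, t, 0.0, a)  # t1 taken as negligible
print(t, theta, lam / a)            # at the limit the dispersion equals lambda / a
```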
8. _Diffraction Gratings._--Under the heading "Colours of Striated Surfaces," Thomas Young (_Phil. Trans._, 1802) in his usual summary fashion gave a general explanation of these colours, including the law of sines, the striations being supposed to be straight, parallel and equidistant. Later, in his article "Chromatics" in the supplement to the 5th edition of this encyclopaedia, he shows that the colours "lose the mixed character of periodical colours, and resemble much more the ordinary prismatic spectrum, with intervals completely dark interposed," and explains it by the consideration that any phase-difference which may arise at neighbouring striae is multiplied in proportion to the total number of striae.
The theory was further developed by A. J. Fresnel (1815), who gave a formula equivalent to (5) below. But it is to J. von Fraunhofer that we owe most of our knowledge upon this subject. His recent discovery of the "fixed lines" allowed a precision of observation previously impossible. He constructed gratings up to 340 periods to the inch by straining fine wire over screws. Subsequently he ruled gratings on a layer of gold-leaf attached to glass, or on a layer of grease similarly supported, and again by attacking the glass itself with a diamond point. The best gratings were obtained by the last method, but a suitable diamond point was hard to find, and to preserve. Observing through a telescope with light perpendicularly incident, he showed that the position of any ray was dependent only upon the grating interval, viz. the distance from the centre of one wire or line to the centre of the next, and not otherwise upon the thickness of the wire and the magnitude of the interspace. In different gratings the lengths of the spectra and their distances from the axis were inversely proportional to the grating interval, while with a given grating the distances of the various spectra from the axis were as 1, 2, 3, &c. To Fraunhofer we owe the first accurate measurements of wave-lengths, and the method of separating the overlapping spectra by a prism dispersing in the perpendicular direction. He described also the complicated patterns seen when a point of light is viewed through two superposed gratings, whose lines cross one another perpendicularly or obliquely. The above observations relate to transmitted light, but Fraunhofer extended his inquiry to the light _reflected_. To eliminate the light returned from the hinder surface of an engraved grating, he covered it with a black varnish. It then appeared that under certain angles of incidence parts of the resulting spectra were _completely polarized_. 
These remarkable researches of Fraunhofer, carried out in the years 1817-1823, are republished in his _Collected Writings_ (Munich, 1888).
The principle underlying the action of gratings is identical with that discussed in § 2, and exemplified in J. L. Soret's "zone plates." The alternate Fresnel's zones are blocked out or otherwise modified; in this way the original compensation is upset and a revival of light occurs in unusual directions. If the source be a point or a line, and a collimating lens be used, the incident waves may be regarded as plane. If, further, on leaving the grating the light be received by a focusing lens, e.g. the object-glass of a telescope, the Fresnel's zones are reduced to parallel and equidistant straight strips, which at certain angles coincide with the ruling. The directions of the lateral spectra are such that the passage from one element of the grating to the corresponding point of the next implies a retardation of an integral number of wave-lengths. If the grating be composed of alternate transparent and opaque parts, the question may be treated by means of the general integrals (§ 3) by merely limiting the integration to the transparent parts of the aperture. For an investigation upon these lines the reader is referred to Airy's _Tracts_, to Verdet's _Leçons_, or to R. W. Wood's _Physical Optics_. If, however, we assume the theory of a simple rectangular aperture (§ 3), the results of the ruling can be inferred by elementary methods, which are perhaps more instructive.
Apart from the ruling, we know that the image of a mathematical line will be a series of narrow bands, of which the central one is by far the brightest. At the middle of this band there is complete agreement of phase among the secondary waves. The dark lines which separate the bands are the places at which the phases of the secondary waves range over an integral number of periods. If now we suppose the aperture AB to be covered by a great number of opaque strips or bars of width d, separated by transparent intervals of width a, the condition of things in the directions just spoken of is not materially changed. At the central point there is still complete agreement of phase; but the amplitude is diminished in the ratio of a : a + d. In another direction, making a small angle with the last, such that the projection of AB upon it amounts to a few wave-lengths, it is easy to see that the mode of interference is the same as if there were no ruling. For example, when the direction is such that the projection of AB upon it amounts to one wave-length, the elementary components neutralize one another, because their phases are distributed symmetrically, though discontinuously, round the entire period. The only effect of the ruling is to diminish the amplitude in the ratio a : a + d; and, except for the difference in illumination, the appearance of a line of light is the same as if the aperture were perfectly free.
The lateral (spectral) images occur in such directions that the projection of the element (a + d) of the grating upon them is an exact multiple of [lambda]. The effect of each of the n elements of the grating is then the same; and, unless this vanishes on account of a particular adjustment of the ratio a : d, the resultant amplitude becomes comparatively very great. These directions, in which the retardation between A and B is exactly mn[lambda], may be called the principal directions. On either side of any one of them the illumination is distributed according to the same law as for the central image (m = 0), vanishing, for example, when the retardation amounts to (mn ± 1)[lambda]. In considering the relative brightnesses of the different spectra, it is therefore sufficient to attend merely to the principal directions, provided that the whole deviation be not so great that its cosine differs considerably from unity.
We have now to consider the amplitude due to a single element, which we may conveniently regard as composed of a transparent part a bounded by two opaque parts of width ½d. The phase of the resultant effect is by symmetry that of the component which comes from the middle of a. The fact that the other components have phases differing from this by amounts ranging between ± am[pi]/(a + d) causes the resultant amplitude to be less than for the central image (where there is complete phase agreement). If Bm denote the brightness of the m^th lateral image, and B0 that of the central image, we have
$$B_m : B_0 = \left[\int_{-am\pi/(a+d)}^{+am\pi/(a+d)} \cos x\,dx \div \frac{2am\pi}{a+d}\right]^2 = \left(\frac{a+d}{am\pi}\right)^2 \sin^2\frac{am\pi}{a+d} \qquad (1).$$
If B denotes the brightness of the central image when the whole of the space occupied by the grating is transparent, we have
B0 : B = a² : (a + d)²,
and thus
$$B_m : B = \frac{1}{m^2\pi^2}\,\sin^2\frac{am\pi}{a+d} \qquad (2).$$
The sine of an angle can never be greater than unity; and consequently under the most favourable circumstances only 1/m²[pi]² of the original light can be obtained in the m^th spectrum. We conclude that, with a grating composed of transparent and opaque parts, the utmost light obtainable in any one spectrum is in the first, and there amounts to 1/[pi]², or about 1/10, and that for this purpose a and d must be equal. When d = a the general formula becomes
$$B_m : B = \frac{\sin^2 \tfrac{1}{2}m\pi}{m^2\pi^2} \qquad (3),$$
showing that, when m is even, Bm vanishes, and that, when m is odd,
Bm : B = 1/m²[pi]².
The third spectrum has thus only 1/9 of the brilliancy of the first.
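Formula (2) is easily evaluated. This sketch checks the statements just made for a = d, and also that the first spectrum is brightest when bars and spaces are equal; the function name is illustrative.

```python
import math


def relative_brightness(m: int, a: float, d: float) -> float:
    """B_m / B from equation (2): clear spaces of width a, opaque bars of width d."""
    if m == 0:
        return (a / (a + d)) ** 2  # central image, the m -> 0 limit of (2)
    x = m * math.pi * a / (a + d)
    return math.sin(x) ** 2 / (m * math.pi) ** 2


first = relative_brightness(1, 1.0, 1.0)
second = relative_brightness(2, 1.0, 1.0)
third = relative_brightness(3, 1.0, 1.0)
print(first, 1 / math.pi ** 2)  # about 1/10 of the light
print(second)                   # zero apart from rounding
print(third / first)            # 1/9

# Scan the proportion of clear space: the first spectrum peaks at a = d.
best = max((relative_brightness(1, k / 100, 1 - k / 100), k) for k in range(1, 100))
print(best[1])
```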
Another particular case of interest is obtained by supposing a small relatively to (a + d). Unless the spectrum be of very high order, we have simply
Bm : B = a²/(a + d)² (4);
so that the brightnesses of all the spectra are the same.
The light stopped by the opaque parts of the grating, together with that distributed in the central image and lateral spectra, ought to make up the brightness that would be found in the central image, were all the apertures transparent. Thus, if a = d, we should have
$$1 = \frac{1}{2} + \frac{1}{4} + \frac{2}{\pi^2}\left(1 + \frac{1}{9} + \frac{1}{25} + \cdots\right),$$
which is true by a known theorem. In the general case
$$\frac{a}{a+d} = \left(\frac{a}{a+d}\right)^2 + \frac{2}{\pi^2}\sum_{m=1}^{\infty}\frac{1}{m^2}\sin^2\frac{m\pi a}{a+d},$$
a formula which may be verified by Fourier's theorem.
According to a general principle formulated by J. Babinet, the brightness of a lateral spectrum is not affected by an interchange of the transparent and opaque parts of the grating. The vibrations corresponding to the two parts are precisely antagonistic, since if both were operative the resultant would be zero. So far as the application to gratings is concerned, the same conclusion may be derived from (2).
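Both the balance of light and Babinet's principle can be checked from (2) by direct summation. The sketch truncates the infinite series at a finite number of orders (an assumption of the numerical check, giving an error of a few parts in a million), and the widths chosen are arbitrary.

```python
import math


def lateral(m: int, a: float, d: float) -> float:
    """B_m / B for a lateral spectrum of order m != 0, equation (2)."""
    return math.sin(m * math.pi * a / (a + d)) ** 2 / (m * math.pi) ** 2


def accounted_light(a: float, d: float, orders: int = 100_000) -> float:
    """Central image plus spectra on both sides, truncated after `orders` orders."""
    central = (a / (a + d)) ** 2
    return central + 2 * sum(lateral(m, a, d) for m in range(1, orders + 1))


for a, d in ((1.0, 1.0), (1.0, 3.0), (0.3, 0.7)):
    transmitted = a / (a + d)  # fraction of the aperture left open
    print(a, d, transmitted, accounted_light(a, d))

# Babinet: interchanging clear and opaque parts leaves every lateral spectrum unchanged.
print(all(math.isclose(lateral(m, 1.0, 3.0), lateral(m, 3.0, 1.0), abs_tol=1e-15)
          for m in range(1, 20)))
```

The absolute tolerance is needed because orders that vanish exactly in theory come out as tiny rounding residues of different size.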
From the value of Bm : B0 we see that no lateral spectrum can surpass the central image in brightness; but this result depends upon the hypothesis that the ruling acts by opacity, which is generally very far from being the case in practice. In an engraved glass grating there is no opaque material present by which light could be absorbed, and the effect depends upon a difference of retardation in passing the alternate parts. It is possible to prepare gratings which give a lateral spectrum brighter than the central image, and the explanation is easy. For if the alternate parts were equal and alike transparent, but so constituted as to give a relative retardation of ½[lambda], it is evident that the central image would be entirely extinguished, while the first spectrum would be four times as bright as if the alternate parts were opaque. If it were possible to introduce at every part of the aperture of the grating an arbitrary retardation, all the light might be concentrated in any desired spectrum. By supposing the retardation to vary uniformly and continuously we fall upon the case of an ordinary prism: but there is then no diffraction spectrum in the usual sense. To obtain such it would be necessary that the retardation should gradually alter by a wave-length in passing over any element of the grating, and then fall back to its previous value, thus springing suddenly over a wave-length (_Phil. Mag._, 1874, 47, p. 193). It is not likely that such a result will ever be fully attained in practice; but the case is worth stating, in order to show that there is no theoretical limit to the concentration of light of assigned wave-length in one spectrum, and as illustrating the frequently observed unsymmetrical character of the spectra on the two sides of the central image.[4]
We have hitherto supposed that the light is incident perpendicularly upon the grating; but the theory is easily extended. If the incident rays make an angle [theta] with the normal (fig. 6), and the diffracted rays make an angle [phi] (upon the same side), the relative retardation from each element of width (a + d) to the next is (a + d) (sin[theta] + sin[phi]); and this is the quantity which is to be equated to m[lambda]. Thus
sin[theta] + sin[phi] = 2 sin ½([theta] + [phi]) cos ½([theta] - [phi]) = m[lambda]/(a + d) (5).
The "deviation" is ([theta] + [phi]), and is therefore a minimum when [theta] = [phi], i.e. when the grating is so situated that the angles of incidence and diffraction are equal.
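The minimum-deviation property can be confirmed by scanning the angle of incidence in (5). The wave-length and interval below are hypothetical values in arbitrary units.

```python
import math


def diffraction_angle(theta: float, m: int, wavelength: float, interval: float):
    """Solve equation (5) for phi; returns None when no spectrum of order m exists."""
    s = m * wavelength / interval - math.sin(theta)
    if abs(s) > 1:
        return None
    return math.asin(s)


def minimum_deviation(m: int, wavelength: float, interval: float, steps: int = 20000):
    """Scan angles of incidence and return (theta, phi) giving the least theta + phi."""
    best = None
    for i in range(steps + 1):
        theta = -math.pi / 2 + math.pi * i / steps
        phi = diffraction_angle(theta, m, wavelength, interval)
        if phi is None:
            continue
        if best is None or theta + phi < best[0] + best[1]:
            best = (theta, phi)
    return best


theta, phi = minimum_deviation(1, 0.5, 1.7)
print(math.degrees(theta), math.degrees(phi))  # nearly equal angles
```

The scan settles, to within its step size, on the position where the angles of incidence and diffraction are equal.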
In the case of a reflection grating the same method applies. If [theta] and [phi] denote the angles with the normal made by the incident and diffracted rays, the formula (5) still holds, and, if the deviation be reckoned from the direction of the regularly reflected rays, it is expressed as before by ([theta] + [phi]), and is a minimum when [theta] = [phi], that is, when the diffracted rays return upon the course of the incident rays.
In either case (as also with a prism) the position of minimum deviation leaves the width of the beam unaltered, i.e. neither magnifies nor diminishes the angular width of the object under view.
From (5) we see that, when the light falls perpendicularly upon a grating ([theta] = 0), there is no spectrum formed (the image corresponding to m = 0 not being counted as a spectrum), if the grating interval [sigma] or (a + d) is less than [lambda]. Under these circumstances, if the material of the grating be completely transparent, the whole of the light must appear in the direct image, and the ruling is not perceptible. From the absence of spectra Fraunhofer argued that there must be a microscopic limit represented by [lambda]; and the inference is plausible, to say the least (_Phil. Mag._, 1886). Fraunhofer should, however, have fixed the microscopic limit at ½[lambda], as appears from (5), when we suppose [theta] = ½[pi], [phi] = ½[pi].
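Both limits follow from (5) by asking when |sin[theta] + sin[phi]| can reach m[lambda]/[sigma]. A minimal sketch, with illustrative function names and lengths in units of the wave-length:

```python
import math


def orders_at_normal_incidence(interval: float, wavelength: float) -> list[int]:
    """Non-zero orders m allowed by (5) with theta = 0, i.e. |m| * lambda <= interval."""
    m_max = math.floor(interval / wavelength)
    return [m for m in range(-m_max, m_max + 1) if m != 0]


def finest_interval_with_spectrum(wavelength: float) -> float:
    """Smallest interval giving a first spectrum at all: theta = phi = pi/2 in (5)."""
    return wavelength / (math.sin(math.pi / 2) + math.sin(math.pi / 2))


print(orders_at_normal_incidence(0.9, 1.0))  # interval below lambda: no spectra
print(orders_at_normal_incidence(2.5, 1.0))
print(finest_interval_with_spectrum(1.0))
```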
We will now consider the important subject of the resolving power of gratings, as dependent upon the number of lines (n) and the order of the spectrum observed (m). Let BP (fig. 8) be the direction of the principal maximum (middle of central band) for the wave-length [lambda] in the m^th spectrum. Then the relative retardation of the extreme rays (corresponding to the edges A, B of the grating) is mn[lambda]. If BQ be the direction for the first minimum (the darkness between the central and first lateral band), the relative retardation of the extreme rays is (mn + 1)[lambda]. Suppose now that [lambda] + [delta][lambda] is the wave-length for which BQ gives the principal maximum, then
(mn + 1)[lambda] = mn([lambda] + [delta][lambda]);
whence
[delta][lambda]/[lambda] = 1/mn (6).
According to our former standard, this gives the smallest difference of wave-lengths in a double line which can be just resolved; and we conclude that the resolving power of a grating depends only upon the total number of lines, and upon the order of the spectrum, without regard to any other considerations. It is here of course assumed that the n lines are really utilized.
In the case of the D lines the value of [delta][lambda]/[lambda] is about 1/1000; so that to resolve this double line in the first spectrum requires 1000 lines, in the second spectrum 500, and so on.
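Equation (6) gives the number of lines needed directly. This sketch uses the standard air wave-lengths of the sodium D lines (588.995 and 589.592 nm); the rounding up to a whole number of lines is a choice of the example.

```python
import math

D2_NM = 588.995  # sodium D2
D1_NM = 589.592  # sodium D1


def lines_needed(order: int, lam1: float = D2_NM, lam2: float = D1_NM) -> int:
    """Smallest n with m*n >= lambda / delta-lambda, from equation (6)."""
    mean = 0.5 * (lam1 + lam2)
    return math.ceil(mean / (abs(lam2 - lam1) * order))


print([lines_needed(m) for m in (1, 2, 3)])
```

The first spectrum needs just under 1000 lines and the second about half as many, as stated above.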
It is especially to be noticed that the resolving power does not depend directly upon the closeness of the ruling. Let us take the case of a grating 1 in. broad, and containing 1000 lines, and consider the effect of interpolating an additional 1000 lines, so as to bisect the former intervals. There will be destruction by interference of the first, third and odd spectra generally; while the advantage gained in the spectra of even order is not in dispersion, nor in resolving power, but simply in brilliancy, which is increased four times. If we now suppose half the grating cut away, so as to leave 1000 lines in half an inch, the dispersion will not be altered, while the brightness and resolving power are halved.
There is clearly no theoretical limit to the resolving power of gratings, even in spectra of given order. But it is possible that, as suggested by Rowland,[5] the structure of natural spectra may be too coarse to give opportunity for resolving powers much higher than those now in use. However this may be, it would always be possible, with the aid of a grating of given resolving power, to construct artificially from white light mixtures of slightly different wave-length whose resolution or otherwise would discriminate between powers inferior and superior to the given one.[6]
If we define as the "dispersion" in a particular part of the spectrum the ratio of the angular interval d[theta] to the corresponding increment of wave-length d[lambda], we may express it by a very simple formula. For the alteration of wave-length entails, at the two limits of a diffracted wave-front, a relative retardation equal to mnd[lambda]. Hence, if a be the width of the diffracted beam, and d[theta] the angle through which the wave-front is turned,
ad[theta] = mn d[lambda],
or dispersion = mn/a (7).
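Equations (6) and (7) make the earlier remarks on interpolating and cutting away lines easy to check. The sketch follows the 1 in., 1000-line example; the function name is illustrative, and brilliancy is not modelled.

```python
def grating_figures(order: int, lines: int, width_in: float) -> tuple[int, float]:
    """Resolving power m*n, equation (6), and dispersion m*n/a, equation (7)."""
    resolving_power = order * lines
    return resolving_power, resolving_power / width_in


original = grating_figures(2, 1000, 1.0)  # 1000 lines in 1 in., second spectrum
bisected = grating_figures(1, 2000, 1.0)  # lines interpolated: old 2nd order is new 1st
halved = grating_figures(1, 1000, 0.5)    # half of the bisected grating cut away
print(original, bisected, halved)
```

Interpolation leaves resolving power and dispersion unchanged; cutting the grating in half keeps the dispersion but halves the resolving power.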
The resolving power and the width of the emergent beam fix the optical character of the instrument. The latter element must eventually be decreased until less than the diameter of the pupil of the eye. Hence a wide beam demands treatment with further apparatus (usually a telescope) of high magnifying power.
In the above discussion it has been supposed that the ruling is accurate, and we have seen that by increase of m a high resolving power is attainable with a moderate number of lines. But this procedure (apart from the question of illumination) is open to the objection that it makes excessive demands upon accuracy. According to the principle already laid down it can make but little difference in the principal direction corresponding to the first spectrum, provided each line lie within a quarter of an interval (a + d) from its theoretical position. But, to obtain an equally good result in the m^th spectrum, the error must be less than 1/m of the above amount.[7]
There are certain errors of a systematic character which demand special consideration. The spacing is usually effected by means of a screw, to each revolution of which corresponds a large number (e.g. one hundred) of lines. In this way it may happen that although there is almost perfect periodicity with each revolution of the screw after (say) 100 lines, yet the 100 lines themselves are not equally spaced. The "ghosts" thus arising were first described by G. H. Quincke (_Pogg. Ann._, 1872, 146, p. 1), and have been elaborately investigated by C. S. Peirce (_Amer. Journ. Math._, 1879, 2, p. 330), both theoretically and experimentally. The general nature of the effects to be expected in such a case may be made clear by means of an illustration already employed for another purpose. Suppose two similar and accurately ruled transparent gratings to be superposed in such a manner that the lines are parallel. If the one set of lines exactly bisect the intervals between the others, the grating interval is practically halved, and the previously existing spectra of odd order vanish. But a very slight relative displacement will cause the apparition of the odd spectra. In this case there is approximate periodicity in the half interval, but complete periodicity only after the whole interval. The advantage of approximate bisection lies in the superior brilliancy of the surviving spectra; but in any case the compound grating may be considered to be perfect in the longer interval, and the definition is as good as if the bisection were accurate.
The effect of a gradual increase in the interval (fig. 9) as we pass across the grating has been investigated by M. A. Cornu (_C.R._, 1875, 80, p. 655), who thus explains an anomaly observed by E. E. N. Mascart. The latter found that certain gratings exercised a converging power upon the spectra formed upon one side, and a corresponding diverging power upon the spectra on the other side. Let us suppose that the light is incident perpendicularly, and that the grating interval increases from the centre towards that edge which lies nearest to the spectrum under observation, and decreases towards the hinder edge. It is evident that the waves from _both_ halves of the grating are accelerated in an increasing degree, as we pass from the centre outwards, as compared with the phase they would possess were the central value of the grating interval maintained throughout. The irregularity of spacing has thus the effect of a convex lens, which accelerates the marginal relatively to the central rays. On the other side the effect is reversed. This kind of irregularity may clearly be present in a degree surpassing the usual limits, without loss of definition, when the telescope is focused so as to secure the best effect.
It may be worth while to examine further the other variations from correct ruling which correspond to the various terms expressing the deviation of the wave-surface from a perfect plane. If x and y be co-ordinates in the plane of the wave-surface, the axis of y being parallel to the lines of the grating, and the origin corresponding to the centre of the beam, we may take as an approximate equation to the wave-surface
$$z = \frac{x^2}{2\rho} + Bxy + \frac{y^2}{2\rho'} + \alpha x^3 + \beta x^2 y + \gamma x y^2 + \delta y^3 + \cdots \qquad (8);$$
and, as we have just seen, the term in x² corresponds to a linear error in the spacing. In like manner, the term in y² corresponds to a general _curvature_ of the lines (fig. 10), and does not influence the definition at the (primary) focus, although it may introduce astigmatism.[8] If we suppose that everything is symmetrical on the two sides of the primary plane y = 0, the coefficients B, [beta], [delta] vanish. In spite of any inequality between [rho] and [rho]', the definition will be good to this order of approximation, provided [alpha] and [gamma] vanish. The former measures the _thickness_ of the primary focal line, and the latter measures its _curvature_. The error of ruling giving rise to [alpha] is one in which the intervals increase or decrease in _both_ directions from the centre outwards (fig. 11), and it may often be compensated by a slight rotation in azimuth of the object-glass of the observing telescope. The term in [gamma] corresponds to a _variation_ of curvature in crossing the grating (fig. 12).
When the plane zx is not a plane of symmetry, we have to consider the terms in xy, x²y, and y³. The first of these corresponds to a deviation from parallelism, causing the interval to alter gradually as we pass _along_ the lines (fig. 13). The error thus arising may be compensated by a rotation of the object-glass about one of the diameters y = ± x. The term in x²y corresponds to a deviation from parallelism in the same direction on both sides of the central line (fig. 14); and that in y³ would be caused by a curvature such that there is a point of inflection at the middle of each line (fig. 15).
All the errors, except that depending on [alpha], and especially those depending on [gamma] and [delta], can be diminished, without loss of resolving power, by contracting the _vertical_ aperture. A linear error in the spacing, and a general curvature of the lines, are eliminated in the ordinary use of a grating.
The explanation of the difference of focus upon the two sides as due to unequal spacing was verified by Cornu upon gratings purposely constructed with an increasing interval. He has also shown how to rule a plane surface with lines so disposed that the grating shall of itself give well-focused spectra.
A similar idea appears to have guided H. A. Rowland to his brilliant invention of concave gratings, by which spectra can be photographed without any further optical appliance. In these instruments the lines are ruled upon a spherical surface of speculum metal, and mark the intersections of the surface by a system of parallel and equidistant planes, of which the middle member passes through the centre of the sphere. If we consider for the present only the primary plane of symmetry, the figure is reduced to two dimensions. Let AP (fig. 16) represent the surface of the grating, O being the centre of the circle. Then, if Q be any radiant point and Q' its image (primary focus) in the spherical mirror AP, we have
$$\frac{1}{v_1} + \frac{1}{u} = \frac{2}{a\cos\phi},$$
where v1 = AQ', u = AQ, a = OA, [phi] = angle of incidence QAO, equal to the angle of reflection Q'AO. If Q be on the circle described upon OA as diameter, so that u = a cos [phi], then Q' lies also upon the same circle; and in this case it follows from the symmetry that the unsymmetrical aberration (depending upon [alpha]) vanishes.
This disposition is adopted in Rowland's instrument; only, in addition to the central image formed at the angle [phi]' = [phi], there are a series of spectra with various values of [phi]', but all disposed upon the same circle. Rowland's investigation is contained in the paper already referred to; but the following account of the theory is in the form adopted by R. T. Glazebrook (_Phil. Mag._, 1883).
In order to find the difference of optical distances between the courses QAQ', QPQ', we have to express QP - QA, PQ' - AQ'. To find the former, we have, if OAQ = [phi], AOP = [omega],
QP² = u² + 4a²sin²½[omega] - 4au sin ½[omega] sin (½[omega] - [phi]) = (u + a sin[phi] sin[omega])² - a² sin²[phi] sin²[omega] + 4a sin² ½[omega](a - u cos[phi]).
Now as far as [omega]^4
4 sin² ½[omega] = sin²[omega] + ¼sin^4[omega],
and thus to the same order
QP² = (u + a sin [phi] sin [omega])² -a cos [phi](u - a cos [phi]) sin²[omega] + ¼ a(a - u cos[phi]) sin^4 [omega].
But if we now suppose that Q lies on the circle u = a cos [phi], the middle term vanishes, and we get, correct as far as [omega]^4,
$$QP = (u + a\sin\phi\sin\omega)\sqrt{1 + \frac{a^2\sin^2\phi\,\sin^4\omega}{4u^2}};$$
so that
QP - u = a sin [phi] sin [omega] + 1/8 a sin[phi] tan[phi] sin^4 [omega] (9),
in which it is to be noticed that the adjustment necessary to secure the disappearance of sin²[omega] is sufficient also to destroy the term in sin³[omega].
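Expansion (9) can be tested against the exact distance computed from coordinates. The sketch places O at the origin and A at (a, 0), with Q on the opposite side of OA from P; these coordinates and the function names are assumptions of the example, chosen to reproduce the expression for QP² above.

```python
import math


def path_excess_exact(a: float, phi: float, omega: float) -> float:
    """QP - QA from coordinates, with Q on the circle u = a cos(phi)."""
    u = a * math.cos(phi)
    qx, qy = a - u * math.cos(phi), -u * math.sin(phi)  # Q, below the line OA
    px, py = a * math.cos(omega), a * math.sin(omega)   # P, at angle AOP = omega
    return math.hypot(px - qx, py - qy) - u


def path_excess_series(a: float, phi: float, omega: float) -> tuple[float, float]:
    """The first- and fourth-order terms of equation (9)."""
    first = a * math.sin(phi) * math.sin(omega)
    fourth = a * math.sin(phi) * math.tan(phi) * math.sin(omega) ** 4 / 8
    return first, fourth


a, phi = 1.0, 0.5
for omega in (0.04, 0.02, 0.01):
    first, fourth = path_excess_series(a, phi, omega)
    residual = path_excess_exact(a, phi, omega) - first
    print(omega, residual / fourth)  # tends to 1: no second- or third-order term survives
```

Once the first-order term is removed, what remains is accounted for by the fourth-order term alone, with a relative discrepancy that shrinks in proportion to [omega].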
A similar expression can be found for Q'P - Q'A; and thus, if Q'A = v, Q'AO = [phi]', where v = a cos [phi]', we get
QP + PQ' - QA -AQ' = a sin[omega] (sin[phi] - sin[phi]') + 1/8 a sin^4 [omega] (sin[phi] tan[phi] + sin[phi]' tan[phi]') (10).
If [phi]' = [phi], the term of the first order vanishes, and the reduction of the difference of path _via_ P and _via_ A to a term of the fourth order proves not only that Q and Q' are conjugate foci, but also that the foci are exempt from the most important term in the aberration. In the present application [phi]' is not necessarily equal to [phi]; but if P correspond to a line upon the grating, the difference of retardations for consecutive positions of P, so far as expressed by the term of the first order, will be equal to [-+] m[lambda] (m integral), and therefore without influence, provided
[sigma] (sin[phi] - sin[phi]') = ± m[lambda] (11),
where [sigma] denotes the constant interval between the planes containing the lines. This is the ordinary formula for a reflecting plane grating, and it shows that the spectra are formed in the usual directions. They are here focused (so far as the rays in the primary plane are concerned) upon the circle OQ'A, and the outstanding aberration is of the fourth order.
In order that a large part of the field of view may be in focus at once, it is desirable that the locus of the focused spectrum should be nearly perpendicular to the line of vision. For this purpose Rowland places the eye-piece at O, so that φ = 0, and then by (11) the value of φ' in the mth spectrum is given by
$$\sigma\sin\phi' = \pm m\lambda. \tag{12}$$
If ω now relates to the edge of the grating, on which there are n lines altogether,
$$n\sigma = 2a\sin\omega,$$
and, since φ = 0, the value of the last term in (10) becomes
$$\tfrac{1}{16}n\sigma\sin^3\omega\sin\phi'\tan\phi',$$
or, by (12),
$$\tfrac{1}{16}mn\lambda\sin^3\omega\tan\phi'. \tag{13}$$
This expresses the retardation of the extreme ray relatively to the central ray, and is to be reckoned positive, whatever the signs of ω and φ'. If the semi-angular aperture ω is 1/100, so that sin³ω is about 10⁻⁶, and tan φ' = 1, then mn might be as great as four million before the error of phase would reach ¼λ. If it were desired to use an angular aperture so large that the aberration according to (13) would be injurious, Rowland points out that on his machine there would be no difficulty in applying a remedy by making σ slightly variable towards the edges. Or, retaining σ constant, we might attain compensation by so polishing the surface as to bring the circumference slightly forward in comparison with the position it would occupy upon a true sphere.
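The four-million figure can be reproduced from (13). The Python sketch below, with hypothetical function names, expresses the edge retardation in wavelengths and inverts it to find the largest product mn that keeps the error of phase under a chosen tolerance (¼λ by default, as in the text). It assumes ω = 0.01 radian for the semi-aperture and φ' = 45°, so that tan φ' = 1.

```python
import math

def edge_retardation_in_waves(m, n, omega, phi_p):
    """Equation (13) divided by the wavelength: (1/16) m n sin^3(omega) tan(phi').

    Taken as positive whatever the signs of omega and phi'.
    """
    return abs(m * n * math.sin(omega) ** 3 * math.tan(phi_p)) / 16

def max_mn(omega, phi_p, tolerance=0.25):
    """Largest m*n keeping the edge retardation below `tolerance` wavelengths."""
    return 16 * tolerance / abs(math.sin(omega) ** 3 * math.tan(phi_p))

# Semi-aperture 1/100 radian and tan(phi') = 1, as in the example above.
limit = max_mn(omega=0.01, phi_p=math.pi / 4)
print(f"mn may reach about {limit:.3e}")
print(edge_retardation_in_waves(1, limit, 0.01, math.pi / 4))
```

This gives mn ≈ 4.0 × 10⁶, in agreement with the estimate of four million.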
It may be remarked that these calculations apply to the rays in the primary plane only. The image is greatly affected with astigmatism; but this is of little consequence if γ in (8) is small enough. Since curvature of the primary focal line has a very injurious effect upon definition, it may be inferred from the excellent performance of these gratings that γ is in fact small. Its value does not appear to have been calculated. The other coefficients in (8) vanish by virtue of the symmetry.
The mechanical arrangements for maintaining the focus are of great simplicity. The grating at A and the eye-piece at O are rigidly attached to a bar AO, whose ends rest on carriages moving on rails OQ, AQ at right angles to each other. Since the angle AQO is always a right angle, Q remains on the circle described on AO as diameter, which is the circle u = a cos φ required above. A tie between the middle point of the bar AO and Q can be used if thought desirable.
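Why the right-angled rails keep everything in focus can be checked with a small geometric sketch in Python. It assumes Q sits at the origin, with rail QO along the x-axis and rail QA along the y-axis, a bar AO of length a, and a hypothetical slide parameter theta. For every position it reports QA, a cos(QAO), and the distance from the midpoint of AO to Q. The first two should coincide, and the third should stay at a/2, which is why a tie of fixed length works.

```python
import math

def rowland_mounting(a, theta):
    """Return (QA, a*cos(angle QAO), distance from midpoint of AO to Q).

    Q is the junction of the rails at the origin; theta (hypothetical)
    says how far the bar AO of length a has slid along the rails.
    """
    Q = (0.0, 0.0)
    O = (a * math.cos(theta), 0.0)  # eye-piece carriage on rail QO
    A = (0.0, a * math.sin(theta))  # grating carriage on rail QA
    # Cosine of the angle QAO, from the vectors A->Q and A->O.
    aq = (Q[0] - A[0], Q[1] - A[1])
    ao = (O[0] - A[0], O[1] - A[1])
    cos_phi = (aq[0] * ao[0] + aq[1] * ao[1]) / (math.hypot(*aq) * math.hypot(*ao))
    midpoint = ((A[0] + O[0]) / 2, (A[1] + O[1]) / 2)
    return math.dist(Q, A), a * cos_phi, math.dist(Q, midpoint)

for theta in (0.2, 0.7, 1.2):
    qa, a_cos_phi, tie = rowland_mounting(1.0, theta)
    print(f"theta={theta}: QA={qa:.6f}, a cos(phi)={a_cos_phi:.6f}, tie={tie:.6f}")
```

The coincidence of QA and a cos φ is Thales' theorem applied to the right angle at Q.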
The absence of chromatic aberration gives a great advantage in the comparison of overlapping spectra, which Rowland has turned to excellent account in his determinations of the relative wave-lengths of lines in the solar spectrum (_Phil. Mag._, 1887).
For absolute determinations of wave-lengths plane gratings are used. It is found (Bell, _Phil. Mag._, 1887) that the angular measurements present less difficulty than the comparison of the grating interval with the standard metre. There is also some uncertainty as to the actual temperature of the grating when in use. In order to minimize the heating action of the light, it might be submitted to a preliminary prismatic analysis before it reaches the slit of the spectrometer, after the manner of Helmholtz.
In spite of the many improvements introduced by Rowland and of the care with which his observations were made, recent workers have come to the conclusion that errors of unexpected amount have crept into his measurements of wave-lengths, and there is even a disposition to discard the grating altogether for fundamental work in favour of the so-called "interference methods," as developed by A. A. Michelson, and by C. Fabry and J. B. Pérot. The grating would in any case retain its utility for the reference of new lines to standards otherwise fixed. For such standards a relative accuracy of at least one part in a million seems now to be attainable.
Since the time of Fraunhofer many skilled mechanicians have given their attention to the ruling of gratings. Those of Nobert were employed by A. J. Ångström in his celebrated researches upon wave-lengths. L. M. Rutherfurd introduced into common use the reflection grating, finding that speculum metal was less trying than glass to the diamond point, upon the permanence of which so much depends. In Rowland's dividing engine the screws were prepared by a special process devised by him, and the resulting gratings, plane and concave, have supplied the means for much of the best modern optical work. It would seem, however, that further improvements are not excluded.
There are various copying processes by which it is possible to reproduce an original ruling in more or less perfection. The earliest is that of Quincke, who coated a glass grating with a chemical silver deposit, subsequently thickened with copper in an electrolytic bath. The metallic plate thus produced formed, when stripped from its support, a reflection grating reproducing many of the characteristics of the original. It is best to commence the electrolytic thickening in a silver acetate bath. At the present time excellent reproductions of Rowland's speculum gratings are on the market (Thorp, Ives, Wallace), prepared, after a suggestion of Sir David Brewster, by coating the original with a varnish, e.g. of celluloid. Much skill is required to secure that the film when stripped shall remain undeformed.
A much easier method, applicable to glass originals, is that of photographic reproduction by contact printing. In several papers dating from 1872, Lord Rayleigh (see _Collected Papers_, i. 157, 160, 199, 504;