The Great Newline Schism

One way to overcome the line-ending problem when working on multiple operating systems is to use your revision control system.

For example, Subversion can be configured to automatically check out files with the correct line endings for the platform you are running on. If you check in a file on Windows using CRLF line endings and then check out the same file under Unix, Subversion gives you LF endings instead.

See here for further information:
http://svnbook.red-bean.com/en/1.4/svn.advanced.props.file-portability.html
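
For anyone who wants to try this: the behaviour is driven by the svn:eol-style property, which you can set on a single file with "svn propset svn:eol-style native <file>". A minimal sketch of a client-side config fragment (typically ~/.subversion/config) that applies it to newly added files automatically; the *.txt and *.c patterns are only examples, so pick whatever fits your project:

```
[miscellany]
enable-auto-props = yes

[auto-props]
# Example patterns only; add one line per file type you care about.
*.txt = svn:eol-style=native
*.c = svn:eol-style=native
```

Note that auto-props only affect files added after the setting is in place; existing files still need the property set by hand.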

I wonder what POSIX has to say about this?

I did a quick “newline or linefeed” search into the word query, but nothing came up for me, except various names of header files.

Is this behavior defined anywhere as being a UNIX platform standard?

We are getting dragged by backward-compatibility issues, that’s why changes are slow.

Here’s how to get rid of those pesky CRLF newlines…

$ sed 's/.$//'
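
One caveat with the sed one-liner above: 's/.$//' deletes the last character of every line, whatever it is, so a file that is already LF-only (or has mixed endings) loses real characters. Here is a minimal sketch of a more cautious variant that only removes a trailing CR. The function name strip_cr and the file names in the comment are made up for illustration:

```shell
# strip_cr: read text on stdin and remove one trailing CR from each line.
# Lines that already end in a bare LF pass through unchanged.
strip_cr() {
    # Build a literal CR character; writing \r inside a sed pattern
    # is not portable to every sed implementation.
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}

# Hypothetical usage:
#   strip_cr < dosfile.txt > unixfile.txt
```

It costs one extra line compared with the original, but it is safe to run twice on the same file.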

After which I’d recommend something like “format /x /q c:” and the subsequent installation of a REAL OS … something which has said “sed” installed, for example. :stuck_out_tongue:

Wow, I just noticed that this is a non-issue with Windows 7. Just use Unix newlines. The only application (of the few tested) I have found which does not understand Unix newlines as newlines is the useless Notepad. For instance, it seems the following applications understand Unix newlines just fine in Windows 7:

  • cmd scripts
  • powershell scripts
  • word 2013 (I can open a txt file with unix-newlines, though I never
    use that, I can also paste text with unix-newlines and get
    correct/desired line breaking)
  • OneNote 2013 (pasting text)
  • wordpad (not that I use it)
  • Sublime Text 3 (naturally, just on the list because it is the best! :slight_smile:)
  • Eclipse

That cmd scripts work with Unix newlines was the most surprising and crucial feature for me. There are bound to be gotchas that may be discovered over the years, but so far so good. I think Microsoft is trying to help here…

PS: I trust it's worthwhile bumping the issue after 3+ years, since I cannot find this information with Google, and I trust almost everybody watching this topic is still interested in getting rid of the newline gotchas.

I understand that CR/LF was used by the old Model 33 teleprinters, which transmitted data at 10 characters/second but took 2/10ths of a second to do a CR, so the LF was added as padding to give the print head time to return. The CP/M OS used CR/LF so that it could use the 33 as a printer. DOS followed suit to be compatible with CP/M. Thus Windows 10 uses CR/LF to maintain compatibility with the 1963 Model 33 teletype machine.

I’ll have to look into the \ vs / thing

I believe the convention used for command line options on CP/M was to use the forward slash character to begin each switch (instead of single hyphen or double hyphen or keyword based options). Additionally, disks on CP/M didn’t (usually) support anything resembling a directory hierarchy. Thus, to maintain as much appearance of compatibility with CP/M as possible, a different character was needed as a path separator so that a directory would not be mistaken for a command line option, so Windows continues to use backslash to this day.

If it was truly modeled after what happens on a typewriter, it would have been [LF] [CR], as when you hit the lever, first the roller is turned up as many spaces as the spacing switch was set for (1, 2, or 3), and then the lever hits the stop and the exerted force serves to return the carriage to the right. That it is [CR][LF] is definitely a testament to the old teletypes (a.k.a. TTYs (sound familiar?)) which took a couple of times longer to perform a CR than an LF.

Everyone needs to just standardise on LF for files. CR is sometimes useful for outputting objects such as progress bars on a terminal emulator.
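
To illustrate the progress-bar use: printing a CR without an LF moves the cursor back to column one, so the next write overdraws the same line. A rough sketch in plain sh follows. The function name progress and its single step-count argument are made up for this example, and it assumes the count is a positive integer:

```shell
# progress N: print a percentage N+1 times, each one overwriting the last.
# N must be a positive integer (N = 0 would divide by zero).
progress() {
    i=0
    while [ "$i" -le "$1" ]; do
        # \r returns to the start of the line; no \n, so we stay on it.
        printf '\r%3d%%' $(( i * 100 / $1 ))
        i=$(( i + 1 ))
    done
    # Finish with a real newline so the prompt starts on a fresh line.
    printf '\n'
}
```

Calling "progress 20" in a terminal shows a single line counting up to 100%. Redirect the output to a file and you will see every intermediate value with CRs in between, which is exactly why bare CR belongs on terminals and not in files.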

Also, seeing as the pathname convention for UNIX is no longer under any enforceable action (otherwise Linux, BSD, and every other UNIX-alike which does not bear the UNIX trademark would be issued a C&D and we’d have to resort to using something stupid like the colon for path separation (…is there a holding company for Data General who could come after us for that?)), Windows should just switch from backslash to slash for paths, and from slash to dash for options.

It’s kind of funny that ASCII changed [LF] back to [NL] after having changed it to [LF] in the first place.

Oh, and DAMN UNICODE to whichever hell whence it came.
