submodule--helper: do not call utf8_fprintf() unnecessarily
The helper function utf8_fprintf(fp, ...) has exactly the same
effect to the output stream fp as fprintf(fp, ...) does, and the
only difference is that its return value counts in display columns
consumed (assuming that the payload is encoded in UTF-8), as opposed
to number of bytes.
There is no reason to call it unless the caller cares about its
return value.
GNU grep allows "\(A\|B\)" as alternation in BRE, but this is an
extension not understood by some other implementations of grep
(Michael Kebe reported an breakage on Solaris).
Rewrite the offending test to ERE and use egrep instead.
Noticed-by: Michael Kebe <michael.kebe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Defining USE_ASCIIDOCTOR=1 when building Git uses asciidoctor over
asciidoc when generating DocBook and man page documentation. However,
the contrib/subtree module does not presently honour that flag.
This causes a build failure when asciidoc is not present on the build
system. Instead, adapt the main Documentation/Makefile logic to use
asciidoctor when requested.
Signed-off-by: A. Wilcox <AWilcox@Wilcox-Tech.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply: check git diffs for mutually exclusive header lines
A file can either be added, removed, copied, or renamed, but no two of
these actions can be done by the same patch. Some of these combinations
provoke error messages due to missing file names, and some are only
caught by an assertion. Check git patches already as they are parsed
and report conflicting lines on sight.
Found by Vegard Nossum using AFL.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
An empty string as mode specification is accepted silently by git apply,
as Vegard Nossum found out using AFL. It's interpreted as zero. Reject
such bogus file modes, and only accept ones consisting exclusively of
octal digits.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2c93286a (fix "git apply --index ..." not to deref NULL) added a check
for git patches missing a +++ line, preventing a segfault. Check for
missing --- lines as well, and add a test for each case.
Found by Vegard Nossum using AFL.
Original-patch-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add -p" were updated in 2.12 timeframe to cope with custom
core.commentchar but the implementation was buggy and a
metacharacter like $ and * did not work.
The preceding bitmap entries have a 1-byte XOR-offset and 1-byte flags,
so their size is not a multiple of 4. Thus the name-hash cache is only
guaranteed to be 2-byte aligned and so we must use get_be32 rather than
indexing the array directly.
Signed-off-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, regardless of the contents of the "struct object_info" passed
to sha1_object_info_extended(), that function always accesses the
packfile whenever it returns information about a packed object, since it
needs to populate "u.packed".
Add the ability to pass NULL, and use NULL-ness of the argument to
activate an optimization in which sha1_object_info_extended() does not
needlessly access the packfile. A subsequent patch will make use of this
optimization.
A similar optimization is not made for the cached and loose cases as it
would not cause a significant performance improvement.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file: teach sha1_object_info_extended more flags
Improve sha1_object_info_extended() by supporting additional
flags. This allows has_sha1_file_with_flags() to be modified to use
sha1_object_info_extended() in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1700: make sure split-index respects core.sharedrepository
Add a few tests to check that both the split-index file and the
shared-index file are created using the right permissions when
core.sharedrepository is set.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
read-cache: use shared perms when writing shared index
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10)
write_shared_index() has been using mks_tempfile() to create the
temporary file that will become the shared index.
But even before that, it looks like the functions used to create this
file didn't call adjust_shared_perm(), which means that the shared
index file has always been created with 600 permissions regardless
of the shared permission settings.
Because of that, on repositories created with `git init --shared=all`
and using the split index feature, one gets an error like:
Merge branch 'sg/revision-parser-skip-prefix' into maint
Code clean-up.
* sg/revision-parser-skip-prefix:
revision.c: use skip_prefix() in handle_revision_pseudo_opt()
revision.c: use skip_prefix() in handle_revision_opt()
revision.c: stricter parsing of '--early-output'
revision.c: stricter parsing of '--no-{min,max}-parents'
revision.h: turn rev_info.early_output back into an unsigned int
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.
* jk/diff-blob:
diff: use blob path for blob/file diffs
diff: use pending "path" if it is available
diff: use the word "path" instead of "name" for blobs
diff: pass whole pending entry in blobinfo
handle_revision_arg: record paths for pending objects
handle_revision_arg: record modes for "a..b" endpoints
t4063: add tests of direct blob diffs
get_sha1_with_context: dynamically allocate oc->path
get_sha1_with_context: always initialize oc->symlink_path
sha1_name: consistently refer to object_context as "oc"
handle_revision_arg: add handle_dotdot() helper
handle_revision_arg: hoist ".." check out of range parsing
handle_revision_arg: stop using "dotdot" as a generic pointer
handle_revision_arg: simplify commit reference lookups
handle_revision_arg: reset "dotdot" consistently
"git describe --contains" penalized light-weight tags so much that
they were almost never considered. Instead, give them about the
same chance to be considered as an annotated tag that is the same
age as the underlying commit would.
* jc/name-rev-lw-tag:
name-rev: favor describing with tags and use committer date to tiebreak
name-rev: refactor logic to see if a new candidate is a better name
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Using "git add d/i/r" when d/i/r is the top of the working tree of
a separate repository would create a gitlink in the index, which
would appear as a not-quite-initialized submodule to others. We
learned to give warnings when this happens.
* jk/warn-add-gitlink:
t: move "git add submodule" into test blocks
add: warn when adding an embedded repository
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
The code to pick up and execute command alias definition from the
configuration used to switch to the top of the working tree and
then come back when the expanded alias was executed, which was
unnecessarilyl complex. Attempt to simplify the logic by using the
early-config mechanism that does not chdir around.
* js/alias-early-config:
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
The pretty-format specifiers like '%h', '%t', etc. had an
optimization that no longer works correctly. In preparation/hope
of getting it correctly implemented, first discard the optimization
that is broken.
* rs/pretty-add-again:
pretty: recalculate duplicate short hashes
A new test to show the interaction between the pattern [^a-z]
(which matches '/') and a slash in a path has been added. The
pattern should not match the slash with "pathmatch", but should
with "wildmatch".
* ab/wildmatch-glob-slash-test:
wildmatch test: cover a blind spot in "/" matching
doc: clarify syntax for %C(auto,...) in pretty formats
The manual correctly describes the syntax with `auto,` but the
trailing `,` is hard to spot in a terminal. The HTML format does not
have this problem. Adding an example helps both worlds.
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf.h comment: discuss strbuf_addftime() arguments in order
Change the comment documenting the strbuf_addftime() function to
discuss the parameters in the order in which they appear, which makes
this easier to read than discussing them out of order.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file: guard against invalid loose subdirectory numbers
Loose object subdirectories have hexadecimal names based on the first
byte of the hash of contained objects, thus their numerical
representation can range from 0 (0x00) to 255 (0xff). Change the type
of the corresponding variable in for_each_file_in_obj_subdir() and
associated callback functions to unsigned int and add a range check.
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_file: let for_each_file_in_obj_subdir() handle subdir names
The function for_each_file_in_obj_subdir() takes a object subdirectory
number and expects the name of the same subdirectory to be included in
the path strbuf. Avoid this redundancy by letting the function append
the hexadecimal subdirectory name itself. This makes it a bit easier
and safer to use the function -- it becomes impossible to specify
different subdirectories in subdir_nr and path.
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the unused wildopts placeholder struct from being passed to all
wildmatch() invocations, or rather remove all the boilerplate NULL
parameters.
This parameter was added back in commit 9b3497cab9 ("wildmatch: rename
constants and update prototype", 2013-01-01) as a placeholder for
future use. Over 4 years later nothing has made use of it, let's just
remove it. It can be added in the future if we find some reason to
start using such a parameter.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce 'repo_submodule_init()' which performs initialization of a
'struct repository' as a submodule of another 'struct repository'.
The resulting submodule 'struct repository' can be in one of three states:
1. The submodule is initialized and has a worktree.
2. The submodule is initialized but does not have a worktree. This
would occur when the submodule's gitdir is present in the
superproject's 'gitdir/modules/' directory yet the submodule has not
been checked out in superproject's worktree.
3. The submodule remains uninitialized due to an error in the
initialization process or there is no matching submodule at the
provided path in the superproject.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: convert is_submodule_initialized to work on a repository
Convert 'is_submodule_initialized()' to take a repository object and
while we're at it, lets rename the function to 'is_submodule_active()'
and remove the NEEDSWORK comment.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule-config: store the_submodule_cache in the_repository
Refactor how 'the_submodule_cache' is handled so that it can be stored
inside of a repository object. Also migrate 'the_submodule_cache' to be
stored in 'the_repository'.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the config machinery to read config information from a repository
object. This involves storing a 'struct config_set' inside the
repository object and adding a number of functions (repo_config*) to be
able to query a repository's config.
The current config API enables lazy-loading of the config. This means
that when 'git_config_get_int()' is called, if the_config_set hasn't
been populated yet, then it will be populated and properly initialized by
reading the necessary config files (system wide .gitconfig, user's home
.gitconfig, and the repository's config). To maintain this paradigm,
the new API to read from a repository object's config will also perform
this lazy-initialization.
Since both APIs (git_config_get* and repo_config_get*) have the same
semantics we can migrate the default config to be stored within
'the_repository' and just have the 'git_config_get*' family of functions
redirect to the 'repo_config_get*' functions.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: add repo_worktree_path and strbuf_repo_worktree_path
Introduce 'repo_worktree_path' and 'strbuf_repo_worktree_path' which
take a repository struct and constructs a path relative to the
repository's worktree.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: worktree_git_path() should not use file relocation
git_path is a convenience function that usually produces a string
$GIT_DIR/<path>. Since v2.5.0-rc0~143^2~35 (git_path(): be aware of
file relocation in $GIT_DIR, 2014-11-30), as a side benefit callers
get support for path relocation variables like $GIT_OBJECT_DIRECTORY:
- git_path("index") is $GIT_INDEX_FILE when set
- git_path("info/grafts") is $GIT_GRAFTS_FILE when set
- git_path("objects/<foo>") is $GIT_OBJECT_DIRECTORY/<foo> when set
- git_path("hooks/<foo>") is <foo> under core.hookspath when set
- git_path("refs/<foo>") etc (see path.c::common_list) is relative
to $GIT_COMMON_DIR instead of $GIT_DIR
worktree_git_path, by comparison, is designed to resolve files in a
specific worktree's git dir. Unfortunately, it shares code with
git_path and performs the same relocation. The result is that paths
that are meant to be relative to the specified worktree's git dir end
up replaced by paths from environment variables within the current git
dir.
Luckily, no current callers pass such arguments. The relocation was
noticed when testing the result of merging two patches under review,
one of which introduces a caller:
* The first patch made git prune check the index file in each
worktree's git dir (using worktree_git_path(wt, "index")) for
objects not to prune. This would trigger the unwanted relocation
when GIT_INDEX_FILE is set, causing objects reachable from the
index to be pruned.
* The second patch simplified the relocation logic for index,
info/grafts, objects, and hooks to happen unconditionally instead of
based on whether environment or configuration variables are set.
This caused the relocation to trigger even when GIT_INDEX_FILE is
not set.
[jn: rewrote commit message; skipping all relocation instead of just
GIT_INDEX_FILE]
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: always pass in commondir to update_common_dir
Instead of passing in 'NULL' and having 'update_common_dir()' query for
the commondir, have the callers of 'update_common_dir()' be responsible
for providing the commondir.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all path related declarations from cache.h to a new path.h header
file. This makes cache.h smaller and makes it easier to add new path
related functions.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the repository object 'struct repository' which can be used to
hold all state pertaining to a git repository.
Some of the benefits of object-ifying a repository are:
1. Make the code base more readable and easier to reason about.
2. Allow for working on multiple repositories, specifically
submodules, within the same process. Currently the process for
working on a submodule involves setting up an argv_array of options
for a particular command and then launching a child process to
execute the command in the context of the submodule. This is
clunky and can require lots of little hacks in order to ensure
correctness. Ideally it would be nice to simply pass a repository
and an options struct to a command.
3. Eliminating reliance on global state will make it easier to
enable the use of threading to improve performance.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
'GIT_TOPLEVEL_PREFIX_ENVIRONMENT' was added in (b58a68c1c setup: allow
for prefix to be passed to git commands) to aid in fixing a bug where
'ls-files' and 'grep' were not able to properly recurse when called from
within a subdirectory. Add a 'NEEDSWORK' comment indicating that this
envvar should be removed once 'ls-files' and 'grep' can recurse
in-process.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: don't perform lazy initialization of repository state
Under some circumstances (bogus GIT_DIR value or the discovered gitdir
is '.git') 'setup_git_directory()' won't initialize key repository
state. This leads to inconsistent state after running the setup code.
To account for this inconsistent state, lazy initialization is done once
a caller asks for the repository's gitdir or some other piece of
repository state. This is confusing and can be error prone.
Instead let's tighten the expected outcome of 'setup_git_directory()'
and ensure that it initializes repository state in all cases that would
have been handled by lazy initialization.
This also lets us drop the requirement to have 'have_git_dir()' check if
the environment variable GIT_DIR was set as that will be handled by the
end of the setup code.
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branches 'bw/ls-files-sans-the-index' and 'bw/config-h' into bw/repo-object
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Newly added tests to t3420 in this series prepare expected
human-readable output from "git rebase -i" and then compare the
actual output with it. As the output from the command is designed
to go through i18n/l10n, we need to use test_i18ncmp to tell
GETTEXT_POISON build that it is OK the output does not match.
This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) how the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.
To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As there is no portable way to pass timezone information to
strftime, some output format from "git log" and friends are
impossible to produce. Teach our own strbuf_addftime to replace %z
and %Z with caller-supplied values to help working around this.
* rs/strbuf-addftime-zZ:
date: use localtime() for "-local" time formats
t0006: check --date=format zone offsets
strbuf: let strbuf_addftime handle %z and %Z itself
* sg/revision-parser-skip-prefix:
revision.c: use skip_prefix() in handle_revision_pseudo_opt()
revision.c: use skip_prefix() in handle_revision_opt()
revision.c: stricter parsing of '--early-output'
revision.c: stricter parsing of '--no-{min,max}-parents'
revision.h: turn rev_info.early_output back into an unsigned int
The "add" section for 'git-submodule' is redundant in its
description and the short synopsis line. Fix it.
Remove the redundant mentioning of the 'repository' argument
being mandatory.
The text is hard to read because of back-references, so remove
those.
Replace the word "humanish" by "canonical" as that conveys better
what we do to guess the path.
While at it, quote all occurrences of '.gitmodules' as that is an
important file in the submodule context, also link to it on its
first mention.
Helped-by: Stefan Beller <sbeller@google.com> Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_name: cache readdir(3) results in find_short_object_filename()
Read each loose object subdirectory at most once when looking for unique
abbreviated hashes. This speeds up commands like "git log --pretty=%h"
considerably, which previously caused one readdir(3) call for each
candidate, even for subdirectories that were visited before.
The new cache is kept until the program ends and never invalidated. The
same is already true for pack indexes. The inherent racy nature of
finding unique short hashes makes it still fit for this purpose -- a
conflicting new object may be added at any time. Tasks with higher
consistency requirements should not use it, though.
The cached object names are stored in an oid_array, which is quite
compact. The bitmap for remembering which subdir was already read is
stored as a char array, with one char per directory -- that's not quite
as compact, but really simple and incurs only an overhead equivalent to
11 hashes after all.
Suggested-by: Jeff King <peff@peff.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
status: contextually notify user about an initial commit
The existing message, "Initial commit", makes sense for the commit template
notifying users that it's their initial commit, but is confusing when
merely checking the status of a fresh repository (or orphan branch)
without having any commits yet.
Change the output of "status" to say "No commits yet" when "git
status" is run on a fresh repo (or orphan branch), while retaining the
current "Initial commit" message displayed in the template that's
displayed in the editor when the initial commit is being authored.
Correspondingly change the output of "short status" to "No commits yet
on " when "git status -sb" is run on a fresh repo (or orphan branch).
A few alternatives considered were,
* Waiting for initial commit
* Your current branch does not have any commits
* Current branch waiting for initial commit
The most succint one among the alternatives was chosen.
[with help on tests from Ævar]
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Kaartic Sivaraam <kaarticsivaraam91196@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_object() and sha1_object_info_extended() both implement mechanisms
such as object replacement, retrying the packed store after failing to
find the object in the packed store then the loose store, and being able
to mark a packed object as bad and then retrying the whole process.
Consolidating these mechanisms would be a great help to maintainability.
Therefore, consolidate them by extending sha1_object_info_extended() to
support the functionality needed, and then modifying read_object() to
use sha1_object_info_extended().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The LOOKUP_REPLACE_OBJECT flag controls whether the
lookup_replace_object() function is invoked by
sha1_object_info_extended(), read_sha1_file_extended(), and
lookup_replace_object_extended(), but it is not immediately clear which
functions accept that flag.
Therefore restrict this flag to only sha1_object_info_extended(),
renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some
documentation. Update read_sha1_file_extended() to have a boolean
parameter instead, and delete lookup_replace_object_extended().
parse_sha1_header() also passes this flag to
parse_sha1_header_extended() since commit 46f0344 ("sha1_file: support
reading from a loose object of unknown type", 2015-05-03), but that has
had no effect since that commit. Therefore this patch also removes this
flag from that invocation.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit 46f0344
("sha1_file: support reading from a loose object of unknown type",
2015-05-03) in order to support a feature in cat-file subsequently
introduced in commit 39e4ae3 ("cat-file: teach cat-file a
'--allow-unknown-type' option", 2015-05-03). Despite its name and
location in cache.h, this flag is used neither in
read_sha1_file_extended() nor in any of the lookup functions, but used
only in sha1_object_info_extended().
Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the
name of the cat-file flag that invokes this feature, and move it closer
to the declaration of sha1_object_info_extended(). Also add
documentation for this flag.
OBJECT_INFO_ALLOW_UNKNOWN_TYPE is defined to 2, not 1, to avoid
conflicting with LOOKUP_REPLACE_OBJECT. Avoidance of this conflict is
necessary because sha1_object_info_extended() supports both flags.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
die(): stop hiding errors due to overzealous recursion guard
Change the recursion limit for the default die routine from a *very*
low 1 to 1024. This ensures that infinite recursions are broken, but
doesn't lose the meaningful error messages under threaded execution
where threads concurrently start to die.
The intent of the existing code, as explained in commit cd163d4b4e ("usage.c: detect recursion in die routines and bail out
immediately", 2012-11-14), is to break infinite recursion in cases
where the die routine itself calls die(), and would thus infinitely
recurse.
However, doing that very aggressively by immediately printing out
"recursion detected in die handler" if we've already called die() once
means that threaded invocations of git can end up only printing out
the "recursion detected" error, while hiding the meaningful error.
An example of this is running a threaded grep which dies on execution
against pretty much any repo, git.git will do:
With the current version of git this will print some combination of
multiple PCRE failures that caused the abort and multiple "recursion
detected", some invocations will print out multiple "recursion
detected" errors with no PCRE error at all!
Before this change, running the above grep command 1000 times against
git.git[1] and taking the top 20 results will on my system yield the
following distribution of actual errors ("E") and recursion
errors ("R"):
322 E R
306 E
116 E R R
65 R R
54 R E
49 E E
44 R
15 E R R R
9 R R R
7 R E R
5 R R E
3 E R R R R
2 E E R
1 R R R R
1 R R R E
1 R E R R
The exact results are obviously random and system-dependent, but this
shows the race condition in this code. Some small part of the time
we're about to print out the actual error ("E") but another thread's
recursion error beats us to it, and sometimes we print out nothing but
the recursion error.
With this change we get, now with "W" to mean the new warning being
emitted indicating that we've called die() many times:
502 E
160 E W E
120 E E
53 E W
35 E W E E
34 W E E
29 W E E E
16 E E W
16 E E E
11 W E E E E
7 E E W E
4 W E
3 W W E E
2 E W E E E
1 W W E
1 W E W E
1 E W W E E E
1 E W W E E
1 E W W E
1 E W E E W
Which still sucks a bit, due to a still present race-condition in this
code we're sometimes going to print out several errors still, or
several warnings, or two duplicate errors without the warning.
But we will never have a case where we completely hide the actual
error as we do now.
Now, git-grep could make use of the pluggable error facility added in
commit c19a490e37 ("usage: allow pluggable die-recursion checks",
2013-04-16). There's other threaded code that calls set_die_routine()
or set_die_is_recursing_routine().
But this is about fixing the general die() behavior with threading
when we don't have such a custom routine yet. Right now the common
case is not an infinite recursion in the handler, but us losing error
messages by default because we're overly paranoid about our recursion
check.
So let's just set the recursion limit to a number higher than the
number of threads we're ever likely to spawn. Now we won't lose
errors, and if we have a recursing die handler we'll still die within
microseconds.
There are race conditions in this code itself, in particular the
"dying" variable is not thread mutexed, so we e.g. won't be dying at
exactly 1024, or for that matter even be able to accurately test
"dying == 2", see the cases where we print out more than one "W"
above.
But that doesn't really matter, for the recursion guard we just need
to die "soon", not at exactly 1024 calls, and for printing the correct
error and only one warning most of the time in the face of threaded
death this is good enough and a net improvement on the current code.
1. for i in {1..1000}; do git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort -nr | head -n 20
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>